Publications
Working papers:

C. Innes, A. Lascarides, S.V. Albrecht, S. Ramamoorthy, B. Rosman
Reasoning about Unforeseen Possibilities During Policy Learning
2018
Abstract:
Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised (its possible states and actions and their causal structure) is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal policies even when it starts out unaware of factors that are critical to behaving optimally.
Journals:

S.V. Albrecht, P. Stone
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Artificial Intelligence (AIJ), Vol. 258, pp. 66–95, 2018
Abstract:
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different subcommunities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.

S.V. Albrecht, S. Liemhetcharat, P. Stone
Special Issue on Multiagent Interaction without Prior Coordination: Guest Editorial
Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 31(4), pp. 765–766, 2017
Abstract:
This special issue of the Journal of Autonomous Agents and Multi-Agent Systems sought research articles on the emerging topic of multiagent interaction without prior coordination. Topics of interest included empirical and theoretical investigations of issues arising from assumptions of prior coordination, as well as solutions in the form of novel models and algorithms for effective multiagent interaction without prior coordination.

S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
Belief and Truth in Hypothesised Behaviours
Artificial Intelligence (AIJ), Vol. 235, pp. 63–94, 2016
Abstract:
There is a long history in game theory on the topic of Bayesian or “rational” learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
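The following is a minimal, hedged illustration of incorporating evidence into posterior beliefs over hypothesised types. The Python sketch implements only the familiar Bayesian product form, P(θ | H) ∝ P(θ) ∏_t π_θ(a^t | H^t), where each type θ maps an interaction history to a distribution over the other agent's actions. It is not the authors' code, and the paper's other formulations are not shown. The function name `posterior`, the dictionary representation, and the fallback to the prior when every type assigns zero likelihood are all assumptions made for this example.

```python
from math import prod

def posterior(prior, types, history):
    """Product-form posterior over hypothesised types (illustrative sketch).

    prior:   dict type name -> prior probability
    types:   dict type name -> policy(past_actions) -> dict action -> probability
    history: list of actions observed from the other agent, oldest first
    """
    unnormalised = {}
    for name, policy in types.items():
        # Likelihood of the whole history: product of per-step action probabilities.
        likelihood = prod(
            policy(history[:t]).get(action, 0.0)
            for t, action in enumerate(history)
        )
        unnormalised[name] = prior[name] * likelihood
    total = sum(unnormalised.values())
    if total == 0.0:
        # No hypothesised type explains the observations; fall back to the prior.
        return dict(prior)
    return {name: weight / total for name, weight in unnormalised.items()}

# Hypothetical types: one mostly plays C, the other plays uniformly at random.
types = {
    "mostly_C": lambda past: {"C": 0.9, "D": 0.1},
    "uniform": lambda past: {"C": 0.5, "D": 0.5},
}
print(posterior({"mostly_C": 0.5, "uniform": 0.5}, types, ["C", "C", "C"]))
```

Suppose the prior is uniform and three C actions are observed. The likelihoods are then 0.9³ = 0.729 and 0.5³ = 0.125, so belief in the "mostly_C" type rises to 0.729 / 0.854, roughly 0.85.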

S.V. Albrecht, S. Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Journal of Artificial Intelligence Research (JAIR), Vol. 55, pp. 1135–1178, 2016
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
Conferences:

S.V. Albrecht, P. Stone
Reasoning about Hypothetical Agent Behaviours and their Parameters
Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017
Abstract:
Agents can achieve effective interaction with previously unknown other agents by maintaining beliefs over a set of hypothetical behaviours, or types, that these agents may have. A current limitation in this method is that it does not recognise parameters within type specifications, because types are viewed as black-box mappings from interaction histories to probability distributions over actions. In this work, we propose a general method which allows an agent to reason about both the relative likelihood of types and the values of any bounded continuous parameters within types. The method maintains individual parameter estimates for each type and selectively updates the estimates for some types after each observation. We propose different methods for the selection of types and the estimation of parameter values. The proposed methods are evaluated in detailed experiments, showing that updating the parameter estimates of a single type after each observation can be sufficient to achieve good performance.

S.V. Albrecht, S. Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)
Invited paper in journal track
Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017
Abstract:
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.

S.V. Albrecht, S. Ramamoorthy
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015
Abstract:
The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While there exist several methods for the construction of a behavioural hypothesis, there is currently no universal theory which would allow an agent to contemplate the correctness of a hypothesis. In this work, we present a novel algorithm which decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns its distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational costs.
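The hedged sketch below conveys the flavour of criticising a hypothesised behaviour with a frequentist test. It swaps in a plain Monte Carlo test that uses a single statistic, the average log-probability of the observed actions, under a fixed, history-independent model. The algorithm described above instead combines multiple metrics and learns the statistic's distribution during the interaction, and that is not reproduced here. The names `avg_log_prob` and `monte_carlo_p_value` are hypothetical.

```python
import math
import random

def avg_log_prob(actions, model):
    """Test statistic: average log-probability of the actions under the model."""
    return sum(math.log(model[a]) for a in actions) / len(actions)

def monte_carlo_p_value(observed, model, n_samples=1000, seed=0):
    """One-sided Monte Carlo test of 'the agent acts according to model'.

    model: dict action -> probability, assumed fixed and history-independent.
    Small p-values mean the observed actions are unusually unlikely under the model.
    """
    if not observed:
        raise ValueError("need at least one observed action")
    if any(model.get(a, 0.0) == 0.0 for a in observed):
        return 0.0  # an observed action is impossible under the model
    rng = random.Random(seed)
    actions = [a for a, p in model.items() if p > 0.0]
    weights = [model[a] for a in actions]
    observed_stat = avg_log_prob(observed, model)
    as_extreme = 0
    for _ in range(n_samples):
        # Simulate a sequence of the same length from the hypothesised model.
        simulated = rng.choices(actions, weights=weights, k=len(observed))
        if avg_log_prob(simulated, model) <= observed_stat:
            as_extreme += 1
    # Add-one correction: the observed sequence counts as one sample.
    return (as_extreme + 1) / (n_samples + 1)

model = {"C": 0.9, "D": 0.1}
print(monte_carlo_p_value(["D"] * 20, model))             # very small
print(monte_carlo_p_value(["C"] * 18 + ["D"] * 2, model))  # unremarkable
```

Twenty D actions are almost never produced by a model that plays C with probability 0.9, so the hypothesis is rejected. Eighteen C and two D actions are typical of that model, so it is retained.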

S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015
Abstract:
Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.

S.V. Albrecht, S. Ramamoorthy
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014
Abstract:
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents’ observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types.

S.V. Albrecht, S. Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013
Abstract:
The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
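HBA plans over future interaction. The hedged sketch below keeps only the one-step core of type-based best response: it chooses the action that maximises expected payoff under the current posterior over the other agent's types, with no lookahead. The Rock-Paper-Scissors payoff, the type names, and the function `best_response` are illustrative assumptions, not part of the published method.

```python
# Rock-Paper-Scissors payoff from our perspective: key beats value.
BEATS = {"R": "S", "P": "R", "S": "P"}

def rps_payoff(mine, theirs):
    if BEATS[mine] == theirs:
        return 1.0
    if BEATS[theirs] == mine:
        return -1.0
    return 0.0

def best_response(my_actions, belief, predictions, payoff):
    """Myopic (one-step) best response to a posterior over types.

    belief:      dict type name -> posterior probability
    predictions: dict type name -> dict other action -> probability
    payoff:      function(my_action, other_action) -> float
    """
    def expected_payoff(a):
        # Average payoff over types, then over each type's predicted actions.
        return sum(
            p_type * p_action * payoff(a, b)
            for t, p_type in belief.items()
            for b, p_action in predictions[t].items()
        )
    return max(my_actions, key=expected_payoff)

belief = {"rock_heavy": 0.7, "uniform": 0.3}
predictions = {
    "rock_heavy": {"R": 0.8, "P": 0.1, "S": 0.1},
    "uniform": {"R": 1 / 3, "P": 1 / 3, "S": 1 / 3},
}
print(best_response(["R", "P", "S"], belief, predictions, rps_payoff))  # prints P
```

Here Paper has expected payoff 0.7 × (0.8 − 0.1) = 0.49, Rock has about 0, and Scissors has −0.49. The lookahead that HBA adds in its planning procedure would change this only when future interaction matters.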

S.V. Albrecht, S. Ramamoorthy
Comparative Evaluation of MAL Algorithms in a Diverse Set of Ad Hoc Team Problems
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012
Abstract:
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
Workshops:

S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
E-HBA: Using Action Policies for Expert Advice and Agent Typification
Proceedings of the Second Workshop on Multiagent Interaction without Prior Coordination (MIPC), 2015
Abstract:
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
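The following is a minimal sketch of the mixing step, assuming a linear blend. Each expert's score combines its past average payoff with a type-based prediction of its future payoff, weighted by a coefficient `weight` in [0, 1]. How E-HBA sets and adjusts this weight over the interaction is not reproduced here, so the caller supplies it. `mixed_score` and `select_expert` are hypothetical names.

```python
def mixed_score(avg_past_payoff, predicted_payoff, weight):
    """Blend an expert's past average payoff with a predicted future payoff.

    weight = 0 keeps only the past payoff; weight = 1 uses only the prediction.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * avg_past_payoff + weight * predicted_payoff

def select_expert(experts, weight):
    """Pick the expert with the highest blended score.

    experts: dict expert name -> (average past payoff, type-based predicted payoff)
    """
    return max(experts, key=lambda name: mixed_score(*experts[name], weight))

experts = {"A": (2.0, 1.0), "B": (1.5, 3.0)}
print(select_expert(experts, 0.0))  # prints A: best track record
print(select_expert(experts, 0.5))  # prints B: prediction overturns the ranking
```

With weight 0, the sketch reduces to the underlying expert algorithm's payoff-based choice. As the weight grows, the type-based prediction can overturn that choice.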
Magazines:

S.V. Albrecht
Is Artificial Intelligence Safe for Humanity?
Edinburgh University Science Magazine (EUSci), Issue 18, p. 22, 2015

S.V. Albrecht
Machines That Play Games Against Humans
Edinburgh University Science Magazine (EUSci), Issue 14, p. 19, 2013