Stefano V. Albrecht
Address: Gates Dell Complex, 3.418, 2317 Speedway, Austin, TX 78712, USA
Contact: svalb@cs.utexas.edu, svalbrecht.de
Personal
I am a Postdoctoral Fellow in the Department of Computer Science at The University of Texas at Austin, where I work with Prof. Peter Stone in the Learning Agents Research Group. My work is supported by a Feodor Lynen Research Fellowship from the Alexander von Humboldt Foundation.
My research interests are in the broad area of autonomous agents and multiagent systems, with a focus on sequential decision making under uncertainty. The long-term goal of my research is to create intelligent agents that can interact effectively with other agents whose behaviours are initially unknown. This involves a number of challenging problems, such as efficient learning and adaptation in the presence of uncertainty as well as robustness with respect to violations of prior beliefs. The short video on the right describes some of my work toward this goal. For more details, please take a look at my publications below.
Prior to Austin, I obtained a Ph.D. and M.Sc. in Artificial Intelligence from The University of Edinburgh, where I worked with Prof. Subramanian Ramamoorthy in the Robust Autonomy and Decisions Group. Before Edinburgh, I obtained a B.Sc. in Computer Science from Darmstadt University of Technology. See here for a short CV.
News
 I was a speaker at the Colloquium Series on Robust and Beneficial AI at MIRI (slides, video)
 A manuscript containing the key results of my Ph.D. research has been published in Artificial Intelligence
 A short article of mine on "safe AI" appeared in Issue 18 (p.22) of the Edinburgh University Science Magazine
 I will give a tutorial at AAAI-16 with Prashant Doshi on Type-based Methods for Interaction in Multiagent Systems
 I am co-chairing the AAAI-16 Workshop on Multiagent Interaction without Prior Coordination (MIPC 2016)
 I am a guest editor of the JAAMAS Special Issue on Multiagent Interaction without Prior Coordination
Publications
Journals:
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
Belief and Truth in Hypothesised Behaviours
Artificial Intelligence Journal (AIJ), Vol. 235, pp. 63-94, 2016
Abstract: There is a long history in game theory on the topic of Bayesian or “rational” learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
S.V. Albrecht, S. Ramamoorthy
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Journal of Artificial Intelligence Research (JAIR), Vol. 55, pp. 1135-1178, 2016
Abstract: Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
Conferences:
S.V. Albrecht, S. Ramamoorthy
Are You Doing What I Think You Are Doing? Criticising Uncertain Agent Models
Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015
Abstract: The key for effective interaction in many multiagent applications is to reason explicitly about the behaviour of other agents, in the form of a hypothesised behaviour. While there exist several methods for the construction of a behavioural hypothesis, there is currently no universal theory which would allow an agent to contemplate the correctness of a hypothesis. In this work, we present a novel algorithm which decides this question in the form of a frequentist hypothesis test. The algorithm allows for multiple metrics in the construction of the test statistic and learns its distribution during the interaction process, with asymptotic correctness guarantees. We present results from a comprehensive set of experiments, demonstrating that the algorithm achieves high accuracy and scalability at low computational costs. 
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types
Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015
Abstract: Many multiagent applications require an agent to learn quickly how to interact with previously unknown other agents. To address this problem, researchers have studied learning algorithms which compute posterior beliefs over a hypothesised set of policies, based on the observed actions of the other agents. The posterior belief is complemented by the prior belief, which specifies the subjective likelihood of policies before any actions are observed. In this paper, we present the first comprehensive empirical study on the practical impact of prior beliefs over policies in repeated interactions. We show that prior beliefs can have a significant impact on the long-term performance of such methods, and that the magnitude of the impact depends on the depth of the planning horizon. Moreover, our results demonstrate that automatic methods can be used to compute prior beliefs with consistent performance effects. This indicates that prior beliefs could be eliminated as a manual parameter and instead be computed automatically.
S.V. Albrecht, S. Ramamoorthy
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014
Abstract: While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents’ observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types. 
S.V. Albrecht, S. Ramamoorthy
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems
Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013
Abstract: The ad hoc coordination problem is to design an autonomous agent which is able to achieve optimal flexibility and efficiency in a multiagent system with no mechanisms for prior coordination. We conceptualise this problem formally using a game-theoretic model, called the stochastic Bayesian game, in which the behaviour of a player is determined by its private information, or type. Based on this model, we derive a solution, called Harsanyi-Bellman Ad Hoc Coordination (HBA), which utilises the concept of Bayesian Nash equilibrium in a planning procedure to find optimal actions in the sense of Bellman optimal control. We evaluate HBA in a multiagent logistics domain called level-based foraging, showing that it achieves higher flexibility and efficiency than several alternative algorithms. We also report on a human-machine experiment at a public science exhibition in which the human participants played repeated Prisoner's Dilemma and Rock-Paper-Scissors against HBA and alternative algorithms, showing that HBA achieves equal efficiency and a significantly higher welfare and winning rate.
S.V. Albrecht, S. Ramamoorthy
Comparative Evaluation of MAL Algorithms in a Diverse Set of Ad Hoc Team Problems
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012
Abstract: This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
Workshops:
S.V. Albrecht, J.W. Crandall, S. Ramamoorthy
E-HBA: Using Action Policies for Expert Advice and Agent Typification
Proceedings of the Second Workshop on Multiagent Interaction without Prior Coordination (MIPC), 2015
Abstract: Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
Magazine articles:
S.V. Albrecht
Is Artificial Intelligence Safe for Humanity?
Edinburgh University Science Magazine (EUSci), Issue 18, p. 22, 2015
S.V. Albrecht
Machines That Play Games Against Humans
Edinburgh University Science Magazine (EUSci), Issue 14, p. 19, 2013