Peter Sunehag
Google - DeepMind
Verified email at google.com
Title | Cited by | Year
Deep reinforcement learning in large discrete action spaces
G Dulac-Arnold, R Evans, H van Hasselt, P Sunehag, T Lillicrap, J Hunt, ...
arXiv preprint arXiv:1512.07679, 2015
Cited by: 112 | Year: 2015
Value-decomposition networks for cooperative multi-agent learning
P Sunehag, G Lever, A Gruslys, WM Czarnecki, V Zambaldi, M Jaderberg, ...
arXiv preprint arXiv:1706.05296, 2017
Cited by: 56 | Year: 2017
Value-decomposition networks for cooperative multi-agent learning based on team reward
P Sunehag, G Lever, A Gruslys, WM Czarnecki, V Zambaldi, M Jaderberg, ...
Proceedings of the 17th International Conference on Autonomous Agents and …, 2018
Cited by: 41 | Year: 2018
Wearable sensor activity analysis using semi-Markov models with a grammar
O Thomas, P Sunehag, G Dror, S Yun, S Kim, M Robards, A Smola, ...
Pervasive and Mobile Computing 6 (3), 342-350, 2010
Cited by: 30 | Year: 2010
Variable metric stochastic approximation theory
P Sunehag, J Trumpf, SVN Vishwanathan, N Schraudolph
Artificial Intelligence and Statistics, 560-566, 2009
Cited by: 30 | Year: 2009
The sample-complexity of general reinforcement learning
T Lattimore, M Hutter, P Sunehag
Proceedings of the 30th International Conference on Machine Learning, 2013
Cited by: 26 | Year: 2013
Deep reinforcement learning with attention for slate Markov decision processes with high-dimensional states and actions
P Sunehag, R Evans, G Dulac-Arnold, Y Zwols, D Visentin, B Coppin
arXiv preprint arXiv:1512.01124, 2015
Cited by: 15 | Year: 2015
Adaptive context tree weighting
A O'Neill, M Hutter, W Shao, P Sunehag
2012 Data Compression Conference, 317-326, 2012
Cited by: 14 | Year: 2012
Feature Reinforcement Learning In Practice
P Nguyen, P Sunehag, M Hutter
arXiv preprint arXiv:1108.3614, 2011
Cited by: 13 | Year: 2011
Semi-Markov kMeans clustering and activity recognition from body-worn sensors
MW Robards, P Sunehag
2009 Ninth IEEE International Conference on Data Mining, 438-446, 2009
Cited by: 13 | Year: 2009
Feature reinforcement learning: state of the art
M Daswani, P Sunehag, M Hutter
Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014
Cited by: 12 | Year: 2014
Consistency of feature Markov processes
P Sunehag, M Hutter
Algorithmic Learning Theory, 360-374, 2010
Cited by: 12 | Year: 2010
(Non-) equivalence of universal priors
I Wood, P Sunehag, M Hutter
Algorithmic Probability and Friends. Bayesian Prediction and Artificial …, 2013
Cited by: 11 | Year: 2013
Optimistic agents are asymptotically optimal
P Sunehag, M Hutter
Australasian Joint Conference on Artificial Intelligence, 15-26, 2012
Cited by: 11 | Year: 2012
Context tree maximizing reinforcement learning
P Nguyen, P Sunehag, M Hutter
Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012
Cited by: 10 | Year: 2012
Sparse Kernel-SARSA (λ) with an eligibility trace
M Robards, P Sunehag, S Sanner, B Marthi
Machine Learning and Knowledge Discovery in Databases, 1-17, 2011
Cited by: 9 | Year: 2011
Axioms for rational reinforcement learning
P Sunehag, M Hutter
Algorithmic Learning Theory, 338-352, 2011
Cited by: 9 | Year: 2011
Real method of interpolation on subcouples of codimension one
SV Astashkin, P Sunehag
Stud. Math., 2008
Cited by: 8 | Year: 2008
Rationality, optimism and guarantees in general reinforcement learning
P Sunehag, M Hutter
The Journal of Machine Learning Research 16 (1), 1345-1390, 2015
Cited by: 7 | Year: 2015
Intelligence as inference or forcing Occam on the world
P Sunehag, M Hutter
International Conference on Artificial General Intelligence, 186-195, 2014
Cited by: 7 | Year: 2014
Articles 1–20