Pierre Ménard
OvGU Magdeburg
Verified email at inria.fr
Title
Cited by
Year
Explore first, exploit next: The true shape of regret in bandit problems
A Garivier, P Ménard, G Stoltz
Mathematics of Operations Research 44 (2), 377-399, 2019
Cited by: 105 · Year: 2019
A minimax and asymptotically optimal algorithm for stochastic bandits
P Ménard, A Garivier
International Conference on Algorithmic Learning Theory, 223-237, 2017
Cited by: 29 · Year: 2017
Non-asymptotic pure exploration by solving games
R Degenne, WM Koolen, P Ménard
arXiv preprint arXiv:1906.10431, 2019
Cited by: 23 · Year: 2019
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
A Garivier, H Hadiji, P Ménard, G Stoltz
arXiv preprint arXiv:1805.05071, 2018
Cited by: 21 · Year: 2018
Gamification of pure exploration for linear bandits
R Degenne, P Ménard, X Shang, M Valko
International Conference on Machine Learning, 2432-2442, 2020
Cited by: 15 · Year: 2020
Fano’s inequality for random variables
S Gerchinovitz, P Ménard, G Stoltz
Statistical Science 35 (2), 178-201, 2020
Cited by: 12 · Year: 2020
Thresholding bandit for dose-ranging: The impact of monotonicity
A Garivier, P Ménard, L Rossi
arXiv preprint arXiv:1711.04454, 2017
Cited by: 12 · Year: 2017
Optimal control of a continuous-in-time financial model
E Frénod, P Ménard, M Safa
Cited by: 10 · Year: 2013
Regret bounds for kernel-based reinforcement learning
OD Domingues, P Ménard, M Pirotta, E Kaufmann, M Valko
arXiv preprint arXiv:2004.05599, 2020
Cited by: 9 · Year: 2020
Two optimization problems using a continuous-in-time financial model
E Frénod, P Ménard, M Safa
Journal of Industrial and Management Optimization, 2014
Cited by: 9* · Year: 2014
Adaptive reward-free exploration
E Kaufmann, P Ménard, OD Domingues, A Jonsson, E Leurent, M Valko
Algorithmic Learning Theory, 865-891, 2021
Cited by: 8 · Year: 2021
Gradient ascent for active exploration in bandit problems
P Ménard
arXiv preprint arXiv:1905.08165, 2019
Cited by: 8 · Year: 2019
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
O Darwiche Domingues, P Ménard, E Kaufmann, M Valko
arXiv preprint arXiv:2010.03531, 2020
Cited by: 5* · Year: 2020
Fast active learning for pure exploration in reinforcement learning
P Ménard, OD Domingues, A Jonsson, E Kaufmann, E Leurent, M Valko
arXiv preprint arXiv:2007.13442, 2020
Cited by: 5 · Year: 2020
Fixed-confidence guarantees for Bayesian best-arm identification
X Shang, R Heide, P Ménard, E Kaufmann, M Valko
International Conference on Artificial Intelligence and Statistics, 1823-1832, 2020
Cited by: 5 · Year: 2020
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
O Darwiche Domingues, P Ménard, M Pirotta, E Kaufmann, M Valko
arXiv preprint arXiv:2007.05078, 2020
Cited by: 4* · Year: 2020
Planning in Markov decision processes with gap-dependent sample complexity
A Jonsson, E Kaufmann, P Ménard, OD Domingues, E Leurent, M Valko
arXiv preprint arXiv:2006.05879, 2020
Cited by: 4 · Year: 2020
Planning in entropy-regularized Markov decision processes and games
JB Grill, O Domingues, P Ménard, R Munos, M Valko
Cited by: 4 · Year: 2019
A single algorithm for both restless and rested rotting bandits
J Seznec, P Ménard, A Lazaric, M Valko
International Conference on Artificial Intelligence and Statistics, 3784-3794, 2020
Cited by: 2 · Year: 2020
UCB Momentum Q-learning: Correcting the bias without forgetting
P Ménard, OD Domingues, X Shang, M Valko
arXiv preprint arXiv:2103.01312, 2021
Cited by: 1 · Year: 2021