Lauro Langosco
Verified email at cam.ac.uk
Title · Cited by · Year
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
155 · 2023
Goal Misgeneralization in Deep Reinforcement Learning
L Langosco, J Koch, L Sharkey, J Pfau, L Orseau, D Krueger
ICML 2022, 9, 2022
76* · 2022
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
40* · 2023
Neural Variational Gradient Descent
L Langosco di Langosco, V Fortuin, H Strathmann
ICML Workshop on Uncertainty & Robustness in Deep Learning, 2021
16* · 2021
Unifying Grokking and Double Descent
X Davies, L Langosco, D Krueger
ML Safety Workshop NeurIPS 2022, 2023
14 · 2023
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
1 · 2024
Detecting Backdoors with Meta-Models
L Langosco, N Alex, W Baker, D Quarel, H Bradley, D Krueger
NeurIPS 2023 Workshop on Backdoors in Deep Learning-The Good, the Bad, and …, 2023
1 · 2023
Training Equilibria in Reinforcement Learning
L Langosco, D Krueger, A Gleave
Deep Reinforcement Learning Workshop NeurIPS 2022, 2022
2022