Xander Davies
Verified email at college.harvard.edu
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper*, X Davies*, C Shi, TK Gilbert, J Scheurer, J Rando, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 163, Year: 2023
Unifying Grokking and Double Descent
X Davies*, L Langosco*, D Krueger
arXiv preprint arXiv:2303.06173, 2023
Cited by: 14, Year: 2023
Sparse distributed memory is a continual learner
T Bricken, X Davies, D Singh, D Krotov, G Kreiman
arXiv preprint arXiv:2303.11934, 2023
Cited by: 9, Year: 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
M Li*, X Davies*, M Nadeau*
Cited by: 7, Year: 2023
Discovering Variable Binding Circuitry with Desiderata
X Davies*, M Nadeau*, N Prakash*, TR Shaham, D Bau
arXiv preprint arXiv:2307.03637, 2023
Cited by: 5, Year: 2023
Delayed Generalization: Bridging Double Descent and Grokking
X Davies, J Hoogland, L Langosco, D Krueger
Year: 2023