Devansh Arpit
Startup
Title
Cited by
Year
A closer look at memorization in deep networks
D Arpit, S Jastrzębski, N Ballas, D Krueger, E Bengio, MS Kanwal, ...
ICML 2017 (arXiv preprint arXiv:1706.05394), 2017
Cited by: 2122 · Year: 2017
On the spectral bias of deep neural networks
N Rahaman, D Arpit, A Baratin, F Draxler, M Lin, FA Hamprecht, Y Bengio, ...
ICML 2019 (arXiv preprint arXiv:1806.08734), 2018
Cited by: 1513* · Year: 2018
Three factors influencing minima in SGD
S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey
ICANN 2018 (arXiv preprint arXiv:1711.04623), 2017
Cited by: 547 · Year: 2017
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
S Jastrzębski, M Szymczak, S Fort, D Arpit, J Tabor, K Cho, K Geras
ICLR 2020 (arXiv preprint arXiv:2002.09572), 2020
Cited by: 177 · Year: 2020
Residual connections encourage iterative inference
S Jastrzębski, D Arpit, N Ballas, V Verma, T Che, Y Bengio
ICLR 2018 (arXiv preprint arXiv:1710.04773), 2017
Cited by: 158 · Year: 2017
Ensemble of averages: Improving model selection and boosting performance in domain generalization
D Arpit, H Wang, Y Zhou, C Xiong
NeurIPS 2022, 2021
Cited by: 149 · Year: 2021
Normalization propagation: A parametric technique for removing internal covariate shift in deep networks
D Arpit, Y Zhou, BU Kota, V Govindaraju
ICML 2016 (arXiv preprint arXiv:1603.01431), 2016
Cited by: 148 · Year: 2016
A walk with SGD
C Xing, D Arpit, C Tsirigotis, Y Bengio
arXiv preprint arXiv:1802.08770, 2018
Cited by: 123 · Year: 2018
Why regularized auto-encoders learn sparse representation?
D Arpit, Y Zhou, H Ngo, V Govindaraju
ICML 2016 (arXiv preprint arXiv:1505.05561), 2015
Cited by: 94 · Year: 2015
Deep Nets Don't Learn via Memorization
D Krueger, N Ballas, S Jastrzębski, D Arpit, MS Kanwal, T Maharaj, ...
ICLR 2017 Workshop, 2017
Cited by: 75 · Year: 2017
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
D Arpit, V Campos, Y Bengio
NeurIPS 2019, 2019
Cited by: 71 · Year: 2019
BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents
Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, ...
arXiv preprint arXiv:2308.05960, 2023
Cited by: 68 · Year: 2023
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
S Jastrzębski, D Arpit, O Astrand, G Kerg, H Wang, C Xiong, R Socher, ...
ICML 2021, 2020
Cited by: 67 · Year: 2020
Fraternal Dropout
K Zolna, D Arpit, D Suhubdy, Y Bengio
ICLR 2018 (arXiv preprint arXiv:1711.00066), 2017
Cited by: 65 · Year: 2017
Retroformer: Retrospective large language agents with policy gradient optimization
W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, ...
arXiv preprint arXiv:2308.02151, 2023
Cited by: 52 · Year: 2023
h-detach: Modifying the LSTM Gradient Towards Better Optimization
D Arpit, B Kanuparthi, G Kerg, NR Ke, I Mitliagkas, Y Bengio
ICLR 2019 (arXiv preprint arXiv:1810.03023), 2018
Cited by: 50 · Year: 2018
Variational Bi-LSTMs
S Shabanian, D Arpit, A Trischler, Y Bengio
arXiv preprint arXiv:1711.05717, 2017
Cited by: 43 · Year: 2017
Is joint training better for deep auto-encoders?
Y Zhou, D Arpit, I Nwogu, V Govindaraju
arXiv preprint arXiv:1405.1380, 2014
Cited by: 39 · Year: 2014
Finding Flatter Minima with SGD
S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey
ICLR 2018 Workshop, 2018
Cited by: 38 · Year: 2018
The benefits of over-parameterization at initialization in deep ReLU networks
D Arpit, Y Bengio
arXiv preprint arXiv:1901.03611, 2019
Cited by: 34 · Year: 2019