Devansh Arpit

Cited by

	All	Since 2019
Citations	5165	4819
h-index	22	21
i10-index	26	22

Citations per year

Year	Citations
2016	21
2017	68
2018	220
2019	362
2020	535
2021	700
2022	1020
2023	1263
2024	938

Public access

4 articles available

0 articles not available

Based on funding mandates

Co-authors

Yoshua Bengio	Professor of computer science, University of Montreal, Mila, IVADO, CIFAR	Verified email at umontreal.ca
Stanisław Jastrzębski	Chief Technology Officer & Chief Scientist @ Molecule.One	Verified email at molecule.one
Aaron Courville	Professor, DIRO, Université de Montréal, Mila, Cifar CAI chair	Verified email at umontreal.ca
Venu Govindaraju	SUNY Distinguished Professor, State University of New York, Buffalo	Verified email at buffalo.edu
Yingbo Zhou	Senior Research Director, Salesforce Research	Verified email at salesforce.com
Hung Q. Ngo	RelationalAI	Verified email at relational.ai
Chen Xing (星辰)	Scale AI	Verified email at scale.com
Ifeoma Nwogu	Computer Science and Engineering, University at Buffalo, SUNY	Verified email at buffalo.edu
Anoop M Namboodiri	Professor, IIIT Hyderabad	Verified email at iiit.ac.in
Yun Raymond Fu	NEU, COE Distinguished Professor; MAE, FNAI, FAAAS, FIEEE, FSPIE, FOSA, FIAPR	Verified email at neu.edu
Shuang Wu	Amazon.com	Verified email at amazon.com
Nils Napp	Electrical and Computer Engineering, Cornell University	Verified email at cornell.edu

Devansh Arpit

Unknown affiliation

No verified email

Deep Learning, NLP


Title	Authors	Publication	Cited by	Year
A closer look at memorization in deep networks	D Arpit, S Jastrzębski, N Ballas, D Krueger, E Bengio, MS Kanwal, ...	ICML 2017 (arXiv preprint arXiv:1706.05394), 2017	1940	2017
On the spectral bias of deep neural networks	N Rahaman, D Arpit, A Baratin, F Draxler, M Lin, FA Hamprecht, Y Bengio, ...	ICML 2019 (arXiv preprint arXiv:1806.08734), 2018	1265*	2018
Three factors influencing minima in SGD	S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey	ICANN 2018 (arXiv preprint arXiv:1711.04623), 2017	523	2017
The Break-Even Point on Optimization Trajectories of Deep Neural Networks	S Jastrzebski, M Szymczak, S Fort, D Arpit, J Tabor, K Cho, K Geras	ICLR 2020 (arXiv preprint arXiv:2002.09572), 2020	158	2020
Normalization propagation: A parametric technique for removing internal covariate shift in deep networks	D Arpit, Y Zhou, BU Kota, V Govindaraju	ICML 2016 (arXiv preprint arXiv:1603.01431), 2016	144	2016
Residual connections encourage iterative inference	S Jastrzebski, D Arpit, N Ballas, V Verma, T Che, Y Bengio	ICLR 2018 (arXiv preprint arXiv:1710.04773), 2017	143	2017
A walk with SGD	C Xing, D Arpit, C Tsirigotis, Y Bengio	arXiv preprint arXiv:1802.08770, 2018	113	2018
Ensemble of averages: Improving model selection and boosting performance in domain generalization	D Arpit, H Wang, Y Zhou, C Xiong	NeurIPS 2022, 2021	112	2021
Why regularized auto-encoders learn sparse representation?	D Arpit, Y Zhou, H Ngo, V Govindaraju	ICML 2016 (arXiv preprint arXiv:1505.05561), 2015	92	2015
Deep Nets Don't Learn via Memorization	D Krueger, N Ballas, S Jastrzebski, D Arpit, MS Kanwal, T Maharaj, ...	ICLR 2017 Workshop, 2017	72	2017
Fraternal Dropout	K Zolna, D Arpit, D Suhubdy, Y Bengio	ICLR 2018 (arXiv preprint arXiv:1711.00066), 2017	61	2017
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets	D Arpit, V Campos, Y Bengio	NeurIPS 2019, 2019	59	2019
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization	S Jastrzebski, D Arpit, O Astrand, G Kerg, H Wang, C Xiong, R Socher, ...	ICML 2021, 2020	57	2020
BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents	Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, ...	arXiv preprint arXiv:2308.05960, 2023	47	2023
h-detach: Modifying the LSTM Gradient Towards Better Optimization	D Arpit, B Kanuparthi, G Kerg, NR Ke, I Mitliagkas, Y Bengio	ICLR 2019 (arXiv preprint arXiv:1810.03023), 2018	46	2018
Variational Bi-LSTMs	S Shabanian, D Arpit, A Trischler, Y Bengio	arXiv preprint arXiv:1711.05717, 2017	41	2017
Is joint training better for deep auto-encoders?	Y Zhou, D Arpit, I Nwogu, V Govindaraju	arXiv preprint arXiv:1405.1380, 2014	39	2014
Finding Flatter Minima with SGD	S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey	ICLR 2018 Workshop, 2018	37	2018
Retroformer: Retrospective large language agents with policy gradient optimization	W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, ...	arXiv preprint arXiv:2308.02151, 2023	34	2023
The benefits of over-parameterization at initialization in deep ReLU networks	D Arpit, Y Bengio	arXiv preprint arXiv:1901.03611, 2019	34	2019

Articles 1–20
