Matteo Pagliardini
Title
Cited by
Year
Unsupervised learning of sentence embeddings using compositional n-gram features
M Pagliardini, P Gupta, M Jaggi
NAACL-HLT, 2018, 2017
Cited by: 944; Year: 2017
Meditron-70b: Scaling medical pretraining for large language models
Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ...
arXiv preprint arXiv:2311.16079, 2023
Cited by: 248; Year: 2023
Agree to disagree: Diversity through disagreement for better transferability
M Pagliardini, M Jaggi, F Fleuret, SP Karimireddy
ICLR 2023, 2022
Cited by: 74; Year: 2022
Better word embeddings by disentangling contextual n-gram information
P Gupta, M Pagliardini, M Jaggi
NAACL-HLT, 2019, 2019
Cited by: 53; Year: 2019
Taming gans with lookahead
T Chavdarova, M Pagliardini, SU Stich, M Jaggi, F Fleuret
ICLR 2021, 2020
Cited by: 41*; Year: 2020
Fast attention over long sequences with dynamic sparse flash attention
M Pagliardini, D Paliotta, M Jaggi, F Fleuret
Advances in Neural Information Processing Systems 36, 59808-59831, 2023
Cited by: 29*; Year: 2023
Doge: Domain reweighting with generalization estimation
S Fan, M Pagliardini, M Jaggi
arXiv preprint arXiv:2310.15393, 2023
Cited by: 19; Year: 2023
The peril of popular deep learning uncertainty estimation methods
Y Liu, M Pagliardini, T Chavdarova, SU Stich
Bayesian Deep Learning workshop, at NeurIPS 2021, 2021
Cited by: 19; Year: 2021
Unsupervised learning of sentence embeddings using compositional n-gram features (2017)
M Pagliardini, P Gupta, M Jaggi
arXiv preprint arXiv:1703.02507, 2017
Cited by: 13; Year: 2017
The ademamix optimizer: Better, faster, older
M Pagliardini, P Ablin, D Grangier
arXiv preprint arXiv:2409.03137, 2024
Cited by: 6; Year: 2024
A Primal-Dual Approach to Solving Variational Inequalities with General Constraints
T Chavdarova, T Yang, M Pagliardini, M Jordan
The Twelfth International Conference on Learning Representations, 2024
Cited by: 4*; Year: 2024
Meditron: Open medical foundation models adapted for clinical practice
Z Chen, A Romanou, A Bonnet, A Hernández-Cano, B Alkhamissi, ...
Cited by: 3; Year: 2024
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
M Pagliardini, A Mohtashami, F Fleuret, M Jaggi
arXiv preprint arXiv:2402.02622, 2024
Cited by: 3; Year: 2024
Improving generalization via uncertainty driven perturbations
M Pagliardini, G Manunza, M Jaggi, MI Jordan, T Chavdarova
arXiv preprint arXiv:2202.05737, 2022
Cited by: 3; Year: 2022
Diversity through disagreement for better transferability
M Pagliardini, M Jaggi, F Fleuret, SP Karimireddy
NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and …, 2022
Cited by: 2; Year: 2022
CoTFormer: More Tokens With Attention Make Up For Less Depth
A Mohtashami, M Pagliardini, M Jaggi
arXiv preprint arXiv:2310.10845, 2023
Cited by: 1; Year: 2023
Fast causal attention with dynamic sparsity
D Paliotta, M Pagliardini, M Jaggi, F Fleuret
Workshop on Efficient Systems for Foundation Models@ ICML2023, 2023
Cited by: 1; Year: 2023
Improved generalization-robustness trade-off via uncertainty targeted attacks
M Pagliardini, G Manunza, M Jaggi, T Chavdarova
Cited by: 1; Year: 2022
Leveraging the true depth of LLMs
RC González, D Paliotta, M Pagliardini, M Jaggi, F Fleuret
arXiv preprint arXiv:2502.02790, 2025
Year: 2025
AdEMAMix: Better and Faster Training with Older Gradients
M Pagliardini, P Ablin, D Grangier
OPT 2024: Optimization for Machine Learning, 0