Tutel: Adaptive mixture-of-experts at scale
C Hwang, W Cui, Y Xiong, Z Yang, Z Liu, H Hu, Z Wang, R Salas, J Jose, ...
Proceedings of Machine Learning and Systems 5, 269-287, 2023. Cited by 70.

FlexMoE: Scaling large-scale sparse pre-trained model training via dynamic device placement
X Nie, X Miao, Z Wang, Z Yang, J Xue, L Ma, G Cao, B Cui
Proceedings of the ACM on Management of Data 1 (1), 1-19, 2023. Cited by 33.

Partial label learning with noisy side information
S Wang, M Xia, Z Wang, G Lyu, S Feng
Applied Intelligence 52 (11), 12382-12396, 2022. Cited by 3.

Mixture-of-experts layer with dynamic gating
Y Xiong, C Hwang, W Cui, Y Ziyue, Z Liu, H Hu, Z Wang, RO Salas, J Jose, ...
US Patent App. 18/054,451, 2024.

Collective communication phases at mixture-of-experts layer
Y Xiong, C Hwang, W Cui, Y Ziyue, Z Liu, H Hu, Z Wang, RO Salas, J Jose, ...
US Patent App. 18/054,452, 2024.

Mixture-of-experts layer with switchable parallel modes
Y Xiong, C Hwang, W Cui, Y Ziyue, Z Liu, H Hu, Z Wang, RO Salas, J Jose, ...
US Patent App. 18/054,446, 2024.

Sparse encoding and decoding at mixture-of-experts layer
Y Xiong, C Hwang, W Cui, Y Ziyue, Z Liu, H Hu, Z Wang, RO Salas, J Jose, ...
US Patent App. 18/318,436, 2024.