Zhengyuan Yang

Cited by

	All	Since 2019
Citations	4104	4090
h-index	26	26
i10-index	35	35

Citations per year

Year	Citations
2018	12
2019	59
2020	130
2021	306
2022	631
2023	1858
2024	1100

Public access

14 articles available
0 articles not available

Based on funding mandates

Co-authors

Lijuan Wang: Microsoft GenAI. Verified email at microsoft.com
Jianfeng Wang: Microsoft. Verified email at microsoft.com
Zicheng Liu: Microsoft. Verified email at microsoft.com
Jiebo Luo: Albert Arendt Hopeman Professor of Engineering, University of Rochester. Verified email at cs.rochester.edu
Linjie (Lindsey) Li: Senior Researcher, Microsoft. Verified email at microsoft.com
Kevin Lin: Microsoft. Verified email at microsoft.com
Zhe Gan: Research Scientist, Apple. Verified email at apple.com
Liwei Wang: Assistant Professor at The Chinese University of Hong Kong. Verified email at cse.cuhk.edu.hk
Ce Liu: Partner Research Manager, Microsoft GenAI; IEEE Fellow. Verified email at microsoft.com
Jinsong Su: Xiamen University. Verified email at xmu.edu.cn
Jiajun Deng (邓家俊): University of Adelaide, Australian Institute for Machine Learning. Verified email at adelaide.edu.au
Yuncheng Li: Google. Verified email at google.com
Jianwei Yang: Principal Researcher, Microsoft Research, Redmond. Verified email at microsoft.com
Chenglei Si: Stanford University. Verified email at stanford.edu
Boqing Gong: Research Scientist, Google. Verified email at google.com

Zhengyuan Yang

Researcher, Microsoft

Verified email at microsoft.com

Computer Vision, Multimedia, Vision + Language, Multimodal


Title	Authors	Venue	Cited by	Year
GIT: A generative image-to-text transformer for vision and language	J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang	Transactions on Machine Learning Research (TMLR), 2022	353	2022
A fast and accurate one-stage approach to visual grounding	Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo	IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019	313	2019
An empirical study of GPT-3 for few-shot knowledge-based VQA	Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang	Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 3081-3089, 2022	301	2022
TransVG: End-to-End Visual Grounding with Transformers	J Deng, Z Yang, T Chen, W Zhou, H Li	IEEE International Conference on Computer Vision (ICCV), 2021	248	2021
The dawn of LMMs: Preliminary explorations with GPT-4V(ision)	Z Yang, L Li, K Lin, J Wang, CC Lin, Z Liu, L Wang	arXiv preprint arXiv:2309.17421 9 (1), 1, 2023	235	2023
Scaling up vision-language pre-training for image captioning	X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	218	2022
MM-REACT: Prompting ChatGPT for multimodal reasoning and action	Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ...	arXiv preprint arXiv:2303.11381, 2023	197	2023
Improving One-stage Visual Grounding by Recursive Sub-query Construction	Z Yang, T Chen, L Wang, J Luo	European Conference on Computer Vision (ECCV), 2020	189	2020
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions	Z Yang, Y Zhang, J Yu, J Cai, J Luo	2018 24th International Conference on Pattern Recognition (ICPR), 2289-2294, 2018	185	2018
Action recognition with spatio–temporal visual attention on skeleton image sequences	Z Yang, Y Li, J Yang, J Luo	IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018	181	2018
Attentive relational networks for mapping images to scene graphs	M Qi, W Li, Z Yang, Y Wang, J Luo	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019	169	2019
Prompting GPT-3 to be reliable	C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang	International Conference on Learning Representations (ICLR 23), 2022	157	2022
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption	Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ...	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021	149	2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling	Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang	European Conference on Computer Vision (ECCV), 521-539, 2022	125*	2022
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation	Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo	Annual Meeting of the Association for Computational Linguistics (ACL), 2020	125	2020
MM-Vet: Evaluating large multimodal models for integrated capabilities	W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang	arXiv preprint arXiv:2308.02490, 2023	124	2023
Multimodal foundation models: From specialists to general-purpose assistants	C Li, Z Gan, Z Yang, J Yang, L Li, L Wang, J Gao	Foundations and Trends® in Computer Graphics and Vision 16 (1-2), 1-214, 2024	76	2024
PromptCap: Prompt-guided task-aware image captioning	Y Hu, H Hua, Z Yang, W Shi, NA Smith, J Luo	arXiv preprint arXiv:2211.09699, 2022	76*	2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding	Z Yang, S Zhang, L Wang, J Luo	IEEE International Conference on Computer Vision (ICCV), 2021	70	2021
Dynamic context-guided capsule network for multimodal machine translation	H Lin, F Meng, J Su, Y Yin, Z Yang, Y Ge, J Zhou, J Luo	Proceedings of the 28th ACM International Conference on Multimedia, 1320-1329, 2020	70	2020
