Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 975 | 2024 |
Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation J Cho, Y Hu, R Garg, P Anderson, R Krishna, J Baldridge, M Bansal, ... arXiv preprint arXiv:2310.18235, 2023 | 66 | 2023 |
DOCCI: Descriptions of connected and contrasting images Y Onoe, S Rane, Z Berger, Y Bitton, J Cho, R Garg, A Ku, Z Parekh, ... European Conference on Computer Vision, 291-309, 2024 | 31 | 2024 |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions R Garg, A Burns, BK Ayan, Y Bitton, C Montgomery, Y Onoe, A Bunner, ... arXiv preprint arXiv:2405.02793, 2024 | 17 | 2024 |
Imagen 3 J Baldridge, J Bauer, M Bhutani, N Brichtova, A Bunner, K Chan, Y Chen, ... arXiv preprint arXiv:2408.07009, 2024 | 16 | 2024 |
Mismatch Quest: Visual and textual feedback for image-text misalignment B Gordon, Y Bitton, Y Shafir, R Garg, X Chen, D Lischinski, D Cohen-Or, ... European Conference on Computer Vision, 310-328, 2024 | 6 | 2024 |
Greedy growing enables high-resolution pixel-based diffusion models CN Vasconcelos, A Rashwan, A Waters, T Walker, K Xu, J Yan, R Qian, ... Transactions on Machine Learning Research, 2024 | 2 | 2024 |
Automated classification of network-accessible content based on events R Garg US Patent 10,504,145, 2019 | 1 | 2019 |