Akshita Gupta

I am an ELLIS PhD student at TU Darmstadt, co-supervised by Prof. Marcus Rohrbach and Dr. Federico Tombari at Google Zurich. I completed my MASc at the University of Guelph, where I was advised by Prof. Graham Taylor. During that time, I was also a student researcher at the Vector Institute.

I was fortunate to spend time as a research intern at Apple with Dr. Tatiana Likhomanenko, at Microsoft with Gaurav Mittal and Mei Chen, and at the Vector Institute with Dr. David Emerson, and as a scientist in residence at NextAI with Prof. Graham Taylor.

Before academia, I worked as a Data Scientist at Bayanat, where I focused on detection and segmentation projects. Prior to that, I was a Research Engineer at the Inception Institute of Artificial Intelligence (IIAI), working with Dr. Sanath Narayan, Dr. Salman Khan, and Dr. Fahad Shahbaz Khan.

Email      Google Scholar      Twitter

I'm always open to collaborations and project supervision! Just drop me a message :)


☕ Ask me for the world's best coffee list

TU Darmstadt, 2025-Present
Google, 2026
Apple, 2024-2025
University of Guelph, 2022-2024
Vector Institute, 2022-2024
Microsoft Research, 2023-2024
NextAI, 2024
Bayanat, 2022
Inception Institute of Artificial Intelligence, 2018-2022

What's New ✨

[Jun 2026] 🚀 Started as a Student Researcher at Google Zurich with Dr. Yongqian Xian and Dr. Federico Tombari.
[May 2026] 🌟 Co-organizing WICV at ECCV 2026 as lead organizer.
[Mar 2025] 🎓 Excited to be an ELLIS PhD student at TU Darmstadt under Prof. Marcus Rohrbach and Dr. Federico Tombari (Google Zurich).
[Oct 2024] 🎓 Graduated and defended my Master's thesis.
[Jun 2024] 🍏 Joined Apple as a Research Intern.
[May 2024] 🧠 Serving as a Scientist-in-Residence at NextAI.
[Jan 2024] 🏆 Our paper Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization accepted at WACV 2025 (Oral).
[Dec 2023] 📚 Our work Open-Vocabulary Temporal Action Localization using Multimodal Guidance accepted at BMVC 2024.
[Jun 2023] 🧪 Our paper Generative Multi-Label Zero-Shot Learning accepted at TPAMI 2023.
[Jun 2023] 🚀 Started interning at Microsoft, ROAR team.
[Jan 2023] 🤖 Interned at the Vector Institute with the AI Engineering team.
[Sep 2022] 🔬 Joined Prof. Graham Taylor's Lab and Vector Institute.
[Mar 2022] 🏅 OW-DETR accepted at CVPR 2022.
[Sep 2021] ✍️ Reviewer for CVPR 2023, CVPR 2022, ECCV 2022, ICCV 2021, TPAMI.
[Jul 2021] 🏅 BiAM accepted at ICCV 2021.
[Feb 2021] ✍️ Serving as a reviewer for ML Reproducibility Challenge 2020.
[Jan 2021] 📝 Paper out on arXiv: Generative Multi-Label Zero-Shot Learning.
[Jul 2020] 🏅 TF-VAEGAN accepted at ECCV 2020.
[Aug 2019] 🛰️ A Large-scale Instance Segmentation Dataset for Aerial Images (iSAID) available for download.
[Aug 2018] 🎤 One paper accepted at Interspeech, CHiME Workshop 2018.
[May 2018] 🌟 Selected as an Outreachy intern with Mozilla.

Publications

2026

GETR: Guided Exploration Trajectories for Long-Video Reasoning
Akshita Gupta, Aditya Arora, Hector Garcia Rodriguez, Federico Tombari, Anna Rohrbach, Marcus Rohrbach
Under Review
From Visual Cues to Spoken Narration: Rethinking Audio Description
Akshita Gupta, Aditya Arora, Federico Tombari, Marcus Rohrbach, Anna Rohrbach
Under Review
ReCap: Lightweight Referential Grounding for Coherent Story Visualization
Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach
Under Review
HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
Reihaneh Zohrabi, Hosein Hasani, Akshita Gupta, Mahdieh Soleymani Baghshah, Anna Rohrbach, Marcus Rohrbach
ICML 2026
A Multi-Modal Dataset for Insect Biodiversity with Imagery and DNA at the Trap and Individual Level
John Quinto, Johanna Orsholm, Brendan Furneaux, Tommi Mononen, Tomas Roslin, Iuliia Zarubiieva, Akshita Gupta, Scott C. Lowe, Graham W. Taylor
Nature Scientific Data 2026; NeurIPS 2025 Workshop on Imageomics (ORAL)
Mechanisms of Multimodal Synchronization: Insights from Decoder-Based Video-Text-to-Speech Synthesis
Under Review Paper

2025

BugSR: Improving Tiny Instance Segmentation on the MassID45 Dataset
John Quinto, Scott C. Lowe, Akshita Gupta, Johanna Orsholm, Prajakta Darade, Iuliia Zarubiieva, Brendan Furneaux, Tommi Mononen, Tomas Roslin, Graham W. Taylor
NeurIPS 2025 Workshop on Imageomics
RestormerL - Efficient Real-World Deblurring using Single Images: AIM 2025 Challenge Report
Aditya Arora, Akshita Gupta, Anna Rohrbach, Marcus Rohrbach
AIM 2025 Challenge Report; Runner-up Challenge Award
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Akshita Gupta*, Gaurav Mittal*, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen
WACV 2025 ORAL Arxiv

2024

Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor
BMVC 2024 Paper

2023

Generative Multi-Label Zero-Shot Learning
Akshita Gupta*, Sanath Narayan*, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost van de Weijer
TPAMI 2023 Paper

2022

OW-DETR: Open-world Detection Transformer
Akshita Gupta*, Sanath Narayan*, Joseph KJ, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
CVPR 2022
Paper Code

2021

Discriminative Region-based Multi-Label Zero-Shot Learning
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
ICCV 2021
Paper Code

2020

Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao
ECCV 2020
Paper Code

2019

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
CVPR Workshop 2019 ORAL
Code Dataset

2018

Acoustic Features Fusion Using Attentive Multi-Channel Deep Architecture
Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman
Interspeech Workshop 2018 Code
