Akshita Gupta

I am an ELLIS PhD student at TU Darmstadt, co-supervised by Prof. Marcus Rohrbach and Dr. Federico Tombari at Google Zurich. I completed my MASc at the University of Guelph, where I was advised by Prof. Graham Taylor. During that time, I was also a student researcher at the Vector Institute.

I was fortunate to spend time as a research intern at Apple under Dr. Tatiana Likhomanenko, Microsoft under Gaurav Mittal and Mei Chen, Vector Institute under Dr. David Emerson, and as a scientist in residence at NextAI with Prof. Graham Taylor.

Before academia, I worked as a Data Scientist at Bayanat, where I focused on projects related to detection and segmentation. Prior to that, I was a Research Engineer at the Inception Institute of Artificial Intelligence (IIAI), working with Dr. Sanath Narayan, Dr. Salman Khan, and Dr. Fahad Shahbaz Khan.

Email  /  Google Scholar  /  Twitter  /  GitHub  /  Resume/CV

TU Darmstadt
2025-Present
Apple
2024-2025
University of Guelph
2022-2024
Vector Institute
2022-2024
Microsoft Research
2023-2024
NextAI
2024
Bayanat for Mapping & Surveying
2022
Inception Institute of Artificial Intelligence
2018-2022

What's New ✨

[Mar 2025] 🎓 Excited to be an ELLIS PhD student at TU Darmstadt under Prof. Marcus Rohrbach and Dr. Federico Tombari (Google Zurich) 🎉
[Oct 2024] 🎓 Graduated and defended my Master's thesis.
[Nov 2024] 📝 Our paper Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis is now on arXiv!
[Jun 2024] 🍏 Joined Apple as a Research Intern!
[May 2024] 🧠 Serving as a Scientist-in-Residence at NextAI.
[Jan 2024] 🏆 Our paper Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization accepted at WACV 2025 (Oral)! 🎤
[Dec 2023] 📚 Our work Open-Vocabulary Temporal Action Localization using Multimodal Guidance accepted at BMVC 2024!
[Jun 2023] 🧪 Our paper Generative Multi-Label Zero-Shot Learning accepted at TPAMI 2023.
[Jun 2023] 🚀 Started interning at Microsoft, ROAR team.
[Jan 2023] 🤖 Interned at Vector Institute with AI Engineering team.
[Sep 2022] 🔬 Joined Prof. Graham Taylor's Lab and Vector Institute.
[Mar 2022] 🏅 OW-DETR accepted at CVPR 2022.
[Sep 2021] ✍️ Reviewer for CVPR 2023, CVPR 2022, ECCV 2022, ICCV 2021, TPAMI.
[Jul 2021] 🏅 BiAM accepted at ICCV 2021.
[Feb 2021] ✍️ Serving as a reviewer for ML Reproducibility Challenge 2020.
[Jan 2021] 📝 Paper out on arXiv: Generative Multi-Label Zero-Shot Learning
[Jul 2020] 🏅 TF-VAEGAN accepted at ECCV 2020.
[Aug 2019] 🛰️ A Large-scale Instance Segmentation Dataset for Aerial Images (iSAID) available for download.
[Aug 2018] 🎤 One paper accepted at Interspeech, CHiME Workshop 2018.
[May 2018] 🌟 Selected as an Outreachy intern with Mozilla.

Conference and Journal Reviewing 📚

CVPR (2022–2025)   |   ECCV (2022, 2024)   |   ICCV (2021)   |   TPAMI (Journal)

Invited Talks 🎤

  • [Mar 2025] Gave a talk at the UCF CRCV lab. Thank you, Prof. Shah, for hosting me!
  • [Dec 2021] Computer Vision Talks (YouTube Link)

Research Interests 🔍

I am broadly interested in building scalable multimodal models that combine vision, language, and speech, with a focus on efficient modeling, temporal understanding, and open-world generalization.

Publications 📄

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
arXiv 2025 | Paper
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Akshita Gupta*, Gaurav Mittal*, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen
WACV 2025 (Oral) | arXiv
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor
BMVC 2024 | Paper
Generative Multi-Label Zero-Shot Learning
Akshita Gupta*, Sanath Narayan*, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost van de Weijer
TPAMI 2023 | Paper
OW-DETR: Open-world Detection Transformer
Akshita Gupta*, Sanath Narayan*, Joseph KJ, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
CVPR 2022 | Paper / Code
Discriminative Region-based Multi-Label Zero-Shot Learning
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
ICCV 2021 | Paper / Code
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao
ECCV 2020 | Paper / Code
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
CVPR Workshop 2019 (Oral) | Code / Dataset
Acoustic Features Fusion Using Attentive Multi-Channel Deep Architecture
Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman
Interspeech Workshop 2018 | Code
