Akshita Gupta (अक्षिता गुप्ता)

I am an ELLIS PhD student at TU Darmstadt, co-supervised by Prof. Marcus Rohrbach and Dr. Federico Tombari at Google Zurich. I completed my MASc at the University of Guelph, where I was advised by Prof. Graham Taylor. During that time, I was also a student researcher at the Vector Institute.

I was fortunate to spend time as a research intern at Apple under Dr. Tatiana Likhomanenko, at Microsoft under Gaurav Mittal and Mei Chen, and at the Vector Institute under Dr. David Emerson, and to serve as a Scientist-in-Residence at NextAI under Prof. Graham Taylor.

Before coming to academia, I worked as a Data Scientist at Bayanat, where I focused on projects related to detection and segmentation. Prior to that, I was a Research Engineer at the Inception Institute of Artificial Intelligence (IIAI), working with Dr. Sanath Narayan, Dr. Salman Khan, and Dr. Fahad Shahbaz Khan. At IIAI, my research primarily involved open-world and zero-shot object detection, generative adversarial networks (GANs), and few- and zero-shot learning.

Email  /  Google Scholar  /  Twitter  /  Github  /  Resume/CV


What's New

[Mar 2025] Excited to start my PhD in Computer Science at TU Darmstadt under Prof. Marcus Rohrbach! 🎉
[Sep 2024] Defended my Master's thesis!
[Jun 2024] Joined Apple as a Research Intern
[May 2024] Serving as a Scientist-in-Residence at NextAI.
[Mar 2024] Our paper Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis is now on arXiv!
[Jan 2024] Our paper Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization is accepted at WACV 2025 (Oral)! 🎤
[Dec 2023] Our work Open-Vocabulary Temporal Action Localization using Multimodal Guidance is accepted at BMVC 2024!
[Jun 2023] Our paper Generative Multi-Label Zero-Shot Learning is accepted at TPAMI 2023.
[Jun 2023] Started interning at Microsoft, ROAR team.
[Jan 2023] Interned at Vector Institute with AI Eng team.
[Sep 2022] Joined Prof. Graham Taylor's Lab and Vector Institute.
[Mar 2022] OW-DETR accepted at CVPR 2022.
[Sep 2021] Reviewer for CVPR 2023, CVPR 2022, ECCV 2022, ICCV 2021, TPAMI.
[Jul 2021] BiAM accepted at ICCV 2021.
[Feb 2021] Serving as a reviewer for ML Reproducibility Challenge 2020.
[Jan 2021] Paper out on arXiv: Generative Multi-Label Zero-Shot Learning.
[Jul 2020] TF-VAEGAN accepted at ECCV 2020.
[Aug 2019] A Large-scale Instance Segmentation Dataset for Aerial Images (iSAID) is available for download.
[Aug 2018] One paper accepted at the Interspeech CHiME Workshop 2018.
[May 2018] Selected as an Outreachy intern with Mozilla.

Research

I'm interested in developing models that can learn from limited data, i.e., from only a few, one, or zero training samples.

OW-DETR: Open-world Detection Transformer
Akshita Gupta*, Sanath Narayan*, Joseph KJ, Salman Khan, Fahad Shahbaz Khan,
Mubarak Shah
CVPR 2022
paper / code
  • Description: Developed a multi-scale, context-aware detection framework with attention-driven pseudo-labelling.
  • Outcome: Improved the state of the art on the MS-COCO dataset, with absolute gains ranging from 1.8% to 3.3% in unknown recall.
Discriminative Region-based Multi-Label Zero-Shot Learning
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan,
Ling Shao, Mubarak Shah
ICCV 2021
paper / code
  • Description: Developed an attention module that combines region-level and global-level contextual information.
  • Outcome: Improved the state of the art on NUS-WIDE and OpenImages by 6.9% and 31.9% mAP, respectively.
Generative Multi-Label Zero-Shot Learning
Akshita Gupta*, Sanath Narayan*, Salman Khan, Fahad Shahbaz Khan,
Ling Shao, Joost van de Weijer
TPAMI 2023
paper / code
  • Description: Developed a generative model that constructs multi-label features for (generalized) zero-shot learning.
  • Outcome: Improved the state of the art on NUS-WIDE, OpenImages, and MS-COCO by 3.3%, 4.3%, and 15.7% mAP, respectively.
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
Sanath Narayan*, Akshita Gupta*, Salman Khan, Fahad Shahbaz Khan,
Cees G. M. Snoek, Ling Shao
ECCV 2020
paper / code
  • Description: Developed a generative feature synthesizing framework for zero-shot learning.
  • Outcome: Improved the state of the art on CUB, FLO, SUN, and AWA by 4.6%, 7.1%, 1.7%, and 3.1% in harmonic mean, respectively, by enforcing semantic consistency at all stages of zero-shot learning.
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
CVPR Workshop 2019 (Oral Presentation)
code / dataset
  • Description: Improved state-of-the-art object detectors (Mask R-CNN and PANet) for aerial imagery.
  • Outcome: Proposed a large-scale instance segmentation and object detection dataset (iSAID), with benchmarks using Mask R-CNN and PANet.
Acoustic features fusion using attentive multi-channel deep architecture
Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman
Interspeech Workshop 2018
code
  • Description: Developed an attention-based framework for acoustic scene recognition and audio tagging.
  • Outcome: Improved the equal error rate by at least 3% over the DCASE challenge results.
Research Experience
Data Scientist, Bayanat
Jan 2022 – Sep 2022
Supervisors: Dr. Meng Wang, Dr. Fan Zhu
Research Engineer, Inception Institute of Artificial Intelligence
Dec 2018 – Jan 2022
Supervisors: Dr. Sanath Narayan, Dr. Salman Khan, Dr. Fahad Shahbaz Khan

  • Developed deep learning algorithms for low-shot (few- and zero-shot) detection and classification, generative adversarial models, and open-world object detection.
  • Developed a rock and seismic-layer classification system.
  • Worked on satellite-imagery object detection and object counting systems.

Research & Development Intern, Mozilla, Outreachy
May 2018 – Aug 2018
Supervisor: Emma Irwin

Developed an open-source analytics dashboard prototype with metrics for evaluating diversity and inclusion across different communities.

Undergraduate Researcher, Indian Institute of Technology
May 2017 – Dec 2018
Supervisor: Dr. R. Balasubramanian

Worked on acoustic scene recognition and audio tagging using attention networks; the resulting paper was accepted at the Interspeech CHiME Workshop 2018.


I borrowed this website layout from here!