prithvijit3 [at] gatech [dot] edu
Github | Google Scholar | Semantic Scholar | CV | Twitter | LinkedIn
Update: I am actively looking for full-time industry positions starting May 2024. Reach out to me at prithvijit3 [at] gatech [dot] edu if you think I am a good fit for your team.
I am a final-year CS Ph.D. student at Georgia Tech, advised by Prof. Judy Hoffman. I am broadly interested in problems at the intersection of Computer Vision and Machine Learning. Despite their remarkable success, computer vision systems often exhibit aberrant behaviors (reduced performance, uncalibrated predictions, etc.) under distribution shifts. My doctoral research (so far) has focused on the problem of out-of-distribution generalization – how can we develop computer vision systems that generalize or adapt reliably across changing conditions?
My research has made progress along several fundamental steps toward accomplishing this goal.
Before this, I earned my Master's in Computer Science (awarded the M.S. Research Award) in Spring 2019 from Georgia Tech, advised by Prof. Devi Parikh and Prof. Dhruv Batra, where I worked on a host of vision-language & ML problems – exploring human-AI teams to evaluate explanations, improving responses of & developing cooperative testing for conversational agents, zero-shot transfer, state abstraction in RL, and scene understanding. I earned my Bachelor's in Electrical Engineering in 2016 from Delhi Technological University, India, where I worked on developing (now outdated) autonomous underwater vehicles (AUVs).
In past years, I have had the fortune of conducting research at PRIOR, Allen Institute for Artificial Intelligence (Summer 2022, Summer 2020), mentored by Ani Kembhavi & Roozbeh Mottaghi; the Deep Learning Group, Microsoft Research Redmond (Summer 2018), mentored by Hamid Palangi; the Robotics Research Lab, IIIT Hyderabad (Winter 2014), mentored by Dr. K. Madhava Krishna; and the Indian Association for the Cultivation of Science (IACS), Kolkata (Summer 2014), mentored by Dr. Soumitra Sengupta – on a diverse set of topics ranging from embodied AI and vision & language to physics.
I also actively review for top computer vision and machine learning conferences & workshops, and have accumulated a few reviewer awards in the process (CVPR 2023, CVPR 2022, CVPR 2021, ICLR 2022, MLRC 2021, ICML 2020, NeurIPS 2019, ICLR 2019, NeurIPS 2018).
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil Khose*, Anisha Pal*, Aayushi Agarwal*, Deepanshi*, Judy Hoffman, Prithvijit Chattopadhyay
ArXiv, 2023
TL;DR, We introduce SkyScenes, a large-scale synthetic dataset of densely
annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate
SkyScenes images from CARLA to comprehensively capture diversity across layout (urban and rural maps),
weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth
annotations. Through our experiments using SkyScenes, we show that (1) Models trained on SkyScenes
generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes
data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how
models respond to changes in viewpoint conditions, and (4) additionally incorporating other sensor
modalities (depth) can improve aerial scene understanding.
[PDF]
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Uday Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Adrien Bardes, Mark Ibrahim, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein
NeurIPS Datasets and Benchmarks, 2023
TL;DR, Most neural network based computer vision systems are built on a backbone,
a pretrained or randomly initialized feature extractor. Several years ago, the default option was an
ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless
backbones pretrained using various algorithms and datasets. We benchmark a diverse suite of pretrained
models across a diverse set of computer vision tasks ranging from classification to object detection to OOD
generalization and more. Our Battle of Backbones (BoB) sheds light on promising directions for the research
community to advance computer vision by illuminating strengths and weaknesses of existing backbones through a
comprehensive analysis conducted on 1500 training runs.
[PDF]
[code]
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
Viraj Prabhu,
Sriram Yenamandra,
Prithvijit Chattopadhyay,
Judy Hoffman
NeurIPS, 2023
TL;DR, We propose an automated algorithm to stress-test a trained visual model by
generating language-guided counterfactual test images (LANCE). Our method leverages
recent progress in large language modeling and text-based image editing to augment an IID test set with a
suite of diverse, realistic, and challenging test images
without altering model weights.
[PDF]
[code]
[project page]
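For intuition, here is a minimal Python sketch of what a LANCE-style stress test looks like end to end; the captioner, LLM-based caption editor, and text-based image editor below are hypothetical stand-ins passed in as callables, not the released API:

```python
def lance_stress_test(model, images, captioner, llm_edit, image_editor):
    """A minimal sketch of a LANCE-style pipeline (all callables are
    hypothetical stand-ins): caption each test image, ask a language
    model for a targeted caption edit, render the edited caption with a
    text-based image editor, and check whether the model's prediction
    survives the counterfactual change (no model weights are altered)."""
    sensitive = []
    for img in images:
        caption = captioner(img)          # e.g., "a red car on a street"
        edited = llm_edit(caption)        # e.g., "a blue car on a street"
        counterfactual = image_editor(img, caption, edited)
        if model(img) != model(counterfactual):
            sensitive.append((img, counterfactual, caption, edited))
    return sensitive
```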
AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images
Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Prabhu, Judy Hoffman
Workshop on Uncertainty Quantification for Computer Vision, ICCV, 2023 (Extended Abstract)
TL;DR, Mispredictions made by Sim2Real adaptation methods on real data can often be attributed to “miscalibration”, typically caused by overconfident predictions. We propose a simple patch, AugCal, to improve the uncertainty calibration of existing Sim2Real adaptation methods.
[PDF]
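As background for what “miscalibration” means here, the sketch below computes the standard expected calibration error (ECE); this is the generic metric, not the AugCal method itself, and the variable names are illustrative:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence; ECE is the bin-size-weighted average
    # gap between mean confidence and accuracy within each bin.
    # Overconfident models have confidence exceeding accuracy, i.e. large gaps.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() = bin mass |B|/N
    return ece
```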
Benchmarking Low-Shot Robustness to Natural Distribution Shifts
Aaditya Singh,
Kartik Sarangmath,
Prithvijit Chattopadhyay,
Judy Hoffman
ICCV, 2023
TL;DR, Robustness to natural distribution shifts has seen remarkable progress
thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning
assumes access to large amounts of labeled data, and the extent to which these observations hold when training data is scarce remains unknown. We address this gap by performing the first in-depth
study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets,
architectures, pre-trained initializations, and state-of-the-art robustness interventions.
[PDF]
[code]
PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization
Prithvijit Chattopadhyay*,
Kartik Sarangmath*,
Vivek Vijaykumar,
Judy Hoffman
ICCV, 2023
TL;DR, PASTA is a simple and effective frequency domain augmentation strategy to
improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. PASTA involves perturbing
the amplitude spectra of the synthetic images in the Fourier domain in a structured manner to generate
augmented views. For the tasks of semantic segmentation (GTAV→Real), object detection (Sim10K→Real), and
object recognition (VisDA-C Syn→Real), across a total of 5 syn-to-real shifts, we find that PASTA either
outperforms or is consistently competitive with more complex state-of-the-art methods while being
complementary to other generalization approaches.
[PDF]
[code]
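For a concrete picture of the augmentation, here is a minimal NumPy sketch of a PASTA-style perturbation; the parameter names and default values (alpha, beta, k) are illustrative assumptions, not the released implementation:

```python
import numpy as np

def pasta_augment(img, alpha=3.0, beta=0.25, k=2.0):
    """Perturb the amplitude spectrum of an image in the Fourier domain,
    with proportionally larger perturbations at higher spatial
    frequencies, then invert back to pixel space.
    img: float array of shape (H, W, C) with values in [0, 1]."""
    out = np.empty_like(img)
    H, W = img.shape[:2]
    # Normalized distance of every frequency bin from the (centered) DC
    # component; used to scale perturbation strength with frequency.
    fy = np.fft.fftshift(np.fft.fftfreq(H))
    fx = np.fft.fftshift(np.fft.fftfreq(W))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    radius /= radius.max()
    for c in range(img.shape[2]):
        spectrum = np.fft.fftshift(np.fft.fft2(img[..., c]))
        amplitude, phase = np.abs(spectrum), np.angle(spectrum)
        # Multiplicative Gaussian jitter whose std grows with frequency,
        # so low-frequency content (global structure) is barely touched.
        sigma = alpha * radius ** k + beta
        jitter = 1.0 + np.random.randn(H, W) * sigma
        perturbed = amplitude * jitter * np.exp(1j * phase)
        out[..., c] = np.real(np.fft.ifft2(np.fft.ifftshift(perturbed)))
    return np.clip(out, 0.0, 1.0)
```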
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay,
Judy Hoffman,
Roozbeh Mottaghi,
Ani Kembhavi
ICCV, 2021
Oral presentation
TL;DR, As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics). We find that standard end-to-end RL policies significantly underperform (or fail) in the presence of visual or dynamics corruptions, warranting more research in this direction.
[PDF]
[code]
[project page]
[video]
Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses
Fu Lin,
Rohit Mittapalli,
Prithvijit Chattopadhyay,
Daniel Bolya,
Judy Hoffman
Adversarial Robustness in the Real World (AROW) ECCV, 2020
NVIDIA Best Paper Runner Up
TL;DR, Convolutional Neural Networks (CNNs) have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to normal data but are not naturally occurring and have low probability. In this work, we investigate the potential effect
defense techniques have on the geometry of the likelihood landscape - likelihood of the input images
under the trained model. We first propose a way to visualize the likelihood landscape by leveraging
an energy-based model interpretation of discriminative classifiers. Then we introduce a measure to
quantify the flatness of the likelihood landscape. We observe that a subset of adversarial defense
techniques results in a similar effect of flattening the likelihood landscape. We further explore
directly regularizing towards a flat landscape for adversarial robustness.
[PDF]
[video]
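The energy-based reading underlying this is compact enough to sketch: for a classifier with logits f(x), log p(x) = logsumexp_y f(x)[y] - log Z, so the logsumexp of the logits acts as an unnormalized input log-likelihood. Below is a minimal PyTorch sketch; the 2D-slice helper is a hypothetical visualization aid, not the paper's exact code:

```python
import torch

def log_px(model, x):
    # Energy-based interpretation of a discriminative classifier:
    # log p(x) = logsumexp_y f(x)[y] - log Z, so the logsumexp of the
    # logits gives the input log-likelihood up to the constant log Z.
    return torch.logsumexp(model(x), dim=1)

@torch.no_grad()
def likelihood_slice(model, x, radius=0.1, steps=21):
    """Evaluate log p(.) on a 2D grid around a single input x (batch
    size 1) along two random directions, to visualize flatness."""
    d1, d2 = torch.randn_like(x), torch.randn_like(x)
    d1, d2 = d1 / d1.norm(), d2 / d2.norm()
    coords = torch.linspace(-radius, radius, steps)
    grid = torch.empty(steps, steps)
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            grid[i, j] = log_px(model, x + a * d1 + b * d2).item()
    return grid  # e.g., plot with matplotlib's contourf
```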
Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
Prithvijit Chattopadhyay,
Yogesh Balaji,
Judy Hoffman
ECCV, 2020
Visual Learning with Limited Labels (LwLL) CVPR, 2020
TL;DR, We introduce Domain-specific Masks for Generalization, a model for
improving both
in-domain and out-of-domain generalization performance. To produce a model which best generalizes to both seen and unseen domains, we propose learning domain-specific masks (encouraged to learn a balance of domain-invariant and domain-specific features), enabling the model to
benefit from the predictive power of specialized features while retaining the universal
applicability of domain-invariant features. We demonstrate competitive performance compared to naive
baselines and state-of-the-art methods on both PACS and DomainNet.
[PDF]
[code]
[video]
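A minimal sketch of the masking idea follows; this is a hypothetical layer written in the spirit of the paper, not the authors' exact architecture (in particular, how masks are combined for unseen domains is simplified to an average here):

```python
import torch
import torch.nn as nn

class DomainSpecificMasks(nn.Module):
    """One learnable soft mask per source domain, gating a shared
    feature vector so each domain can emphasize its own channels while
    still sharing domain-invariant ones."""
    def __init__(self, num_domains, feat_dim):
        super().__init__()
        # Mask logits initialized at 0, i.e., sigmoid(0) = 0.5 everywhere.
        self.mask_logits = nn.Parameter(torch.zeros(num_domains, feat_dim))

    def forward(self, feats, domain_idx=None):
        masks = torch.sigmoid(self.mask_logits)  # (num_domains, feat_dim)
        if domain_idx is not None:               # seen domain: its own mask
            return feats * masks[domain_idx]
        return feats * masks.mean(dim=0)         # unseen domain: averaged mask
```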
Improving Generative Visual Dialog by Answering Diverse Questions
Vishvak Murahari,
Prithvijit Chattopadhyay,
Dhruv Batra,
Devi Parikh,
Abhishek Das
EMNLP, 2019
Visual Question Answering and Dialog Workshop,
CVPR, 2019
TL;DR, While generative visual dialog models trained with self-talk based RL perform better at the associated downstream task, they suffer from repetitive interactions -- resulting in saturating improvements as the number of rounds increases. To counter this, we devise a simple auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and in turn enabling A-Bot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer.
[PDF]
[code]
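A minimal sketch of what such a diversity objective can look like (a hypothetical form, not the exact published loss):

```python
import torch.nn.functional as F

def diversity_penalty(state_prev, state_curr):
    # Penalize cosine similarity between Q-Bot dialog states from
    # consecutive rounds; minimizing this discourages near-duplicate
    # questions and pushes the dialog toward new visual concepts.
    return F.cosine_similarity(state_prev, state_curr, dim=-1).mean()

# Hypothetical usage inside the RL loop:
#   loss = task_loss + lambda_div * diversity_penalty(s_prev, s_curr)
```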
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL
Nirbhay Modhe,
Prithvijit Chattopadhyay,
Mohit Sharma,
Abhishek Das,
Devi Parikh,
Dhruv Batra,
Ramakrishna Vedantam
IJCAI, 2020
Workshop on Task Agnostic Reinforcement Learning (TARL) ICLR, 2019
TL;DR, We propose a novel framework to identify subgoals useful for exploration in sequential decision-making tasks under partial observability. We utilize the variational intrinsic control framework (Gregor et al., 2016), which maximizes empowerment – the ability to reliably reach a diverse set of states – and show how to identify sub-goals as states with high necessary option information through an information-theoretic regularizer. Despite being discovered without explicit goal supervision, our subgoals provide better exploration and sample complexity on challenging grid-world navigation tasks compared to supervised counterparts in prior work.
[PDF]
EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav,
Rishabh Jain,
Harsh Agrawal,
Prithvijit Chattopadhyay,
Taranjeet Singh,
Akash Jain,
Shiv Baran Singh,
Stefan Lee,
Dhruv Batra
Workshop on AI Systems, SOSP, 2019
TL;DR, We introduce EvalAI, an open-source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide a
scalable solution to the research community to fulfill the critical need of evaluating machine
learning models and agents acting in an environment against annotations or with a human-in-the-loop.
This will help researchers, students, and data scientists to create, collaborate, and participate in
AI challenges organized around the globe.
[PDF]
[code]
Choose Your Neuron: Incorporating Domain-Knowledge through Neuron-Importance
Ramprasaath R. Selvaraju*,
Prithvijit Chattopadhyay*,
Mohamed Elhoseiny,
Tilak Sharma,
Dhruv Batra,
Devi Parikh,
Stefan Lee
ECCV, 2018
Continual Learning Workshop NeurIPS, 2018
Visually Grounded Interaction and Language (ViGIL) Workshop NeurIPS, 2018
TL;DR, We introduce a simple, efficient zero-shot learning approach -- NIWT --
based on
the observation that individual neurons in CNNs have been shown to implicitly learn a dictionary of semantically meaningful concepts (from simple textures and shapes to whole or partial objects). NIWT
learns to map domain knowledge about "unseen" classes onto this dictionary of learned concepts and
optimizes for network parameters that can effectively combine these concepts - essentially learning
classifiers by discovering and composing learned semantic concepts in deep networks.
[PDF]
[code]
[article]
Do explanation modalities make VQA models more predictable to a human?
Arjun Chandrasekaran*,
Viraj Prabhu*,
Deshraj Yadav*,
Prithvijit Chattopadhyay*,
Devi Parikh
EMNLP, 2018
TL;DR, A rich line of research attempts to make deep neural networks more
transparent by
generating human-interpretable 'explanations' of their decision process, especially for interactive
tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed
make a VQA model -- its responses as well as failures -- more predictable to a human.
[PDF]
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Prithvijit Chattopadhyay*,
Deshraj Yadav*,
Viraj Prabhu,
Arjun Chandrasekaran,
Abhishek Das,
Stefan Lee,
Dhruv Batra,
Devi Parikh
HCOMP, 2017
Oral presentation
TL;DR, We design a cooperative game - GuessWhich - to measure human-AI team
performance in
the specific context of the AI being a visual conversational agent. GuessWhich involves live
interaction between the human and the AI and is designed to gauge the extent to which progress in
isolated metrics for AI (& AI-AI teams) transfers to human-AI collaborative scenarios.
[PDF]
[code]
It Takes Two to Tango: Towards Theory of AI's Mind
Arjun Chandrasekaran*,
Deshraj Yadav*,
Prithvijit Chattopadhyay*,
Viraj Prabhu*,
Devi Parikh
Chalearn Looking at People Workshop CVPR, 2017
TL;DR, To effectively leverage the progress in Artificial Intelligence (AI) to
make our
lives more productive, it is important for humans and AI to work well together in a team. In this
work, we argue that for human-AI teams to be effective, in addition to making AI more accurate and
human-like, humans must also develop a theory of AI's mind (ToAIM) - get to know its strengths,
weaknesses, beliefs, and quirks.
[PDF]
[code]
Counting Everyday Objects in Everyday Scenes
Prithvijit Chattopadhyay*,
Ramakrishna Vedantam*,
Ramprasaath R. Selvaraju,
Dhruv Batra,
Devi Parikh
CVPR, 2017
Spotlight presentation
TL;DR, We study the numerosity of object classes in natural, everyday images and
build
dedicated models for counting designed to tackle the large variance in counts, appearances, and
scales of objects found in natural scenes. We propose a contextual counting approach inspired by the phenomenon of subitizing - the ability of humans to make quick assessments of small counts given a perceptual signal.
[PDF]
[code]
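For intuition, here is a minimal PyTorch sketch of a subitizing-style counter (a hypothetical module, not the paper's exact models): predict a small count per grid cell and sum the expected cell counts:

```python
import torch
import torch.nn as nn

class SubitizingCounter(nn.Module):
    """Split the image into a grid of cells, predict a small per-cell
    count (0..max_cell_count) for each, and sum expected cell counts to
    estimate the image-level count."""
    def __init__(self, cell_encoder, feat_dim, max_cell_count=4, grid=3):
        super().__init__()
        self.grid = grid
        self.cell_encoder = cell_encoder  # any patch feature extractor
        self.head = nn.Linear(feat_dim, max_cell_count + 1)

    def forward(self, img):               # img: (B, C, H, W)
        B, _, H, W = img.shape
        h, w = H // self.grid, W // self.grid
        counts = torch.arange(self.head.out_features, device=img.device).float()
        total = img.new_zeros(B)
        for i in range(self.grid):
            for j in range(self.grid):
                cell = img[:, :, i * h:(i + 1) * h, j * w:(j + 1) * w]
                probs = self.head(self.cell_encoder(cell)).softmax(dim=-1)
                total = total + (probs * counts).sum(dim=-1)  # expected count
        return total
```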
Exploring Weak-Supervision and Generative Models for Semantic Segmentation
Prithvijit Chattopadhyay, Ramprasaath R. Selvaraju, Viraj Prabhu
2018
[Report PDF]
DTU AUV: Autonomous Underwater Vehicle
Prithvijit Chattopadhyay (Acoustics & Control Systems Department), co-authored with DTU AUV members
2012-2016
[Report PDF]
Evaluating Visual Conversational Agents in the Context of Human-AI Cooperative Games
Master's thesis, Computer Science (specialization: Machine Learning), 2017-2019
[PDF]
Reviewing: CVPR 2018-23, ICCV 2023, ICRA 2021-22, ECCV 2018, NeurIPS 2018-21, NeurIPS 2023, ICLR 2019-22, ICML 2019-20, ACL 2019, TPAMI
(Design and CSS Courtesy: Shiori Sagawa)