Update: I am actively looking for full-time positions in the industry starting May 2024. Reach out to me at prithvijit3 [at] gatech [dot] edu if you think I am a good fit for your team.
I am a final-year CS Ph.D. student at Georgia Tech, advised by Prof. Judy Hoffman. I am broadly interested in problems at the intersection of Computer Vision and Machine Learning. Despite their remarkable success, computer vision systems often exhibit aberrant behaviors (reduced performance, uncalibrated predictions, etc.) under distribution shifts. My doctoral research (so far) has focused on the problem of out-of-distribution generalization – how can we develop computer vision systems that can generalize or adapt across changing conditions in a reliable manner?
My research has made progress along the following fundamental steps to be closer to accomplishing this goal:
Before this, I earned my Master's in Computer Science (awarded M.S. Research Award) in Spring 2019 from Georgia Tech, advised by Prof. Devi Parikh and Prof. Dhruv Batra, where I worked on a host of vision-language & ML problems – exploring human-AI teams to evaluate explanations, improving responses of & developing cooperative testing for conversational agents, zero-shot transfer, state abstraction in RL and scene understanding. I earned my Bachelor's in Electrical Engineering in 2016 from Delhi Technological University, India, where I worked on developing (now outdated) autonomous underwater vehicles (AUVs).
In past years, I have had the good fortune to conduct research at PRIOR, Allen Institute for Artificial Intelligence (Summer 2020, Summer 2022), mentored by Ani Kembhavi & Roozbeh Mottaghi; the Deep Learning Group, Microsoft Research Redmond (Summer 2018), mentored by Hamid Palangi; the Robotics Research Lab, IIIT Hyderabad (Winter 2014), mentored by Dr. K. Madhava Krishna; and the Indian Association for the Cultivation of Science (IACS), Kolkata (Summer 2014), mentored by Dr. Soumitra Sengupta. These projects covered a diverse set of topics, ranging from embodied AI and vision & language to physics.
I also actively participate in reviewing for top computer vision and machine learning conferences & workshops (have accumulated a few reviewer awards - CVPR 2023, CVPR 2022, CVPR 2021, ICLR 2022, MLRC 2021, ICML 2020, NeurIPS 2019, ICLR 2019, NeurIPS 2018 - in the process).
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding Sahil Khose*, Anisha Pal*, Aayushi Agarwal*, Deepanshi Deepanshi*, Judy Hoffman, Prithvijit Chattopadhyay ArXiv, 2023 TL;DR, We introduce SkyScenes, a large-scale synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layout (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) Models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions, and (4) additionally incorporating other sensor modalities (depth) can improve aerial scene understanding. [PDF]
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Uday Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Adrien Bardes, Mark Ibrahim, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein NeurIPS Datasets and Benchmarks, 2023 TL;DR, Most neural network based computer vision systems are built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. We benchmark a diverse suite of pretrained models across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Our Battle of the Backbones (BoB) sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing backbones through a comprehensive analysis conducted on 1500 training runs. [PDF] [Code]
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images Viraj Prabhu, Sriram Yenamandra, Prithvijit Chattopadhyay, Judy Hoffman NeurIPS, 2023 TL;DR, We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images without altering model weights. [PDF] [code] [project page]
AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Prabhu, Judy Hoffman Workshop on Uncertainty Quantification for Computer Vision, ICCV, 2023 (Extended Abstract) TL;DR, Mispredictions made by Sim2Real adaptation methods on real data can often be attributed to “miscalibration” – often caused by overconfident predictions. We propose a simple patch, AugCal, to improve uncertainty calibration of existing Sim2Real adaptation methods. [PDF]
Benchmarking Low-Shot Robustness to Natural Distribution Shifts Aaditya Singh, Kartik Sarangmath, Prithvijit Chattopadhyay, Judy Hoffman ICCV, 2023 TL;DR, Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. [PDF] [code]
PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization Prithvijit Chattopadhyay*, Kartik Sarangmath*, Vivek Vijaykumar, Judy Hoffman ICCV, 2023 TL;DR, PASTA is a simple and effective frequency domain augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. PASTA involves perturbing the amplitude spectra of the synthetic images in the Fourier domain in a structured manner to generate augmented views. For the tasks of semantic segmentation (GTAV→Real), object detection (Sim10K→Real), and object recognition (VisDA-C Syn→Real), across a total of 5 syn-to-real shifts, we find that PASTA either outperforms or is consistently competitive with more complex state-of-the-art methods while being complementary to other generalization approaches. [PDF] [code]
RobustNav: Towards Benchmarking Robustness in Embodied Navigation Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Ani Kembhavi ICCV, 2021 Oral presentation TL;DR, As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual – affecting RGB inputs – and dynamics – affecting transition dynamics – corruptions. We find that standard end-to-end RL policies significantly underperform (or fail) in the presence of visual or dynamics corruptions, warranting more research in this direction. [PDF] [code] [project page] [video]
Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses Fu Lin, Rohit Mittapali, Prithvijit Chattopadhyay, Daniel Bolya, Judy Hoffman Adversarial Robustness in the Real World (AROW) ECCV, 2020 NVIDIA Best Paper Runner Up TL;DR, Convolutional Neural Networks (CNNs) have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to where normal data lies but are not naturally occurring and have low probability. In this work, we investigate the potential effect defense techniques have on the geometry of the likelihood landscape, i.e., the likelihood of the input images under the trained model. We first propose a way to visualize the likelihood landscape by leveraging an energy-based model interpretation of discriminative classifiers. Then we introduce a measure to quantify the flatness of the likelihood landscape. We observe that a subset of adversarial defense techniques results in a similar effect of flattening the likelihood landscape. We further explore directly regularizing towards a flat landscape for adversarial robustness. [PDF] [video]
Learning to Balance Specificity and Invariance for In and Out of Domain Generalization Prithvijit Chattopadhyay, Yogesh Balaji, Judy Hoffman ECCV, 2020 Visual Learning with Limited Labels (LwLL) CVPR, 2020 TL;DR, We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance. To produce a model which best generalizes to both seen and unseen domains, we propose learning domain specific masks (encouraged to learn a balance of domain-invariant and domain-specific features) enabling a model to benefit from the predictive power of specialized features while retaining the universal applicability of domain-invariant features. We demonstrate competitive performance compared to naive baselines and state-of-the-art methods on both PACS and DomainNet. [PDF] [code] [video]
Improving Generative Visual Dialog by Answering Diverse Questions Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das EMNLP, 2019 Visual Question Answering and Dialog Workshop, CVPR, 2019 TL;DR, While generative visual dialog models trained with self-talk based RL perform better at the associated downstream task, they suffer from repeated interactions -- resulting in saturation in improvements as the number of rounds increases. To counter this, we devise a simple auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and in turn enabling A-Bot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about, and varied questions to answer. [PDF] [code]
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam IJCAI, 2020 Workshop on Task Agnostic Reinforcement Learning (TARL) ICLR, 2019 TL;DR, We propose a novel framework to identify sub-goals useful for exploration in sequential decision making tasks under partial observability. We utilize the variational intrinsic control framework (Gregor et al., 2016), which maximizes empowerment (the ability to reliably reach a diverse set of states), and show how to identify sub-goals as states with high necessary option information through an information-theoretic regularizer. Despite being discovered without explicit goal supervision, our sub-goals provide better exploration and sample complexity on challenging grid-world navigation tasks compared to supervised counterparts in prior work. [PDF]
EvalAI: Towards Better Evaluation Systems for AI Agents Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra Workshop on AI Systems, SOSP, 2019 TL;DR, We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researchers, students, and data scientists to create, collaborate, and participate in AI challenges organized around the globe. [PDF] [code]
Choose Your Neuron: Incorporating Domain-Knowledge through Neuron-Importance Ramprasaath R. Selvaraju*, Prithvijit Chattopadhyay*, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee ECCV, 2018 Continual Learning Workshop NeurIPS, 2018 Visually Grounded Interaction and Language (ViGIL) Workshop NeurIPS, 2018 TL;DR, We introduce a simple, efficient zero-shot learning approach -- NIWT -- based on the observation that individual neurons in CNNs have been shown to implicitly learn a dictionary of semantically meaningful concepts (simple textures and shapes to whole or partial objects). NIWT learns to map domain knowledge about "unseen" classes onto this dictionary of learned concepts and optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks. [PDF] [code] [article]
Do explanation modalities make VQA Models more predictable to a human? Arjun Chandrasekaran*, Viraj Prabhu*, Deshraj Yadav*, Prithvijit Chattopadhyay*, Devi Parikh EMNLP, 2018 TL;DR, A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable 'explanations' of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model -- its responses as well as failures -- more predictable to a human. [PDF]
Evaluating Visual Conversational Agents via Cooperative Human-AI Games Prithvijit Chattopadhyay*, Deshraj Yadav*, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh HCOMP, 2017 Oral presentation TL;DR, We design a cooperative game - GuessWhich - to measure human-AI team performance in the specific context of the AI being a visual conversational agent. GuessWhich involves live interaction between the human and the AI and is designed to gauge the extent to which progress in isolated metrics for AI (& AI-AI teams) transfers to human-AI collaborative scenarios. [PDF] [code]
It Takes Two to Tango: Towards Theory of AI's Mind Arjun Chandrasekaran*, Deshraj Yadav*, Prithvijit Chattopadhyay*, Viraj Prabhu*, Devi Parikh Chalearn Looking at People Workshop CVPR, 2017 TL;DR, To effectively leverage the progress in Artificial Intelligence (AI) to make our lives more productive, it is important for humans and AI to work well together in a team. In this work, we argue that for human-AI teams to be effective, in addition to making AI more accurate and human-like, humans must also develop a theory of AI's mind (ToAIM) - get to know its strengths, weaknesses, beliefs, and quirks. [PDF] [code]
Counting Everyday Objects in Everyday Scenes Prithvijit Chattopadhyay*, Ramakrishna Vedantam*, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh CVPR, 2017 Spotlight presentation TL;DR, We study the numerosity of object classes in natural, everyday images and build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes. We propose a contextual counting approach inspired by the phenomenon of subitizing - the ability of humans to make quick assessments of counts given a perceptual signal, for small count values. [PDF] [code]
Exploring Weak-Supervision and Generative Models for Semantic Segmentation 2018 Prithvijit Chattopadhyay, Ramprasaath R. Selvaraju, Viraj Prabhu [Report PDF]
DTU AUV: Autonomous Underwater Vehicle Prithvijit Chattopadhyay (Acoustics & Control Systems Department) (co-authored with DTU AUV members) 2012-2016 [Report PDF]
Evaluating Visual Conversational Agents in the Context of Human-AI Cooperative Games Masters in Computer Science (specialization Machine Learning) 2017-2019 [PDF]
(Design and CSS Courtesy: Shiori Sagawa)