Prithvijit Chattopadhyay

I am a second-year CS PhD student at Georgia Tech, advised by Prof. Judy Hoffman. I also collaborate with Prof. Devi Parikh and Prof. Dhruv Batra. I recently earned my Master's in Computer Science (with a focus on Machine Learning) from Georgia Tech, advised by Prof. Devi Parikh. Prior to joining Georgia Tech, I worked as a Research Assistant in the Computer Vision, Machine Learning and Perception Lab (CVMLP) at Virginia Tech, advised by Prof. Devi Parikh and Prof. Dhruv Batra. I earned my Bachelor's in Electrical Engineering from Delhi Technological University, India, in 2016.

In the past couple of years, I have had the fortune to intern and conduct research on a diverse set of topics, ranging from vision & language to robotics to theoretical physics, at: the Deep Learning Group, Microsoft Research Redmond (Summer 2018), mentored by Hamid Palangi; the Robotics Research Lab, IIIT Hyderabad (Winter 2014), mentored by Dr. K. Madhava Krishna; and the Indian Association for the Cultivation of Science (IACS), Kolkata (Summer 2014), mentored by Dr. Soumitra Sengupta.

I occasionally play the tabla, a percussion instrument. I am passionate about movies and love to break them down shot by shot and analyze them; every single frame is important.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  Github  /  Twitter  /  Instagram


I am interested in developing reliable computer vision systems (including applications of Machine Learning and Reinforcement Learning) that

  • can perceive and reason based on multimodal sensory information
  • are interpretable so that predictions made by such systems can be explained
  • are transferable so that they can be adapted across different domains with ease and limited supervision

Representative papers are listed under Papers.

  • Recognized as an outstanding reviewer for CVPR 2021!
  • Recognized as one of the top 33 percent of reviewers for ICML 2020!
  • Likelihood Landscapes received the NVIDIA Best Paper Runner-Up award at AROW, ECCV 2020!
  • Awarded the College of Computing's CS7001 Research Award at Georgia Tech!
  • Recognized as one of the highest-scoring reviewers for NeurIPS 2019!
  • Recognized as an outstanding reviewer for ICLR 2019!
  • Recognized as one of the top 30 percent of highest-scoring reviewers for NeurIPS 2018!
  • Awarded the College of Computing's MS Research Award at Georgia Tech!
  • Our team won VT Hacks 2017, a Major League Hacking event!
  • Our undergraduate team, DTU-AUV, qualified for the semi-finals at AUVSI Robosub 2014!
  • Awarded Merit Scholarships from 2012-2014 for undergraduate academic performance!
  • Selected for KVPY and INSPIRE Fellowships, 2012 for undergraduate studies in basic sciences!
  • Placed among the top 1 percent students in the country in INPhO 2012!
  • Selected for rigorous mathematical training camps conducted by mathematicians from Bhabha Atomic Research Center (BARC) and Indian Institute of Science (IISc) in 2012!
  • Selected for CSIR Programme on Youth Leadership in Science, 2010!
Papers (* joint first authors)
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Ani Kembhavi
ICCV, 2021 (Oral Presentation)
arxiv / code / project page

As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics). Most recent efforts in visual navigation have typically focused on generalizing to novel target environments with similar appearance and dynamics characteristics. With RobustNav, we find that some standard embodied navigation agents significantly underperform (or fail) in the presence of visual or dynamics corruptions. We systematically analyze the kind of idiosyncrasies that emerge in the behavior of such agents when operating under corruptions. Finally, for visual corruptions in RobustNav, we show that while standard techniques to improve robustness, such as data augmentation and self-supervised adaptation, offer some zero-shot resistance and improvements in navigation performance, there is still a long way to go in terms of recovering lost performance relative to clean "non-corrupt" settings, warranting more research in this direction.

Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses
Fu Lin, Rohit Mittapalli, Prithvijit Chattopadhyay, Daniel Bolya, Judy Hoffman
Adversarial Robustness in the Real World (AROW), ECCV 2020 (Poster)
NVIDIA Best Paper Runner Up
paper / video

Convolutional Neural Networks (CNNs) have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to those occupied by normal data, yet are not naturally occurring and have low probability. In this work, we investigate the potential effect defense techniques have on the geometry of the likelihood landscape (the likelihood of the input images under the trained model). We first propose a way to visualize the likelihood landscape by leveraging an energy-based model interpretation of discriminative classifiers. We then introduce a measure to quantify the flatness of the likelihood landscape. We observe that a subset of adversarial defense techniques results in a similar effect of flattening the likelihood landscape. We further explore directly regularizing towards a flat landscape for adversarial robustness.

Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
Prithvijit Chattopadhyay, Yogesh Balaji, Judy Hoffman
ECCV, 2020 (Poster)
paper / video / code

We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance. For domain generalization, the goal is to learn from a set of source domains a single model that will best generalize to an unseen target domain. As such, many prior approaches focus on learning representations which persist across all source domains, under the assumption that these domain-agnostic representations will generalize well. However, individual domains often contain characteristics which are unique and, when leveraged, can significantly aid in-domain recognition performance. To produce a model which best generalizes to both seen and unseen domains, we propose learning domain-specific masks. The masks are encouraged to learn a balance of domain-invariant and domain-specific features, thus enabling a model which can benefit from the predictive power of specialized features while retaining the universal applicability of domain-invariant features. We demonstrate competitive performance compared to naive baselines and state-of-the-art methods on both PACS and DomainNet.

Improving Generative Visual Dialog by Answering Diverse Questions
Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das
EMNLP, 2019 (Poster); Visual Question Answering and Dialog Workshop, CVPR 2019 (Poster)
arxiv / code

While generative visual dialog models trained with self-talk-based RL perform better at the associated downstream task, they suffer from repeated interactions -- resulting in saturating improvements as the number of rounds increases. To counter this, we devise a simple auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and in turn enabling A-Bot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer.

DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL
Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam
arXiv preprint, 2019; Task-Agnostic RL (TARL) Workshop, ICLR 2019 (Poster)
arxiv / TARL'19 Preliminary Version
(Revised version accepted at IJCAI 2020!)

We learn to identify decision states, namely the parsimonious set of states where decisions meaningfully affect the future states an agent can reach in an environment. We utilize the VIC framework, which maximizes an agent's 'empowerment', i.e. the ability to reliably reach a diverse set of states -- and formulate a sandwich bound on the empowerment objective that allows identification of decision states. Unlike previous work, our decision states are discovered without extrinsic rewards -- simply by interacting with the world. Our results show that our decision states are: (1) often interpretable, and (2) lead to better exploration on downstream goal-driven tasks in partially observable environments.

EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra
arXiv preprint, 2019
arxiv / code

We introduce EvalAI, an open-source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researchers, students, and data scientists create, collaborate on, and participate in AI challenges organized around the globe.

Choose Your Neuron: Incorporating Domain-Knowledge through Neuron-Importance
Ramprasaath R. Selvaraju*, Prithvijit Chattopadhyay*, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee
ECCV, 2018 (Poster); Continual Learning Workshop, NeurIPS 2018 (Poster); Visually Grounded Interaction and Language (ViGIL) Workshop, NeurIPS 2018 (Poster)
arxiv / blogpost / code

We introduce a simple, efficient zero-shot learning approach -- NIWT -- based on the observation that individual neurons in CNNs have been shown to implicitly learn a dictionary of semantically meaningful concepts (from simple textures and shapes to whole or partial objects). NIWT learns to map domain knowledge about "unseen" classes onto this dictionary of learned concepts and optimizes for network parameters that can effectively combine these concepts -- essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Do explanation modalities make VQA Models more predictable to a human?
Arjun Chandrasekaran*, Viraj Prabhu*, Deshraj Yadav*, Prithvijit Chattopadhyay*, Devi Parikh
EMNLP, 2018 (Poster)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable 'explanations' of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze whether existing explanations indeed make a VQA model -- its responses as well as failures -- more predictable to a human.

Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Prithvijit Chattopadhyay*, Deshraj Yadav*, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh
HCOMP, 2017 (Oral)
arxiv / code

We design a cooperative game -- GuessWhich -- to measure human-AI team performance in the specific context of the AI being a visual conversational agent. GuessWhich involves live interaction between the human and the AI and is designed to gauge the extent to which progress on isolated metrics for AI (and AI-AI teams) transfers to human-AI collaborative scenarios.

It Takes Two to Tango: Towards Theory of AI's Mind
Arjun Chandrasekaran*, Deshraj Yadav*, Prithvijit Chattopadhyay*, Viraj Prabhu*, Devi Parikh
Chalearn Looking at People Workshop, CVPR, 2017 (Oral)
arxiv / code

To effectively leverage the progress in Artificial Intelligence (AI) to make our lives more productive, it is important for humans and AI to work well together in a team. In this work, we argue that for human-AI teams to be effective, in addition to making AI more accurate and human-like, humans must also develop a theory of AI's mind (ToAIM) - get to know its strengths, weaknesses, beliefs, and quirks.

Counting Everyday Objects in Everyday Scenes
Prithvijit Chattopadhyay*, Ramakrishna Vedantam*, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh
CVPR, 2017 (Spotlight)
arxiv / code

We study the numerosity of object classes in natural, everyday images and build dedicated counting models designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes. We propose a contextual counting approach inspired by subitizing -- the human ability to make quick assessments of small counts from a perceptual signal.

(Design and CSS courtesy: Jon Barron and Amlaan Bhoi)