I am interested in developing
reliable computer vision systems
(including applications of
Machine Learning and
Reinforcement Learning) that
- can perceive and reason based on multimodal sensory information
- are interpretable so that predictions made by such systems can be explained
- are transferable so that they can be adapted across different domains with ease
and limited supervision
Representative papers are listed under Papers.
- Recognized as an outstanding reviewer for CVPR 2021!
- Recognized as one of the top 33 percent of reviewers for ICML 2020!
- Likelihood Landscapes received the NVIDIA Best Paper Runner-Up award at AROW, ECCV 2020!
- Awarded the College of Computing's CS7001 Research Award at Georgia Tech!
- Recognized as one of the highest-scoring reviewers for NeurIPS 2019!
- Recognized as an outstanding reviewer for
- Recognized as one of the top 30 percent of highest-scoring reviewers for NeurIPS 2018!
- Awarded the College of Computing's MS Research Award at Georgia Tech!
- Our team won VT Hacks 2017, a Major League Hacking event!
- Our undergraduate team, DTU-AUV, qualified for the semi-finals at AUVSI Robosub 2014!
- Awarded Merit Scholarships from 2012-2014 for undergraduate academic performance!
- Selected for the KVPY and INSPIRE Fellowships (2012) for undergraduate studies in basic sciences!
- Placed among the top 1 percent of students in the country in INPhO 2012!
- Selected for rigorous mathematical training camps conducted by mathematicians from Bhabha Atomic Research Center (BARC) and the Indian Institute of Science (IISc) in 2012!
- Selected for CSIR Programme on Youth Leadership in
- May 2020: I'll be interning with the PRIOR team at the Allen Institute for Artificial Intelligence.
- Serving as a reviewer for CVPR 2018, ECCV 2018, NeurIPS 2018-20, ICLR 2019-20, ICML 2019-20, and ACL 2019.
- May 2019: Presented our work on Discovery of Decision States through Intrinsic Control at the Task-Agnostic Reinforcement Learning (TARL) Workshop at ICLR 2019.
- May 2019: Completed my Master's in Computer Science (focus on Machine Learning) with a thesis centered on Evaluating Visual Conversational Agents in the Context of Human-AI Cooperative Games!
- Feb 2019: Our technical report describing EvalAI - an open source platform to evaluate and compare AI algorithms at scale - is out on arXiv!
- Dec 2018: Presented our work on interpretable zero-shot learning at the Continual Learning and Visually Grounded Interaction and Language (ViGIL) workshops at NeurIPS 2018.
- Aug 2018: Our paper titled 'Do explanation modalities make VQA models more predictable to a human?' was accepted at EMNLP 2018!
- Jul 2018: Our paper titled 'Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance' was accepted at ECCV 2018!
- Apr 2018: I was awarded the College of Computing's MS Research Award at Georgia Tech!
- Feb 2018: I'll intern in the Deep Learning Group at Microsoft Research, Redmond, in summer 2018.
- Aug 2017: Our paper titled 'Evaluating Visual Conversational Agents via Cooperative Human-AI Games' was accepted at HCOMP 2017 as an oral!
- Jul 2017: I will be joining Georgia Tech as a Master's student in Computer Science in Fall 2017.
- May 2017: I will be presenting 'Counting Everyday
Objects in Everyday Scenes' at the LDV Vision
- Mar 2017: Our paper titled 'It Takes Two to Tango: Towards Theory of AI's Mind' is out on arXiv!
- Feb 2017: Our paper titled 'Counting Everyday Objects in Everyday Scenes' was accepted at CVPR 2017 as a spotlight!
- Feb 2017: Our team built FilterAI - an image retrieval engine - and won VT Hacks 2017, a Major League Hacking event!
- Dec 2016: 'Counting Everyday Objects in Everyday Scenes' received an Amazon Academic Research Award, 2016!
Papers (* joint first authors)
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
arXiv preprint, 2021
As an attempt towards assessing the robustness of embodied navigation agents, we propose RobustNav, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions (affecting RGB inputs) and dynamics corruptions (affecting transition dynamics). Most recent efforts in visual navigation have typically focused on
generalizing to novel target environments with similar appearance and dynamics characteristics. With
RobustNav, we find that some standard embodied navigation agents significantly underperform (or
fail) in the presence of visual or dynamics corruptions. We systematically analyze the kind of
idiosyncrasies that emerge in the behavior of such agents when operating under corruptions. Finally,
for visual corruptions in RobustNav, we show that while standard techniques to improve robustness
such as data-augmentation and self-supervised adaptation offer some zero-shot resistance and
improvements in navigation performance, there is still a long way to go in terms of recovering lost
performance relative to clean “non-corrupt” settings, warranting more research in this direction.
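As a rough illustration of the kind of visual corruption RobustNav applies to RGB observations, here is a minimal Gaussian-noise corruptor; the severity scaling and noise level below are illustrative choices of mine, not the benchmark's exact parameters:

```python
import numpy as np

def corrupt_rgb(image, severity=1, rng=None):
    # Apply Gaussian pixel noise to an RGB observation in [0, 1];
    # higher severity means a larger noise standard deviation.
    rng = np.random.default_rng(rng)
    sigma = 0.04 * severity
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```

An agent evaluated on `corrupt_rgb(obs)` instead of `obs`, with no retraining, gives a zero-shot robustness measurement in the spirit of the benchmark.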
Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses
Adversarial Robustness in the Real World (AROW), ECCV 2020
NVIDIA Best Paper Runner Up
Convolutional Neural Networks (CNNs) have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to those where normal data lies, but are not naturally occurring and have low probability. In this work, we investigate the potential effect
defense techniques have on the geometry of the likelihood landscape - likelihood of the input images
under the trained model. We first propose a way to visualize the likelihood landscape by leveraging
an energy-based model interpretation of discriminative classifiers. Then we introduce a measure to
quantify the flatness of the likelihood landscape. We observe that a subset of adversarial defense
techniques results in a similar effect of flattening the likelihood landscape. We further explore
directly regularizing towards a flat landscape for adversarial robustness.
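A minimal sketch of the likelihood proxy and flatness measure, using the energy-based interpretation in which log p(x) is proportional to the log-sum-exp of class logits; the toy linear model and the perturbation-based flatness estimate below are illustrative stand-ins, not the paper's exact formulation:

```python
import numpy as np

def log_likelihood_proxy(logits):
    # Energy-based view of a discriminative classifier: log p(x) is
    # proportional to logsumexp over class logits (up to the partition
    # function, which does not depend on x). Computed stably.
    m = logits.max()
    return m + np.log(np.exp(logits - m).sum())

def flatness(model, x, radius=0.1, n_dirs=32, rng=None):
    # Average absolute change of the likelihood proxy under small
    # random input perturbations -- lower values mean a flatter
    # likelihood landscape around x.
    rng = np.random.default_rng(rng)
    base = log_likelihood_proxy(model(x))
    deltas = []
    for _ in range(n_dirs):
        d = rng.standard_normal(x.shape)
        d *= radius / np.linalg.norm(d)
        deltas.append(abs(log_likelihood_proxy(model(x + d)) - base))
    return float(np.mean(deltas))

# Toy linear "classifier" standing in for a trained CNN.
W = np.array([[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]])
model = lambda x: W @ x
x = np.array([0.5, -1.0])
score = flatness(model, x, rng=0)
```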
Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
ECCV, 2020 (Poster)
We introduce Domain-specific Masks for Generalization, a model for improving both
in-domain and out-of-domain generalization performance. For domain generalization, the goal is to
learn from a set of source domains to produce a single model that will best generalize to an unseen
target domain. As such, many prior approaches focus on learning representations which persist across
all source domains with the assumption that these domain agnostic representations will generalize
well. However, often individual domains contain characteristics which are unique and when leveraged
can significantly aid in-domain recognition performance. To produce a model which best generalizes
to both seen and unseen domains, we propose learning domain specific masks. The masks are encouraged
to learn a balance of domain-invariant and domain-specific features, thus enabling a model which can
benefit from the predictive power of specialized features while retaining the universal
applicability of domain-invariant features. We demonstrate competitive performance compared to naive
baselines and state-of-the-art methods on both PACS and DomainNet.
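A toy sketch of the idea, assuming sigmoid gates over feature channels and a simple averaging rule for unseen domains; both are illustrative choices, and the paper's actual mask parameterization and combination rule may differ:

```python
import numpy as np

def masked_features(features, mask_logits):
    # Sigmoid gate over feature channels: the mask softly selects
    # which channels a given domain relies on.
    gate = 1.0 / (1.0 + np.exp(-mask_logits))
    return features * gate

def predict_unseen_domain(features, all_mask_logits, classifier):
    # For an unseen target domain, average the predictions made under
    # each source domain's mask (one simple combination rule).
    preds = [classifier(masked_features(features, m)) for m in all_mask_logits]
    return np.mean(preds, axis=0)
```

During training, one mask per source domain would be learned jointly with the shared feature extractor, with a regularizer balancing mask overlap (shared channels) against specialization.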
Improving Generative Visual Dialog by Answering Diverse Questions
EMNLP, 2019 (Poster); Visual Question Answering and Dialog Workshop,
While generative visual dialog models trained with self-talk-based RL perform better at the associated downstream task, they suffer from repeated interactions -- resulting in saturation in improvements as the number of rounds increases. To counter this, we devise a simple
auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and
in turn enabling A-Bot to explore a larger state space during RL i.e., be exposed to more visual
concepts to talk about, and varied questions to answer.
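One minimal way to realize such a diversity incentive is a bonus that shrinks as the new question's embedding resembles previously asked ones; this cosine-similarity variant is a hypothetical sketch, not the paper's exact auxiliary objective:

```python
import numpy as np

def diversity_bonus(q_embed, history_embeds, eps=1e-8):
    # Reward 1 - (max cosine similarity to any past question):
    # near 0 for a repeated question, near 1 for a novel one,
    # discouraging Q-Bot from asking the same thing again.
    if not history_embeds:
        return 1.0
    q = q_embed / (np.linalg.norm(q_embed) + eps)
    sims = [float(q @ (h / (np.linalg.norm(h) + eps))) for h in history_embeds]
    return 1.0 - max(sims)
```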
DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL
arXiv preprint, 2019; Task-Agnostic RL (TARL)
Workshop, ICLR 2019
(Revised version accepted at IJCAI 2020!)
We learn to identify decision states, namely the parsimonious set of states where
decisions meaningfully affect the future states an agent can reach in an environment. We utilize the
Variational Intrinsic Control (VIC) framework, which maximizes an agent's 'empowerment', i.e. the ability to reliably reach a
diverse set of states -- and formulate a sandwich bound on the empowerment objective that allows
identification of decision states. Unlike previous work, our decision states are discovered
without extrinsic rewards -- simply by interacting with the world. Our results show that our
decision states are: (1) often interpretable, and (2) lead to better exploration on downstream
goal-driven tasks in partially observable environments.
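A toy proxy for this idea: in a deterministic tabular environment, a state is decision-like when different actions lead to different successors. The paper's actual criterion is information-theoretic, via bounds on empowerment; this sketch only captures the intuition that decision states are where choices matter:

```python
def decision_state_proxy(transitions):
    # transitions: {state: {action: next_state}} for a deterministic MDP.
    # A state is flagged when at least two actions lead to different
    # successors, i.e. the choice of action meaningfully affects which
    # future states the agent can reach from there.
    return {s for s, acts in transitions.items()
            if len(set(acts.values())) > 1}
```

In a corridor every action keeps you in the corridor (no decision), while a junction branching into two rooms is flagged.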
EvalAI: Towards Better Evaluation Systems for AI Agents
Shiv Baran Singh,
arXiv preprint, 2019
We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide a
scalable solution to the research community to fulfill the critical need of evaluating machine
learning models and agents acting in an environment against annotations or with a human-in-the-loop.
This will help researchers, students, and data scientists to create, collaborate, and participate in
AI challenges organized around the globe.
Choose Your Neuron: Incorporating Domain-Knowledge through Neuron-Importance
Ramprasaath R. Selvaraju*,
ECCV, 2018 (Poster); Continual Learning Workshop (Poster) and Visually Grounded Interaction and Language (ViGIL) Workshop, NeurIPS 2018
We introduce a simple, efficient zero-shot learning approach -- NIWT -- based on
the observation that individual neurons in CNNs have been shown to implicitly learn a dictionary of
semantically meaningful concepts (simple textures and shapes to whole or partial objects). NIWT
learns to map domain knowledge about "unseen" classes onto this dictionary of learned concepts and
optimizes for network parameters that can effectively combine these concepts - essentially learning
classifiers by discovering and composing learned semantic concepts in deep networks.
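A sketch of the weight-synthesis step, assuming a linear map from class attributes to per-neuron importances; the map and normalization here are hypothetical simplifications, whereas NIWT learns the mapping and then optimizes the resulting weights against the network:

```python
import numpy as np

def zero_shot_classifier_weights(attributes, attr_to_importance, eps=1e-8):
    # Map class attributes ("domain knowledge" about unseen classes)
    # to per-neuron importance scores, then use the normalized scores
    # as classifier weights for the unseen classes.
    imp = attributes @ attr_to_importance  # (n_classes, n_neurons)
    return imp / (np.linalg.norm(imp, axis=1, keepdims=True) + eps)
```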
Do explanation modalities make VQA Models more predictable to a human?
EMNLP, 2018 (Poster)
A rich line of research attempts to make deep neural networks more transparent by
generating human-interpretable 'explanations' of their decision process, especially for interactive
tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed
make a VQA model -- its responses as well as failures -- more predictable to a human.
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
HCOMP, 2017 (Oral)
We design a cooperative game - GuessWhich - to measure human-AI team performance in
the specific context of the AI being a visual conversational agent. GuessWhich involves live
interaction between the human and the AI and is designed to gauge the extent to which progress in
isolated metrics for AI (& AI-AI teams) transfers to human-AI collaborative scenarios.
It Takes Two to Tango: Towards Theory of AI's Mind
Chalearn Looking at People Workshop, CVPR, 2017 (Oral)
To effectively leverage the progress in Artificial Intelligence (AI) to make our
lives more productive, it is important for humans and AI to work well together in a team. In this
work, we argue that for human-AI teams to be effective, in addition to making AI more accurate and
human-like, humans must also develop a theory of AI's mind (ToAIM) - get to know its strengths,
weaknesses, beliefs, and quirks.
Counting Everyday Objects in Everyday Scenes
Ramprasaath R. Selvaraju,
CVPR, 2017 (Spotlight)
We study the numerosity of object classes in natural, everyday images and build
dedicated models for counting designed to tackle the large variance in counts, appearances, and
scales of objects found in natural scenes. We propose a contextual counting approach inspired by the phenomenon of subitizing -- the ability of humans to make quick assessments of small counts from a perceptual signal.