Visual Question Answering from Image Sets

Research Project, Carnegie Mellon University, LTI, 2020

  • Adapted a Transformer-based VQA model (LXMERT) to the task of Image-Set VQA and developed an Adversarial Regularization method to reduce dependence on language biases and improve performance on out-of-domain data
  • Introduced a new pre-training objective that uses object bounding boxes extracted by an R-CNN to improve model performance on object-description questions by 4%