Selected Publications

(2026). An Efficient Training Pipeline for Reasoning Graphical User Interface Agents. In ICLR MMI Workshop.
(2025). Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users. In ACL.
(2025). CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts. In NAACL.
(2024). Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling. In EMNLP.
(2024). Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers. In NAACL.
(2024). Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation. In ACL ALVR.
(2023). Multitask Multimodal Prompted Training for Interactive Embodied Task Completion. In EMNLP.
(2022). Combine to Describe: Evaluating Compositional Generalization in Image Captioning. In ACL SRW.