NLP Researcher, AI Engineer
I study language models through evaluation, multilingual and multimodal settings, and deployment-grounded NLP systems. Much of my work asks where current evaluations overstate model performance, how behavior shifts across language and culture, and what those patterns tell us about the systems we are building.
// 02 Current Affiliations
June 2025 -- Present
ML Agents Community Lead
Working on large-scale evaluation of LLM reasoning across 50 task categories, with emphasis on LLM-as-judge pipelines and inference design across multiple reasoning strategies.
Sep 2024 -- Present
Researcher
Leading multilingual and multimodal NLP projects, including South Asian LLMs, Mantra-14B, regulatory QA, and AI-generated text detection, supported by $20,000+ in grants, with emphasis on how model behavior shifts across settings and languages.
June 2025 -- Present
AI Engineer
Building production RAG, OCR, and agentic systems for compliance review, feasibility analysis, and fleet-management workflows, and exploring novel applications of LLMs and vision in operational settings.
// 03 Research Focus
Studying language model behavior, evaluation, and real-world generalization across language, culture, and deployment settings.

// [01]
I work on evaluation as a way to understand language model capabilities and limits, especially when benchmark performance does not translate to actual use. Recent work spans multilingual reasoning, local knowledge, physical commonsense, LLM-as-judge pipelines, and broad-coverage vision evaluation.

// [02]
Much of my work examines how model behavior changes across languages, cultures, and interfaces. This includes multilingual language modeling through SA-LLMs (ongoing) and Mantra-14B, multilingual machine-generated text detection across 23 languages and 12 generators, and analysis of cultural representation disparities in vision-language models.
// 04 Featured Work
IJCNLP-AACL 2025 · Findings
We evaluate how vision-language models represent cultural knowledge across more than 200 countries, showing where current systems still inherit narrow cultural frames.
Jebish Purbey, Ram Mohan Rao Kadiyala, Siddhant Gupta, Srishti Yadav, Suman Debnath, Alejandro Salamanca, Desmond Elliott
I am interested in language models as systems we still do not fully understand. Evaluation, for me, is a way to study behavior, uncover limits, and test whether apparent capability survives real use.