ML Agents Community Lead · Cohere Labs

Jebish Purbey

NLP Researcher, AI Engineer

I study language models through evaluation, multilingual and multimodal settings, and deployment-grounded NLP systems. Much of my work asks where current evaluations overstate model performance, how behavior shifts across language and culture, and what those patterns tell us about the systems we are building.

12 Publications
20+ Research Collaborators
$20K+ Research Grants

// 02   Current Affiliations

Where I Work Right Now

June 2025 -- Present

Cohere Labs

ML Agents Community Lead

Working on large-scale evaluation of LLM reasoning across 50 task categories, with emphasis on LLM-as-judge pipelines and inference design across multiple reasoning strategies.

Sep 2024 -- Present

ZeroGrad.ai

Researcher

Leading multilingual and multimodal NLP projects, including South Asian LLMs, Mantra-14B, regulatory QA, and AI-generated text detection, supported by $20,000+ in grants. The emphasis is on how model behavior shifts across settings and languages.

June 2025 -- Present

Dogma Group

AI Engineer

Building production RAG, OCR, and agentic systems for compliance review, feasibility analysis, and fleet-management workflows, and exploring novel applications of LLMs and vision in operational settings.

// 03   Research Focus

What I Investigate

Studying language model behavior, evaluation, and real-world generalization across language, culture, and deployment settings.

// [01]

Language Model Evaluation & Real-World Generalization

I work on evaluation as a way to understand language model capabilities and limits, especially when benchmark performance does not translate to actual use. Recent work spans multilingual reasoning, local knowledge, physical commonsense, LLM-as-judge pipelines, and broad-coverage vision evaluation.

Evaluation · LM Behavior · Real-World Performance

// [02]

Model Behavior Across Language, Culture & Modality

I study how model behavior changes across languages, cultures, and interfaces. This includes multilingual language modeling through SA-LLMs (ongoing) and Mantra-14B, multilingual machine-generated text detection across 23 languages and 12 generators, and analysis of cultural representation disparities in vision-language models.

Capabilities & Limits · Cultural Representation · VLMs

// 04   Featured Work

Featured Publication

IJCNLP-AACL 2025 · Findings

Uncovering Cultural Representation Disparities in Vision-Language Models

We evaluate how vision-language models represent cultural knowledge across more than 200 countries, showing where current systems still inherit narrow cultural frames.

Jebish Purbey, Ram Mohan Rao Kadiyala, Siddhant Gupta, Srishti Yadav, Suman Debnath, Alejandro Salamanca, Desmond Elliott

// 05   Writing

3+ Years of NLP Research Experience
45+ Languages Evaluated
10+ Research Projects
// Research Philosophy

I am interested in language models as systems we still do not fully understand. Evaluation, for me, is a way to study behavior, uncover limits, and test whether apparent capability survives real use.