NLP Researcher, AI Engineer
I study language models through evaluation, multilingual and multimodal settings, and deployment-grounded NLP systems. Much of my work asks where current evaluations overstate model performance, how behavior shifts across language and culture, and what those patterns tell us about the systems we are building.
// 02 Current Affiliations
June 2025 -- Present
ML Agents Community Lead
Working on large-scale evaluation of LLM reasoning across 50 task categories, with emphasis on LLM-as-judge pipelines and inference design across multiple reasoning strategies.
Sep 2024 -- Present
Researcher
Leading multilingual and multimodal NLP projects, including South Asian LLMs, Mantra-14B, regulatory QA, and AI-generated text detection, supported by $20,000+ in grants, with emphasis on how model behavior shifts across settings and languages.
June 2025 -- Present
AI Engineer
Building production RAG, OCR, and agentic systems for compliance review, feasibility analysis, and fleet-management workflows, and exploring novel applications of LLMs and vision in operational settings.
// 03 Research Focus
Studying language model behavior, evaluation, and real-world generalization across language, culture, and deployment settings.

// [01]
I work on evaluation as a way to understand language model capabilities and limits, especially when benchmark performance does not translate to actual use. Recent work spans multilingual reasoning, local knowledge, physical commonsense, LLM-as-judge pipelines, and broad-coverage vision evaluation.

// [02]
Much of my work examines how model behavior changes across languages, cultures, and interfaces. This includes multilingual language modeling through SA-LLMs (ongoing) and Mantra-14B, multilingual machine-generated text detection across 23 languages and 12 generators, and analysis of cultural representation disparities in vision-language models.
// 04 Featured Work
IJCNLP-AACL 2025 · Findings
We evaluate how vision-language models represent cultural knowledge across more than 200 countries, showing where current systems still inherit narrow cultural frames.
Jebish Purbey, Ram Mohan Rao Kadiyala, Siddhant Gupta, Srishti Yadav, Suman Debnath, Alejandro Salamanca, Desmond Elliott
I am interested in language models as systems we still do not fully understand. Evaluation, for me, is a way to study behavior, uncover limits, and test whether apparent capability survives real use.