Jebish Purbey | Projects

ML Agents Reasoning Benchmark

A benchmarking effort for LLM agent reasoning across 50 task categories, spanning coding, instruction following, mathematical reasoning, and tool use. The work centers on evaluation design, inference strategy comparison, and LLM-as-judge pipelines.

Agents, Benchmarking, Python, Evaluation Design

AgentPro + DSBC

At Traversaal.ai, I architected AgentPro, a ReAct-based framework for complex data-science workflows, and contributed to DSBC, a benchmark that evaluates agent performance across eight task categories, with explicit attention to context engineering and architectural sensitivity.

> Read DSBC > View AgentPro

03 Active

Agents, Data Science

South Asian LLMs

An ongoing multilingual language modeling effort focused on South Asian languages. The work spans instruction tuning, capability evaluation, and community-grounded model development for languages that mainstream LLMs still serve poorly.

> Project Page

04 Ongoing

South Asian LLMs, Multilingual NLP

Hate Speech Detection in Devanagari Languages

An ongoing project on hate speech detection, hate-target detection, and cross-lingual generalization in closely related Devanagari-script languages (Nepali-Hindi), centered on dataset curation with socio-cultural annotations and multilingual baseline analysis.

> See publications

05 Ongoing

Hate Speech, Low-Resource NLP

Global PIQA

An extension of the PIQA physical reasoning benchmark to 100+ languages and cultures. The project exposes how commonsense reasoning shifts once evaluation leaves English, helping separate genuine reasoning from English familiarity and benchmark overfitting.

> arXiv

06 Preprint

100+ Languages, Commonsense

Built & Maintained

Flagship Project

ML Agents Reasoning Benchmark

Research & Engineering

Mantra-14B

AgentPro + DSBC

South Asian LLMs

Hate Speech Detection in Devanagari Languages

Global PIQA