A 4B agent that researches like a frontier model.

Tech · AI Architecture

Simplex AI Research

06/26/2026 · 8 min read


Announcing LiteResearcher · 2026

LiteResearcher-4B is the new open-source state of the art for deep research: 71.3 on GAIA and 78.0 on Xbench-DeepSearch. On those two benchmarks it beats open agents up to 8x its size, matches Claude-4.5-Sonnet on GAIA, and tops GPT-5-high on Xbench-DeepSearch, while training with 10-46x faster rollouts at a fraction of the cost.

Xbench-DeepSearch accuracy (%) — higher is better

Model	Score
LiteResearcher-4B	78.0
GPT-5-high	77.8
Tongyi DeepResearch	75.0
GLM-4.6	70.0
Claude-4.5-Sonnet	66.0

The only sub-30B model in the frontier band — a 4B agent topping GPT-5-high on Xbench-DeepSearch.

Number	Label	Detail
78.0	Xbench-DeepSearch	#1 — tops GPT-5-high
71.3	GAIA	matches Claude-4.5-Sonnet
4B	parameters	runs on a single GPU
10-46x	faster RL rollouts	lower latency and cost per turn

The Short Version

Deep research is the most valuable thing an agent can do — read the live web, reason across dozens of sources, and come back with an answer you can actually trust. Until now, that meant a giant model and a giant bill.

LiteResearcher-4B changes the math. Frontier-grade research quality, in a model small enough to run anywhere and cheap enough to put in front of every user — the same answers as the frontier, at a fraction of the size, latency, and cost.

The Highlights — Four Reasons It Turns Heads.

Beats GPT-5-high on Xbench

LiteResearcher-4B posts 78.0% on Xbench-DeepSearch — the highest of any open-source agent, edging out OpenAI's GPT-5-high (77.8%). On GAIA it reaches 71.3%, matching Claude-4.5-Sonnet (71.2%).

Punches 8x Above Its Weight

At just 4B parameters, it outperforms open-source deep research agents up to 8x larger on GAIA and Xbench-DeepSearch, including Tongyi DeepResearch 30B (70.9 on GAIA, 75.0 on Xbench-DeepSearch), and it runs on a single GPU.

10-46x Faster Rollouts in RL Training

Our local training environment runs RL rollouts 10-46x faster than rollouts against live web APIs, at a fraction of the cost per turn. That throughput made training at this scale possible, and the same serving stack keeps inference fast in production.

Open Weights, Ready to Ship

The model, data, and framework are fully open — download the 4B weights and ship frontier-grade deep research today.

"A 4B model that tops GPT-5-high on Xbench-DeepSearch and matches Claude-4.5-Sonnet on GAIA — while running an order of magnitude faster and cheaper. This is what frontier looks like when it's small."

The Receipts — A 4B Model in the Frontier Band.

Open-source SOTA across the deep-research benchmarks that matter: the best numbers among open deep-research agents on GAIA, WebWalker and Xbench-DeepSearch (78.0, edging out GPT-5-high), and ahead of the other 4B agent, AgentCPM-Explore, on every benchmark shown. Every score below is reported in the paper; baseline scores are each model's published numbers.

Model	Params	GAIA	Frames	HLE	BrowseComp	WebWalker	Xbench-DS
LiteResearcher-4B	4B	71.3	83.1	22.0	27.5	72.7	78.0
OpenAI GPT-5-high	frontier	76.4	-	35.2	54.9	-	77.8
Claude-4.5-Sonnet	frontier	71.2	85.0	24.5	19.6	-	66.0
GLM-4.6	frontier	71.9	-	30.4	45.1	-	70.0
DeepSeek-V3.2	frontier	63.5	80.2	40.8	67.6	-	71.0
Tongyi DeepResearch	30B	70.9	90.6	32.9	43.4	72.2	75.0
AgentCPM-Explore	4B	63.9	82.7	19.1	24.1	68.1	70.0

Accuracy / pass-rate (%) as reported in the LiteResearcher paper (Table 1), evaluated under a shared tool setup; baseline numbers from each model's official report. "-" = not reported. The full eight-benchmark table and ablations are in the paper.

How We Built It — A Virtual World for Agentic RL.

The reason a 4B model punches at frontier weight isn't a bigger network — it's where it trains. Scaling agentic RL on the live web is slow, unstable, and expensive: every rollout hits real search APIs, so noise and cost compound with each step.

So we rebuilt the environment. LiteResearcher trains entirely inside a local search-and-browse world that mirrors real-web dynamics: the same partial relevance, noisy snippets, and multi-hop chains, but with no live API in the loop. Starting from Qwen3-4B-Thinking, a short SFT stage teaches tool use; on-policy RL then climbs a difficulty curriculum across 700+ stable training steps. The payoff: 73.2M tool calls at zero marginal cost, 10-46x faster rollouts, and steady gains instead of training collapse. Full method and ablations are in the paper.
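To make the idea of a local search-and-browse world concrete, here is a minimal Python sketch of an offline environment that exposes `search` and `browse` tools over an in-memory corpus. Everything in it is hypothetical: `LocalWebEnv`, `Page`, the term-overlap ranking, and the snippet truncation are illustrative stand-ins, not LiteResearcher's actual environment (the paper describes that). It only shows why such a world has zero marginal cost per call: every tool call is answered from local data and merely increments a counter.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    text: str


def _tokens(s: str) -> set[str]:
    """Lowercase alphanumeric tokens, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


@dataclass
class LocalWebEnv:
    """Hypothetical offline stand-in for live search and browse APIs."""

    pages: list[Page]
    snippet_chars: int = 80  # truncate snippets, as real result pages do
    calls: int = 0           # tool calls served locally, with no per-call API fee

    def search(self, query: str, k: int = 3) -> list[dict]:
        """Return up to k results ranked by how many query terms each page shares."""
        self.calls += 1
        q = _tokens(query)
        scored = []
        for page in self.pages:
            overlap = len(q & _tokens(page.title + " " + page.text))
            if overlap:  # partial relevance: one shared term is enough to appear
                scored.append((overlap, page))
        # highest overlap first; ties broken by URL so results are deterministic
        scored.sort(key=lambda item: (-item[0], item[1].url))
        return [
            {"url": p.url, "title": p.title, "snippet": p.text[: self.snippet_chars]}
            for _, p in scored[:k]
        ]

    def browse(self, url: str) -> str:
        """Return the full text of a page, or an error string if it is unknown."""
        self.calls += 1
        for page in self.pages:
            if page.url == url:
                return page.text
        return "404: page not in local corpus"


if __name__ == "__main__":
    env = LocalWebEnv([
        Page("local://1", "Acme Corp", "Acme Corp was founded in 1999 and makes widgets."),
        Page("local://2", "Widget market", "The widget market is noisy and fragmented."),
    ])
    hits = env.search("who founded Acme widgets")
    print(hits[0]["url"], env.browse(hits[0]["url"]), env.calls)
```

In an RL loop, the policy's tool calls would be routed to `env.search` and `env.browse` instead of a paid API. A realistic environment would also need a far larger corpus, a proper retriever, and deliberately injected noise to match live-web dynamics.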

Built at Simplex AI — Small Models, Production Speed.

LiteResearcher didn't come out of nowhere. At Simplex AI we run a mature, high-throughput serving stack purpose-built for small, fast models — the very infrastructure that drove RL rollouts 10-46x faster, powered 73.2M tool calls at zero marginal cost, and keeps inference fast in production today.

Across agentic search, retrieval, and on-device assistants, our team has shipped small-model systems into real deployment scenarios where latency and cost decide whether AI ships at all. LiteResearcher-4B is that playbook applied to deep research: frontier quality, at a size and price you can serve at scale.

Capability	Description
High-throughput serving	A stack tuned for small models — the 10-46x edge that made this training run possible.
Production-grade latency	Sub-second tool calls and rollouts, engineered for real-time agentic workloads.
Deployed at scale	Small-model systems already running in live, cost-sensitive product scenarios.

73.2M search & browse calls to train LiteResearcher-4B

Option	Cost	Detail
On commercial search APIs	$59K-$243K	Serper · SerpAPI · Jina, at list price
On Simplex AI's local stack	$0	zero marginal cost · 100% saved
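The quoted range is easy to sanity-check. Assuming a flat average price per call (real providers bill by plan and tier, and often price searches and page fetches differently), dividing each end of the range by the 73.2M calls gives the implied average list price per 1,000 calls. The constant and function names are illustrative; the inputs are the figures from the cost table.

```python
CALLS = 73_200_000        # search & browse calls used to train LiteResearcher-4B
TOTAL_LOW_USD = 59_000    # low end of the commercial-API estimate
TOTAL_HIGH_USD = 243_000  # high end of the commercial-API estimate


def implied_price_per_1k(total_usd: float, calls: int) -> float:
    """Average price per 1,000 calls implied by a total bill (flat-rate assumption)."""
    return total_usd / calls * 1_000


def total_cost(price_per_1k: float, calls: int) -> float:
    """Total bill for `calls` requests at a flat price per 1,000 calls."""
    return price_per_1k * calls / 1_000


if __name__ == "__main__":
    low = implied_price_per_1k(TOTAL_LOW_USD, CALLS)
    high = implied_price_per_1k(TOTAL_HIGH_USD, CALLS)
    print(f"implied average: ${low:.2f} to ${high:.2f} per 1,000 calls")
```

That works out to roughly $0.81 to $3.32 per 1,000 calls. Because the local stack's marginal cost per call is $0, the saving is the whole bill at either end of the range.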

From Benchmark to Product — The Missing Engine for Agentic Automation.

The future of work won't be built on prompts — it'll be built on agents that search, reason, plan, and act continuously. That takes frontier-level deep research that is fast, affordable, and massively parallel. LiteResearcher is that engine — and the foundation of lev8, Simplex AI's go-to-market platform.

lev8's core is Parallel Agentic Search: every task fires hundreds of deep-research agents that read, reason, and synthesize across the live web. That demands a model that is strong, fast, cheap, and reliable all at once — frontier models were too slow and too costly to make it real. A 4B agent at frontier quality changes the physics, making always-on, massively parallel agents economically viable.
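The fan-out pattern behind Parallel Agentic Search can be sketched with Python's `asyncio`: launch one agent per research angle, cap how many run at once with a semaphore, and keep whatever succeeds. This is an illustrative assumption, not lev8's implementation; `fan_out`, `run_agent`, `demo_agent`, and the `max_concurrency` default are hypothetical names, and the demo agent only sleeps in place of real search, browse, and reasoning calls.

```python
import asyncio
from typing import Awaitable, Callable


async def fan_out(
    task: str,
    angles: list[str],
    run_agent: Callable[[str], Awaitable[str]],
    max_concurrency: int = 64,
) -> list[str]:
    """Run one hypothetical research agent per angle, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(angle: str) -> str:
        async with sem:  # bound the number of agents in flight
            return await run_agent(f"{task} :: {angle}")

    # return_exceptions=True: a failing agent yields an exception object
    # instead of cancelling the whole batch
    results = await asyncio.gather(*(one(a) for a in angles), return_exceptions=True)
    # keep successful findings (in input order) and drop failures
    return [r for r in results if not isinstance(r, BaseException)]


async def demo_agent(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for search, browse, and reasoning latency
    return f"notes on {prompt}"


if __name__ == "__main__":
    angles = [f"source-{i}" for i in range(200)]
    notes = asyncio.run(fan_out("Acme Corp profile", angles, demo_agent, max_concurrency=50))
    print(len(notes), notes[0])
```

The semaphore turns "hundreds of agents per task" into a tunable cap rather than an unbounded burst against the model server, and `return_exceptions=True` keeps one failed agent from sinking the whole task. A synthesis step would then merge the returned notes.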

lev8

Simplex AI's go-to-market platform

Parallel agentic search — hundreds of deep-research agents fan out across the live web per query, viable only when every agent is tiny, fast, and cheap.
Company & people deep search — frontier-grade research on any organization or person, synthesized into a 360° profile.
Live-web synthesis at production cost — human-level depth at machine-level breadth, at unit economics that actually ship.

Most products use AI. lev8 is built around it — not a tool, but a teammate.


Frontier Deep Research, in a 4B Model.

The model, data, and framework are open. Read the paper, grab the weights, or watch the agent work.

References & Sources — Every Number, Sourced.

LiteResearcher

Benchmarks

Baselines

Ecosystem

Simplex AI Research

Research Team

Simplex AI Research builds and evaluates small, fast agentic AI systems for deep research, search, and production automation.

Agentic AI · Deep Research · Small Models · AI Infrastructure


Replace your entire GTM stack.
