Bench Framework

Ship agents that pass the bench

BuildBench is the modern TypeScript framework for building, benchmarking and shipping AI agents — locally first, in production always.

Bench Framework

Build and iterate

Agents. Workflows. RAG. Memory. Tools. MCP. BuildBench lets you go from idea to a passing benchmark.

Run your agents in a local dev server

Define agents and workflows in TypeScript. Iterate against the model providers and tools you already use — and watch the bench score climb.

src/agents/chef.ts
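
The contents of src/agents/chef.ts are not reproduced in this text. As a minimal sketch of what defining an agent in TypeScript could look like, the block below uses hypothetical names throughout (`Tool`, `AgentConfig`, `defineAgent`, `pantryTool`, and the placeholder model string). None of them are taken from BuildBench's actual API, so treat this as an illustration of the shape rather than the framework's interface.

```typescript
// Hypothetical shapes for illustration; BuildBench's real API may differ.
interface Tool {
  name: string;
  description: string;
  run: (input: string) => string;
}

interface AgentConfig {
  name: string;
  instructions: string;
  model: string; // provider/model identifier; the value used below is a placeholder
  tools: Tool[];
}

// Validates a config and returns a shallowly frozen copy of it.
function defineAgent(config: AgentConfig): Readonly<AgentConfig> {
  if (config.name.trim() === "") {
    throw new Error("An agent needs a non-empty name");
  }
  const toolNames = new Set(config.tools.map((tool) => tool.name));
  if (toolNames.size !== config.tools.length) {
    throw new Error("Tool names must be unique");
  }
  return Object.freeze({ ...config });
}

// A toy tool: checks an ingredient against a hard-coded pantry.
const pantryTool: Tool = {
  name: "check_pantry",
  description: "Reports whether an ingredient is in the pantry",
  run: (ingredient) => {
    const pantry = ["eggs", "flour", "butter"];
    return pantry.includes(ingredient.toLowerCase()) ? "in stock" : "missing";
  },
};

const chefAgent = defineAgent({
  name: "chef",
  instructions: "Suggest recipes using only ingredients that are in stock.",
  model: "openai/gpt-4o",
  tools: [pantryTool],
});
```

Keeping the agent as plain, validated data makes it easy to load the same definition in a dev server, a test run, or a deployment.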

Your developer studio

Visualise traces, run evals and tune prompts in a studio that ships with the framework.

Observability platform

Productionize and test

Tune context. Improve recall. Tweak until your agents achieve human‑level accuracy.

Define custom evals

Track performance of your agents over time

Chef Agent · Retrieval 5.4 · Correctness 5.1
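
To make "custom eval" concrete, here is a small sketch of a scorer and a runner that averages it over a set of cases. The names `EvalCase`, `Scorer`, `keywordRecall`, and `runEval` are assumptions made for this example, not BuildBench APIs, and the 0 to 1 scale is also an assumption: the source does not say which scale the Retrieval and Correctness scores above use.

```typescript
// Hypothetical eval types; not taken from BuildBench.
interface EvalCase {
  input: string;
  output: string; // what the agent answered
  expectedKeywords: string[]; // terms a good answer should mention
}

// A scorer maps one case to a score between 0 and 1.
type Scorer = (evalCase: EvalCase) => number;

// Fraction of expected keywords found in the output (case-insensitive).
const keywordRecall: Scorer = (evalCase) => {
  if (evalCase.expectedKeywords.length === 0) return 1;
  const text = evalCase.output.toLowerCase();
  const hits = evalCase.expectedKeywords.filter((keyword) =>
    text.includes(keyword.toLowerCase()),
  ).length;
  return hits / evalCase.expectedKeywords.length;
};

// Averages a scorer over all cases.
function runEval(cases: EvalCase[], scorer: Scorer): number {
  if (cases.length === 0) {
    throw new Error("runEval needs at least one case");
  }
  const total = cases.reduce((sum, evalCase) => sum + scorer(evalCase), 0);
  return total / cases.length;
}

const chefCases: EvalCase[] = [
  {
    input: "What can I bake with eggs and flour?",
    output: "Try pancakes: whisk eggs into flour and milk.",
    expectedKeywords: ["eggs", "flour"],
  },
  {
    input: "Suggest a dessert with butter and sugar.",
    output: "Shortbread uses butter and a little salt.",
    expectedKeywords: ["butter", "sugar"],
  },
];

const recallScore = runEval(chefCases, keywordRecall);
```

Storing a score like `recallScore` with each run is one simple way to track an agent's performance over time.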

Built-in observability

View traces and logs for your agents

/agent.run 812ms

tools.search_docs 261ms

model.openai.gpt-4o 390ms

tools.send_email 58ms
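
A trace like the one above can be modelled as a tree of timed spans. The sketch below is one way to record such a tree. `Span` and `Tracer` are hypothetical names for this example, not BuildBench's observability API, and the version shown is synchronous for brevity, whereas real model and tool calls are usually asynchronous.

```typescript
// Hypothetical tracing types; not taken from BuildBench.
interface Span {
  name: string;
  durationMs: number;
  children: Span[];
}

class Tracer {
  readonly roots: Span[] = [];
  private readonly stack: Span[] = [];
  private readonly now: () => number;

  // The clock is injectable so tests can use fixed timestamps.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Runs fn inside a span nested under the currently open span, if any.
  span<T>(name: string, fn: () => T): T {
    const span: Span = { name, durationMs: 0, children: [] };
    const parent = this.stack[this.stack.length - 1];
    (parent ? parent.children : this.roots).push(span);
    this.stack.push(span);
    const start = this.now();
    try {
      return fn();
    } finally {
      // Record the duration even if fn throws, then close the span.
      span.durationMs = this.now() - start;
      this.stack.pop();
    }
  }
}

// Usage: nested calls produce nested spans.
const tracer = new Tracer();
tracer.span("/agent.run", () => {
  tracer.span("tools.search_docs", () => "3 documents");
  tracer.span("tools.send_email", () => "sent");
});
```

After the run, `tracer.roots` holds one `/agent.run` span with two children, which is the structure a trace viewer renders.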

Deployment platform

Deploy and scale

Expose your agents as APIs, or bundle them with your app. With BuildBench, your agents are part of your infrastructure.

Manage deployments

Control your source code and infrastructure

Deployments

Bench Prod 3f4da9b2-2b8e · #237 · Deploying

Workflows Test cd19e77a-631d · #234 · Success

Bench+RAG Prod a9bc2d34-8f1a · #229 · Success

Bench Prod 7d2e1ab0-4cc1 · #221 · Success

Flexible architecture

Deploy BuildBench agents wherever you’re hosting your app, or as a standalone service.

Container-based

Serverless

Build your first bench agent

Start with Templates · read the Bench Book · learn with the Tutorial · watch the Workshop · tune in to Bench Hour

Template

Browser Agent

An agent that controls a real browser to research and act on the open web — scored against a click-trace harness.

Template

Spreadsheet Analyst

Hand a spreadsheet to an agent that summarises trends and exports a tidy report — measured for table-faithfulness.

Template

Chat with Database

A natural-language SQL agent over your warehouse with grounded answers, citations and an automatic recall bench.

Book

Principles of Building AI Benchmarks

A field-tested guide for engineers shipping production agents — and the suites that grade them.

Read the book →

Course

Bench 101 · CLI Course

Learn to wire up agents, tools and MCP servers — and grade them against a real benchmark — in a hands-on CLI course.

Start the course →

Python trains, TypeScript ships.

Frequently asked questions