Bench Framework

Ship agents that pass the bench

BuildBench is the modern TypeScript framework for building, benchmarking and shipping AI agents — locally first, in production always.

Bench Framework

Build and iterate

Agents. Workflows. RAG. Memory. Tools. MCP. BuildBench lets you go from idea to a passing benchmark.

Run your agents in a local dev server

Define agents and workflows in TypeScript. Iterate against the model providers and tools you already use — and watch the bench score climb.

src/agents/chef.ts
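
The listing for this file is not reproduced here. As a rough idea of what defining an agent in TypeScript can look like, here is a minimal sketch. Every name in it (`Tool`, `AgentDefinition`, `recipeBook`, `findRecipes`, `chefAgent`, and the `openai/gpt-4o` identifier format) is a hypothetical stand-in chosen for illustration, not BuildBench's actual API.

```typescript
// Hypothetical shapes for illustration only; they are not BuildBench's real API.
interface Tool<Input, Output> {
  description: string;
  execute: (input: Input) => Output;
}

interface AgentDefinition {
  name: string;
  instructions: string;
  model: string; // provider/model identifier (assumed format)
  tools: { findRecipes: Tool<{ ingredients: string[] }, string[]> };
}

// Tiny in-memory recipe book standing in for a real data source.
const recipeBook: Record<string, string[]> = {
  omelette: ["egg", "butter"],
  pancakes: ["egg", "flour", "milk"],
};

// Returns every recipe whose ingredients are all available.
const findRecipes: Tool<{ ingredients: string[] }, string[]> = {
  description: "List recipes that can be cooked from the given ingredients.",
  execute: ({ ingredients }) =>
    Object.entries(recipeBook)
      .filter(([, needed]) => needed.every((item) => ingredients.includes(item)))
      .map(([recipe]) => recipe),
};

// The agent itself is plain data: a name, a prompt, a model and its tools.
const chefAgent: AgentDefinition = {
  name: "chef",
  instructions: "You are a home cook. Suggest recipes using only what the user has.",
  model: "openai/gpt-4o",
  tools: { findRecipes },
};
```

Keeping the definition as typed data means the compiler catches a misspelled tool name or a missing field before the agent ever runs.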

Your developer studio

Visualise traces, run evals and tune prompts in a studio that ships with the framework.

Observability platform

Productionize and test

Tune context. Improve recall. Tweak until your agents achieve human‑level accuracy.

Define custom evals

Track performance of your agents over time

Chef Agent: Retrieval 5.4 · Correctness 5.1
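
To make "custom evals" concrete, here is a minimal sketch of two scorers, one for retrieval and one for correctness, averaged over a small sample set. The types and function names (`EvalSample`, `Scorer`, `retrievalRecall`, `exactMatch`, `runEvals`) are hypothetical, and the sketch scores on a 0 to 1 scale as an assumption; the scale behind the figures shown above is not stated.

```typescript
// Hypothetical eval shapes for illustration; not BuildBench's real API.
interface EvalSample {
  retrievedIds: string[]; // documents the agent actually retrieved
  relevantIds: string[];  // documents a correct run should have retrieved
  answer: string;         // what the agent said
  expected: string;       // reference answer
}

// A scorer maps one sample to a score between 0 and 1 (assumed scale).
type Scorer = (sample: EvalSample) => number;

// Fraction of relevant documents that were retrieved.
const retrievalRecall: Scorer = (sample) => {
  if (sample.relevantIds.length === 0) return 1;
  const hits = sample.relevantIds.filter((id) => sample.retrievedIds.includes(id)).length;
  return hits / sample.relevantIds.length;
};

// 1 when the answer matches the reference, ignoring case and outer whitespace.
const exactMatch: Scorer = (sample) =>
  sample.answer.trim().toLowerCase() === sample.expected.trim().toLowerCase() ? 1 : 0;

// Averages each scorer over all samples and returns one number per metric.
function runEvals(samples: EvalSample[], scorers: Record<string, Scorer>): Record<string, number> {
  const report: Record<string, number> = {};
  for (const [metric, scorer] of Object.entries(scorers)) {
    const total = samples.reduce((sum, sample) => sum + scorer(sample), 0);
    report[metric] = samples.length > 0 ? total / samples.length : 0;
  }
  return report;
}
```

Storing the per-metric averages for each run is one simple way to track how an agent's performance changes over time.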

Built-in observability

View traces and logs for your agents

/agent.run 812ms
tools.search_docs 261ms
model.openai.gpt-4o 390ms
tools.send_email 58ms
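
A trace like the one above is a list of named, timed spans. As a minimal sketch of how such timings can be collected, the hypothetical `Tracer` below wraps each step in a span and records its duration and nesting depth. The class, its `span` method and the injectable clock are assumptions made for illustration, not the framework's built-in tracer.

```typescript
// Hypothetical tracer for illustration; not BuildBench's built-in observability.
interface Span {
  name: string;
  durationMs: number;
  depth: number; // 0 for the root span, 1 for its children, and so on
}

class Tracer {
  readonly spans: Span[] = [];
  private depth = 0;
  private readonly now: () => number;

  // The clock is injectable so durations can be made deterministic in tests.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Runs fn inside a named span and records how long it took.
  span<T>(name: string, fn: () => T): T {
    const start = this.now();
    const record: Span = { name, durationMs: 0, depth: this.depth };
    this.spans.push(record); // pushed before running, so parents precede children
    this.depth += 1;
    try {
      return fn();
    } finally {
      this.depth -= 1;
      record.durationMs = this.now() - start; // recorded even if fn throws
    }
  }
}
```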

Deployment platform

Deploy and scale

Expose your agents as APIs, or bundle them with your app. With BuildBench, your agents are part of your infrastructure.
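
As a rough sketch of what "exposing an agent as an API" can mean, the hypothetical handler below maps a `POST /agents/<id>/run` request to an agent call. The route shape, the request and response types and `createAgentApi` are all assumptions for illustration; the handler is deliberately independent of any HTTP server so it can be bundled into an existing app or wrapped in a standalone service.

```typescript
// Hypothetical request/response shapes; not BuildBench's real deployment API.
type AgentFn = (input: string) => string;

interface ApiRequest {
  method: string;
  path: string;
  body: string; // raw JSON text, expected to look like {"input": "..."}
}

interface ApiResponse {
  status: number;
  body: string;
}

const json = (status: number, payload: unknown): ApiResponse => ({
  status,
  body: JSON.stringify(payload),
});

// Builds a handler that routes POST /agents/<id>/run to the matching agent.
function createAgentApi(agents: Map<string, AgentFn>): (req: ApiRequest) => ApiResponse {
  return (req) => {
    const match = /^\/agents\/([\w-]+)\/run$/.exec(req.path);
    if (!match) return json(404, { error: "not found" });
    if (req.method !== "POST") return json(405, { error: "method not allowed" });

    const agent = agents.get(match[1]);
    if (!agent) return json(404, { error: "unknown agent" });

    let input: unknown;
    try {
      input = JSON.parse(req.body).input;
    } catch {
      return json(400, { error: "invalid JSON body" });
    }
    if (typeof input !== "string") return json(400, { error: "missing input" });

    return json(200, { output: agent(input) });
  };
}
```

Because the handler only takes and returns plain objects, the same function can sit behind a container-based server or a serverless entry point with a thin adapter on each side.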

Manage deployments

Control your source code and infrastructure

Deployments
Bench · Prod · 3f4da9b2-2b8e · #237 · Deploying
Workflows · Test · cd19e77a-631d · #234 · Success
Bench+RAG · Prod · a9bc2d34-8f1a · #229 · Success
Bench · Prod · 7d2e1ab0-4cc1 · #221 · Success

Flexible architecture

Deploy BuildBench agents wherever you’re hosting your app, or as a standalone service

Container-based
aws
Serverless

Build your first bench agent

with Templates · read the Bench Book · learn with the Tutorial · watch the Workshop · tune in to Bench Hour

Python trains, TypeScript ships.

Frequently asked questions