The harness is the product

Two years ago, the most important line in a product deck was which model you ran. That line now carries almost no signal. The top frontier models cluster within a few points of each other on the benchmarks that matter, and every one of them is a single API call away from every competitor you have. The strongest agents already live on the user's laptop. When the engine is a commodity, building your company around having the engine is building on sand.

So the question stops being "which model" and becomes "what did you put around it." That scaffolding has a name. It is the harness.

What a harness is

A harness is everything between the model and the world. Concretely, it is four decisions.

Tools: the exact set of actions the agent can take. Not "it has access to a computer," but the named, typed operations you expose and the ones you refuse to. A model that can shell out to anything is a different product from one that can call three audited functions.
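Here is a minimal TypeScript sketch of that idea. It assumes a hypothetical ticketing agent with exactly three tools (`readTicket`, `addComment`, `setStatus`). The names and fields are illustrative and do not come from any real product's API. The harness parses whatever the model emits into a typed call and rejects anything outside the allowlist before it can run.

```typescript
// Hypothetical tool set: three named, typed operations. Nothing else exists.
type ToolCall =
  | { tool: "readTicket"; ticketId: string }
  | { tool: "addComment"; ticketId: string; body: string }
  | { tool: "setStatus"; ticketId: string; status: "open" | "closed" };

// The explicit allowlist; any other tool name is refused.
const ALLOWED_TOOLS: ReadonlySet<string> = new Set(["readTicket", "addComment", "setStatus"]);

const isStatus = (v: unknown): v is "open" | "closed" => v === "open" || v === "closed";

// Turn untrusted model output into a ToolCall, or return a reason for refusal.
function parseToolCall(raw: unknown): ToolCall | string {
  if (typeof raw !== "object" || raw === null) return "tool call must be an object";
  const call = raw as Record<string, unknown>;
  const tool = call["tool"];
  if (typeof tool !== "string" || !ALLOWED_TOOLS.has(tool)) {
    return `unknown tool: ${String(tool)}`;
  }
  const ticketId = call["ticketId"];
  if (typeof ticketId !== "string") return "ticketId must be a string";

  switch (tool) {
    case "readTicket":
      return { tool, ticketId };
    case "addComment": {
      const body = call["body"];
      if (typeof body !== "string") return "body must be a string";
      return { tool, ticketId, body };
    }
    case "setStatus": {
      const status = call["status"];
      if (!isStatus(status)) return "status must be open or closed";
      return { tool, ticketId, status };
    }
    default:
      return `unknown tool: ${tool}`;
  }
}
```

A call such as `{ tool: "runShell" }` never reaches a dispatcher. It comes back as an error string, which the harness can feed back to the model.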

Permissions: what each tool is allowed to touch. Sandboxing is a design surface, not a checkbox. You decide which directories are writable, which network calls resolve, which credentials are even in scope for a given run.
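One way to make those decisions explicit is a per-run policy object that every tool consults before it acts. The sketch below assumes POSIX absolute paths, exact and lowercase hostnames, and a simple in-memory secret store. `RunPolicy`, `canWrite`, `canReach`, and `credentialFor` are hypothetical names used only for illustration.

```typescript
// Hypothetical per-run policy; field names are illustrative.
interface RunPolicy {
  writableDirs: string[]; // absolute POSIX directories this run may write into
  allowedHosts: string[]; // exact, lowercase hostnames this run may contact
  credentials: string[];  // names of secrets in scope for this run
}

// True only if `path` is inside one of the writable directories.
// Relative paths and any ".." segment are rejected rather than resolved.
function canWrite(policy: RunPolicy, path: string): boolean {
  if (!path.startsWith("/")) return false;
  if (path.split("/").includes("..")) return false;
  return policy.writableDirs.some((dir) => {
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    return path === dir || path.startsWith(prefix);
  });
}

// Exact hostname match; no wildcards.
function canReach(policy: RunPolicy, host: string): boolean {
  return policy.allowedHosts.includes(host.toLowerCase());
}

// A secret outside the policy is never looked up, even if the store holds it.
function credentialFor(
  policy: RunPolicy,
  name: string,
  store: Map<string, string>,
): string | undefined {
  return policy.credentials.includes(name) ? store.get(name) : undefined;
}
```

Rejecting `..` outright is cruder than resolving the path. It fails closed, though, which is the direction a sandbox should err in.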

Grounding: how the agent's intentions get bound to real state. Models are fluent and confident about things that are not on the screen. Grounding is the machinery that forces a proposed action to resolve against the actual page, file, or record before it executes.
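For files, one common grounding pattern is sketched below. It assumes the model proposes each edit as an exact old/new text pair; `applyGroundedEdit` is a hypothetical helper. The quoted text must occur exactly once in the file as it exists right now, so an edit aimed at a file the model imagined finds no match and is refused.

```typescript
// Hypothetical edit proposal: the model must quote the text it expects to replace.
interface EditProposal {
  oldText: string;
  newText: string;
}

type EditResult =
  | { ok: true; content: string }
  | { ok: false; reason: string };

// Apply the edit only if `oldText` occurs exactly once in the current file content.
function applyGroundedEdit(current: string, edit: EditProposal): EditResult {
  if (edit.oldText.length === 0) return { ok: false, reason: "empty oldText" };
  const first = current.indexOf(edit.oldText);
  if (first === -1) {
    return { ok: false, reason: "oldText not found in current file" };
  }
  // A second match (even an overlapping one) makes the target ambiguous.
  if (current.indexOf(edit.oldText, first + 1) !== -1) {
    return { ok: false, reason: "oldText matches more than once" };
  }
  const content =
    current.slice(0, first) + edit.newText + current.slice(first + edit.oldText.length);
  return { ok: true, content };
}
```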

Output shape: what a run produces when it finishes. A transcript, a diff, a deterministic artifact you can replay without the model in the loop. The shape decides whether the work survives the session that made it.
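Here is a sketch of the replayable end of that spectrum. It assumes a hypothetical versioned JSON artifact of concrete browser steps, plus an `Executor` interface that stands in for whatever actually drives the browser. The point is the dependency graph: `replay` takes the artifact and an executor, and nothing else.

```typescript
// Hypothetical artifact format: a versioned list of concrete steps.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; text: string };

interface RunArtifact {
  version: 1;
  steps: Step[];
}

// Anything that can execute a step: a browser driver in production, a fake in tests.
interface Executor {
  run(step: Step): void;
}

function serialize(artifact: RunArtifact): string {
  return JSON.stringify(artifact, null, 2);
}

// Replay depends only on the artifact and an executor. No model call appears here.
function replay(json: string, executor: Executor): number {
  const parsed = JSON.parse(json) as { version: number; steps: Step[] };
  if (parsed.version !== 1) {
    throw new Error(`unsupported artifact version: ${parsed.version}`);
  }
  for (const step of parsed.steps) executor.run(step);
  return parsed.steps.length;
}
```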

Why the harness is where correctness lives

A great model with a sloppy harness ships garbage at high speed. The model will happily click a button that is not there, edit a file it imagined, or claim a test passed because the text "PASS" appeared in its own narration. None of that is a model failure. It is the absence of a constraint that should have made the bad action impossible.
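The "PASS in its own narration" failure has a simple structural fix: compute the verdict from what the harness observed, never from what the model said. Below is a minimal sketch. It assumes the harness records each command it runs together with its exit code; `AgentReport`, `CommandResult`, and `verifyTests` are hypothetical names.

```typescript
// Hypothetical shapes: what the model says vs. what the harness observed.
interface AgentReport {
  narration: string; // free text produced by the model
}

interface CommandResult {
  command: string;
  exitCode: number | null; // null if the process was killed before exiting
}

interface Verdict {
  passed: boolean;
  evidence: string;
}

// The verdict comes only from observed process state. The narration is ignored.
function verifyTests(_report: AgentReport, observed: CommandResult | undefined): Verdict {
  if (observed === undefined) {
    return { passed: false, evidence: "no test command was actually executed" };
  }
  if (observed.exitCode === 0) {
    return { passed: true, evidence: `${observed.command} exited 0` };
  }
  return { passed: false, evidence: `${observed.command} exited ${observed.exitCode}` };
}
```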

The data backs this up. Veracode's 2025 GenAI Code Security Report found that AI-generated code introduced a security vulnerability in 45 percent of cases. You do not fix that by waiting for a smarter model. You fix it by building the harness that catches the bad output, grounds the action, and refuses to ship what it cannot verify. Correctness is a property of the system around the model, and that system is yours to engineer.

A worked example from our own work

Hover is a VS Code extension that drives a real Chrome over the Chrome DevTools Protocol. A developer types an instruction in plain language, and an agent operates the actual browser to carry it out. The interesting part is not the model. The interesting part is the cage we built around it.

The agent is sandboxed to a browser-control surface and nothing else. It cannot read your filesystem, run shell commands, or reach the network on its own. The one thing it can do is operate the page.
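Hover's internals are not shown here. What follows only sketches how such a boundary could be enforced at the protocol layer: a gate in front of the DevTools connection that forwards an allowlist of Chrome DevTools Protocol methods and drops everything else. The method names are real CDP methods, but the allowlist itself and the `gate` function are assumptions for illustration.

```typescript
// CDP-style request shape: { id, method, params }.
interface CdpRequest {
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Illustrative allowlist: navigation, DOM/accessibility reads, and synthetic input.
// Runtime.evaluate is deliberately absent, because arbitrary page JS could call fetch().
const ALLOWED_METHODS: ReadonlySet<string> = new Set([
  "Page.navigate",
  "Page.captureScreenshot",
  "DOM.getDocument",
  "DOM.querySelector",
  "DOM.getBoxModel",
  "Input.dispatchMouseEvent",
  "Input.insertText",
  "Accessibility.getFullAXTree",
]);

// Forward an agent-issued request to the browser only if its method is allowlisted.
function gate(
  req: CdpRequest,
  send: (req: CdpRequest) => void,
): { forwarded: boolean; error?: string } {
  if (!ALLOWED_METHODS.has(req.method)) {
    return { forwarded: false, error: `method not permitted: ${req.method}` };
  }
  send(req);
  return { forwarded: true };
}
```

`Page.navigate` still loads pages over the network, since operating the page requires it. What the gate removes is the agent's ability to issue requests of its own, for example by running `fetch()` through `Runtime.evaluate`.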

Its actions are grounded in the live DOM. When the agent decides to click "Add to cart," that intent resolves against a real element on the real page through a control surface that reads off the current snapshot. If the element is not there, the action does not happen. The selector that drives the click is the selector we keep, so what gets recorded is what actually ran.
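Here is a hedged sketch of that resolve-then-act step. It assumes the snapshot has been flattened into role/name/selector entries, which is a hypothetical shape; a real implementation might read the accessibility tree or the DOM directly. The click happens only on a unique match, and the log records the same selector string that was passed to `click`.

```typescript
// Hypothetical snapshot entry: what the page actually contains right now.
interface SnapshotElement {
  role: string;     // e.g. "button"
  name: string;     // accessible name, e.g. "Add to cart"
  selector: string; // selector that locates this element on the current page
}

interface RecordedStep {
  action: "click";
  selector: string;
}

// Resolve an intent against the current snapshot. Click and record only on a unique match.
function groundedClick(
  snapshot: readonly SnapshotElement[],
  intent: { role: string; name: string },
  click: (selector: string) => void,
  log: RecordedStep[],
): boolean {
  const matches = snapshot.filter((el) => el.role === intent.role && el.name === intent.name);
  const target = matches[0];
  // Missing or ambiguous: refuse, and record nothing.
  if (matches.length !== 1 || target === undefined) return false;
  click(target.selector);
  // The selector that ran is the selector we keep.
  log.push({ action: "click", selector: target.selector });
  return true;
}
```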

Then the run crystallizes. A successful exploratory session becomes a plain @playwright/test spec written to disk. There is no AI in CI. The artifact is standard Playwright that any engineer can read, edit, and run on a machine that has never heard of an agent. The model authored the test. The harness made sure the test is real, and that it outlives the model.
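To make "crystallizes" concrete, here is a sketch of turning a recorded step list into a `@playwright/test` file. The step format and `toPlaywrightSpec` are hypothetical and are not Hover's generator. The emitted calls (`page.goto`, `page.locator(...).click()`, `.fill()`, and `expect(...).toBeVisible()`) are standard Playwright Test APIs.

```typescript
// Hypothetical recorded step format.
type SpecStep =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; text: string }
  | { kind: "expectVisible"; selector: string };

// JSON.stringify yields a valid double-quoted JS/TS string literal.
const lit = (s: string): string => JSON.stringify(s);

function toLine(step: SpecStep): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${lit(step.url)});`;
    case "click":
      return `await page.locator(${lit(step.selector)}).click();`;
    case "fill":
      return `await page.locator(${lit(step.selector)}).fill(${lit(step.text)});`;
    case "expectVisible":
      return `await expect(page.locator(${lit(step.selector)})).toBeVisible();`;
  }
}

// Emit a plain @playwright/test spec. The output has no dependency on any model.
function toPlaywrightSpec(title: string, steps: SpecStep[]): string {
  const body = steps.map((s) => `  ${toLine(s)}`).join("\n");
  return [
    `import { test, expect } from '@playwright/test';`,
    ``,
    `test(${lit(title)}, async ({ page }) => {`,
    body,
    `});`,
    ``,
  ].join("\n");
}
```

The resulting file runs with `npx playwright test` on any machine that has Playwright installed. Passing every literal through `JSON.stringify` keeps a selector that contains quotes from breaking the generated source.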

You can see the whole loop at gethover.dev.

What this means for building AI-native tools

If you are building on top of a frontier model, the model is your raw material, not your product. The work that nobody else can copy is the harness: the tool boundary, the permission model, the grounding layer, the artifact you hand back. That is real engineering, with the same discipline you would bring to any system where wrong answers have a cost.

This is what we mean by software rebuilt AI-native. It does not mean bolting a chat box onto an old product and calling the model when the user types. It means designing the constraints first, from line one, so that the most capable agent you can reach is also the safest and the most useful one. The models will keep improving and keep converging. The harness is where you get to be better than everyone else who can make the same API call.
