Agentic Finance #05 Deep Dive: Why Finance Needs a Different AI Harness: From Throughput to Tail Risk
April 30, 2026
Tech companies are spending tens of billions of dollars pushing the frontier of model capability: larger parameter counts, higher benchmark scores, fewer hallucinations. The assumption underneath is simple: once the model is strong enough, it can take over real-world workflows. That assumption may hold in controlled environments. But in financial systems, where tolerance for error approaches zero, the question is not how smart the model is. It is how much risk the system around it can absorb.
For the past two years, the conversation around AI agents has mostly been a conversation about models. Reasoning ability, coding performance, benchmark position. Which model is smarter, which one follows instructions better, which one hallucinates less.
That framing misses the thing that matters in production.
A model is not a system. It is one component inside an execution loop. Between what a model can theoretically do and what a system can reliably produce sits an entire layer of engineering: prompts, tools, memory, runtime environments, validation, and feedback.
That layer is often the difference between a demo and a product.
Viv Trivedy, an engineer at LangChain, has a simple way to describe it:
Agent = Model + Harness
The model determines the lower bound of capability. The harness determines the upper bound of execution.
You see this in practice. The same Claude Opus model can look mediocre in a native setup and become dramatically more capable inside a better harness. In Viv’s example, changing the surrounding system moved the model from outside the top 30 to around the top 5.
HumanLayer makes the same point in its piece on harness engineering: many agent failures are not model failures. They are configuration failures.
How the task is decomposed, how state is preserved, how tools are exposed, how outputs are checked—these design choices often matter more than the model itself. What looks like intelligence at the surface is frequently coordination underneath.
What the Coding Harness Solves
The evidence shows up in where agents work today. The most successful ones are concentrated in software engineering. That is not because code is easy. It is because code is an environment where the agent can keep cycling.

From an engineering perspective, the problem coding harnesses solve is not intelligence. It is continuity. A model can reason through a step, but it does not naturally sustain a process. It forgets context, loses state, produces unstable outputs, and sometimes stops before the work is done. The model is less like a worker and more like a component that fires intermittently. Useful, but not enough to carry a long task by itself.
The harness turns that intermittent component into a loop. It gives the model somewhere to store state, a way to manage context, an environment to execute code, a mechanism to catch errors, and a control structure that keeps the task moving. Run the code. Capture the failure. Feed the result back. Adjust the plan. Try again. The model supplies reasoning inside the loop; the harness keeps the loop alive.
That is where throughput comes from. Not from a single better answer, but from linking execution, feedback, and memory into a process that can run longer than any one model call. What used to be a stop-and-start interaction between a human and a model becomes a self-propelling workflow.
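The loop the harness sustains can be made concrete with a minimal Python sketch. Everything here is illustrative rather than any particular product's implementation: `model_propose_fix` stands in for a model call, and the "task" is simply a script that must exit cleanly.

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str) -> tuple[bool, str]:
    """Execute candidate code in a fresh interpreter and capture any failure."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=30)
        return result.returncode == 0, result.stderr
    finally:
        os.unlink(path)

def harness_loop(task: str, model_propose_fix, max_iters: int = 5):
    """Run the code. Capture the failure. Feed it back. Adjust. Try again."""
    memory: list[str] = []                  # state persisted across model calls
    code = model_propose_fix(task, memory)
    for _ in range(max_iters):
        ok, error = run_candidate(code)
        if ok:
            return code                     # the loop, not one call, produced this
        memory.append(error)                # feedback becomes the next attempt's context
        code = model_propose_fix(task, memory)
    return None                             # loop exhausted; escalate to a human
```

Note that the control structure, state, and execution environment all live outside the model; the model only fills in the reasoning step inside the loop.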
This also explains why so much of the harness looks like overhead. Sandboxes, validation steps, execution environments, task planners, loop controls. These are not accessories. They are the machinery that removes the human from the middle of the process. In a continuous loop, every manual approval is a stall. Every stall reduces throughput.
The goal is not simply to make the model smarter. It is to make the system less dependent on human intervention. Once execution, feedback, and memory are connected, the agent can keep working: maintaining context, correcting errors, and pushing the task forward over a longer horizon than the model could manage alone.
From Coding Harness to Financial Harness
In software engineering, this design paradigm rests on an implicit assumption: errors are reversible.
Systems can restart, code can be patched, and state can be restored. Under these conditions, the optimal strategy is rapid iteration—execute, fail, fix, and repeat. A coding harness effectively creates a sandbox where errors are tolerated, turning the model’s intermittent reasoning into a continuous loop capable of sustaining long-horizon tasks. Throughput improves because the system can absorb and learn from failure.
This assumption does not hold in financial systems.
Once a transaction is signed and broadcast, it enters an irreversible settlement process. There is no safe retry, no rollback to a previous state. Errors are no longer part of the process—they are the outcome. Mechanisms that enhance robustness in software, especially retries, take on a different character here: each additional attempt introduces new risk exposure, along with accumulating execution costs such as gas fees.
In software, "fail fast" lets systems converge toward working solutions through trial and error. In financial environments, the priority shifts toward minimizing the probability of error in the first place.
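The boundary between the two regimes can be sketched as an execution policy. This is an assumed design, not any specific system's code: retries are confined to reversible, read-only steps, while a broadcast gets exactly one attempt, because a second attempt is new risk exposure rather than a rollback.

```python
from enum import Enum, auto

class StepKind(Enum):
    SIMULATE = auto()    # read-only dry run: failure is reversible
    BROADCAST = auto()   # signed and sent: settlement is irreversible

class IrreversibleStepError(Exception):
    """Raised instead of retrying once a step crosses the settlement boundary."""

def execute(step, kind: StepKind, max_retries: int = 3):
    """Fail-fast iteration applies only on the reversible side of the boundary."""
    attempts = max_retries if kind is StepKind.SIMULATE else 1
    last_error = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as e:
            last_error = e
    if kind is StepKind.BROADCAST:
        # escalate rather than retry: each extra attempt adds exposure and gas
        raise IrreversibleStepError(str(last_error))
    raise last_error
```

The asymmetry in `attempts` is the whole point: the same harness machinery behaves differently depending on whether the step it wraps can be undone.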
As a result, the design of a financial harness centers on eliminating unsafe paths at the system level—removing them entirely from the agent’s available action space. Rather than relying on post-hoc correction, the system reduces uncertainty at the source, simplifying the agent’s operating environment as much as possible.
Only when hard safety boundaries are enforced upfront and embedded into the infrastructure can the model stop acting as a risk manager. It no longer needs to reason about whether a transaction is permissible. Instead, within a strictly controlled environment, it can focus on what it does best: timing execution, optimizing parameters, and pursuing strategy-level gains.
Financial Harness: Designed Around Constraint
Within a constraint-driven framework, the purpose of a financial harness shifts. It is no longer primarily about enhancing execution, but about bounding it.
Take Cobo’s Agentic Wallet (CAW) as an example. Its core design compresses the agent’s error surface through a three-layer structure, ensuring that any output is already confined within safe limits before it reaches an irreversible stage.
1. Recipes: Defining the Action Space at the Source
The first step is to narrow what can be done.
In CAW, all financial operations are packaged into audited, immutable Recipes. By restricting execution to predefined paths, the system reduces the agent’s behavior from open-ended construction to a finite set of choices. In practice, the model can reason freely at the strategy level, but execution at the protocol layer must remain strictly within verified boundaries.
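A minimal sketch of the idea, with hypothetical names (the `Recipe` registry below is illustrative, not CAW's actual interface): the registry is the agent's entire action space, so invocation reduces to choosing a vetted path and filling its parameters.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)          # immutable once registered, mirroring an audited Recipe
class Recipe:
    name: str
    execute: Callable[..., str]
    allowed_params: frozenset

# The registry is the agent's whole action space: a finite set of vetted paths.
REGISTRY: dict = {}

def register(recipe: Recipe) -> None:
    REGISTRY[recipe.name] = recipe

def invoke(name: str, **params) -> str:
    """The agent picks a recipe and fills parameters; it cannot construct
    arbitrary protocol-level calls."""
    if name not in REGISTRY:
        raise PermissionError(f"'{name}' is not an audited recipe")
    recipe = REGISTRY[name]
    unexpected = set(params) - recipe.allowed_params
    if unexpected:
        raise ValueError(f"parameters outside the recipe's schema: {unexpected}")
    return recipe.execute(**params)
```

Strategy-level freedom survives (which recipe, what parameters, when), while protocol-level construction is removed from the agent entirely.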
2. Dynamic Pacts: From Persistent Access to Ephemeral Permission
The second layer operates at the level of permissions.
Traditional systems tend to grant agents continuous access to funds—for example, a wallet with a persistent balance. This design is efficient, but it carries ongoing risk.
CAW replaces this with task-specific Pacts. Each Pact is generated based on context—defined jointly by the selected Recipe and user intent—and explicitly scopes amount, range, and time window. Once the task is complete, the permission expires. Access becomes ephemeral rather than persistent, and risk is correspondingly localized.
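The shape of such a grant can be sketched in a few lines. The field names and checks are assumptions for illustration, not CAW's schema; the point is that the permission is bound to one recipe, capped in amount, and expires by construction.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Pact:
    """A task-scoped grant derived from one Recipe plus user intent."""
    recipe: str
    max_amount: float
    expires_at: float               # epoch seconds; access is ephemeral by design

    def authorize(self, recipe: str, amount: float, now: float = None) -> bool:
        now = time.time() if now is None else now
        return (
            recipe == self.recipe           # bound to a single audited path
            and amount <= self.max_amount   # explicit amount ceiling
            and now < self.expires_at       # permission simply ceases to exist
        )

def grant(recipe: str, max_amount: float, ttl_seconds: float) -> Pact:
    return Pact(recipe, max_amount, time.time() + ttl_seconds)
```

Because nothing here is a standing balance, a compromised agent after expiry holds a credential that authorizes nothing: risk is localized to the task's own window and cap.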
3. Intent Alignment: Filtering “Valid but Wrong” Transactions
A transaction can be technically valid—correctly signed and sufficiently funded—yet still violate the user’s intent.
To address this, CAW introduces a human-in-the-loop mechanism. Before funds are transferred, the system translates low-level call data into clear, human-readable language and delivers it to the user interface. Execution proceeds only after explicit user approval.
The agent proposes; the user decides. A transaction occurs only when both align.
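The approval gate itself is simple to express. This is a generic sketch under assumed names, not CAW's implementation: the only path to `broadcast` runs through a human-readable summary and an explicit yes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedTransaction:
    to: str
    amount: float
    asset: str

def render_summary(tx: ProposedTransaction) -> str:
    """Translate low-level call data into language the user can actually check."""
    return f"Send {tx.amount} {tx.asset} to {tx.to}. Approve?"

def execute_with_approval(tx: ProposedTransaction,
                          ask_user: Callable[[str], bool],
                          broadcast: Callable) -> bool:
    """The agent proposes; the user decides. Nothing is broadcast without
    explicit approval of the rendered summary."""
    if not ask_user(render_summary(tx)):
        return False        # technically valid but unapproved: filtered out
    broadcast(tx)
    return True
```

A transaction that is correctly signed and sufficiently funded but never approved simply does not happen, which is exactly the "valid but wrong" class this layer exists to catch.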
Financial Harness as a Service (FHaaS)
As model capabilities converge and harness systems mature, AI infrastructure is becoming increasingly standardized. Differentiation is shifting away from the model itself toward boundary design: what actions are permitted, under what conditions, and how those constraints are enforced.
In most industries, constraints are seen as a trade-off against efficiency. In low-tolerance environments, they are a prerequisite for operation. Without them, systems cannot safely execute at all.
This is why AI without a rigorous harness remains closer to a prototype, while AI embedded within a constraint system becomes production-ready.
Financial harnesses like Cobo CAW are beginning to externalize complex risk control and cryptographic logic, much as cloud computing standardized access to compute. Security and constraint are evolving into composable, reusable services.
As a result, institutions and developers no longer need to rebuild foundational layers from scratch. They can integrate AI directly into capital flows, while focusing resources on where differentiation actually matters: strategy, tooling, and business innovation.
In this context, constraint is no longer a limitation. It is the condition that makes execution possible.
