How the agentsaves the tokens.

Every call through a harness passes through the same set of optimizations, on the machine, before it reaches a model. Each one is measured and can be tuned or turned off.

Trim stale context

up to 25%

Strips files, tool output, and history the model no longer needs for the current step.

Drop duplicated history

up to 15%

Detects context the harness has already sent and removes the repeat instead of paying for it twice.

Serve from local cache

up to 20%

Answers identical and near-identical sub-requests from a cache on the box, with no model call.

Route to a lighter model

up to 18%

Sends one-line edits and simple lookups to a smaller model and keeps the frontier one for hard steps.

Compress prompts

up to 10%

Rewrites verbose system and tool prompts to a tighter form that holds the same meaning.

Batch and coalesce

up to 8%

Groups bursts of small calls so fixed per-request overhead is paid once, not many times.

Savings are illustrative and stack; the agent reports the real number per technique on your own traffic.