How the agentsaves the tokens.
Every call through a harness passes through the same set of optimizations, on the machine, before it reaches a model. Each one is measured and can be tuned or turned off.
Trim stale context
up to 25%Strips files, tool output, and history the model no longer needs for the current step.
Drop duplicated history
up to 15%Detects context the harness has already sent and removes the repeat instead of paying for it twice.
Serve from local cache
up to 20%Answers identical and near-identical sub-requests from a cache on the box, with no model call.
Route to a lighter model
up to 18%Sends one-line edits and simple lookups to a smaller model and keeps the frontier one for hard steps.
Compress prompts
up to 10%Rewrites verbose system and tool prompts to a tighter form that holds the same meaning.
Batch and coalesce
up to 8%Groups bursts of small calls so fixed per-request overhead is paid once, not many times.
Savings are illustrative and stack; the agent reports the real number per technique on your own traffic.