AI inferencecontrol plane

TokenShift is an open-source agent that installs on the machines your engineers code on and taps the harnesses they already run. It cuts the tokens your AI agents burn automatically, on the machine. Connect the cloud console when you want to govern and see the whole fleet.

Taps the coding agents your engineers already run

  • Claude Code
  • Codex
  • Cursor
  • GitHub Copilot
  • Windsurf
  • Claude Desktop

A coding agent can spend a milliontokens chasing one bug.

It re-reads the same files, repeats context it already has, and calls the heaviest model for a one-line edit. You feel it as a surprise bill at the end of the month, a quota wall in the middle of a task, and an inner loop that keeps getting slower.

The agent shaves every requestbefore it costs a token.

It sits on the machine, between the harness and the model. On each call it strips what the model does not need, reuses what it has already seen, and picks the lightest model that can do the job. All of it on the box, none of it your problem.

One request, optimized76% fewer tokens
Raw request12,400
Trim stale context-3,100
Drop duplicated history-1,800
Serve from local cache-2,600
Route to a lighter model-1,900
Sent to the provider3,000

Lower bills

Fewer tokens per call, with no change to how anyone works.

Quotas last longer

Spend less per task and the rate limit arrives much later.

Faster loops

Smaller prompts and cache hits come back sooner.

Less babysitting

The agent stops the loops and re-reads on its own.

Free and MIT, forever.

Install, optimize,then connect.

  1. 01

    Install the agent

    One command on a workstation, or into your production image. It starts cutting tokens immediately, on its own.

  2. 02

    It optimizes locally

    Every call through a harness is trimmed, deduplicated, cached, and routed before it ever reaches a model.

  3. 03

    Connect the console

    When the team is ready, point the agents at a console to govern budgets, set policy, and see the fleet in one place.

MIT today.MIT in ten years.

The agent is open source under the MIT license and it stays that way. No relicensing, no bait and switch. It runs on your machines and reports only where you point it. The roadmap is public and the harness adapters are written by the people who use them.

  • MIT licensed, permanently.
  • Runs entirely on your machines by default.
  • Public roadmap, open governance.
  • Harness adapters built and reviewed by the community.
We put the agent on every engineer laptop. Token spend on Claude Code and Cursor dropped by a third in the first week, and nobody changed how they work. When we wanted the fleet picture, the console was one flag away. It all runs on our hardware, so there was nothing for security to argue with.
DO
Dana Okafor
Staff Engineer
33%
Fewer tokens in week one, no workflow change
0
Lines of code changed to adopt it

Cut your agents' tokenbill this afternoon.

Install the agent on one machine in about two minutes. It starts optimizing on its own.