AI inference control plane
TokenShift is an open-source agent that installs on the machines your engineers code on and taps the harnesses they already run. It automatically cuts the tokens your AI agents burn, right on the machine. Connect the cloud console when you want to govern and see the whole fleet.
Taps the coding agents your engineers already run
- Claude Code
- Codex
- Cursor
- GitHub Copilot
- Windsurf
- Claude Desktop
A coding agent can spend a million tokens chasing one bug.
It re-reads the same files, repeats context it already has, and calls the heaviest model for a one-line edit. You feel it as a surprise bill at the end of the month, a quota wall in the middle of a task, and an inner loop that keeps getting slower.
The agent shaves every request before it costs a token.
It sits on the machine, between the harness and the model. On each call it strips what the model does not need, reuses what it has already seen, and picks the lightest model that can do the job. All of it on the box, none of it your problem.
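As a rough illustration of the idea, and not TokenShift's actual code, the sketch below drops repeated context from a request and then picks a model tier. The `Message` type, the tier names, the size thresholds, and the four-characters-per-token estimate are all assumptions made up for this example; routing by prompt size is only a crude stand-in for "the lightest model that can do the job".

```python
from dataclasses import dataclass

# Hypothetical model tiers, lightest first. Names and token limits are
# illustrative assumptions, not real TokenShift settings.
MODEL_TIERS = [("small", 2_000), ("medium", 20_000), ("large", float("inf"))]


@dataclass
class Message:
    role: str
    content: str


def dedupe(messages):
    """Drop messages whose (role, content) already appeared, keeping the first."""
    seen = set()
    kept = []
    for m in messages:
        key = (m.role, m.content)
        if key in seen:
            continue
        seen.add(key)
        kept.append(m)
    return kept


def estimate_tokens(messages):
    """Very rough estimate: about four characters per token (an assumption)."""
    return sum(len(m.content) for m in messages) // 4


def pick_model(messages):
    """Return the first (lightest) tier whose limit covers the estimated size."""
    n = estimate_tokens(messages)
    for name, limit in MODEL_TIERS:
        if n <= limit:
            return name
    return MODEL_TIERS[-1][0]


def shave(messages):
    """Strip repeated context, then route the trimmed request to a tier."""
    trimmed = dedupe(messages)
    return trimmed, pick_model(trimmed)
```

Calling `shave` on a conversation that contains the same file contents twice returns the conversation with one copy removed, plus the name of the tier the trimmed request would be sent to.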
Lower bills
Fewer tokens per call, with no change to how anyone works.
Quotas last longer
Spend less per task and the rate limit arrives much later.
Faster loops
Smaller prompts and cache hits come back sooner.
Less babysitting
The agent stops the loops and re-reads on its own.
Free and MIT, forever.
Install, optimize, then connect.
- 01
Install the agent
One command on a workstation, or into your production image. It starts cutting tokens immediately, on its own.
- 02
It optimizes locally
Every call through a harness is trimmed, deduplicated, cached, and routed before it ever reaches a model.
- 03
Connect the console
When the team is ready, point the agents at a console to govern budgets, set policy, and see the fleet in one place.
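The caching part of step 02 can be pictured with a minimal sketch like the one below. It is an assumption about how such a cache might work, not the agent's implementation: the `ResponseCache` class, its method names, and the `call_model` callback are hypothetical. Identical requests are keyed by a SHA-256 hash of their JSON form, so a repeat is answered locally without reaching a model.

```python
import hashlib
import json


class ResponseCache:
    """Hypothetical in-memory cache keyed by a hash of the full request."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model, messages):
        # sort_keys gives a stable serialization, so equal requests hash equally.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call_model):
        """Return a cached response, or call the model once and remember it."""
        k = self.key(model, messages)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        response = call_model(model, messages)
        self._store[k] = response
        return response
```

Here `call_model` stands for whatever function actually sends the request upstream; the cache only invokes it on a miss, which is where the token savings would come from.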
MIT today. MIT in ten years.
The agent is open source under the MIT license and it stays that way. No relicensing, no bait and switch. It runs on your machines and reports only where you point it. The roadmap is public and the harness adapters are written by the people who use them.
- MIT licensed, permanently.
- Runs entirely on your machines by default.
- Public roadmap, open governance.
- Harness adapters built and reviewed by the community.
We put the agent on every engineer laptop. Token spend on Claude Code and Cursor dropped by a third in the first week, and nobody changed how they work. When we wanted the fleet picture, the console was one flag away. It all runs on our hardware, so there was nothing for security to argue with.
Cut your agents' token bill this afternoon.
Install the agent on one machine in about two minutes. It starts optimizing on its own.