14 · Token telemetry¶

What you'll do¶

See where your AI spending actually goes — which jobs cost the most, whether caching is saving you money, and which model is eating tokens — on the Token Telemetry screen. This is the detail behind the spending limit you set on page 3.

Open token telemetry¶

Go to System → Token Telemetry. It's built to answer three questions:

Which jobs are costing me the most?
Is caching working, so I'm not paying full price every time?
Which model is using the most tokens?

The Token Telemetry screen, with usage charts.

What the numbers mean¶

AI models charge per token — roughly, per chunk of text in and out. TofuFactory splits the count so you can see what you're paying for:

Input — the prompts, instructions, and code sent to the model.
Output — the code and text the model writes back. Usually the priciest per token.
Cache read — repeated context (your files, the system prompts) the model remembered from last time. Billed at a steep discount — often up to 90% off — so more of this is good.
Cache write — that same context the first time it's sent, before it can be reused cheaply.

Counted once. A job's tokens and dollar cost are recorded a single time, when it finishes — never per step — so the totals you see can't be inflated by double-counting.

Find the expensive jobs¶

The table at the bottom is where you dig in:

Pick a time range — last hour, day, week, month, or all of it.
Sort to surface what you're after: by total tokens (the heaviest runs), by cost (the priciest), or by cache reads (to confirm caching is kicking in).
Filter by project or role to find which one drove a cost spike — was it Planning, or routine Utility work?

Catch a runaway before the daily cap¶

A single job stuck in a loop can burn a lot before it ever reaches your daily limit. The spike threshold is the tripwire: a run bigger than 100,000 tokens raises a warning. You can change that number — or have it pause jobs that cross it — in Settings → Budget (page 3).

You should now see¶

Total tokens and cost over the time range you picked.
A table of which jobs used the most output or cache-read tokens.
A clear read on where your money is going.

If something's not right¶

Problem	What to do
The numbers don't match my provider's bill	Some models report estimated costs. TofuFactory uses standard pricing to convert tokens to dollars, so small gaps are normal when a provider changes its rates.
Cache-read is always zero	Only some models cache (Claude Sonnet/Opus, Gemini). If your role points at one that doesn't, you pay full price for input every turn — switch models if that cost matters.
A running job isn't in the table	Telemetry is written when a job finishes. For one in progress, open its task detail for a live estimate.

Next¶

→ 15 · The Kitty Pool — warm standby agents that start work the instant you hand it over.