- subtractTokens JSDoc said the raw payload lives on Usage.native, but
that field was renamed to providerMetadata earlier in this PR.
- totalTokens JSDoc still described the abandoned "additive" first-pass
contract where inputTokens/outputTokens were non-cached / visible only.
We landed on inclusive totals; the fallback already covers cache and
reasoning.
- Removed a duplicate inline comment in Gemini's mapUsage — the
function-level comment already explains the visible/reasoning sum and
the undefined-when-incomplete rule.
Verifies the new Usage mapper code against live provider responses for
OpenAI Chat, OpenAI Responses, Anthropic, Gemini, DeepSeek, and
TogetherAI — 16 fresh recordings, all assertions pass. No existing
cassettes were modified; these populate test slots that were previously
skipped in replay mode.
Recorded via:
set -a; source .env.recorded.local; set +a
RECORD=true bun test test/provider/*.recorded.test.ts
Redactor stripped all auth headers; no secrets in the cassettes.
Aligns the escape-hatch field name with `LLMEvent.providerMetadata` used
elsewhere in this package (and with AI SDK / pydantic-ai / LangChain
conventions for the same idea). Two parallel escape hatches having
different names was a wart.
The raw payload is now wrapped under the provider key — `{ openai: ... }`,
`{ anthropic: ... }`, `{ google: ... }`, `{ bedrock: ... }` — using the
existing `ProviderMetadata = Record<string, Record<string, unknown>>`
schema rather than a flat record. Same shape as
`LLMEvent.providerMetadata`, so consumers downstream can read both with
the same code.
Anthropic's `mergeUsage` merges the per-provider sub-record across
`message_start` and `message_delta` instead of spreading at the top level.
Final shape after considering ecosystem conventions:
inputTokens — inclusive total (matches AI SDK / OpenAI / LangChain)
outputTokens — inclusive total (includes reasoning)
nonCachedInputTokens — breakdown: fresh prompt
cacheReadInputTokens — breakdown: cache hit
cacheWriteInputTokens — breakdown: cache write
reasoningTokens — subset of outputTokens
Invariant:
nonCached + cacheRead + cacheWrite = inputTokens
reasoningTokens <= outputTokens
Why this shape:
- `inputTokens` keeps its AI-SDK / OpenAI semantics, so a reader from any
major ecosystem sees the number they expect.
- The non-overlapping breakdown fields are populated alongside the
inclusive totals — consumers read whichever they need without
subtracting. This eliminates the underflow bug class (opencode#26620)
structurally without diverging on naming.
- Aligns with the AI SDK v3 spec proposal (vercel/ai#9921), which adds
exactly this kind of non-overlapping breakdown to address the active
ecosystem bugs around cache token double-counting and underflow
(pydantic-ai#4364, langfuse#12306/#11979, vercel/ai#8349,
langchain#32818, langchainjs#10249).
Mappers:
- OpenAI Chat / Responses / Bedrock: provider reports inclusive totals
natively; mapper derives `nonCachedInputTokens` via
`ProviderShared.subtractTokens`.
- Gemini: `promptTokenCount` is inclusive; `candidatesTokenCount` is
*exclusive* of `thoughtsTokenCount`, so mapper sums those to produce
the inclusive `outputTokens`. Only computes the total when the visible
component is reported (avoids fabricating an inclusive number from a
partial breakdown).
- Anthropic: `input_tokens` is *non-cached* natively; mapper sums it with
cache reads/writes to produce the inclusive `inputTokens`.
`output_tokens` is inclusive (Anthropic doesn't break thinking out, so
`reasoningTokens` stays undefined).
Added a `visibleOutputTokens` getter (clamped `outputTokens - reasoningTokens`)
as the one safe escape hatch for consumers wanting the non-reasoning view.
Added `ProviderShared.sumTokens` to derive an inclusive total from a
non-overlapping breakdown, returning `undefined` when every input is
undefined (so we don't fabricate a 0).
Match the `LLMResponse.text` / `reasoning` / `toolCalls` getter pattern
in the same file — `usage.totalInputTokens` reads naturally and lives
where the Usage data does. Both sums are monotonic under the additive
contract, so callers no longer need to remember which fields are
non-overlapping.
Test fixtures that previously asserted with `usage: { ... }` plain
literals are now wrapped with `new Usage({...})` to match the runtime
shape the mappers actually produce (an instance, not a struct).
The additive contract delivers value at the mapper boundary — every
field is non-overlapping and non-negative, so any caller summing
arbitrary subsets is correct by construction. Two-line helpers that
just sum three or two known fields add API surface without paying for
themselves, and there are no in-tree consumers today. If v2 wants them
at integration time, the right place is a getter on the `Schema.Class`
(matching the `LLMResponse.text` / `reasoning` / `toolCalls` pattern in
the same file), not a static namespace helper.
Review pass:
- Drop `Pick<>` type aliases on `Usage.totalInput` / `Usage.totalOutput`
— the helpers can take `Usage` directly since every field is optional.
- Collapse Bedrock's nested `subtractTokens(subtractTokens(...))` into a
single subtraction against the summed cache subtotals.
- Drop arithmetic-walkthrough comments in test fixtures (the raw
fixture values are right next to the expected outputs).
- Generalize the comment on `mapUsage` in `openai-chat.ts` so the
rationale outlives the PR reference.
Defines a single invariant for `LLM.Usage`: every field is non-negative
and every meaningful aggregate is a *sum*, never a difference. Total
billable input = inputTokens + cacheReadInputTokens + cacheWriteInputTokens.
Total billable output = outputTokens + reasoningTokens. Adding two
non-negatives cannot underflow, so consumers can no longer reproduce the
underflow-then-clamp bug class fixed by #26620.
Each protocol mapper now enforces the contract at the provider boundary
via `ProviderShared.subtractTokens`, which clamps with `Math.max(0, …)`
for defense against provider bugs:
- OpenAI Chat / Responses: pull `cached_tokens` out of `prompt_tokens` /
`input_tokens`; pull `reasoning_tokens` out of `completion_tokens` /
`output_tokens`. The provider's `total_tokens` is preserved verbatim.
- Gemini: pull `cachedContentTokenCount` out of `promptTokenCount`.
Gemini already split visible candidates from thoughts.
- Bedrock: pull `cacheReadInputTokens` and `cacheWriteInputTokens` out of
`inputTokens`, matching AWS prompt-caching docs.
- Anthropic: already non-overlapping per the Messages API; pass through.
Adds `Usage.totalInput` / `Usage.totalOutput` helpers for callers that
want the merged view, and a regression test covering the clamp behavior.
The reasoning underflow fixed in #26620 was the most visible symptom of
a broader semantic inconsistency in this package: providers also disagreed
on whether `inputTokens` includes cache reads (Anthropic excluded;
OpenAI/Gemini/Bedrock included), which would silently double-subtract
the moment v2 wired LLM.Usage into Session.getUsage. Normalizing now,
pre-integration, closes both holes in one move.