Reverts the streaming thinking suppression introduced in b15453c.
rewriteStreamEvent should only inject signatures and rewrite model
names — suppressing thinking blocks in streaming mode breaks SSE
index alignment and causes the Amp TUI to render empty responses
on the second message onward (especially with model-mapped
non-Claude providers like GPT-5.4).
Non-streaming responses still suppress thinking when tool_use is
present via rewriteModelInResponse.
When a Claude assistant message contains [text, tool_use, text], the
Antigravity API internally splits the model message at functionCall
boundaries, creating an extra assistant turn between tool_use and the
following tool_result. Claude then rejects with:
tool_use ids were found without tool_result blocks immediately after
Fix: extend the existing 2-way part reordering (thinking-first) to a
3-way partition: thinking → regular → functionCall. This ensures
functionCall parts are always last, so Antigravity's split cannot
insert an extra assistant turn before the user's tool_result.
Fixes#989
- Call suppressAmpThinking in rewriteStreamEvent for streaming path
- Handle nil return from suppressAmpThinking to skip suppressed events
- Narrow looksLikeSSEChunk to line-prefix detection (HasPrefix vs Contains)
- Initialize suppressedContentBlock map in test
Emit signature only when non-empty in both streaming content_block_start
and non-streaming thinking blocks. Avoids turning 'missing signature'
into 'empty/invalid signature' which Claude clients may reject.
Drop the last affinity-related executor artifacts so the PR stays focused on the minimal Codex continuity fix set: stable prompt cache identity, stable session_id, and the executor-only behavior that was validated to restore cache reads.
Drop the chat-completions translator edits from this PR so the branch complies with the repository policy that forbids pull-request changes under internal/translator. The remaining PR stays focused on the executor-level Codex continuity fix that was validated to restore cache reuse.
Restore Claude continuity after the continuity refactor, keep auth-affinity keys out of upstream Codex session identifiers, and only persist affinity after successful execution so retries can still rotate to healthy credentials when the first auth fails.
Align websocket continuity resolution with the HTTP Codex path, make auth-affinity principal keys use a stable string representation, and extract small helpers that remove duplicated continuity and affinity logic without changing the validated cache-hit behavior.
Prompt caching on Codex was not reliably reusable through the proxy because repeated chat-completions requests could reach the upstream without the same continuity envelope. In practice this showed up most clearly with OpenCode, where cache reads worked in the reference client but not through CLIProxyAPI, although the root cause is broader than OpenCode itself.
The proxy was breaking continuity in several ways: executor-layer Codex request preparation stripped prompt_cache_retention, chat-completions translation did not preserve that field, continuity headers used a different shape than the working client behavior, and OpenAI-style Codex requests could be sent without a stable prompt_cache_key. When that happened, session_id fell back to a fresh random value per request, so upstream Codex treated repeated requests as unrelated turns instead of as part of the same cacheable context.
This change fixes that by preserving caller-provided prompt_cache_retention on Codex execution paths, preserving prompt_cache_retention when translating OpenAI chat-completions requests to Codex, aligning Codex continuity headers to session_id, and introducing an explicit Codex continuity policy that derives a stable continuity key from the best available signal. The resolution order prefers an explicit prompt_cache_key, then execution session metadata, then an explicit idempotency key, then stable request-affinity metadata, then a stable client-principal hash, and finally a stable auth-ID hash when no better continuity signal exists.
The same continuity key is applied to both prompt_cache_key in the request body and session_id in the request headers so repeated requests reuse the same upstream cache/session identity. The auth manager also keeps auth selection sticky for repeated request sequences, preventing otherwise-equivalent Codex requests from drifting across different upstream auth contexts and accidentally breaking cache reuse.
To keep the implementation maintainable, the continuity resolution and diagnostics are centralized in a dedicated Codex continuity helper instead of being scattered across executor flow code. Regression coverage now verifies retention preservation, continuity-key precedence, stable auth-ID fallback, websocket parity, translator preservation, and auth-affinity behavior. Manual validation confirmed prompt cache reads now occur through CLIProxyAPI when using Codex via OpenCode, and the fix should also benefit other clients that rely on stable repeated Codex request continuity.
Previously Cursor required a manual ?label=xxx parameter to distinguish
accounts (unlike Codex which auto-generates filenames from JWT claims).
Cursor JWTs contain a "sub" claim (e.g. "auth0|user_XXXX") that uniquely
identifies each account. Now we:
- Add ParseJWTSub() + SubToShortHash() to extract and hash the sub claim
- Refactor GetTokenExpiry() to share the new decodeJWTPayload() helper
- Update CredentialFileName(label, subHash) to auto-generate filenames
from the sub hash when no explicit label is provided
(e.g. "cursor.8f202e67.json" instead of always "cursor.json")
- Add DisplayLabel() for human-readable account identification
- Store "sub" in metadata for observability
- Update both management API handler and SDK authenticator
Same account always produces the same filename (deterministic), different
accounts get different files. Explicit ?label= still takes priority.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor executor errors were plain fmt.Errorf — the conductor couldn't
extract HTTP status codes, so exhausted accounts never entered cooldown.
Changes:
- Add ConnectError struct to proto/connect.go: ParseConnectEndStream now
returns *ConnectError with Code/Message fields for precise matching
- Add cursorStatusErr implementing StatusError + RetryAfter interfaces
- Add classifyCursorError() with two-layer classification:
Layer 1: exact match on ConnectError.Code (gRPC standard codes)
resource_exhausted → 429, unauthenticated → 401,
permission_denied → 403, unavailable → 503, internal → 500
Layer 2: fuzzy string match for H2 errors (RST_STREAM → 502)
- Log all ConnectError code/message pairs for observing real server
error codes (we have no samples yet)
- Wrap Execute and ExecuteStream error returns with classifyCursorError
Now the conductor properly marks Cursor auths as cooldown on quota errors,
enabling exponential backoff and round-robin failover.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a Cursor account's quota is exhausted, sessions bound to it can now
seamlessly continue on a different account:
Layer 1 — Checkpoint decoupling:
Key checkpoints by conversationId (not authID:conversationId). Store
authID inside savedCheckpoint. On lookup, if auth changed, discard the
stale checkpoint and flatten conversation history into userText.
Layer 2 — Cross-account session cleanup:
When a request arrives for a conversation whose session belongs to a
different (now-exhausted) auth, close the old H2 stream and remove
the stale session to free resources.
Layer 3 — H2Stream.Err() exposure:
New Err() method on H2Stream so callers can inspect RST_STREAM,
GOAWAY, or other stream-level errors after closure.
Layer 4 — processH2SessionFrames error propagation:
Returns error instead of bare return. Connect EndStream errors (quota,
rate limit) are now propagated instead of being logged and swallowed.
Layer 5 — Pre-response transparent retry:
If the stream fails before any data is sent to the client, return an
error to the conductor so it retries with a different auth — fully
transparent to the client.
Layer 6 — Post-response error logging:
If the stream fails after data was already sent, log a warning. The
conductor's existing cooldown mechanism ensures the next request routes
to a healthy account.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address Gemini Code Assist review feedback: use logrus log package
instead of fmt.Printf/Println in Cursor auth handlers and CLI for
unified log formatting and level control.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add cursor/filename.go for multi-account credential file naming
- Include auth.ID in session and checkpoint keys for per-account isolation
- Record authID in cursorSession, validate on resume to prevent cross-account access
- Management API /cursor-auth-url supports ?label= for creating named accounts
- Leverages existing conductor round-robin + failover framework
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add RequestCursorToken handler with PKCE + polling flow
- Register /v0/management/cursor-auth-url route
- Returns login URL + state for browser auth, polls in background
- Saves cursor.json with access/refresh tokens on success
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Capture conversation_checkpoint_update from Cursor server (was ignored)
- Store checkpoint per conversationId, replay as conversation_state on next request
- Use protowire to embed raw checkpoint bytes directly (no deserialization)
- Extract session_id from Claude Code metadata for stable conversationId across resume
- Flatten conversation history into userText as fallback when no checkpoint available
- Use conversationId as session key for reliable tool call resume
- Add checkpoint TTL cleanup (30min)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Codex/OpenAI Responses API does not support the stream_options
parameter. When clients (e.g. Amp CLI) include stream_options in their
requests, CLIProxyAPI forwards it as-is, causing a 400 error:
{"detail":"Unsupported parameter: stream_options"}
Strip stream_options alongside the other unsupported parameters
(previous_response_id, prompt_cache_retention, safety_identifier)
in Execute, ExecuteStream, and CountTokens.