Add gemini-3 pattern matching to shouldCacheThinkingSignatures() to
enable proper thought signature handling for multi-turn conversations
with function calling on Gemini 3 models, alongside existing Claude support.
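
A minimal sketch of the model check described above, assuming the function receives a model ID string and matches it against a list of patterns. The pattern list is illustrative; the plugin's actual matching rules are not shown in this log.

```typescript
// Hypothetical patterns: the real matching rules in the plugin may differ.
const SIGNATURE_CACHE_PATTERNS: RegExp[] = [
  /claude/i, // existing Claude support
  /gemini-3/i, // added: Gemini 3 models
];

// Returns true when thought signatures should be cached for this model.
function shouldCacheThinkingSignatures(model: string): boolean {
  return SIGNATURE_CACHE_PATTERNS.some((pattern) => pattern.test(model));
}
```
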
- Add guards when accountCount <= 1 to prevent infinite while(true) loop
- Force type: 'object' and inject placeholder property for empty Claude schemas
- Fix THINKING_RECOVERY_NEEDED to actually retry with closeToolLoopForThinking()
instead of returning a useless 'send continue' error message
The thinking recovery now:
1. Detects API error indicating thinking_block_order issue
2. Sets forceThinkingRecovery flag and restarts endpoint loop
3. On retry, closeToolLoopForThinking() strips all thinking and injects
synthetic messages to start a fresh turn
4. Only retries once to avoid infinite loops
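
The four steps above can be sketched as a retry loop with a one-shot guard. This is a simplified, synchronous illustration: the `Message` and `SendFn` types, the `sendWithThinkingRecovery` name, and the matched error substring are assumptions, and the real fetch wrapper is asynchronous.

```typescript
// Hypothetical types; only the control flow mirrors the steps above.
type Message = { role: "user" | "model"; text: string };
type SendResult = { ok: true; body: string } | { ok: false; error: string };
type SendFn = (messages: Message[]) => SendResult;

function sendWithThinkingRecovery(
  messages: Message[],
  send: SendFn,
  closeToolLoop: (messages: Message[]) => Message[],
): string {
  let forceThinkingRecovery = false;
  let recoveryAttempted = false;

  while (true) {
    // On the retry, send the rewritten conversation instead of the original.
    const payload = forceThinkingRecovery ? closeToolLoop(messages) : messages;
    const result = send(payload);
    if (result.ok) return result.body;

    // Assumed error text for a thinking_block_order failure.
    const isThinkingOrderError = result.error.includes(
      "Expected thinking or redacted_thinking",
    );
    if (isThinkingOrderError && !recoveryAttempted) {
      forceThinkingRecovery = true;
      recoveryAttempted = true; // retry only once to avoid an infinite loop
      continue;
    }
    throw new Error(result.error);
  }
}
```
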
- Export detectErrorType() and isRecoverableError() from recovery.ts
- Import detectErrorType in request.ts
- Detect thinking_block_order errors in transformAntigravityResponse
- Throw THINKING_RECOVERY_NEEDED to trigger recovery in fetch wrapper
- Return synthetic error response with recovery instructions
The session.error hook wasn't being triggered for API 400 errors.
This fix catches recoverable errors inline and returns a user-friendly
message instructing the user to send 'continue' to resume.
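
A hedged sketch of what message-based error classification could look like. The three error type names come from this log; the matched substrings are illustrative assumptions, apart from the "Expected thinking or redacted_thinking" text quoted elsewhere in this log. The signatures in recovery.ts may differ.

```typescript
type RecoveryErrorType =
  | "tool_result_missing"
  | "thinking_block_order"
  | "thinking_disabled_violation";

// Classify an API error message; returns null when it is not recoverable.
// The substrings below are assumptions, not the plugin's actual rules.
function detectErrorType(message: string): RecoveryErrorType | null {
  const text = message.toLowerCase();
  if (text.includes("tool_use") && text.includes("tool_result")) {
    return "tool_result_missing";
  }
  if (text.includes("expected thinking or redacted_thinking")) {
    return "thinking_block_order";
  }
  if (text.includes("thinking is disabled")) {
    return "thinking_disabled_violation";
  }
  return null;
}

function isRecoverableError(message: string): boolean {
  return detectErrorType(message) !== null;
}
```
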
- Added recovery functionality for tool_result_missing, thinking_block_order, and thinking_disabled_violation errors.
- Introduced constants and types for session recovery.
- Created storage utilities for reading and writing session data.
- Enhanced debug logging capabilities in debug.ts.
- Refactored debug state management for better initialization and access.
When Claude's conversation history gets corrupted (thinking blocks stripped
by context compaction, signature cache miss, etc.), the session would break
permanently with: 'Expected thinking or redacted_thinking, but found text'
This adds a 'last resort' recovery mechanism:
1. After all existing thinking/tool processing, analyze conversation state
2. If the conversation is in a tool loop WITHOUT thinking at turn start,
   treat it as a corrupted state
3. Instead of trying to fix it, close the turn and start fresh:
   - Strip all (potentially corrupted) thinking blocks
   - Inject synthetic MODEL message: '[Tool execution completed.]'
   - Inject synthetic USER message: '[Continue]'
4. Clear signature cache for this session
Philosophy: 'Let it crash and start again' - Don't fight corruption,
just abandon the corrupted turn and let Claude generate fresh thinking.
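
A minimal sketch of the "close the turn and start fresh" step, assuming a simplified Gemini-style message shape. The `Part` and `Content` types are hypothetical, and the real function in thinking-recovery.ts may take different arguments.

```typescript
// Hypothetical, simplified message shape; the plugin's request types may differ.
type Part = {
  thought?: boolean;
  text?: string;
  functionCall?: unknown;
  functionResponse?: unknown;
};
type Content = { role: "user" | "model"; parts: Part[] };

function closeToolLoopForThinking(contents: Content[]): Content[] {
  // Strip every thinking part, since any of them may be corrupted,
  // and drop messages that end up empty.
  const stripped = contents
    .map((c) => ({ ...c, parts: c.parts.filter((p) => p.thought !== true) }))
    .filter((c) => c.parts.length > 0);

  // Close the current turn and open a new one so the model thinks afresh.
  return [
    ...stripped,
    { role: "model", parts: [{ text: "[Tool execution completed.]" }] },
    { role: "user", parts: [{ text: "[Continue]" }] },
  ];
}
```
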
New file: src/plugin/thinking-recovery.ts
- analyzeConversationState(): Detect tool loops and thinking state
- closeToolLoopForThinking(): Inject synthetic messages to start fresh
- needsThinkingRecovery(): Check if recovery is needed
Integration in request.ts:
- Added as LAST RESORT after all existing processing (line ~1272)
- Only triggers when: isClaudeThinkingModel && inToolLoop && !turnHasThinking
- Logs warning when recovery is triggered for debugging
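
The trigger condition above could be sketched as follows. The message shape and the way a turn boundary is found (the last user message that is not a tool result) are assumptions made for illustration; only the three-part condition itself comes from this log.

```typescript
// Hypothetical, simplified message shape.
type Part = {
  thought?: boolean;
  text?: string;
  functionCall?: unknown;
  functionResponse?: unknown;
};
type Content = { role: "user" | "model"; parts: Part[] };
type ConversationState = { inToolLoop: boolean; turnHasThinking: boolean };

function isToolResult(c: Content): boolean {
  return c.role === "user" && c.parts.some((p) => p.functionResponse !== undefined);
}

function analyzeConversationState(contents: Content[]): ConversationState {
  // Assumed rule: the turn starts after the last non-tool-result user message.
  let turnStart = -1;
  for (let i = 0; i < contents.length; i++) {
    const c = contents[i];
    if (c !== undefined && c.role === "user" && !isToolResult(c)) turnStart = i;
  }

  const turn = contents.slice(turnStart + 1);
  const firstModel = turn.find((c) => c.role === "model");
  const last = contents[contents.length - 1];
  return {
    // In a tool loop when the latest message is a tool result.
    inToolLoop: last !== undefined && isToolResult(last),
    // "Thinking at turn start": the first model message of the turn has a thought part.
    turnHasThinking: firstModel !== undefined && firstModel.parts.some((p) => p.thought === true),
  };
}

function needsThinkingRecovery(contents: Content[], isClaudeThinkingModel: boolean): boolean {
  const state = analyzeConversationState(contents);
  return isClaudeThinkingModel && state.inToolLoop && !state.turnHasThinking;
}
```
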
Port CleanJSONSchemaForAntigravity from CLIProxyAPI to convert
unsupported JSON Schema features into description hints and remove
unsupported keywords. Add warmup tracking improvements: retry limit,
success marking, and cleanup when evicting old entries.
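
To illustrate the idea of turning unsupported keywords into description hints, here is a small sketch. It is not the ported CleanJSONSchemaForAntigravity: the `cleanSchema` name and both keyword lists are hypothetical, since the set of unsupported features is not listed in this log.

```typescript
type Schema = { [key: string]: unknown };

// Illustrative subsets only; the real keyword lists are assumptions here.
const HINTED_KEYWORDS = ["minLength", "maxLength", "pattern", "format", "default"];
const DROPPED_KEYWORDS = ["$schema", "$id", "additionalProperties"];

function isSchema(value: unknown): value is Schema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function cleanSchema(schema: Schema): Schema {
  const out: Schema = {};
  const hints: string[] = [];
  for (const [key, value] of Object.entries(schema)) {
    if (DROPPED_KEYWORDS.includes(key)) continue; // removed outright
    if (HINTED_KEYWORDS.includes(key)) {
      hints.push(`${key}: ${JSON.stringify(value)}`); // kept as a hint
    } else if (key === "properties" && isSchema(value)) {
      const props: Schema = {};
      for (const [name, child] of Object.entries(value)) {
        props[name] = isSchema(child) ? cleanSchema(child) : child;
      }
      out[key] = props;
    } else if (key === "items" && isSchema(value)) {
      out[key] = cleanSchema(value);
    } else {
      out[key] = value;
    }
  }
  if (hints.length > 0) {
    // Append the hints to any existing description.
    const existing = out["description"];
    const base = typeof existing === "string" ? `${existing} ` : "";
    out["description"] = `${base}(${hints.join(", ")})`;
  }
  return out;
}
```
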
Enhances Gemini quota management by introducing dual quota pools (Antigravity and Gemini CLI) for increased quota availability.
Gemini models now automatically fall back to the second quota pool when the first is exhausted, effectively doubling the quota per account.
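
A minimal sketch of the fallback between the two pools, assuming each account tracks a per-pool "exhausted until" timestamp. The pool identifiers, the `PoolState` type, and the fixed Antigravity-first order are illustrative assumptions.

```typescript
type QuotaPool = "antigravity" | "gemini-cli";

// Hypothetical bookkeeping: timestamp (ms) until which each pool is exhausted.
type PoolState = Record<QuotaPool, number>;

// Try the first pool, then fall back to the second when the first is exhausted.
function pickQuotaPool(state: PoolState, now: number): QuotaPool | null {
  const order: QuotaPool[] = ["antigravity", "gemini-cli"];
  for (const pool of order) {
    if (state[pool] <= now) return pool;
  }
  return null; // both pools are exhausted for this account
}
```
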
Refactors account management to support sticky sessions, per-model-family rate limits, and enhanced debugging capabilities.
- Implements sticky account selection, preserving Anthropic's prompt cache by sticking to the same account until a rate limit is encountered.
- Tracks rate limits separately for Claude and Gemini models, allowing an account to be used for one model family even if rate-limited for another.
- Introduces a smart retry threshold for short rate limits, retrying on the same account to avoid unnecessary switching.
- Adds exponential backoff for consecutive rate limits, increasing delays with each subsequent 429 error.
- Includes quota reset times in rate limit toast notifications when available from the API.
- Debounces toast notifications to prevent spam during streaming responses.
- Introduces a quiet mode to suppress account-related toast notifications via an environment variable.
- Enhances debug logging with level-based verbosity, TUI integration, and auto-stripping of injected debug blocks.
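
The sticky selection, per-family rate limits, and exponential backoff listed above could be sketched like this. All names (`StickyAccountSelector`, `backoffDelayMs`) and the base and cap values are hypothetical; the plugin's account manager is more involved.

```typescript
type ModelFamily = "claude" | "gemini";

interface Account {
  id: string;
  // Timestamp (ms) until which the account is rate-limited, per model family.
  rateLimitedUntil: Record<ModelFamily, number>;
}

class StickyAccountSelector {
  private readonly accounts: Account[];
  private currentIndex = 0;

  constructor(accounts: Account[]) {
    this.accounts = accounts;
  }

  // Keep returning the same account until it is rate-limited for this family.
  pick(family: ModelFamily, now: number): Account | null {
    for (let offset = 0; offset < this.accounts.length; offset++) {
      const index = (this.currentIndex + offset) % this.accounts.length;
      const account = this.accounts[index];
      if (account !== undefined && account.rateLimitedUntil[family] <= now) {
        this.currentIndex = index;
        return account;
      }
    }
    return null;
  }

  markRateLimited(account: Account, family: ModelFamily, until: number): void {
    account.rateLimitedUntil[family] = until;
  }
}

// Exponential backoff for consecutive 429s; base and cap are assumed values.
function backoffDelayMs(consecutive429s: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** Math.max(0, consecutive429s - 1));
}
```

Because limits are tracked per family, an account that is rate-limited for Claude can still be picked for Gemini requests.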
Implement comprehensive support for Claude thinking models with interleaved
thinking in multi-turn conversations:
- Add signature caching system to preserve and restore thinking block
signatures across conversation turns, preventing "invalid signature" errors
- Enable real-time SSE streaming with immediate forwarding of thinking tokens
- Add interleaved-thinking-2025-05-14 beta header for Claude thinking models
- Implement smart system hints to encourage thinking during tool use
- Add VALIDATED mode for tool calling on Claude models
- Ensure output token limits accommodate thinking budgets
- Filter and sanitize thinking blocks, removing SDK-injected cache_control
- Add comprehensive test suites for auth, cache, and request-helpers modules
- Update build config to exclude test files from production builds
- Document streaming and thinking features in README
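
As a rough illustration of the signature caching idea, the sketch below keys signatures by session ID and thinking text so they can be reattached on later turns. The `SignatureCache` class and its keying scheme are assumptions; the plugin's cache module may be organized differently.

```typescript
// Hypothetical cache: maps (session ID, thinking text) to the signature the
// API returned, so the signature can be restored on a later turn.
class SignatureCache {
  private readonly entries = new Map<string, string>();

  private key(sessionId: string, thinkingText: string): string {
    return `${sessionId}\u0000${thinkingText}`;
  }

  store(sessionId: string, thinkingText: string, signature: string): void {
    this.entries.set(this.key(sessionId, thinkingText), signature);
  }

  // Returns undefined on a cache miss.
  restore(sessionId: string, thinkingText: string): string | undefined {
    return this.entries.get(this.key(sessionId, thinkingText));
  }

  // Drop every entry that belongs to one session.
  clearSession(sessionId: string): void {
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(`${sessionId}\u0000`)) this.entries.delete(key);
    }
  }
}
```
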
- Add CLI prompt to choose between adding accounts or starting fresh
- Implement automatic retry with backoff for single-account rate limits
- Show toast notifications for account switching and rate limit status
- Clear stale account storage when OpenCode auth state changes
- Add sleep helper function with abort signal support
- Improve README with clearer step-by-step setup instructions
TUI flow now adds accounts non-destructively; CLI flow offers choice.
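
A sleep helper with abort signal support, as mentioned in the list above, might look like the following sketch. It relies on the global `AbortSignal` and `setTimeout` available in Node.js and browsers; the rejection error is an assumption, and the plugin's helper may reject differently.

```typescript
// Resolve after `ms` milliseconds, or reject early if `signal` is aborted.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("Aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending timer so it never fires
      reject(new Error("Aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

A caller waiting out a rate limit can pass the request's signal so that cancelling the request also cancels the wait.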
Adds multi-account support and round-robin load balancing for Google Antigravity OAuth to increase request throughput and resilience. Introduces an on-disk account pool with cooldowns for rate-limited accounts, automatic removal of revoked refresh tokens, and persistence of rotation state.
Improves OAuth flows and UX: CLI flow can add multiple accounts with per-account project IDs, TUI flow remains single-account, improved browser opening/fallback copy-paste handling, and clearer prompts for pasting redirect URLs or codes. Adds robust parsing of callback input and better headless handling.
Makes token refresh handling explicit and typed (throws a specific error on invalid_grant) and centralizes account management logic into an in-memory manager with persistence utilities. Adds tests for account rotation and rate-limit behavior and bumps package version.
Overall, this increases reliability under rate limits, makes multi-account configuration straightforward, and improves error handling and developer/user experience.
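
A minimal in-memory sketch of round-robin rotation with cooldowns and removal of revoked accounts. The `AccountPool` class and its field names are hypothetical, and persistence of the pool and rotation state to disk is omitted.

```typescript
interface PooledAccount {
  refreshToken: string;
  cooldownUntil: number; // ms timestamp; 0 when not rate-limited
}

class AccountPool {
  private accounts: PooledAccount[];
  private cursor = 0; // rotation state; persisted to disk in the real plugin

  constructor(accounts: PooledAccount[]) {
    this.accounts = accounts.slice();
  }

  // Return the next account that is not cooling down and advance the cursor.
  next(now: number): PooledAccount | null {
    for (let i = 0; i < this.accounts.length; i++) {
      const account = this.accounts[(this.cursor + i) % this.accounts.length];
      if (account !== undefined && account.cooldownUntil <= now) {
        this.cursor = (this.cursor + i + 1) % this.accounts.length;
        return account;
      }
    }
    return null; // every account is cooling down
  }

  // Drop an account whose refresh token was revoked (invalid_grant).
  remove(refreshToken: string): void {
    this.accounts = this.accounts.filter((a) => a.refreshToken !== refreshToken);
    this.cursor = this.accounts.length === 0 ? 0 : this.cursor % this.accounts.length;
  }
}
```
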
- Add transformThinkingParts() to transform thinking content in responses
- Handle both Gemini-style (thought: true) and Anthropic-style (type: thinking) thinking parts
- Extract thinking config from extra_body and Anthropic-style options
- Auto-enable thinking for thinking-capable models (opus, gemini-3, thinking)
- Filter unsigned thinking blocks for Claude multi-turn conversations
- Apply transformations to both streaming and JSON responses
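
A hedged sketch of normalizing the two thinking styles named above into one shape. The part types are simplified, and the `NormalizedPart` output shape (with a "reasoning" type) is an assumption; this log does not state what the real transformThinkingParts() emits.

```typescript
// Hypothetical, simplified shapes for the two styles.
type GeminiPart = { thought?: boolean; text?: string };
type AnthropicBlock = { type: string; thinking?: string; text?: string };
// Assumed output shape, chosen for illustration only.
type NormalizedPart = { type: "reasoning" | "text"; text: string };

function transformThinkingParts(
  parts: Array<GeminiPart | AnthropicBlock>,
): NormalizedPart[] {
  return parts.map((part): NormalizedPart => {
    // Anthropic-style: { type: "thinking", thinking: "..." }
    if ("type" in part && part.type === "thinking") {
      return { type: "reasoning", text: part.thinking ?? "" };
    }
    // Gemini-style: { thought: true, text: "..." }
    if ("thought" in part && part.thought === true) {
      return { type: "reasoning", text: part.text ?? "" };
    }
    return { type: "text", text: part.text ?? "" };
  });
}
```
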