- Fix IPv4/IPv6 mismatch by binding server to all interfaces
- Add WSL/SSH/remote environment detection to skip unreachable local server
- Add 30s timeout fallback with manual URL input prompt
- Add --no-browser flag support for headless environments
- Add fetch timeout (10s) to fetchProjectID() to prevent indefinite hangs
- Improve openBrowser() with WSL wslview support
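The remote-environment check above could look roughly like the sketch below. The helper names are hypothetical, and the environment variables tested (WSL_DISTRO_NAME, SSH_CONNECTION, SSH_CLIENT, SSH_TTY) are common conventions assumed for illustration, not necessarily the ones the plugin inspects.

```typescript
// Hypothetical helpers; variable names are assumed conventions, not the plugin's confirmed checks.
type Env = Record<string, string | undefined>;

interface RemoteCheck {
  remote: boolean;
  reason: "wsl" | "ssh" | "none";
}

function detectRemoteEnvironment(env: Env): RemoteCheck {
  // WSL normally exports WSL_DISTRO_NAME to processes started inside a distro.
  if (env["WSL_DISTRO_NAME"]) {
    return { remote: true, reason: "wsl" };
  }
  // OpenSSH sets these variables for sessions opened over SSH.
  if (env["SSH_CONNECTION"] || env["SSH_CLIENT"] || env["SSH_TTY"]) {
    return { remote: true, reason: "ssh" };
  }
  return { remote: false, reason: "none" };
}

// The local callback server is only reachable when the browser runs on the same host.
function canUseLocalCallbackServer(env: Env): boolean {
  return !detectRemoteEnvironment(env).remote;
}
```

Passing the environment in as an argument, rather than reading it globally, keeps the check easy to unit test.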
- Deleted the RATE_LIMIT_ROUTING_ANALYSIS.md document as it is no longer needed.
- Enhanced the regression test to provide detailed failure information, including the first failure's stderr output.
- Updated the plugin to handle 400 errors ("Prompt too long") with a synthetic response instead of returning a session-locking error.
- Introduced createSyntheticErrorResponse function to generate a synthetic SSE response for error messages, allowing continued session usage.
- Added tests for createSyntheticErrorResponse to ensure correct behavior and structure of the synthetic SSE events.
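As a rough illustration of the synthetic SSE response, the sketch below encodes an error message as a short event stream. The event names and payload shapes are assumptions made for the example; the real createSyntheticErrorResponse may emit a different structure.

```typescript
// Hypothetical event shape; the plugin's actual SSE events may differ.
interface SseEvent {
  event: string;
  data: unknown;
}

// Serialize events in the SSE wire format: "event:" and "data:" lines, then a blank line.
function encodeSse(events: SseEvent[]): string {
  return events
    .map((e) => `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`)
    .join("");
}

// Wrap an error message as ordinary assistant text followed by a normal stop,
// so the client sees a completed turn instead of a failed request.
function buildSyntheticErrorBody(message: string): string {
  return encodeSse([
    { event: "content_delta", data: { type: "text", text: message } },
    { event: "done", data: { finishReason: "stop" } },
  ]);
}
```

JSON.stringify escapes newlines inside the message, so each data payload stays on a single line as the SSE format requires.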
- Add 2s deduplication window to prevent rate limit counter inflation from concurrent 429s
- Separate cooldown system from rate limits for non-429 errors (auth failures, 5xx)
- Add quota_fallback config option for automatic quota switching on rate limit
- Add toast notification for 400 'Prompt is too long' errors guiding users to /compact
- Add 5 new cooldown unit tests
- Enhance regression test suite with concurrent test infrastructure
- Add comprehensive rate limit analysis documentation
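The 2s deduplication window can be sketched as a counter that ignores further 429s for the same account while the window is open. The class and method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical sketch of a deduplicating 429 counter; not the plugin's actual implementation.
class RateLimitCounter {
  private readonly windowMs: number;
  private readonly lastCountedAt = new Map<string, number>();
  private readonly counts = new Map<string, number>();

  constructor(windowMs: number = 2000) {
    this.windowMs = windowMs;
  }

  // Record a 429 for an account and return the consecutive count.
  // 429s arriving within the window of the last counted one are treated as the same event.
  record429(accountId: string, nowMs: number): number {
    const last = this.lastCountedAt.get(accountId);
    const current = this.counts.get(accountId) ?? 0;
    if (last !== undefined && nowMs - last < this.windowMs) {
      return current;
    }
    this.lastCountedAt.set(accountId, nowMs);
    this.counts.set(accountId, current + 1);
    return current + 1;
  }
}
```

Without the window, several concurrent requests that all hit the same limit would each increment the counter and inflate any backoff derived from it.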
- Add tool hardening for Claude models with parameter signature injection
and system instruction prepending (configurable via claude_tool_hardening)
- Add context error detection (prompt_too_long, tool_pairing) with toast
notifications to guide users on recovery actions
- Improve session recovery to handle cases where messageID isn't provided
by fetching and finding the latest assistant message
- Change empty schema placeholder from reason (string) to _placeholder
(boolean) to reduce token usage
- Add duplicate injection prevention for parameter signatures and tool
hardening instructions
- Fix cache key to strip tier suffix from model name (e.g., -high, -low)
preventing cache misses on tier change
- Add thoughtsTokenCount to usage metadata extraction
- Extract and export applyToolPairingFixes helper for centralized tool
pairing logic
- Add comprehensive tests for recovery error detection and request helpers
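The cache-key fix above amounts to normalizing the model name before it is used as a key. A minimal sketch, assuming the tier suffixes are minimal, low, medium, and high, and using hypothetical helper and model names:

```typescript
// Assumed set of tier suffixes; the changelog only names -high and -low explicitly.
const TIER_SUFFIX = /-(minimal|low|medium|high)$/;

// Remove a trailing tier suffix so all tiers of a model share one cache entry.
function stripTierSuffix(model: string): string {
  return model.replace(TIER_SUFFIX, "");
}

// Hypothetical key layout: session ID plus the tier-less model name.
function cacheKey(sessionId: string, model: string): string {
  return `${sessionId}:${stripTierSuffix(model)}`;
}
```

With this normalization, switching from example-model-high to example-model-low mid-session resolves to the same key instead of a cache miss.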
- Add max_rate_limit_wait_seconds config option (default: 300s / 5 min)
- Set to 0 to disable fail-fast and wait indefinitely
- Shows clear error message with quota reset time
- Suggests adding more accounts or waiting
- Add guards when accountCount <= 1 to prevent infinite while(true) loop
- Force type: 'object' and inject placeholder property for empty Claude schemas
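The fail-fast behaviour of max_rate_limit_wait_seconds could be expressed as a small decision function like the one below. The function name, return shape, and message wording are illustrative assumptions; only the 300s default and the meaning of 0 come from the notes above.

```typescript
// Hypothetical decision type; the plugin may model this differently.
type WaitDecision =
  | { action: "wait"; waitMs: number }
  | { action: "fail"; message: string };

function decideRateLimitWait(
  resetAtMs: number,
  nowMs: number,
  maxWaitSeconds: number = 300,
): WaitDecision {
  const waitMs = Math.max(0, resetAtMs - nowMs);
  // 0 disables fail-fast: always wait, however long the reset takes.
  if (maxWaitSeconds === 0 || waitMs <= maxWaitSeconds * 1000) {
    return { action: "wait", waitMs };
  }
  const resetTime = new Date(resetAtMs).toISOString();
  return {
    action: "fail",
    message: `Rate limited until ${resetTime}. Add more accounts or wait for the quota to reset.`,
  };
}
```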
- Fix THINKING_RECOVERY_NEEDED to actually retry with closeToolLoopForThinking()
instead of returning a useless 'send continue' error message
The thinking recovery now:
1. Detects API error indicating thinking_block_order issue
2. Sets forceThinkingRecovery flag and restarts endpoint loop
3. On retry, closeToolLoopForThinking() strips all thinking and injects
synthetic messages to start a fresh turn
4. Only retries once to avoid infinite loops
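The retry-once flow in the steps above can be sketched as a loop guarded by a single flag. This version is synchronous and uses a hypothetical send callback for brevity; the real fetch wrapper is presumably asynchronous and applies closeToolLoopForThinking() inside the request path.

```typescript
// Hypothetical result type for one request attempt.
type Attempt<T> =
  | { ok: true; value: T }
  | { ok: false; errorType: string };

// `send` receives the recovery flag; when it is true the caller is expected to
// strip thinking blocks and start a fresh turn before sending.
function sendWithThinkingRecovery<T>(
  send: (forceThinkingRecovery: boolean) => Attempt<T>,
): Attempt<T> {
  let forceThinkingRecovery = false;
  for (;;) {
    const result = send(forceThinkingRecovery);
    if (result.ok) {
      return result;
    }
    // Retry only for this error type, and only if recovery has not been tried yet.
    if (result.errorType === "thinking_block_order" && !forceThinkingRecovery) {
      forceThinkingRecovery = true;
      continue;
    }
    return result;
  }
}
```

Because the flag is set before the retry and never cleared, a second thinking_block_order error is returned to the caller rather than looping.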
- Add 'antigravity-' prefix to route to Antigravity quota
- Models without prefix route to Gemini CLI quota
- Claude/GPT models auto-route to Antigravity (only available there)
- Gemini 3 tiers: Pro supports low/high, Flash supports minimal/low/medium/high
- Add thinkingLevel param for Gemini CLI, keep tier in model name for Antigravity
Resolves #51
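The routing rules above can be sketched as follows. The function names and example model IDs are hypothetical, and stripping the prefix from the routed model name is an assumption made for the example.

```typescript
type Quota = "antigravity" | "gemini-cli";

interface Route {
  quota: Quota;
  model: string;
}

const ANTIGRAVITY_PREFIX = "antigravity-";

function routeModel(requested: string): Route {
  // An explicit prefix selects the Antigravity quota.
  if (requested.indexOf(ANTIGRAVITY_PREFIX) === 0) {
    return { quota: "antigravity", model: requested.slice(ANTIGRAVITY_PREFIX.length) };
  }
  // Claude and GPT models are only available through Antigravity.
  if (/^(claude|gpt)/.test(requested)) {
    return { quota: "antigravity", model: requested };
  }
  // Everything else goes to the Gemini CLI quota.
  return { quota: "gemini-cli", model: requested };
}

// Tier support for Gemini 3 as listed in the notes above.
const GEMINI_3_TIERS: Record<"pro" | "flash", readonly string[]> = {
  pro: ["low", "high"],
  flash: ["minimal", "low", "medium", "high"],
};

function isTierSupported(variant: "pro" | "flash", tier: string): boolean {
  return GEMINI_3_TIERS[variant].indexOf(tier) !== -1;
}
```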
- Export detectErrorType() and isRecoverableError() from recovery.ts
- Import detectErrorType in request.ts
- Detect thinking_block_order errors in transformAntigravityResponse
- Throw THINKING_RECOVERY_NEEDED to trigger recovery in fetch wrapper
- Return synthetic error response with recovery instructions
The session.error hook wasn't being triggered for API 400 errors.
This fix catches recoverable errors inline and returns a user-friendly
message telling the user to send 'continue' to resume.
- Added recovery functionality for tool_result_missing, thinking_block_order, and thinking_disabled_violation errors.
- Introduced constants and types for session recovery.
- Created storage utilities for reading and writing session data.
- Enhanced debug logging capabilities in debug.ts.
- Refactored debug state management for better initialization and access.
Port CleanJSONSchemaForAntigravity from CLIProxyAPI to convert
unsupported JSON Schema features into description hints and remove
unsupported keywords. Add warmup tracking improvements: retry limit,
success marking, and cleanup when evicting old entries.
Enhances Gemini quota management by introducing dual quota pools (Antigravity and Gemini CLI).
Gemini models now fall back automatically to the second pool when the first is exhausted, effectively doubling the quota per account.
- Preserve @ariane-emory scoped package name
- Update version to 1.2.1-fix-account-duplication
- Integrate deduplication logic with upstream v1->v2 migration
- Include all upstream features: auto-update checker, enhanced logging, multi-account improvements
Refactors account management to support sticky sessions, per-model-family rate limits, and enhanced debugging capabilities.
- Implements sticky account selection, preserving Anthropic's prompt cache by sticking to the same account until a rate limit is encountered.
- Tracks rate limits separately for Claude and Gemini models, allowing an account to be used for one model family even if rate-limited for another.
- Introduces a smart retry threshold for short rate limits, retrying on the same account to avoid unnecessary switching.
- Adds exponential backoff for consecutive rate limits, increasing delays with each subsequent 429 error.
- Includes quota reset times in rate limit toast notifications when available from the API.
- Debounces toast notifications to prevent spam during streaming responses.
- Introduces a quiet mode to suppress account-related toast notifications via an environment variable.
- Enhances debug logging with level-based verbosity, TUI integration, and auto-stripping of injected debug blocks.
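The exponential backoff for consecutive rate limits might be computed as in the sketch below. The base delay and cap are assumed values chosen for illustration, not the plugin's actual settings.

```typescript
// Assumed defaults: 1s base delay, 60s cap. The plugin's real values may differ.
function backoffDelayMs(
  consecutive429s: number,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  // First 429 waits baseMs; each further consecutive 429 doubles the delay up to the cap.
  const exponent = Math.max(0, consecutive429s - 1);
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

Under these assumptions the delays run 1s, 2s, 4s, and so on until they reach the 60s cap.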
- Fix integer overflow risk in score calculation (compare fields separately)
- Condense verbose test comments
- Add docstrings to storage.ts for improved documentation coverage
- Merge upstream/main to get toast fix for single-account users
- Update version to 1.1.7-fix-account-duplication
Update CLI prompts and completion message to read the actual account
count from storage after deduplication, instead of showing how many
OAuth exchanges were completed (which could include duplicates).
- Add email-based deduplication in persistAccountPool to prevent duplicates when refresh token changes
- Add deduplicateAccountsByEmail function to clean up existing duplicate accounts on load
- Add comprehensive unit tests for deduplication logic including exact scenario from issue #24
- When same email re-authenticates, replace existing entry instead of creating duplicate
- Preserve newest account (by lastUsed, then addedAt) for each email address
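A minimal sketch of email-based deduplication under the rules above (newest by lastUsed, then addedAt). The Account shape and function names are assumptions and are not taken from storage.ts; the fields are compared separately rather than combined into one numeric score.

```typescript
// Hypothetical account shape; the stored format may have more fields.
interface Account {
  email?: string;
  refreshToken: string;
  lastUsed: number;
  addedAt: number;
}

// Compare fields one at a time instead of folding them into a single score.
function isNewer(a: Account, b: Account): boolean {
  if (a.lastUsed !== b.lastUsed) {
    return a.lastUsed > b.lastUsed;
  }
  return a.addedAt > b.addedAt;
}

// Keep one account per email (the newest); accounts without an email are kept as-is.
function dedupeByEmail(accounts: Account[]): Account[] {
  const newestByEmail = new Map<string, Account>();
  const withoutEmail: Account[] = [];
  for (const account of accounts) {
    if (!account.email) {
      withoutEmail.push(account);
      continue;
    }
    const existing = newestByEmail.get(account.email);
    if (existing === undefined || isNewer(account, existing)) {
      newestByEmail.set(account.email, account);
    }
  }
  return Array.from(newestByEmail.values()).concat(withoutEmail);
}
```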
Implement comprehensive support for Claude thinking models with interleaved
thinking in multi-turn conversations:
- Add signature caching system to preserve and restore thinking block
signatures across conversation turns, preventing "invalid signature" errors
- Enable real-time SSE streaming with immediate forwarding of thinking tokens
- Add interleaved-thinking-2025-05-14 beta header for Claude thinking models
- Implement smart system hints to encourage thinking during tool use
- Add VALIDATED mode for tool calling on Claude models
- Ensure output token limits accommodate thinking budgets
- Filter and sanitize thinking blocks, removing SDK-injected cache_control
- Add comprehensive test suites for auth, cache, and request-helpers modules
- Update build config to exclude test files from production builds
- Document streaming and thinking features in README
- Add CLI prompt to choose between adding accounts or starting fresh
- Implement automatic retry with backoff for single-account rate limits
- Show toast notifications for account switching and rate limit status
- Clear stale account storage when OpenCode auth state changes
- Add sleep helper function with abort signal support
- Improve README with clearer step-by-step setup instructions
TUI flow now adds accounts non-destructively; CLI flow offers choice.
Adds multi-account support and round-robin load balancing for Google Antigravity OAuth to increase request throughput and resilience. Introduces an on-disk account pool with cooldowns for rate-limited accounts, automatic removal of revoked refresh tokens, and persistence of rotation state.
Improves OAuth flows and UX: CLI flow can add multiple accounts with per-account project IDs, TUI flow remains single-account, improved browser opening/fallback copy-paste handling, and clearer prompts for pasting redirect URLs or codes. Adds robust parsing of callback input and better headless handling.
Makes token refresh handling explicit and typed (throws a specific error on invalid_grant) and centralizes account management logic into an in-memory manager with persistence utilities. Adds tests for account rotation and rate-limit behavior and bumps package version.
Overall, this increases reliability under rate limits, makes multi-account configuration straightforward, and improves error handling and developer/user experience.
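Round-robin selection with cooldowns can be sketched as a bounded scan over the pool. The class and field names here are hypothetical and the persistence layer is omitted.

```typescript
// Hypothetical pool entry; cooldownUntil is a timestamp in milliseconds.
interface PooledAccount {
  id: string;
  cooldownUntil: number;
}

class AccountRotation {
  private cursor = 0;
  private readonly accounts: PooledAccount[];

  constructor(accounts: PooledAccount[]) {
    this.accounts = accounts;
  }

  // Return the next account that is not cooling down, or undefined if none is available.
  // The scan visits each account at most once, so it cannot loop forever.
  next(nowMs: number): PooledAccount | undefined {
    for (let i = 0; i < this.accounts.length; i++) {
      const index = (this.cursor + i) % this.accounts.length;
      const candidate = this.accounts[index];
      if (candidate !== undefined && candidate.cooldownUntil <= nowMs) {
        this.cursor = (index + 1) % this.accounts.length;
        return candidate;
      }
    }
    return undefined;
  }
}
```

Returning undefined when every account is cooling down leaves the caller to decide whether to wait or fail.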