Implement comprehensive support for Claude thinking models with interleaved
thinking in multi-turn conversations:
- Add signature caching system to preserve and restore thinking block
  signatures across conversation turns, preventing "invalid signature" errors
- Enable real-time SSE streaming with immediate forwarding of thinking tokens
- Add interleaved-thinking-2025-05-14 beta header for Claude thinking models
- Implement smart system hints to encourage thinking during tool use
- Add VALIDATED mode for tool calling on Claude models
- Ensure output token limits accommodate thinking budgets
- Filter and sanitize thinking blocks, removing SDK-injected cache_control
- Add comprehensive test suites for auth, cache, and request-helpers modules
- Update build config to exclude test files from production builds
- Document streaming and thinking features in README
- Add CLI prompt to choose between adding accounts or starting fresh
- Implement automatic retry with backoff for single-account rate limits
- Show toast notifications for account switching and rate limit status
- Clear stale account storage when OpenCode auth state changes
- Add sleep helper function with abort signal support
- Improve README with clearer step-by-step setup instructions
The TUI flow now adds accounts non-destructively; the CLI flow offers a choice between adding accounts and starting fresh.
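The sleep helper and the single-account retry described above could look roughly like the following. This is a minimal sketch, not the plugin's actual code: the names `sleep`, `backoffDelayMs`, and `retryOnRateLimit`, the default delays, and the attempt limit are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers; the real signatures and defaults may differ.

/** Resolve after `ms` milliseconds, or reject early if `signal` aborts. */
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("Aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("Aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

/** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

interface RetryOptions {
  maxAttempts?: number; // total calls to fn, including the first
  baseMs?: number;
  maxMs?: number;
  signal?: AbortSignal;
}

/** Call `fn` again, with growing delays, while the result is rate limited. */
async function retryOnRateLimit<T>(
  fn: () => Promise<T>,
  isRateLimited: (result: T) => boolean,
  opts: RetryOptions = {},
): Promise<T> {
  const { maxAttempts = 5, baseMs = 1000, maxMs = 30000, signal } = opts;
  let result = await fn();
  for (let attempt = 0; attempt < maxAttempts - 1 && isRateLimited(result); attempt++) {
    await sleep(backoffDelayMs(attempt, baseMs, maxMs), signal);
    result = await fn();
  }
  return result;
}
```

Passing the request's abort signal through to `sleep` lets a cancelled request stop waiting immediately instead of sitting out the remaining backoff.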
Adds multi-account support and round-robin load balancing for Google Antigravity OAuth to increase request throughput and resilience. Introduces an on-disk account pool with cooldowns for rate-limited accounts, automatic removal of revoked refresh tokens, and persistence of rotation state.
Improves OAuth flows and UX: the CLI flow can add multiple accounts with per-account project IDs, the TUI flow remains single-account, browser opening and the copy-paste fallback are handled more reliably, and prompts for pasting redirect URLs or codes are clearer. Adds robust parsing of callback input and better handling of headless environments.
Makes token refresh handling explicit and typed (it throws a specific error on invalid_grant) and centralizes account management logic into an in-memory manager with persistence utilities. Adds tests for account rotation and rate-limit behavior and bumps the package version.
Overall, this increases reliability under rate limits, makes multi-account configuration straightforward, and improves error handling and developer/user experience.
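As an illustration of the rotation described above, here is a minimal in-memory sketch of round-robin selection with cooldowns. The `Account` shape, the `AccountPool` class, and its method names are hypothetical; persistence to disk and the OAuth details are deliberately left out.

```typescript
// Hypothetical types and names, for illustration only.
interface Account {
  id: string;
  cooldownUntil: number; // epoch milliseconds; 0 means available now
}

class AccountPool {
  private accounts: Account[];
  private cursor = 0; // index of the account to try next

  constructor(accounts: Account[]) {
    this.accounts = accounts;
  }

  /** Return the next account that is not cooling down, or undefined if none is. */
  next(now: number = Date.now()): Account | undefined {
    const count = this.accounts.length;
    for (let i = 0; i < count; i++) {
      const index = (this.cursor + i) % count;
      const account = this.accounts[index];
      if (account && account.cooldownUntil <= now) {
        this.cursor = (index + 1) % count;
        return account;
      }
    }
    return undefined;
  }

  /** Put a rate-limited account on cooldown for `cooldownMs` milliseconds. */
  markRateLimited(id: string, cooldownMs: number, now: number = Date.now()): void {
    const account = this.accounts.find((a) => a.id === id);
    if (account) account.cooldownUntil = now + cooldownMs;
  }

  /** Drop an account, for example after its refresh token is revoked. */
  remove(id: string): void {
    this.accounts = this.accounts.filter((a) => a.id !== id);
    this.cursor = this.accounts.length === 0 ? 0 : this.cursor % this.accounts.length;
  }
}
```

Keeping the cursor separate from the cooldown timestamps means the rotation state can be persisted as two small pieces of data: the index and the per-account expiry times.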
- Add transformThinkingParts() to transform thinking content in responses
- Handle both Gemini-style (thought: true) and Anthropic-style (type: thinking) thinking parts
- Extract thinking config from extra_body and Anthropic-style options
- Auto-enable thinking for thinking-capable models (opus, gemini-3, thinking)
- Filter unsigned thinking blocks for Claude multi-turn conversations
- Apply transformations to both streaming and JSON responses
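The two part shapes named above can be handled along these lines. This is a sketch under stated assumptions rather than the actual transformThinkingParts() implementation: the unified output shape (`type: "reasoning"`) and the function names `normalizeThinkingParts` and `dropUnsignedThinking` are hypothetical. The input fields (`thought` and `text` for Gemini, `thinking` and `signature` for Anthropic) follow the formats the commit refers to.

```typescript
// Hypothetical part shape covering both styles; real payloads carry more fields.
interface Part {
  type?: string;
  text?: string;
  thought?: boolean;  // Gemini-style marker
  thinking?: string;  // Anthropic-style thinking text
  signature?: string; // Anthropic-style signature
}

/** Map both thinking styles onto one assumed shape; leave other parts unchanged. */
function normalizeThinkingParts(parts: Part[]): Part[] {
  return parts.map((part) => {
    if (part.thought === true) {
      return { type: "reasoning", text: part.text ?? "" };
    }
    if (part.type === "thinking") {
      return { type: "reasoning", text: part.thinking ?? "" };
    }
    return part;
  });
}

/** Remove Anthropic-style thinking blocks that have no signature. */
function dropUnsignedThinking(parts: Part[]): Part[] {
  return parts.filter(
    (part) =>
      part.type !== "thinking" ||
      (typeof part.signature === "string" && part.signature.length > 0),
  );
}
```

Because both functions take and return plain arrays, the same code path can serve a parsed JSON response and each chunk of a streamed one.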