- Deleted `iflow` provider implementation, including thinking configuration (`apply.go`) and authentication modules.
- Removed iFlow-specific tests, executors, and helpers across SDK and internal components.
- Updated all references to exclude iFlow functionality.
- Deleted `QwenAuthenticator`, internal `qwen_auth`, and `qwen_executor` implementations.
- Removed all Qwen-related OAuth flows, token handling, and execution logic.
- Cleaned up dependencies and references to Qwen across the codebase.
The Copilot API enforces per-account prompt token limits (128K individual,
168K business) that are lower than the total context window (200K). When
the dynamic /models API fetch fails or returns no capabilities.limits,
the static fallback of 200K exceeds the real enforced limit, causing
intermittent "prompt token count exceeds the limit" errors.
Two complementary fixes:
1. Lower static Copilot Claude model ContextLength from 200000 to 128000
(the conservative default matching defaultCopilotContextLength). Dynamic
API limits override this when available.
2. Add context_length and max_completion_tokens to Claude-format model
responses so Claude Code CLI can learn the actual Copilot limit instead
of relying on its built-in 1M context configuration.
This commit addresses three issues with Claude Code through GitHub
Copilot:
1. **Premium request inflation**: Responses API requests were missing
Openai-Intent headers and proper defaults, causing Copilot to bill
each tool-loop continuation as a new premium request. Fixed by adding
isAgentInitiated() heuristic (checks for tool_result content or
preceding assistant tool_use), applying Responses API defaults
(store, include, reasoning.summary), and local tiktoken-based token
counting to avoid extra API calls.
2. **Context overflow**: Claude Code's modelSupports1M() hardcodes
opus-4-6 as 1M-capable, but Copilot only supports ~128K-200K.
Fixed by stripping the context-1m-2025-08-07 beta from translated
request bodies. Also forwards response headers in non-streaming
Execute() and registers the GET /copilot-quota management API route.
3. **Thinking not working**: Add ThinkingSupport with level-based
reasoning to Claude models in the static definitions. Normalize
Copilot's non-standard 'reasoning_text' response field to
'reasoning_content' before passing to the SDK translator. Use
caller-provided context in CountTokens instead of Background().
- Add kiro-claude-opus-4-6 and kiro-claude-opus-4-6-agentic to model registry
- Add model ID mappings for claude-opus-4.6 variants
- Support both kiro- prefix and native format (claude-opus-4.6)
- Tested and working with Kiro API