diff --git a/gitlab-duo-codex-parity-plan.md b/gitlab-duo-codex-parity-plan.md
deleted file mode 100644
index f4fc90d0..00000000
--- a/gitlab-duo-codex-parity-plan.md
+++ /dev/null
@@ -1,278 +0,0 @@
-# Plan: GitLab Duo Codex Parity
-
-**Generated**: 2026-03-10
-**Estimated Complexity**: High
-
-## Overview
-Bring GitLab Duo support from the current "auth + basic executor" stage to the same practical level as `codex` inside `CLIProxyAPI`: a user logs in once, points external clients such as Claude Code at `CLIProxyAPI`, selects GitLab Duo-backed models, and gets stable streaming, multi-turn behavior, tool calling compatibility, and predictable model routing without manual provider-specific workarounds.
-
-The core architectural shift is to stop treating GitLab Duo as only two REST wrappers (`/api/v4/chat/completions` and `/api/v4/code_suggestions/completions`) and instead use GitLab's `direct_access` contract as the primary runtime entrypoint wherever possible. Official GitLab docs confirm that `direct_access` returns AI gateway connection details, headers, token, and expiry; that contract is the closest path to codex-like provider behavior.
-
-## Prerequisites
-- Official GitLab Duo API references confirmed during implementation:
-  - `POST /api/v4/code_suggestions/direct_access`
-  - `POST /api/v4/code_suggestions/completions`
-  - `POST /api/v4/chat/completions`
-- Access to at least one real GitLab Duo account for manual verification.
-- One downstream client target for acceptance testing:
-  - Claude Code against Claude-compatible endpoint
-  - OpenAI-compatible client against `/v1/chat/completions` and `/v1/responses`
-- Existing PR branch as starting point:
-  - `feat/gitlab-duo-auth`
-  - PR [#2028](https://github.com/router-for-me/CLIProxyAPI/pull/2028)
-
-## Definition Of Done
-- GitLab Duo models can be used via `CLIProxyAPI` from the same client surfaces that already work for `codex`.
-- Upstream streaming is real passthrough or faithful chunked forwarding, not synthetic whole-response replay.
-- Tool/function calling survives translation layers without dropping fields or corrupting names.
-- Multi-turn and session semantics are stable across `chat/completions`, `responses`, and Claude-compatible routes.
-- Model exposure stays current from GitLab metadata or gateway discovery without hardcoded stale model tables.
-- `go test ./...` stays green and at least one real manual end-to-end client flow is documented.
-
-## Sprint 1: Contract And Gap Closure
-**Goal**: Replace assumptions with a hard compatibility contract between current `codex` behavior and what GitLab Duo can actually support.
-
-**Demo/Validation**:
-- Written matrix showing `codex` features vs current GitLab Duo behavior.
-- One checked-in developer note or test fixture for real GitLab Duo payload examples.
-
-### Task 1.1: Freeze Codex Parity Checklist
-- **Location**: [internal/runtime/executor/codex_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/codex_executor.go), [internal/runtime/executor/codex_websockets_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/codex_websockets_executor.go), [sdk/api/handlers/openai/openai_responses_handlers.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai/openai_responses_handlers.go), [sdk/api/handlers/openai/openai_responses_websocket.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai/openai_responses_websocket.go)
-- **Description**: Produce a concrete feature matrix for `codex`: HTTP execute, SSE execute, `/v1/responses`, websocket downstream path, tool calling, request IDs, session close semantics, and model registration behavior.
-- **Dependencies**: None
-- **Acceptance Criteria**:
-  - A checklist exists in repo docs or issue notes.
-  - Each capability is marked `required`, `optional`, or `not possible` for GitLab Duo.
-- **Validation**:
-  - Review against current `codex` code paths.
-
-### Task 1.2: Lock GitLab Duo Runtime Contract
-- **Location**: [internal/auth/gitlab/gitlab.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/auth/gitlab/gitlab.go), [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go)
-- **Description**: Validate the exact upstream contract we can rely on:
-  - `direct_access` fields and refresh cadence
-  - whether AI gateway path is usable directly
-  - when `chat/completions` is available vs when fallback is required
-  - what streaming shape is returned by `code_suggestions/completions?stream=true`
-- **Dependencies**: Task 1.1
-- **Acceptance Criteria**:
-  - GitLab transport decision is explicit: `gateway-first`, `REST-first`, or `hybrid`.
-  - Unknown areas are isolated behind feature flags, not spread across executor logic.
-- **Validation**:
-  - Official docs + captured real responses from a Duo account.
-
-### Task 1.3: Define Client-Facing Compatibility Targets
-- **Location**: [README.md](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/README.md), [gitlab-duo-codex-parity-plan.md](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/gitlab-duo-codex-parity-plan.md)
-- **Description**: Define exactly which external flows must work to call GitLab Duo support "like codex".
-- **Dependencies**: Task 1.2
-- **Acceptance Criteria**:
-  - Required surfaces are listed:
-    - Claude-compatible route
-    - OpenAI `chat/completions`
-    - OpenAI `responses`
-    - optional downstream websocket path
-  - Non-goals are explicit if GitLab upstream cannot support them.
-- **Validation**:
-  - Maintainer review of stated scope.
-
-## Sprint 2: Primary Transport Parity
-**Goal**: Move GitLab Duo execution onto a transport that supports codex-like runtime behavior.
-
-**Demo/Validation**:
-- A GitLab Duo model works over real streaming through `/v1/chat/completions`.
-- No synthetic "collect full body then fake stream" path remains on the primary flow.
-
-### Task 2.1: Refactor GitLab Executor Into Strategy Layers
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go)
-- **Description**: Split current executor into explicit strategies:
-  - auth refresh/direct access refresh
-  - gateway transport
-  - GitLab REST fallback transport
-  - downstream translation helpers
-- **Dependencies**: Sprint 1
-- **Acceptance Criteria**:
-  - Executor no longer mixes discovery, refresh, fallback selection, and response synthesis in one path.
-  - Transport choice is testable in isolation.
-- **Validation**:
-  - Unit tests for strategy selection and fallback boundaries.
-
-### Task 2.2: Implement Real Streaming Path
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [internal/runtime/executor/gitlab_executor_test.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor_test.go)
-- **Description**: Replace synthetic streaming with true upstream incremental forwarding:
-  - use gateway stream if available
-  - otherwise consume GitLab Code Suggestions streaming response and map chunks incrementally
-- **Dependencies**: Task 2.1
-- **Acceptance Criteria**:
-  - `ExecuteStream` emits chunks before upstream completion.
-  - error handling preserves status and early failure semantics.
-- **Validation**:
-  - tests with chunked upstream server
-  - manual curl check against `/v1/chat/completions` with `stream=true`
-
-### Task 2.3: Preserve Upstream Auth And Headers Correctly
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [internal/auth/gitlab/gitlab.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/auth/gitlab/gitlab.go)
-- **Description**: Use `direct_access` connection details as first-class transport state:
-  - gateway token
-  - expiry
-  - mandatory forwarded headers
-  - model metadata
-- **Dependencies**: Task 2.1
-- **Acceptance Criteria**:
-  - executor stops ignoring gateway headers/token when transport requires them
-  - refresh logic never over-fetches `direct_access`
-- **Validation**:
-  - tests verifying propagated headers and refresh interval behavior
-
-## Sprint 3: Request/Response Semantics Parity
-**Goal**: Make GitLab Duo behave correctly under the same request shapes that current `codex` consumers send.
-
-**Demo/Validation**:
-- OpenAI and Claude-compatible clients can do non-streaming and streaming conversations without losing structure.
-
-### Task 3.1: Normalize Multi-Turn Message Mapping
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [sdk/translator](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/translator)
-- **Description**: Replace the current "flatten prompt into one instruction" behavior with stable multi-turn mapping:
-  - preserve system context
-  - preserve user/assistant ordering
-  - maintain bounded context truncation
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - multi-turn requests are not collapsed into a lossy single string unless fallback mode explicitly requires it
-  - truncation policy is deterministic and tested
-- **Validation**:
-  - golden tests for request mapping
-
-### Task 3.2: Tool Calling Compatibility Layer
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [sdk/api/handlers/openai/openai_responses_handlers.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai/openai_responses_handlers.go)
-- **Description**: Decide and implement one of two paths:
-  - native pass-through if GitLab gateway supports tool/function structures
-  - strict downgrade path with explicit unsupported errors instead of silent field loss
-- **Dependencies**: Task 3.1
-- **Acceptance Criteria**:
-  - tool-related fields are either preserved correctly or rejected explicitly
-  - no silent corruption of tool names, tool calls, or tool results
-- **Validation**:
-  - table-driven tests for tool payloads
-  - one manual client scenario using tools
-
-### Task 3.3: Token Counting And Usage Reporting Fidelity
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [internal/runtime/executor/usage_helpers.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/usage_helpers.go)
-- **Description**: Improve token/usage reporting so GitLab models behave like first-class providers in logs and scheduling.
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - `CountTokens` uses the closest supported estimation path
-  - usage logging distinguishes prompt vs completion when possible
-- **Validation**:
-  - unit tests for token estimation outputs
-
-## Sprint 4: Responses And Session Parity
-**Goal**: Reach codex-level support for OpenAI Responses clients and long-lived sessions where GitLab upstream permits it.
-
-**Demo/Validation**:
-- `/v1/responses` works with GitLab Duo in a realistic client flow.
-- If websocket parity is not possible, the code explicitly declines it and keeps HTTP paths stable.
-
-### Task 4.1: Make GitLab Compatible With `/v1/responses`
-- **Location**: [sdk/api/handlers/openai/openai_responses_handlers.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai/openai_responses_handlers.go), [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go)
-- **Description**: Ensure GitLab transport can safely back the Responses API path, including compact responses if applicable.
-- **Dependencies**: Sprint 3
-- **Acceptance Criteria**:
-  - GitLab Duo can be selected behind `/v1/responses`
-  - response IDs and follow-up semantics are defined
-- **Validation**:
-  - handler tests analogous to codex/openai responses tests
-
-### Task 4.2: Evaluate Downstream Websocket Parity
-- **Location**: [sdk/api/handlers/openai/openai_responses_websocket.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai/openai_responses_websocket.go), [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go)
-- **Description**: Decide whether GitLab Duo can support downstream websocket sessions like codex:
-  - if yes, add session-aware execution path
-  - if no, mark GitLab auth as websocket-ineligible and keep HTTP routes first-class
-- **Dependencies**: Task 4.1
-- **Acceptance Criteria**:
-  - websocket behavior is explicit, not accidental
-  - no route claims websocket support when the upstream cannot honor it
-- **Validation**:
-  - websocket handler tests or explicit capability tests
-
-### Task 4.3: Add Session Cleanup And Failure Recovery Semantics
-- **Location**: [internal/runtime/executor/gitlab_executor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor.go), [sdk/cliproxy/auth/conductor.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/cliproxy/auth/conductor.go)
-- **Description**: Add codex-like session cleanup, retry boundaries, and model suspension/resume behavior for GitLab failures and quota events.
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - auth/model cooldown behavior is predictable on GitLab 4xx/5xx/quota responses
-  - executor cleans up per-session resources if any are introduced
-- **Validation**:
-  - tests for quota and retry behavior
-
-## Sprint 5: Client UX, Model UX, And Manual E2E
-**Goal**: Make GitLab Duo feel like a normal built-in provider to operators and downstream clients.
-
-**Demo/Validation**:
-- A documented setup exists for "login once, point Claude Code at CLIProxyAPI, use GitLab Duo-backed model".
-
-### Task 5.1: Model Alias And Provider UX Cleanup
-- **Location**: [sdk/cliproxy/service.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/cliproxy/service.go), [README.md](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/README.md)
-- **Description**: Normalize what users see:
-  - stable alias such as `gitlab-duo`
-  - discovered upstream model names
-  - optional prefix behavior
-  - account labels that clearly distinguish OAuth vs PAT
-- **Dependencies**: Sprint 3
-- **Acceptance Criteria**:
-  - users can select a stable GitLab alias even when upstream model changes
-  - dynamic model discovery does not cause confusing model churn
-- **Validation**:
-  - registry tests and manual `/v1/models` inspection
-
-### Task 5.2: Add Real End-To-End Acceptance Tests
-- **Location**: [internal/runtime/executor/gitlab_executor_test.go](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/internal/runtime/executor/gitlab_executor_test.go), [sdk/api/handlers/openai](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/sdk/api/handlers/openai)
-- **Description**: Add higher-level tests covering the actual proxy surfaces:
-  - OpenAI `chat/completions`
-  - OpenAI `responses`
-  - Claude-compatible request path if GitLab is routed there
-- **Dependencies**: Sprint 4
-- **Acceptance Criteria**:
-  - tests fail if streaming regresses into synthetic buffering again
-  - tests cover at least one tool-related request and one multi-turn request
-- **Validation**:
-  - `go test ./...`
-
-### Task 5.3: Publish Operator Documentation
-- **Location**: [README.md](/home/luxvtz/projects/cliproxyapi/CLIProxyAPI/README.md)
-- **Description**: Document:
-  - OAuth setup requirements
-  - PAT requirements
-  - current capability matrix
-  - known limitations if websocket/tool parity is partial
-- **Dependencies**: Task 5.1
-- **Acceptance Criteria**:
-  - setup instructions are enough for a new user to reproduce the GitLab Duo flow
-  - limitations are explicit
-- **Validation**:
-  - dry-run docs review from a clean environment
-
-## Testing Strategy
-- Keep `go test ./...` green after every committable task.
-- Add table-driven tests first for request mapping, refresh behavior, and dynamic model registration.
-- Add transport tests with `httptest.Server` for:
-  - real chunked streaming
-  - header propagation from `direct_access`
-  - upstream fallback rules
-- Add at least one manual acceptance checklist:
-  - login via OAuth
-  - login via PAT
-  - list models
-  - run one streaming prompt via OpenAI route
-  - run one prompt from the target downstream client
-
-## Potential Risks & Gotchas
-- GitLab public docs expose `direct_access`, but do not fully document every possible AI gateway path. We should isolate any empirically discovered gateway assumptions behind one transport layer and feature flags.
-- `chat/completions` availability differs by GitLab offering and version. The executor must not assume it always exists.
-- Code Suggestions is completion-oriented; lossy mapping from rich chat/tool payloads will make GitLab Duo feel worse than codex unless explicitly handled.
-- Synthetic streaming is not good enough for codex parity and will cause regressions in interactive clients.
-- Dynamic model discovery can create unstable UX if the stable alias and discovered model IDs are not separated cleanly.
-- PAT auth may validate successfully while still lacking effective Duo permissions. Error reporting must surface this explicitly.
-
-## Rollback Plan
-- Keep the current basic GitLab executor behind a fallback mode until the new transport path is stable.
-- If parity work destabilizes existing providers, revert only GitLab-specific executor changes and leave auth support intact.
-- Preserve the stable `gitlab-duo` alias so rollback does not break client configuration.
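
Reviewer note on the deleted plan's Testing Strategy: the "real chunked streaming" transport test it asks for could look roughly like the sketch below. This is only an illustration under stated assumptions. The names `newChunkedUpstream`, `firstChunkBeforeCompletion`, and `defaultWait` are hypothetical and do not exist in the repository, the `data: ...` payloads are placeholders rather than real GitLab Duo output, and the client here talks to the fake upstream directly. A real test would send the request through the executor or proxy route under test instead.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// defaultWait bounds how long we wait for the first chunk (hypothetical value).
const defaultWait = 2 * time.Second

// newChunkedUpstream is a hypothetical stand-in for a streaming upstream.
// It writes one SSE-style event, flushes it, then blocks until release is
// closed before writing the final event.
func newChunkedUpstream(release <-chan struct{}) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		fmt.Fprint(w, "data: first\n\n")
		flusher.Flush()
		select {
		case <-release: // the client has confirmed it saw the first chunk
		case <-r.Context().Done(): // the client went away
			return
		}
		fmt.Fprint(w, "data: last\n\n")
	}))
}

// firstChunkBeforeCompletion reports whether the first chunk is readable
// while the upstream handler is still blocked. A buffering layer in between
// would never deliver it, so the wait would time out and return false.
func firstChunkBeforeCompletion(wait time.Duration) (bool, error) {
	release := make(chan struct{})
	srv := newChunkedUpstream(release)
	defer srv.Close() // runs last, after the handler has been unblocked

	var once sync.Once
	unblock := func() { once.Do(func() { close(release) }) }
	defer unblock()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	type result struct {
		line string
		err  error
	}
	reader := bufio.NewReader(resp.Body)
	got := make(chan result, 1)
	go func() {
		line, readErr := reader.ReadString('\n')
		got <- result{line, readErr}
	}()

	select {
	case r := <-got:
		if r.err != nil {
			return false, r.err
		}
		early := r.line == "data: first\n"
		unblock() // only now may the upstream finish
		rest, readErr := io.ReadAll(reader)
		if readErr != nil {
			return false, readErr
		}
		return early && strings.Contains(string(rest), "data: last"), nil
	case <-time.After(wait):
		return false, nil // nothing arrived before upstream completion
	}
}

func main() {
	early, err := firstChunkBeforeCompletion(defaultWait)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first chunk arrived before upstream completion:", early)
}
```

The upstream refuses to finish until the client has already read the first event. That ordering is what lets such a test fail if streaming regresses into "collect full body then fake stream", which is the regression Task 5.2 wants the tests to catch.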