diff --git a/gitlab-duo-codex-parity-plan.md b/gitlab-duo-codex-parity-plan.md
deleted file mode 100644
index f4fc90d0..00000000
--- a/gitlab-duo-codex-parity-plan.md
+++ /dev/null
@@ -1,278 +0,0 @@
-# Plan: GitLab Duo Codex Parity
-
-**Generated**: 2026-03-10
-**Estimated Complexity**: High
-
-## Overview
-Bring GitLab Duo support from the current "auth + basic executor" stage to the same practical level as `codex` inside `CLIProxyAPI`: a user logs in once, points external clients such as Claude Code at `CLIProxyAPI`, selects GitLab Duo-backed models, and gets stable streaming, multi-turn behavior, tool calling compatibility, and predictable model routing without manual provider-specific workarounds.
-
-The core architectural shift is to stop treating GitLab Duo as only two REST wrappers (`/api/v4/chat/completions` and `/api/v4/code_suggestions/completions`) and instead use GitLab's `direct_access` contract as the primary runtime entrypoint wherever possible. Official GitLab docs confirm that `direct_access` returns AI gateway connection details, headers, token, and expiry; that contract is the closest path to codex-like provider behavior.
-
-## Prerequisites
-- Official GitLab Duo API references confirmed during implementation:
-  - `POST /api/v4/code_suggestions/direct_access`
-  - `POST /api/v4/code_suggestions/completions`
-  - `POST /api/v4/chat/completions`
-- Access to at least one real GitLab Duo account for manual verification.
-- One downstream client target for acceptance testing:
-  - Claude Code against Claude-compatible endpoint
-  - OpenAI-compatible client against `/v1/chat/completions` and `/v1/responses`
-- Existing PR branch as starting point:
-  - `feat/gitlab-duo-auth`
-  - PR [#2028](https://github.com/router-for-me/CLIProxyAPI/pull/2028)
-
-## Definition Of Done
-- GitLab Duo models can be used via `CLIProxyAPI` from the same client surfaces that already work for `codex`.
-- Upstream streaming is real passthrough or faithful chunked forwarding, not synthetic whole-response replay.
-- Tool/function calling survives translation layers without dropping fields or corrupting names.
-- Multi-turn and session semantics are stable across `chat/completions`, `responses`, and Claude-compatible routes.
-- Model exposure stays current from GitLab metadata or gateway discovery without hardcoded stale model tables.
-- `go test ./...` stays green and at least one real manual end-to-end client flow is documented.
-
-## Sprint 1: Contract And Gap Closure
-**Goal**: Replace assumptions with a hard compatibility contract between current `codex` behavior and what GitLab Duo can actually support.
-
-**Demo/Validation**:
-- Written matrix showing `codex` features vs current GitLab Duo behavior.
-- One checked-in developer note or test fixture for real GitLab Duo payload examples.
-
-### Task 1.1: Freeze Codex Parity Checklist
-- **Location**: [internal/runtime/executor/codex_executor.go](internal/runtime/executor/codex_executor.go), [internal/runtime/executor/codex_websockets_executor.go](internal/runtime/executor/codex_websockets_executor.go), [sdk/api/handlers/openai/openai_responses_handlers.go](sdk/api/handlers/openai/openai_responses_handlers.go), [sdk/api/handlers/openai/openai_responses_websocket.go](sdk/api/handlers/openai/openai_responses_websocket.go)
-- **Description**: Produce a concrete feature matrix for `codex`: HTTP execute, SSE execute, `/v1/responses`, websocket downstream path, tool calling, request IDs, session close semantics, and model registration behavior.
-- **Dependencies**: None
-- **Acceptance Criteria**:
-  - A checklist exists in repo docs or issue notes.
-  - Each capability is marked `required`, `optional`, or `not possible` for GitLab Duo.
-- **Validation**:
-  - Review against current `codex` code paths.
-
-### Task 1.2: Lock GitLab Duo Runtime Contract
-- **Location**: [internal/auth/gitlab/gitlab.go](internal/auth/gitlab/gitlab.go), [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go)
-- **Description**: Validate the exact upstream contract we can rely on:
-  - `direct_access` fields and refresh cadence
-  - whether the AI gateway path is usable directly
-  - when `chat/completions` is available vs when fallback is required
-  - what streaming shape is returned by `code_suggestions/completions?stream=true`
-- **Dependencies**: Task 1.1
-- **Acceptance Criteria**:
-  - GitLab transport decision is explicit: `gateway-first`, `REST-first`, or `hybrid`.
-  - Unknown areas are isolated behind feature flags, not spread across executor logic.
-- **Validation**:
-  - Official docs + captured real responses from a Duo account.
-
-### Task 1.3: Define Client-Facing Compatibility Targets
-- **Location**: [README.md](README.md), [gitlab-duo-codex-parity-plan.md](gitlab-duo-codex-parity-plan.md)
-- **Description**: Define exactly which external flows must work to call GitLab Duo support "like codex".
-- **Dependencies**: Task 1.2
-- **Acceptance Criteria**:
-  - Required surfaces are listed:
-    - Claude-compatible route
-    - OpenAI `chat/completions`
-    - OpenAI `responses`
-    - optional downstream websocket path
-  - Non-goals are explicit if GitLab upstream cannot support them.
-- **Validation**:
-  - Maintainer review of stated scope.
-
-## Sprint 2: Primary Transport Parity
-**Goal**: Move GitLab Duo execution onto a transport that supports codex-like runtime behavior.
-
-**Demo/Validation**:
-- A GitLab Duo model works over real streaming through `/v1/chat/completions`.
-- No synthetic "collect full body then fake stream" path remains on the primary flow.
-
-### Task 2.1: Refactor GitLab Executor Into Strategy Layers
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go)
-- **Description**: Split current executor into explicit strategies:
-  - auth refresh/direct access refresh
-  - gateway transport
-  - GitLab REST fallback transport
-  - downstream translation helpers
-- **Dependencies**: Sprint 1
-- **Acceptance Criteria**:
-  - Executor no longer mixes discovery, refresh, fallback selection, and response synthesis in one path.
-  - Transport choice is testable in isolation.
-- **Validation**:
-  - Unit tests for strategy selection and fallback boundaries.
-
-### Task 2.2: Implement Real Streaming Path
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [internal/runtime/executor/gitlab_executor_test.go](internal/runtime/executor/gitlab_executor_test.go)
-- **Description**: Replace synthetic streaming with true upstream incremental forwarding:
-  - use gateway stream if available
-  - otherwise consume GitLab Code Suggestions streaming response and map chunks incrementally
-- **Dependencies**: Task 2.1
-- **Acceptance Criteria**:
-  - `ExecuteStream` emits chunks before upstream completion.
-  - error handling preserves status and early failure semantics.
-- **Validation**:
-  - tests with chunked upstream server
-  - manual curl check against `/v1/chat/completions` with `stream=true`
-
-### Task 2.3: Preserve Upstream Auth And Headers Correctly
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [internal/auth/gitlab/gitlab.go](internal/auth/gitlab/gitlab.go)
-- **Description**: Use `direct_access` connection details as first-class transport state:
-  - gateway token
-  - expiry
-  - mandatory forwarded headers
-  - model metadata
-- **Dependencies**: Task 2.1
-- **Acceptance Criteria**:
-  - executor stops ignoring gateway headers/token when transport requires them
-  - refresh logic reuses cached `direct_access` credentials until they near expiry instead of refetching on every request
-- **Validation**:
-  - tests verifying propagated headers and refresh interval behavior
-
-## Sprint 3: Request/Response Semantics Parity
-**Goal**: Make GitLab Duo behave correctly under the same request shapes that current `codex` consumers send.
-
-**Demo/Validation**:
-- OpenAI and Claude-compatible clients can do non-streaming and streaming conversations without losing structure.
-
-### Task 3.1: Normalize Multi-Turn Message Mapping
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [sdk/translator](sdk/translator)
-- **Description**: Replace the current "flatten prompt into one instruction" behavior with stable multi-turn mapping:
-  - preserve system context
-  - preserve user/assistant ordering
-  - maintain bounded context truncation
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - multi-turn requests are not collapsed into a lossy single string unless fallback mode explicitly requires it
-  - truncation policy is deterministic and tested
-- **Validation**:
-  - golden tests for request mapping
-
-### Task 3.2: Tool Calling Compatibility Layer
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [sdk/api/handlers/openai/openai_responses_handlers.go](sdk/api/handlers/openai/openai_responses_handlers.go)
-- **Description**: Decide and implement one of two paths:
-  - native pass-through if GitLab gateway supports tool/function structures
-  - strict downgrade path with explicit unsupported errors instead of silent field loss
-- **Dependencies**: Task 3.1
-- **Acceptance Criteria**:
-  - tool-related fields are either preserved correctly or rejected explicitly
-  - no silent corruption of tool names, tool calls, or tool results
-- **Validation**:
-  - table-driven tests for tool payloads
-  - one manual client scenario using tools
-
-### Task 3.3: Token Counting And Usage Reporting Fidelity
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [internal/runtime/executor/usage_helpers.go](internal/runtime/executor/usage_helpers.go)
-- **Description**: Improve token/usage reporting so GitLab models behave like first-class providers in logs and scheduling.
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - `CountTokens` uses the closest supported estimation path
-  - usage logging distinguishes prompt vs completion when possible
-- **Validation**:
-  - unit tests for token estimation outputs
-
-## Sprint 4: Responses And Session Parity
-**Goal**: Reach codex-level support for OpenAI Responses clients and long-lived sessions where GitLab upstream permits it.
-
-**Demo/Validation**:
-- `/v1/responses` works with GitLab Duo in a realistic client flow.
-- If websocket parity is not possible, the code explicitly declines it and keeps HTTP paths stable.
-
-### Task 4.1: Make GitLab Compatible With `/v1/responses`
-- **Location**: [sdk/api/handlers/openai/openai_responses_handlers.go](sdk/api/handlers/openai/openai_responses_handlers.go), [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go)
-- **Description**: Ensure GitLab transport can safely back the Responses API path, including compact responses if applicable.
-- **Dependencies**: Sprint 3
-- **Acceptance Criteria**:
-  - GitLab Duo can be selected behind `/v1/responses`
-  - response IDs and follow-up semantics are defined
-- **Validation**:
-  - handler tests analogous to codex/openai responses tests
-
-### Task 4.2: Evaluate Downstream Websocket Parity
-- **Location**: [sdk/api/handlers/openai/openai_responses_websocket.go](sdk/api/handlers/openai/openai_responses_websocket.go), [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go)
-- **Description**: Decide whether GitLab Duo can support downstream websocket sessions like codex:
-  - if yes, add session-aware execution path
-  - if no, mark GitLab auth as websocket-ineligible and keep HTTP routes first-class
-- **Dependencies**: Task 4.1
-- **Acceptance Criteria**:
-  - websocket behavior is explicit, not accidental
-  - no route claims websocket support when the upstream cannot honor it
-- **Validation**:
-  - websocket handler tests or explicit capability tests
-
-### Task 4.3: Add Session Cleanup And Failure Recovery Semantics
-- **Location**: [internal/runtime/executor/gitlab_executor.go](internal/runtime/executor/gitlab_executor.go), [sdk/cliproxy/auth/conductor.go](sdk/cliproxy/auth/conductor.go)
-- **Description**: Add codex-like session cleanup, retry boundaries, and model suspension/resume behavior for GitLab failures and quota events.
-- **Dependencies**: Sprint 2
-- **Acceptance Criteria**:
-  - auth/model cooldown behavior is predictable on GitLab 4xx/5xx/quota responses
-  - executor cleans up per-session resources if any are introduced
-- **Validation**:
-  - tests for quota and retry behavior
-
-## Sprint 5: Client UX, Model UX, And Manual E2E
-**Goal**: Make GitLab Duo feel like a normal built-in provider to operators and downstream clients.
-
-**Demo/Validation**:
-- A documented setup exists for "login once, point Claude Code at CLIProxyAPI, use GitLab Duo-backed model".
-
-### Task 5.1: Model Alias And Provider UX Cleanup
-- **Location**: [sdk/cliproxy/service.go](sdk/cliproxy/service.go), [README.md](README.md)
-- **Description**: Normalize what users see:
-  - stable alias such as `gitlab-duo`
-  - discovered upstream model names
-  - optional prefix behavior
-  - account labels that clearly distinguish OAuth vs PAT
-- **Dependencies**: Sprint 3
-- **Acceptance Criteria**:
-  - users can select a stable GitLab alias even when upstream model changes
-  - dynamic model discovery does not cause confusing model churn
-- **Validation**:
-  - registry tests and manual `/v1/models` inspection
-
-### Task 5.2: Add Real End-To-End Acceptance Tests
-- **Location**: [internal/runtime/executor/gitlab_executor_test.go](internal/runtime/executor/gitlab_executor_test.go), [sdk/api/handlers/openai](sdk/api/handlers/openai)
-- **Description**: Add higher-level tests covering the actual proxy surfaces:
-  - OpenAI `chat/completions`
-  - OpenAI `responses`
-  - Claude-compatible request path if GitLab is routed there
-- **Dependencies**: Sprint 4
-- **Acceptance Criteria**:
-  - tests fail if streaming regresses into synthetic buffering again
-  - tests cover at least one tool-related request and one multi-turn request
-- **Validation**:
-  - `go test ./...`
-
-### Task 5.3: Publish Operator Documentation
-- **Location**: [README.md](README.md)
-- **Description**: Document:
-  - OAuth setup requirements
-  - PAT requirements
-  - current capability matrix
-  - known limitations if websocket/tool parity is partial
-- **Dependencies**: Task 5.1
-- **Acceptance Criteria**:
-  - setup instructions are enough for a new user to reproduce the GitLab Duo flow
-  - limitations are explicit
-- **Validation**:
-  - dry-run docs review from a clean environment
-
-## Testing Strategy
-- Keep `go test ./...` green after every committable task.
-- Add table-driven tests first for request mapping, refresh behavior, and dynamic model registration.
-- Add transport tests with `httptest.Server` for:
-  - real chunked streaming
-  - header propagation from `direct_access`
-  - upstream fallback rules
-- Add at least one manual acceptance checklist:
-  - login via OAuth
-  - login via PAT
-  - list models
-  - run one streaming prompt via OpenAI route
-  - run one prompt from the target downstream client
-
-## Potential Risks & Gotchas
-- GitLab public docs expose `direct_access`, but do not fully document every possible AI gateway path. We should isolate any empirically discovered gateway assumptions behind one transport layer and feature flags.
-- `chat/completions` availability differs by GitLab offering and version. The executor must not assume it always exists.
-- Code Suggestions is completion-oriented; lossy mapping from rich chat/tool payloads will make GitLab Duo feel worse than codex unless explicitly handled.
-- Synthetic streaming is not good enough for codex parity and will cause regressions in interactive clients.
-- Dynamic model discovery can create unstable UX if the stable alias and discovered model IDs are not separated cleanly.
-- PAT auth may validate successfully while still lacking effective Duo permissions. Error reporting must surface this explicitly.
-
-## Rollback Plan
-- Keep the current basic GitLab executor behind a fallback mode until the new transport path is stable.
-- If parity work destabilizes existing providers, revert only GitLab-specific executor changes and leave auth support intact.
-- Preserve the stable `gitlab-duo` alias so rollback does not break client configuration.