diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1301228fbda..1290abde641 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -81,6 +81,7 @@ Docs: https://docs.openclaw.ai
 
 ### Changes
 
+- Docs: add a dedicated ds4 provider page with local DeepSeek V4 Flash config, on-demand startup, context sizing, and live verification steps.
 - Maintainers: add a Clawdtributor skill for Discrawl-backed contributor PR triage, live status checks, and compact review formatting.
 - Telegram: support Mini App `web_app` buttons in generic message presentation payloads, allowing `openclaw message send --presentation` to render Telegram Web App inline buttons for private chats. (#81356) Thanks @jzakirov.
 - Scripts: add `OPENCLAW_HEAVY_CHECK_LOCK_SCOPE=worktree` so high-capacity local worktrees can use independent heavy-check locks while shared locks remain the default. Fixes #80729. (#80734) Thanks @samzong.
diff --git a/docs/.i18n/glossary.zh-CN.json b/docs/.i18n/glossary.zh-CN.json
index 5a70a9b936a..15686969e76 100644
--- a/docs/.i18n/glossary.zh-CN.json
+++ b/docs/.i18n/glossary.zh-CN.json
@@ -950,5 +950,9 @@
   {
     "source": "ACP agents setup",
     "target": "ACP Agents 设置"
+  },
+  {
+    "source": "ds4 (local DeepSeek V4)",
+    "target": "ds4（本地 DeepSeek V4）"
   }
 ]
diff --git a/docs/docs.json b/docs/docs.json
index e321927cfb1..46ee9c05d39 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1368,6 +1368,7 @@
         "providers/deepgram",
         "providers/deepinfra",
         "providers/deepseek",
+        "providers/ds4",
         "providers/elevenlabs",
         "providers/fal",
         "providers/fireworks",
diff --git a/docs/gateway/local-model-services.md b/docs/gateway/local-model-services.md
index 0f746026037..2c103f14ef0 100644
--- a/docs/gateway/local-model-services.md
+++ b/docs/gateway/local-model-services.md
@@ -142,6 +142,9 @@ OpenClaw.
 
 ## ds4 example
 
+For the full setup, context sizing guidance, and verification commands, see
+[ds4](/providers/ds4).
+
 ```json5
 {
   models: {
@@ -152,18 +155,20 @@ OpenClaw.
         api: "openai-completions",
         timeoutSeconds: 300,
         localService: {
-          command: "/Users/you/Projects/oss/ds4/ds4-server",
+          command: "<ds4-checkout>/ds4-server",
           args: [
             "--model",
-            "/Users/you/Projects/oss/ds4/ds4flash.gguf",
+            "<ds4-checkout>/ds4flash.gguf",
             "--host",
             "127.0.0.1",
             "--port",
             "18000",
             "--ctx",
-            "393216",
+            "32768",
+            "--tokens",
+            "128",
           ],
-          cwd: "/Users/you/Projects/oss/ds4",
+          cwd: "<ds4-checkout>",
           healthUrl: "http://127.0.0.1:18000/v1/models",
           readyTimeoutMs: 300000,
           idleStopMs: 0,
diff --git a/docs/gateway/local-models.md b/docs/gateway/local-models.md
index e8990d5621e..ee41ccf2d35 100644
--- a/docs/gateway/local-models.md
+++ b/docs/gateway/local-models.md
@@ -20,10 +20,11 @@ Aim high: **≥2 maxed-out Mac Studios or an equivalent GPU rig (~$30k+)** for a
 
 | Backend                                              | Use when                                                                    |
 | ---------------------------------------------------- | --------------------------------------------------------------------------- |
+| [ds4](/providers/ds4)                                | Local DeepSeek V4 Flash on macOS Metal with OpenAI-compatible tool calls    |
 | [LM Studio](/providers/lmstudio)                     | First-time local setup, GUI loader, native Responses API                    |
-| [Ollama](/providers/ollama)                          | CLI workflow, model library, hands-off systemd service                      |
-| MLX / vLLM / SGLang                                  | High-throughput self-hosted serving with an OpenAI-compatible HTTP endpoint |
 | LiteLLM / OAI-proxy / custom OpenAI-compatible proxy | You front another model API and need OpenClaw to treat it as OpenAI         |
+| MLX / vLLM / SGLang                                  | High-throughput self-hosted serving with an OpenAI-compatible HTTP endpoint |
+| [Ollama](/providers/ollama)                          | CLI workflow, model library, hands-off systemd service                      |
 
 Use Responses API (`api: "openai-responses"`) when the backend supports it (LM
 Studio does). Otherwise stick to Chat Completions (`api: "openai-completions"`).
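Editor note: the ds4 example above pins `--ctx 32768` and `--tokens 128` in `localService.args`, and the provider page asks that `contextWindow` and `maxTokens` stay aligned with those flags. A minimal sketch of that consistency check, assuming the config shape shown in the docs (the helper names and inline dict are illustrative, not OpenClaw APIs):

```python
def flag_value(args, flag):
    """Return the argument that follows `flag` in a CLI args list, or None."""
    for i, a in enumerate(args[:-1]):
        if a == flag:
            return args[i + 1]
    return None

def check_alignment(provider):
    """Compare model metadata against the ds4-server flags in localService.args."""
    args = provider["localService"]["args"]
    problems = []
    for model in provider["models"]:
        ctx = flag_value(args, "--ctx")
        if ctx is not None and int(ctx) != model["contextWindow"]:
            problems.append(f"{model['id']}: contextWindow {model['contextWindow']} != --ctx {ctx}")
        tokens = flag_value(args, "--tokens")
        if tokens is not None and int(tokens) != model["maxTokens"]:
            problems.append(f"{model['id']}: maxTokens {model['maxTokens']} != --tokens {tokens}")
    return problems

# Mirrors the json5 example above in plain Python.
ds4 = {
    "localService": {"args": ["--host", "127.0.0.1", "--port", "18000",
                              "--ctx", "32768", "--tokens", "128"]},
    "models": [{"id": "deepseek-v4-flash", "contextWindow": 32768, "maxTokens": 128}],
}
print(check_alignment(ds4))  # → [] when flags and metadata agree
```

Running a check like this before committing a config change catches the drift the docs warn about, where the server allocates one context size and OpenClaw assumes another.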
diff --git a/docs/providers/ds4.md b/docs/providers/ds4.md
new file mode 100644
index 00000000000..0f20eb09660
--- /dev/null
+++ b/docs/providers/ds4.md
@@ -0,0 +1,309 @@
+---
+summary: "Run OpenClaw through ds4, a local DeepSeek V4 Flash OpenAI-compatible server"
+read_when:
+  - You want to run OpenClaw against antirez/ds4
+  - You want a local DeepSeek V4 Flash backend with tool calls
+  - You need the OpenClaw config for ds4-server
+title: "ds4"
+---
+
+[ds4](https://github.com/antirez/ds4) serves DeepSeek V4 Flash from a local
+Metal backend with an OpenAI-compatible `/v1` API. OpenClaw connects to ds4
+through the generic `openai-completions` provider family.
+
+ds4 is not a bundled OpenClaw provider plugin. Configure it under
+`models.providers.ds4`, then select `ds4/deepseek-v4-flash`.
+
+- Provider id: `ds4`
+- Plugin: none
+- API: OpenAI-compatible Chat Completions (`openai-completions`)
+- Suggested base URL: `http://127.0.0.1:18000/v1`
+- Model id: `deepseek-v4-flash`
+- Tool calls: supported through OpenAI-style `tools` and `tool_calls`
+- Reasoning: DeepSeek-style `thinking` and `reasoning_effort`
+
+## Requirements
+
+- macOS with Metal support.
+- A working ds4 checkout with `ds4-server` and the DeepSeek V4 Flash GGUF file.
+- Enough memory for the context you choose. Larger `--ctx` values allocate more
+  KV memory when the server starts.
+
+OpenClaw agent turns include tool schemas and workspace context. A tiny context
+such as `--ctx 4096` can pass direct curl tests but fail full agent runs with
+`500 prompt exceeds context`. Use at least `--ctx 32768` for agent and tool
+smoke tests. Use `--ctx 393216` only when you have enough memory and want ds4
+Think Max behavior.
+
+## Quickstart
+
+Start `ds4-server`. Replace `<ds4-checkout>` with your ds4 checkout path.
+
+```bash
+<ds4-checkout>/ds4-server \
+  --model <ds4-checkout>/ds4flash.gguf \
+  --host 127.0.0.1 \
+  --port 18000 \
+  --ctx 32768 \
+  --tokens 128
+```
+
+Verify the endpoint:
+
+```bash
+curl http://127.0.0.1:18000/v1/models
+```
+
+The response should include `deepseek-v4-flash`.
+
+Add the config from [Full config](#full-config), then run a one-shot model
+check:
+
+```bash
+openclaw infer model run \
+  --local \
+  --model ds4/deepseek-v4-flash \
+  --thinking off \
+  --prompt "Reply with exactly: openclaw-ds4-ok" \
+  --json
+```
+
+## Full config
+
+Use this config when ds4 is already running on `127.0.0.1:18000`.
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "ds4/deepseek-v4-flash" },
+      models: {
+        "ds4/deepseek-v4-flash": {
+          alias: "DS4 local",
+        },
+      },
+    },
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      ds4: {
+        baseUrl: "http://127.0.0.1:18000/v1",
+        apiKey: "ds4-local",
+        api: "openai-completions",
+        timeoutSeconds: 300,
+        models: [
+          {
+            id: "deepseek-v4-flash",
+            name: "DeepSeek V4 Flash (ds4)",
+            reasoning: true,
+            input: ["text"],
+            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+            contextWindow: 32768,
+            maxTokens: 128,
+            compat: {
+              supportsUsageInStreaming: true,
+              supportsReasoningEffort: true,
+              maxTokensField: "max_tokens",
+              supportsStrictMode: false,
+              thinkingFormat: "deepseek",
+              supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
+            },
+          },
+        ],
+      },
+    },
+  },
+}
+```
+
+Keep `contextWindow` aligned with the `ds4-server --ctx` value. Keep `maxTokens`
+aligned with `--tokens` unless you intentionally want OpenClaw to request less
+output than the server default.
+
+## On-demand startup
+
+OpenClaw can start ds4 only when a `ds4/...` model is selected. Add
+`localService` to the same provider entry:
+
+```json5
+{
+  models: {
+    providers: {
+      ds4: {
+        baseUrl: "http://127.0.0.1:18000/v1",
+        apiKey: "ds4-local",
+        api: "openai-completions",
+        timeoutSeconds: 300,
+        localService: {
+          command: "<ds4-checkout>/ds4-server",
+          args: [
+            "--model",
+            "<ds4-checkout>/ds4flash.gguf",
+            "--host",
+            "127.0.0.1",
+            "--port",
+            "18000",
+            "--ctx",
+            "32768",
+            "--tokens",
+            "128",
+          ],
+          cwd: "<ds4-checkout>",
+          healthUrl: "http://127.0.0.1:18000/v1/models",
+          readyTimeoutMs: 300000,
+          idleStopMs: 0,
+        },
+        models: [
+          {
+            id: "deepseek-v4-flash",
+            name: "DeepSeek V4 Flash (ds4)",
+            reasoning: true,
+            input: ["text"],
+            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+            contextWindow: 32768,
+            maxTokens: 128,
+            compat: {
+              supportsUsageInStreaming: true,
+              supportsReasoningEffort: true,
+              maxTokensField: "max_tokens",
+              supportsStrictMode: false,
+              thinkingFormat: "deepseek",
+              supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
+            },
+          },
+        ],
+      },
+    },
+  },
+}
+```
+
+`command` must be an absolute executable path. Shell lookup and `~` expansion
+are not used. See [Local model services](/gateway/local-model-services) for
+every `localService` field.
+
+## Think Max
+
+ds4 applies Think Max only when both conditions are true:
+
+- `ds4-server` starts with `--ctx 393216` or higher.
+ +If you run that large context, update both the server flags and OpenClaw model +metadata: + +```json5 +{ + contextWindow: 393216, + maxTokens: 384000, + compat: { + supportsUsageInStreaming: true, + supportsReasoningEffort: true, + maxTokensField: "max_tokens", + supportsStrictMode: false, + thinkingFormat: "deepseek", + supportedReasoningEfforts: ["low", "medium", "high", "xhigh", "max"], + }, +} +``` + +## Test + +Start with a direct HTTP check: + +```bash +curl http://127.0.0.1:18000/v1/chat/completions \ + -H 'content-type: application/json' \ + -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: ds4-ok"}],"max_tokens":16,"stream":false,"thinking":{"type":"disabled"}}' +``` + +Then test OpenClaw model routing: + +```bash +openclaw infer model run \ + --local \ + --model ds4/deepseek-v4-flash \ + --thinking off \ + --prompt "Reply with exactly: openclaw-ds4-ok" \ + --json +``` + +For a full agent and tool-call smoke, use a context of at least 32768: + +```bash +openclaw agent \ + --local \ + --session-id ds4-tool-smoke \ + --model ds4/deepseek-v4-flash \ + --thinking off \ + --message "Use the shell command pwd once, then reply exactly: tool-ok " \ + --json \ + --timeout 240 +``` + +Expected result: + +- `executionTrace.winnerProvider` is `ds4` +- `executionTrace.winnerModel` is `deepseek-v4-flash` +- `toolSummary.calls` is at least `1` +- `finalAssistantVisibleText` starts with `tool-ok` + +## Troubleshooting + + + + ds4 is not running or not bound to the host and port in `baseUrl`. Start + `ds4-server`, then retry: + + ```bash + curl http://127.0.0.1:18000/v1/models + ``` + + + + + The configured `--ctx` is too small for the OpenClaw turn. Raise + `ds4-server --ctx`, then update `models.providers.ds4.models[].contextWindow` + to match. Full agent turns with tools need substantially more context than a + direct one-message curl request. 
+### Think Max does not activate
+
+ds4 only uses Think Max when `--ctx` is at least `393216` and the request
+asks for `reasoning_effort: "max"`. Smaller contexts fall back to high
+reasoning.
+
+### First request is slow or times out
+
+ds4 has a cold Metal residency and model warmup phase. Use
+`localService.readyTimeoutMs: 300000` when OpenClaw starts the server on
+demand.
+
+## Related
+
+- [Local model services](/gateway/local-model-services): start local model
+  servers on demand before model requests.
+- [Local models](/gateway/local-models): choose and operate local model
+  backends.
+- Model providers: configure provider refs, auth, and failover.
+- [DeepSeek](/providers/deepseek): native DeepSeek provider behavior and
+  thinking controls.
diff --git a/docs/providers/index.md b/docs/providers/index.md
index 72d7f599eac..e37bd8c4db9 100644
--- a/docs/providers/index.md
+++ b/docs/providers/index.md
@@ -36,6 +36,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
 - [Cloudflare AI Gateway](/providers/cloudflare-ai-gateway)
 - [ComfyUI](/providers/comfy)
 - [DeepSeek](/providers/deepseek)
+- [ds4 (local DeepSeek V4)](/providers/ds4)
 - [ElevenLabs](/providers/elevenlabs)
 - [fal](/providers/fal)
 - [Fireworks](/providers/fireworks)
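Editor note: the Think Max rule in the new provider page is a conjunction of two conditions, which makes it easy to misdiagnose when only one holds. A sketch of that gate (the context threshold follows the docs; the function name is ours, not a ds4 or OpenClaw API):

```python
# Think Max is active only when BOTH conditions from docs/providers/ds4.md hold:
# the server context is large enough AND the request asks for max effort.
THINK_MAX_MIN_CTX = 393216

def think_max_active(server_ctx: int, reasoning_effort: str) -> bool:
    return server_ctx >= THINK_MAX_MIN_CTX and reasoning_effort == "max"

print(think_max_active(393216, "max"))   # True
print(think_max_active(32768, "max"))    # False: context too small
print(think_max_active(393216, "high"))  # False: effort below max
```

This mirrors the troubleshooting entry: a `--ctx` below the threshold silently falls back to high reasoning even when the request asks for `max`.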