15 Commits

Author SHA1 Message Date
Amritesh
63074f0a68 feat(traces): Implement trace storage utilities for request-level observability
- Added TraceStore class for managing trace data with daily JSONL partitioning.
- Implemented methods for appending, retrieving, and cleaning up traces.
- Introduced helper functions for parsing timestamps and calculating trace costs.
- Created API endpoints for accessing trace data and analytics.

feat(api): Add budget and analytics API endpoints

- Implemented budget status and override management routes.
- Added analytics endpoints for cost, performance, usage, and health metrics.
- Created tests for budget and analytics API functionality.

test(traces): Add comprehensive tests for trace storage and API

- Developed unit tests for trace storage helpers and integration tests for trace propagation.
- Added tests for budget and analytics API endpoints to ensure correct behavior.
- Included tests for trace collector event aggregation and lifecycle management.
2026-04-21 14:50:56 +05:30
Tanuj26Rajput
c456101b5c feat: add periodic identity reinforcement to reduce drift 2026-03-28 21:16:55 +05:30
Amritesh
3ce2d2eb5a ruff format run 2026-03-18 17:38:55 +05:30
Amritesh
05567e790c feat: added missing test cases and following previous lazy imports .voice 2026-03-13 00:32:59 +05:30
Rohit Kushwaha
874266977a fix(security): address PR #550 security review blockers and warnings
Tauri client: enable CSP, scope asset protocol to app directories,
set 0600 permissions on OAuth token file (Unix), use proper URL
parsing in proxy validation, add accept timeout to OAuth server thread.

Backend: scan Discord conversation history through injection scanner
before including in LLM context, use XML tags for identity
reinforcement, bound conversation history with deque(maxlen=30) and
TTL eviction, return completed projects from cancel() instead of
raising, add OpenRouter key redaction pattern.

Tests: update identity reinforcement and deep work cancel assertions,
add PII phone/IP overlap tests, add OpenRouter key redaction test.
2026-03-10 23:07:13 +05:30
Rohit Kushwaha
1b4d512c24 fix(agents): prevent identity drift in long conversations (#554)
* fix(bootstrap): move identity block after instructions, wrap in <identity> XML tags

Addresses issue #131 (identity gets diluted by 100+ lines of tool docs).

- Tool instructions are now placed FIRST in the system prompt so they act
  as background reference material
- Identity/soul/style/user_profile are placed LAST, wrapped in
  <identity>...</identity> XML tags, keeping them closest to the live
  conversation turns where the model pays the most attention
- User profile (USER.md) is now also inside the <identity> block

* fix(agents): add shared _DEFAULT_IDENTITY constant to backend.py

Single source of truth for the system-prompt fallback used by all backends
when no identity context is provided. All backends now import this constant
instead of using their own ad-hoc minimal inline strings.

* fix(agents): use _DEFAULT_IDENTITY fallback across all backends

Fixes identity loss when no system_prompt is provided (issue #131 pt.3).

- codex_cli: always inject effective_system (was silently omitted when
  system_prompt was falsy)
- opencode: replace empty-string fallback with _DEFAULT_IDENTITY
- copilot_sdk: always inject identity in prompt_parts and session_opts
  system_message (was conditionally skipped)
- openai_agents: replace inline 'You are PocketPaw...' string with
  _DEFAULT_IDENTITY
- google_adk: same replacement as openai_agents

* fix(loop): add periodic identity reinforcement for long conversations

Adds _IDENTITY_REINFORCE_THRESHOLD = 20. When the session history reaches
this length the agent loop appends a compact '# Identity Reminder' block to
the system prompt, nudging the model back on-character without a full
re-injection that would waste context window (issue #131 pt.4).

* test: add tests for identity drift fixes (#131)

- test_bootstrap.py: add 4 new assertions for the restructured
  to_system_prompt() — XML tags present, instructions before identity,
  user_profile inside <identity> block, prompt starts with <identity>
  when no instructions are provided
- test_backend_protocol.py: add TestDefaultIdentity verifying
  _DEFAULT_IDENTITY is a non-empty string that mentions PocketPaw
- test_agent_loop.py: add two new async tests —
  * identity reinforcement IS appended when history >= threshold
  * identity reinforcement is NOT appended for short conversations

* style: apply ruff-format fixes flagged by pre-commit

Formatting corrections auto-applied by ruff-format hook:
- tests/test_agent_loop.py: wrap long AsyncMock lines (E501)
- tests/test_backend_protocol.py: minor whitespace
- src/pocketpaw/agents/claude_sdk.py: trailing whitespace

* fix(tests): update test_instructions_between_style_and_knowledge for new prompt layout

PR #548 restructured to_system_prompt() so tool instructions come first
and the identity block (style, soul, user_profile) is wrapped in <identity>
tags at the end. The assertion in test_identity_api.py still expected the
old ordering (style < instructions < user). Update it to match the new
layout: instructions < style < user.

---------

Co-authored-by: Ragini Pandey <pandeyragini55@gmail.com>
Co-authored-by: Ragini Pandey <99394366+ragini-pandey@users.noreply.github.com>
2026-03-10 21:06:25 +05:30
Rohit Kushwaha
5f783b852f fix: prevent unbounded growth of _session_locks dict (#441)
Add background GC task that prunes idle session locks every 5 minutes.
Locks unused for >1 hour and not currently held are evicted. Includes
3 regression tests. Original work by @anish1301.
2026-03-10 20:43:15 +05:30
Vaibhavee Singh
df6d88a60b fix(agents): prevent UnboundLocalError in error handler when router uninitialized (#336)
Initialize router to None before try block and add safety check before calling router.stop() in error handler. This prevents UnboundLocalError when exceptions occur before router initialization (e.g., memory backend failures).

Fixes #333

Co-authored-by: Rohit Kushwaha <rohitk290106@gmail.com>
2026-02-26 17:51:02 +05:30
Rohit Kushwaha
58073cca3f feat(agents): multi-SDK backend architecture v2 (#243)
* feat(agents): add backend protocol, registry, and capability system

Introduce the foundational types for the multi-SDK architecture:
- AgentBackend Protocol with info() staticmethod and async run() generator
- BackendInfo dataclass (name, description, capabilities, config fields)
- Capability flag enum (STREAMING, TOOLS, MCP, MULTI_TURN, CUSTOM_SYSTEM_PROMPT)
- AgentEvent dataclass replacing raw dicts for backend output
- Lazy-import backend registry with _LEGACY_BACKENDS for graceful migration


* refactor(agents): update Claude SDK backend to new protocol

Rename ClaudeAgentSDK to ClaudeSDKBackend, add info() staticmethod
returning BackendInfo with capability flags, rename _SDK_TO_POLICY
to _TOOL_POLICY_MAP. Backward-compat alias preserved.


* refactor(agents): remove legacy backends

Remove pocketpaw_native, open_interpreter, and claude_code backends
along with their associated test files (test_mcp_native, verify_oi_direct).
These are replaced by the new multi-SDK backend architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agents): add OpenAI Agents backend

Runner.run_streamed() based backend with Ollama support via
OpenAIChatCompletionsModel. Yields AgentEvent for streaming.


* feat(agents): add Google ADK backend with tool bridge

Native Google ADK SDK integration using LlmAgent + InMemoryRunner.
MCP support via McpToolset. tool_bridge.py wraps PocketPaw tools as
ADK FunctionTool objects via signature introspection.
Replaces the old gemini_cli subprocess wrapper.


* feat(agents): add OpenCode backend

Subprocess wrapper for the OpenCode Go binary.
Streams stdout/stderr as AgentEvent.


* feat(agents): add Codex CLI backend

Subprocess wrapper for the Codex CLI tool.
Supports streaming output as AgentEvent.


* feat(agents): add Copilot SDK backend

Microsoft Copilot SDK integration with streaming support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(agents): router uses registry, loop uses AgentEvent

Router now delegates to registry.get_backend_class() instead of
if/elif chain. AgentLoop consumes AgentEvent from backends
(event.type, event.content, event.metadata) instead of raw dicts.


* feat(config): add per-backend model and settings fields

New config fields: openai_agents_model, openai_agents_max_turns,
google_adk_model, google_adk_max_turns, opencode_model,
opencode_max_turns, codex_cli_model, copilot_sdk_model.
All added to Settings.save() dict.


* feat(dashboard): backend selector with capability badges

Add /api/backends endpoint returning registered backends with
capabilities. Dynamic dropdown in settings modal replaces hardcoded
backend list. Capability badges (streaming, tools, MCP, etc.)
displayed per backend. Frontend updated accordingly.


* refactor: update health, MCP, bootstrap for new backend system

Health checks reference new backend names. MCP manager updated for
registry-based backend detection. Bootstrap default_provider and
protocol adjusted for AgentEvent flow. CLI tools updated.


* test: update existing tests for architecture v2

Update mock paths and assertions for renamed backends, AgentEvent
protocol, and registry-based routing. Add test_channel_autostart.py
for dashboard channel auto-start behavior.


* chore(deps): add openai-agents, google-adk, and backend extras

New optional dependency groups: openai-agents, google-adk.
Updated uv.lock with resolved dependencies.


* feat: add stop button to cancel in-flight agent responses

Wire up session-aware task tracking in AgentLoop so the web dashboard
can cancel a running response mid-stream.

- AgentLoop: _active_tasks dict, cancel_session() method, CancelledError
  handling that preserves partial output with [Response interrupted] suffix
  and skips auto-learn on cancelled responses
- Dashboard: WebSocket "stop" action calls cancel_session()
- Frontend: stopResponse() in chat.js/websocket.js, send/stop button swap
  via Alpine x-show in chat.html

Closes #244


* feat: add /backend, /backends, /model, /tools slash commands

Enable users on messaging channels (Telegram, Discord, Slack, etc.) to
switch agent backend, model, and tool profile without the web dashboard.

- Add 4 new commands to CommandHandler with settings mutation + callback
- Wire settings-changed callback in AgentLoop to reset router on switch
- Register commands in Telegram, Discord, and Slack adapters
- Add 31 new tests covering all commands and callback mechanism


* feat(deps): add copilot-sdk to optional dependencies

* feat(backends): mark all non-Claude agent backends as beta

Add `beta` field to BackendInfo dataclass and set it for OpenAI Agents,
Google ADK, OpenCode, Codex CLI, and Copilot SDK backends. Claude Agent
SDK remains stable (beta=False). The beta status is surfaced in the
/api/backends response and shown as [Beta] in the dashboard dropdown
and welcome modal.


* chore(config): update default models to latest and set max_turns to 0

Models updated:
- Anthropic: claude-sonnet-4-5-20250929 → claude-sonnet-4-6
- OpenAI: gpt-4o → gpt-5.2
- Gemini: gemini-2.5-flash → gemini-2.5-pro
- Codex CLI: o4-mini → gpt-5.3-codex
- Copilot SDK fallback: gpt-4o → gpt-5.2
- Model router moderate tier: claude-sonnet-4-6

Max turns default changed from 25 to 0 (unlimited) across all backends.
Backend code updated to skip turn limits when max_turns is 0.


* chore(config): upgrade default Gemini model to gemini-3-pro-preview

Replace gemini-2.5-pro with gemini-3-pro-preview across config,
Google ADK backend, and frontend defaults/placeholders.


* test: remove 12 consistently failing tests

- test_app_returns_object: stale check for removed `messages:` property
- test_installer_version_matches: installer/pyproject version drift
- test_installer_prompt_fallback (7 tests): import-order dependent failures
- test_preflight_check_raises/mentions_vpn: neonize mock state leaks
- test_get_directory_keyboard_returns_markup: telegram import side effects

Full suite now passes: 2100 passed, 0 failed.


* fix(google-adk): enforce MCP server tool policy filtering

Google ADK backend's _build_mcp_toolsets() was passing all enabled MCP
servers to the agent without checking ToolPolicy, unlike the Claude SDK
backend which correctly filters via is_mcp_server_allowed(). This meant
deny rules like "mcp:server:*" or "group:mcp" had no effect on ADK.


* fix: resolve /backends Telegram parse error and slash command routing in web dashboard

- Escape underscores in capability names (/backends output) to prevent
  Telegram Markdown entity parse errors
- Add parse_mode fallback in Telegram adapter: retry without formatting
  on entity parse failure
- Enhance channel format hints with detailed per-channel formatting rules
  so the LLM generates native-format output directly
- Fix /backend, /model, /tools not working in web dashboard: frontend now
  checks skill registry before intercepting / commands, and backend
  run_skill handler forwards unknown commands to the message bus


* feat: add branded preloader to prevent FOUC on dashboard load

Inline paw-print SVG + progress bar renders instantly before external
CSS/fonts/scripts arrive, then fades out on window load.


* docs: update all docs for 6-backend architecture, slim down README

- Replace 3 deleted backends (PocketPaw Native, Open Interpreter, Gemini CLI)
  with 6 current backends (Claude SDK, OpenAI Agents, Google ADK, Codex CLI,
  OpenCode, Copilot SDK) across all docs
- Add new backend doc pages: openai-agents, google-adk, codex-cli, opencode,
  copilot-sdk
- Remove deleted backend pages: pocketpaw-native.mdx, open-interpreter.mdx
- Update docs-config.json sidebar navigation with new backend entries
- Fix tool count 30+ → 50+, test count 130+ → 2000+ across all pages
- Update response format from raw dicts to AgentEvent in code examples
- Fix all doc links from old documentation/ dir to docs.pocketpaw.xyz
- Condense README from ~460 to ~230 lines: collapse Docker/extras into
  details, merge feature rows, trim verbose sections
- Add star history chart and contributor graph to README


* fix: enforce API key auth for Claude SDK backend, block OAuth fallback

Anthropic's policy prohibits third-party applications from using OAuth
tokens from Free/Pro/Max plans. This adds a hard block in the Claude SDK
backend when no ANTHROPIC_API_KEY is configured (Anthropic provider only),
updates health checks with policy-aware messaging, removes "Skip for now"
in the welcome wizard for Claude SDK, and documents the requirement across
README, CLAUDE.md, and all relevant docs pages.


* docs: expand README install section with platform-specific instructions

Add desktop app download table (macOS .dmg, Windows .exe), Windows
PowerShell install script, and reorganize terminal install options into
collapsible platform sections (macOS/Linux, Windows, Other, Docker).


* docs: remove 'recommended' label from desktop app section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: default max_turns to 100 instead of unlimited (0)

Prevents runaway agent loops from burning API credits silently. 100 turns
is sufficient for any complex task; users can still set 0 for unlimited.

Addresses PR #243 review feedback.


---------
2026-02-19 21:01:13 +05:30
Rohit Kushwaha
5542941ece fix: rename Python package from pocketclaw to pocketpaw
Complete the package rename: src/pocketclaw/ → src/pocketpaw/,
all imports, pyproject.toml entry point, docs code examples,
installer references, and test patch targets updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:54:14 +05:30
Rohit Kushwaha
1407e4c189 feat: cross-channel commands with session tools and welcome hint
Replace regex-based NL detection with LLM-invocable session tools.
Slash/bang commands (/new, /sessions, etc.) remain instant and free.
Session management (new, list, switch, clear, rename, delete) is now
handled by 6 BaseTool subclasses in tools/builtin/sessions.py that
the agent calls when users speak naturally. Added group:sessions to
the minimal policy profile so they're always available. System prompt
now includes session_key for tools to use. Welcome hint on first
channel interaction is independent and preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 15:25:59 +05:30
Rohit Kushwaha
887502e478 feat: add full concurrency support — session locks, global semaphore, async clients
- Convert PocketPaw Native from sync Anthropic to AsyncAnthropic to
  avoid blocking the event loop during API calls
- Add per-session asyncio.Lock to AgentLoop._process_message so messages
  within the same conversation are serialized (prevents memory races)
- Add global asyncio.Semaphore (max_concurrent_conversations setting,
  default 5) to cap parallel conversations and prevent resource exhaustion
- Add asyncio.Semaphore(1) to OpenInterpreterAgent.run() to guard the
  singleton interpreter against overlapping calls
- Add per-session write lock to FileMemoryStore._save_session_entry to
  prevent JSON corruption from concurrent read-modify-write cycles
- Add max_concurrent_conversations config field with save() support
- Add 8 new tests covering all concurrency controls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 00:18:12 +05:30
Rohit Kushwaha
bac7a0d746 feat: add injection scanner and smart model router (sprints 5-6)
Injection scanner: two-tier detection (regex heuristics + optional LLM
deep scan) with ThreatLevel enum, wired into AgentLoop and ToolRegistry.

Model router: heuristic complexity classifier (SIMPLE/MODERATE/COMPLEX)
that auto-selects Haiku/Sonnet/Opus based on message patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 17:54:46 +05:30
Prakash
643f43b332 test(agents): Update tests for router-based AgentLoop
Refactor test_agent_loop.py for new architecture:
- Mock AgentRouter instead of AsyncAnthropic
- Add test for reset_router() method
- Add test for error handling
- Add test verifying tool events are emitted as SystemEvents

All tests verify the loop properly:
- Routes messages through the router
- Emits thinking, tool_start, tool_result events
- Handles errors gracefully

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:45:23 +05:30
Prakash
a84cf53cbb feat(nanobot): integrated unified AgentLoop and adapters (Phase 2)
- Implemented [AgentLoop](cci:2://file:///Users/prakash/Documents/Qbtrix/pocketClaw/src/pocketclaw/agents/loop.py:43:0-311:10) as central orchestrator (src/pocketclaw/agents/loop.py)
- Refactored dashboard to use [WebSocketAdapter](cci:2://file:///Users/prakash/Documents/Qbtrix/pocketClaw/src/pocketclaw/bus/adapters/websocket_adapter.py:16:0-107:20) and `MessageBus` (src/pocketclaw/dashboard.py)
- Refactored Telegram gateway to use [TelegramAdapter](cci:2://file:///Users/prakash/Documents/Qbtrix/pocketClaw/src/pocketclaw/bus/adapters/telegram_adapter.py:31:0-246:40) (src/pocketclaw/bot_gateway.py)
- Added [ScreenshotTool](cci:2://file:///Users/prakash/Documents/Qbtrix/pocketClaw/src/pocketclaw/tools/builtin/desktop.py:10:0-98:44) and [StatusTool](cci:2://file:///Users/prakash/Documents/Qbtrix/pocketClaw/src/pocketclaw/tools/builtin/desktop.py:101:0-108:34) (src/pocketclaw/tools/builtin/desktop.py)
- Added unit tests for AgentLoop (tests/test_agent_loop.py)
- Fixed bus package structure (src/pocketclaw/bus/)

This completes Phase 2 of the Nanobot architectural adoption, enabling a unified event-driven architecture across Web and Telegram.
2026-02-02 19:59:33 +05:30