Current AgentLoop unconditionally observes into the process-global
default PocketPaw soul, so cloud chats with specific agents evolve
the wrong soul. Spec now routes soul observe + self-eval to the
target agent's SoulManager via AgentPool and gates the global
observe behind a per-run flag (default off in OSS).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separate /cloud/chat/{scope}/{scope_id}/agent SSE endpoint with
scope-aware context (dm/group/pocket), pocket-scoped tools, ripple
pass-through, and WS broadcast of finished messages. Shares the
AgentLoop engine with OSS; /api/v1/chat.py stays untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as the previous commit — patching pocketpaw.security.guardian.get_settings raises AttributeError because get_settings is imported lazily inside __init__ (circular-import avoidance). Patch pocketpaw.config.get_settings instead. Updates all three occurrences in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Patching pocketpaw.security.guardian.get_settings fails because get_settings is imported lazily inside GuardianAgent.__init__ to avoid a circular import (config → security.url_validators → security/__init__ → guardian). Patch pocketpaw.config.get_settings (the real source) instead.
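A minimal sketch of why the patch target matters, using two throwaway in-memory modules (`demo_config` and `demo_guardian` are hypothetical stand-ins, not the real pocketpaw modules). Because the import happens inside `__init__`, the name never becomes an attribute of the guardian module, so `mock.patch` has nothing to replace there:

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the config module (the real source of get_settings).
config = types.ModuleType("demo_config")
config.get_settings = lambda: {"model": "real"}
sys.modules["demo_config"] = config

# Hypothetical stand-in for the guardian module.
guardian = types.ModuleType("demo_guardian")
sys.modules["demo_guardian"] = guardian


class GuardianAgent:
    def __init__(self):
        # Lazy import: get_settings is looked up on demo_config at call time
        # and is never bound as a module-level name on demo_guardian.
        from demo_config import get_settings

        self.settings = get_settings()


guardian.GuardianAgent = GuardianAgent


def try_patch(target):
    """Return the settings the agent sees under the patch, or the error name."""
    try:
        with mock.patch(target, return_value={"model": "fake"}):
            return guardian.GuardianAgent().settings
    except AttributeError:
        return "AttributeError"
```

In this sketch, `try_patch("demo_guardian.get_settings")` reports the AttributeError, while `try_patch("demo_config.get_settings")` takes effect because the lazy import resolves against the patched source module.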
Also add tests/test_logging_scrub.py to the secrets-scan exclude list alongside test_redact.py and test_pii.py — the xoxb- string is a required scrubber-test fixture, not a real credential.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(budget): improve budget enforcement logic and remove blocking behavior in record method
feat(alerts): ensure internal flags are not exposed in API responses
feat(traces): add session ID validation to prevent path traversal attacks
test: add tests for analytics gap fixes and ensure proper trace cleanup on cancellation
Code review caught that the shallow-copy fallback in _safe_publish would
raise TypeError when deepcopy failed AND msg.metadata/msg.media was None
(callers can pass None explicitly even though the dataclass default is an
empty container). TypeError would be swallowed by the outer gather's
return_exceptions=True, silently dropping delivery with only a log entry.
Use falsy-safe fallbacks so None/empty both resolve to fresh containers.
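A rough sketch of the falsy-safe fallback described above. The `OutboundMessage` dataclass and the `isolated_containers` helper are illustrative assumptions, not the actual `_safe_publish` code:

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OutboundMessage:  # hypothetical stand-in for the real message type
    content: str
    metadata: Optional[dict] = field(default_factory=dict)
    media: Optional[list] = field(default_factory=list)


class NoCopy:
    """Test double whose deepcopy always fails."""

    def __deepcopy__(self, memo: dict) -> Any:
        raise RuntimeError("cannot deepcopy")


def isolated_containers(msg: OutboundMessage) -> tuple:
    """Return fresh (metadata, media) containers for one subscriber."""
    try:
        # `or` turns both None and empty containers into fresh ones.
        return copy.deepcopy(msg.metadata) or {}, copy.deepcopy(msg.media) or []
    except Exception:
        # Shallow-copy fallback. dict(None) / list(None) would raise
        # TypeError, so substitute an empty container before copying.
        return dict(msg.metadata or {}), list(msg.media or [])
```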
Same Py 3.12 fix pattern as test_api_chat.py — get_event_loop() no longer
auto-creates a loop in the main thread, causing RuntimeErrors and downstream
pymongo teardown errors when these tests run in the full suite. new_event_loop()
works consistently across Py 3.11/3.12/3.13.
Fixes 5 test_deep_work_v2 failures and 13 test_audit errors exposed when
running without -x.
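The replacement pattern, sketched with a hypothetical helper (`run_in_fresh_loop` is illustrative and not a name from the test suite). Creating the loop explicitly means the test does not depend on whether a current loop happens to be set:

```python
import asyncio


async def answer() -> int:
    # Placeholder coroutine standing in for the code under test.
    await asyncio.sleep(0)
    return 42


def run_in_fresh_loop(coro):
    """Run a coroutine on a newly created loop and always close the loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
```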
The module tests fail-closed scope enforcement but was missing the
module-level `pytestmark = pytest.mark.enforce_scope`, so the root
conftest's _TESTING_FULL_ACCESS bypass was turned on and all scopeless
API key requests went through, returning 200 instead of the expected
403. Matches the pattern used by test_require_scope_enforcement.py.
Pre-existing dev failure; folding the fix into this branch since the
salvage PR needs green CI.
8 files are pre-existing format drift on dev (ee/widget/*, security/url_validators.py, tests); 1 file (bus/queue.py) is new code from the #732 salvage. Applying ruff format so 'uv run ruff format --check .' passes in CI.
These failures already exist on `dev` HEAD — rolling the fixes into this
branch so the salvage PR can go green instead of waiting on a separate
cleanup PR.
- tests/test_api_chat.py — replace `asyncio.get_event_loop()` with
`asyncio.new_event_loop()`. Python 3.12+ removed the implicit
loop-creation behavior in the main thread, breaking all three
`test_stream_*` cases with RuntimeError.
- ee/widget/projection.py — drop unused `field` from `dataclasses`
import (F401).
- tests/test_dos_hardening.py, tests/test_logging_scrub.py — drop
unused `pytest` imports (F401).
- src/pocketpaw/dashboard.py — remove two duplicated
`lifespan`/`startup_event`/`shutdown_event` definitions (F811).
The upper pair was shadowed by the lower redefinitions; the
post-CORS duplicate was fully dead.
- src/pocketpaw/api/v1/auth.py — move `http_utils` import above
`router = APIRouter(...)` to satisfy E402.
The backend collected `token_usage` events but only used them to update
the usage tracker — callers never saw them. Each of the three response
paths (chat REST, SSE bridge, WebSocket `stream_end`) returned no usage
data, making it impossible for clients to display per-message cost/token
counts without polling `/api/usage`.
- `AgentLoop._process_message_inner` now captures the most recent
`token_usage` payload into `last_usage` and attaches it to
`metadata_out["usage"]` on the final `stream_end` OutboundMessage.
- `WebSocketAdapter._send_to_socket` forwards `metadata["usage"]` on
the `stream_end` frame.
Rewritten from the original PR, which had two blocking bugs:
- `last_usage` was referenced before initialization (NameError on
backends that don't emit token_usage).
- `websocket_adapter.py` left stale `send_json` lines after the early
return, producing dead code.
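A simplified sketch of the `last_usage` fix, assuming events arrive as plain dicts with `type` and `payload` keys (an assumed event shape; the real loop is async and works with richer objects). Initializing the variable before the loop is what avoids the NameError when a backend never emits `token_usage`:

```python
from typing import Iterable, Optional


def collect_stream_end_metadata(events: Iterable[dict]) -> dict:
    """Build the metadata attached to the final stream_end message."""
    last_usage: Optional[dict] = None  # defined even if no usage event arrives
    for event in events:
        if event.get("type") == "token_usage":
            last_usage = event.get("payload")  # keep only the most recent one

    metadata_out: dict = {}
    if last_usage is not None:
        metadata_out["usage"] = last_usage
    return metadata_out
```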
Drops the original PR's bundled changes:
- `pocketpaw_usage_repro.py` debug scripts (shouldn't ship).
- `tool_bridge.py` empty-properties fix — already in dev.
- `gmail.py` `"required": []` addition — immediately stripped by the
tool_bridge sanitizer so it's self-defeating.
Co-Authored-By: umarkkhann3 <267705286+umarkkhann3@users.noreply.github.com>
Replace `'unsafe-inline'` in the dashboard's `script-src` with a
per-request nonce generated via `secrets.token_urlsafe(16)` (128 bits).
The nonce is set on `request.state.csp_nonce` and consumed by the
handful of legitimate inline `<script>` blocks in `base.html` via
Jinja template interpolation. External scripts (`<script src="...">`)
are still allowed by the CDN entries already in `script-src`.
`'unsafe-eval'` remains — Alpine.js evaluates attribute directives at
runtime, which requires it. Known trade-off, not a regression.
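A framework-free sketch of the nonce flow. The helper names, the `'self'` source, and the CDN tuple are assumptions for illustration; the real middleware stores the nonce on `request.state.csp_nonce` and the template reads it from there:

```python
import secrets

# Assumed allow-list; the actual CDN entries live in the dashboard's CSP.
CDN_HOSTS = ("https://cdn.jsdelivr.net",)


def new_csp_nonce() -> str:
    """16 random bytes (128 bits), URL-safe base64 encoded."""
    return secrets.token_urlsafe(16)


def script_src(nonce: str, cdn_hosts=CDN_HOSTS) -> str:
    """Build a script-src directive that uses a nonce instead of 'unsafe-inline'."""
    parts = ["'self'", f"'nonce-{nonce}'", "'unsafe-eval'", *cdn_hosts]
    return "script-src " + " ".join(parts)


def inline_script_tag(nonce: str, body: str) -> str:
    """What a legitimate inline block looks like once the template adds the nonce."""
    return f'<script nonce="{nonce}">{body}</script>'
```

A fresh nonce has to be generated for every response; reusing one across requests would let an attacker who has seen it once inject matching inline scripts.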
Also:
- Convert remaining `onclick=` handlers in `plan_approval.html` to
Alpine.js `@click` so CSP no longer blocks them (inline event
attributes are not covered by nonces or `unsafe-eval`).
- Move `import secrets` to the top of `dashboard.py` (it was a shadowed
import inside the middleware body).
- Drop the unrelated `tests/test_a2a_server.py` change from the
original PR, since it is a separate concern.
Co-Authored-By: anish1301 <145433865+anish1301@users.noreply.github.com>
**Trust level corrections**
- `pip_install` (elevated → high): installing arbitrary packages deserves
a WARNING-level audit entry, not INFO.
- `python_exec` (elevated → critical): running arbitrary Python is a
CRITICAL-severity action per the registry's severity mapping.
`elevated` silently mapped to INFO, understating blast radius in the
audit log.
**PII scanner additions**
- Space-separated SSN (`123 45 6789`).
- Contextual bare 9-digit SSN (`ssn: 123456789`).
- Contextual passport numbers (`passport #ABC12345`).
- IBAN — rewritten from the original PR to require the `iban` keyword
and 15+ char total length. The original `\b[A-Z]{2}\d{2}[A-Z0-9]{4,30}\b`
pattern matched arbitrary uppercase strings (ARNs, UUIDs, etc) and
generated too many false positives.
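To illustrate the IBAN change, here is the original pattern next to one possible keyword-anchored version. `CONTEXTUAL_IBAN` is a sketch written for this note (keyword, up to three separator characters, then at least 15 characters in total) and may differ from the pattern that actually landed:

```python
import re

# Pattern from the original PR: matches any uppercase token of this shape.
ORIGINAL_IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{4,30}\b")

# Assumed rewrite: require the "iban" keyword and 2 + 2 + 11 = 15+ characters.
CONTEXTUAL_IBAN = re.compile(
    r"\biban\b\W{0,3}([A-Z]{2}\d{2}[A-Z0-9]{11,30})\b",
    re.IGNORECASE,
)


def find_ibans(text: str) -> list:
    """Return IBAN candidates that appear right after the 'iban' keyword."""
    return [match.group(1) for match in CONTEXTUAL_IBAN.finditer(text)]
```

With these definitions, an ARN such as `arn:aws:iam::123456789012:role/AB12CDEFGHIJKLMNOP` trips the original pattern but yields nothing from `find_ibans`, while `IBAN: DE89370400440532013000` is still detected.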
Drops the original PR's `fetch.py` change — `get_directory_keyboard` no
longer exists in dev (the InlineKeyboardMarkup-None guard is obsolete).
Co-Authored-By: Dhruv18052003-web <177319013+Dhruv18052003-web@users.noreply.github.com>
Telegram is the legacy pairing-only mode; combining it with
--discord/--slack/--whatsapp/etc produced confusing startup state where
the dashboard never came up but the extra channels silently ran. Fail
fast via argparse error so the user gets a clear message.
Uses `getattr(args, flag, False)` so unregistered flags don't crash.
Leaves `_check_extras_installed` and the rest of the main() control
flow untouched (earlier iterations of the PR moved it too early and
broke `doctor`/`status`).
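A small sketch of the fail-fast check. The flag tuple and function names are assumptions; `signal` is deliberately left unregistered on the parser to show why `getattr(args, flag, False)` is used instead of direct attribute access:

```python
import argparse

# Assumed list of channel flags that conflict with --telegram.
CONFLICTING_FLAGS = ("discord", "slack", "whatsapp", "signal")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pocketpaw")
    parser.add_argument("--telegram", action="store_true")
    # "signal" is intentionally not registered in this sketch.
    for flag in ("discord", "slack", "whatsapp"):
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


def check_telegram_conflicts(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None:
    """Exit via parser.error() if --telegram is combined with another channel."""
    if not getattr(args, "telegram", False):
        return
    clashes = [f"--{flag}" for flag in CONFLICTING_FLAGS if getattr(args, flag, False)]
    if clashes:
        # parser.error() prints usage plus the message and exits with status 2.
        parser.error("--telegram cannot be combined with " + ", ".join(clashes))
```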
Reduces the original test to the conflict cases only — asserting on the
non-conflict path would require mocking the full startup pipeline.
Co-Authored-By: Aravindavenge <119057955+Aravindavenge@users.noreply.github.com>
When multiple subscribers are registered on the same channel and the
first mutates `msg.metadata` or an item in `msg.media`, later subscribers
in the same `publish_outbound` call received the mutated state. During
streaming responses (token-by-token) this could corrupt metadata seen by
downstream adapters — e.g. WebSocket and Telegram on the same channel.
Deep-copy metadata + media per subscriber so each gets an isolated view.
Skip the deepcopy when both containers are empty (streaming chunks) to
keep the hot path cheap. Fall back to a shallow copy if deepcopy raises
(e.g. unpickleable objects) — still better than sharing references.
Errors from subscriber callbacks are captured via `return_exceptions=True`
in `gather` rather than re-raised, so one failing subscriber (e.g.
disconnected WebSocket) cannot kill delivery to the rest.
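The isolation and error-capture behavior can be sketched as follows. `OutboundMessage` and `publish_isolated` are simplified stand-ins rather than the real bus API, and handing out fresh empty containers on the cheap path is an assumption of this sketch:

```python
import asyncio
import copy
from dataclasses import dataclass, field, replace


@dataclass
class OutboundMessage:  # hypothetical stand-in for the real message type
    content: str
    metadata: dict = field(default_factory=dict)
    media: list = field(default_factory=list)


async def publish_isolated(msg: OutboundMessage, subscribers) -> list:
    """Deliver msg to every subscriber with its own metadata/media copies."""

    async def deliver(callback):
        if msg.metadata or msg.media:
            view = replace(
                msg,
                metadata=copy.deepcopy(msg.metadata),
                media=copy.deepcopy(msg.media),
            )
        else:
            # Hot path (e.g. streaming chunks): nothing to deep-copy.
            view = replace(msg, metadata={}, media=[])
        await callback(view)

    # return_exceptions=True collects failures instead of raising the first one.
    results = await asyncio.gather(
        *(deliver(cb) for cb in subscribers), return_exceptions=True
    )
    return [r for r in results if isinstance(r, BaseException)]
```

Returning the captured exceptions lets the caller log them, which matters because `gather` would otherwise hide them completely.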
Also:
- `broadcast_outbound` now delegates to `publish_outbound` per channel,
inheriting the same isolation.
- Added regression test `test_outbound_message_isolation`.
- Fix F401/F821 lint nits (unused imports, missing asyncio import).
Co-Authored-By: Vansh0204 <183680538+Vansh0204@users.noreply.github.com>
Both fields carry authentication material but were missing from
SECRET_FIELDS, so the `config show` / dashboard config panels could
surface them unredacted and they were not moved to encrypted storage
by the credential migration path.
Scopes the original PR back to just the credential-leak fix. The other
bundled changes (bug-report template deletion, events timezone fix,
Guardian model pinning, AgentLoop settings refactor, etc.) are dropped
per review feedback — each deserves its own PR.
Closes #765.
Co-Authored-By: aboutttmalay <138196355+aboutttmalay@users.noreply.github.com>
The Activity UI dropped unknown `type` values on the floor instead of
showing them, making debugging of agent lifecycle events
(`agent_start`/`agent_end`) effectively impossible.
Render a fallback entry for unrecognized event types with the raw type
label + content so operators can see the full event stream rather than
a filtered subset.
Drops the unrelated `system_manual.md` addition from the original PR
per review feedback — project docs should land in a separate PR.
Co-Authored-By: kaustubh-d-IITR <215634129+kaustubh-d-IITR@users.noreply.github.com>
On Windows filesystems the mtime resolution can be coarse enough that a
file edited in place keeps the same mtime across edits, causing the
identity cache to serve stale content.
Include file size in the cache key so any size change invalidates the
cache even when mtime is unchanged. Keeps the fast path (single stat
call) and the original mtime check.
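A minimal sketch of a cache keyed on both mtime and size. The `read_cached` helper and the module-level dict are illustrative assumptions, not the project's identity cache:

```python
import os
from pathlib import Path

# path -> ((mtime_ns, size), cached text)
_cache: dict = {}


def read_cached(path: str) -> str:
    """Return file contents, re-reading only when mtime or size changed."""
    st = os.stat(path)  # single stat call on the fast path
    key = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(str(path))
    if hit is not None and hit[0] == key:
        return hit[1]
    text = Path(path).read_text(encoding="utf-8")
    _cache[str(path)] = (key, text)
    return text
```

An edit that preserves both mtime and byte length would still be served from the cache in this sketch; the size check narrows the stale window rather than closing it.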
Fixes ruff E302 (two blank lines before defs) flagged in review.
Co-Authored-By: Ayush-yadav7890 <205494381+Ayush-yadav7890@users.noreply.github.com>
The file-viewer highlight.js assets were loaded from cdnjs.cloudflare.com,
which is not in the dashboard's CSP. The result: highlight.js silently
failed to load and syntax highlighting never ran.
Move to cdn.jsdelivr.net (already allowed in script-src/style-src) —
the npm path for styles and the upstream cdn-release repo for the
minified runtime. Both URLs verified reachable.
Drops the unrelated test_neonize_adapter.py AsyncMock tweak from the
original PR (separate follow-up if still needed).
Co-Authored-By: PrinceSharma402 <202914754+PrinceSharma402@users.noreply.github.com>
Replaces `manager._store._notifications.values()` reach-through in the
`/api/mission-control/notifications` endpoint with a public
`get_all_notifications()` method on `MissionControlManager` and the underlying
store, matching the existing `get_notifications_for_agent()` pattern.
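The shape of the change, sketched with simplified classes (the constructor signatures and the `NotificationStore` name are assumptions; only the `get_all_notifications()` method name and the `_store` / `_notifications` attributes come from the description above):

```python
class NotificationStore:  # hypothetical name for the underlying store
    def __init__(self) -> None:
        self._notifications: dict = {}

    def add(self, notification_id: str, notification: dict) -> None:
        self._notifications[notification_id] = notification

    def get_all_notifications(self) -> list:
        """Public accessor; returns a list so callers cannot mutate the dict."""
        return list(self._notifications.values())


class MissionControlManager:
    def __init__(self, store: NotificationStore) -> None:
        self._store = store

    def get_all_notifications(self) -> list:
        # The endpoint calls this instead of manager._store._notifications.
        return self._store.get_all_notifications()
```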
Co-Authored-By: Karan20P <42742074+Karan20P@users.noreply.github.com>
Comprehensive implementation of analytics, tracing, and budget enforcement layers:
- Budget Enforcement: Added global and per-agent monthly caps in config.py. Implemented
point-of-call enforcement in UsageTracker.record() with fail-safe logic for unknown
models. Added preflight blocking in AgentLoop and support for temporary dashboard overrides.
- Request Tracing: Implemented trace_id propagation across the bus. Added TraceCollector
to assemble and persist request-level telemetry (tool calls, LLM costs, latency) to
rotated JSONL storage.
- Analytics API: Created async, non-blocking endpoints for cost (including pro-rata
tool attribution), performance, usage, and health. Integrated audit-log based
guardian block rates and ChannelHealthStore uptime metrics.
- Alerting & Monitoring: Implemented AlertManager for periodic threshold checks
(budget, error spikes, tool degradation) and wired it into the app lifecycle.
- Dashboard UI: Developed a multi-tab Analytics modal (Overview, Budget, Alerts)
with real-time unread badges, budget bars, cost-by-tool tables, and health timelines.
- Testing: Added 16+ targeted tests covering budget edge cases, alert flag consistency,
and audit log parsing.
- Added TraceStore class for managing trace data with daily JSONL partitioning.
- Implemented methods for appending, retrieving, and cleaning up traces.
- Introduced helper functions for parsing timestamps and calculating trace costs.
- Created API endpoints for accessing trace data and analytics.
feat(api): Add budget and analytics API endpoints
- Implemented budget status and override management routes.
- Added analytics endpoints for cost, performance, usage, and health metrics.
- Created tests for budget and analytics API functionality.
test(traces): Add comprehensive tests for trace storage and API
- Developed unit tests for trace storage helpers and integration tests for trace propagation.
- Added tests for budget and analytics API endpoints to ensure correct behavior.
- Included tests for trace collector event aggregation and lifecycle management.
Closes #703. Seven Settings URL fields were bare str with no validation:
opencode_base_url, litellm_api_base, openai_compatible_base_url,
mem0_ollama_base_url, embedding_base_url, signal_api_url,
mcp_client_metadata_url. An operator (or a compromised settings write
path) pointing any of them at http://169.254.169.254/ would pull cloud
metadata on the next request cycle; pointing at file:///etc/passwd
could read local files depending on the HTTP client's URL handling.
Fix: a new `security/url_validators.py` exposes
`validate_external_url`, a pydantic AfterValidator that rejects
* non-http/https schemes (file://, ftp://, gopher://, ...)
* loopback, RFC1918, link-local (169.254/16 = EC2 metadata),
carrier-grade NAT, and 0.0.0.0/8 hosts unless
POCKETPAW_ALLOW_INTERNAL_URLS is set to true
Dev defaults (localhost:4096 opencode, localhost:11434 ollama, etc) rely
on the opt-in flag, which tests flip on via a conftest-level os.environ
default. OSS / production deployments leave it off and get a loud
ValidationError at Settings() construction if anything points inside.
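A stdlib-only sketch of the validation rules listed above. The network list (including the IPv6 entries), the explicit `localhost` check, and the error messages are assumptions of this sketch; it also does not resolve DNS names, so a hostname that resolves to an internal address would pass here:

```python
import ipaddress
import os
from urllib.parse import urlsplit

# Assumed block list covering the ranges named in the commit message.
_BLOCKED_NETWORKS = [
    ipaddress.ip_network(net)
    for net in (
        "127.0.0.0/8",     # loopback
        "10.0.0.0/8",      # RFC1918
        "172.16.0.0/12",   # RFC1918
        "192.168.0.0/16",  # RFC1918
        "169.254.0.0/16",  # link-local (cloud metadata)
        "100.64.0.0/10",   # carrier-grade NAT
        "0.0.0.0/8",
        "::1/128",         # IPv6 loopback (assumption)
        "fe80::/10",       # IPv6 link-local (assumption)
    )
]


def validate_external_url(url: str) -> str:
    """Return url unchanged, or raise ValueError if it looks unsafe."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    if os.environ.get("POCKETPAW_ALLOW_INTERNAL_URLS", "").lower() == "true":
        return url  # opt-in for dev defaults such as localhost
    if host.lower() == "localhost":
        raise ValueError("internal host not allowed: localhost")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # a DNS name; not resolved in this sketch
    if any(ip in net for net in _BLOCKED_NETWORKS):
        raise ValueError(f"internal address not allowed: {host}")
    return url
```

In pydantic v2 a function with this signature can be attached to a field as `Annotated[str, AfterValidator(validate_external_url)]`, which surfaces the ValueError as a ValidationError when the settings object is constructed.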
Tests: tests/test_config_url_validation.py — 8 cases covering validator
logic + Settings integration.
Reproduces #703 — seven URL config fields in pocketpaw.config.Settings
are bare str, no validation. An operator pointing opencode_base_url at
169.254.169.254/ harvests cloud metadata; pointing signal_api_url at
file:/// reads local files.