---
title: "Deep Work: Long-Running Autonomous Projects"
description: "Deep Work is PocketPaw's AI-powered project orchestrator: describe a project and it researches the domain, writes a PRD, decomposes tasks with dependencies, assembles an agent team, and executes autonomously."
section: Advanced
ogType: article
keywords: ["deep work", "project planning", "task decomposition", "multi-agent", "autonomous execution", "prd", "retry", "timeout", "pawkit", "goal intake", "clarification"]
tags: ["advanced", "orchestration", "multi-agent"]
---

# Deep Work: Long-Running Autonomous Projects

Deep Work is PocketPaw's orchestration system for complex, multi-step projects. Describe what you want to build, and PocketPaw researches the domain, writes a product requirements document, breaks it into tasks with dependencies, assembles an agent team, and executes everything autonomously.

## How It Works

<Steps>
<Step title="Describe Your Project">
Provide a natural-language description of what you want to build or accomplish.
</Step>
<Step title="Clarify (optional)">
If the goal is vague, Deep Work asks a few clarification questions first and folds your answers back into the goal. A clear goal skips this step. See [Interactive Goal Intake](#interactive-goal-intake).
</Step>
<Step title="AI Research & Planning">
The planner agent researches the domain, writes a PRD, decomposes the project into atomic tasks, and recommends an agent team.
</Step>
<Step title="Review & Approve">
Review the generated plan (tasks, dependencies, time estimates, and team) in the dashboard. Approve when ready.
</Step>
<Step title="Autonomous Execution">
Tasks execute in dependency order. Agent tasks run via Claude; human tasks notify you for manual completion. Failed tasks retry automatically. Progress streams in real time.
</Step>
<Step title="Completion">
When all tasks finish, the project is marked complete and deliverables are saved to disk.
</Step>
</Steps>

## Interactive Goal Intake

Deep Work plans best from a well-formed goal: one with a verifiable end state, not a wish. A developer can write one straight off ("Build a REST API for recipe management with JWT auth, deployed on Fly.io"). A goal like "chase down overdue invoices" is too vague to plan well: how overdue, on what channel, do reminders escalate?

Intake mode closes that gap. Before planning, Deep Work analyzes the goal and, if it is ambiguous, asks a short set of clarification questions. Your answers are folded back into the goal, and the planner works from the enriched version.

**Intake is optional.** A goal the analyzer judges well-formed skips the conversation entirely and goes straight to planning, so the one-shot path is unchanged.

### How intake works

<Steps>
<Step title="Analyze">
The goal parser inspects the goal. A clear goal produces no clarification questions; a vague one produces up to four.
</Step>
<Step title="Ask">
Each clarification question is surfaced to you. In the dashboard chat, this is a normal back-and-forth turn.
</Step>
<Step title="Fold">
Your answers are appended to the goal as a "Clarifications gathered during intake" block, producing an enriched goal. Skipped (blank) answers are dropped.
</Step>
<Step title="Plan">
The planner runs against the enriched goal. The intake transcript is stored on the project so the dashboard can show what shaped the plan.
</Step>
</Steps>

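The fold step can be sketched in a few lines of Python. This is an illustration only, not PocketPaw's actual code: the function name `fold_answers` and the exact layout of the appended block are assumptions. Only the block title and the rule that blank answers are dropped come from the steps above.

```python
def fold_answers(goal: str, answers: list[dict[str, str]]) -> str:
    """Append non-blank answers to the goal as a clarifications block.

    Hypothetical helper: the name and the block layout are assumptions.
    """
    kept = [
        (a["question"].strip(), a["answer"].strip())
        for a in answers
        if a.get("answer", "").strip()  # skipped (blank) answers are dropped
    ]
    if not kept:
        # Nothing to fold in, so the goal is returned unchanged.
        return goal
    lines = [goal.rstrip(), "", "Clarifications gathered during intake:"]
    lines += [f"- {question} {answer}" for question, answer in kept]
    return "\n".join(lines)


enriched = fold_answers(
    "Chase down overdue invoices",
    [
        {"question": "How many days overdue before an invoice counts?", "answer": "30 days"},
        {"question": "Which channel should reminders go out on?", "answer": ""},
    ],
)
```

With no usable answers the goal passes through untouched, which mirrors how a well-formed goal skips intake.
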
### Success criteria and preconditions

Intake makes the planner produce two structured fields per task that used to be buried as free text in the description:

| Field | Purpose |
|-------|---------|
| `success_criteria` | A verifiable end state: concrete "this task is done when…" checks (e.g. "the endpoint returns 200", "tests pass"), not vague claims like "works well". |
| `preconditions` | "When NOT to act" guardrails: conditions that must hold before the task runs, or signals that mean it should be skipped. |

`success_criteria` is what makes a task checkable after it runs. It is the foundation the [outcome-verification](#related) work builds on to confirm a task solved the original problem, not just that it finished.

### Using intake

```
User: Start a deep work project to chase down overdue invoices
Agent: A couple of questions before I plan this:
1. How many days overdue before an invoice counts?
2. Which channel should reminders go out on?
User: 30 days, and email
Agent: Got it. Planning a project to email reminders for invoices 30+ days overdue...
```

Via the REST API, intake is two calls: fetch the questions, then submit the goal with the answers:

```bash
# 1. Get clarification questions
curl -X POST http://localhost:8000/api/deep-work/intake/clarify \
  -H "Content-Type: application/json" \
  -d '{"description": "Chase down overdue invoices"}'

# 2. Submit the goal with the collected answers
curl -X POST http://localhost:8000/api/deep-work/start-with-intake \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Chase down overdue invoices",
    "answers": [
      {"question": "How many days overdue before an invoice counts?", "answer": "30 days"},
      {"question": "Which channel should reminders go out on?", "answer": "email"}
    ]
  }'
```

An empty `answers` list means the goal was well-formed; `start-with-intake` then behaves identically to `start`.

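The same two calls can be prepared from Python with the standard library. This is a rough sketch, not an official client: the helper name `intake_request` is hypothetical, the base URL is copied from the curl examples, and the response format is not documented here, so the sketch only builds the requests and leaves sending and parsing to the caller.

```python
import json
from urllib.request import Request

# Same host and port as the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/deep-work"


def intake_request(path: str, payload: dict) -> Request:
    """Build a JSON POST request for a Deep Work endpoint (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


goal = "Chase down overdue invoices"

# 1. Ask for the clarification questions.
clarify = intake_request("/intake/clarify", {"description": goal})

# 2. Submit the goal together with the collected answers.
start = intake_request(
    "/start-with-intake",
    {
        "description": goal,
        "answers": [
            {"question": "How many days overdue before an invoice counts?", "answer": "30 days"},
            {"question": "Which channel should reminders go out on?", "answer": "email"},
        ],
    },
)

# To send a request against a running server: urllib.request.urlopen(clarify)
```
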
## Project Lifecycle

| Status | Description |
|--------|-------------|
| `DRAFT` | Project created, not yet planned |
| `PLANNING` | Planner agent is researching and decomposing tasks |
| `AWAITING_APPROVAL` | Plan ready for user review |
| `APPROVED` | User approved, ready to execute |
| `EXECUTING` | Tasks are running |
| `PAUSED` | Execution paused by user |
| `COMPLETED` | All tasks finished |
| `FAILED` | Planning or execution error |
| `CANCELLED` | User cancelled the project; all remaining tasks are skipped |

## Planning Phases

The planner runs four sequential phases, each broadcasting progress events to the dashboard:

### 1. Research

Gathers domain knowledge before planning. Controlled by `research_depth`:

| Depth | Behavior |
|-------|----------|
| `none` | Skip research entirely |
| `quick` | Minimal analysis from existing knowledge, no web search |
| `standard` | Balanced research, may use web search |
| `deep` | Thorough research with extensive web searching |

### 2. Product Requirements Document (PRD)

Generates a structured PRD with:
- Problem statement
- Scope (in/out)
- Functional requirements
- Non-goals
- Technical constraints

### 3. Task Breakdown

Decomposes the PRD into atomic tasks. Each task includes:

| Field | Description |
|-------|-------------|
| `key` | Short unique identifier (e.g., `t1`, `t2`) |
| `title` | Human-readable name |
| `description` | Full description of the work |
| `task_type` | `agent`, `human`, or `review` |
| `priority` | `low`, `medium`, `high`, or `urgent` |
| `estimated_minutes` | Time estimate (15-120 min range) |
| `required_specialties` | Skills needed (e.g., `backend`, `frontend`) |
| `blocked_by_keys` | Dependencies on other tasks |
| `max_retries` | How many times to auto-retry on failure (default: 1) |
| `timeout_minutes` | Per-task execution time limit (optional) |
| `success_criteria` | Objectively verifiable conditions that must hold once the task is done, each a single concrete, checkable statement. Carried onto the materialized Mission Control Task so completion can be verified, not just marked finished. See [Success criteria and preconditions](#success-criteria-and-preconditions). |
| `preconditions` | State or environment conditions that must hold before the task starts (or conditions under which it should not run). Distinct from `blocked_by_keys`, which tracks dependencies on other tasks. |

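As a rough picture of the shape these fields take, the table can be written as a Python dataclass. This is a sketch under assumptions, not the real class: the class name, field types, and every default except `max_retries` (1) and `timeout_minutes` (unset) are guesses for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskSpec:
    """Illustrative task record; names follow the table, types and defaults are assumed."""

    key: str                      # short unique identifier, e.g. "t1"
    title: str
    description: str
    task_type: str = "agent"      # "agent", "human", or "review" (default assumed)
    priority: str = "medium"      # "low", "medium", "high", or "urgent" (default assumed)
    estimated_minutes: int = 30   # documented range is 15-120 (default assumed)
    required_specialties: list[str] = field(default_factory=list)
    blocked_by_keys: list[str] = field(default_factory=list)
    max_retries: int = 1          # documented default
    timeout_minutes: Optional[int] = None  # unset means no time limit
    success_criteria: list[str] = field(default_factory=list)
    preconditions: list[str] = field(default_factory=list)


task = TaskSpec(
    key="t3",
    title="Send reminder emails",
    description="Email customers whose invoices are 30+ days overdue.",
    blocked_by_keys=["t1", "t2"],
    success_criteria=["Every invoice 30+ days overdue has a reminder logged"],
    preconditions=["Do not email customers with an open dispute"],
)
```

Note how `blocked_by_keys` names other tasks while `preconditions` describes the state of the world, which is the distinction the table draws.
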
### 4. Team Assembly

Recommends the minimal set of AI agents needed. Each agent has a name, role, specialties, and backend. Agents are auto-assigned to tasks based on specialty overlap.

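One plausible reading of "specialty overlap" is to pick the agent that shares the most specialties with the task. The sketch below assumes exactly that; the function name `pick_agent`, the tie-breaking rule (first agent wins), and the "no overlap means unassigned" behavior are all assumptions rather than documented behavior.

```python
from typing import Optional


def pick_agent(required: list[str], agents: dict[str, list[str]]) -> Optional[str]:
    """Return the agent name with the largest specialty overlap, or None.

    Hypothetical helper: `agents` maps an agent name to its specialties.
    """
    needed = set(required)
    best_name, best_overlap = None, 0
    for name, specialties in agents.items():
        overlap = len(needed & set(specialties))
        if overlap > best_overlap:  # strict '>' keeps the first agent on ties
            best_name, best_overlap = name, overlap
    return best_name


team = {
    "api-dev": ["backend", "database"],
    "ui-dev": ["frontend"],
}
assigned = pick_agent(["backend", "database"], team)
```
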
## Dependency Scheduling

Tasks execute in dependency order using a topological sort (Kahn's algorithm):

- Tasks with no blockers run first (concurrently)
- When a task completes, newly unblocked tasks dispatch automatically
- The scheduler validates the dependency graph for cycles and missing references before execution

```
Level 0: [t1, t2]  ← no dependencies, run in parallel
Level 1: [t3, t4]  ← depend only on level 0
Level 2: [t5]      ← depends on level 1
```

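The leveling above can be reproduced with a small Kahn's-algorithm sketch. It is a minimal illustration, not the scheduler's actual code: the function name `execution_levels` and the input shape (a mapping from task key to its `blocked_by_keys`) are assumptions. It performs the two validations mentioned above, missing references and cycles.

```python
def execution_levels(blocked_by: dict[str, list[str]]) -> list[list[str]]:
    """Group task keys into levels that can run concurrently (Kahn's algorithm)."""
    # Validate that every dependency refers to a known task.
    for key, deps in blocked_by.items():
        for dep in deps:
            if dep not in blocked_by:
                raise ValueError(f"task {key!r} depends on unknown task {dep!r}")

    # Count unresolved blockers per task and index each task's dependents.
    remaining = {key: len(set(deps)) for key, deps in blocked_by.items()}
    dependents: dict[str, list[str]] = {key: [] for key in blocked_by}
    for key, deps in blocked_by.items():
        for dep in set(deps):
            dependents[dep].append(key)

    levels: list[list[str]] = []
    level = sorted(key for key, count in remaining.items() if count == 0)
    scheduled = 0
    while level:
        levels.append(level)
        scheduled += len(level)
        unblocked = []
        for key in level:
            for child in dependents[key]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    unblocked.append(child)
        level = sorted(unblocked)

    # Any task never scheduled is part of (or behind) a cycle.
    if scheduled != len(blocked_by):
        raise ValueError("dependency cycle detected")
    return levels


graph = {"t1": [], "t2": [], "t3": ["t1"], "t4": ["t1", "t2"], "t5": ["t3", "t4"]}
levels = execution_levels(graph)
# levels == [["t1", "t2"], ["t3", "t4"], ["t5"]]
```
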
## Task Types

| Type | Execution |
|------|-----------|
| `agent` | Runs via the agent backend (Claude Agent SDK). Output saved as deliverable document. |
| `human` | Notification sent to your channels. You complete it manually and mark done in the dashboard. |
| `review` | Quality gate: agent output waits for your review before dependents proceed. |

## Retry & Timeout

A failed task is not abandoned on the first error: the executor retries it automatically and can enforce a per-task time limit.

### Auto-Retry

Each task has a `max_retries` count (default: 1). When an agent task fails or times out, the executor checks whether retries remain. If so, it resets the task to `ASSIGNED` and re-dispatches it automatically. The `retry_count` field tracks how many attempts have been made.

Once retries are exhausted, the task moves to `BLOCKED` and waits for your attention. You can also manually retry any blocked task from the dashboard or API; manual retries bypass the `max_retries` limit.

### Task Timeout

Set `timeout_minutes` on a task to enforce a hard execution time limit. The executor wraps the agent call in `asyncio.wait_for`. If the agent hasn't finished within the window, it's interrupted and treated the same as a failure (triggers retry if attempts remain, otherwise goes `BLOCKED`).

Leave `timeout_minutes` unset for tasks that genuinely need open-ended time, like deep research or large code generation.

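The interaction between retry and timeout can be sketched with `asyncio.wait_for`. This is a simplified model under assumptions, not the executor itself: `run_with_retry` and its return tuple are hypothetical, the real executor resets task state to `ASSIGNED` and re-dispatches rather than looping in place, and here `retry_count` counts retries performed after the first attempt.

```python
import asyncio


async def run_with_retry(run_task, max_retries=1, timeout_minutes=None):
    """Run an async task callable, retrying on failure or timeout.

    Returns (status, output, retry_count). Hypothetical helper for illustration.
    """
    # timeout=None makes wait_for wait indefinitely (open-ended tasks).
    timeout = None if timeout_minutes is None else timeout_minutes * 60
    retry_count = 0
    while True:
        try:
            output = await asyncio.wait_for(run_task(), timeout=timeout)
            return "DONE", output, retry_count
        except Exception:
            # A timeout (asyncio.TimeoutError) is handled like any other failure.
            if retry_count >= max_retries:
                return "BLOCKED", None, retry_count
            retry_count += 1


attempts = {"count": 0}


async def flaky_task():
    """Fails on the first attempt, succeeds on the second."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")
    return "report written"


result = asyncio.run(run_with_retry(flaky_task, max_retries=1))
# result == ("DONE", "report written", 1)
```
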
## Output Chaining

When an agent task completes, its full output is stored directly on the `task.output` field. Downstream tasks can reference this output as context: the executor builds each task's prompt with deliverables from upstream dependencies already included.

This means later tasks in the dependency chain know what earlier tasks produced, without you having to wire anything up manually.

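A minimal sketch of how such a prompt might be assembled follows. The function name `build_prompt`, the dict-based task shape, and the heading layout of the prompt are all assumptions made for illustration; only the idea of including upstream `output` comes from the description above.

```python
def build_prompt(task: dict, tasks_by_key: dict[str, dict]) -> str:
    """Compose a task prompt that includes outputs of its upstream dependencies."""
    sections = [f"# Task: {task['title']}", task["description"]]
    for dep_key in task.get("blocked_by_keys", []):
        upstream = tasks_by_key[dep_key]
        if upstream.get("output"):  # only finished tasks have output to pass on
            sections.append(
                f"## Output of upstream task {dep_key}: {upstream['title']}\n"
                f"{upstream['output']}"
            )
    return "\n\n".join(sections)


tasks = {
    "t1": {
        "title": "Design schema",
        "description": "Design the recipe tables.",
        "output": "CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT);",
    },
    "t2": {
        "title": "Build API",
        "description": "Implement CRUD endpoints for recipes.",
        "blocked_by_keys": ["t1"],
    },
}
prompt = build_prompt(tasks["t2"], tasks)
```
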
## Cancellation

Cancel a running project when you need to pull the plug. Cancellation is a terminal state: once cancelled, the project can't be resumed.

What happens when you cancel:
1. All currently executing tasks are stopped immediately
2. All pending and assigned tasks are marked `SKIPPED`
3. Tasks already completed keep their `DONE` status (work isn't lost)
4. The project status changes to `CANCELLED`

You can't cancel a project that's already completed or already cancelled.

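The state changes in that list can be modeled as follows. This is a sketch of the bookkeeping only, under assumptions: the function name `cancel_project` and the dict-based project shape are hypothetical, actually interrupting running agents is left out, and treating every task that is not `DONE` as `SKIPPED` is a simplification of steps 1 and 2.

```python
def cancel_project(project: dict) -> None:
    """Apply cancellation state changes in place (illustrative model only)."""
    if project["status"] in ("COMPLETED", "CANCELLED"):
        raise ValueError(f"cannot cancel a project in status {project['status']}")
    for task in project["tasks"]:
        # Completed work is kept; everything else is skipped (simplification).
        if task["status"] != "DONE":
            task["status"] = "SKIPPED"
    project["status"] = "CANCELLED"


project = {
    "status": "EXECUTING",
    "tasks": [
        {"key": "t1", "status": "DONE"},
        {"key": "t2", "status": "ASSIGNED"},
    ],
}
cancel_project(project)
```
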
## Usage

### Starting a Project

```
User: Start a deep work project to build a REST API for a recipe management app
Agent: Starting Deep Work project... researching domain, writing PRD, decomposing tasks.
```

Or via the REST API:

```bash
curl -X POST http://localhost:8000/api/deep-work/start \
  -H "Content-Type: application/json" \
  -d '{"description": "Build a REST API for recipe management", "research_depth": "standard"}'
```

### Reviewing the Plan

The dashboard shows:
- The generated PRD
- Task list with dependencies visualized as execution levels
- Estimated total time
- Recommended agent team

### Controlling Execution

| Action | API Endpoint |
|--------|-------------|
| Approve plan | `POST /api/deep-work/projects/{id}/approve` |
| Pause execution | `POST /api/deep-work/projects/{id}/pause` |
| Resume execution | `POST /api/deep-work/projects/{id}/resume` |
| Cancel project | `POST /api/deep-work/projects/{id}/cancel` |
| Skip a task | `POST /api/deep-work/projects/{id}/tasks/{task_id}/skip` |
| Retry a blocked task | `POST /api/deep-work/projects/{id}/tasks/{task_id}/retry` |

Skipping a task marks it as `SKIPPED` and unblocks dependents. Retrying resets a blocked task to `ASSIGNED` and re-dispatches it.

## PawKit Templates

Deep Work projects can be packaged as [PawKit](/advanced/pawkit) templates: YAML configs that define a full [Command Center](/concepts/command-centers) with layout, automated workflows, user configuration, and bundled skills. The Deep Work engine becomes the "Project Orchestrator" Command Center, and domain experts can publish their own PawKits for others to install.

See [Command Centers](/concepts/command-centers) for the concept and [PawKit Reference](/advanced/pawkit) for the full YAML schema.

## Project Directories

Each project gets a working directory at `~/pocketpaw-projects/{project_id}/`. Agent tasks execute within this directory, and deliverables are saved there.

## Recovery

If the server restarts during execution:
- Projects stuck in `PLANNING` are marked `FAILED`
- Projects in `EXECUTING` have their in-progress tasks reset and re-dispatched

## WebSocket Events

The dashboard receives real-time updates:

| Event | When |
|-------|------|
| `dw_planning_phase` | Each planning phase starts (research, prd, tasks, team) |
| `dw_planning_complete` | Planning finishes or fails |
| `dw_project_cancelled` | Project was cancelled |
| `mc_task_started` | A task begins executing |
| `mc_task_output` | Agent produces output (streamed) |
| `mc_task_completed` | A task finishes (includes retry info) |
| `mc_task_retry` | A task is about to be retried after failure |

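A client consuming these events would typically route them by name. The sketch below is hedged heavily: the message format is not documented here, so it assumes each message is a JSON object carrying the event name in a `"type"` field, and both that field name and the `classify_event` helper are hypothetical. Only the event names are taken from the table.

```python
import json

# Event names from the table above, grouped by their prefix.
PROJECT_EVENTS = {"dw_planning_phase", "dw_planning_complete", "dw_project_cancelled"}
TASK_EVENTS = {"mc_task_started", "mc_task_output", "mc_task_completed", "mc_task_retry"}


def classify_event(raw_message: str) -> str:
    """Return "project", "task", or "unknown" for a raw JSON message.

    Assumes the event name lives in a "type" field (an unverified assumption).
    """
    message = json.loads(raw_message)
    name = message.get("type", "")
    if name in PROJECT_EVENTS:
        return "project"
    if name in TASK_EVENTS:
        return "task"
    return "unknown"


kind = classify_event('{"type": "mc_task_retry"}')
```
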
<Callout type="info">
Deep Work builds on top of [Mission Control](/advanced/mission-control) for task storage, agent management, and the execution engine. The two systems are designed to work together.
</Callout>

## Related

<CardGroup>
<Card title="Mission Control" icon="lucide:layout-dashboard" href="/advanced/mission-control">
The multi-agent execution engine that powers Deep Work task management.
</Card>
<Card title="Plan Mode" icon="lucide:list-checks" href="/advanced/plan-mode">
Structured approval workflows for reviewing agent-generated plans.
</Card>
<Card title="Autonomous Messaging" icon="lucide:send" href="/advanced/autonomous-messaging">
Get notified when human tasks need attention or projects complete.
</Card>
</CardGroup>