
<div align="center">

<img src="assets/pentestagent-logo.png" alt="PentestAgent Logo" width="220" style="margin-bottom: 20px;"/>

# PentestAgent
### AI Penetration Testing

[![Python](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/) [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE.txt) [![Version](https://img.shields.io/badge/Version-0.2.0-orange.svg)](https://github.com/GH05TCREW/pentestagent/releases) [![Security](https://img.shields.io/badge/Security-Penetration%20Testing-red.svg)](https://github.com/GH05TCREW/pentestagent) [![MCP](https://img.shields.io/badge/MCP-Compatible-purple.svg)](https://github.com/GH05TCREW/pentestagent)

</div>

https://github.com/user-attachments/assets/a67db2b5-672a-43df-b709-149c8eaee975

## Requirements

- Python 3.10+
- An API key for OpenAI, Anthropic, or another LiteLLM-supported provider

## Install

```bash
# Clone
git clone https://github.com/GH05TCREW/pentestagent.git
cd pentestagent

# Setup (creates venv, installs deps)
.\scripts\setup.ps1   # Windows
./scripts/setup.sh    # Linux/macOS

# Or manual
python -m venv venv
.\venv\Scripts\Activate.ps1  # Windows
source venv/bin/activate     # Linux/macOS
pip install -e ".[all]"
playwright install chromium  # Required for browser tool
```

## Configure

Create `.env` in the project root:

```
ANTHROPIC_API_KEY=sk-ant-...
PENTESTAGENT_MODEL=claude-sonnet-4-20250514
```

Or for OpenAI:

```
OPENAI_API_KEY=sk-...
PENTESTAGENT_MODEL=gpt-5
```

Any [LiteLLM-supported model](https://docs.litellm.ai/docs/providers) works.
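
To sanity-check a `.env` file before launching, something like the following sketch may help. It assumes that a model name and at least one `*_API_KEY` entry are needed, which matches the two examples above but may not cover every provider. `parse_env` and `missing_settings` are hypothetical helper names and are not part of PentestAgent.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    # Assumption: a model must be named explicitly.
    if not env.get("PENTESTAGENT_MODEL"):
        problems.append("PENTESTAGENT_MODEL is not set")
    # Assumption: provider keys follow the <PROVIDER>_API_KEY naming pattern.
    if not any(k.endswith("_API_KEY") and v for k, v in env.items()):
        problems.append("no *_API_KEY entry found")
    return problems
```

Calling `missing_settings(parse_env(open(".env").read()))` from the project root returns an empty list when both settings are present.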

## Run

```bash
pentestagent                      # Launch TUI
pentestagent -t 192.168.1.1       # Launch with target
pentestagent tui --docker         # Run tools in Docker container
```

## Docker

Run tools inside a Docker container to get isolation and a set of pre-installed pentesting tools.

### Option 1: Pull pre-built image (fastest)

```bash
# Base image with nmap, netcat, curl
docker run -it --rm \
  -e ANTHROPIC_API_KEY=your-key \
  -e PENTESTAGENT_MODEL=claude-sonnet-4-20250514 \
  ghcr.io/gh05tcrew/pentestagent:latest

# Kali image with metasploit, sqlmap, hydra, etc.
docker run -it --rm \
  -e ANTHROPIC_API_KEY=your-key \
  ghcr.io/gh05tcrew/pentestagent:kali
```

### Option 2: Build locally

```bash
# Build
docker compose build

# Run
docker compose run --rm pentestagent

# Or with Kali
docker compose --profile kali build
docker compose --profile kali run --rm pentestagent-kali
```

The container runs PentestAgent with access to Linux pentesting tools. The agent can use `nmap`, `msfconsole`, `sqlmap`, etc. directly via the terminal tool.

Requires Docker to be installed and running.

## Modes

PentestAgent has four modes, accessible via commands in the TUI:

| Mode | Command | Description |
|------|---------|-------------|
| Assist | `/assist <task>` | A single-shot instruction with tool execution |
| Agent | `/agent <task>` | Autonomous execution of a single task |
| Crew | `/crew <task>` | Multi-agent mode: an orchestrator spawns specialized workers |
| Interact | `/interact <task>` | Interactive mode: chat with the agent, which guides you through the pentest |

### TUI Commands

```
/assist <task>    Run a single-shot instruction
/agent <task>     Run autonomous agent on task
/crew <task>      Run multi-agent crew on task
/interact <task>  Chat with the agent in guided mode
/target <host>    Set target
/tools            List available tools
/notes            Show saved notes
/report           Generate report from session
/memory           Show token/memory usage
/prompt           Show system prompt
/conversations    Browse and restore saved conversations
/mcp <list/add>   List MCP servers or add a new one
/spawn [target] [--scope CIDR] [--model M] [--no-rag] [--no-mcp]
                  Manually spawn a child MCP agent from the TUI.
/despawn <server_name>
                  Terminate and remove a previously spawned child agent.
/clear            Clear chat and history
/quit             Exit (also /exit, /q)
/help             Show help (also /h, /?)
```

Press `Esc` to stop a running agent and `Ctrl+Q` to quit.

## Playbooks

PentestAgent includes prebuilt **attack playbooks** for black-box security testing. Playbooks define a structured approach to specific security assessments.

**Run a playbook:**

```bash
pentestagent run -t example.com --playbook thp3_web
```

![Playbook Demo](assets/playbook.gif)

## Tools

PentestAgent includes built-in tools and supports MCP (Model Context Protocol) for extensibility.

**Built-in tools:** `terminal`, `browser`, `notes`, `web_search` (requires `TAVILY_API_KEY`), `spawn_mcp_agent`

### Agent Self-Spawning (`spawn_mcp_agent`)

`spawn_mcp_agent` is a built-in tool that allows a running agent to spawn a child copy of itself as a subordinate MCP server connected over stdio. The child process is fully isolated — its own runtime, LLM client, conversation history, and notes store — and its complete tool set is injected back into the parent agent's available tools after spawning.

This enables hierarchical, multi-agent workflows without any external orchestration: the agent self-organises by delegating scoped subtasks to children it spawns on demand.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `target` | string | — | Pentest target to pass to the child |
| `scope` | string[] | — | In-scope targets/CIDRs for the child |
| `model` | string | env var | Model identifier, overrides `PENTESTAGENT_MODEL` on the child |
| `no_rag` | boolean | `false` | Skip RAG engine initialisation on the child |
| `no_mcp` | boolean | `true` | Skip external MCP server connections on the child (recommended) |

After `spawn_mcp_agent` returns, the child's tools (`run_task`, `run_task_async`, `await_tasks`, etc.) are available on the **next** tool call. The child's server name is assigned automatically (e.g. `child_agent_1`) and returned in the result.

**Example — orchestrator delegating parallel recon to two children:**

```
# Turn 1: spawn two isolated child agents
spawn_mcp_agent  target="10.0.1.0/24"  scope=["10.0.1.0/24"]
spawn_mcp_agent  target="10.0.2.0/24"  scope=["10.0.2.0/24"]

# Turn 2: children's tools are now available — delegate work asynchronously
child_agent_1__run_task_async  task="Full port scan and service enumeration"
child_agent_2__run_task_async  task="Full port scan and service enumeration"

# Turn 3: wait and collect
child_agent_1__await_tasks  task_ids=["<id1>"]  timeout_seconds=600
child_agent_2__await_tasks  task_ids=["<id2>"]  timeout_seconds=600
child_agent_1__get_task_result  task_id="<id1>"
child_agent_2__get_task_result  task_id="<id2>"
```
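
The tool names in the example above appear to follow a `<server_name>__<tool_name>` pattern. The sketch below illustrates that convention as inferred from the example only. The double-underscore separator and the helper names `qualify` and `split_qualified` are assumptions, not the project's confirmed implementation.

```python
SEPARATOR = "__"  # assumption: inferred from names like child_agent_1__run_task_async


def qualify(server_name: str, tool_name: str) -> str:
    """Build the name under which a child's tool is exposed to the parent."""
    return f"{server_name}{SEPARATOR}{tool_name}"


def split_qualified(name: str) -> tuple[str, str]:
    """Split a qualified name back into (server_name, tool_name)."""
    server_name, sep, tool_name = name.partition(SEPARATOR)
    if not sep or not server_name or not tool_name:
        raise ValueError(f"not a qualified tool name: {name!r}")
    return server_name, tool_name
```

Splitting on the first double underscore works here because generated server names such as `child_agent_1` contain only single underscores.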

### Manual Child Agent Control (`/spawn` and `/despawn`)

Beyond the automatic `spawn_mcp_agent` tool, the TUI exposes two commands that let you spawn and terminate child agents **manually**, independently of a running agent loop.

#### `/spawn`

```
/spawn [target] [--scope CIDR ...] [--model MODEL] [--no-rag] [--no-mcp]
```

Spawns a new child MCP agent over stdio and attaches it to the current session. The child appears as a collapsible terminal panel in the TUI sidebar and its tools become available to the parent agent on the next tool call.

| Argument | Description |
|----------|-------------|
| `target` | Pentest target to pass to the child (positional or `--target`) |
| `--scope CIDR` | One or more in-scope CIDRs (repeatable) |
| `--model MODEL` | Override the model for the child agent |
| `--no-rag` | Skip RAG engine initialisation on the child |
| `--no-mcp` | Skip external MCP server connections on the child |

**Examples:**

```
/spawn 10.0.1.1
/spawn 10.0.1.1 --scope 10.0.1.0/24 --model claude-sonnet-4-20250514
/spawn --target 10.0.1.1 --scope 10.0.1.0/24 --no-rag
```
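
The argument grammar above can be modelled with `argparse`. This is a rough sketch of how such a command line could be parsed, not the TUI's actual parser. `parse_spawn` is a hypothetical name, and the CIDR validation step is an added assumption.

```python
import argparse
import ipaddress
import shlex


def parse_spawn(command: str) -> dict:
    """Parse a '/spawn ...' line into the documented arguments."""
    parser = argparse.ArgumentParser(prog="/spawn", add_help=False)
    # Separate destinations so the positional form cannot overwrite --target.
    parser.add_argument("target_pos", nargs="?", default=None)
    parser.add_argument("--target", dest="target_opt", default=None)
    parser.add_argument("--scope", action="append", default=[])  # repeatable
    parser.add_argument("--model", default=None)
    parser.add_argument("--no-rag", action="store_true")
    parser.add_argument("--no-mcp", action="store_true")

    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/spawn":
        raise ValueError("not a /spawn command")
    ns = parser.parse_args(tokens[1:])

    # Reject malformed CIDRs early; strict=False tolerates host bits (10.0.1.1/24).
    for cidr in ns.scope:
        ipaddress.ip_network(cidr, strict=False)

    return {
        "target": ns.target_opt or ns.target_pos,
        "scope": ns.scope,
        "model": ns.model,
        "no_rag": ns.no_rag,
        "no_mcp": ns.no_mcp,
    }
```

Note that `argparse` exits the process on unknown flags, so an interactive UI would likely want to catch `SystemExit` or use a different error strategy.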

#### `/despawn`

```
/despawn <server_name>
```

Terminates the child agent identified by `server_name` (e.g. `child_agent_1`), removes its terminal panel from the TUI, and disconnects its tools from the parent session. Use `/mcp list` to see the names of all currently active child agents.

**Example:**

```
/despawn child_agent_1
```

### MCP RAG Tool Optimizer

When an MCP server exposes more than 128 tools, PentestAgent automatically replaces the full catalogue with a single `mcp_<server>_rag_optimizer` tool. This meta-tool uses embedding similarity (via LiteLLM, default `text-embedding-3-small`) to retrieve the most relevant tools for the task at hand and injects them into the agent's next turn — keeping the context window manageable without losing access to the full tool set.

The agent uses the optimizer like any other tool: it calls the RAG tool with focused natural-language queries describing what it needs, and the matching tools become directly callable on the next turn.

**Usage guidance for the agent:**

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `queries` | string[] | *(required)* | One focused query per capability needed. More specific = higher accuracy |
| `top_k` | integer | `20` | Tools to retrieve per query (max 128). Results are merged and deduplicated |

Embeddings are computed once at startup and cached, so repeated queries are fast. The optimizer is built per-server, so each MCP server with a large catalogue gets its own independent index.

> **Tip:** Pass one query per distinct capability rather than combining everything into one query. `["list open ports on a host", "get process memory usage"]` retrieves better results than `["list ports and memory and CPU"]`.
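
The retrieval step can be pictured with the toy sketch below. It swaps the real embedding model for a bag-of-words vector so that it runs without any API key, so only the overall shape (embed once, rank per query, merge and deduplicate) reflects the description above. `ToolIndex`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolIndex:
    """Embeds tool descriptions once, then answers top-k queries from the cache."""

    def __init__(self, tools: dict[str, str]):
        # tools maps tool name -> description; vectors are computed a single time.
        self._vectors = {name: embed(desc) for name, desc in tools.items()}

    def search(self, queries: list[str], top_k: int = 20) -> list[str]:
        top_k = min(top_k, 128)  # documented upper bound
        selected: list[str] = []
        for query in queries:
            q = embed(query)
            ranked = sorted(
                self._vectors,
                key=lambda name: cosine(q, self._vectors[name]),
                reverse=True,
            )
            for name in ranked[:top_k]:
                if name not in selected:  # merge and deduplicate across queries
                    selected.append(name)
        return selected
```

Because each query is ranked independently, two focused queries each contribute their own best matches, which is the reason the tip above recommends one query per capability.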

### MCP Integration

PentestAgent supports MCP (Model Context Protocol) in two directions: **consuming** external MCP servers as tool sources, and **exposing itself** as an MCP server so external clients (Claude Desktop, Cursor, etc.) can drive PentestAgent programmatically.

---

#### Consuming External MCP Servers (Client Mode)

Configure `mcp_servers.json` to connect PentestAgent to any external MCP servers. Example config:

```json
{
  "mcpServers": {
    "nmap": {
      "command": "npx",
      "args": ["-y", "gc-nmap-mcp"],
      "env": {
        "NMAP_PATH": "/usr/bin/nmap"
      }
    }
  }
}
```
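
A small loader for a file of this shape might look like the sketch below. It checks only the fields shown in the example (`command`, `args`, `env`). PentestAgent's own loader may accept more fields or validate differently, and `load_mcp_servers` is a hypothetical name.

```python
import json


def load_mcp_servers(text: str) -> dict[str, dict]:
    """Parse mcp_servers.json content and normalise each server entry."""
    config = json.loads(text)
    servers: dict[str, dict] = {}
    for name, spec in config.get("mcpServers", {}).items():
        command = spec.get("command")
        args = spec.get("args", [])
        env = spec.get("env", {})
        if not isinstance(command, str) or not command:
            raise ValueError(f"{name}: 'command' must be a non-empty string")
        if not all(isinstance(a, str) for a in args):
            raise ValueError(f"{name}: every entry in 'args' must be a string")
        if not all(isinstance(v, str) for v in env.values()):
            raise ValueError(f"{name}: every value in 'env' must be a string")
        servers[name] = {"command": command, "args": list(args), "env": dict(env)}
    return servers
```

Validating up front gives a clear error message for a malformed entry instead of a failure when the server process is launched.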

---

#### Exposing PentestAgent as an MCP Server (Server Mode)

PentestAgent can run as an MCP server, allowing any MCP-compatible client to submit tasks, inspect results, and control the agent remotely. Two transports are supported:

**STDIO** — for local clients (e.g. Claude Desktop, Cursor):

```bash
pentestagent mcp_server --type stdio
pentestagent mcp_server --type stdio --target 192.168.1.1 --scope 192.168.1.0/24
pentestagent mcp_server --type stdio --model claude-sonnet-4-20250514 --docker
```

**SSE (HTTP)** — for remote or networked clients:

```bash
pentestagent mcp_server --type sse
pentestagent mcp_server --type sse --host 0.0.0.0 --port 8080
pentestagent mcp_server --type sse --target 10.0.0.1 --scope 10.0.0.0/24 --docker
```

The SSE transport exposes a single `/mcp` endpoint supporting `POST` (requests), `GET` (persistent SSE stream for server-initiated push), and `DELETE` (session teardown). Sessions are tracked via the `Mcp-Session-Id` header.

**All `mcp_server` flags:**

| Flag | Default | Description |
|------|---------|-------------|
| `--type` | *(required)* | Transport: `stdio` or `sse` |
| `--host` | `0.0.0.0` | SSE bind host |
| `--port` | `8080` | SSE bind port |
| `--target` | none | Primary pentest target (IP / hostname) |
| `--scope` | `[]` | In-scope targets/CIDRs (space-separated) |
| `--model` | env var | Model identifier, overrides `PENTESTAGENT_MODEL` |
| `--docker` | false | Use DockerRuntime instead of LocalRuntime |
| `--no-rag` | false | Skip RAG engine initialisation |
| `--no-mcp` | false | Skip external MCP server connections |

##### Example: Claude Desktop config (`claude_desktop_config.json`)

```json
{
  "mcpServers": {
    "pentestagent": {
      "command": "pentestagent",
      "args": ["mcp_server", "--type", "stdio"]
    }
  }
}
```

---

#### MCP Server Tools Reference

When acting as an MCP server, PentestAgent exposes the following tools:

**Server Status & Config**

| Tool | Description |
|------|-------------|
| `get_server_status` | Live server status: readiness, task counts by state, primary target/scope, memory store size |
| `get_config` | Primary agent configuration: target, scope, max iterations, tool list |
| `update_config` | Update target, scope, or max iterations for all subsequent tasks |

**Task Execution**

| Tool | Description |
|------|-------------|
| `run_task` | Submit a task and **block** until it completes. Returns full result, tools used, and notes snapshot |
| `run_task_async` | Submit a task and **return immediately** with a `task_id`. Poll with `get_task_status` |

**Task Inspection**

| Tool | Description |
|------|-------------|
| `list_tasks` | List all tasks with status, target, and summary. Filterable by status |
| `get_task_status` | Poll the current status and result preview of a task |
| `get_task_result` | Full task result: final output, thinking steps, all tool calls and results, notes snapshot |
| `await_tasks` | Block until a set of async task IDs have all finished (polls every 500 ms, configurable timeout) |

**Task Control**

| Tool | Description |
|------|-------------|
| `cancel_task` | Cancel a running or pending task by ID |

**Tool Management**

| Tool | Description |
|------|-------------|
| `list_tools` | List all tools available to the agent |
| `enable_tool` | Enable a named tool on the primary agent |
| `disable_tool` | Disable a named tool on the primary agent |


**Conversation History**

| Tool | Description |
|------|-------------|
| `get_conversation_history` | Return message history for a task or the primary agent. Supports a `limit` parameter |
| `reset_conversation` | Clear conversation history for a task or the primary agent |

**Memory**

| Tool | Description |
|------|-------------|
| `store_memory` | Persist a key-value pair to the in-process memory store |
| `retrieve_memory` | Retrieve by exact key, search by substring, or list all keys |
| `clear_memory` | Delete a specific key or wipe all memory with `scope='all'` |

**Observability**

| Tool | Description |
|------|-------------|
| `get_logs` | Return recent execution logs, optionally filtered by level (`info` / `warning` / `error`) |
| `get_metrics` | Runtime metrics: task counts, success rate, total tool calls, memory and log sizes |

---

#### Async Task Workflow Example

For long-running recon tasks, use the async pattern:

```
# 1. Submit tasks without blocking
run_task_async  task="Enumerate subdomains of example.com"  target="example.com"
run_task_async  task="Run nmap SYN scan on example.com"     target="example.com"

# 2. Block until both finish (up to 5 minutes)
await_tasks  task_ids=["<id1>", "<id2>"]  timeout_seconds=300

# 3. Retrieve full results
get_task_result  task_id="<id1>"
get_task_result  task_id="<id2>"
```
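
The blocking step in this workflow amounts to a poll-until-done loop. Below is a minimal sketch of that idea, assuming a status lookup function and a set of terminal state names. Both are illustrative, and the server's real state names and implementation may differ.

```python
import time
from typing import Callable, Iterable

# Assumption: these state names are placeholders for whatever the server reports.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def await_tasks(task_ids: Iterable[str],
                get_status: Callable[[str], str],
                timeout_seconds: float = 300.0,
                poll_interval: float = 0.5) -> dict[str, str]:
    """Poll every task until all reach a terminal state or the timeout expires."""
    ids = list(task_ids)
    deadline = time.monotonic() + timeout_seconds
    while True:
        statuses = {tid: get_status(tid) for tid in ids}
        if all(state in TERMINAL_STATES for state in statuses.values()):
            return statuses
        if time.monotonic() >= deadline:
            pending = [t for t, s in statuses.items() if s not in TERMINAL_STATES]
            raise TimeoutError(f"tasks still running: {pending}")
        time.sleep(poll_interval)
```

The default `poll_interval` of 0.5 seconds mirrors the 500 ms polling mentioned in the tools reference.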

---

### CLI Tool Management

```bash
pentestagent tools list         # List all tools
pentestagent tools info <name>  # Show tool details
pentestagent mcp list           # List MCP servers
pentestagent mcp add <name> <command> [args...]  # Add MCP server
pentestagent mcp test <name>    # Test MCP connection
```

## Conversation History

PentestAgent automatically persists every conversation so you can review, compare, and restore past sessions.

**Auto-save** triggers after each `/assist`, `/agent`, `/crew`, and `/interact` task, and before `/clear`. Up to 20 conversations are kept; older ones are pruned automatically.

**Storage location:** `workspaces/<active>/memory/conversations/` when a workspace is active, or `conversations/` at the project root otherwise. Each conversation is a JSON file.

**Browse & restore with `/conversations`:**

The `/conversations` command opens a split-pane modal inside the TUI:
- **Left panel** — list of saved conversations with title and date.
- **Right panel** — metadata preview plus the first 5 messages (user messages in blue, agent responses in green, tool calls in yellow, tool results in grey). A count shows how many additional messages exist.

<img width="1657" height="662" alt="Conversations browser modal in the TUI" src="https://github.com/user-attachments/assets/da42f083-9b7f-445e-8c59-2402ac8e5ddc" />


Select a conversation and press **Restore** to reload it into the current session, or **Close** to dismiss the modal.

## Knowledge

- **RAG:** Place methodologies, CVEs, or wordlists in `pentestagent/knowledge/sources/` for automatic context injection.
- **Notes:** Agents save findings to `loot/notes.json` with categories (`credential`, `vulnerability`, `finding`, `artifact`). Notes persist across sessions and are injected into agent context.
- **Shadow Graph:** In Crew mode, the orchestrator builds a knowledge graph from notes to derive strategic insights (e.g., "We have credentials for host X").

## Project Structure

```
pentestagent/
  agents/         # Agent implementations
  config/         # Settings and constants
  interface/      # TUI and CLI
  knowledge/      # RAG system and shadow graph
  llm/            # LiteLLM wrapper
  mcp/            # MCP client and server configs
  playbooks/      # Attack playbooks
  runtime/        # Execution environment
  tools/          # Built-in tools
```

## Development

```bash
pip install -e ".[dev]"
pytest                       # Run tests
pytest --cov=pentestagent    # With coverage
black pentestagent           # Format
ruff check pentestagent      # Lint
```

## Legal

Only use against systems you have explicit authorization to test. Unauthorized access is illegal.

## License

MIT