7.9 KiB
PentestAgent — CLAUDE.md
Project overview
PentestAgent (v0.2.0) is an AI-powered penetration testing framework built in Python. It wraps LiteLLM to support any provider (Anthropic, OpenAI, etc.) and exposes a TUI, a CLI, and an MCP server interface. The agent can run tools locally or inside a Docker sandbox (base or Kali image).
Tech stack
- Python 3.10+, packaged with Hatchling (
pyproject.toml) - LiteLLM — provider-agnostic LLM wrapper
- Textual — TUI framework (
pentestagent/interface/) - Typer — CLI framework
- Playwright — browser tool
- MCP (Model Context Protocol) — both client (consuming external servers) and server (exposing PentestAgent to Claude Desktop / Cursor / etc.)
- FAISS + sentence-transformers — optional RAG engine (
pip install -e ".[rag]")
Repository layout
pentestagent/
agents/
crew/ # Multi-agent mode: orchestrator + worker pool + shadow graph
pa_agent/ # Single-agent implementation
state.py # Shared agent state
config/
settings.py # Global Settings dataclass (singleton via get_settings())
constants.py # Model defaults, iteration limits, etc.
interface/
cli.py # Typer CLI entry-point
notifier.py # Event bus between agent and UI
utils.py # Shared UI helpers
knowledge/
graph.py # ShadowGraph — derives strategic insights from notes
indexer.py # Indexes knowledge sources for RAG
rag.py # FAISS-backed retrieval
llm/
config.py # LiteLLM configuration
memory.py # Conversation/token management
utils.py # Streaming helpers
mcp/
stdio_adapter.py # STDIO MCP server transport
example_adapter.py # SSE MCP server transport
playbooks/
base_playbook.py
thp3_recon.py / thp3_network.py / thp3_web.py
runtime/
docker_runtime.py # Runs tool commands inside Docker
tool_server.py # Local runtime
tools/
loader.py # Discovers & dynamically imports tool modules
executor.py # Executes tool calls, tracks tokens
token_tracker.py
terminal/ # Shell execution tool
browser/ # Playwright browser tool
web_search/ # Tavily web search (needs TAVILY_API_KEY)
notes/ # Persistent findings store → loot/notes.json
finish/ # Signals task completion
workspaces/ # Workspace isolation helpers
loot/ # Persisted notes and findings (git-ignored)
mcp_examples/ # Example MCP configs and adapters
scripts/ # setup.sh / setup.ps1
tests/ # pytest suite
Configuration
Create .env in the project root:
ANTHROPIC_API_KEY=sk-ant-...
PENTESTAGENT_MODEL=claude-sonnet-4-20250514
# Optional
TAVILY_API_KEY=... # web_search tool
OPENAI_API_KEY=sk-... # if using OpenAI
Settings are managed by pentestagent/config/settings.py (get_settings() singleton).
The MCP external-server config lives in mcp_servers.json (Claude Desktop format).
Running the project
source venv/bin/activate
pentestagent # TUI
pentestagent -t 192.168.1.1 # TUI with pre-set target
pentestagent tui --docker # Use Docker sandbox for tool execution
pentestagent run -t example.com --playbook thp3_web # Run a playbook
pentestagent mcp_server --type stdio # Expose as MCP server (STDIO)
pentestagent mcp_server --type sse # Expose as MCP server (HTTP/SSE, port 8080)
PentestAgent as MCP server
PentestAgent can expose itself as an MCP server so any MCP-compatible client (Claude Desktop, Cursor, etc.) can drive it programmatically.
Transports
# STDIO — for local clients
pentestagent mcp_server --type stdio
pentestagent mcp_server --type stdio --target 192.168.1.1 --scope 192.168.1.0/24 --docker
# SSE (HTTP) — for remote/networked clients (default: 0.0.0.0:8080)
pentestagent mcp_server --type sse
pentestagent mcp_server --type sse --host 0.0.0.0 --port 8080 --target 10.0.0.1
Claude Desktop config (claude_desktop_config.json)
{
"mcpServers": {
"pentestagent": {
"command": "pentestagent",
"args": ["mcp_server", "--type", "stdio"]
}
}
}
Tools exposed by the MCP server
| Category | Tools |
|---|---|
| Status / config | get_server_status, get_config, update_config |
| Task execution | run_task (blocking), run_task_async (returns task_id) |
| Task inspection | list_tasks, get_task_status, get_task_result, await_tasks |
| Task control | cancel_task |
| Tool management | list_tools, enable_tool, disable_tool |
| Conversation | get_conversation_history, reset_conversation |
| Memory | store_memory, retrieve_memory, clear_memory |
| Observability | get_logs, get_metrics |
Async task pattern
run_task_async task="Enumerate subdomains of example.com"
run_task_async task="Run nmap SYN scan on example.com"
await_tasks task_ids=["<id1>", "<id2>"] timeout_seconds=300
get_task_result task_id="<id1>"
get_task_result task_id="<id2>"
mcp_server flags
| Flag | Default | Description |
|---|---|---|
--type |
(required) | stdio or sse |
--host |
0.0.0.0 |
SSE bind host |
--port |
8080 |
SSE bind port |
--target |
none | Primary pentest target |
--scope |
[] |
In-scope CIDRs (space-separated) |
--model |
env var | Overrides PENTESTAGENT_MODEL |
--docker |
false | Use DockerRuntime |
--no-rag |
false | Skip RAG initialisation |
--no-mcp |
false | Skip external MCP connections |
TUI commands (quick reference)
| Command | Description |
|---|---|
/assist <task> |
Single-shot instruction with tool execution |
/agent <task> |
Autonomous single-agent loop |
/crew <task> |
Multi-agent: orchestrator spawns specialised workers |
/interact <task> |
Guided interactive chat |
/target <host> |
Set target |
/tools |
List available tools |
/notes |
Show saved findings |
/report |
Generate report from session |
/mcp list/add |
Manage MCP servers |
Esc |
Stop running agent |
Key architectural patterns
- Tool registration: Tools self-register via
pentestagent/tools/loader.py. Add a new tool by creating a directory underpentestagent/tools/<name>/with an__init__.pythat registers it with the tool registry. - Modes: assist → single LLM call; agent → agentic loop; crew →
CrewOrchestratormanages aWorkerPool; interact → streaming chat. - MCP server tools:
run_task,run_task_async,await_tasks,get_task_result,list_tasks,cancel_task,list_tools,enable_tool,disable_tool,store_memory,retrieve_memory,get_logs,get_metrics. - Agent self-spawning (
spawn_mcp_agent): running agent can spawn child agents as MCP servers over stdio, enabling hierarchical multi-agent workflows. - RAG tool optimizer: if an MCP server exposes >128 tools, a single
mcp_<server>_rag_optimizermeta-tool replaces them and retrieves relevant subsets via embedding similarity. - Notes persistence: findings saved to
loot/notes.jsonvia thenotestool; categories:credential,vulnerability,finding,artifact. - ShadowGraph (crew mode only): builds a knowledge graph from notes to provide strategic context to the orchestrator.
Development
pip install -e ".[dev]"
pytest # Run tests
pytest --cov=pentestagent # With coverage
black pentestagent # Format
ruff check pentestagent # Lint
Test config: pytest.ini_options in pyproject.toml, asyncio mode = auto.
Docker
docker compose build
docker compose run --rm pentestagent # Base image
docker compose --profile kali run --rm pentestagent-kali # Kali image
Legal
Only use against systems you have explicit written authorisation to test. Unauthorised access is illegal. MIT licence.