---
title: "OCR Tool: Extract Text from Images"
description: "Extract text from images using PocketPaw's OCR tool. Primary engine is GPT-4o Vision for high accuracy; falls back to pytesseract for offline use. Supports screenshots, photos, and scanned documents."
section: Tools
ogType: article
keywords: ["ocr", "image text extraction", "gpt-4o vision", "pytesseract", "optical character recognition"]
tags: ["tools", "media"]
---
# OCR Tool: Extract Text from Images
The OCR tool extracts text from images using a two-tier approach: GPT-4o Vision as the primary engine with pytesseract as a fallback.
## How It Works
1. **Primary**: Sends the image to GPT-4o Vision for intelligent text extraction
2. **Fallback**: If GPT-4o is unavailable, falls back to pytesseract (local OCR)
GPT-4o Vision produces superior results for complex layouts, handwriting, and non-standard text. Pytesseract works offline but is limited to printed text.
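The fallback behavior described above can be sketched as a small wrapper. This is a minimal illustration, not PocketPaw's actual internals: `gpt4o_vision_ocr` and `tesseract_ocr` are hypothetical stand-in engines.

```python
from typing import Callable

def ocr_with_fallback(
    image_path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary engine first; on any failure, use the fallback."""
    try:
        return primary(image_path)
    except Exception:
        # e.g. missing API key, network error, rate limit
        return fallback(image_path)

# Hypothetical stand-in engines for illustration:
def gpt4o_vision_ocr(path: str) -> str:
    raise RuntimeError("OPENAI_API_KEY not set")  # simulate an unavailable API

def tesseract_ocr(path: str) -> str:
    return "extracted text"  # pytesseract.image_to_string(...) in practice

print(ocr_with_fallback("screenshot.png", gpt4o_vision_ocr, tesseract_ocr))
# prints "extracted text"
```

Catching a broad `Exception` on the primary path keeps the tool usable offline at the cost of silently degrading accuracy; a real implementation might log which engine produced the result.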
## Setup
```bash
# For GPT-4o Vision (primary)
export POCKETPAW_OPENAI_API_KEY="sk-..."
# For pytesseract fallback (optional)
# Install tesseract-ocr system package
sudo apt install tesseract-ocr # Ubuntu/Debian
brew install tesseract # macOS
```
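Since pytesseract shells out to the `tesseract` binary, you can verify the fallback is usable before relying on it. `tesseract_available` is an illustrative helper, not part of PocketPaw:

```python
import shutil

def tesseract_available() -> bool:
    """Return True if the tesseract binary is on PATH (pytesseract needs it)."""
    return shutil.which("tesseract") is not None

if not tesseract_available():
    print("tesseract not found; install it to enable the offline OCR fallback")
```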
## Usage
```text
User: What does this image say? /path/to/screenshot.png
Agent: [uses ocr tool] → "The image contains the following text..."
```
## Tool Schema
```json
{
  "name": "ocr",
  "description": "Extract text from an image file",
  "input_schema": {
    "type": "object",
    "properties": {
      "image_path": {
        "type": "string",
        "description": "Path to the image file"
      }
    },
    "required": ["image_path"]
  }
}
```
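A dispatcher consuming this schema might validate the input before invoking an engine. The sketch below is a hedged assumption about how such validation could look; `validate_tool_input` is illustrative and not PocketPaw's actual tool runtime:

```python
# Mirror of the schema shown above, limited to the fields used here.
OCR_SCHEMA = {
    "name": "ocr",
    "input_schema": {
        "type": "object",
        "properties": {"image_path": {"type": "string"}},
        "required": ["image_path"],
    },
}

def validate_tool_input(schema: dict, tool_input: dict) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    spec = schema["input_schema"]
    for field in spec["required"]:
        if field not in tool_input:
            raise ValueError(f"missing required field: {field!r}")
    for field, rules in spec["properties"].items():
        if field in tool_input and rules["type"] == "string":
            if not isinstance(tool_input[field], str):
                raise ValueError(f"{field!r} must be a string")

validate_tool_input(OCR_SCHEMA, {"image_path": "/tmp/shot.png"})  # passes
```

For production use, a JSON Schema validator (e.g. the `jsonschema` package) would cover the full spec rather than this hand-rolled subset.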
## Policy Group
The OCR tool belongs to the `group:media` policy group.
## Related
<CardGroup>
  <Card title="Image Generation" icon="lucide:image" href="/tools/image-generation">
    Generate images with Google Gemini models.
  </Card>
  <Card title="Desktop Automation" icon="lucide:monitor" href="/tools/desktop">
    Capture screenshots and monitor system resources.
  </Card>
  <Card title="Tools Overview" icon="lucide:wrench" href="/tools">
    Browse all 50+ built-in tools available in PocketPaw.
  </Card>
</CardGroup>