--- marp: true theme: default class: - lead --- # I built my own AI-agent. Why? **A journey from reading about AI to building a custom agent framework** ![](./_docs/images/header.png) --- layout: image-left image: ./_images/Gemini_Generated_Image_x6y0h0x6y0h0x6y0.png --- ## Mikhail Larchanka - Principal Software Engineer at **Sytac** - https://larchanka.com - https://youtube.com/@larchanka - https://github.com/larchanka - https://x.com/mlarchanka --- ## ๐ŸŒ The AI Catch-Up - The AI landscape is evolving at breakneck speed every single day. - New models, new frameworks (LangChain, AutoGen), new methodologies. - It feels like the revolution is passing by. - **The Challenge:** I do not work with AI in my daily job. Staying actively involved requires intentional effort beyond standard day-to-day tasks. ![](./_images/agents.webp) --- layout: image-right image: ./_images/Gemini_Generated_Image_vu2lluvu2lluvu2l.png --- ## ๐Ÿ“š The Trap of "Reading vs. Doing" - I read a lot of papers, articles, and documentation. - **The Reality Check:** Reading builds awareness, but not genuine *knowledge* or intuition. - Without hands-on practice, you don't discover the edge cases, the latency issues, or the prompt fragility. - I spent time checking what others were building in the space, and understanding their pain points. --- layout: image-right image: ./_images/dontlike.png backgroundSize: contain --- ## ๐Ÿ’ก The Catalyst: My Own Ideas - While observing existing solutions, I realized I had different ideas on how agents should operate. - Existing frameworks often felt either too bloated, too confusing, or too rigid. - I wanted to build something tailored to my intuition of how a system should reason and interact with an environment. --- layout: image-right image: ./_images/ai-prices.png backgroundSize: contain --- ## ๐Ÿ’ธ Cost-Driven Architecture - **The Goal:** Make learning and relentless experimentation "cheap." - Relying on cloud APIs (GPT-4, Claude) for agentic loopsโ€”which run autonomously making dozens of calls and mistakesโ€”gets expensive quickly. - **The Solution:** Local LLMs. - Complete freedom to experiment, fail, retry, and loop infinitely without worrying about API bills. --- layout: image-left image: ./_images/llm-inference.webp backgroundSize: contain --- ## ๐Ÿค– Evaluating Local LLMs - Not all models are created equal for agentic tasks. - **Benchmarking for my agent:** - Need strong coding and reasoning capabilities. - Need reliable JSON/tool-calling formatting. - Need fast inference speed (tokens/sec) for autonomous, multi-step loops. - Explored running models locally using tools like Lemonade. - Tested how models handle context degradation on local machine hardware. --- ## ๐Ÿ”€ Dynamic Model Routing - Running a massive model (like Mixtral) for every tiny task is slow and overkill. - I built a **Model Router** that dynamically selects models based on task complexity. - **The Flow:** 1. **Planner:** Evaluates the user's intent and forces the plan into a `complexity` bucket (`"small"`, `"medium"`, or `"large"`). 2. **Injection:** The Executor injects `_complexity` into the payload for every node. 3. **Router Resolution:** The `GeneratorService` checks the complexity and maps it purely via `config.json`. 4. *Small -> Llama3:8B (Fast system loops)* | *Medium -> Qwen2.5 (Standard)* | *Large -> Mixtral (Deep research)*. --- ## ๐Ÿ›ก๏ธ Context & Token Safety - I didn't want to calculate exact tokens with heavy libraries (like `tiktoken`) on every single loop. - **My Strategy (Heuristics & Compression):** - **Safety Truncation:** If a tool (like a massive `http_get` web scrape) returns over 30,000 characters, the `ExecutorAgent` aggressively truncates it. - **Prompt Summarization:** Instead of keeping an infinitely growing chat history, the Planner produces a `summarize` node using a specialized `SUMMARIZER_SYSTEM_PROMPT` to compress old context dynamically. - Tracking the standard `usage` vectors from OpenAI-compatible tools (`prompt_tokens`, `total_tokens`) for observability rather than strict hard-blocking. --- ## ๐Ÿ—๏ธ The Ultimate Testing Playground - The framework wasn't just a final product; it was a testbed. - **Objectives:** - Test how to actually *code* with agents in different structural scenarios. - Experiment with system architecture and modularity design. - Learn how to construct proper, dynamic task-planning prompts. - Create functional applications autonomously based on structured tasks.


```mermaid flowchart LR Planner[Planner] --> Executor[Executor] Executor --> Tools[Tools] Tools --> Executor Executor --> Planner ``` --- ## โš™๏ธ The Core Loop: Solving Communication - **The Problem:** LLMs naturally output raw text. Agents need structured, executable actions. - **The Implementation:** - Forcing the local LLM to output valid JSON representations of tool calls. - Handling parsing errors seamlessly through self-correction loops. - Designing a robust schema that the LLM understands and adheres to. - Distinguishing between "Thinking" (reasoning) and "Acting" (tool execution). --- ## โš™๏ธ The Core Loop: Solving Communication ### Request ``` { "id": "uuid", "from": "core", "to": "planner", "type": "plan.create", "version": "1.0", "timestamp": 1704067200000, "payload": {} } ``` --- ## โš™๏ธ The Core Loop: Solving Communication ### Response ``` { "id": "same-as-request", "from": "planner", "to": "core", "type": "response", "version": "1.0", "timestamp": 1704067200000, "payload": { "status": "success", "result": {} } } ``` --- layout: image image: ./_images/SCR-20260305-jmkx.png --- --- ## ๐Ÿ› ๏ธ Equipping the Agent: Tools & Skills - Agents are useless without hands. - I built a modular tool host system. - Standardized interfaces for tools: `name`, `description`, `parameters`, `execute()`. - Grouping tools into highly specialized "Skills" (e.g., File System, Terminal, Browser). - Optimization: Injecting only relevant tool schemas into the prompt to preserve context. --- layout: image image: ./_images/SCR-20260305-jmme.png --- --- ## ๐Ÿ”Œ Standardizing with MCP - **Building MCP (Model Context Protocol) Integration:** - Why reinvent the wheel for every custom tool or data source? - Implementing MCP allowed my agent to connect to external, standardized tools seamlessly. - Learned how to expose local environment capabilities (files, API connections) to an agent through standardized, secure boundaries. --- ## ๐Ÿ›๏ธ The Layered Memory Architecture To prevent context contamination and keep prompt sizes manageable, I separated memory into distinct tiers: ```mermaid flowchart LR classDef st fill:#fef3c7,stroke:#b45309,stroke-width:2px,color:#000 classDef mt fill:#dbeafe,stroke:#1d4ed8,stroke-width:2px,color:#000 classDef lt fill:#dcfce7,stroke:#15803d,stroke-width:2px,color:#000 subgraph ST ["Short-Term (In-Context)"] direction TB Conv["๐Ÿ’ฌ Conversation"]:::st Session["๐Ÿ“ Scratchpad"]:::st end subgraph MT ["Mid-Term (SQLite Task Store)"] direction TB Task["โš™๏ธ DAG State"]:::mt Reflect["๏ฟฝ Reflections"]:::mt end subgraph LT ["Long-Term (Persistent)"] direction TB RAG["๐Ÿ”ฎ Semantic RAG"]:::lt Struct["๐Ÿ’พ Structured Data"]:::lt end ST -->|Initiates tasks| MT MT -->|Queries knowledge| LT LT -.->|Injects context| ST ``` --- ## ๐Ÿ•’ When is each memory used? - **Short-Term (Conversation & Session):** - **When:** Active chatting, holding the immediate goal, fast active reasoning. - **Lifecycle:** Evicted rapidly to save prompt context. - **Mid-Term (Task Memory & State - SQLite):** - **When:** Tracking multi-step execution graphs (DAGs), pausing/resuming tasks, storing critic reflections & retry counts. - **Lifecycle:** Persists across agent loops; prevents the agent from getting stuck in circles. - **Long-Term (Vector DB & File System):** - **When:** Finding unseen documents or entire codebase structures based on semantic meaning. - **Lifecycle:** Permanent; grows over time. --- ## ๐Ÿง  Long-Term Memory: RAG from Scratch - Local LLMs have finite (and hardware-bound) context windows. - You can't simply fit an entire large codebase into a localized 8k context window. - **Building RAG (Retrieval-Augmented Generation):** - Used `sqlite-vss` for K-Nearest Neighbors (KNN) vector search natively inside SQLite. - Implementing fallback to dot-product calculations if the VSS extension is unavailable. - Generating and storing embeddings locally to retrieve only the relevant functions immediately needed.


```mermaid graph LR Ctx["๐Ÿ“„ Context"] --> Chunk["โœ‚๏ธ Chunking"] Chunk --> Embed["๐Ÿ”ข Embedding"] Embed --> DB["๐Ÿ—„๏ธ Vector DB"] DB --> Search["๐Ÿ” Search"] Search --> Retrieve["๐Ÿง  Retrieval"] ``` --- ## โš–๏ธ Mastering Context Window Management - **The hardest technical challenge:** Managing prompt size dynamically. - Combining RAG retrieval with the agent's conversational history. - Implementing mechanics to handle max-tokens: - Sliding windows for conversation history. - Context summarization. - Deciding what to precisely evict from memory without making the agent "forget" its core objective. --- ## ๐Ÿ“Š Dashboard ![](./_images/SCR-20260305-jmid.png) --- ## ๐Ÿš€ The Result: Bridging Theory and Practice - I built an entire development framework myself from the ground up. - Moved from passive reading about AI architectures to actively solving their core engineering constraints. - Built a system based entirely on my own ideas, uniquely tailored to my development flow. - Resulted in a fully functional, cost-free, local agentic framework. --- layout: image-right image: ./_images/Gemini_Generated_Image_x6y0h0x6y0h0x6y0.png --- ## Thank You! **Questions & Discussion** https://manbothq.github.io/