BrowserOS/tools-design.md at f736884ca357468c8443fac6508be0572cfc32bc

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-21 12:55:09 +00:00

Felarof 8245dfe0ff Rewrite Agent Loop (#7 )

Co-authored-by: Nikhil Sonti <nikhilsv92@gmail.com>

2025-07-29 08:14:45 -07:00


# **Lean LangChain Tooling for a Browser‑Agent (v2)**  

*A minimal contract for inputs **and now outputs***  

This memo explains **why** and **how** to expose your existing browser‑automation classes as LangChain tools while keeping the framework footprint microscopic.  It updates the original design with a **single, universal output contract**:

> **Every tool must return either**  
> *a plain, human‑readable sentence* **or**  
> *a 2‑field JSON envelope* → `{ ok: boolean, output: string }`.

No other structure or nesting is allowed.

---

## 1  Objectives & Philosophy
| Goal | Design Choice |
|------|---------------|
| Keep LangChain “just a thin adapter” | Use *dynamic* helpers instead of subclassing `BaseTool`. |
| Preserve rich TS classes (`FindElementTool`, …) | Wrap each class in a one‑line closure. |
| Strong runtime validation of **inputs** | Re‑use existing **Zod** schemas. |
| Predictable, tiny **outputs** | Plain string **or** `{ ok, output }` JSON. |
| Easy to eject LangChain later | All wrappers in one folder; delete them and nothing else breaks. |

---

## 2  Helper Selection Cheat‑Sheet

| When you need… | Use | Reason |
|----------------|-----|--------|
| **One** raw string argument (e.g. DNS lookup) | `DynamicTool` | Zero schema, one property (`func(input: string)`), minimal overhead. |
| **Multiple** named arguments (most browser actions) | `DynamicStructuredTool` | Accepts full Zod/JSON schema, validated, LLM‑friendly. |
| Compile‑time/static tools | `tool()` from `@langchain/core/tools` | Leaner if you never create tools at run‑time. (`StructuredTool.from_function` is the Python‑side equivalent.) |

---

## 3  Universal **Output Contract**

### 3.1 Decision rule

| Situation | Return |
|-----------|--------|
| Simple acknowledgement / status | `"Clicked element #15"` *(plain string)* |
| You need the caller to know success/failure | `{"ok": true,  "output": "Clicked element #15"}` or `{"ok": false, "output": "Selector not found"}` (JSON stringified) |

> The agent will coerce anything to a string; **you** decide if you wrap it in JSON.  
> Keep `output` short—under 200 chars—to stay LLM‑friendly.
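
Because a tool may return either shape, whatever reads the result has to accept both. The following is a minimal sketch of one way to do that, not code from the existing project: `isEnvelope` and `readToolResult` are hypothetical helper names, and treating a plain sentence as a success is an assumption (a plain string carries no failure flag).

```typescript
// Two-field envelope from the output contract.
type Envelope = { ok: boolean; output: string };

// Type guard: true only for an object with exactly `ok` (boolean) and `output` (string).
function isEnvelope(value: unknown): value is Envelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    Object.keys(v).length === 2 &&
    typeof v.ok === "boolean" &&
    typeof v.output === "string"
  );
}

// Hypothetical helper: normalise either return shape into an Envelope.
// Assumption: a plain sentence counts as success.
function readToolResult(raw: string): Envelope {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isEnvelope(parsed)) return parsed;
  } catch {
    // Not JSON: fall through and treat the input as a plain sentence.
  }
  return { ok: true, output: raw };
}
```

Anything that is not exactly the two-field envelope, including JSON with extra fields, is passed through as plain text, which keeps the "no other structure" rule enforceable at a single point.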

### 3.2 Type alias (copy‑paste)

```ts
type Envelope =
  | { ok: true;  output: string }
  | { ok: false; output: string };
```

---

## 4  Wrapper Pattern (Copy–Paste‑Ready)

### 4.1 `DynamicStructuredTool` example

```ts
// tools/wrappers/findElement.ts
import { DynamicStructuredTool } from "@langchain/core/tools";
import { FindElementTool } from "../legacy/FindElementTool";
import type { Envelope } from "../types/envelope";

// `executionContext` is assumed to be created elsewhere and in scope here.
const finder = new FindElementTool(executionContext);

export const findElement = new DynamicStructuredTool({
  name: "find_element",
  description: finder.config.description,
  schema: finder.config.inputSchema,        // existing Zod schema, reused as-is
  func: async (args): Promise<string> => {
    try {
      const idx = await finder.invoke(args); // <-- your heavy logic
      const res: Envelope = { ok: true, output: idx.toString() };
      return JSON.stringify(res);
    } catch (err) {
      // Never throw: report the failure through the envelope instead.
      const res: Envelope = {
        ok: false,
        output: err instanceof Error ? err.message : String(err)
      };
      return JSON.stringify(res);
    }
  },
});
```

### 4.2 `DynamicTool` example (single‑string input)

```ts
// tools/wrappers/dnsLookup.ts
import { DynamicTool } from "@langchain/core/tools";
import dns from "node:dns/promises";
import type { Envelope } from "../types/envelope";

export const dnsLookup = new DynamicTool({
  name: "dns_lookup",
  description: "Resolve A records for a domain",
  func: async (domain: string): Promise<string> => {
    try {
      const ips = (await dns.resolve4(domain)).join(", ");
      // Envelope on success too, so this tool never mixes return shapes.
      const res: Envelope = { ok: true, output: ips };
      return JSON.stringify(res);
    } catch (err) {
      const res: Envelope = {
        ok: false,
        output: err instanceof Error ? err.message : String(err)
      };
      return JSON.stringify(res);
    }
  },
});
```

> **Never throw:** wrap every failure path inside the tool and return `{ ok: false, output: "…" }`.
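
With many tools, the never-throw rule can be enforced in one place rather than repeated in every wrapper. The sketch below is one possible approach, not part of the existing codebase: `neverThrow` is a hypothetical helper, and the `click` action is made up purely for illustration.

```typescript
type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };

// Hypothetical helper: wraps an async action so the returned function never rejects.
// Success and failure are both reported as a stringified Envelope.
function neverThrow<A>(
  action: (args: A) => Promise<string>,
): (args: A) => Promise<string> {
  return async (args: A): Promise<string> => {
    let res: Envelope;
    try {
      res = { ok: true, output: await action(args) };
    } catch (err) {
      res = { ok: false, output: err instanceof Error ? err.message : String(err) };
    }
    return JSON.stringify(res);
  };
}

// Illustrative action (not a real BrowserOS tool): fails on an empty selector.
const click = neverThrow(async (args: { selector: string }) => {
  if (args.selector === "") throw new Error("Selector not found");
  return `Clicked ${args.selector}`;
});
```

The wrapped function has the same `(args) => Promise<string>` shape as the `func` property in the wrappers above, so it could be passed there directly.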

---

## 5  Agent Integration

```ts
import { ChatOpenAI } from "@langchain/openai";
// Assumption: `createReactAgent` is taken from LangGraph's prebuilt agents here;
// adjust the import path and option names to your installed version.
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { findElement, interact, extract, tabOps, dnsLookup } from "./tools";

const tools = [findElement, interact, extract, tabOps, dnsLookup];
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

export const browserAgent = createReactAgent({
  llm: model,
  tools,
  prompt: "You are an expert browser assistant…", // system prompt
});
```

---

## 6  Gotchas & Tips

- **Flat schemas** – deep nesting confuses the model.
- **Short descriptions** – keep each to 1–2 sentences; tool names should be snake_case.
- **Return shape discipline** – decide once per tool: plain string or envelope, and never mix.
- **Split supersized tools** – if an LLM struggles with a tool, break it into narrower commands.
- **No UI‑only metadata** – leave `streamingConfig`, icons, etc. out of the wrapper.
- **Token sanity** – truncate `output` if it might exceed the roughly 200‑character budget from section 3.1.
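
The token-sanity tip can be applied with a small helper called just before the envelope is built. This is a sketch under assumptions: `truncateOutput` is a hypothetical name, and the default of 200 mirrors the guideline in section 3.1. It counts UTF-16 code units, so a cut could in principle fall inside a surrogate pair.

```typescript
// Hypothetical helper: cap `text` at `max` characters, marking the cut with "…".
// The result is never longer than `max`.
function truncateOutput(text: string, max = 200): string {
  if (max < 1) return "";
  if (text.length <= max) return text;
  // Reserve one character for the ellipsis and drop trailing whitespace before it.
  return text.slice(0, max - 1).trimEnd() + "…";
}
```

For example, `truncateOutput("hello world", 6)` returns `"hello…"`, and any string already within the limit is returned unchanged.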

---

## 7  Future‑Proofing

- **OpenAI / Anthropic tool‑calling** – the `{ ok, output }` envelope is already a JSON string, so it travels through function calling untouched.
- **If you eject LangChain** – replace each wrapper with a plain `(args) => {…}` function that still returns the same envelope. Nothing else in your app changes.
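
To make the eject path concrete, here is a rough sketch of a framework-free tool that keeps the same envelope. Everything in it is illustrative: the `PlainTool` interface is a hypothetical shape, and `findIndex` is a stand-in for a real class such as `FindElementTool`, not its actual logic.

```typescript
type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };

// Hypothetical framework-free tool shape: a name plus a function returning the stringified envelope.
interface PlainTool<A> {
  name: string;
  run: (args: A) => Promise<string>;
}

// Stand-in for the real browser logic (illustrative only).
async function findIndex(args: { label: string }): Promise<number> {
  const labels = ["Home", "Login", "Search"];
  const idx = labels.indexOf(args.label);
  if (idx < 0) throw new Error(`No element labelled "${args.label}"`);
  return idx;
}

const findElement: PlainTool<{ label: string }> = {
  name: "find_element",
  run: async (args) => {
    try {
      const res: Envelope = { ok: true, output: String(await findIndex(args)) };
      return JSON.stringify(res);
    } catch (err) {
      const res: Envelope = {
        ok: false,
        output: err instanceof Error ? err.message : String(err),
      };
      return JSON.stringify(res);
    }
  },
};
```

Code that consumes tool results only ever sees the string returned by `run`, so it does not need to know whether LangChain produced it.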

✨ With this update, each tool speaks a single, predictable dialect: a very small string—either raw or wrapped—so your agent stays lean and your downstream code remains blissfully simple.
