# **Lean LangChain Tooling for a Browser‑Agent (v2)**
*A minimal contract for inputs **and now outputs***
This memo explains **why** and **how** to expose your existing browser‑automation classes as LangChain tools while keeping the framework footprint microscopic. It updates the original design with a **single, universal output contract**:
> **Every tool must return either**
> *a plain, human‑readable sentence* **or**
> *a 2‑field JSON envelope* → `{ ok: boolean, output: string }`.
No other structure or nesting is allowed.
---
## 1 Objectives & Philosophy
| Goal | Design Choice |
|------|---------------|
| Keep LangChain “just a thin adapter” | Use *dynamic* helpers instead of subclassing `BaseTool`. |
| Preserve rich TS classes (`FindElementTool`, …) | Wrap each class in a one‑line closure. |
| Strong runtime validation of **inputs** | Re‑use existing **Zod** schemas. |
| Predictable, tiny **outputs** | Plain string **or** `{ ok, output }` JSON. |
| Easy to eject LangChain later | All wrappers in one folder; delete them and nothing else breaks. |
---
## 2 Helper Selection Cheat‑Sheet
| When you need… | Use | Reason |
|----------------|-----|--------|
| **One** raw string argument (e.g. DNS lookup) | `DynamicTool` | Zero schema, one property (`func(input: string)`), minimal overhead. |
| **Multiple** named arguments (most browser actions) | `DynamicStructuredTool` | Accepts full Zod/JSON schema, validated, LLM‑friendly. |
| Compile‑time/static tools | `tool()` in JS/TS (`StructuredTool.from_function` is the Python counterpart) | Leaner if you never create tools at run‑time. |
---
## 3 Universal **Output Contract**
### 3.1 Decision rule
| Situation | Return |
|-----------|--------|
| Simple acknowledgement / status | `"Clicked element #15"` *(plain string)* |
| You need the caller to know success/failure | `{"ok": true, "output": "Clicked element #15"}` or `{"ok": false, "output": "Selector not found"}` (JSON stringified) |
> The agent will coerce anything to a string; **you** decide if you wrap it in JSON.
> Keep `output` short—under 200 chars—to stay LLM‑friendly.
### 3.2 Type alias (copy‑paste)
```ts
// tools/types/envelope.ts
export type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };
```
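
To make the contract concrete, here is a minimal sketch of how both return shapes could be produced and read back. The helper names (`success`, `failure`, `readToolResult`) are hypothetical and belong neither to this memo nor to LangChain, and treating a plain sentence as `ok: true` is an assumption of the sketch.

```typescript
// Hypothetical helpers for illustration only.
type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };

// Build a stringified success envelope, ready to return from a tool.
function success(output: string): string {
  const res: Envelope = { ok: true, output };
  return JSON.stringify(res);
}

// Build a stringified failure envelope.
function failure(output: string): string {
  const res: Envelope = { ok: false, output };
  return JSON.stringify(res);
}

// Read a tool result that is either a plain sentence or a stringified envelope.
// Assumption: anything that is not a valid envelope counts as a successful plain string.
function readToolResult(raw: string): Envelope {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { ok?: unknown }).ok === "boolean" &&
      typeof (parsed as { output?: unknown }).output === "string"
    ) {
      const { ok, output } = parsed as { ok: boolean; output: string };
      return ok ? { ok: true, output } : { ok: false, output };
    }
  } catch {
    // Not JSON: fall through and treat the input as a plain sentence.
  }
  return { ok: true, output: raw };
}
```

Downstream code can then branch on `readToolResult(raw).ok` without caring which of the two shapes a given tool chose.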
## 4 Wrapper Pattern (Copy–Paste‑Ready)
### 4.1 `DynamicStructuredTool` example
```ts
// tools/wrappers/findElement.ts
import { DynamicStructuredTool } from "@langchain/core/tools";
import { FindElementTool } from "../legacy/FindElementTool";
import type { Envelope } from "../types/envelope";

// `executionContext` is assumed to be in scope here (created once at start-up).
const finder = new FindElementTool(executionContext);

export const findElement = new DynamicStructuredTool({
  name: "find_element",
  description: finder.config.description,
  schema: finder.config.inputSchema, // re-use the existing Zod schema
  func: async (args): Promise<string> => {
    try {
      const idx = await finder.invoke(args); // <-- your heavy logic
      const res: Envelope = { ok: true, output: idx.toString() };
      return JSON.stringify(res);
    } catch (err) {
      // Never throw: report the failure through the envelope instead.
      const res: Envelope = {
        ok: false,
        output: err instanceof Error ? err.message : String(err),
      };
      return JSON.stringify(res);
    }
  },
});
```
### 4.2 `DynamicTool` example (single‑string input)
```ts
// tools/wrappers/dnsLookup.ts
import { DynamicTool } from "@langchain/core/tools";
import { resolve4 } from "node:dns/promises";
import type { Envelope } from "../types/envelope";

export const dnsLookup = new DynamicTool({
  name: "dns_lookup",
  description: "Resolve A records for a domain",
  func: async (domain: string): Promise<string> => {
    try {
      const ips = (await resolve4(domain.trim())).join(", ");
      // This tool can fail, so both paths use the envelope (one shape per tool).
      const res: Envelope = { ok: true, output: ips };
      return JSON.stringify(res);
    } catch (err) {
      const res: Envelope = {
        ok: false,
        output: err instanceof Error ? err.message : String(err),
      };
      return JSON.stringify(res);
    }
  },
});
```
> **Never throw**: wrap every failure path inside the tool and return `{ ok: false, output: "…" }`.
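
One way to apply the rule above without repeating the `try`/`catch` in every wrapper is a small higher-order function. The sketch below is an assumption rather than anything LangChain provides: `neverThrow` and the `click` stand-in are hypothetical names.

```typescript
// Hypothetical helper for illustration only.
type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };

// Wrap an async tool body so that every outcome becomes a stringified envelope.
function neverThrow<Args>(
  body: (args: Args) => Promise<string>,
): (args: Args) => Promise<string> {
  return async (args: Args): Promise<string> => {
    let res: Envelope;
    try {
      res = { ok: true, output: await body(args) };
    } catch (err) {
      // Any thrown value is converted into the failure variant.
      res = { ok: false, output: err instanceof Error ? err.message : String(err) };
    }
    return JSON.stringify(res);
  };
}

// Stand-in tool body that fails for negative indices.
const click = neverThrow(async (args: { index: number }) => {
  if (args.index < 0) throw new Error("Selector not found");
  return `Clicked element #${args.index}`;
});
```

The function returned by `neverThrow` has the same shape as the `func` closures in section 4, so it could plausibly be passed there directly; that wiring is not shown in the original memo.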
## 5 Agent Integration
```ts
import { ChatOpenAI } from "@langchain/openai";
// `createReactAgent` ships in LangGraph's prebuilt module.
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { findElement, interact, extract, tabOps, dnsLookup } from "./tools";

const tools = [findElement, interact, extract, tabOps, dnsLookup];
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

export const browserAgent = createReactAgent({
  llm: model,
  tools,
  // System prompt; older LangGraph releases call this option `messageModifier` or `stateModifier`.
  prompt: "You are an expert browser assistant…",
});
```
## 6 Gotchas & Tips
- **Flat schemas** – deep nesting confuses the model.
- **Short descriptions** – keep them to one or two sentences; tool names should be snake_case.
- **Return shape discipline** – decide once per tool: plain string or envelope, and never mix.
- **Split supersized tools** – if an LLM struggles, make narrower commands.
- **No UI‑only metadata** – leave `streamingConfig`, icons, etc. out of the wrapper.
- **Token sanity** – truncate `output` if it might exceed a few hundred characters.
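
Two of the tips above (snake_case names and bounded output) are easy to check mechanically. The following is a small sketch under stated assumptions: `truncateOutput` and `isSnakeCase` are hypothetical helpers, and the 200-character default simply mirrors the limit suggested in section 3.1.

```typescript
// Hypothetical helpers for illustration only.

// Cut `text` down to at most `maxChars` characters, marking the cut with an ellipsis.
function truncateOutput(text: string, maxChars = 200): string {
  if (text.length <= maxChars) return text;
  // Reserve one character for the ellipsis so the result never exceeds maxChars.
  return text.slice(0, maxChars - 1).trimEnd() + "…";
}

// True for names such as "find_element" or "dns_lookup".
function isSnakeCase(name: string): boolean {
  return /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(name);
}
```

Calling `truncateOutput` right before building the envelope keeps the size limit in one place instead of scattering it across tools.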
## 8 Future‑Proofing
- **OpenAI / Anthropic tool‑calling** – The `{ ok, output }` envelope is already JSON; it travels through function‑calling untouched.
- **If you eject LangChain** – Replace each wrapper with `(args) => {…}` that still returns the same envelope. Nothing else in your app changes.
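
As a rough illustration of the eject path, the sketch below shows what a framework-free version could look like: plain functions keyed by tool name, each returning the same envelope. All names here (`ToolFn`, `registry`, `callTool`) are hypothetical, and the `find_element` body is a stand-in that only echoes its argument.

```typescript
// Hypothetical names throughout; nothing here comes from LangChain.
type Envelope =
  | { ok: true; output: string }
  | { ok: false; output: string };

// A framework-free tool: parsed arguments in, string out.
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

const registry = new Map<string, ToolFn>();

// Stand-in for a real browser action.
registry.set("find_element", async (args) => {
  const res: Envelope = { ok: true, output: String(args["index"]) };
  return JSON.stringify(res);
});

// Dispatch a call by tool name, as a function-calling payload would require.
async function callTool(name: string, args: Record<string, unknown>): Promise<string> {
  const fn = registry.get(name);
  if (fn === undefined) {
    // Unknown tools also answer with the envelope rather than throwing.
    const res: Envelope = { ok: false, output: `Unknown tool: ${name}` };
    return JSON.stringify(res);
  }
  return fn(args);
}
```

Because the return type is still a string in the agreed shape, code that consumes tool results would not need to change.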
✨ With this update, each tool speaks a single, predictable dialect: a very small string—either raw or wrapped—so your agent stays lean and your downstream code remains blissfully simple.