BrowserOS/reference-code/old-lib/llm/utils/structuredOutput.ts
Felarof 8245dfe0ff Rewrite Agent Loop (#7)
Co-authored-by: Nikhil Sonti <nikhilsv92@gmail.com>
2025-07-29 08:14:45 -07:00

196 lines
7.2 KiB
TypeScript

import { BaseChatModel } from '@langchain/core/language_models/chat_models';
import { z } from 'zod';
import { LLMSettingsReader } from '../settings/LLMSettingsReader';
import { Logging } from '@/lib/utils/Logging';
/**
 * OpenAI's Structured Outputs mode does NOT allow optional object properties:
 * every key that appears in `properties` must also appear in the
 * `required` array. The canonical workaround is to keep the property
 * required while permitting the value `null`.
 *
 * To automate this, we walk the supplied Zod schema and:
 *   • replace each `z.optional(T)` with `T.nullable()` (required-nullable)
 *   • recurse into nested objects and arrays so the rule is applied deeply.
 *
 * The resulting schema can be fed into `zodToJsonSchema` → OpenAI without
 * triggering the error: “uses .optional() without .nullable()”.
 *
 * Note: objects and arrays are rebuilt with `z.object` / `z.array`, so
 * modifiers set on the originals (e.g. `.strict()`, `.min()`, `.describe()`)
 * are not carried over to the rebuilt schema.
 */
function makeOpenAICompatible<T extends z.ZodTypeAny>(schema: T): T {
  // Handle ZodOptional by unwrapping it and adding .nullable()
  if (schema instanceof z.ZodOptional) {
    const inner = makeOpenAICompatible((schema as any)._def.innerType);
    return (inner.nullable() as unknown) as T;  // required + nullable
  }

  // Recursively process object shapes
  if (schema instanceof z.ZodObject) {
    const newShape: Record<string, z.ZodTypeAny> = {};
    for (const [key, value] of Object.entries(schema.shape)) {
      newShape[key] = makeOpenAICompatible(value as z.ZodTypeAny);
    }
    return (z.object(newShape) as unknown) as T;
  }

  // Process arrays by transforming their element type
  if (schema instanceof z.ZodArray) {
    const element = makeOpenAICompatible((schema as any)._def.type);
    return (z.array(element) as unknown) as T;
  }

  // Leave all other schema types unchanged
  return schema;
}
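
The rule described in the docstring above can also be pictured at the JSON Schema level, which is what OpenAI ultimately receives. The following is a minimal, dependency-free sketch under stated assumptions: `JsonSchema`, `toStrictSchema`, and `withNull` are hypothetical names introduced for illustration only. They are not part of this file, of zod, or of any OpenAI SDK, and the sketch covers only plain `object` and `array` nodes.

```typescript
// Hypothetical, simplified JSON Schema subset used only for this illustration.
type JsonSchema = {
  type: string | string[];
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
};

// Add "null" to a node's allowed types unless it is already there.
function withNull(schema: JsonSchema): JsonSchema {
  const types = Array.isArray(schema.type) ? schema.type : [schema.type];
  return types.includes('null') ? schema : { ...schema, type: [...types, 'null'] };
}

// List every property in `required`; properties that used to be optional
// stay expressible because their value may now be null.
function toStrictSchema(schema: JsonSchema): JsonSchema {
  if (schema.type === 'array' && schema.items) {
    return { ...schema, items: toStrictSchema(schema.items) };
  }
  if (schema.type !== 'object' || !schema.properties) {
    return schema;
  }
  const wasRequired = new Set(schema.required ?? []);
  const properties: Record<string, JsonSchema> = {};
  for (const [key, value] of Object.entries(schema.properties)) {
    const child = toStrictSchema(value); // apply the rule deeply first
    properties[key] = wasRequired.has(key) ? child : withNull(child);
  }
  return { ...schema, properties, required: Object.keys(properties) };
}

const example = toStrictSchema({
  type: 'object',
  properties: { title: { type: 'string' }, note: { type: 'string' } },
  required: ['title'],
});
// example.required is ['title', 'note'];
// example.properties.note.type is ['string', 'null'].
console.log(JSON.stringify(example, null, 2));
```

The Zod-level helper above reaches the same end state from the other direction: it rewrites the schema before conversion, so the converter never sees an optional property.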
/**
 * Ollama models sometimes serialise enum values with different capitalisation
 * (e.g. "HIGH" vs "high"). This helper walks a schema and replaces every
 * ZodEnum/ZodNativeEnum with a case-insensitive equivalent that lower-cases
 * the input before validation, then returns the canonical lower-case value.
 */
function makeOllamaCompatible<T extends z.ZodTypeAny>(schema: T): T {
  // Handle optional by preserving optionality after transformation
  if (schema instanceof z.ZodOptional) {
    const inner = makeOllamaCompatible((schema as any)._def.innerType);
    return (inner.optional() as unknown) as T;
  }

  // Case-insensitive enums
  if (schema instanceof z.ZodEnum) {
    const values: string[] = (schema as any).options ?? (schema as any)._def.values;
    const lower = values.map(v => v.toLowerCase()) as [string, ...string[]];
    const ciEnum = z.preprocess((val) => typeof val === 'string' ? val.toLowerCase() : val, z.enum(lower));
    return (ciEnum as unknown) as T;
  }

  if (schema instanceof z.ZodNativeEnum) {
    const nativeEnum = (schema as any)._def.values;
    const stringVals = Object.values(nativeEnum).filter(v => typeof v === 'string') as string[];
    const lower = stringVals.map(v => v.toLowerCase()) as [string, ...string[]];
    const ciEnum = z.preprocess((val) => typeof val === 'string' ? val.toLowerCase() : val, z.enum(lower));
    return (ciEnum as unknown) as T;
  }

  // Recursively process object shapes
  if (schema instanceof z.ZodObject) {
    const newShape: Record<string, z.ZodTypeAny> = {};
    for (const [key, value] of Object.entries(schema.shape)) {
      newShape[key] = makeOllamaCompatible(value as z.ZodTypeAny);
    }
    return (z.object(newShape) as unknown) as T;
  }

  // Arrays transform element type
  if (schema instanceof z.ZodArray) {
    const elem = makeOllamaCompatible((schema as any)._def.type);
    return (z.array(elem) as unknown) as T;
  }

  return schema;
}
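
The enum handling above boils down to "lower-case both sides, then compare". A minimal sketch of that idea without zod follows; `normalizeEnum` is a hypothetical helper written for illustration and does not exist in this file.

```typescript
// Hypothetical helper: validate `input` against `allowed`, ignoring case,
// and return the canonical lower-case value.
function normalizeEnum(allowed: readonly string[], input: unknown): string {
  const canonical = allowed.map(v => v.toLowerCase());
  // Non-string inputs are left untouched so they fail validation below.
  const candidate = typeof input === 'string' ? input.toLowerCase() : input;
  if (typeof candidate === 'string' && canonical.includes(candidate)) {
    return candidate;
  }
  throw new Error(`Invalid enum value: ${String(input)}`);
}

console.log(normalizeEnum(['low', 'high'], 'HIGH')); // prints "high"
```

One consequence worth keeping in mind: because the result is always lower-case, an enum originally declared with upper-case or mixed-case members yields values that no longer match the original casing, so callers comparing against the original members would need to normalise as well.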
/**
 * Creates a Zod schema that handles both direct and function-wrapped formats.
 * This is designed to handle Ollama and other LLMs that return function-calling format.
 *
 * Note: the string-argument branches call `JSON.parse` and `baseSchema.parse`
 * inside `.transform()`. Errors thrown there are not converted into Zod
 * issues; they propagate to the caller of `parse` / `safeParse`.
 */
export function createFlexibleSchema<T>(baseSchema: z.ZodSchema<T>): z.ZodSchema<T> {
  return z.union([
    // Accept direct format (what we expect)
    baseSchema,

    // Accept function calling format and extract arguments
    z.object({
      name: z.string(),
      arguments: baseSchema
    }).transform(data => data.arguments),

    // Accept function_call format (older OpenAI style)
    z.object({
      function_call: z.object({
        name: z.string(),
        arguments: baseSchema
      })
    }).transform(data => data.function_call.arguments),

    // Accept arguments as string that needs parsing
    z.object({
      name: z.string(),
      arguments: z.string()
    }).transform(data => {
      const parsed = JSON.parse(data.arguments);
      return baseSchema.parse(parsed);
    }),

    // Accept tool_calls array format (newer OpenAI style)
    z.object({
      tool_calls: z.array(z.object({
        function: z.object({
          name: z.string(),
          arguments: z.union([baseSchema, z.string()])
        })
      })).min(1)  // Ensure at least one tool call
    }).transform(data => {
      const firstCall = data.tool_calls[0];
      if (!firstCall || !firstCall.function) {
        throw new Error('No tool calls found');
      }
      const args = firstCall.function.arguments;
      if (typeof args === 'string') {
        return baseSchema.parse(JSON.parse(args));
      }
      return args;
    })
  ]) as z.ZodSchema<T>;
}
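
To make the accepted envelope shapes concrete, here is a rough, dependency-free sketch of the unwrapping step on its own. `unwrapArguments` is a hypothetical function written for illustration, not something this file exports. Unlike the union above, it has no base schema to try first, so it checks for envelopes before falling back to the raw value; a direct payload that itself has `name` and `arguments` keys would therefore be treated as an envelope here.

```typescript
// Hypothetical helper: peel a function-calling envelope off a raw model reply.
function unwrapArguments(raw: unknown): unknown {
  // Some providers send `arguments` as a JSON string, others as an object.
  const parseIfString = (v: unknown): unknown =>
    typeof v === 'string' ? JSON.parse(v) : v;

  if (typeof raw !== 'object' || raw === null) {
    return raw;
  }
  const obj = raw as Record<string, unknown>;

  // { function_call: { name, arguments } }  (older OpenAI style)
  const fc = obj.function_call;
  if (typeof fc === 'object' && fc !== null && 'arguments' in fc) {
    return parseIfString((fc as { arguments: unknown }).arguments);
  }

  // { tool_calls: [{ function: { name, arguments } }] }  (newer OpenAI style)
  if (Array.isArray(obj.tool_calls) && obj.tool_calls.length > 0) {
    const first = obj.tool_calls[0] as { function?: { arguments?: unknown } } | null;
    if (first && first.function && 'arguments' in first.function) {
      return parseIfString(first.function.arguments);
    }
  }

  // { name, arguments }  (bare function-calling format)
  if (typeof obj.name === 'string' && 'arguments' in obj) {
    return parseIfString(obj.arguments);
  }

  // Direct format: nothing to unwrap.
  return raw;
}

const reply = { tool_calls: [{ function: { name: 'plan', arguments: '{"steps":2}' } }] };
console.log(unwrapArguments(reply)); // the parsed object { steps: 2 }
```

The unwrapped value would still need to be validated against the real schema afterwards; in `createFlexibleSchema` that validation and the unwrapping happen in a single `parse` call.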
/**
 * Provider-aware wrapper around `llm.withStructuredOutput()` that hides all
 * provider-specific edge-cases from callers.
 *
 * Strategy:
 *   • Ollama models tend to wrap answers in tool-call envelopes ⇒ try a
 *     `flexibleSchema` (handles wrappers) with case-insensitive enums via
 *     `makeOllamaCompatible`, then other fallbacks.
 *   • Non-Ollama (OpenAI, Claude, …) must satisfy OpenAI's “no optional
 *     properties” rule ⇒ try `makeOpenAICompatible` first, then
 *     `flexibleSchema`, finally the plain schema.
 *
 * Each attempt is sandboxed; on failure we log a `warning` and proceed to the
 * next attempt instead of throwing early. If every attempt fails, the promise
 * resolves to `undefined`, so callers should check the result before use.
 */
export async function withFlexibleStructuredOutput<T>(
  llm: BaseChatModel,
  schema: z.ZodSchema<T>
): Promise<any> {  // Return 'any' for compatibility
  const settings = await LLMSettingsReader.read();
  const provider = settings.defaultProvider;

  // Helper: attempt a schema and swallow errors, logging them for debug
  const attempt = (factory: () => z.ZodSchema<any>): any | undefined => {
    try {
      const sch = factory();
      return llm.withStructuredOutput(sch as any);
    } catch (err) {
      Logging.log('structuredOutput', `Schema attempt failed: ${err}`, 'warning');
      return undefined;
    }
  };

  if (provider === 'ollama') {
    // 1. Ollama: case-insensitive enums + flexible wrappers first
    return (
      attempt(() => createFlexibleSchema(makeOllamaCompatible(schema))) ??
      attempt(() => createFlexibleSchema(schema)) ??
      attempt(() => makeOllamaCompatible(schema)) ??
      attempt(() => schema)
    );
  }

  // 2. Non-Ollama (OpenAI, Claude, etc.) → OpenAI-compatible → flexible → plain
  return (
    attempt(() => makeOpenAICompatible(schema)) ??
    attempt(() => createFlexibleSchema(schema)) ??
    attempt(() => schema)
  );
}
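
The `attempt(...) ?? attempt(...)` chain above is an instance of a general "first successful factory wins" pattern. The sketch below isolates that pattern and, as one possible variation, throws when every attempt fails rather than returning `undefined`. `tryOnce` and `firstSuccessful` are hypothetical names used for illustration; they are not part of this file or of LangChain.

```typescript
// Hypothetical helper: run one factory, report any error, never throw.
function tryOnce<T>(factory: () => T, onError: (err: unknown) => void): T | undefined {
  try {
    return factory();
  } catch (err) {
    onError(err);
    return undefined;
  }
}

// Hypothetical helper: return the first result that is not undefined,
// or throw a single error that summarises every failure.
function firstSuccessful<T>(factories: Array<() => T>): T {
  const errors: unknown[] = [];
  for (const factory of factories) {
    const result = tryOnce(factory, err => errors.push(err));
    if (result !== undefined) {
      return result;
    }
  }
  throw new Error(
    `All ${factories.length} attempts failed: ${errors.map(String).join('; ')}`
  );
}

const picked = firstSuccessful<string>([
  () => { throw new Error('strict schema rejected'); },
  () => 'plain schema accepted',
]);
console.log(picked); // prints "plain schema accepted"
```

Whether to throw or to return `undefined` is a design choice: throwing surfaces a misconfigured schema at setup time, while returning `undefined` (as the function above does) leaves the check to the caller.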