Commit Graph

12 Commits

Author SHA1 Message Date
Nikhil
e25d463d95 NewAgent (#91)
* PocAgent + refactor (#77)

* Screenshot tool fixed

* ReactAgent loop

ReactAgent loop

v0.2

* Trim to max tokens implemented correctly

trim max tokens

* JSON Parse fix

Fixed json.parse

* Minor fix -- add system message always at position 0

* minor fix

* Added support for passing screenshot size to captureScreenshot

backup

* Make react agent use screenshot tool

* Refactor backend and execution (#75)

* wip: new execution class and manager

* wip: new pubsub channels

* wip: new background handlers

* new execution logic

* removed execution status

* handle workflow status for processing in sidepanel

* mcp server fix

* sending pause message

* better portName parsing, sidepanel sends tabId, storing tabId too

* 49.0.0.26 release

* docs: OmkarBansod02 signed the CLA in browseros-ai/BrowserOS-agent#$pullRequestNo

* Refactor backend and execution (#75)

* wip: new execution class and manager

* wip: new pubsub channels

* wip: new background handlers

* new execution logic

* removed execution status

* handle workflow status for processing in sidepanel

* mcp server fix

* sending pause message

* better portName parsing, sidepanel sends tabId, storing tabId too

* Moved react agent into POCAgent

* Revert changes of ReacStrategy from BrowserAgent

* Minor fix

* fix: execution class abort issue

* clean-up: removed un-used MessageTypes

* clean-up: execution-manager simplified

* better abort handling

---------

Co-authored-by: Felarof <nithin.sonti@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Screenshot failure handling

* fixes to custom agent -> sidepanel

* fix: portName stable

* fix: pause/reset + newtab connector

* New agent: wip-1

* fix: JSONify message content in MessageManagerReadOnly

* disable cacheLLM and fixing image type detection

* new agent: wip-2

* rename: <BrowserState> as <browser-state> and <SystemReminder> as <system-reminder>

* new-agent: wip-3

---------

Co-authored-by: Felarof <nithin.sonti@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-09-12 11:12:45 -07:00
Felarof
1abbee638a Braintrust basic evals (#87)
* implement validator eval

* implement online eval foundation

* further implementing online evals

* enhance evaluation data logging

* implement LLM scoring, remove redundant EventEnricher

* cleanup

* fix build errs from merging, extend LLM scorer context

* settled evaluation framework

* update evals documentation

* fix evals screenshots

* fix typos

* Evals config moved to env variables and tested

* test

* Update manifest to 49.1

* Removed duplicate + button

* Just use previous way of registering tools as that is not required for evals

* Add claude commands for research, plan and implement

* evals2 research and plan

implementation plan

new implementation plan

* Evals2 implementation

test test

* Removed old eval hooks

Remove old evals hooks

* evals 2 added to env

* Eval2 enhancement plan

backup

* Make Braintrust project configurable

Make Braintrust project configurable

* Enhanced scorer -- using Gemini 2.5 pro for evaluation

backup v0.1

enhancement v0.2

v0.2

backup v0.3

backup v0.4

* Deleted old evals directory

* Clean up old evals code

* Bunch of fixes and improvements

backup

fixes 0.1

more fixes

fixes

more elaborate prompts

braintrust logger fix

* Renamed files

backup
2025-09-05 18:04:07 -07:00
Felarof
e54f7049eb Tracking for MCP tools
react agent loop
2025-08-21 13:09:55 -07:00
Felarof
92e812a631 aug14 -- ChatMode in agent works much better (#46)
* Better ChatAgent

ChatAgent context uses BrowserState messages

* Changed system-context to BrowserState and system-reminder to SystemReminder

* Multiple tab selection works with ChatAgent

multiple tabs fix

* Minor fix
2025-08-14 18:26:17 -07:00
Felarof
f401bdeabe Chat Mode design
Chat Mode design v0.1

ChatMode design v0.2

ChatMode design v0.3

minor v0.4
2025-08-14 09:50:30 -07:00
Felarof
4764248a0a MCP design
MCP design

mcp design 4

kavlsi design 3

new design
2025-08-14 09:50:30 -07:00
gbsierra
3a9eb0e85b Updates to sidepanel v2 (#24)
* Sidepanel v2 design

* Sidepanel v2 works

* v0.2

* phase v0.2

* V0.3

* Changed everything to consistently use TailwindCSS

* Phase v0.4

* BrowserAgent small fixes

Todo manager updated

backup

* fix assistant output

* v2 ui update

* hide select tabs button, add help modal

* general UI refinements

* subtle ui changes

* Fix

---------

Co-authored-by: Felarof <nithin.sonti@gmail.com>
2025-08-07 08:19:54 -07:00
Felarof
0c7e1d8432 Simpler abort implementation and faster token counting (#23)
* Abort design

abort design

new design

* Backup of ExecutionState mgmt

* Revert "Backup of ExecutionState mgmt"

This reverts commit a3e61da85f.

* Simpler Abort Implementation

* token counting backup

* token counting

* token counting

---------

Co-authored-by: Nikhil <nikhilsv92@gmail.com>
2025-08-07 07:31:29 -07:00
Felarof
5f31ccb293 Bunch of minor fixes (#20)
* Removing inner loop, let agent handle the todo

* Renamed to todo_manager_tool

* replace_all todos instead of appending

* POCAgent + other fixes (#18)

* Removing inner loop, let agent handle the todo

* Renamed to todo_manager_tool

* Setting correct max context length in message manager for different providers

Setting correct max context length in message manager for different providers

backup

max tokens

* BrowserAgent inner loop implementation

Setting model name as gpt4 for nxtscape so that token counting works

Don't add browser refresh state twice

inner loop

Add browserstate message as browser state to remove any duplicates

* Fixed prompts -- saving BrowserState as <system-context> and others are being saved as <system-reminder>

Don't emit browser state message

Changed <system-reminder> to <system-context>

backup

updated the prompt

small fix

backup

* Changed default model to opus

* Add simplified BrowserState for RefreshStateTool

* Minor fix

* Make message manager trim messages during getMessages as well

* minor fix

* Implemented simple retry logic for planner tool and validator tool

* Update tools with retry logic

* changed back to gemini 2.5

---------

Co-authored-by: Nikhil <nikhilsv92@gmail.com>
2025-08-06 08:48:45 -07:00
Nikhil
c23c3c4677 New agent architecture (#10)
* add show version in help

* minor update to find_element prompt

* screenshot tool

* adding screenshot to browser agent

* PlannerTool prompt updated

* Minor changes

* Passing Task as context to FindElementTool

* Update todo-tool-design.md

Create todo-tool-design.md

Update todo-tool-design.md

* TodoList Manager

TodoList Manager

TodoList Manager

todo list manager v0.1

* Markdown fix

* Small update to BrowserAgent to populate TODOs from plan

* Added another test

* Small refinement

* Add BrowserState as system reminder

* extract tool added

* extraction integration test

* registering extract tool

* fix extract tool build error

* Updated some event info

* Get selected tabs tool

Get selected tabs tool

* Todo List being displayed

* Format Tool

* Remove newline after format tool

* fix default nextscape model as per liteLLM

* Collapsible event execution

Collapsible event execution

* Revert "Collapsible event execution"

This reverts commit 9e3833931162eff06778e46480e6691eb508ff44.

* move to open router

* Revert "move to open router"

This reverts commit c25f5c68f4f7b5dae54dcbb3a6e97c3faf2efa5d.

* debug test file

* Event emitter

This reverts commit 39c78ec47633616b5a52eea5d323811f60ab8eba.

* Rename

* Created a new markdown rendering engine, which displays table correctly, removed react-markdown

* fix: openAI requires toolMessage after tool call

* fix: Claude can't have system messages other than top. Make systemReminder as humanMessage similar to claude code

* fix: adding gemini support

* Added new ToolResult emitter -- and renamed others to mean what they are

backup

backup

Small fix

* ListTabs fixed

* Tool icon changes

* remove spinner from side panel

* new limits for sub-loop and main loop

* new ResultTool to summarise and output result

* integrate new result tool

* removed emit.complete as emitTaskResult is enough

---------

Co-authored-by: Felarof <nithin.sonti@gmail.com>
2025-07-30 16:56:26 -07:00
Felarof
8245dfe0ff Rewrite Agent Loop (#7)
* clean-up bunch of files for re-write

* more clean-up and adding basic agent

* Minor fix moved types into respective files.

* Deleted bunch of old files

backup

Update gitignore

Deleted a bunch of files

Remove message manager

Deleted old docs

Update rules

rename Profiler to profiler

* Temporarily adding old code

* Adding two small things back

* backup

* Implemented LangChainProvider and updated cursor rules

backup

LangChainProvider

cursor rules

* Implement tests for LangChainProvider -- unit test and integration test

integration test passes

integration test backup

* Tool Design

Tools Design

tools design

* NavigationTool ready

NavigationTool ready

NavigationTool ready

NavigationTool ready

backup

* MessageManager

MessageManager

backup

* Fixed integration test

* Agent design new

Updated agent design and added bunch of /NTN commands

agent new design

* Delete old agent design

* MessageManagerReadOnly class

* PlannerTool ready

PlannerTool almost ready

* ToolManager and DoneTool

* Integration of BrowserAgent

* BrowserAgent implementation v0.1

* BrowserAgent small fix v0.2

* Tool calling design

tool call design

tool design claude

* Update agent tool design with // NTN

* add zod-to-json npm install

* BrowserAgent v0.3

* BrowserAgent v0.4

* BrowserAgent v0.5

* fixes

* Build error fixes in my NEWLY added code

build errors fix

* Build error fixes in old code (integration work)

backup

* Comment StreamEventProcessor for now, it is not used

* Small build error fix

* Small rename

* Added integration test to check structuredLLM and changed to 4o-mini

change default to nxtscape

integration test

* Small docstring

* Simplified BrowserAgent code and added integration test

Simplified BrowserAgent code

BrowserAgent integration test

* Update CLAUDE.md with project memory and instructions on how to write code

Update CLAUDE.md with project memory and instructions on how to write code

Project Memory

* Just a move. Moved ToolManager outside. Build works.

* TabOperations tool

TabOperations Tool and fixing some test

tab operations

* Update CLAUDE.md

* Added ClassificationTool

classification tool

classification prompt

* Refactored and simplified PlannerTool unit test and integration test

* Updated Planner tool

* Update CLAUDE.md

* BrowserAgent modified to do classification

BrowserAgent with classification

* minor fix to ToolManager

* Instead of ToolCall and ToolResult -- just updating message manager once

* minor fix to BrowserAgent integration test

* Changed done to "done_tool"

* Updated CLAUDE.md to reflect understanding of claude

* Uncommented stream event processor

* Renamed EventBus to StreamEventBus

* Commented StreamEventProcessor

* Event Processor

* Integrated EventProcessor with BrowserAgent

Added EventProcessor to BrowserAgent

* Renamed StreamEventBus to EventBus

* Made EventBus required parameter in ExecutionContext

* PlanGenerator rewrite

PlanGenerator rewrite

backup

* For simple task, explicitly tell it to call done tool

* Max attempts for simple task

* backup

* Revert "backup"

This reverts commit 7d79a3d4d5774bfef79ec9827878b74edad3593f.

* Consolidating where EventBus and EventProcessor are created and initialized

backup

* Update CLAUDE.md

Update CLAUDE.md

* Improving agent loop code

Cleaned up processToolCall

classification task

* Create test-writer subAgent

test-agent-prompt

test agent prompt

test-agent-prompt

Update test-writer.md

* BrowserAgent test

BrowserAgent test

BrowserAgent test

* BrowserAgent refactor

backup

backup

* Minor fixes

* Minor fix

* minor change -- NEW AGENT LOOP IS WORKING WELL

* Update cursor rules

* Small change

* Improved BrowserAgent integration test

Improved BrowserAgent integration test

* Small change

* Update CLAUDE.md

* Different tools

* FindElementTool is ready

Find element update

backup

find element backup

* Updated test strings to say "tests..."

* ScrollTool is ready

* RefreshStateTool is updated as well

* MessageManager updated

* SearchTool is ready

backup

* Interaction Element is also ready

* Add debugMessage emitter

* ValidatorTool ready and tests are passing

Validation Tool

validator tool

backup

backup

* GroupTabs tool ready

* Registered all the tools

* Planning changed to 5 steps

* BrowserAgent integration test fix

* Minor string changes

* backup

* Removed too many confusing events in EventProcessor -- there is only event.info right now

* Abort control implemented

backup

Abort

* Formatter for toolResult

Formatter for toolResult

backup

* Always render using Markdown

* Minor fix

---------

Co-authored-by: Nikhil Sonti <nikhilsv92@gmail.com>
2025-07-29 08:14:45 -07:00
Felarof
3c83077bbe Separate out side panel agent into a new repo
2025-07-18 08:36:51 -07:00