LLM/BrowserOS

Fork 0

mirror of https://github.com/browseros-ai/BrowserOS.git synced 2026-05-21 04:45:12 +00:00

Files

Felarof 3c83077bbe Separate out side panel agent into a new repo

2025-07-18 08:36:51 -07:00

7.4 KiB

Raw Blame History

Tab Execution Locking Design

Requirements

Problem Statement

When an agent starts executing on a specific tab, it should remain locked to that tab throughout execution. However, the current implementation has issues:

Tab Switching During Execution: When users switch tabs or when navigation creates new tabs, the BrowserContext automatically switches to the newly active tab
Agent Follows User: The agent abandons its original tab and starts operating on whatever tab the user switches to
Execution Disruption: This breaks the agent's workflow and can cause unexpected behavior

Desired Behavior

Agent should stay attached to its initial tab throughout execution
Users should be able to freely switch tabs without affecting the agent
After execution completes, the system should update to reflect the actual active tab

Solutions Discussed

1. ExecutionManager Pattern (Complex)

Create a new ExecutionManager class to manage execution state across tabs.

Pros:

Clean separation of concerns
Supports future multi-tab execution
Single source of truth for execution state

Cons:

Over-engineering for current single-execution model
Requires significant refactoring
Adds complexity without immediate benefit

class ExecutionManager {
  private currentExecution: ExecutionInfo | null = null;
  
  startExecution(params: {...}): string { }
  completeExecution(executionId: string): void { }
  cancelCurrent(isUserInitiated: boolean): boolean { }
}

2. Pass Locked Tab as Parameter

Modify all BrowserContext methods to accept an optional preferred tab ID.

Pros:

Explicit control over which tab to use
Backward compatible
No architectural changes

Cons:

Requires updating every tool's getCurrentPage() call
Scattered changes across codebase
Verbose and repetitive

// Before
const page = await browserContext.getCurrentPage();

// After
const page = await browserContext.getCurrentPage(executionContext.getLockedTabId());

3. LockedBrowserContext Wrapper

Create a wrapper class that overrides BrowserContext behavior during execution.

Pros:

Single point of control
Clean separation of execution logic
No changes to existing tools

Cons:

Requires wrapping/proxying many methods
Complex inheritance or composition patterns
Potential for subtle bugs

class LockedBrowserContext {
  constructor(private browserContext: BrowserContext, private lockedTabId: number) {}
  
  async getCurrentPage(): Promise<Page> {
    return this.browserContext.attachToTab(this.lockedTabId);
  }
  // ... wrap all other methods
}

4. Dynamic Proxy Pattern

Use JavaScript Proxy to dynamically intercept BrowserContext calls.

Pros:

Zero boilerplate
Dynamic behavior modification
Original BrowserContext unchanged

Cons:

"Magic" behavior can be hard to debug
TypeScript type safety challenges
Performance overhead

class LockedBrowserContext {
  static create(browserContext: BrowserContext, lockedTabId: number): BrowserContext {
    return new Proxy(browserContext, {
      get(target, prop) {
        if (prop === 'getCurrentPage') {
          return () => target.attachToTab(lockedTabId);
        }
        return target[prop];
      }
    });
  }
}

5. State Pattern in BrowserContext

Add a "mode" to BrowserContext that changes its behavior.

Pros:

All methods automatically respect the mode
No duplicate classes
Easy to implement

Cons:

Still requires modifying multiple methods
Mixes execution state with browser state

class BrowserContext {
  private _mode: 'normal' | 'locked' = 'normal';
  private _lockedTabId?: number;
  
  enterLockedMode(tabId: number) { }
  exitLockedMode() { }
  
  private getEffectiveTabId(): number {
    return this._mode === 'locked' ? this._lockedTabId! : this._currentTabId;
  }
}

Final Solution: Minimal Lock in updateCurrentTabId()

After analyzing all approaches, we implemented the simplest solution that addresses the root cause:

The Key Insight

The problem isn't that BrowserContext tracks the current tab - it's that tab event listeners update _currentTabId during execution. We only need to prevent these updates during execution.

Implementation

// In BrowserContext
export class BrowserContext {
  // Execution lock state
  private _isExecutionLocked: boolean = false;
  private _executionLockedTabId: number | null = null;
  
  /**
   * Lock execution to a specific tab
   * This prevents tab switches during agent execution
   */
  public lockExecutionToTab(tabId: number): void {
    this._isExecutionLocked = true;
    this._executionLockedTabId = tabId;
    this._currentTabId = tabId; // Force current tab to locked tab
    Logging.log('BrowserContext', `Execution locked to tab ${tabId}`);
  }
  
  /**
   * Unlock execution and update to the actual active tab
   */
  public async unlockExecution(): Promise<void> {
    this._isExecutionLocked = false;
    this._executionLockedTabId = null;
    
    // Get the actual active tab from Chrome
    try {
      const activeTab = await this.getActiveTab();
      if (activeTab && activeTab.id) {
        this._currentTabId = activeTab.id;
        Logging.log('BrowserContext', `Execution unlocked, current tab updated to active tab ${activeTab.id}`);
      }
    } catch (error) {
      Logging.log('BrowserContext', `Failed to get active tab after unlock: ${error}`, 'warning');
    }
  }
  
  /**
   * Update the current tab ID (internal use only)
   */
  private updateCurrentTabId(tabId: number): void {
    // If execution is locked, ignore tab updates
    if (this._isExecutionLocked) {
      Logging.log('BrowserContext', `Ignoring tab switch to ${tabId}, execution locked to tab ${this._executionLockedTabId}`);
      return;
    }
    
    // only update tab id, but don't attach it.
    this._currentTabId = tabId;
  }
}

// In NxtScape
public async run(options: RunOptions): Promise<NxtScapeResult> {
  const currentPage = await this.browserContext.getCurrentPage();
  const currentTabId = currentPage.tabId;
  
  // Lock browser context to the current tab
  this.browserContext.lockExecutionToTab(currentTabId);
  
  try {
    // ... execute agent ...
  } finally {
    // Unlock browser context and update to active tab
    await this.browserContext.unlockExecution();
  }
}

Why This Solution Works

Minimal Changes: Only modifies one method (updateCurrentTabId) and adds two lock methods
Root Cause Fix: Directly addresses the issue of tab event listeners updating during execution
All Methods Work: getCurrentPage(), getState(), etc. automatically use the locked tab ID
Clean State Management: After execution, updates to the actual active tab
No Architectural Changes: No new classes, patterns, or complex refactoring

How It Works

When execution starts, lockExecutionToTab() is called with the current tab ID
This sets _isExecutionLocked = true and forces _currentTabId to the locked tab
Any tab change events (user switching tabs, new tabs opening) call updateCurrentTabId()
But updateCurrentTabId() now checks _isExecutionLocked and ignores updates during execution
When execution ends, unlockExecution() is called
This queries Chrome for the actual active tab and updates _currentTabId to match

This elegant solution solves the immediate problem without over-engineering, while keeping the codebase maintainable and easy to understand.

7.4 KiB Raw Blame History

Tab Execution Locking Design

Requirements

Problem Statement

Desired Behavior

Solutions Discussed

1. ExecutionManager Pattern (Complex)

2. Pass Locked Tab as Parameter

3. LockedBrowserContext Wrapper

4. Dynamic Proxy Pattern

5. State Pattern in BrowserContext

Final Solution: Minimal Lock in updateCurrentTabId()

The Key Insight

Implementation

Why This Solution Works

How It Works

7.4 KiB

Raw Blame History