Files
BrowserOS/docs/tab-execution-locking.md
2025-07-18 08:36:51 -07:00

7.4 KiB

Tab Execution Locking Design

Requirements

Problem Statement

When an agent starts executing on a specific tab, it should remain locked to that tab throughout execution. However, the current implementation has issues:

  1. Tab Switching During Execution: When users switch tabs or when navigation creates new tabs, the BrowserContext automatically switches to the newly active tab
  2. Agent Follows User: The agent abandons its original tab and starts operating on whatever tab the user switches to
  3. Execution Disruption: This breaks the agent's workflow and can cause unexpected behavior

Desired Behavior

  • Agent should stay attached to its initial tab throughout execution
  • Users should be able to freely switch tabs without affecting the agent
  • After execution completes, the system should update to reflect the actual active tab

Solutions Discussed

1. ExecutionManager Pattern (Complex)

Create a new ExecutionManager class to manage execution state across tabs.

Pros:

  • Clean separation of concerns
  • Supports future multi-tab execution
  • Single source of truth for execution state

Cons:

  • Over-engineering for current single-execution model
  • Requires significant refactoring
  • Adds complexity without immediate benefit
class ExecutionManager {
  private currentExecution: ExecutionInfo | null = null;
  
  startExecution(params: {...}): string { }
  completeExecution(executionId: string): void { }
  cancelCurrent(isUserInitiated: boolean): boolean { }
}

2. Pass Locked Tab as Parameter

Modify all BrowserContext methods to accept an optional preferred tab ID.

Pros:

  • Explicit control over which tab to use
  • Backward compatible
  • No architectural changes

Cons:

  • Requires updating every tool's getCurrentPage() call
  • Scattered changes across codebase
  • Verbose and repetitive
// Before
const page = await browserContext.getCurrentPage();

// After
const page = await browserContext.getCurrentPage(executionContext.getLockedTabId());

3. LockedBrowserContext Wrapper

Create a wrapper class that overrides BrowserContext behavior during execution.

Pros:

  • Single point of control
  • Clean separation of execution logic
  • No changes to existing tools

Cons:

  • Requires wrapping/proxying many methods
  • Complex inheritance or composition patterns
  • Potential for subtle bugs
class LockedBrowserContext {
  constructor(private browserContext: BrowserContext, private lockedTabId: number) {}
  
  async getCurrentPage(): Promise<Page> {
    return this.browserContext.attachToTab(this.lockedTabId);
  }
  // ... wrap all other methods
}

4. Dynamic Proxy Pattern

Use JavaScript Proxy to dynamically intercept BrowserContext calls.

Pros:

  • Zero boilerplate
  • Dynamic behavior modification
  • Original BrowserContext unchanged

Cons:

  • "Magic" behavior can be hard to debug
  • TypeScript type safety challenges
  • Performance overhead
class LockedBrowserContext {
  static create(browserContext: BrowserContext, lockedTabId: number): BrowserContext {
    return new Proxy(browserContext, {
      get(target, prop) {
        if (prop === 'getCurrentPage') {
          return () => target.attachToTab(lockedTabId);
        }
        return target[prop];
      }
    });
  }
}

5. State Pattern in BrowserContext

Add a "mode" to BrowserContext that changes its behavior.

Pros:

  • All methods automatically respect the mode
  • No duplicate classes
  • Easy to implement

Cons:

  • Still requires modifying multiple methods
  • Mixes execution state with browser state
class BrowserContext {
  private _mode: 'normal' | 'locked' = 'normal';
  private _lockedTabId?: number;
  
  enterLockedMode(tabId: number) { }
  exitLockedMode() { }
  
  private getEffectiveTabId(): number {
    return this._mode === 'locked' ? this._lockedTabId! : this._currentTabId;
  }
}

Final Solution: Minimal Lock in updateCurrentTabId()

After analyzing all approaches, we implemented the simplest solution that addresses the root cause:

The Key Insight

The problem isn't that BrowserContext tracks the current tab - it's that tab event listeners update _currentTabId during execution. We only need to prevent these updates during execution.

Implementation

// In BrowserContext
export class BrowserContext {
  // Execution lock state
  private _isExecutionLocked: boolean = false;
  private _executionLockedTabId: number | null = null;
  
  /**
   * Lock execution to a specific tab
   * This prevents tab switches during agent execution
   */
  public lockExecutionToTab(tabId: number): void {
    this._isExecutionLocked = true;
    this._executionLockedTabId = tabId;
    this._currentTabId = tabId; // Force current tab to locked tab
    Logging.log('BrowserContext', `Execution locked to tab ${tabId}`);
  }
  
  /**
   * Unlock execution and update to the actual active tab
   */
  public async unlockExecution(): Promise<void> {
    this._isExecutionLocked = false;
    this._executionLockedTabId = null;
    
    // Get the actual active tab from Chrome
    try {
      const activeTab = await this.getActiveTab();
      if (activeTab && activeTab.id) {
        this._currentTabId = activeTab.id;
        Logging.log('BrowserContext', `Execution unlocked, current tab updated to active tab ${activeTab.id}`);
      }
    } catch (error) {
      Logging.log('BrowserContext', `Failed to get active tab after unlock: ${error}`, 'warning');
    }
  }
  
  /**
   * Update the current tab ID (internal use only)
   */
  private updateCurrentTabId(tabId: number): void {
    // If execution is locked, ignore tab updates
    if (this._isExecutionLocked) {
      Logging.log('BrowserContext', `Ignoring tab switch to ${tabId}, execution locked to tab ${this._executionLockedTabId}`);
      return;
    }
    
    // only update tab id, but don't attach it.
    this._currentTabId = tabId;
  }
}

// In NxtScape
public async run(options: RunOptions): Promise<NxtScapeResult> {
  const currentPage = await this.browserContext.getCurrentPage();
  const currentTabId = currentPage.tabId;
  
  // Lock browser context to the current tab
  this.browserContext.lockExecutionToTab(currentTabId);
  
  try {
    // ... execute agent ...
  } finally {
    // Unlock browser context and update to active tab
    await this.browserContext.unlockExecution();
  }
}

Why This Solution Works

  1. Minimal Changes: Only modifies one method (updateCurrentTabId) and adds two lock methods
  2. Root Cause Fix: Directly addresses the issue of tab event listeners updating during execution
  3. All Methods Work: getCurrentPage(), getState(), etc. automatically use the locked tab ID
  4. Clean State Management: After execution, updates to the actual active tab
  5. No Architectural Changes: No new classes, patterns, or complex refactoring

How It Works

  1. When execution starts, lockExecutionToTab() is called with the current tab ID
  2. This sets _isExecutionLocked = true and forces _currentTabId to the locked tab
  3. Any tab change events (user switching tabs, new tabs opening) call updateCurrentTabId()
  4. But updateCurrentTabId() now checks _isExecutionLocked and ignores updates during execution
  5. When execution ends, unlockExecution() is called
  6. This queries Chrome for the actual active tab and updates _currentTabId to match

This elegant solution solves the immediate problem without over-engineering, while keeping the codebase maintainable and easy to understand.