7.4 KiB
Tab Execution Locking Design
Requirements
Problem Statement
When an agent starts executing on a specific tab, it should remain locked to that tab throughout execution. However, the current implementation has issues:
- Tab Switching During Execution: When users switch tabs or when navigation creates new tabs, the BrowserContext automatically switches to the newly active tab
- Agent Follows User: The agent abandons its original tab and starts operating on whatever tab the user switches to
- Execution Disruption: This breaks the agent's workflow and can cause unexpected behavior
Desired Behavior
- Agent should stay attached to its initial tab throughout execution
- Users should be able to freely switch tabs without affecting the agent
- After execution completes, the system should update to reflect the actual active tab
Solutions Discussed
1. ExecutionManager Pattern (Complex)
Create a new ExecutionManager class to manage execution state across tabs.
Pros:
- Clean separation of concerns
- Supports future multi-tab execution
- Single source of truth for execution state
Cons:
- Over-engineering for current single-execution model
- Requires significant refactoring
- Adds complexity without immediate benefit
class ExecutionManager {
private currentExecution: ExecutionInfo | null = null;
startExecution(params: {...}): string { }
completeExecution(executionId: string): void { }
cancelCurrent(isUserInitiated: boolean): boolean { }
}
2. Pass Locked Tab as Parameter
Modify all BrowserContext methods to accept an optional preferred tab ID.
Pros:
- Explicit control over which tab to use
- Backward compatible
- No architectural changes
Cons:
- Requires updating every tool's getCurrentPage() call
- Scattered changes across codebase
- Verbose and repetitive
// Before
const page = await browserContext.getCurrentPage();
// After
const page = await browserContext.getCurrentPage(executionContext.getLockedTabId());
3. LockedBrowserContext Wrapper
Create a wrapper class that overrides BrowserContext behavior during execution.
Pros:
- Single point of control
- Clean separation of execution logic
- No changes to existing tools
Cons:
- Requires wrapping/proxying many methods
- Complex inheritance or composition patterns
- Potential for subtle bugs
class LockedBrowserContext {
constructor(private browserContext: BrowserContext, private lockedTabId: number) {}
async getCurrentPage(): Promise<Page> {
return this.browserContext.attachToTab(this.lockedTabId);
}
// ... wrap all other methods
}
4. Dynamic Proxy Pattern
Use JavaScript Proxy to dynamically intercept BrowserContext calls.
Pros:
- Zero boilerplate
- Dynamic behavior modification
- Original BrowserContext unchanged
Cons:
- "Magic" behavior can be hard to debug
- TypeScript type safety challenges
- Performance overhead
class LockedBrowserContext {
static create(browserContext: BrowserContext, lockedTabId: number): BrowserContext {
return new Proxy(browserContext, {
get(target, prop) {
if (prop === 'getCurrentPage') {
return () => target.attachToTab(lockedTabId);
}
return target[prop];
}
});
}
}
5. State Pattern in BrowserContext
Add a "mode" to BrowserContext that changes its behavior.
Pros:
- All methods automatically respect the mode
- No duplicate classes
- Easy to implement
Cons:
- Still requires modifying multiple methods
- Mixes execution state with browser state
class BrowserContext {
private _mode: 'normal' | 'locked' = 'normal';
private _lockedTabId?: number;
enterLockedMode(tabId: number) { }
exitLockedMode() { }
private getEffectiveTabId(): number {
return this._mode === 'locked' ? this._lockedTabId! : this._currentTabId;
}
}
Final Solution: Minimal Lock in updateCurrentTabId()
After analyzing all approaches, we implemented the simplest solution that addresses the root cause:
The Key Insight
The problem isn't that BrowserContext tracks the current tab - it's that tab event listeners update _currentTabId during execution. We only need to prevent these updates during execution.
Implementation
// In BrowserContext
export class BrowserContext {
// Execution lock state
private _isExecutionLocked: boolean = false;
private _executionLockedTabId: number | null = null;
/**
* Lock execution to a specific tab
* This prevents tab switches during agent execution
*/
public lockExecutionToTab(tabId: number): void {
this._isExecutionLocked = true;
this._executionLockedTabId = tabId;
this._currentTabId = tabId; // Force current tab to locked tab
Logging.log('BrowserContext', `Execution locked to tab ${tabId}`);
}
/**
* Unlock execution and update to the actual active tab
*/
public async unlockExecution(): Promise<void> {
this._isExecutionLocked = false;
this._executionLockedTabId = null;
// Get the actual active tab from Chrome
try {
const activeTab = await this.getActiveTab();
if (activeTab && activeTab.id) {
this._currentTabId = activeTab.id;
Logging.log('BrowserContext', `Execution unlocked, current tab updated to active tab ${activeTab.id}`);
}
} catch (error) {
Logging.log('BrowserContext', `Failed to get active tab after unlock: ${error}`, 'warning');
}
}
/**
* Update the current tab ID (internal use only)
*/
private updateCurrentTabId(tabId: number): void {
// If execution is locked, ignore tab updates
if (this._isExecutionLocked) {
Logging.log('BrowserContext', `Ignoring tab switch to ${tabId}, execution locked to tab ${this._executionLockedTabId}`);
return;
}
// only update tab id, but don't attach it.
this._currentTabId = tabId;
}
}
// In NxtScape
public async run(options: RunOptions): Promise<NxtScapeResult> {
const currentPage = await this.browserContext.getCurrentPage();
const currentTabId = currentPage.tabId;
// Lock browser context to the current tab
this.browserContext.lockExecutionToTab(currentTabId);
try {
// ... execute agent ...
} finally {
// Unlock browser context and update to active tab
await this.browserContext.unlockExecution();
}
}
Why This Solution Works
- Minimal Changes: Only modifies one method (
updateCurrentTabId) and adds two lock methods - Root Cause Fix: Directly addresses the issue of tab event listeners updating during execution
- All Methods Work:
getCurrentPage(),getState(), etc. automatically use the locked tab ID - Clean State Management: After execution, updates to the actual active tab
- No Architectural Changes: No new classes, patterns, or complex refactoring
How It Works
- When execution starts,
lockExecutionToTab()is called with the current tab ID - This sets
_isExecutionLocked = trueand forces_currentTabIdto the locked tab - Any tab change events (user switching tabs, new tabs opening) call
updateCurrentTabId() - But
updateCurrentTabId()now checks_isExecutionLockedand ignores updates during execution - When execution ends,
unlockExecution()is called - This queries Chrome for the actual active tab and updates
_currentTabIdto match
This elegant solution solves the immediate problem without over-engineering, while keeping the codebase maintainable and easy to understand.