feat: isolate new-tab agent navigation from origin tab (#593)

* feat: isolate new-tab agent navigation from origin tab

Add origin-aware navigation isolation so the agent never navigates
away from the new-tab chat UI. This is a two-layer defense:

1. Prompt adaptation: When origin is 'newtab', the system prompt's
   execution and tool-selection sections are rewritten to prohibit
   navigating the active tab and default all lookups to new_page.

2. Tool-level guards: navigate_page and close_page reject attempts
   to act on the origin tab when in newtab mode, returning an error
   that teaches the agent to self-correct.
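
As an illustration of the second layer, the guard condition can be sketched as a standalone predicate. This is a minimal sketch, not the code in this commit: `checkOriginGuard` and `SessionContext` are hypothetical names, and the real guards live inline in the `navigate_page` and `close_page` handlers.

```typescript
// Hypothetical, simplified shapes; they mirror the session fields in this
// commit but are not the real module.
type Origin = 'sidepanel' | 'newtab'

interface SessionContext {
  origin?: Origin
  originPageId?: number
}

// Returns an error message when the call targets the origin tab in
// new-tab mode, or null when the call may proceed.
function checkOriginGuard(
  action: 'navigate' | 'close',
  page: number,
  session?: SessionContext,
): string | null {
  const targetsOrigin =
    session?.origin === 'newtab' &&
    session.originPageId !== undefined &&
    page === session.originPageId
  if (!targetsOrigin) return null
  return action === 'navigate'
    ? 'Cannot navigate the origin tab in new-tab mode. Use `new_page` to open a background tab instead.'
    : 'Cannot close the origin tab in new-tab mode.'
}

// A missing session (older callers) never triggers the guard.
console.log(checkOriginGuard('close', 7))
console.log(checkOriginGuard('navigate', 7, { origin: 'newtab', originPageId: 7 }))
```

The guard only fires when all three conditions hold, so sidepanel sessions and sessions without a known origin page ID keep their previous behavior.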

The client now sends an `origin` field ('sidepanel' | 'newtab')
instead of appending a soft NEWTAB_SYSTEM_PROMPT to the user system
prompt, which LLMs could ignore. Backwards compatible: `origin`
defaults to 'sidepanel' when omitted.
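
The defaulting behavior can be sketched without a schema library. This is a rough sketch under stated assumptions: `resolveOrigin` and `ChatRequestLike` are hypothetical names, and the commit itself expresses the default in the request schema rather than in a helper like this.

```typescript
type Origin = 'sidepanel' | 'newtab'

// Hypothetical request shape; only the field relevant here is modeled.
interface ChatRequestLike {
  origin?: string
}

// Older clients omit `origin`, so an absent value falls back to 'sidepanel'.
// Any other unknown value is rejected rather than silently accepted.
function resolveOrigin(request: ChatRequestLike): Origin {
  if (request.origin === undefined) return 'sidepanel'
  if (request.origin === 'sidepanel' || request.origin === 'newtab') {
    return request.origin
  }
  throw new Error(`Unknown origin: ${request.origin}`)
}

console.log(resolveOrigin({}))
console.log(resolveOrigin({ origin: 'newtab' }))
```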

Closes TKT-592, addresses TKT-564

* test: add newtab origin navigation guard tests

- 14 new prompt tests verifying the system prompt adapts correctly
  for newtab vs sidepanel origin (execution rules, tool selection table,
  absence of conflicting single-tab guidance)
- 6 new integration tests for the navigate_page and close_page guards:
  they reject the origin tab in newtab mode, allow non-origin tabs,
  allow all tabs in sidepanel mode, and stay backwards compatible when
  no session is passed
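
The kind of check the prompt tests make can be sketched against a reduced stand-in for the prompt builder. This is an illustrative sketch only: `buildNavigationGuidance` is a hypothetical function that reproduces two strings from the tool-selection table, not the real `buildSystemPrompt`.

```typescript
type Origin = 'sidepanel' | 'newtab'

// Hypothetical stand-in: builds only the single-page lookup row and the
// new-tab reminder, branching on origin the way the prompt sections do.
function buildNavigationGuidance(origin: Origin = 'sidepanel'): string {
  const isNewTab = origin === 'newtab'
  const lookupRow = isNewTab
    ? '| Look up one page | `new_page` (background) → extract data → `close_page` |'
    : '| Look up one page | `navigate_page` on current tab |'
  const reminder = isNewTab
    ? '\n**Remember:** The active tab is the New Tab chat UI. Never navigate or close it.'
    : ''
  return `| Task | Approach |\n|------|----------|\n${lookupRow}${reminder}`
}

// Each mode should contain its own guidance and none of the other mode's.
const newTabPrompt = buildNavigationGuidance('newtab')
const sidepanelPrompt = buildNavigationGuidance()
console.log(newTabPrompt.includes('`navigate_page` on current tab'))
console.log(sidepanelPrompt.includes('Never navigate or close it'))
```

Asserting absence as well as presence matters here, because leftover single-tab guidance in new-tab mode would contradict the new rules.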
Dani Akash
2026-03-27 12:06:32 +05:30
committed by GitHub
parent b3003542d8
commit aacb47f7ee
11 changed files with 499 additions and 24 deletions

@@ -76,8 +76,6 @@ export interface ChatSessionOptions {
isIntegrationsSynced?: boolean
}
const NEWTAB_SYSTEM_PROMPT = `IMPORTANT: The user is chatting from the New Tab page. When performing browser actions, ALWAYS open content in a NEW TAB rather than navigating the current tab. The user's new tab page should remain accessible.`
export const useChatSession = (options?: ChatSessionOptions) => {
const {
selectedLlmProviderRef,
@@ -344,12 +342,8 @@ export const useChatSession = (options?: ChatSessionOptions) => {
reasoningEffort: provider?.reasoningEffort,
reasoningSummary: provider?.reasoningSummary,
browserContext,
userSystemPrompt:
options?.origin === 'newtab'
? [personalizationRef.current, NEWTAB_SYSTEM_PROMPT]
.filter(Boolean)
.join('\n\n')
: personalizationRef.current,
origin: options?.origin ?? 'sidepanel',
userSystemPrompt: personalizationRef.current,
userWorkingDir: workingDirRef.current,
supportsImages: provider?.supportsImages,
previousConversation,

@@ -92,10 +92,15 @@ export class AiSdkAgent {
}
// Build browser tools from the unified tool registry
const originPageId = config.browserContext?.activeTab?.pageId
const allBrowserTools = buildBrowserToolSet(
config.registry,
config.browser,
config.resolvedConfig.workingDir,
{
origin: config.resolvedConfig.origin,
originPageId,
},
)
const browserTools = config.resolvedConfig.chatMode
? Object.fromEntries(
@@ -205,6 +210,7 @@ export class AiSdkAgent {
connectedApps: config.browserContext?.enabledMcpServers,
declinedApps: config.resolvedConfig.declinedApps,
skillsCatalog,
origin: config.resolvedConfig.origin,
})
// Configure compaction for context window management

@@ -193,19 +193,41 @@ You have persistent memory across sessions and an evolving personality. See the
function getExecution(
_exclude: Set<string>,
_options?: BuildSystemPromptOptions,
options?: BuildSystemPromptOptions,
): string {
return `<execution>
const isNewTab = options?.origin === 'newtab'
let executionContent = `<execution>
## Execution
### Philosophy
- Execute tasks end-to-end. Don't delegate ("I found the button, you can click it").
- Don't ask permission for routine steps. Act, then report.
- Do not refuse by default, attempt tasks even when outcomes are uncertain.
- For ambiguous/unclear requests, ask one targeted clarifying question.
- For ambiguous/unclear requests, ask one targeted clarifying question.`
if (isNewTab) {
executionContent += `
### New-Tab Origin Rules
You are operating from the user's **New Tab page**. The active tab (Page ID from Browser Context) is the chat UI itself.
**CRITICAL RULES:**
1. **NEVER call \`navigate_page\` on the active tab** — this would destroy the chat UI and navigate the user away.
2. **NEVER call \`close_page\` on the active tab** — same reason.
3. For ALL browsing tasks (including single-page lookups), use \`new_page\` (background) to open URLs.
4. For single-page lookups, open a background tab, extract data, then close it.
5. For multi-page research, open background tabs and group them with \`group_tabs\`.
### Multi-tab workflow`
} else {
executionContent += `
- Stay on the current page for single-page tasks. Use \`navigate_page\` to move within one tab.
### Multi-tab workflow
### Multi-tab workflow`
}
executionContent += `
When a task requires working on multiple pages simultaneously:
1. **Inform the user** that you're creating background tabs for the task.
2. **Open new tabs in background** using \`new_page\` (opens in background by default) — never steal focus from the user's current tab.
@@ -216,15 +238,23 @@ When a task requires working on multiple pages simultaneously:
7. **Never force-switch the user's active tab.** If you need user interaction on a background tab (e.g., login, CAPTCHA), tell the user which tab needs attention and let them switch manually.
8. **Never navigate the user's current tab** during a multi-tab task. The current tab is the user's anchor — use it only for reading (snapshots, content extraction). All navigation should happen on background tabs.
**Do NOT use \`create_hidden_window\` or \`new_hidden_page\` for user-requested tasks.** Hidden windows are invisible to the user and cannot be screenshotted. Use \`new_page\` (background mode) instead — tabs appear in the user's tab strip and can be inspected. Reserve hidden windows for automated/scheduled runs only.
**Do NOT use \`create_hidden_window\` or \`new_hidden_page\` for user-requested tasks.** Hidden windows are invisible to the user and cannot be screenshotted. Use \`new_page\` (background mode) instead — tabs appear in the user's tab strip and can be inspected. Reserve hidden windows for automated/scheduled runs only.`
For single-page lookups (e.g., "go to X and read Y"), use \`navigate_page\` on the current tab. Only create new tabs when the task requires multiple pages open simultaneously.
if (!isNewTab) {
executionContent += `
For single-page lookups (e.g., "go to X and read Y"), use \`navigate_page\` on the current tab. Only create new tabs when the task requires multiple pages open simultaneously.`
}
executionContent += `
### Tab retry discipline
When a background tab fails (404, wrong content, unexpected redirect):
- **Navigate the existing tab** to the correct URL with \`navigate_page\` — do NOT open a new tab for retries.
- If you must abandon a tab, close it with \`close_page\` before opening a replacement.
- Never let orphan tabs accumulate — each task should end with only the tabs that contain useful content.
- Never let orphan tabs accumulate — each task should end with only the tabs that contain useful content.`
executionContent += `
### Observe → Act → Verify
- **Before acting**: Take a snapshot to get interactive element IDs.
@@ -241,13 +271,38 @@ Some tools automatically include a fresh snapshot in their response (labeled "Ad
- 2FA → notify user, pause for completion
- Page not found (404) or server error (500) → report the error to the user
</execution>`
return executionContent
}
// -----------------------------------------------------------------------------
// section: tool-selection
// -----------------------------------------------------------------------------
function getToolSelection(): string {
function getToolSelection(
_exclude: Set<string>,
options?: BuildSystemPromptOptions,
): string {
const isNewTab = options?.origin === 'newtab'
const navTable = isNewTab
? `### Navigation: single-tab vs multi-tab
| Task | Approach |
|------|----------|
| Look up one page | \`new_page\` (background) → extract data → \`close_page\` |
| Research across multiple sites | \`new_page\` (background) for each site + \`group_tabs\` |
| Compare two pages side by side | \`new_page\` (background) × 2 + \`group_tabs\` |
| User says "open a new tab" | \`new_page\` (background) |
**Remember:** The active tab is the New Tab chat UI. Never navigate or close it.`
: `### Navigation: single-tab vs multi-tab
| Task | Approach |
|------|----------|
| Look up one page | \`navigate_page\` on current tab |
| Research across multiple sites | \`new_page\` (background) for each site + \`group_tabs\` |
| Compare two pages side by side | \`new_page\` (background) × 2 + \`group_tabs\` |
| User says "open a new tab" | \`new_page\` (background) — don't steal focus |`
return `<tool_selection>
## Tool Selection
@@ -268,13 +323,7 @@ function getToolSelection(): string {
- Prefer \`fill\` over \`press_key\` for text input. Use \`press_key\` for keyboard shortcuts (Enter, Escape, Tab, Ctrl+A, etc.).
- Prefer clicking links over \`navigate_page\` when the link is visible. Use \`navigate_page\` for direct URL access, back/forward, or reload.
### Navigation: single-tab vs multi-tab
| Task | Approach |
|------|----------|
| Look up one page | \`navigate_page\` on current tab |
| Research across multiple sites | \`new_page\` (background) for each site + \`group_tabs\` |
| Compare two pages side by side | \`new_page\` (background) × 2 + \`group_tabs\` |
| User says "open a new tab" | \`new_page\` (background) — don't steal focus |
${navTable}
### Connected apps: Strata vs browser
When an app is Connected, prefer Strata tools over browser automation. Strata is faster, more reliable, and works without navigating away from the user's current page.
@@ -668,7 +717,10 @@ const promptSections: Record<string, PromptSectionFn> = {
security: getSecurity,
capabilities: getCapabilities,
execution: getExecution,
'tool-selection': getToolSelection,
'tool-selection': (
_exclude: Set<string>,
options?: BuildSystemPromptOptions,
) => getToolSelection(_exclude, options),
'external-integrations': getExternalIntegrations,
'error-recovery': getErrorRecovery,
'memory-and-identity': getMemoryAndIdentity,
@@ -695,6 +747,8 @@ export interface BuildSystemPromptOptions {
/** Apps the user previously declined to connect (chose "do it manually"). */
declinedApps?: string[]
skillsCatalog?: string
/** Where the chat session originates from — determines navigation behavior. */
origin?: 'sidepanel' | 'newtab'
}
export function buildSystemPrompt(options?: BuildSystemPromptOptions): string {

@@ -39,11 +39,13 @@ export function buildBrowserToolSet(
registry: ToolRegistry,
browser: Browser,
workingDir: string,
session?: { origin?: 'sidepanel' | 'newtab'; originPageId?: number },
): ToolSet {
const toolSet: ToolSet = {}
const ctx: ToolContext = {
browser,
directories: { workingDir },
session,
}
for (const def of registry.all()) {

@@ -46,6 +46,8 @@ export interface ResolvedAgentConfig {
isScheduledTask?: boolean
/** Apps the user previously declined to connect via MCP (chose "do it manually"). */
declinedApps?: string[]
/** Where the chat session originates from — determines navigation behavior. */
origin?: 'sidepanel' | 'newtab'
/** BrowserOS installation ID for credit-based tracking. */
browserosId?: string
}

@@ -63,6 +63,7 @@ export class ChatService {
supportsImages: request.supportsImages,
chatMode: request.mode === 'chat',
isScheduledTask: request.isScheduledTask,
origin: request.origin,
declinedApps: request.declinedApps,
browserosId: this.deps.browserosId,
}

@@ -45,6 +45,7 @@ export const ChatRequestSchema = AgentLLMConfigSchema.extend({
userWorkingDir: z.string().min(1).optional(),
supportsImages: z.boolean().optional().default(true),
mode: z.enum(['chat', 'agent']).optional().default('agent'),
origin: z.enum(['sidepanel', 'newtab']).optional().default('sidepanel'),
declinedApps: z.array(z.string()).optional(),
selectedText: z.string().optional(),
selectedTextSource: z

@@ -22,9 +22,15 @@ export interface ToolDirectories {
resourcesDir?: string
}
export interface ToolSessionContext {
origin?: 'sidepanel' | 'newtab'
originPageId?: number
}
export type ToolContext = {
browser: Browser
directories: ToolDirectories
session?: ToolSessionContext
}
export function resolveWorkingPath(

@@ -88,6 +88,17 @@ export const navigate_page = defineTool({
return
}
if (
ctx.session?.origin === 'newtab' &&
ctx.session.originPageId !== undefined &&
args.page === ctx.session.originPageId
) {
response.error(
'Cannot navigate the origin tab in new-tab mode — this would destroy the chat UI. Use `new_page` to open a background tab instead.',
)
return
}
switch (args.action) {
case 'url':
await ctx.browser.goto(args.page, args.url as string)
@@ -266,6 +277,17 @@ export const close_page = defineTool({
action: z.literal('close_page'),
}),
handler: async (args, ctx, response) => {
if (
ctx.session?.origin === 'newtab' &&
ctx.session.originPageId !== undefined &&
args.page === ctx.session.originPageId
) {
response.error(
'Cannot close the origin tab in new-tab mode — this would destroy the chat UI.',
)
return
}
await ctx.browser.closePage(args.page)
response.text(`Closed page ${args.page}`)
response.data({ page: args.page, action: 'close_page' })

@@ -1195,3 +1195,120 @@ describe('nudges', () => {
expect(prompt).toContain('at most once')
})
})
// ---------------------------------------------------------------------------
// 15. NEW-TAB ORIGIN
//
// Why: When the user chats from the new-tab page, the active tab IS the chat
// UI. The agent must never navigate or close it. The prompt must adapt its
// execution and tool-selection sections to prohibit origin tab navigation
// and default all lookups to new_page (background).
// ---------------------------------------------------------------------------
describe('new-tab origin', () => {
/** Build a prompt with newtab origin */
function buildNewTab(overrides?: Partial<BuildSystemPromptOptions>): string {
return buildSystemPrompt({
workspaceDir: '/home/user/workspace',
soulContent: 'Be helpful and concise.',
origin: 'newtab',
...overrides,
})
}
// --- Execution section ---
it('includes New-Tab Origin Rules when origin is newtab', () => {
const prompt = buildNewTab()
expect(prompt).toContain('New-Tab Origin Rules')
expect(prompt).toContain('New Tab page')
expect(prompt).toContain('chat UI itself')
})
it('prohibits navigate_page on active tab in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).toContain('NEVER call `navigate_page` on the active tab')
})
it('prohibits close_page on active tab in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).toContain('NEVER call `close_page` on the active tab')
})
it('requires new_page for all browsing in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).toContain(
'For ALL browsing tasks (including single-page lookups), use `new_page`',
)
})
it('does NOT include single-tab navigate_page guidance in newtab mode', () => {
// The sidepanel prompt says "use navigate_page on the current tab" for
// single-page lookups. This must NOT appear in newtab mode.
const prompt = buildNewTab()
expect(prompt).not.toContain(
'For single-page lookups (e.g., "go to X and read Y"), use `navigate_page` on the current tab',
)
})
it('does NOT include "Stay on the current page" in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).not.toContain(
'Stay on the current page for single-page tasks',
)
})
it('still includes common execution sections in newtab mode', () => {
// Newtab mode should still have multi-tab workflow, observe-act-verify, etc.
const prompt = buildNewTab()
expect(prompt).toContain('Multi-tab workflow')
expect(prompt).toContain('Observe → Act → Verify')
expect(prompt).toContain('Tab retry discipline')
expect(prompt).toContain('CAPTCHA')
})
// --- Sidepanel (default) should NOT have newtab rules ---
it('does NOT include New-Tab Origin Rules in sidepanel mode', () => {
const prompt = buildRegular({ origin: 'sidepanel' })
expect(prompt).not.toContain('New-Tab Origin Rules')
})
it('does NOT include New-Tab Origin Rules when origin is undefined', () => {
const prompt = buildRegular()
expect(prompt).not.toContain('New-Tab Origin Rules')
})
it('includes single-tab navigate_page guidance in sidepanel mode', () => {
const prompt = buildRegular({ origin: 'sidepanel' })
expect(prompt).toContain(
'For single-page lookups (e.g., "go to X and read Y"), use `navigate_page` on the current tab',
)
})
// --- Tool selection section ---
it('tool selection table uses new_page for lookups in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).toContain(
'`new_page` (background) → extract data → `close_page`',
)
})
it('tool selection includes reminder about active tab in newtab mode', () => {
const prompt = buildNewTab()
expect(prompt).toContain(
'The active tab is the New Tab chat UI. Never navigate or close it.',
)
})
it('tool selection table uses navigate_page for lookups in sidepanel mode', () => {
const prompt = buildRegular({ origin: 'sidepanel' })
expect(prompt).toContain('`navigate_page` on current tab')
})
it('tool selection does NOT have newtab reminder in sidepanel mode', () => {
const prompt = buildRegular({ origin: 'sidepanel' })
expect(prompt).not.toContain('The active tab is the New Tab chat UI')
})
})

@@ -0,0 +1,270 @@
/**
* New-tab origin navigation guards.
*
* When the chat session originates from the new-tab page, navigate_page and
* close_page must reject attempts to act on the origin tab. These are
* integration tests that run against a real browser to verify the guards
* work end-to-end through executeTool.
*/
import { describe, it } from 'bun:test'
import assert from 'node:assert'
import type { ToolContext, ToolDefinition } from '../../src/tools/framework'
import { executeTool } from '../../src/tools/framework'
import { close_page, navigate_page, new_page } from '../../src/tools/navigation'
import type { ToolResult } from '../../src/tools/response'
import { withBrowser } from '../__helpers__/with-browser'
function textOf(result: {
content: { type: string; text?: string }[]
}): string {
return result.content
.filter((c) => c.type === 'text')
.map((c) => c.text)
.join('\n')
}
function structuredOf<T>(result: { structuredContent?: unknown }): T {
assert.ok(result.structuredContent, 'Expected structuredContent')
return result.structuredContent as T
}
describe('new-tab origin navigation guards', () => {
// Helper: execute a tool with newtab session context
function executeWithSession(
ctx: { browser: ToolContext['browser'] },
tool: ToolDefinition,
args: unknown,
session: ToolContext['session'],
): Promise<ToolResult> {
const signal = AbortSignal.timeout(30_000)
return executeTool(
tool,
args,
{
browser: ctx.browser,
directories: { workingDir: process.cwd() },
session,
},
signal,
)
}
// -------------------------------------------------------------------------
// navigate_page guards
// -------------------------------------------------------------------------
it('navigate_page rejects navigation on origin tab in newtab mode', async () => {
await withBrowser(async ({ browser }) => {
// Use a new page as the simulated "origin tab"
const setupResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const originPageId = structuredOf<{ pageId: number }>(setupResult).pageId
const result = await executeWithSession(
{ browser },
navigate_page,
{ page: originPageId, action: 'url', url: 'https://example.com' },
{ origin: 'newtab', originPageId },
)
assert.ok(result.isError, 'Expected navigate_page to be rejected')
assert.ok(
textOf(result).includes('Cannot navigate the origin tab'),
`Expected origin tab error, got: ${textOf(result)}`,
)
// Cleanup
await executeTool(
close_page,
{ page: originPageId },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
})
}, 60_000)
it('navigate_page allows navigation on non-origin tab in newtab mode', async () => {
await withBrowser(async ({ browser }) => {
const originResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const originPageId = structuredOf<{ pageId: number }>(originResult).pageId
// Open a second tab — this is NOT the origin tab
const otherResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const otherPageId = structuredOf<{ pageId: number }>(otherResult).pageId
const result = await executeWithSession(
{ browser },
navigate_page,
{ page: otherPageId, action: 'url', url: 'https://example.com' },
{ origin: 'newtab', originPageId },
)
assert.ok(
!result.isError,
`Expected success, got error: ${textOf(result)}`,
)
assert.ok(textOf(result).includes('Navigated to'))
// Cleanup
const noSession = { browser, directories: { workingDir: process.cwd() } }
await executeTool(
close_page,
{ page: otherPageId },
noSession,
AbortSignal.timeout(30_000),
)
await executeTool(
close_page,
{ page: originPageId },
noSession,
AbortSignal.timeout(30_000),
)
})
}, 60_000)
it('navigate_page works normally in sidepanel mode', async () => {
await withBrowser(async ({ browser }) => {
const setupResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const pageId = structuredOf<{ pageId: number }>(setupResult).pageId
const result = await executeWithSession(
{ browser },
navigate_page,
{ page: pageId, action: 'url', url: 'https://example.com' },
{ origin: 'sidepanel', originPageId: pageId },
)
assert.ok(
!result.isError,
`Expected success, got error: ${textOf(result)}`,
)
assert.ok(textOf(result).includes('Navigated to'))
await executeTool(
close_page,
{ page: pageId },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
})
}, 60_000)
it('navigate_page works when session is undefined (backwards compat)', async () => {
await withBrowser(async ({ browser, execute }) => {
const setupResult = await execute(new_page, { url: 'about:blank' })
const pageId = structuredOf<{ pageId: number }>(setupResult).pageId
// execute() from withBrowser passes no session — simulates old clients
const result = await execute(navigate_page, {
page: pageId,
action: 'url',
url: 'https://example.com',
})
assert.ok(
!result.isError,
`Expected success, got error: ${textOf(result)}`,
)
await execute(close_page, { page: pageId })
})
}, 60_000)
// -------------------------------------------------------------------------
// close_page guards
// -------------------------------------------------------------------------
it('close_page rejects closing origin tab in newtab mode', async () => {
await withBrowser(async ({ browser }) => {
const setupResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const originPageId = structuredOf<{ pageId: number }>(setupResult).pageId
const result = await executeWithSession(
{ browser },
close_page,
{ page: originPageId },
{ origin: 'newtab', originPageId },
)
assert.ok(result.isError, 'Expected close_page to be rejected')
assert.ok(
textOf(result).includes('Cannot close the origin tab'),
`Expected origin tab error, got: ${textOf(result)}`,
)
// Clean up the page we created (without newtab guard)
await executeTool(
close_page,
{ page: originPageId },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
})
}, 60_000)
it('close_page allows closing non-origin tab in newtab mode', async () => {
await withBrowser(async ({ browser }) => {
const originResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const originPageId = structuredOf<{ pageId: number }>(originResult).pageId
const otherResult = await executeTool(
new_page,
{ url: 'about:blank' },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
const otherPageId = structuredOf<{ pageId: number }>(otherResult).pageId
const result = await executeWithSession(
{ browser },
close_page,
{ page: otherPageId },
{ origin: 'newtab', originPageId },
)
assert.ok(
!result.isError,
`Expected success, got error: ${textOf(result)}`,
)
assert.ok(textOf(result).includes(`Closed page ${otherPageId}`))
// Cleanup origin page
await executeTool(
close_page,
{ page: originPageId },
{ browser, directories: { workingDir: process.cwd() } },
AbortSignal.timeout(30_000),
)
})
}, 60_000)
})