* feat: add remote skill download and auto-sync
Download default skills from remote catalog on first setup with
bundled fallback when offline. Background sync every 45 minutes
checks for new/updated skills without overwriting user-customized
ones. Tracks installed defaults via content hashes in a local
manifest file.
* feat: make skills catalog URL configurable and add generation script
Add SKILLS_CATALOG_URL env var (following CODEGEN_SERVICE_URL pattern)
with fallback to the default constant. Add script to generate
catalog.json from bundled defaults for static hosting.
* feat: add R2 upload script and use cdn.browseros.com for catalog URL
Add upload-skills-catalog.ts that generates and uploads catalog.json
to Cloudflare R2 (same infra as existing build artifacts). Update
default catalog URL to cdn.browseros.com/skills/v1/catalog.json.
* test: add E2E tests for remote skill sync against live CDN
* fix: address code review findings — security, validation, DRY
- Add path traversal protection via safeSkillDir in writeSkillFile
and readSkillContent (reuses existing validation from service.ts)
- Add runtime type guards for catalog JSON and manifest JSON parsing
- Fix seedFromRemote to return false on partial failure so bundled
fallback kicks in
- Add per-skill error handling in syncRemoteSkills so one bad skill
doesn't crash the entire sync
- Wire stopSkillSync into Application.stop() shutdown path
- Extract version from frontmatter in seedFromBundled instead of
hardcoding '1.0'
- Consolidate duplicated logic: reuse installSkill/writeSkillFile/
contentHash/saveManifest from remote-sync.ts in seed.ts
- Extract shared catalog generation into scripts/catalog-utils.ts
* test: add flow tests for all four sync scenarios against live CDN
* refactor: remove redundant scripts and inline catalog generation
Drop generate-skills-catalog.ts, catalog-utils.ts, and
e2e-remote-sync.test.ts (covered by flows.test.ts). Inline
catalog generation into upload-skills-catalog.ts.
* test: add full E2E server flow test against live CDN
Tests all 7 steps of the real server lifecycle: fresh seed from CDN,
no-op sync, user edit preservation, skill reinstall, custom skill
protection, background timer firing, and second startup skip.
* chore: remove e2e-server-flow test
* fix: address Greptile review — entry validation, size limit, DRY, no-op saves
- Validate individual skill entries in catalog (id, version, content
must all be strings) not just the top-level shape
- Add 1MB response size limit on catalog fetch to prevent resource
exhaustion from compromised/misconfigured CDN
- Skip manifest save when sync cycle had no changes (avoids
unnecessary disk I/O every 45 minutes)
- Share extractVersion via remote-sync.ts export, remove duplicate
from seed.ts
* fix: prevent bundled fallback from overwriting partial remote seeds
When seedFromRemote partially fails, the bundled fallback now skips
skills already in the manifest (installed by the partial remote
seed). Also adds Content-Length early check before downloading the
full catalog response body.
* fix: run sync immediately on startup, not just on interval
Previously the first sync fired 45 minutes after boot. Now
startSkillSync runs one sync immediately so returning users
get skill updates right away.
* refactor: simplify sync — remote always wins, remove manifest
Remote catalog is the source of truth. If a skill exists in the
catalog, its version is compared against local frontmatter and
overwritten when newer. No manifest file, no content hashes.
User-created skills (IDs not in catalog) are never touched.
* fix: skip bundled skills already installed by partial remote seed
* chore: remove unreliable Content-Length check
* chore: remove size limit checks, fetch timeout is sufficient
* feat: add "Rewrite with AI" prompt refinement for scheduled tasks
Add a lightweight /refine-prompt endpoint that uses generateText to
rewrite rough scheduled task prompts into clear, actionable instructions.
The UI adds a sparkle-icon button next to the Prompt label in the
NewScheduledTaskDialog with loading state, undo support, and disabled
state when the textarea is empty.
* fix: clear stale undo ref on dialog re-open and pass providerId to refinePrompt
- Reset originalPromptRef when dialog opens and on form submit to
prevent stale "Undo rewrite" button on re-open
- Accept optional providerId in refinePrompt() so the form's selected
provider is used for refinement instead of always the system default
* fix: hide undo rewrite link while refinement is in flight
* fix: reset isRefining state on dialog re-open
* fix: ignore stale refine-prompt responses after dialog re-open
Use a request generation counter so that if the dialog is closed and
re-opened while a rewrite is in flight, the stale response is silently
discarded instead of overwriting the fresh form state.
* fix: invalidate stale refine requests on dialog reopen and rename to kebab-case
- Increment refineRequestIdRef on dialog open so in-flight requests
from a previous session are discarded when they complete
- Rename refinePrompt.ts to refine-prompt.ts per CLAUDE.md file naming
* feat: add voice input to agent chat sidebar
Allow users to record voice and transcribe to text in the chat input.
Mic button shows when input is empty, waveform visualizer during recording,
transcription via OpenAI (llm.browseros.com/api/transcribe).
- Extract shared useVoiceInput hook to lib/voice/
- Time-domain waveform bars that bounce per-frequency-band
- Bar height capped to fit input container
- Analytics events for recording lifecycle
* fix: address review — add fetch timeout, await stopRecording, deduplicate VoiceInputState
- Add AbortSignal.timeout(30s) to transcription fetch
- Await stopRecording() and track analytics after completion
- Export VoiceInputState from useVoiceInput, import in consumers
* fix: await startRecording before tracking, narrow SurveyChat effect deps
- Await startRecording() so analytics only fires after mic permission granted
- Narrow SurveyChat useEffect dependency from [voice] to [voice.transcript, voice.isTranscribing]
* fix: analytics only tracks on success, clean up stream on failure, type API response
- startRecording returns boolean; track(RECORDING_STARTED) only fires on success
- Catch block cleans up MediaStream tracks and AudioContext on partial failure
- Type transcription API response with TranscribeResponse interface
* fix: keep mic button always visible alongside send button
Mic and send are now separate buttons, both always visible.
Mic is disabled while AI is streaming. Send is disabled during
recording/transcribing. Buttons are no longer absolutely positioned
inside the textarea — they sit beside it in the flex row.
* fix: keep mic button always visible inside input alongside send
Both mic and send buttons are always visible inside the input field,
positioned on the right side (ChatGPT-style). Mic is disabled while
AI is streaming. Send is disabled during recording/transcribing.
* fix: remove unreachable CSS branch in recording waveform div
* feat: add CDP UI inspector script for dev self-testing
* fix: address code review feedback for inspect-ui script
- Use Delete key (not Backspace) to match server's keyboard.ts clearField
- Add windowId resolution to open-sidepanel (chrome.sidePanel.open requires it)
- Make target matching case-insensitive
- Replace process.exit(1) in eval with thrown error for proper cleanup
- Add comment referencing DEV_PORTS source of truth
* docs: add self-testing workflow for UI changes via CDP inspector
* fix: runtime fixes for inspect-ui discovered during live testing
- Remove Input.enable (domain has no enable method)
- Add DOM.getDocument before DOM operations (required by protocol)
- Use BrowserOS-specific sidePanel.browserosToggle API instead of
standard chrome.sidePanel.open (side panel starts disabled)
- Enable side panel with setOptions before toggling
* feat: add test-ui skill for visual testing of agent extension UI
Adds a Claude Code skill that lets the agent visually test both
surfaces of the BrowserOS extension:
- New tab page (app.html) — left sidebar with Home, Scheduled Tasks,
Settings, Skills, Memory, Soul, Connect Apps
- Right side panel (sidepanel.html) — chat interface
Includes all gotchas discovered through real testing: randomized ports,
fresh profile onboarding redirect, stale element IDs after navigation,
BrowserOS-specific sidePanel APIs, DOM.getDocument requirement.
* feat: add press_key, scroll, hover, select_option, wait_for to inspect-ui
Brings inspect-ui.ts to parity with server's MCP input tools:
- press_key: key combos like Enter, Control+A, Meta+Shift+P
(ported from keyboard.ts pressCombo)
- scroll: up/down/left/right with configurable amount
- hover: hover over element by ID for tooltip/hover state testing
- select_option: select dropdown option by value or visible text
(ported from browser.ts selectOption)
- wait_for: poll for text or CSS selector with 10s timeout
Updated skill documentation with new commands and examples.
* docs: prefer snapshot over screenshot, add holistic debugging guidance
- Add snapshot vs screenshot guidance table — prefer snapshot for
structural checks, screenshot only for visual/layout verification
- Add server log checking instructions ([agent], [server], [build] tags)
- Add JS error checking via eval
- Add API connectivity verification
- Add common issues troubleshooting table
- Update all examples to use snapshot as default verification
* fix: address Greptile review feedback
- Replace process.exit(1) with process.exitCode + return in cmdWaitFor
to allow async CDP cleanup in finally blocks
- Fix cmdScroll enabling Runtime instead of Page domain
- Add BROWSEROS_EXTENSION_ID env var override for extension ID
- Align CLAUDE.md dev server command with SKILL.md canonical command
take_snapshot only used the AX tree, which misses custom components
(cursor:pointer divs, onclick handlers, etc.) that lack ARIA roles.
These elements appeared as role="generic" and were invisible to the agent.
Changes:
- Merge findCursorInteractiveElements into snapshot() so take_snapshot
catches cursor:pointer, onclick, and tabindex elements
- Add DisclosureTriangle to INTERACTIVE_ROLES for <summary> elements
- Use aria-label as text fallback in cursor detection for icon-only buttons
- Fix dedup bug in enhancedSnapshot that was silently dropping all
cursor-detected elements by checking against all AX node IDs instead
of only already-included output IDs
- Add hover_at, type_at, drag_at coordinate tools to server
- Add hoverAt, typeAt, dragAt methods to Browser class
- Export server internals (browser, tool-loop, registry) for eval imports
- Copy eval app from enterprise repo with agents, graders, runner, dashboard
- Nest eval-targets inside apps/eval
- Adapt sessionExecutionDir → workingDir for current server API
- Add biome ignore for dashboard HTML to prevent lint breaking onclick handlers
* feat: add get_console_logs tool to surface browser console output
Captures Runtime.consoleAPICalled, Runtime.exceptionThrown, and
Log.entryAdded CDP events per page with a FIFO ring buffer (500 entries).
- ConsoleCollector: per-page buffers with O(1) session routing via Map lookup
- Session-aware CDP event dispatching (onSessionEvent) in CdpBackend
- Log.enable() added alongside Runtime.enable() in attachToPage
- Single tool with level hierarchy, text search, limit, and clear params
- Buffer clears on main-frame navigation, cleaned up on page close
* fix: address review — handle session re-attach, remove dead code
- ConsoleCollector.attach() now updates session mapping on re-attach
instead of early-returning, preventing silent event drops after
target detach/re-attach (e.g. tab crash, cross-process navigation)
- Remove unused clearConsoleLogs() and ConsoleCollector.clear()
* feat: add per-task LLM provider selection for scheduled tasks
Allow users to choose which AI provider a scheduled task runs with,
using the same ChatProviderSelector component from the new-tab page.
Falls back to the global default provider when none is selected or
if the selected provider has been deleted.
* fix: lint issues
* chore: updated to latest schema.graphql file
---------
Co-authored-by: Dani Akash <DaniAkash@users.noreply.github.com>
The AI SDK can produce assistant messages with empty parts (parts:[]) when
a stream is aborted, and providers reject assistant messages with empty text
content. This adds a validation utility that filters both cases before
sending messages to createAgentUIStreamResponse and when persisting them.
Mintlify deploys docs by cloning the repo but does not run `git lfs
pull`. The `.gitattributes` rule `docs/images/** filter=lfs` caused
all doc images to be stored as ~130-byte LFS pointer files, which
Mintlify served as-is — breaking every image on the site.
Removing the LFS rule and re-adding the files as regular git blobs
fixes all images without changing any paths or MDX files.
Also fixes broken Slack link placeholder in troubleshooting page.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Images in docs/images/ are served as broken 130-byte placeholders by
Mintlify CDN. Co-locating images with the MDX file (matching the
working pattern in features/workflow/ and features/cowork/) bypasses
this issue. Also fixes the Slack link placeholder.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: fallback to default BrowserOS provider when provider is null
When the extension first loads, provider config is loaded async from
storage. If a chat request fires before loading completes (race
condition), provider is null and the server receives provider: undefined,
causing a Zod validation error. This adds a fallback to
createDefaultBrowserOSProvider() in both chat paths (sidepanel and
scheduled tasks) so provider.type is always defined.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: fallback to first provider when default provider ID is stale
When defaultProviderId in storage doesn't match any loaded provider
(e.g. after Kimi/Moonshot rollout), selectedProvider was null causing
provider: undefined in chat requests. Now falls back to providers[0].
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: repair stale defaultProviderId in storage on load
When the stored default provider ID doesn't match any loaded provider,
write back the corrected ID (providers[0].id) to storage so it doesn't
silently persist across sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Comment out non-working Canva and Exa integrations from the OAuth MCP
servers list and remove their imports/icon mappings from the UI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: replace rate limit CTAs with Kimi/Moonshot partnership links
Comment out old "Learn more" and "take a quick survey" links on the
daily limit error banner. Replace with Kimi API key docs link and
direct Moonshot AI platform link for conversion tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove partnership tagline from rate limit banner
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The Docs link in the settings sidebar was using the Info icon (circle
with "i"). Changed it to BookOpen which is the standard icon for
documentation links.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Track docs/images/** and docs/videos/** with Git LFS
- Add packages/browseros/build/tools/ to .gitignore
- Remove appimagetool-x86_64.AppImage from version control (downloaded on demand by build script)
* fix: scheduled task agent not using hidden window for new pages
The agent prompt only told the agent to pass windowId with `new_page`
but not `new_hidden_page`, which the agent prefers for background work.
The agent also had no instruction against closing or replacing its
dedicated hidden window, causing pages to scatter across uncontrolled
windows.
Expanded the scheduled task prompt rules to:
- Cover both `new_page` and `new_hidden_page` windowId requirement
- Forbid closing the dedicated hidden window
- Forbid creating new windows
- Added `new_hidden_page` to tool reference for MCP consumers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove duplicate hidden window creation from scheduled task frontend
The server's ChatService already creates a hidden window for scheduled
tasks (chat-service.ts:99-126), but the frontend (scheduledJobRuns.ts)
was also creating a minimized Chrome window that the server immediately
overwrote. This caused two windows to be created per scheduled task run,
with only one being used.
Removed from scheduledJobRuns.ts:
- chrome.windows.create() call
- 1-second race condition delay hack (FIXME)
- chrome.windows.remove() cleanup
- windowId/activeTab params to getChatServerResponse()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump server version
* fix: remove dead getCdpToolReference and unused prompt exports
The getCdpToolReference function was always excluded by the AI SDK agent
(tool schemas are injected by the SDK itself) and never used by the MCP
server (which has its own MCP_INSTRUCTIONS). Also removes unused exports
getSystemPrompt and PROMPT_SECTION_KEYS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump server version
* fix: move session dirs to ~/.browseros/sessions and update skill paths
Session directories now live under ~/.browseros/sessions/{conversationId}/
instead of executionDir/sessions/. Adds 30-day cleanup for stale sessions
at server startup. Updates 6 default skills to reference the working
directory instead of hardcoding ~/Downloads/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: rename sessionExecutionDir to workingDir across server
Consistent naming for the per-conversation working directory:
- ResolvedAgentConfig.sessionExecutionDir → workingDir
- ToolDirectories.executionDir → workingDir
- resolveExecutionPath() → resolveWorkingPath()
- buildBrowserToolSet param: executionDir → workingDir
Server-level executionDir (DB, logs) unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — restore emoji folder name, refresh session mtime
- Revert "Read Later" back to "📚 Read Later" to avoid creating
duplicate bookmark folders for existing users
- Touch session dir mtime on each message via utimes() so cleanup
correctly reflects last activity, not just directory creation time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review round 2 — remove dead executionDir, fix emoji
- Remove executionDir from ChatServiceDeps and ChatRouteDeps since
resolveSessionDir now uses getSessionsDir() directly
- Fix missed emoji in notification format template
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
safeSkillDir() used a hardcoded `/` in the startsWith path traversal
check. On Windows, path.resolve() returns backslash paths, so the check
always failed — blocking getSkill, createSkill, updateSkill, deleteSkill.
Replace `${skillsDir}/` with `${skillsDir}${sep}` using path.sep from
node:path, which returns `\` on Windows and `/` on POSIX.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: persist default kimi hub provider to BrowserOS prefs on first load
When VITE_PUBLIC_KIMI_LAUNCH is enabled, loadProviders() returned default
Kimi provider in-memory but never saved it to the BrowserOS pref. The
browser's C++ code reads the pref directly and found it empty, so Kimi
didn't appear in the toolbar until the user manually edited and saved.
Now loadProviders() persists defaults and ensureKimiFirst() additions to
the pref, keeping the browser in sync with what the extension UI shows.
Fixes#428
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use reference equality for ensureKimiFirst change detection
Address PR review: reference check (normalized !== providers) is more
semantically precise than length comparison since ensureKimiFirst returns
the same reference when unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Return a friendly JSON response when users curl GET /mcp instead of
an opaque 503. Narrows the catch-all .all() to .post() since the MCP
Streamable HTTP transport only needs POST for stateless servers.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>