mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-18 19:16:22 +00:00
* feat(agent): attach images and text files to chat messages
Adds end-to-end support for image and text file attachments in the chat
composer, with the staged files round-tripping through the OpenClaw
gateway as OpenAI-compatible content blocks and persisting in the JSONL
so they show up in the historical view.
Server
- HTTP client: new OpenClawChatContentPart union and a buildUserContent
helper that emits multimodal content arrays when messageParts is
supplied, falls back to the legacy string content otherwise.
- Service: chatStream takes an optional messageParts array and forwards
it; BrowserOSChatHistoryItem gains an attachments field.
- JSONL reader: PiContentBlock learns the OpenAI image_url and Anthropic
image source/data shapes; user messages now emit user.attachment
events that the history mapper accumulates onto the next user item.
- Route: validates an inbound attachments[] (kind/mime/size/count),
inlines text-shaped files as <attachment> blocks in the message body,
attaches images via image_url parts. Replaces the immediate 409 on
an active monitoring session with a 30s waitForSessionFree(agentId) wait
(registry now exposes onSessionEnd) so cron/hook contention does not
reject a user-chat send outright. Returns 503 if the wait times out.
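The string-versus-array fallback described for the HTTP client can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ChatContentPart` and `buildUserContentSketch` are hypothetical stand-ins for `OpenClawChatContentPart` and `buildUserContent`, and the part shapes are assumed to follow the public OpenAI chat format.

```typescript
// Hypothetical stand-in for OpenClawChatContentPart (OpenAI-style blocks).
type ChatContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

// Returns the multimodal array when parts are supplied, otherwise the
// legacy plain-string content.
function buildUserContentSketch(
  message: string,
  messageParts?: ChatContentPart[],
): string | ChatContentPart[] {
  if (!messageParts || messageParts.length === 0) return message
  return messageParts
}

const legacyContent = buildUserContentSketch('hello')
const multimodalContent = buildUserContentSketch('describe this', [
  { type: 'text', text: 'describe this' },
  { type: 'image_url', image_url: { url: 'data:image/png;base64,AAAA' } },
])
```

Keeping the string form for text-only sends means existing callers see no change in the request body.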
Client
- New lib/attachments.ts: validateAttachment / compressImageIfNeeded
(canvas downscale to 2048px long edge, JPEG 0.85 re-encode for >1.5
MB inputs) / stageAttachment / stageAttachments, which produce the
staged-attachment shape the composer renders and the payload the
server accepts.
- ConversationInput: drag-and-drop, paperclip button, clipboard paste,
staged attachment chip strip with thumbnails for images and a
paperclip+name chip for text files. Send button enables on either
text or attachments. Drop-zone overlay during drag.
- chatWithAgent forwards attachments[]; useAgentConversation.send
accepts a SendInput shape and renders user attachments on the
optimistic streaming turn via MessageAttachments / MessageAttachment.
- ClawChatMessage groups historical attachment parts into a single
MessageAttachments strip, ordered before reasoning/tools/text.
- claw-chat-types adds an attachment ClawChatMessagePart variant; the
history mapper emits attachment parts first and skips the text part
when the user only sent media.
- AgentCommandHome forwards the new SendInput shape — home composer
drops attachments at the boundary in v1 (the conversation page is
where staging is most useful; carrying bytes through the URL bar
is not sensible).
Limits: 10 attachments per message, 5 MB per image (post compression),
1 MB per text file, mime types png/jpeg/webp/gif and text/* +
application/json. PDFs and other binaries are deferred to v2.
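As a rough illustration of how these limits combine, the sketch below checks a batch of candidates against the count, type, and size caps quoted above. The `Candidate` shape, the `sizeBytes` field, and `checkAttachments` are hypothetical names for this example only; the actual route validation may differ.

```typescript
// Limits quoted in the commit message; identifiers are illustrative.
const MAX_ATTACHMENTS = 10
const MAX_IMAGE_BYTES = 5 * 1024 * 1024
const MAX_TEXT_BYTES = 1 * 1024 * 1024
const IMAGE_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/gif']

type Candidate = { mediaType: string; sizeBytes: number }

// Returns a list of problem codes; an empty list means the batch passes.
function checkAttachments(items: Candidate[]): string[] {
  const problems: string[] = []
  if (items.length > MAX_ATTACHMENTS) problems.push('too_many')
  for (const item of items) {
    const isImage = IMAGE_TYPES.includes(item.mediaType)
    const isText =
      item.mediaType.startsWith('text/') ||
      item.mediaType === 'application/json'
    if (!isImage && !isText) {
      problems.push(`unsupported_type:${item.mediaType}`)
    } else if (item.sizeBytes > (isImage ? MAX_IMAGE_BYTES : MAX_TEXT_BYTES)) {
      problems.push(`too_large:${item.mediaType}`)
    }
  }
  return problems
}

const pdfProblems = checkAttachments([
  { mediaType: 'application/pdf', sizeBytes: 10 },
])
const bigTextProblems = checkAttachments([
  { mediaType: 'text/plain', sizeBytes: 2 * 1024 * 1024 },
])
const okProblems = checkAttachments([
  { mediaType: 'image/png', sizeBytes: 4 * 1024 * 1024 },
])
```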
* feat(agent): outbound message queue for chats while agent is mid-turn
Lets users keep typing and submitting messages while the agent is still
streaming a previous turn. Each press is appended to a single-flight
queue and dispatched as soon as `streaming` flips false; the queued
state renders as a strip above the composer so the user sees what's
pending vs. what's already sending.
- New `useOutboundQueue` hook owns the queue, the worker effect, and
cancel/retry actions. Single-flight by design — a re-entrancy ref
guard prevents two simultaneous dispatches when `streaming` flickers.
- Composer (`ConversationInput`) accepts optional `outboundQueue`,
`onCancelQueued`, `onRetryQueued` props. When the queue is provided
the send-button gate stops blocking on `streaming`; the spinner stays
as the visual cue that the agent is still busy. Legacy direct-send
callers keep the old streaming-blocks-send semantics.
- Renders an OutboundQueueStrip above the staged-attachment strip with
per-item status (queued / sending / failed), a cancel button on
queued items, and retry + discard on failed items.
- AgentCommandConversation wires `onSend` to `queue.enqueue` and routes
the home composer's `?q=` initial-message handoff through the queue
too, so it inherits the same single-flight serialization.
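The single-flight behaviour can be modelled outside React as a small class. This is a sketch under stated assumptions: `SingleFlightQueue`, `drainOne`, and the `dispatching` flag are hypothetical names, with the flag playing the role the commit assigns to the re-entrancy ref guard inside `useOutboundQueue`.

```typescript
type QueuedItem = { id: string; text: string }

// Hypothetical non-React model of the hook's worker: one dispatch at a time.
class SingleFlightQueue {
  private items: QueuedItem[] = []
  private dispatching = false // stands in for the re-entrancy ref guard
  private send: (item: QueuedItem) => Promise<void>

  constructor(send: (item: QueuedItem) => Promise<void>) {
    this.send = send
  }

  get pending(): number {
    return this.items.length
  }

  enqueue(item: QueuedItem): void {
    this.items.push(item)
  }

  // Intended to run whenever `streaming` flips to false. Repeated calls
  // while a dispatch is in flight are ignored.
  async drainOne(): Promise<boolean> {
    if (this.dispatching) return false
    const next = this.items.shift()
    if (!next) return false
    this.dispatching = true
    try {
      await this.send(next)
      return true
    } finally {
      this.dispatching = false
    }
  }
}

const sentIds: string[] = []
const demoQueue = new SingleFlightQueue(async (item) => {
  sentIds.push(item.id)
})
demoQueue.enqueue({ id: 'a', text: 'first' })
demoQueue.enqueue({ id: 'b', text: 'second' })
// Two back-to-back triggers, as when `streaming` flickers.
void demoQueue.drainOne()
void demoQueue.drainOne()
```

Only the first trigger starts a dispatch; the second item stays queued until a later trigger.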
The server-side `waitForSessionFree` (added with attachments) and this
client-side queue together cover both contention sources: cron / hook
turns and back-to-back user sends. Persistence across reloads is
intentionally out of scope for v1 — losing the queue on extension
reload is documented as a known limitation.
* feat(server): server-side outbound message queue
Replaces the client-only React-state queue from 123ef21d with a
proper server-owned queue. Closing the tab is now safe — the server
holds queued messages and dispatches them through the existing
chatStream path the moment the agent's ClawSession status flips to
idle.
Server
- New OutboundQueueService (apps/server/src/api/services/queue): a
per-agent, in-memory FIFO. Subscribes to ClawSession.onStateChange
through OpenClawService.onAgentStatusChange, and dispatches via
OpenClawService.chatStream so attachments / history / monitoring
all behave identically to the existing /chat route. The worker
drains the SSE response server-side so the gateway run finalizes
cleanly even with no client connected.
- Four new routes under /claw/agents/:id/queue:
POST /queue enqueue
DELETE /queue/:itemId cancel a queued item
POST /queue/:itemId/retry re-queue a failed item
GET /queue/stream SSE feed of the per-agent queue state.
Validation reuses validateChatAttachments and
buildMessagePartsFromAttachments from the existing chat route.
- Singleton wired in apps/server/src/main.ts; shutdown on SIGTERM.
- New OpenClawService.getAgentState getter for the queue worker's
pre-dispatch sanity check.
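A per-agent, in-memory FIFO of the kind described above might look like the following. This is a hypothetical model only: `PerAgentQueue` and its methods are invented for illustration, and the real OutboundQueueService additionally handles SSE fan-out, retries, and dispatch through chatStream.

```typescript
type Status = 'queued' | 'sending' | 'failed'
type Item = { id: string; message: string; status: Status }

// Hypothetical in-memory model: one ordered list per agent id.
class PerAgentQueue {
  private queues = new Map<string, Item[]>()

  enqueue(agentId: string, id: string, message: string): Item {
    const item: Item = { id, message, status: 'queued' }
    const list = this.queues.get(agentId) ?? []
    list.push(item)
    this.queues.set(agentId, list)
    return item
  }

  // Cancel only removes items that have not started sending.
  cancel(agentId: string, itemId: string): boolean {
    const list = this.queues.get(agentId) ?? []
    const index = list.findIndex(
      (i) => i.id === itemId && i.status === 'queued',
    )
    if (index === -1) return false
    list.splice(index, 1)
    return true
  }

  // Called when the agent turns idle: claim the oldest queued item,
  // unless another item for this agent is already in flight.
  takeNext(agentId: string): Item | undefined {
    const list = this.queues.get(agentId) ?? []
    if (list.some((i) => i.status === 'sending')) return undefined
    const next = list.find((i) => i.status === 'queued')
    if (next) next.status = 'sending'
    return next
  }
}

const agentQueue = new PerAgentQueue()
agentQueue.enqueue('agent-1', 'a', 'first')
agentQueue.enqueue('agent-1', 'b', 'second')
const firstClaim = agentQueue.takeNext('agent-1')
const secondClaim = agentQueue.takeNext('agent-1')
```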
Client
- useOutboundQueue rewritten as an SSE-backed projection over server
state. Public API unchanged so the composer still works.
- enqueue POSTs to /queue and shows an optimistic local entry until
the server's SSE snapshot reflects it; local-only entries get a
`local-` id prefix so cancel can short-circuit them without
hitting the server.
- AgentCommandConversation watches the queue for sending items
dropping out and refetches history so the new assistant turn shows
up in the conversation view (the server worker streams the
dispatched turn into OpenClaw without exposing per-turn SSE to
the client).
Out of scope (documented in the plan as v2 follow-ups): disk
persistence (server restart loses queue), per-turn live streaming
of queued sends in the conversation view, and switching the
underlying dispatch from /v1/chat/completions to the chat.send RPC
(which would also fix the multimodal attachment routing problem).
* fix(server): outbound queue must reuse existing session, not spawn UUIDs
The queue worker was generating a fresh randomUUID() as the sessionKey
when the queued item didn't carry one — and the client wasn't sending
one. Result: every queued message kicked off a brand-new OpenClaw
session, orphaning the user's active conversation behind the new
"most recent" entry in sessions.json. The history endpoint then
resolved to the orphan and the chat appeared to disappear.
Fix is layered:
- Client (useOutboundQueue): forward the current resolvedSessionKey
in the POST /queue body so every queued message targets the same
conversation the user is viewing. AgentCommandConversation passes
resolvedSessionKey into the hook.
- Server (OutboundQueueService): the worker now resolves to the
agent's existing user-chat session when no sessionKey is provided
on the queued item, via OpenClawService.resolveAgentSession. UUID
fallback is now reserved for the first-ever message on a brand-new
agent, the same semantics the existing /chat route has implicitly
through the catalog of historical sessions.
No JSONL data was lost by the original bug (the prior conversations
are intact on disk); the orphan sessions just shadowed the original
in sessions.json.
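The layered fix amounts to a three-step preference order, sketched below. `pickSessionKey` and `ResolveInput` are hypothetical names; `existingSessionKey` stands for whatever a lookup such as OpenClawService.resolveAgentSession would return, and the id generator is injected so the example does not depend on a particular UUID source.

```typescript
type ResolveInput = {
  // Key forwarded by the client in the POST /queue body, if any.
  requestedSessionKey?: string
  // Key of the agent's existing user-chat session, if one was found.
  existingSessionKey?: string
}

// Prefer the caller's key, then the existing session, and mint a fresh
// id only when neither is available (first message on a new agent).
function pickSessionKey(input: ResolveInput, newId: () => string): string {
  return input.requestedSessionKey ?? input.existingSessionKey ?? newId()
}

const fromClient = pickSessionKey(
  { requestedSessionKey: 'sess-client', existingSessionKey: 'sess-old' },
  () => 'fresh',
)
const fromExisting = pickSessionKey(
  { existingSessionKey: 'sess-old' },
  () => 'fresh',
)
const brandNew = pickSessionKey({}, () => 'fresh')
```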
* fix(agent,server): address PR review feedback for chat queue
- Tighten image data URL cap to base64-aware ~6.7 MB (was ~7.5 MB
through `MAX_IMAGE_BYTES * 2`).
- Forward chat history from useOutboundQueue.enqueue so queued sends
preserve conversation context like direct sends do.
- Match local attachment previews to server snapshots by id (not by
message text), and prune the preview map as items drain.
- Pass an AbortSignal into chatStream so a queue shutdown cancels the
initial OpenClaw handshake, not just the SSE drain loop.
- Track previously gitignored apps/agent/lib/attachments.ts (was caught
by global lib/ ignore) so CI typecheck can resolve @/lib/attachments.
- Update server-api openclaw route tests to the new chatStream signature
and the waitForSessionFree-based busy-agent path.
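The "~6.7 MB" figure in the first bullet follows from base64 arithmetic: every 3 raw bytes become 4 output characters, with padding on the final group. A small worked sketch, where `base64Length` is a hypothetical helper and the 5 MB constant is the image limit stated earlier:

```typescript
const IMAGE_LIMIT_BYTES = 5 * 1024 * 1024

// Length of the padded base64 text that encodes `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

const encodedCap = base64Length(IMAGE_LIMIT_BYTES)
const encodedCapMiB = (encodedCap / 1024 / 1024).toFixed(2)
console.log(encodedCap, encodedCapMiB) // prints 6990508 6.67
```

The `data:<type>;base64,` prefix adds a few dozen characters on top of this, so a cap on the full data URL needs a small allowance or has to measure only the part after the comma.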
* fix(agent): dedupe optimistic queue entries for text-only sends
The localId↔serverId map was only populated when the message had
attachments, so plain-text sends left the optimistic local entry in
place after the server snapshot arrived — the user saw the same
message rendered twice in the queue strip.
* fix(agent): prune optimistic queue entry on POST ack, not just SSE
The server broadcasts the new queue snapshot before its POST response
returns, so the SSE handler often runs first — at that point the
localId↔serverId map has no entry for the new server id yet, so the
SSE-based dedupe path can't drop the optimistic local entry. Pruning
on POST success closes the race deterministically.
* fix(agent): hand off optimistic queue entry without a render gap
Pruning the local entry on POST success only worked when the SSE
snapshot had already overwritten it; if the POST response landed
first, the optimistic row disappeared for a frame before the SSE
snapshot brought back the server-keyed row, producing a visible
flicker. Gate the POST-side prune on the SSE snapshot already
carrying the server id, and rely on the SSE-based dedupe (now
guaranteed to find the localId↔serverId link in the map) to clean
up when SSE arrives later.
* fix(agent,server): client-generated queue id eliminates render flicker
The server used to assign its own UUID when an item was enqueued, so
the optimistic client row carried a `local-` id while the SSE snapshot
carried a server UUID — the client had to wait for the POST response
to learn the mapping before it could dedupe, and during that window
both rows rendered.
Now the browser generates the id, sends it in the POST body, and the
server uses it verbatim (falling back to a fresh UUID only if the id
collides with an existing item). The client collapses to a single
id-keyed list, so the optimistic row and the SSE row reconcile on the
same key from the very first render.
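With a shared client-generated id, reconciling optimistic rows against the server snapshot reduces to a keyed merge. The sketch below is an assumption about how such a merge could look; `Row` and `reconcile` are hypothetical and not taken from the hook.

```typescript
type Row = {
  id: string
  text: string
  status: 'queued' | 'sending' | 'failed'
}

// Server rows win on id collisions. Optimistic rows the server has not
// reported yet are appended after the snapshot.
function reconcile(optimistic: Row[], snapshot: Row[]): Row[] {
  const seen = new Set(snapshot.map((row) => row.id))
  return [...snapshot, ...optimistic.filter((row) => !seen.has(row.id))]
}

const optimisticRows: Row[] = [
  { id: 'q1', text: 'hello', status: 'queued' },
  { id: 'q2', text: 'again', status: 'queued' },
]
const serverRows: Row[] = [{ id: 'q1', text: 'hello', status: 'sending' }]
const merged = reconcile(optimisticRows, serverRows)
```

Because both sides use the same key, the same message never renders twice. A real implementation would also need to drop an optimistic row once the server has acknowledged and later drained it, which this sketch does not cover.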
370 lines
10 KiB
TypeScript
/**
 * Composer attachment helpers: validation, image compression, and the
 * client-side payload shape sent to /agents/:id/chat.
 *
 * Image attachments travel as `data:` URLs (base64) so the gateway, which
 * runs on 127.0.0.1 over Lima virtiofs, can ingest them as standard
 * OpenAI-style content blocks. Non-image text-shaped files are read into
 * memory and travel as their extracted text body; the server inlines
 * them as a fenced `<attachment>` block on the user message.
 */

export const MAX_ATTACHMENTS_PER_MESSAGE = 10
export const MAX_IMAGE_BYTES = 5 * 1024 * 1024 // 5 MB after compression
export const MAX_FILE_TEXT_BYTES = 1 * 1024 * 1024 // 1 MB extracted text
export const IMAGE_LONG_EDGE_CAP = 2048

export const ALLOWED_IMAGE_MEDIA_TYPES = [
  'image/png',
  'image/jpeg',
  'image/jpg',
  'image/webp',
  'image/gif',
] as const

export const ALLOWED_FILE_MEDIA_TYPE_PREFIXES = [
  'text/',
  'application/json',
] as const

export type ServerImageAttachment = {
  kind: 'image'
  mediaType: string
  dataUrl: string
  name?: string
}

export type ServerFileAttachment = {
  kind: 'file'
  mediaType: string
  name: string
  text: string
}

export type ServerAttachmentPayload =
  | ServerImageAttachment
  | ServerFileAttachment

/** UI-side representation: what the composer needs to render a chip. */
export interface StagedAttachment {
  id: string
  kind: 'image' | 'file'
  mediaType: string
  name: string
  // Set for images so the chip thumbnail can render directly. For files
  // we don't need a preview yet, but the field exists for v2 PDF previews.
  dataUrl?: string
  // Pre-computed payload for the server. Built once at staging time so
  // re-renders don't re-encode large blobs.
  payload: ServerAttachmentPayload
}

export type AttachmentValidationError =
  | { code: 'too_many'; message: string }
  | { code: 'unsupported_type'; message: string; mediaType: string }
  | { code: 'too_large'; message: string }
  | { code: 'read_failed'; message: string }

export type StageAttachmentResult =
  | { ok: true; attachment: StagedAttachment }
  | { ok: false; error: AttachmentValidationError }

function isImageMediaType(mediaType: string): boolean {
  return (ALLOWED_IMAGE_MEDIA_TYPES as readonly string[]).includes(mediaType)
}

function isAllowedFileMediaType(mediaType: string): boolean {
  return ALLOWED_FILE_MEDIA_TYPE_PREFIXES.some((prefix) =>
    mediaType.startsWith(prefix),
  )
}

/** Build a unique id without depending on `crypto.randomUUID` outside DOM. */
function makeId(): string {
  if (typeof crypto !== 'undefined' && crypto.randomUUID) {
    return crypto.randomUUID()
  }
  return `att-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`
}

/**
 * Read a `File` and produce the staged-attachment shape: validate type,
 * compress if it's a large image, and pre-build the server payload.
 */
export async function stageAttachment(
  file: File,
): Promise<StageAttachmentResult> {
  const mediaType = file.type || 'application/octet-stream'

  if (isImageMediaType(mediaType)) {
    try {
      const compressed = await compressImageIfNeeded(file)
      const dataUrl = await readAsDataUrl(compressed)
      // Byte ceiling on the encoded form: base64 grows the payload by
      // about 4/3, so MAX_IMAGE_BYTES of image data is ~6.7 MB of text.
      // Reject early so we never POST something the route will 400.
      const base64Length = dataUrl.length - (dataUrl.indexOf(',') + 1)
      if (base64Length > 4 * Math.ceil(MAX_IMAGE_BYTES / 3)) {
        return {
          ok: false,
          error: {
            code: 'too_large',
            message: `Image "${file.name}" is too large (max ${humanBytes(
              MAX_IMAGE_BYTES,
            )}).`,
          },
        }
      }
      return {
        ok: true,
        attachment: {
          id: makeId(),
          kind: 'image',
          mediaType,
          name: file.name || 'image',
          dataUrl,
          payload: {
            kind: 'image',
            mediaType,
            dataUrl,
            name: file.name || undefined,
          },
        },
      }
    } catch (err) {
      return {
        ok: false,
        error: {
          code: 'read_failed',
          message:
            err instanceof Error
              ? err.message
              : `Failed to read image "${file.name}".`,
        },
      }
    }
  }

  if (isAllowedFileMediaType(mediaType)) {
    let text: string
    try {
      text = await file.text()
    } catch (err) {
      return {
        ok: false,
        error: {
          code: 'read_failed',
          message:
            err instanceof Error
              ? err.message
              : `Failed to read file "${file.name}".`,
        },
      }
    }
    // Compare encoded bytes rather than UTF-16 code units: `text.length`
    // undercounts any non-ASCII content.
    if (new TextEncoder().encode(text).byteLength > MAX_FILE_TEXT_BYTES) {
      return {
        ok: false,
        error: {
          code: 'too_large',
          message: `File "${file.name}" is too large (max ${humanBytes(
            MAX_FILE_TEXT_BYTES,
          )}).`,
        },
      }
    }
    return {
      ok: true,
      attachment: {
        id: makeId(),
        kind: 'file',
        mediaType,
        name: file.name || 'attachment',
        payload: {
          kind: 'file',
          mediaType,
          name: file.name || 'attachment',
          text,
        },
      },
    }
  }

  return {
    ok: false,
    error: {
      code: 'unsupported_type',
      message: `Unsupported attachment type: ${mediaType || 'unknown'}`,
      mediaType,
    },
  }
}

/**
 * Stage multiple files at once, enforcing the per-message cap. The result
 * partitions successful stages and any errors so the caller can show
 * granular toasts.
 */
export async function stageAttachments(
  files: File[],
  alreadyStaged: number,
): Promise<{
  staged: StagedAttachment[]
  errors: AttachmentValidationError[]
}> {
  const remainingSlots = Math.max(
    0,
    MAX_ATTACHMENTS_PER_MESSAGE - alreadyStaged,
  )
  const staged: StagedAttachment[] = []
  const errors: AttachmentValidationError[] = []

  if (remainingSlots === 0 && files.length > 0) {
    errors.push({
      code: 'too_many',
      message: `At most ${MAX_ATTACHMENTS_PER_MESSAGE} attachments per message.`,
    })
    return { staged, errors }
  }

  const overflow = files.length - remainingSlots
  if (overflow > 0) {
    errors.push({
      code: 'too_many',
      message: `Only the first ${remainingSlots} of ${files.length} files were attached (max ${MAX_ATTACHMENTS_PER_MESSAGE}).`,
    })
  }

  for (const file of files.slice(0, remainingSlots)) {
    const result = await stageAttachment(file)
    if (result.ok) {
      staged.push(result.attachment)
    } else {
      errors.push(result.error)
    }
  }

  return { staged, errors }
}

/**
 * Downscale oversized images to a sane long-edge cap. Files of at most
 * 1.5 MB, and larger files already within both the long-edge and byte
 * caps, pass through untouched; everything else is re-encoded as JPEG
 * at quality 0.85 regardless of the source format.
 */
export async function compressImageIfNeeded(file: File): Promise<Blob> {
  // Cheap path: small files don't need any transform.
  if (file.size <= 1.5 * 1024 * 1024) return file

  const bitmap = await blobToImageBitmap(file)
  const { width, height } = bitmap
  const longEdge = Math.max(width, height)
  if (longEdge <= IMAGE_LONG_EDGE_CAP && file.size <= MAX_IMAGE_BYTES) {
    bitmap.close?.()
    return file
  }

  const scale = Math.min(1, IMAGE_LONG_EDGE_CAP / longEdge)
  const targetWidth = Math.max(1, Math.round(width * scale))
  const targetHeight = Math.max(1, Math.round(height * scale))

  const canvas =
    typeof OffscreenCanvas !== 'undefined'
      ? new OffscreenCanvas(targetWidth, targetHeight)
      : Object.assign(document.createElement('canvas'), {
          width: targetWidth,
          height: targetHeight,
        })

  const ctx = canvas.getContext('2d') as
    | CanvasRenderingContext2D
    | OffscreenCanvasRenderingContext2D
    | null
  if (!ctx) {
    bitmap.close?.()
    return file
  }
  ctx.drawImage(bitmap, 0, 0, targetWidth, targetHeight)
  bitmap.close?.()

  const outputType = 'image/jpeg'
  // Guard the global first: HTMLCanvasElement is undefined outside a DOM
  // context (for example in a worker), where `instanceof` would throw.
  if (
    typeof HTMLCanvasElement !== 'undefined' &&
    canvas instanceof HTMLCanvasElement
  ) {
    return await new Promise<Blob>((resolve, reject) => {
      canvas.toBlob(
        (blob) => {
          if (blob) resolve(blob)
          else reject(new Error('Image compression failed.'))
        },
        outputType,
        0.85,
      )
    })
  }
  return await (canvas as OffscreenCanvas).convertToBlob({
    type: outputType,
    quality: 0.85,
  })
}

async function blobToImageBitmap(blob: Blob): Promise<ImageBitmap> {
  if (typeof createImageBitmap === 'function') {
    return createImageBitmap(blob)
  }
  // Fallback: load via an Image element and use the canvas decode path.
  // NOTE: the final step still calls createImageBitmap, so this branch
  // throws where that API is missing altogether.
  const url = URL.createObjectURL(blob)
  try {
    const img = await new Promise<HTMLImageElement>((resolve, reject) => {
      const el = new Image()
      el.onload = () => resolve(el)
      el.onerror = () =>
        reject(new Error('Failed to decode image for compression.'))
      el.src = url
    })
    const canvas = document.createElement('canvas')
    canvas.width = img.naturalWidth
    canvas.height = img.naturalHeight
    const ctx = canvas.getContext('2d')
    if (!ctx) throw new Error('Canvas 2D context unavailable.')
    ctx.drawImage(img, 0, 0)
    const blobOut = await new Promise<Blob | null>((resolve) =>
      canvas.toBlob(resolve, 'image/png'),
    )
    if (!blobOut) throw new Error('Canvas toBlob returned null.')
    return await createImageBitmap(blobOut)
  } finally {
    URL.revokeObjectURL(url)
  }
}

async function readAsDataUrl(blob: Blob): Promise<string> {
  if ('arrayBuffer' in blob && typeof FileReader === 'undefined') {
    const buffer = await blob.arrayBuffer()
    const base64 = arrayBufferToBase64(buffer)
    const type = blob.type || 'application/octet-stream'
    return `data:${type};base64,${base64}`
  }
  return await new Promise<string>((resolve, reject) => {
    const reader = new FileReader()
    reader.onload = () => resolve(reader.result as string)
    reader.onerror = () =>
      reject(reader.error ?? new Error('FileReader failed to read blob.'))
    reader.readAsDataURL(blob)
  })
}

function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer)
  let binary = ''
  const chunkSize = 0x8000
  for (let i = 0; i < bytes.byteLength; i += chunkSize) {
    binary += String.fromCharCode.apply(
      null,
      Array.from(bytes.subarray(i, Math.min(i + chunkSize, bytes.byteLength))),
    )
  }
  return btoa(binary)
}

function humanBytes(bytes: number): string {
  if (bytes >= 1024 * 1024) return `${(bytes / 1024 / 1024).toFixed(0)} MB`
  if (bytes >= 1024) return `${(bytes / 1024).toFixed(0)} KB`
  return `${bytes} B`
}
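
The resize arithmetic inside compressImageIfNeeded can be isolated as a pure function, which makes the long-edge cap easy to check without a canvas. The helper name `fitWithinLongEdge` is hypothetical and not part of the file above; it only mirrors the scale, round, and clamp steps.

```typescript
// Mirrors the long-edge arithmetic: never upscales (scale is capped at 1)
// and never returns a dimension below 1 pixel.
function fitWithinLongEdge(
  width: number,
  height: number,
  cap: number,
): { width: number; height: number } {
  const scale = Math.min(1, cap / Math.max(width, height))
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  }
}

const landscape = fitWithinLongEdge(4096, 2048, 2048)
const alreadySmall = fitWithinLongEdge(800, 600, 2048)
const portrait = fitWithinLongEdge(1000, 4000, 2048)
console.log(landscape, alreadySmall, portrait)
```

Aspect ratio is preserved up to rounding, and images already inside the cap keep their original dimensions.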