mirror of
https://github.com/browseros-ai/BrowserOS.git
synced 2026-05-20 20:39:10 +00:00
* feat(agent): attach images and text files to chat messages
Adds end-to-end support for image and text file attachments in the chat
composer, with the staged files round-tripping through the OpenClaw
gateway as OpenAI-compatible content blocks and persisting in the JSONL
so they show up in the historical view.
Server
- HTTP client: new OpenClawChatContentPart union and a buildUserContent
helper that emits multimodal content arrays when messageParts is
supplied and falls back to the legacy string content otherwise.
- Service: chatStream takes an optional messageParts array and forwards
it; BrowserOSChatHistoryItem gains an attachments field.
- JSONL reader: PiContentBlock learns the OpenAI image_url and Anthropic
image source/data shapes; user messages now emit user.attachment
events that the history mapper accumulates onto the next user item.
- Route: validates an inbound attachments[] (kind/mime/size/count),
inlines text-shaped files as <attachment> blocks in the message body,
attaches images via image_url parts. Replaces the immediate 409 on
active monitoring session with a 30s waitForSessionFree(agentId) wait
(registry now exposes onSessionEnd) so cron/hook contention does not
reject a user-chat send outright. Returns 503 if the wait times out.
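Here is a minimal sketch of the content-part round trip. It assumes OpenAI-style `{ type: 'text' }` and `{ type: 'image_url' }` parts on the way out, and the two image shapes named above on the way back in. `toMessageParts`, `toUserContent`, `storedImageToDataUrl` and the `<attachment name=… type=…>` wrapper format are hypothetical stand-ins for `buildMessagePartsFromAttachments`, `buildUserContent` and the JSONL reader. They are not the real signatures.

```typescript
// Hypothetical part union; the real OpenClawChatContentPart may differ.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

interface InboundAttachment {
  kind: 'image' | 'text'
  name: string
  mediaType: string
  // Assumption: a data: URL for images, the decoded UTF-8 body for text files.
  content: string
}

// Text files are inlined into the message body as <attachment> blocks;
// images become separate image_url parts after the text part.
function toMessageParts(message: string, attachments: InboundAttachment[]): ContentPart[] {
  const inlined = attachments
    .filter((a) => a.kind === 'text')
    .map((a) => `<attachment name="${a.name}" type="${a.mediaType}">\n${a.content}\n</attachment>`)
  const body = [message, ...inlined].filter((s) => s.length > 0).join('\n\n')
  const parts: ContentPart[] = body ? [{ type: 'text', text: body }] : []
  for (const a of attachments) {
    if (a.kind === 'image') parts.push({ type: 'image_url', image_url: { url: a.content } })
  }
  return parts
}

// Multimodal array when parts are supplied, legacy plain string otherwise.
function toUserContent(message: string, messageParts?: ContentPart[]): string | ContentPart[] {
  return messageParts && messageParts.length > 0 ? messageParts : message
}

// Reading back from the JSONL: OpenAI image_url and Anthropic base64 image
// blocks both normalize to a data URL the history view can render.
type StoredImageBlock =
  | { type: 'image_url'; image_url: { url: string } }
  | { type: 'image'; source: { type: 'base64'; media_type: string; data: string } }

function storedImageToDataUrl(block: StoredImageBlock): string {
  return block.type === 'image_url'
    ? block.image_url.url
    : `data:${block.source.media_type};base64,${block.source.data}`
}
```

`toUserContent` switches to the array form only when parts are supplied. Callers that never pass parts therefore keep sending the legacy string unchanged.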
Client
- New lib/attachments.ts: validateAttachment / compressImageIfNeeded
(canvas downscale to 2048px long edge, JPEG 0.85 re-encode for >1.5
MB inputs) / stageAttachment / stageAttachments that produces the
staged-attachment shape the composer renders and the payload the
server accepts.
- ConversationInput: drag-and-drop, paperclip button, clipboard paste,
staged attachment chip strip with thumbnails for images and a
paperclip+name chip for text files. Send button enables on either
text or attachments. Drop-zone overlay during drag.
- chatWithAgent forwards attachments[]; useAgentConversation.send
accepts a SendInput shape and renders user attachments on the
optimistic streaming turn via MessageAttachments / MessageAttachment.
- ClawChatMessage groups historical attachment parts into a single
MessageAttachments strip, ordered before reasoning/tools/text.
- claw-chat-types adds an attachment ClawChatMessagePart variant; the
history mapper emits attachment parts first and skips the text part
when the user only sent media.
- AgentCommandHome forwards the new SendInput shape — home composer
drops attachments at the boundary in v1 (the conversation page is
where staging is most useful; carrying bytes through the URL bar
is not sensible).
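The downscale rule can be written as a pure sizing function, which is easier to test than the canvas code. This sketch makes three assumptions: both steps apply only to inputs above the 1.5 MB threshold, "MB" means MiB, and the aspect ratio is preserved. `planImageCompression` is a hypothetical helper, not the real `compressImageIfNeeded`. In the browser, the plan would be applied by drawing onto a canvas of the planned size and then calling `canvas.toBlob(callback, 'image/jpeg', 0.85)`.

```typescript
const MAX_LONG_EDGE = 2048
const RECOMPRESS_THRESHOLD_BYTES = 1.5 * 1024 * 1024 // assumption: MiB
const JPEG_QUALITY = 0.85

interface CompressionPlan {
  width: number
  height: number
  reencode: boolean
  quality?: number
}

// Decide output dimensions and whether to re-encode as JPEG.
function planImageCompression(width: number, height: number, byteSize: number): CompressionPlan {
  // Small inputs pass through untouched (assumption, see lead-in).
  if (byteSize <= RECOMPRESS_THRESHOLD_BYTES) {
    return { width, height, reencode: false }
  }
  const longEdge = Math.max(width, height)
  if (longEdge <= MAX_LONG_EDGE) {
    // Already small enough in pixels: only re-encode.
    return { width, height, reencode: true, quality: JPEG_QUALITY }
  }
  // Multiply before dividing so exact ratios stay exact integers.
  return {
    width: Math.round((width * MAX_LONG_EDGE) / longEdge),
    height: Math.round((height * MAX_LONG_EDGE) / longEdge),
    reencode: true,
    quality: JPEG_QUALITY,
  }
}
```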
Limits: 10 attachments per message, 5 MB per image (post compression),
1 MB per text file, mime types png/jpeg/webp/gif and text/* +
application/json. PDFs and other binaries are deferred to v2.
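Here is a sketch of the batch check behind these limits. It assumes the server sees a declared `kind`, `mediaType` and byte `size` for each attachment, and that "MB" means MiB. `validateAttachmentBatch` and `AttachmentMeta` are hypothetical names. The real `validateChatAttachments` may report errors differently.

```typescript
// Limits from the commit message; MB read as MiB (assumption).
const MAX_ATTACHMENTS = 10
const MAX_IMAGE_BYTES = 5 * 1024 * 1024
const MAX_TEXT_BYTES = 1 * 1024 * 1024
const IMAGE_TYPES = new Set(['image/png', 'image/jpeg', 'image/webp', 'image/gif'])

interface AttachmentMeta {
  kind: 'image' | 'text'
  name: string
  mediaType: string
  size: number // bytes; for images, measured after client-side compression
}

const isTextType = (mime: string): boolean =>
  mime.startsWith('text/') || mime === 'application/json'

// Returns the first problem found, or null when the whole batch is acceptable.
function validateAttachmentBatch(list: AttachmentMeta[]): string | null {
  if (list.length > MAX_ATTACHMENTS) {
    return `at most ${MAX_ATTACHMENTS} attachments per message`
  }
  for (const a of list) {
    if (a.kind === 'image') {
      if (!IMAGE_TYPES.has(a.mediaType)) return `${a.name}: unsupported image type ${a.mediaType}`
      if (a.size > MAX_IMAGE_BYTES) return `${a.name}: image exceeds 5 MB`
    } else {
      if (!isTextType(a.mediaType)) return `${a.name}: unsupported text type ${a.mediaType}`
      if (a.size > MAX_TEXT_BYTES) return `${a.name}: text file exceeds 1 MB`
    }
  }
  return null
}
```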
* feat(agent): outbound message queue for chats while agent is mid-turn
Lets users keep typing and submitting messages while the agent is still
streaming a previous turn. Each submission is appended to a single-flight
queue and dispatched as soon as `streaming` flips false; the queued
state renders as a strip above the composer so the user sees what's
pending vs. what's already sending.
- New `useOutboundQueue` hook owns the queue, the worker effect, and
cancel/retry actions. Single-flight by design — a re-entrancy ref
guard prevents two simultaneous dispatches when `streaming` flickers.
- Composer (`ConversationInput`) accepts optional `outboundQueue`,
`onCancelQueued`, `onRetryQueued` props. When the queue is provided
the send-button gate stops blocking on `streaming`; the spinner stays
as the visual cue that the agent is still busy. Legacy direct-send
callers keep the old streaming-blocks-send semantic.
- Renders an OutboundQueueStrip above the staged-attachment strip with
per-item status (queued / sending / failed), a cancel button on
queued items, and retry + discard on failed items.
- AgentCommandConversation wires `onSend` to `queue.enqueue` and routes
the home composer's `?q=` initial-message handoff through the queue
too, so it inherits the same single-flight serialization.
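The single-flight worker can be modelled without React. In this plain class, the `inFlight` flag plays the part of the re-entrancy ref, and the `streaming` flag mirrors the conversation state. `SingleFlightQueue` and its methods are hypothetical. The real `useOutboundQueue` hook drives the same transitions from an effect.

```typescript
type QueueStatus = 'queued' | 'sending' | 'failed'

interface QueueItem {
  id: string
  text: string
  status: QueueStatus
}

class SingleFlightQueue {
  private items: QueueItem[] = []
  private inFlight = false // stand-in for the re-entrancy ref
  private streaming = false

  constructor(private readonly send: (item: QueueItem) => Promise<void>) {}

  enqueue(id: string, text: string): void {
    this.items.push({ id, text, status: 'queued' })
    this.pump()
  }

  setStreaming(streaming: boolean): void {
    this.streaming = streaming
    this.pump()
  }

  // Only items that have not started sending can be cancelled.
  cancel(id: string): void {
    this.items = this.items.filter((i) => !(i.id === id && i.status === 'queued'))
  }

  retry(id: string): void {
    const item = this.items.find((i) => i.id === id && i.status === 'failed')
    if (item) item.status = 'queued'
    this.pump()
  }

  snapshot(): QueueItem[] {
    return this.items.map((i) => ({ ...i }))
  }

  // Dispatch at most one item, and only while the agent is not streaming.
  private pump(): void {
    if (this.streaming || this.inFlight) return
    const next = this.items.find((i) => i.status === 'queued')
    if (!next) return
    this.inFlight = true
    next.status = 'sending'
    this.send(next)
      .then(
        () => {
          this.items = this.items.filter((i) => i !== next)
        },
        () => {
          next.status = 'failed'
        },
      )
      .then(() => {
        this.inFlight = false
        this.pump()
      })
  }
}
```

The guard matters because `streaming` can toggle off and on again before the first send settles. Without `inFlight`, the second transition to `false` would start a second dispatch.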
The server-side `waitForSessionFree` (added with attachments) and this
client-side queue together cover both contention sources: cron / hook
turns and back-to-back user sends. Persistence across reloads is
intentionally out of scope for v1 — losing the queue on extension
reload is documented as a known limitation.
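Here is a sketch of the server-side wait. It assumes the registry reports active sessions per agent, and that `onSessionEnd` takes a listener and returns an unsubscribe function; that signature is an assumption. `SessionRegistry`, `start`, `end` and `listenerCount` are hypothetical scaffolding that lets the sketch run on its own.

```typescript
type Listener = () => void

// Hypothetical registry: tracks agents with an active session and notifies
// listeners when that session ends.
class SessionRegistry {
  private readonly active = new Set<string>()
  private readonly listeners = new Map<string, Set<Listener>>()

  start(agentId: string): void {
    this.active.add(agentId)
  }

  end(agentId: string): void {
    this.active.delete(agentId)
    const subs = this.listeners.get(agentId)
    // Copy first: listeners unsubscribe themselves while we iterate.
    if (subs) Array.from(subs).forEach((listener) => listener())
  }

  isActive(agentId: string): boolean {
    return this.active.has(agentId)
  }

  onSessionEnd(agentId: string, listener: Listener): () => void {
    const subs = this.listeners.get(agentId) ?? new Set<Listener>()
    subs.add(listener)
    this.listeners.set(agentId, subs)
    return () => {
      subs.delete(listener)
      // Only drop the map entry if it still points at this set.
      if (subs.size === 0 && this.listeners.get(agentId) === subs) {
        this.listeners.delete(agentId)
      }
    }
  }

  listenerCount(agentId: string): number {
    return this.listeners.get(agentId)?.size ?? 0
  }
}

// Resolves true once the agent's session ends, or false after timeoutMs.
function waitForSessionFree(
  registry: SessionRegistry,
  agentId: string,
  timeoutMs = 30_000,
): Promise<boolean> {
  if (!registry.isActive(agentId)) return Promise.resolve(true)
  return new Promise<boolean>((resolve) => {
    let unsubscribe: () => void = () => {}
    const timer = setTimeout(() => {
      unsubscribe()
      resolve(false)
    }, timeoutMs)
    unsubscribe = registry.onSessionEnd(agentId, () => {
      clearTimeout(timer)
      unsubscribe()
      resolve(true)
    })
  })
}
```

The route would map a `false` result to the 503 response described earlier. The timer is cleared on success, so a freed session does not leave a pending 30 s timeout behind.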
* feat(server): server-side outbound message queue
Replaces the client-only React-state queue from 123ef21d with a
proper server-owned queue. Closing the tab is now safe — the server
holds queued messages and dispatches them through the existing
chatStream path the moment the agent's ClawSession status flips to
idle.
Server
- New OutboundQueueService (apps/server/src/api/services/queue) — per
agent FIFO, in-memory. Subscribes to ClawSession.onStateChange
through OpenClawService.onAgentStatusChange, and dispatches via
OpenClawService.chatStream so attachments / history / monitoring
all behave identically to the existing /chat route. The worker
drains the SSE response server-side so the gateway run finalizes
cleanly even with no client connected.
- Four new routes under /claw/agents/:id/queue:
    POST   /queue                 enqueue
    DELETE /queue/:itemId         cancel a queued item
    POST   /queue/:itemId/retry   re-queue a failed item
    GET    /queue/stream          SSE feed of the per-agent queue state
Validation reuses validateChatAttachments and
buildMessagePartsFromAttachments from the existing chat route.
- Singleton wired in apps/server/src/main.ts; shutdown on SIGTERM.
- New OpenClawService.getAgentState getter for the queue worker's
pre-dispatch sanity check.
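Here is a sketch of per-agent FIFO dispatch triggered by status changes, using hypothetical names (`PerAgentOutboundQueue`, `tryDispatch`). `getAgentState` stands in for the pre-dispatch sanity check. `dispatch` stands in for a call that runs `chatStream` and drains its SSE body to completion. Both are injected so the sketch runs without OpenClaw. Failure handling is reduced to a placeholder.

```typescript
type AgentStatus = 'idle' | 'busy'

interface QueuedMessage {
  id: string
  agentId: string
  message: string
}

class PerAgentOutboundQueue {
  private readonly queues = new Map<string, QueuedMessage[]>()
  private readonly dispatching = new Set<string>()

  constructor(
    private readonly getAgentState: (agentId: string) => AgentStatus,
    private readonly dispatch: (item: QueuedMessage) => Promise<void>,
  ) {}

  enqueue(item: QueuedMessage): void {
    const q = this.queues.get(item.agentId) ?? []
    q.push(item)
    this.queues.set(item.agentId, q)
    this.tryDispatch(item.agentId)
  }

  // Wire this to the agent status-change subscription.
  onAgentStatusChange(agentId: string, status: AgentStatus): void {
    if (status === 'idle') this.tryDispatch(agentId)
  }

  pendingIds(agentId: string): string[] {
    return (this.queues.get(agentId) ?? []).map((m) => m.id)
  }

  private tryDispatch(agentId: string): void {
    if (this.dispatching.has(agentId)) return
    // Pre-dispatch sanity check: a status event may be stale.
    if (this.getAgentState(agentId) !== 'idle') return
    const head = this.queues.get(agentId)?.shift()
    if (!head) return
    this.dispatching.add(agentId)
    this.dispatch(head)
      .catch(() => {
        // A fuller version would mark the item failed so it can be retried.
      })
      .then(() => {
        this.dispatching.delete(agentId)
        this.tryDispatch(agentId)
      })
  }
}
```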
Client
- useOutboundQueue rewritten as an SSE-backed projection over server
state. Public API unchanged so the composer still works.
- enqueue POSTs to /queue and shows an optimistic local entry until
the server's SSE snapshot reflects it; local-only entries get a
`local-` id prefix so cancel can short-circuit them without
hitting the server.
- AgentCommandConversation watches the queue for sending items
dropping out and refetches history so the new assistant turn shows
up in the conversation view (the server worker streams the
dispatched turn into OpenClaw without exposing per-turn SSE to
the client).
Out of scope (documented in the plan as v2 follow-ups): disk
persistence (server restart loses queue), per-turn live streaming
of queued sends in the conversation view, and switching the
underlying dispatch from /v1/chat/completions to the chat.send RPC
(which would also fix the multimodal attachment routing problem).
* fix(server): outbound queue must reuse existing session, not spawn UUIDs
The queue worker was generating a fresh randomUUID() as the sessionKey
when the queued item didn't carry one — and the client wasn't sending
one. Result: every queued message kicked off a brand-new OpenClaw
session, orphaning the user's active conversation behind the new
"most recent" entry in sessions.json. The history endpoint then
resolved to the orphan and the chat appeared to disappear.
Fix is layered:
- Client (useOutboundQueue): forward the current resolvedSessionKey
in the POST /queue body so every queued message targets the same
conversation the user is viewing. AgentCommandConversation passes
resolvedSessionKey into the hook.
- Server (OutboundQueueService): the worker now resolves to the
agent's existing user-chat session when no sessionKey is provided
on the queued item, via OpenClawService.resolveAgentSession. UUID
fallback is now reserved for the first-ever message on a brand
new agent — same semantic the existing /chat route has implicitly
through the catalog of historical sessions.
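The layered lookup reduces to a precedence chain. This sketch assumes the catalog can be read as a list of entries tagged by origin, and that "existing user-chat session" means the most recently updated one. `SessionEntry`, `findUserChatSession` and `resolveQueuedSessionKey` are hypothetical stand-ins for `OpenClawService.resolveAgentSession`. The key minter is injected so that the UUID fallback is observable.

```typescript
// Hypothetical catalog entry; the real sessions.json schema may differ.
interface SessionEntry {
  key: string
  origin: 'user-chat' | 'cron' | 'hook'
  updatedAt: number
}

// Assumption: the "existing" user-chat session is the most recently updated one.
function findUserChatSession(sessions: SessionEntry[]): string | undefined {
  let best: SessionEntry | undefined
  for (const s of sessions) {
    if (s.origin !== 'user-chat') continue
    if (!best || s.updatedAt > best.updatedAt) best = s
  }
  return best?.key
}

// Precedence: the key sent by the client, then the agent's existing chat,
// and only then a freshly minted key (first message on a brand-new agent).
function resolveQueuedSessionKey(
  requested: string | undefined,
  sessions: SessionEntry[],
  mint: () => string,
): string {
  return requested ?? findUserChatSession(sessions) ?? mint()
}
```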
No JSONL data was lost by the original bug (the prior conversations
are intact on disk); the orphan sessions just shadowed the original
in sessions.json.
* fix(agent,server): address PR review feedback for chat queue
- Tighten image data URL cap to base64-aware ~6.7 MB (was ~7.5 MB
through `MAX_IMAGE_BYTES * 2`).
- Forward chat history from useOutboundQueue.enqueue so queued sends
preserve conversation context like direct sends do.
- Match local attachment previews to server snapshots by id (not by
message text), and prune the preview map as items drain.
- Pass an AbortSignal into chatStream so a queue shutdown cancels the
initial OpenClaw handshake, not just the SSE drain loop.
- Track previously gitignored apps/agent/lib/attachments.ts (was caught
by global lib/ ignore) so CI typecheck can resolve @/lib/attachments.
- Update server-api openclaw route tests to the new chatStream signature
and the waitForSessionFree-based busy-agent path.
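The base64-aware cap follows from base64 emitting 4 characters for every 3 input bytes. Assuming the 5 MB image limit is 5 MiB (5,242,880 bytes), the payload is 4 × ⌈5,242,880 / 3⌉ = 6,990,508 characters, about 6.67 MiB, plus the `data:<type>;base64,` prefix. The helper names below are hypothetical.

```typescript
// Assumption: 5 MB means 5 MiB; the real constant may differ.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024

// Base64 maps each 3-byte group to 4 characters, padding the final group.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3)
}

// Longest data URL accepted for an image of the given media type.
function maxImageDataUrlLength(mediaType: string): number {
  return `data:${mediaType};base64,`.length + base64Length(MAX_IMAGE_BYTES)
}

function isImageDataUrlWithinCap(dataUrl: string): boolean {
  const match = /^data:([^;,]+);base64,/.exec(dataUrl)
  if (!match) return false
  const mediaType = match[1] ?? ''
  return dataUrl.length <= maxImageDataUrlLength(mediaType)
}
```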
* fix(agent): dedupe optimistic queue entries for text-only sends
The localId↔serverId map was only populated when the message had
attachments, so plain-text sends left the optimistic local entry in
place after the server snapshot arrived — the user saw the same
message rendered twice in the queue strip.
* fix(agent): prune optimistic queue entry on POST ack, not just SSE
The server broadcasts the new queue snapshot before its POST response
returns, so the SSE handler often runs first — at that point the
localId↔serverId map has no entry for the new server id yet, so the
SSE-based dedupe path can't drop the optimistic local entry. Pruning
on POST success closes the race deterministically.
* fix(agent): hand off optimistic queue entry without a render gap
Pruning the local entry on POST success only worked when the SSE
snapshot had already overwritten it; if the POST response landed
first, the optimistic row disappeared for a frame before the SSE
snapshot brought back the server-keyed row, producing a visible
flicker. Gate the POST-side prune on the SSE snapshot already
carrying the server id, and rely on the SSE-based dedupe (now
guaranteed to find the localId↔serverId link in the map) to clean
up when SSE arrives later.
* fix(agent,server): client-generated queue id eliminates render flicker
The server used to assign its own UUID when an item was enqueued, so
the optimistic client row carried a `local-` id while the SSE snapshot
carried a server UUID — the client had to wait for the POST response
to learn the mapping before it could dedupe, and during that window
both rows rendered.
Now the browser generates the id, sends it in the POST body, and the
server uses it verbatim (falling back to a fresh UUID only if the id
collides with an existing item). The client collapses to a single
id-keyed list, so the optimistic row and the SSE row reconcile on the
same key from the very first render.
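Reconciling on a shared id can be sketched as two small functions, one for each side. `acceptClientId` and `reconcileQueue` are hypothetical names. The sketch assumes the SSE snapshot is authoritative, and that optimistic rows are kept until a snapshot carries their id.

```typescript
interface QueueRow {
  id: string
  text: string
  status: 'queued' | 'sending' | 'failed'
}

// Server side: keep the client's id unless it collides with an existing item.
function acceptClientId(
  requestedId: string | undefined,
  existingIds: ReadonlySet<string>,
  mint: () => string,
): string {
  return requestedId && !existingIds.has(requestedId) ? requestedId : mint()
}

// Client side: snapshot rows win; optimistic rows whose id the snapshot does
// not carry yet are appended, so a row never vanishes between click and SSE.
function reconcileQueue(
  snapshot: QueueRow[],
  optimistic: QueueRow[],
): { rows: QueueRow[]; stillOptimistic: QueueRow[] } {
  const serverIds = new Set(snapshot.map((r) => r.id))
  const stillOptimistic = optimistic.filter((r) => !serverIds.has(r.id))
  return { rows: [...snapshot, ...stillOptimistic], stillOptimistic }
}
```

Two edge cases remain in this simplified form. First, if the server has to substitute a fresh id on collision, the client needs the POST response to relink the row. Second, an item that drains before any snapshot mentions it would leave its optimistic row behind unless something else, such as the POST ack, prunes it.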
203 lines
6.4 KiB
TypeScript
import { Bot, CheckCircle2, Loader2, Wrench, XCircle } from 'lucide-react'
import { type FC, useMemo } from 'react'
import {
  Message,
  MessageAttachment,
  MessageAttachments,
  MessageContent,
  MessageResponse,
} from '@/components/ai-elements/message'
import {
  Reasoning,
  ReasoningContent,
  ReasoningTrigger,
} from '@/components/ai-elements/reasoning'
import {
  Task,
  TaskContent,
  TaskItem,
  TaskTrigger,
} from '@/components/ai-elements/task'
import type {
  AgentConversationTurn,
  ToolEntry,
} from '@/lib/agent-conversations/types'

interface ConversationMessageProps {
  turn: AgentConversationTurn
  streaming: boolean
}

interface RenderEntry {
  kind: 'thinking' | 'text' | 'task'
  partIndex: number
  text?: string
  done?: boolean
  tools?: ToolEntry[]
}

/**
 * Build the render plan for an assistant turn:
 * - thinking and text parts render in place
 * - all tool-batch parts collapse into a single Task entry at their first
 *   appearance position, with tools listed in arrival order
 */
function buildRenderEntries(turn: AgentConversationTurn): RenderEntry[] {
  const entries: RenderEntry[] = []
  const aggregatedTools: ToolEntry[] = []
  let taskInserted = false

  turn.parts.forEach((part, partIndex) => {
    if (part.kind === 'thinking') {
      entries.push({
        kind: 'thinking',
        partIndex,
        text: part.text,
        done: part.done,
      })
    } else if (part.kind === 'text') {
      entries.push({ kind: 'text', partIndex, text: part.text })
    } else if (part.kind === 'tool-batch') {
      aggregatedTools.push(...part.tools)
      if (!taskInserted) {
        entries.push({
          kind: 'task',
          partIndex,
          // Shared by reference: later tool batches append into this entry.
          tools: aggregatedTools,
        })
        taskInserted = true
      }
    }
  })

  return entries
}

function ToolStatusIcon({ status }: { status: ToolEntry['status'] }) {
  if (status === 'running') {
    return (
      <Loader2 className="size-3.5 shrink-0 animate-spin text-muted-foreground" />
    )
  }
  if (status === 'completed') {
    return <CheckCircle2 className="size-3.5 shrink-0 text-green-500" />
  }
  return <XCircle className="size-3.5 shrink-0 text-destructive" />
}

export const ConversationMessage: FC<ConversationMessageProps> = ({
  turn,
  streaming,
}) => {
  const entries = useMemo(() => buildRenderEntries(turn), [turn])

  return (
    <div className="space-y-3">
      <Message from="user">
        <MessageContent>
          {turn.userAttachments && turn.userAttachments.length > 0 && (
            <MessageAttachments>
              {turn.userAttachments.map((attachment) => (
                <MessageAttachment
                  key={attachment.id}
                  data={{
                    type: 'file',
                    url: attachment.dataUrl ?? '',
                    mediaType: attachment.mediaType,
                    filename: attachment.name,
                  }}
                />
              ))}
            </MessageAttachments>
          )}
          {turn.userText && (
            <pre className="whitespace-pre-wrap font-sans text-sm">
              {turn.userText}
            </pre>
          )}
        </MessageContent>
      </Message>

      {entries.length > 0 && (
        <Message from="assistant">
          <MessageContent>
            {entries.map((entry) => {
              const key = `${turn.id}-entry-${entry.partIndex}`

              if (entry.kind === 'thinking') {
                return (
                  <Reasoning
                    key={key}
                    className="w-full"
                    isStreaming={!entry.done}
                    defaultOpen={!entry.done}
                  >
                    <ReasoningTrigger />
                    <ReasoningContent>{entry.text ?? ''}</ReasoningContent>
                  </Reasoning>
                )
              }

              if (entry.kind === 'text') {
                return (
                  <MessageResponse key={key}>
                    {entry.text ?? ''}
                  </MessageResponse>
                )
              }

              const tools = entry.tools ?? []
              const allDone = tools.every((t) => t.status !== 'running')
              const taskTitle = allDone
                ? `Agent activity (${tools.length} ${tools.length === 1 ? 'action' : 'actions'})`
                : `Working… (${tools.length} ${tools.length === 1 ? 'action' : 'actions'})`

              return (
                <Task key={key} defaultOpen={!turn.done}>
                  <TaskTrigger title={taskTitle} TriggerIcon={Wrench} />
                  <TaskContent>
                    {tools.map((tool) => (
                      <TaskItem
                        key={tool.id}
                        className="flex items-center gap-2"
                      >
                        <ToolStatusIcon status={tool.status} />
                        <span className="text-foreground text-xs">
                          {tool.label}
                        </span>
                        {tool.subject ? (
                          <span className="ml-1.5 truncate text-muted-foreground/70 text-xs">
                            · {tool.subject}
                          </span>
                        ) : null}
                        {tool.durationMs != null && (
                          <span className="ml-auto text-muted-foreground/60 text-xs tabular-nums">
                            {(tool.durationMs / 1000).toFixed(1)}s
                          </span>
                        )}
                      </TaskItem>
                    ))}
                  </TaskContent>
                </Task>
              )
            })}
          </MessageContent>
        </Message>
      )}

      {!turn.done && turn.parts.length === 0 && streaming && (
        <div className="flex gap-2">
          <div className="flex size-7 shrink-0 items-center justify-center rounded-full bg-[var(--accent-orange)] text-white">
            <Bot className="size-3.5" />
          </div>
          <div className="flex items-center gap-1 rounded-xl rounded-tl-none border border-border/50 bg-card px-3 py-2.5 shadow-sm">
            <span className="size-1.5 animate-bounce rounded-full bg-[var(--accent-orange)] [animation-delay:-0.3s]" />
            <span className="size-1.5 animate-bounce rounded-full bg-[var(--accent-orange)] [animation-delay:-0.15s]" />
            <span className="size-1.5 animate-bounce rounded-full bg-[var(--accent-orange)]" />
          </div>
        </div>
      )}
    </div>
  )
}
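The collapse rule in `buildRenderEntries` depends on the Task entry holding a reference to the shared `aggregatedTools` array. Because of that, tool batches that arrive after the first one still appear in the same Task. The sketch below replays that rule on hypothetical, trimmed-down part types (`Part`, `Tool`, `Entry`, `planEntries`), not the real ones from `@/lib/agent-conversations/types`. The `'failed'` tool status is also an assumption.

```typescript
// Hypothetical minimal stand-ins for the real turn part types.
interface Tool {
  id: string
  status: 'running' | 'completed' | 'failed'
}

type Part =
  | { kind: 'thinking'; text: string; done: boolean }
  | { kind: 'text'; text: string }
  | { kind: 'tool-batch'; tools: Tool[] }

interface Entry {
  kind: 'thinking' | 'text' | 'task'
  partIndex: number
  tools?: Tool[]
}

// Same aggregation rule as buildRenderEntries, trimmed to ordering fields.
function planEntries(parts: Part[]): Entry[] {
  const entries: Entry[] = []
  const aggregated: Tool[] = []
  let taskInserted = false
  parts.forEach((part, partIndex) => {
    if (part.kind === 'tool-batch') {
      aggregated.push(...part.tools)
      if (!taskInserted) {
        // The entry keeps a reference, so later pushes show up here too.
        entries.push({ kind: 'task', partIndex, tools: aggregated })
        taskInserted = true
      }
    } else {
      entries.push({ kind: part.kind, partIndex })
    }
  })
  return entries
}

const entries = planEntries([
  { kind: 'thinking', text: 'plan', done: true },
  { kind: 'tool-batch', tools: [{ id: 't1', status: 'completed' }] },
  { kind: 'text', text: 'interim' },
  { kind: 'tool-batch', tools: [{ id: 't2', status: 'running' }] },
])
console.log(entries.map((e) => e.kind).join(',')) // thinking,task,text
console.log(entries[1]?.tools?.map((t) => t.id).join(',')) // t1,t2
```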