release: v0.1.5 web_fetch direct links

ilya-bov
2026-03-23 15:25:52 +03:00
parent 61a43dc45a
commit 5d3b1218ab
14 changed files with 308 additions and 10 deletions


@@ -2,6 +2,21 @@
All notable changes to this project will be documented in this file.
## [0.1.5] - 2026-03-23
### Added
- New `web_fetch` tool for direct URL reading and page extraction.
- New prompt guide `tool-web_fetch.md` for link-specific workflows.
### Changed
- `search_web` is now explicitly discovery-oriented; direct links should use `web_fetch`.
- Chat tool output UI now shows `Web Fetch` calls with the target URL.
- Request lifecycle docs updated with `web_fetch` in tool catalog.
### Fixed
- Direct link requests no longer degrade into generic search queries.
- Health endpoint version updated to `0.1.5`.
## [0.1.4] - 2026-03-23
### Added


@@ -22,8 +22,8 @@ The app runs as a Next.js service and stores runtime state on disk (`./data`).
## Releases
- Latest release snapshot: [0.1.4 - Web Search Autostart](./docs/releases/0.1.4-web-search-autostart.md)
- GitHub release body: [v0.1.4](./docs/releases/github-v0.1.4.md)
- Latest release snapshot: [0.1.5 - Web Fetch for Direct Links](./docs/releases/0.1.5-web-fetch-direct-links.md)
- GitHub release body: [v0.1.5](./docs/releases/github-v0.1.5.md)
- Release archive: [docs/releases/README.md](./docs/releases/README.md)
## Contributing and Support


@@ -0,0 +1,42 @@
# Eggent 0.1.5 - Web Fetch for Direct Links
Date: 2026-03-23
Type: Patch release snapshot
## Release Name
`Web Fetch for Direct Links`
This release adds a dedicated `web_fetch` tool so agents can open and read a specific URL directly, instead of treating links as generic search queries.
## What Is Included
### 1) New `web_fetch` Tool
- Added a dedicated tool for fetching direct `http(s)` URLs.
- Supports redirected URLs and returns readable page content.
- Extracts readable text from HTML pages and also handles JSON and plain-text responses.
### 2) URL Fetch Reliability Guards
- Added URL normalization and validation for direct-link requests.
- Added timeout and response-size limits to keep tool execution stable.
- Added content trimming for large pages to keep results manageable for the model.
### 3) Tooling and Prompt Separation
- `search_web` remains discovery-focused for broad web lookup.
- Added explicit prompt guidance so direct links use `web_fetch`.
- Added `tool-web_fetch.md` with usage rules for link-based tasks.
### 4) Chat UI and Docs Updates
- Tool output panel now renders `Web Fetch` with target URL context.
- Request lifecycle docs now include `web_fetch` in the tool catalog.
## New in 0.1.5
- Direct link reading with a first-class `web_fetch` tool.
- Cleaner separation: `search_web` for search, `web_fetch` for specific pages.
- Package/app health version bumped to `0.1.5`.
## Upgrade Notes
- No migration is required.
- Existing workflows continue to work.
- For link-specific tasks, use `web_fetch` instead of `search_web`.
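
The split described in these notes can be pictured as a naive routing rule — a hypothetical sketch for illustration, not shipped code: inputs that look like URLs go to `web_fetch`, everything else to `search_web`.

```typescript
// Hypothetical classifier illustrating the search_web / web_fetch split.
// The real agent decides via prompt guidance, not a function like this.
function pickTool(input: string): "web_fetch" | "search_web" {
  const looksLikeUrl =
    /^https?:\/\//i.test(input) ||
    /^(www\.)?[a-z0-9.-]+\.[a-z]{2,}(?:[/:?#]|$)/i.test(input);
  return looksLikeUrl ? "web_fetch" : "search_web";
}

console.log(pickTool("https://example.com/article")); // web_fetch
console.log(pickTool("latest eggent release notes")); // search_web
```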


@@ -4,6 +4,7 @@ This directory contains release summaries and publish-ready notes.
| Version | Name | Date | Notes |
| --- | --- | --- | --- |
| `0.1.5` | Web Fetch for Direct Links | 2026-03-23 | [Full snapshot](./0.1.5-web-fetch-direct-links.md), [GitHub body](./github-v0.1.5.md) |
| `0.1.4` | Web Search Autostart | 2026-03-23 | [Full snapshot](./0.1.4-web-search-autostart.md), [GitHub body](./github-v0.1.4.md) |
| `0.1.3` | OAuth Native CLI Providers | 2026-03-06 | [Full snapshot](./0.1.3-oauth-native-cli-providers.md), [GitHub body](./github-v0.1.3.md) |
| `0.1.2` | Dark Theme and Python Recovery | 2026-03-06 | [Full snapshot](./0.1.2-dark-theme-python-recovery.md), [GitHub body](./github-v0.1.2.md) |


@@ -0,0 +1,23 @@
## Eggent v0.1.5 - Web Fetch for Direct Links
Patch release focused on direct-link handling via a dedicated web fetch tool.
### Highlights
- Added new `web_fetch` tool for opening and reading specific URLs.
- Added HTML-to-text extraction, JSON/text handling, timeout, and response-size limits for stable fetch behavior.
- Kept `search_web` focused on discovery; direct links now use `web_fetch`.
- Updated chat tool output UI with `Web Fetch` label and target URL preview.
- Updated request-flow documentation and tool prompts for the new split.
- Version bump to `0.1.5` across package metadata and `GET /api/health`.
### Upgrade Notes
- No migration required.
- Existing search behavior is preserved.
- For URL-specific tasks, call `web_fetch` directly.
### Links
- Full release snapshot: `docs/releases/0.1.5-web-fetch-direct-links.md`
- Installation and update guide: `README.md`


@@ -46,6 +46,7 @@ A tool set is created depending on context and settings:
| `memory_delete` | If memory is enabled | Delete memory records |
| `knowledge_query` | Always | Search knowledge base documents |
| `search_web` | If web search is enabled | Search the internet |
| `web_fetch` | If web tools are enabled | Fetch a specific URL |
| `load_skill` | If `projectId` exists | Load full skill instructions |
| `call_subordinate` | For agents 0-2 only | Delegate to a subordinate agent |
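
The conditions in the table can be sketched as a simple selection function — an assumed illustration of how `createAgentTools` gates the tool set; names mirror the catalog, but the real factory takes full `context` and `settings` objects.

```typescript
// Illustrative sketch of conditional tool-set assembly (not the real factory).
interface ToolContext {
  memoryEnabled: boolean;
  webSearchEnabled: boolean;
  projectId?: string;
  agentNumber: number;
}

function selectToolNames(ctx: ToolContext): string[] {
  // response, code_execution, and knowledge_query are always available.
  const tools = ["response", "code_execution", "knowledge_query"];
  if (ctx.memoryEnabled) tools.push("memory_save", "memory_load", "memory_delete");
  if (ctx.webSearchEnabled) tools.push("search_web", "web_fetch");
  if (ctx.projectId) tools.push("load_skill");
  if (ctx.agentNumber <= 2) tools.push("call_subordinate");
  return tools;
}
```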
@@ -159,7 +160,7 @@ When **`load_skill`** is called, the tool reads the selected skill's full **SKIL
[agent.ts] runAgent:
1. getSettings() -> model, settings
2. getChat(chatId) -> context.history
3. createAgentTools(context, settings) -> tools (response, code_execution, memory_*, knowledge_query, search_web?, load_skill?, call_subordinate?)
3. createAgentTools(context, settings) -> tools (response, code_execution, memory_*, knowledge_query, search_web?, web_fetch?, load_skill?, call_subordinate?)
4. buildSystemPrompt(projectId, agentNumber, toolNames) ->
system.md + Agent Identity + tool-*.md per tool + Active Project + project.instructions + loadProjectSkillsMetadata -> <available_skills> + date/time
5. messages = history + { user, userMessage }

package-lock.json generated

@@ -1,12 +1,12 @@
{
"name": "design-vibe",
"version": "0.1.4",
"version": "0.1.5",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "design-vibe",
"version": "0.1.4",
"version": "0.1.5",
"dependencies": {
"@ai-sdk/anthropic": "^3.0.37",
"@ai-sdk/google": "^3.0.21",


@@ -1,6 +1,6 @@
{
"name": "design-vibe",
"version": "0.1.4",
"version": "0.1.5",
"private": true,
"scripts": {
"dev": "next dev",


@@ -2,6 +2,6 @@ export async function GET() {
return Response.json({
status: "ok",
timestamp: new Date().toISOString(),
version: "0.1.4",
version: "0.1.5",
});
}


@@ -7,6 +7,7 @@ import {
Terminal,
Brain,
Search,
Globe,
FileText,
Bot,
Puzzle,
@@ -27,6 +28,7 @@ const TOOL_ICONS: Record<string, React.ElementType> = {
memory_load: Brain,
memory_delete: Brain,
search_web: Search,
web_fetch: Globe,
knowledge_query: FileText,
call_subordinate: Bot,
load_skill: Puzzle,
@@ -51,6 +53,7 @@ const TOOL_LABELS: Record<string, string> = {
memory_load: "Memory Load",
memory_delete: "Memory Delete",
search_web: "Web Search",
web_fetch: "Web Fetch",
knowledge_query: "Knowledge Query",
call_subordinate: "Subordinate Agent",
load_skill: "Load Skill",
@@ -101,6 +104,11 @@ export function ToolOutput({ toolName, args, result }: ToolOutputProps) {
&quot;{String(args.query)}&quot;
</span>
) : null}
{toolName === "web_fetch" && args.url ? (
<span className="text-xs text-muted-foreground truncate">
{String(args.url)}
</span>
) : null}
</button>
{expanded && (


@@ -9,6 +9,69 @@ interface SearchResult {
const MAX_RESULTS = 10;
const DDG_HTML_ENDPOINT = "https://html.duckduckgo.com/html";
const DDG_INSTANT_ENDPOINT = "https://api.duckduckgo.com/";
const WEB_FETCH_TIMEOUT_MS = 20000;
const WEB_FETCH_MAX_BYTES = 1_500_000;
const WEB_FETCH_MAX_CHARS = 12000;
export async function fetchWebPage(rawUrl: string): Promise<string> {
const url = normalizeFetchUrl(rawUrl);
const abortController = new AbortController();
const timeout = setTimeout(() => abortController.abort(), WEB_FETCH_TIMEOUT_MS);
try {
const response = await fetch(url.toString(), {
method: "GET",
redirect: "follow",
signal: abortController.signal,
headers: {
Accept:
"text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,application/json;q=0.7,*/*;q=0.5",
"User-Agent":
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
},
});
if (!response.ok) {
throw new Error(`HTTP ${response.status} ${response.statusText}`);
}
const contentType = (response.headers.get("content-type") || "").toLowerCase();
const finalUrl = response.url || url.toString();
const rawBody = await readResponseBodyLimited(response, WEB_FETCH_MAX_BYTES);
const parsed = parseFetchedBody(rawBody, contentType);
const content = parsed.content.trim();
const trimmed = content.slice(0, WEB_FETCH_MAX_CHARS);
const wasTrimmed = content.length > WEB_FETCH_MAX_CHARS;
if (!trimmed) {
return `Fetched URL: ${finalUrl}\nContent-Type: ${contentType || "unknown"}\nNo readable text content found.`;
}
const lines: string[] = [
`Fetched URL: ${finalUrl}`,
`Content-Type: ${contentType || "unknown"}`,
];
if (parsed.title) {
lines.push(`Title: ${parsed.title}`);
}
lines.push("");
lines.push(trimmed);
if (wasTrimmed) {
lines.push("");
lines.push(`[truncated to ${WEB_FETCH_MAX_CHARS} chars]`);
}
return lines.join("\n");
} catch (error) {
if (error instanceof Error && error.name === "AbortError") {
return `Web fetch error: timed out after ${Math.round(WEB_FETCH_TIMEOUT_MS / 1000)} seconds`;
}
return `Web fetch error: ${error instanceof Error ? error.message : String(error)}`;
} finally {
clearTimeout(timeout);
}
}
/**
* Search the web using configured provider
@@ -168,6 +231,119 @@ function stripHtml(text: string): string {
.trim();
}
function normalizeFetchUrl(raw: string): URL {
const input = raw.trim();
if (!input) {
throw new Error("URL is required.");
}
let normalized = input;
if (!/^[a-z][a-z\d+\-.]*:\/\//i.test(normalized)) {
if (/^(www\.)/i.test(normalized) || /^[a-z0-9.-]+\.[a-z]{2,}(?:[/:?#]|$)/i.test(normalized)) {
normalized = `https://${normalized}`;
} else {
throw new Error("Invalid URL. Expected an absolute http(s) URL.");
}
}
let url: URL;
try {
url = new URL(normalized);
} catch {
throw new Error("Invalid URL format.");
}
if (url.protocol !== "http:" && url.protocol !== "https:") {
throw new Error("Only http(s) URLs are supported.");
}
return url;
}
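
For illustration, the normalization rules above behave like this condensed sketch (same regexes, simplified error handling; `normalize` is a stand-in name, not an export):

```typescript
// Condensed restatement of the normalization logic in normalizeFetchUrl.
function normalize(raw: string): string {
  let input = raw.trim();
  if (!/^[a-z][a-z\d+\-.]*:\/\//i.test(input)) {
    // Bare hostnames and www-prefixed inputs get an https:// scheme.
    if (/^(www\.)/i.test(input) || /^[a-z0-9.-]+\.[a-z]{2,}(?:[/:?#]|$)/i.test(input)) {
      input = `https://${input}`;
    } else {
      throw new Error("Invalid URL. Expected an absolute http(s) URL.");
    }
  }
  const url = new URL(input);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("Only http(s) URLs are supported.");
  }
  return url.toString();
}

console.log(normalize("www.example.com"));   // https://www.example.com/
console.log(normalize("example.com/a?b=1")); // https://example.com/a?b=1
```

Non-http schemes such as `ftp://` parse as URLs but are rejected by the protocol check.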
async function readResponseBodyLimited(response: Response, maxBytes: number): Promise<string> {
const reader = response.body?.getReader();
if (!reader) {
return await response.text();
}
const decoder = new TextDecoder();
let total = 0;
let text = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
if (!value) continue;
total += value.byteLength;
if (total > maxBytes) {
await reader.cancel();
throw new Error(`Response too large. Limit: ${maxBytes} bytes.`);
}
text += decoder.decode(value, { stream: true });
}
text += decoder.decode();
return text;
}
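
The byte cap above can be exercised without a live fetch by feeding the same loop an in-memory stream — a minimal sketch, assuming a runtime (Node 18+) where `ReadableStream`, `TextEncoder`, and `TextDecoder` are globals:

```typescript
// Same decode loop as readResponseBodyLimited, written against a bare stream.
async function readLimited(
  stream: ReadableStream<Uint8Array>,
  maxBytes: number
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let total = 0;
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (!value) continue;
    total += value.byteLength;
    if (total > maxBytes) {
      // Stop pulling bytes as soon as the cap is exceeded.
      await reader.cancel();
      throw new Error(`Response too large. Limit: ${maxBytes} bytes.`);
    }
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode();
}

// Helper to build an in-memory stream from string chunks.
function streamOf(...chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}
```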
function parseFetchedBody(
body: string,
contentType: string
): { title?: string; content: string } {
if (contentType.includes("application/json")) {
try {
const parsed = JSON.parse(body) as unknown;
return { content: JSON.stringify(parsed, null, 2) };
} catch {
return { content: body };
}
}
if (contentType.includes("text/html") || looksLikeHtml(body)) {
const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(body);
const title = titleMatch ? normalizeFetchedText(stripHtml(decodeHtmlEntities(titleMatch[1]))) : "";
return {
title: title || undefined,
content: htmlToText(body),
};
}
return { content: normalizeFetchedText(body) };
}
function looksLikeHtml(body: string): boolean {
const sample = body.slice(0, 1000).toLowerCase();
return sample.includes("<html") || sample.includes("<body") || sample.includes("<!doctype html");
}
function htmlToText(html: string): string {
const cleaned = html
.replace(/<!--[\s\S]*?-->/g, " ")
.replace(/<script[\s\S]*?<\/script>/gi, " ")
.replace(/<style[\s\S]*?<\/style>/gi, " ")
.replace(/<noscript[\s\S]*?<\/noscript>/gi, " ")
.replace(/<svg[\s\S]*?<\/svg>/gi, " ")
.replace(/<template[\s\S]*?<\/template>/gi, " ");
const withBreaks = cleaned.replace(
/<\/?(h[1-6]|p|div|section|article|header|footer|main|aside|nav|li|ul|ol|table|tr|td|th|blockquote|pre|br)[^>]*>/gi,
"\n"
);
return normalizeFetchedText(decodeHtmlEntities(stripHtml(withBreaks)));
}
function normalizeFetchedText(text: string): string {
return text
.replace(/\r/g, "")
.replace(/[ \t]+\n/g, "\n")
.replace(/\n{3,}/g, "\n\n")
.replace(/[ \t]{2,}/g, " ")
.trim();
}
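
End to end, the extraction pipeline above behaves like this simplified sketch, with stand-ins for `stripHtml` and `decodeHtmlEntities` (the real helpers live elsewhere in `search-engine.ts` and handle more entities):

```typescript
// Simplified stand-in for stripHtml: replace tags with spaces.
function stripTags(text: string): string {
  return text.replace(/<[^>]+>/g, " ");
}

// Simplified stand-in for decodeHtmlEntities: a handful of common entities.
function decodeEntities(text: string): string {
  return text
    .replace(/&amp;/g, "&")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;/g, " ");
}

// Condensed htmlToText: drop non-content tags, turn block tags into
// newlines, strip the rest, decode entities, normalize whitespace.
function toText(html: string): string {
  const cleaned = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ");
  const withBreaks = cleaned.replace(/<\/?(h[1-6]|p|div|li|br)[^>]*>/gi, "\n");
  return decodeEntities(stripTags(withBreaks))
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}

console.log(toText("<p>Hello &amp; welcome</p>")); // Hello & welcome
```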
function decodeDuckDuckGoUrl(rawUrl: string): string {
try {
const parsed = new URL(


@@ -17,7 +17,7 @@ import {
} from "@/lib/tools/code-execution";
import { memorySave, memoryLoad, memoryDelete } from "@/lib/tools/memory-tools";
import { knowledgeQuery } from "@/lib/tools/knowledge-query";
import { searchWeb } from "@/lib/tools/search-engine";
import { fetchWebPage, searchWeb } from "@/lib/tools/search-engine";
import { callSubordinate } from "@/lib/tools/call-subordinate";
import { createCronTool } from "@/lib/tools/cron-tool";
import { installPackages } from "@/lib/tools/install-orchestrator";
@@ -1271,11 +1271,11 @@ export function createAgentTools(
if (settings.search.enabled && settings.search.provider !== "none") {
tools.search_web = tool({
description:
"Search the internet for current information. Use this when you need up-to-date information, facts you're unsure about, or any web-based research.",
"Search the internet for current information. Use this for broad discovery and multiple sources. For a specific URL, use web_fetch.",
inputSchema: z.object({
query: z
.string()
.describe("The search query"),
.describe("The search query (not a direct URL)"),
limit: z
.number()
.default(5)
@@ -1287,6 +1287,21 @@ export function createAgentTools(
});
}
if (settings.search.enabled) {
tools.web_fetch = tool({
description:
"Fetch and read content from a specific web page URL. Use this when the user gives a direct link.",
inputSchema: z.object({
url: z
.string()
.describe("Absolute http(s) URL to fetch, for example https://example.com/article"),
}),
execute: async ({ url }) => {
return fetchWebPage(url);
},
});
}
const telegramRuntime = getTelegramRuntimeData(context);
if (telegramRuntime) {
tools.telegram_send_file = tool({


@@ -13,6 +13,7 @@ Search the internet for current information.
## Best Practices
- Use specific, targeted search queries
- Do not pass raw URLs here; use `web_fetch` for direct link reading
- For technical queries, include technology names and versions
- Review multiple results before drawing conclusions
- Cite sources when presenting information from search results


@@ -0,0 +1,16 @@
# Web Fetch Tool
Fetch a specific web page by URL and return readable page content.
## When to Use
- The user provides a direct link and asks to read/summarize it
- You need content from one known page, not broad discovery
- You must verify details from a specific source URL
## Best Practices
- Pass a full `http(s)` URL
- Prefer `web_fetch` for direct links, `search_web` for discovery
- If fetch fails, explain the error and ask for another link if needed
- Quote or summarize only the relevant sections in your final response