## Summary
- Add null checks for `config.setting` in `get_chat_model()` and
`aget_chat_model()` to prevent `AttributeError` when memories are
disabled
- When the memory toggle creates a `UserConversationConfig` via
`get_or_create` with `setting=None`, accessing
`config.setting.price_tier` crashes — now falls through to the default
chat model instead
## Root Cause
The "Enable Memories" toggle PATCH endpoint uses `get_or_create` on
`UserConversationConfig`, which can create a config with `setting=None`.
Both `get_chat_model()` and `aget_chat_model()` then crash:
- For subscribed users: `if config:` passes but `return config.setting`
returns `None`, causing downstream crashes
- For non-subscribed users: `config.setting.price_tier` raises
`AttributeError` on `None`
## Fix
Change `if config:` → `if config and config.setting:` (subscribed path)
and add `and config.setting` guard before `.price_tier` access
(non-subscribed path), in both sync and async variants.
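A minimal sketch of the guarded lookup, assuming the structure described above (the default-model fallthrough helper is illustrative):
```python
def get_chat_model(user):
    config = UserConversationConfig.objects.filter(user=user).first()
    # Guard against configs created by get_or_create with setting=None
    if config and config.setting:
        return config.setting
    return get_default_chat_model()  # hypothetical default-model helper
```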
## Test plan
- [ ] Toggle memories off with no prior chat model configured — settings
page should still load
- [ ] Chat responses should use default model when setting is None
- [ ] Existing users with configured chat models should be unaffected
Fixes #1287
Signed-off-by: majiayu000 <1835304752@qq.com>
Starlette 1.0.0 removed the deprecated TemplateResponse signature
where `name` was the first positional arg and `request` was passed
inside `context`. The new signature requires `request` as the first
positional argument: TemplateResponse(request, name=...).
This caused a 500 error in production on web client endpoints with:
"Jinja2Templates.TemplateResponse() missing 1 required positional
argument: 'name'" (with older Starlette) or "'request'" (with 1.0.0).
Update all TemplateResponse calls in web_client.py to use the new
Starlette 1.0.0 signature: pass `request` as the first positional
arg and `name` as an explicit keyword argument.
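For illustration, the call site change looks roughly like this (template name illustrative):
```python
# Old (deprecated, removed in Starlette 1.0.0): name first, request in context
return templates.TemplateResponse("chat.html", {"request": request})

# New: request as the first positional arg, name as a keyword argument
return templates.TemplateResponse(request, name="chat.html")
```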
The issue didn't trigger locally as uv is used locally and pip in docker
builds. These resolve dependencies, including the starlette version to
install, differently. Locally starlette 0.52.0 was installed while
production used starlette 1.0.0. This is what caused the issue and the
mismatch in expectation.
Add banner to home, chat, shared chat and settings pages for coverage.
Link to the settings account section to export data and mention the Khoj
self-host option in the banner.
- Add missing skipif decorator to test_create_automation
- Change skip condition from 'is None' to 'not' (falsy check) to
also handle empty string, which happens when GitHub secrets are
unavailable in fork PRs
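A sketch of the updated decorator; the exact secret/env var name used by the test suite is an assumption here:
```python
import os

import pytest


# Falsy check also skips when the secret resolves to an empty string,
# as happens when GitHub secrets are unavailable in fork PRs
@pytest.mark.skipif(not os.getenv("OPENAI_API_KEY"), reason="requires API key")
def test_create_automation():
    ...
```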
Changes (4 files):
- pyproject.toml: authlib 1.6.6 → 1.6.9
- src/interface/web/package.json: dompurify ^3.2.6 → ^3.3.2, eslint-config-next 14.2.3 → 14.2.35
- documentation/package.json: @docusaurus/* → ^3.9.2, added serialize-javascript resolution
And regenerated lock files.
The only resolution override is serialize-javascript in documentation,
which is unavoidable since Docusaurus still pins old
copy-webpack-plugin and css-minimizer-webpack-plugin that depend on
serialize-javascript ^6.x.
## Summary
`src/khoj/processor/content/org_mode/orgnode.py:57` opens a file with
`open(filename, "r")` but never closes it. The file handle leaks for the
lifetime of the returned `Orgnode` list.
## Fix
Replaced bare `open()` with a `with` statement to ensure the file is
closed after `makelist()` finishes reading.
```python
# Before
def makelist_with_filepath(filename):
    f = open(filename, "r")
    return makelist(f, filename)

# After
def makelist_with_filepath(filename):
    with open(filename, "r") as f:
        return makelist(f, filename)
```
This is safe because `makelist()` fully consumes the file during the
call (building the Orgnode list from file contents), so the file handle
is no longer needed after it returns.
When PyMuPDFLoader fails to process an invalid PDF file, the exception
is caught but pdf_entry_by_pages is referenced before assignment,
causing an UnboundLocalError.
Initialized pdf_entry_by_pages to an empty list before the try block so
the return statement always has a valid value, even when an exception
occurs.
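A minimal sketch of the fix; the surrounding function shape is illustrative:
```python
import logging

from langchain_community.document_loaders import PyMuPDFLoader

logger = logging.getLogger(__name__)


def pdf_to_entries(pdf_file: str) -> list[str]:
    pdf_entry_by_pages: list[str] = []  # initialized before try, so the return is always valid
    try:
        pdf_entry_by_pages = [page.page_content for page in PyMuPDFLoader(pdf_file).load()]
    except Exception as e:
        logger.warning(f"Unable to process PDF {pdf_file}: {e}")
    return pdf_entry_by_pages
```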
Verified with both invalid input (returns []) and valid PDFs (returns
extracted text).
Fixes #1289
Co-authored-by: BillionClaw <267901332+BillionClaw@users.noreply.github.com>
## Problem
When `ChatModel.friendly_name` is `None`, the `__str__` method returns
`None`, causing:
```
TypeError: __str__ returned non-string (type NoneType)
```
## Solution
Fall back to `name` field when `friendly_name` is `None`.
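A minimal sketch of the fallback:
```python
def __str__(self):
    # Prefer friendly_name; fall back to name so __str__ always returns a string
    return self.friendly_name or self.name
```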
Related issue: #1251
Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>
## Summary
In `extract_from_webpage()`, the `content` parameter is unconditionally
overwritten to `None` on the line before the `is_none_or_empty(content)`
check. This means any pre-fetched content (e.g. text content already
retrieved by the Exa search engine) is always discarded, forcing an
unnecessary re-scrape of the webpage.
## Bug
```python
async def extract_from_webpage(
    url: str,
    subqueries: set[str] = None,
    content: str = None,  # <-- caller passes pre-fetched content
    ...
) -> Tuple[set[str], str, Union[None, str]]:
    content = None  # <-- BUG: immediately overwrites it
    if is_none_or_empty(content):  # always True
        content = await scrape_webpage_with_fallback(url)
```
## Fix
Remove the `content = None` assignment so the passed-in content is used
when available, falling back to scraping only when needed.
This bug was introduced in a refactor and causes:
- Wasted API calls to web scrapers for pages whose content is already
available
- Increased latency for search results that include inline content (e.g.
Exa)
Signed-off-by: JiangNan <1394485448@qq.com>
## Summary
Fix a Python operator precedence bug in the `research()` function that
causes `current_iteration` to be set to a boolean instead of the actual
count of previous iterations.
## Bug
```python
if current_iteration := len(previous_iterations) > 0:
```
Python evaluates this as:
```python
if current_iteration := (len(previous_iterations) > 0): # assigns True or False
```
So `current_iteration` becomes `True` (1) or `False` (0) regardless of
how many previous iterations exist.
## Fix
```python
if (current_iteration := len(previous_iterations)) > 0:
```
With parentheses, `current_iteration` is correctly set to the count
(e.g. 4), and then compared to 0.
## Impact
When resuming research with previous iterations, the loop counter was
effectively reset to 1 instead of the true count. This allowed the
research loop to run significantly more iterations than `MAX_ITERATIONS`
intended, wasting compute and API calls.
Signed-off-by: JiangNan <1394485448@qq.com>
Remove redundant SDK version check in LauncherActivity since both
branches set the same orientation value. This simplifies the code
without changing behavior.
Signed-off-by: Olexandr88 <radole1203@gmail.com>
## Summary
- Fixes AttributeError: 'str' object has no attribute 'iter_content' in
text_to_speech endpoint
- When `ELEVEN_LABS_API_KEY` is not configured, the function was
returning a string instead of a Response object
## Changes
- Introduced `TextToSpeechError` exception class in `text_to_speech.py`
- Changed `generate_text_to_speech` to raise exception instead of
returning error string
- Updated API endpoint to catch the exception and return HTTP 501 (Not
Implemented)
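A rough sketch of the new control flow; the router, config lookup, and endpoint shape are illustrative:
```python
from fastapi import APIRouter, Response

api = APIRouter()
ELEVEN_LABS_API_KEY = None  # stand-in for the real config lookup


class TextToSpeechError(Exception):
    """Raised when text to speech is not configured or fails."""


def generate_text_to_speech(text: str) -> bytes:
    if not ELEVEN_LABS_API_KEY:
        raise TextToSpeechError("Eleven Labs API key not configured")
    ...  # call Eleven Labs and return audio bytes


@api.post("/speech")
async def text_to_speech(text: str):
    try:
        return Response(content=generate_text_to_speech(text), media_type="audio/mpeg")
    except TextToSpeechError as e:
        # 501 Not Implemented when no TTS provider is configured
        return Response(content=str(e), status_code=501)
```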
## Test plan
- [x] Code passes ruff lint check
- [ ] Manual testing with and without Eleven Labs API key configured
Fixes #1049
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Debanjum <debanjum@gmail.com>
Add a "Copy References" button to the references pane in the web app.
In ReferencePanel Component
- Add a "Copy References" button to the `ReferencePanel` component.
- Implement functionality to copy all references (notes, online, and
code) as a markdown bullet list.
- Update the `TeaserReferencesSection` component to include the "Copy
References" button.
- Show a copied-to-clipboard indicator when references are copied on button click
Closes #1021
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
- When you type in the search modal and the input matches the pattern
`file:`, you should see a list of all files in vault and non-vault
- This list is filtered down as you type more letters
### Technical Details
- Added file filter mode (`isFileFilterMode` state) to filter search
results by specific files
- Updated `getSuggestions()` function to search files from vault and
non-vault via the khoj backend.
- Updated the selection behavior to handle both file selection and
search result selection
Closes https://github.com/khoj-ai/khoj/issues/1025
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
### **feat(obsidian): Enhance Sync Experience with Progress Bars and Bug
Fixes**
This pull request significantly improves the content synchronization
experience for Obsidian users by fixing a critical bug and introducing
new UI elements for better feedback and monitoring.
The previous implementation could fail with `403 Forbidden` errors when
syncing a large number of files due to server-side rate limiting. This
update addresses that issue and provides users with clear, real-time
feedback on storage usage and sync progress.
---
### Key Changes
* **Improve Sync Robustness**
Refactor `updateContentIndex` to sync files prioritized by file type
(md > pdf > image) and batched by size (10 MB) and item count (50 items) limits.
This respects server rate limits and ensures that large vaults can be
indexed reliably without triggering `403` errors.
* **Show Cloud Storage Usage Bar**
A progress bar has been added to the settings page to display cloud
storage usage.
* **Total Limit**: The storage limit (**10 MB** for free, **500 MB** for
premium) is now reliably determined by the `is_active` flag returned
from the `/api/v1/user` endpoint, eliminating fragile client-side
heuristics.
* **Used Space**: The used space is calculated via a **client-side
estimation** of all files configured for synchronization. This provides
a clear and immediate indicator of the vault's storage footprint.
* **Show Real-time Sync Progress Bar**
When a manual sync is triggered via the "Force Sync" button, a progress
bar now appears, providing real-time feedback on the operation.
* It displays the number of files processed against the total number of
files to be indexed or deleted.
* This is implemented using a **callback mechanism** (`onProgress`) to
cleanly communicate progress from the sync logic (`utils.ts`) to the UI
(`settings.ts`) without coupling them.
* **Auto-refresh Storage Used After Sync**
The Cloud Storage Usage bar is now automatically refreshed upon the
completion of a "Force Sync". This ensures the user immediately sees the
updated storage estimation without needing to reopen the settings panel.
---
### Visuals
<img width="980" height="237" alt="image"
src="https://github.com/user-attachments/assets/2b3ce420-766b-476f-9fc0-c6b38c0226fb"
/>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
# Motivation
A major component of useful AI systems is adaptation to the user
context. This is a major reason why we'd enabled syncing knowledge
bases. The next steps in this direction is to dynamically update the
evolving state of the user as conversations take place across time and
topics. This allows for more personalized conversations and to maintain
context across conversations.
# Overview
This change introduces medium and long term memories in Khoj.
- The scope of a conversation can be thought of as short term memory.
- Medium term memory extends to the past week.
- Long term memory extends to anytime in the past, where a search query
results in a match.
# Details
- Enable user to view and manage agent generated memories from their
settings page
- Fully integrate the memory object into all downstream usage, from
image generation, notes extraction, online search, etc.
- Scope memory per agent. The default agent has access to memories
created by other agents as well.
- Enable users and admins to enable/disable Khoj's memory system
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Fix
- Ensure researcher and coder know to save files to /home/user dir
- Make E2B code executor check for generated files in /home/user
- Do not re-add file types already downloaded from /home/user
Issues
- E2B has a mismatch in default home_dir for run_code & list_dir cmds.
So run_code was run with /root as home dir. And list_dir("~") was
checking under /home/user. This caused files written to /home/user
by code not to be discovered by the list_files step.
- Previously the researcher did not know that generated files should
be written to /home/user. So it could tell the coder to save files to
a different directory. Now the researcher knows where to save files to
show them to user as well.
- Add excludeFolders field to KhojSetting interface
- Rename 'Sync Folders' to 'Include Folders' for clarity
- Add 'Exclude Folders' UI section with folder picker
- Filter out excluded folders during content sync
- Show file counts when syncing (X of Y files)
- Prevent excluding root folder
This allows users to exclude specific directories (e.g., Inbox,
Highlights) from being indexed, while the existing Include Folders acts
as a whitelist.
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
This change had been removed in 9a8c707 to avoid overwrites. We now
use random filenames for generated files to avoid overwrites from
subsequent runs.
Encourage the model to write code that saves files in the home folder
so they are captured with logical filenames.
Add khoj app landing page to khoj monorepo. Show it in a more natural
place: when non logged in users open the khoj app home page.
Authenticated users still see the logged in home page experience.
Delete old login html page. Login via popup on home is the single,
unified login experience.
Have docs mention khoj home url, no need to mention /login as login
popup shows on home page too
Why
--
- The models are now smart enough to usually understand which tools to
call in parallel and when.
- The LLM can request more work for each call to it, which is usually
the slowest step. This speeds up work by the research agent, even though
each tool is still executed in sequence (for now).
Old thought messages are dropped by default by the Anthropic API. This
change ensures old thoughts are kept. This should improve cache
utilization to reduce costs. And keeping old thoughts may also improve
model intelligence.
Khoj doesn't handle parallel tool calling right now. Models were told
to call tools in serial but it wasn't enforced via the Anthropic API.
So if the model did try to make a parallel tool call, the next response
would fail as it expects a tool result for the other tool calls. But
khoj just returned the first tool call's results. This mostly affected
haiku due to its lower fine-grained instruction following capabilities.
This change enforces serial tool calls at the API layer to avoid this
issue altogether for claude models.
Logical error due to the else conditional not being correctly indented.
This would result in an error when using gemini 3 pro image when images
are in an S3 bucket.
Overview
---
This change enables specifying fallback chat models for each task
type (fast, deep, default) and user type (free, paid).
Previously we did not fallback to other chat models if the chat model
assigned for a task failed.
Details
---
You can now specify multiple ServerChatSettings via the Admin Panel
with their usage priority. If the highest priority chat model for the
task, user type fails, the task is assigned to a lower priority chat
model configured for the current user and task type.
This change also reduces the retry attempts for openai chat actor
models from 3 to 2 as:
- multiple fallback server chat settings can now be created. So
reducing retries with the same model reduces latency.
- 2 attempts is in line with the retry attempts for other model
types (gemini, anthropic)
What
--
- Default to using fast model for most chat actors. Specifically in this
change we default to using fast model for doc, web search chat actors
- Only research chat director uses the deep chat model.
- Make using fast model by chat actors configurable via func argument
Code chat actor continues to use deep chat model and webpage reader
continues to use fast chat model.
Deep, fast chat models can be configured via ServerChatSettings on the
admin panel.
Why
--
Modern models are good enough at instruction following. So defaulting
most chat actors to use the fast model should improve chat speed with
acceptable response quality.
The option to fallback to research mode for higher quality
responses or deeper research always exists.
Avoids rendering flicker from attempt to render invalid image paths
referenced in message by khoj on web app.
The rendering flicker made it very annoying to interact with
conversations containing such messages on the web app.
The current change does lightweight validation of image url before
attempting to render it. If invalid image url detected, the image is
replaced with just its alt text.
- Use qwen style <think> tags to extract Minimax M2 model thoughts
- Use function to mark models that use in-stream thinking (including
Kimi K2 thinking)
- Server admin can add MCP servers via the admin panel
- Enabled MCP server tools are exposed to the research agent for use
- Use MCP library to standardize interactions with mcp servers
- Support SSE or Stdio as transport to interact with mcp servers
- Reuse session established to MCP servers across research iterations
Google and Firecrawl do not provide good web search descriptions (within
given latency requirements). Exa does better than them.
So prioritize using Exa over Google or Firecrawl when multiple web
search providers available.
Support using Exa for webpage reading. It seems much faster than
currently available providers.
Remove Jina as a webpage reader and remaining references to Jina from
code, docs. It was anyway slow and API may shut down soon (as it was
bought by Elastic).
Update docs to mention Exa for web search and webpage reading.
Issue
---
When agent personality/instructions are safe, we do not require the
safety agent to give a reason. The safety check agent was told this in
the prompt but it was not reflected in the json schema being used.
The latest openai library started throwing an error if the response
doesn't match the requested json schema.
This broke creating/updating agents when using openai models as safety
agent.
Fix
---
Make reason field optional.
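A minimal sketch of the schema change, assuming a Pydantic response model (names illustrative):
```python
from typing import Optional

from pydantic import BaseModel


class SafetyCheckResponse(BaseModel):
    safe: bool
    reason: Optional[str] = None  # only required when the persona is unsafe
```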
Also put send_message_to_model_wrapper in try/catch for more readable
error stacktrace.
Previously we only showed unsafe prompt errors to the user when
creating/updating an agent. Errors like name collisions were not shown
in the web app UX.
This change ensures that such validation errors are bubbled up to the
user in the UX. So they can resolve the agent create/update error on
their end.
Count cached tokens, reasoning tokens for better cost estimates for
models served over an openai compatible api. Previously we didn't
include cached or reasoning tokens in costing.
There are faster, better web search, webpage read providers. Only keep
reasonable quality online context providers.
Jina was good for self-hosting quickstart as it provided a free api
key without login. It does not provide that now. Its latencies are
pretty high vs other online context providers.
The Groq API has stopped supporting the minimum and maximum items
fields in tool schemas. This unexpectedly broke using AI models served
via the Groq API, like Kimi K2 and GPT-OSS, in research mode.
Improve typing of relevant fields
Previously eval runs across modes would use different dataset shuffles.
This change enables a strict apples-to-apples perf comparison of the
different khoj modes across the same (random) subset of questions by
using a dataset seed per workflow run to sample questions.
Instead of implicitly defaulting to assuming it is available as:
- For pip install searxng has to be explicitly set up to work
- For docker install we explicitly do set it up and set the
KHOJ_SEARXNG_URL env var already
Also check if the Searxng URL is unset before disabling web search
tools, now that explicit enablement is required.
Using prompt cache key enables sticky routing to openai servers.
This increases the probability of a chat actor hitting the same server
and reusing cached prompts.
We use a stable hash of the first N characters to uniquely identify a
chat actor prompt.
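A minimal sketch of deriving the key; the hash choice and character budget are illustrative:
```python
import hashlib


def prompt_cache_key(chat_actor_prompt: str, n: int = 1024) -> str:
    # Stable hash of the first n characters uniquely identifies a chat actor
    # prompt, so repeated calls route to the same server and reuse its cache.
    return hashlib.md5(chat_actor_prompt[:n].encode()).hexdigest()
```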
Webpage read is gated behind having a web search engine configured for
now. It can later be decoupled from web search and depend on whether
any web scraper is configured.
New truncation logic returns a new message list.
It does not update message list by reference/in place since 8a16f5a2a.
So truncation tests should run verification on the truncated chat
history returned by the truncation func instead of the original chat
history passed into the truncation func.
- It does not support strict mode for json schema, tool use
- It likes text content to be plain string, not nested in a dictionary
- Verified to work with gpt oss models on cerebras
Responses API is starting to get supported by other ai apis as well.
This change does preparatory improvements to ease moving to use
responses api with other ai apis.
Use the new, better named `supports_responses_api' method.
The method currently just maps to `is_openai_api'. It will cover other
ai apis once support for using the responses api with them is added.
- Fix identifying gpt-oss as openai reasoning model
- Drop unsupported stop param for openai reasoning models
- Drop the Formatting re-enabled logic for openai reasoning-only models
We use the responses api for openai models, and the latest openai
models are hybrid models; they don't seem to need this convoluted
system message to format responses as markdown
is_automated_task check isn't required as automation cannot be created
via chat anymore.
conversation specific file_filters are extracted directly in document
search, so they don't need to be passed down from the chat api endpoint
The context building logic was nearly identical across all model
types.
This change extracts that logic into a shared function and calls it
once in the `agenerate_chat_response', the entrypoint to the converse
methods for all 3 model types.
Main differences handled are
- Gemini system prompt had additional verbosity instructions. Keep it
- Pass system message via the chatml messages list to anthropic, gemini
models as well (like openai models) instead of passing it as a
separate arg to chat_completion_* funcs.
The model specific message formatters for both already extract the
system instruction from the messages list. So system messages will be
automatically extracted within the chat_completion_* funcs and passed
as the separate arg required by the anthropic, gemini api libraries.
Overview
Enable improving speed and cost of chat by setting fast, deep think
models for intermediate steps and non user facing operations.
Details
- Allow decoupling default chat models from models used for
intermediate steps by setting server chat settings on admin panel
- Use deep think models for most intermediate steps like tool
selection, subquery construction etc. in default and research mode
- Use fast think models for webpage read, chat title setting etc.
Faster webpage read should improve conversation latency
What
Explicit selection of notes tool/conversation command by agent is
required now.
Why
- Newer models are good at deciding when to look up notes
- Modern khoj is less of a notes-only chat, so searching notes by default makes less sense
generated_files wasn't being set (anymore?). But it was being passed
around for chat context and being saved to the db.
Also reduce variables used to set mermaid diagram description
- Process chat history in default order instead of processing it in
reverse. Improves legibility of context construction for a minor
performance hit from dropping messages at the front of the list.
- Handle multiple system messages by collating them into list
- Remove logic to drop system role for gemma-2, o1 models. Better to
make code more readable than support old models.
Use seed to stabilize image change consistency across turns when
- KHOJ_LLM_SEED env var is set
- Using Image models via Replicate
OpenAI, Google do not support image seed
Inferred queries are stored with an underscore in the db but aliased
with a hyphen in memory.
This conversation.messages logic was broken, so the inferred queries
field of chat message history was getting ignored.
This change fixes that issue and improves the previous image generation
description for better context in subsequent image generation attempts.
Overview
- Khoj references files it used in its response as markdown links.
For example [1](file://path/to/file.txt#line=121)
- Previously these file links were just shown as raw text
- This change renders khoj's inline file references as proper links
and shows a file content preview (around the specified line if
deeplinked) on hover or click in the web app
Details
- Render inline file references as links in chat message on web app.
Previously references like [1](file://path/to/file.txt#line=120)
would be shown as plain text. Now they are rendered as links
- Preview file content of referenced files on click or hover.
If reference uses a deeplink with line number, the file content
around that line is shown on hover, click. Click allows viewing file
preview on mobile, unlike hover. Hover is easier with mouse.
Fixes
- Fix to allow khoj to delete content in obsidian write mode
- Do not throw error when no edit blocks in write mode on obsidian
- Limit retries to fix invalid edit blocks in obsidian write mode
Improvements
- Only show 3 recent files as context in obsidian file read, write mode
- Persist open file access mode setting across restarts in obsidian
- Make khoj obsidian keyboard shortcuts toggle voice chat, chat history
- Do not show <SYSTEM> instructions in chat session title on obsidian
Closes #1209
In obsidian we have a hacky system instruction being passed in read,
write file access modes. This shouldn't be shown in chat sessions list
during view or edit. It is an internal implementation detail.
Previously hitting the voice chat keybinding would just start voice
chat, not end it, and just open chat history, not close it.
This is unintuitive and different from the equivalent button click
behaviors.
The fix toggles voice chat on/off and shows/hides chat history when
hitting the Ctrl+Alt+V, Ctrl+Alt+O keybindings in the khoj obsidian
chat view
Better support for GPT OSS
- Tune reasoning effort, temp, top_p for gpt-oss models
- Extract thoughts of openai style models like gpt-oss from api response
Tool use improvements
- Improve view file, code tool prompts. Format other research tool prompts
- Truncate long words in code tool stdout, stderr for context efficiency
- Use instruction instead of query as code tool argument
- Simplify view file tool. Limit viewing up to 50 lines at a time
- Make regex search tool results look more like grep results
- Update khoj personality prompts with better style, capability guide
Web UX improvements
- Wrap long words in train of thought shown on web app
- Do not overwrite charts created in previous code tool use during research
- Update web UX when server side error or hit stop + no task running
Fix AI API Usage
- Use subscriber type specific context window to generate response
- Fix max thinking budget for gemini models to generate final response
- Fix passing temp kwarg to non-streaming openai completion endpoint
- Handle unset reasoning, response chunk from openai api while streaming
- Fix using non-reasoning openai model via responses API
- Fix to calculate usage from openai api streaming completion
- Add more color to personality and communication style
- Split prompt into capabilities and style sections
- Remove directives in personality meant for older, less smart models.
- Discourage model from unnecessarily sharing code snippets in final
response unless explicitly requested.
- Ack websocket interrupt even when no task running
Otherwise chat UX isn't updated to indicate query has stopped
processing for this edge case
- Mark chat request as not being processed on server side error
It is already being passed in model_kwargs, so not required to be
passed explicitly as well.
This code path isn't being used currently, but better to fix for
if/when it is used
- Set the agent of the current conversation in the agent dropdown when a new conversation with a non-default agent is initialized. This was unset previously.
- Pass the current selected agent in the dropdown when creating new chat
- Correctly select the `khoj-header-agent-select' element
- A regression had stopped indicating to user that the websocket
connection had broken. Now the interrupt has some visual indication.
- Websocket disconnects from client didn't trigger the partial
research to be saved. Now we use an interrupt signal to save partial
research before closing task.
Although we had handling in place for retrying with gemini's suggested
backoff on hitting rate limits, the actual rate limit exception was
getting caught to render a friendly message, so the retry wasn't
actually getting triggered.
This change allows both
- Retry on hitting 429 rate limit exceptions
- Return friendly message if rate limit triggered retry eventually fails
Related:
- Changes to retry with gemini suggested backoff time in 0f953f9
Issue: the chosen_io variable was accessed before initialization when a
ValueError was raised.
Fix: Set chosen_io to fallback values on failure to select default
chat tools
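A minimal sketch of the fix; the selector name and fallback values are illustrative:
```python
try:
    chosen_io = select_default_chat_tools(query, user)  # illustrative selector
except ValueError:
    # Fall back so chosen_io is always bound, even when tool selection fails
    chosen_io = {"input_tools": [], "output_modes": []}
```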
Make researcher handle ambiguous requests better by working with
reasonable assumptions (clearly told to user in response) instead of
burdening the user with clarification requests.
Fix portions of the researcher prompt that had gone stale since moving
to tool use and making researcher more task (vs q&a) oriented
Previously the researcher was passing the whole code to execute in its
queries to the tool AI instead of asking it to write the code and
limiting its query to a natural language request (with required data).
The division of responsibility should help researcher just worry about
constructing a request with all the required details instead of also
worrying about writing correct code.
Their tool call response may not strictly follow the expected response
format. Let the researcher handle incorrect arguments to the code
tool (i.e. arguments that trigger a type error)
What
- Get reasoning of openai reasoning models from responses api to show
- Improves cache hits and reasoning reuse for iterative agents like
research mode.
This should improve speed, quality, cost and transparency of using
openai reasoning models.
More cache hits and better reasoning as reasoning blocks are included
while the model is researching (reasoning interspersed with tool calls)
when using the responses api.
Previously line start, end anchors would only match if the whole file
started or ended with the regex pattern, rather than matching by line.
Fix it to work like a standard grep tool and match by line start, end.
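A minimal sketch of line-wise matching, as a standard grep would do it:
```python
import re


def grep_lines(pattern: str, text: str) -> list[tuple[int, str]]:
    # Match each line independently so ^ and $ anchor to line start/end,
    # like grep, instead of to the start/end of the whole file.
    regex = re.compile(pattern)
    return [(i + 1, line) for i, line in enumerate(text.splitlines()) if regex.search(line)]
```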
Reduce usage of boolean operators like "hello OR bye OR see you", which
don't work and reduce search quality. Models were trying to stuff the
search query with multiple different queries.
## Overview
Speed up app install and development using a faster, modern development
toolchain
## Details
### Major
- Use [uv](https://docs.astral.sh/uv/) for faster server install (vs
pip)
- Use [bun](https://bun.sh/) for faster web app install (vs yarn)
- Use [ruff](https://docs.astral.sh/ruff/) for faster formatting of
server code (vs black, isort)
- Fix devcontainer builds. See if uv and bun can speed up server and
client installs
### Minor
- Format web app with prettier and server with ruff. This is most of the
file changes in this PR.
- Simplify copying web app built files in pypi workflow to make it less
flaky.
- CI runners don't have GPUs
- Pytorch related Nvidia cuda packages are not required for testing,
evals or pre-commit checks.
- Avoiding these massive downloads should speed up workflow run.
### Overview
Make server leaner to increase development speed.
Remove old indexing code and the native offline chat which was hard to
maintain.
- The native offline chat module was written when the local ai model api
ecosystem wasn't mature. Now it is. Reuse that.
- Offline chat requires GPU for usable speeds. Decoupling offline chat
from Khoj server is the recommended way to go for practical inference
speeds (e.g Ollama on machine, Khoj in docker etc.)
### Details
- Drop old code to index files on server filesystem. Clean cli, init
paths.
- Drop native offline chat support with llama-cpp-python.
Use established local ai APIs like Llama.cpp Server, Ollama, vLLM etc.
- Drop old pre 1.0 khoj config migration scripts
- Update test setup to index test data after old indexing code removed.
- Delete tests testing deprecated server side indexing flows
- Delete `Local(Plaintext|Org|Markdown|Pdf)Config' methods, files and
references in tests
- Index test data via new helper method, `get_index_files'
- It is modelled after the old `get_org_files' variants in main app
- It passes the test data in required format to `configure_content'
Allows maintaining the more realistic tests from before while
using new indexing mechanism (rather than the deprecated server
side indexing mechanism)
This stale code was originally used to index files on server file
system directly by server. We currently push files to sync via API.
Server side syncing of remote content like Github and Notion is still
supported. But old, unused code for server side sync of files on
server fs is being cleaned out.
The new --log-file cli arg allows specifying where the khoj server
should store logs on the fs. This replaces the --config-file cli arg,
which was only being used as a proxy for deciding where to store the
log file.
- TODO
- Tests are broken. They were relying on the server side content
syncing for test setup
It is recommended to chat with open-source models by running an
open-source server like Ollama, Llama.cpp on your GPU powered machine
or use a commercial provider of open-source models like DeepInfra or
OpenRouter.
These chat model serving options provide a mature Openai compatible
API that already works with Khoj.
Directly using offline chat models only worked reasonably with pip
install on a machine with GPU. Docker setup of khoj had trouble with
accessing GPU. And without GPU access offline chat is too slow.
Deprecating support for an offline chat provider directly from within
Khoj will reduce code complexity and increase development velocity.
Offline models are subsumed to use existing Openai ai model provider.
Clarify that the tool AI will perform a maximum of X sub-queries for
each query passed to it by the manager AI.
Avoids the manager AI trying to directly pass a list of queries
to the search tool AI. It should pass just a single query.
Send larger thought chunks to improve streaming efficiency and
reduce rendering load on web client.
This rendering load was most evident when using high throughput
models or low compute clients.
The server side message buffering should result in fewer re-renders,
faster streaming and lower compute load on client.
Related commit to buffer message content in fc99f8b37
- Ask both manager and code gen AI to not run or write
unsafe code for some safety improvement (over code exec in sandbox).
- Disallow custom agent prompts instructing unsafe code gen
## PR Summary
This PR resolves the deprecation warnings of the Pydantic library, which
you can find in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/16528997676/job/46749452047#step:9:142):
```python
PydanticDeprecatedSince20: The `copy` method is deprecated; use `model_copy` instead. See the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
```
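The migration is typically a one-line rename, e.g.:
```python
from pydantic import BaseModel


class Settings(BaseModel):
    temperature: float = 0.7


settings = Settings()
# Before (deprecated in Pydantic v2): settings.copy(update={"temperature": 0.2})
updated = settings.model_copy(update={"temperature": 0.2})
```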
Save to conversation in normal flow should only be done if
interrupt wasn't triggered.
Saving conversations on interrupt is handled completely by the
disconnect monitor since the improvements to interrupt.
This abort is handled correctly for steps before the final response.
But not if the interrupt occurs while the final response is being sent.
This change checks for cancellation after the final response send
attempt and avoids a duplicate chat turn save.
- Extract llm thoughts from more openai compatible ai api providers
like llama.cpp server, vllm and litellm.
- Try structured thought extraction by default
- Try in-stream thought extraction for specific model families like
qwen and deepseek.
- Show thoughts with tool use. For intermediate steps like research
mode from openai compatible models
Some consensus on thought in model response is being reached with
using deepseek style thoughts in structured response (via
"reasoning_content" field) or qwen style thoughts in main
response (i.e <think></think> tags).
Default to try deepseek style structured thought extraction. So the
previous default stream processor isn't required.
A previous regression resulted in the start llm response event being
sent with every (non-thought) message chunk. It should only be sent
once after thoughts and before first normal message chunk is streamed.
Regression probably introduced with changes to stream thoughts.
This should fix the chat streaming latency logs.
Send larger message chunks to improve streaming efficiency and
reduce rendering load on web client.
This rendering load was most evident when using high throughput
models, low compute clients and message with images. As message
content was rerendered on every token sent to the web app.
The server side message buffering should result in fewer re-renders
and lower compute load on client.
Fixes calling websocket rate limiter from async chat_ws method.
Not sure why the issue did not trigger in local setups. Maybe has to
do with gunicorn vs uvicorn / multi-workers setup in prod vs local.
- Add a websocket api endpoint for chat. Reuse most of the existing chat
logic.
- Communicate from web app using the websocket chat api endpoint.
- Pass interrupt messages using websocket to guide research, operator
trajectory
Previously we were using the abort and send new POST /api/chat
mechanism.
This didn't scale well to multi-worker setups as a different worker
could pick up the new interrupt message request.
Using websocket to send messages in the middle of long running tasks
should work more naturally.
- Chat history is retrieved and updated with new messages just before
write. This is to reduce the chance of message loss due to conflicting
writes, where the last to save to the conversation wins the conflict.
- This was a problematic artifact of old code. Removing it should reduce
conflict surface area.
- Interrupts and live chat could hit this issue due to different reasons
- Use websocket library to handle setup, reconnection from web app
Use react-use-websocket library to handle websocket connection and
reconnection logic. Previously connection wasn't re-established on
disconnects.
- Send interrupt messages with ws to update research, operator trajectory
Previously we were using the abort and send new POST /api/chat
mechanism.
But now we can use the websocket's bi-directional messaging capability
to send users messages in the middle of a research, operator run.
This change should
1. Allow for a faster, more interactive interruption to shift the
research direction without breaking the conversation flow. As
previously we were using the DB to communicate interrupts across
workers, this would take time and feel sluggish on the UX.
2. Be a more robust interrupt mechanism that'll work in multi worker
setups. As same worker is interacted with to send interrupt messages
instead of potentially new worker receiving the POST /api/chat with
the interrupt user message.
On the server we're using an asyncio Queue to pass messages down from
websocket api to researcher via event generator. This can be extended
to pass to other iterative agents like operator.
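A minimal sketch of the queue handoff; names are illustrative:
```python
import asyncio

interrupt_queue: asyncio.Queue[str] = asyncio.Queue()


async def on_ws_message(message: str):
    # Websocket handler pushes user interrupt messages as they arrive
    await interrupt_queue.put(message)


def pop_interrupt() -> str | None:
    # Event generator drains pending interrupts between research iterations
    try:
        return interrupt_queue.get_nowait()
    except asyncio.QueueEmpty:
        return None
```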
Fix using research tool names instead of slash command tool names
(exposed to user) in research mode conversation history construction.
Map agent input tools to relevant research tools. Previously,
using agents with a limited set of tools in research mode reduced the
tools available to the agent in research mode.
Fix checks to skip tools if not configured.
The chat model friendly name field was introduced in a8c47a70f. But
we weren't setting the friendly name for ollama models, which get
automatically loaded on first run.
This broke setting chat model options, server admin settings and
creating new chat pages (at least) as they display the chat model's
friendly name.
This change ensures the friendly name for auto loaded chat models is
set to resolve these issues. We also add a null ref check to web app
model selector as an additional safeguard to prevent new chat page
crash due to missing friendly name going forward.
Resolves #1208
We'd reversed the formatting of openai messages to drop invalid
messages without affecting the other messages being appended. But we
need to reverse the final formatted list to return it in the right order.
Previously
- message with invalid content were getting dropped in normal order
which would change the item index being iterated for gemini and
anthropic models
- messages with empty content weren't getting dropped for openai
compatible api models. While openai api is resilient to this, it's
better to drop these invalid messages as other openai compatible
APIs may not handle this.
We see messages with empty or no content when chat gets interrupted
due to disconnections, interrupt messages or explicit aborts by user.
This change should now drop invalid messages and not mess up the
formatting of the other messages in a conversation. It should allow continuing
interrupted conversations with any ai model.
Inspired by my previous turnstyle ux explorations.
But basically the user message becomes the section title and the khoj
message becomes the section body, with the timestamp being used as a
section title/body divider.
Previous organic results enumerator only handled the scenario where
organic key wasn't present in online search results.
It did not handle the case where there were no organic online search
results.
- Methods calling send_message_to_model_wrapper_sync hadn't been
updated to handle the function returning the new ResponseWithThought
- Store and load request.url to and from the DB as a string to avoid
serialization issues
For files not synced after the previous release, context uri is unset.
This results in failure to save chat messages that retrieve documents
as the uri field cannot be unset so pre save validation fails.
We'd use a db migration to handle this but this is a quick mitigation
for now.
It'll work similar to the master branch but with pre-1x and latest-1x
tagged series of docker images.
This should ease deployment changes from 1.x vs 2.x series
## Overview
Show deep link URI and raw document context to provide deeper, richer
context to Khoj. This should allow it to better combine semantic search
with other new document retrieval tools, like the line range based file
viewer and regex tools added in #1205
## Details
- Attach line number based deeplinks to each indexed document entry
Document URI follows URL fragment based schema of form
`file:///path/to/file.txt#line=123`
- Show raw indexed document entries with deep links to LLM when it uses
the semantic search tool
- Reduce structural changes to raw org-mode entries for easier deep
linking.
Use url fragment schema for deep link URIs, borrowing from URL/PDF
schemas. E.g file:///path/to/file.txt#line=<line_no>&#page=<page_no>
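A minimal sketch of composing such a deeplink; the function name is illustrative:
```python
def entry_uri(path: str, line_no: int) -> str:
    # Deep link follows the URL fragment schema, e.g. file:///path/to/file.txt#line=123
    return f"file://{path}#line={line_no}"
```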
Compute line number during (recursive) markdown entry chunking.
Test line number in URI maps to line number of chunk in actual md file.
This deeplink URI with line number is passed to llm as context to
better combine with line range based view file tool.
Grep tool already passed matching line number. This change passes
line number in URIs of markdown entries matched by the semantic search
tool.
Use url fragment schema for deep link URIs, borrowing from URL/PDF
schemas. E.g file:///path/to/file.txt#line=<line_no>&#page=<page_no>
Compute line number during (recursive) org-mode entry chunking.
Thoroughly test line number in URI maps to line number of chunk in
actual org mode file.
This deeplink URI with line number is passed to llm as context to
better combine with line range based view file tool.
Grep tool already passed matching line number. This change passes
line number in URIs of org entries matched by the semantic search tool
Only embedding models see and operate on compiled text.
LLMs should see raw entry to improve combining it with other document
traversal tools for better regex and line matching.
Users see raw entry for better matching with their actual notes.
Reducing structural changes to the raw entry allows better deep-linking
and re-annotation. Currently done via line number in the new uri field.
Only add properties drawer to raw entry if entry has properties
Previously line and source properties were inserted into raw entries.
This isn't done anymore. Line, source are deprecated for use in khoj.el.
## Why
Move to function calling paradigm to give models tool call -> tool
result in formats they're fine-tuned to understand. Previously we were
giving them results in our specific format (as function calling paradigm
wasn't well-established yet).
And improve prompt cache hits by caching tool definitions.
This is a **breaking change**. AI Models and APIs that do not support
function calling will not work with Khoj in research mode. Function
calling is supported by:
- Standard commercial AI Models and APIs like Anthropic, Gemini, OpenAI,
OpenRouter
- Standard open-source AI APIs like llama.cpp server, Ollama
- Standard open source models like Qwen, DeepSeek, Gemma, Llama, Mistral
## What
### Use Function Calling for Tool Use
- Add Function Calling support to Anthropic, Gemini, OpenAI AI Model
APIs
- Move Existing Research Mode Tools to Use Function Calling
### Get More Comprehensive Results from your Knowledge Base (KB)
- Give Research Agent better Document Retrieval Tools
- Add grep files tool to enable researcher to find documents by regex
- Add list files tool to enable researcher to find documents by path
- Add file viewer tool to enable researcher to read documents
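As an illustration, a grep files tool exposed via function calling might be defined like this (names and fields are assumptions, not the exact schema used):
```python
grep_files_tool = {
    "type": "function",
    "function": {
        "name": "grep_files",
        "description": "Find lines in indexed documents matching a regex pattern",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex to search for"},
                "path": {"type": "string", "description": "Only search under this path"},
            },
            "required": ["pattern"],
        },
    },
}
```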
### Miscellaneous
- Improve Research Prompt, Truncation, Retry and Caching
- Show reasoning model thoughts in Khoj train of thought for
intermediate steps as well
- Cache last anthropic message. Given research mode now uses function
calling paradigm and not the old research mode structure.
- Cache tool definitions passed to anthropic models
- Stop dropping the first message if it is by the assistant, as the
Anthropic API doesn't seem to complain about it any more.
- Drop tool result when tool call is truncated as invalid state
- Do not truncate tool use message content, just drop the whole tool
use message.
AI model APIs need tool use assistant message content in specific
form (e.g with thinking etc.). So dropping content items breaks
expected tool use message content format.
Handle tool use scenarios where iteration query isn't set for retry
- Deepcopy messages before formatting message for Anthropic to allow
idempotency so retry on failure behaves as expected
- Handle failed calls to pick next tools to pass failure warning and
continue to the next research iteration. Previously if the API call to
pick next tools failed, the research run would crash
- Add null response check for when Gemini models fail to respond
Previously if anthropic models were using tools, the model's text
response accompanying the tool use wouldn't be shown, as it was
overwritten in the aggregated response with the tool call json.
This change appends the text response to the thoughts portion on tool
use to still show model's thinking. Thinking and text response are
delineated by italics vs normal text for such cases.
This should avoid the need to reformat the Khoj standardized tool call
for cache hits and satisfying ai model api requirements.
Previously multi-turn tool use calls to anthropic reasoning models
would fail as they needed their thoughts to be passed back. Other AI model
providers can have other requirements.
Passing back the raw response as is should satisfy the default case.
Tracking raw response should make it easy to apply any formatting
required before sending previous response back, if any ai model
provider requires that.
Details
---
- Raw response content is passed back in ResponseWithThoughts.
- Research iteration stores this and, when present, puts it into the
model response ChatMessageModel when constructing iteration history.
Fallback to using parsed tool call when raw response isn't present.
- No need to format tool call messages for anthropic models as we're
passing the raw response as is.
The researcher is expanding into accomplish-task behavior, especially
with tool use, from the previous collect-information-to-answer-user-query
behavior.
Update the researcher's system prompt to reflect the new objective better.
Encourage model to not stop working on task until achieve objective
Earlier khoj could technically only answer existential questions, i.e.
questions that would terminate once any relevant note to answer them
was found.
This change enables khoj to answer universal questions, i.e. questions
that require searching through all notes or finding all instances.
It enables more thorough retrieval from user's knowledge base by
combining semantic search, regex search, view and list files tools.
For more development details including motivation, see live coding
session 1.1 at https://www.youtube.com/live/-2s_qi4hd2k
Allow getting a map of user's knowledge base under specified path.
This enables more thorough retrieval from user's knowledge base by
combining search, view and list files tools.
Why
---
Previously the researcher had a uniform response schema to pick the
next tool, scratchpad and query. This didn't allow choosing different
arguments for the different tools being called. And the tool call,
result format passed by khoj was custom and static across all LLMs.
Passing the tools and their schemas directly to llm when picking next
tool allows passing multiple, tool specific arguments for llm to
select. For example, model can choose webpage urls to read or image
gen aspect ratio (apart from tool query) to pass to the specific tool.
Using the LLM tool calling paradigm allows model to see tool call,
tool result in a format that it understands best.
Using standard tool calling paradigm also allows for incorporating
community-built tools more easily via MCP servers, client tools,
native llm api tools etc.
What
---
- Return ResponseWithThought from completion_with_backoff ai model
provider methods
- Show reasoning model thoughts in research mode train of thought.
For non-reasoning models do not show researcher train of thought.
As non-reasoning models don't (by default) think before selecting a
tool. Showing the tool call is lame and resembles the tool's action
shown in the next step.
- Store tool calls in standardized format.
- Specify tool schemas in tool for research llm definitions as well.
- Transform tool calls, tool results to standardized form for use
within khoj. Manage the following tool call, result transformations:
- Model provider tool_call -> standardized tool call
- Standardized tool call, result -> model specific tool call, result
- Make the researcher choose the webpage urls to read as well for the
webpage tool. Previously it would just decide the query but let the
webpage reader infer the query url(s). But the researcher has better
context on which webpages it wants to have read to answer its query.
This should eliminate the webpage reader deciding urls to read step
and speed up webpage read tool use.
Handle unset response thoughts. Useful when retrying on a failed request.
Previously this resulted in an unbound local variable response_thoughts error
Previously these models could use response schema but not the tool use
capabilities provided by these AI model APIs.
This change allows chat actors to use the function calling feature to
specify which tools the LLM by these providers can call.
This should help simplify tool definition and structure context in
forms that these LLMs natively understand.
(i.e in tool_call - tool_result ~chatml format).
Gemini models, especially flash models, seems to have a tendency to go
into long, repetitive output tokens loop. Unsure why.
Tune temp, top_p as gemini api doesn't seem to allow setting frequency
or presence penalty, at least for reasoning models. Those would have
been a more direct mechanism to avoid model getting stuck in a loop.
This should help prevent partial updates to an agent. Especially useful
for agents with large knowledge bases being updated. Failing the call
should raise an exception. This will allow you to retry the save
instead of losing your previous agent changes or saving only partially.
Running safety check isn't required if the agent persona wasn't
updated this time around as it would have passed safety check
previously.
This should speed up editing agents when agent persona isn't updated.
It seems to me that it would be useful to be able to be explicit about
where the embedded database should live - as well as where it _does_
live (via the info log), when not specifying.
Extract questions has chat history in prompt and in actual chat history.
Only pass in prompt for now. Later update prompts to pass chat history
in chat messages list for better truncation flexibility.
1. Due to the interaction of two changes:
- dedupe by corpus_id, where corpus_id tracks logical content blocks
like files, org/md headings.
- return compiled, not logical blocks, where compiled text tracks smaller
content chunks that fit within search model, llm context windows.
When combined they showed only 1 matching compiled chunk per logical
block, even if multiple chunks matched within a logical content block.
Fix is to either dedupe by compiled text or to return deduped
logical content blocks (by corpus_id) corresponding to matched
compiled chunks. This commit fixes it by the first method.
2. Due to zipping inferred queries with search results, which resulted
in a single search result being returned per query!
This silently cut down matching search results and went undetected.
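A minimal illustration of the zip pitfall in point 2 (variable names illustrative):
```python
inferred_queries = ["query one", "query two"]
search_results = ["hit 1", "hit 2", "hit 3", "hit 4"]

# Buggy: zip pairs each query with just one result, silently dropping the rest
pairs = list(zip(inferred_queries, search_results))
print(pairs)  # [('query one', 'hit 1'), ('query two', 'hit 2')] -- hits 3, 4 lost
```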
The tailwind theme spacing of the scroll area surrounding chat history
on large screens was what was causing the large gap between chat input
box and chat history on some screen layouts.
This change reduces the spacing to a more acceptable level.
Previously summarizedResult would be unset when a tool call failed.
This caused research to fail due to ChatMessageModel failures when
constructing tool chat histories and would have caused similar errors
in other constructed chat histories.
Putting a failed iteration message in the summary prevents that while
letting the research agent continue its research.
Not all web search providers (like Jina, Searxng?) return a text
snippet. Making snippet optional allows processing search results from
these web search providers without hitting validation errors.
This bug was introduced in 05d4e19cb, version 1.42.2, during migration
to save deeply typed ChatMessageModel. As the ChatMessageModel did
not use the right field name for organic results (since the start).
Previously it did not matter as it was storing to DB irrespective but
now the mapping of dictionary to ChatMessageModel drops that field
before save to conversation in DB.
This was resulting in organic context being lost on page reload and
only being shown on first response.
Not sure why, but in some cases when interacting with o3 (which needs
non-streaming) the stream_options seems to be set.
Cannot reproduce but hopefully dropping the stream_options explicitly
should resolve this issue.
Related 985a98214
Older package versions (like 1.84.0) seem to always pass the
reasoning_effort argument to the openai api, which now seems to throw
an unexpected request argument error when used with non-reasoning
models (like 4o-mini).
There had been a regression that made all agents display the default
chat model instead of the actual chat model associated with the agent.
This change resolves that issue by prioritizing agent specific chat
model from DB (over user or server chat model).
- Fix code context data type for validation on server. This would
prevent the chat message from being written to history
- Handle null code results on web app
We now pass deeply typed chat messages throughout the application to
construct tool specific chat history views since 05d4e19cb.
This ChatMessageModel didn't allow intent.query to be unset. But
interrupted research iteration history can have an unset query. This
change makes intent.query optional.
It also uses message by user entry to populate user message in tool
chat history views. Using query from khoj intent was an earlier
shortcut used to not have to deal with message by user. But that
doesn't scale to current scenario where turns are not always required
to have a single user, assistant message pair.
Specifically a chat history can now contain multiple user messages
followed by a single khoj message. The new change constructs a chat
history that handles this scenario naturally and makes the code more
readable.
Also now only previous research iterations that completed are
populated. Else they do not serve much purpose.
Clean out non-useful slash commands to make the chat API more maintainable.
- App version, chat model via /help is visible in other parts of the
UX. Asking help questions with site:docs.khoj.dev filter isn't used
or known to folks
- /summarize is esoterically tuned. Should be rewritten if added back.
It wasn't being used by /research already
- Automations can be configured via UX. It wasn't being shown in UX
already
The chat actor (and director) tests haven't been looked into in a long
while. They'd gone stale in how they were calling the functions and in
what was required to run them. Now the online chat actor tests work
again.
Using model specific extract questions was an artifact from older
times, with less guidable models.
New changes collate and reuse logic:
- Rely on send_message_to_model_wrapper for model specific formatting.
- Use the same prompt, context for all LLMs as they can handle prompt variation.
- Use a response schema enforcer to ensure response consistency across models.
Extract questions (because of its age) was the only tool living directly
within each provider's code. Move it into helpers to have all the (mini)
tools in one place.
- Rename GET /api/automations to GET /api/automation
- Rename POST /api/trigger/automation to POST /api/automation/trigger
- Update calls to the automations API from the web app.
- Add context based on information provided rather than conversation
commands. Let caller handle passing appropriate context to ai
provider converse methods
Increase timeout to 180s (from 120s previously) and graceful timeout to
90s (from the 30s default) to reduce worker timeouts on long-running
requests.
Increase default gunicorn workers and make it configurable to better
utilize (v)CPUs. This is manually configured (instead of using
multiprocessing.cpu_count()) as VMs/containers may read cpu count of
host machine instead of their VMs/containers.
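A sketch of what such a gunicorn config could look like; the env var name and default worker count here are assumptions for illustration:

```python
# gunicorn.conf.py (sketch)
import os

# Configured manually instead of via multiprocessing.cpu_count(), as
# VMs/containers may report the host machine's cpu count
workers = int(os.getenv("GUNICORN_WORKERS", "2"))
timeout = 180          # raised from 120s
graceful_timeout = 90  # raised from the 30s default
```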
The chat dictionary is an artifact from earlier non-db chat history
storage. We've been ensuring new chat messages have valid type before
being written to DB for more than 6 months now.
Moving to the deeply typed chat history helps avoid null refs and
makes the code more readable and easier to reason about.
Next Steps:
The current update entangles chat_history written to DB
with any virtual chat history message generated for intermediate
steps. The chat message type written to DB should be decoupled from
type that can be passed to AI model APIs (maybe?).
For now we've made the ChatMessage.message type looser to allow
for a list[dict] type (apart from string). But later it may be a good idea
to decouple the chat_history received by send_message_to_model from
the chat_history saved to DB (which can then keep its stricter type check)
- Converts response schema into an Anthropic tool call definition.
- Works with simple enums without needing to rely on $defs, $refs, which
are unsupported by the Anthropic API
- Do not force specific tool use, as that is not supported with deep thought
This puts anthropic models on parity with openai, gemini models for
response schema following. Reduces need for complex json response
parsing on khoj end.
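Roughly, the conversion wraps the response schema as a tool's input schema, along the lines of this hedged sketch (the tool name and description are illustrative). Tool choice is left on auto since forced tool use isn't supported with deep thought:

```python
def schema_to_anthropic_tool(response_schema: dict) -> dict:
    # The schema should already be flattened, since $defs/$refs are
    # unsupported by the Anthropic API
    return {
        "name": "structured_response",
        "description": "Respond to the user in the required structure.",
        "input_schema": response_schema,
    }
```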
There seems to be a more standard mechanism of specifying launch.json
params for devcontainers. Previous mechanism to write launch.json to
.vscode/launch.json in post creation step does not work.
Improve default launch.json to include khoj admin username, password
with placeholder values to get started with local development faster.
Define a dockerfile for the devcontainer to pre-build server and web app
dependencies during the dev container image creation stage. So dev
container startup is sped up as there is no need to install dependencies.
## Description
This PR introduces significant improvements to the Obsidian Khoj
plugin's chat interface and editing capabilities, enhancing the overall
user experience and content management functionality.
## Features
### 🔍 Enhanced Communication Mode
I've implemented radio buttons below the chat window for easier
communication mode selection. The modes are now displayed as emojis in
the conversation for a cleaner interface, replacing the previous
text-based system (e.g., /default, /research). I've also documented the
search mode functionality in the help command.
#### Screenshots
- Radio buttons for mode selection
- Emoji display in conversations

### 💬 Revamped Message Interaction
I've redesigned the message buttons with improved spacing and color
coding for better visual differentiation. The new edit button allows
quick message modifications - clicking it removes the conversation up to
that point and copies the message to the input field for easy editing or
retrying questions.
#### Screenshots
- New message styling and color scheme

- Edit button functionality

### 🤖 Advanced Agent Selection System
I've added a new chat creation button with agent selection capability.
Users can now choose from their available agents when starting a new
chat. While agents can't be switched mid-conversation to maintain
context, users can easily start fresh conversations with different
agents.
#### Screenshots
- Agent selection dropdown

### 👁️ Real-Time Context Awareness
I've added a button that gives Khoj access to read Obsidian opened tabs.
This allows Khoj to read open notes and track changes in real-time,
maintaining a history of previous versions to provide more contextual
assistance.
#### Screenshots
- Window access toggle

### ✏️ Smart Document Editing
Inspired by Cursor IDE's intelligent editing and ChatGPT's Canvas
functionality, I've implemented a first version of a content creation
system we've been discussing. Using a JSON-based modification system,
Khoj can now make precise changes to specific parts of files, with
changes previewed in yellow highlighting before application.
Modification code blocks are neatly organized in collapsible sections
with clear action summaries. While this is just a first step, it's
working remarkably well and I have several ideas for expanding this
functionality to make Khoj an even more powerful content creation
assistant.
#### Screenshots
- JSON modification preview
- Change highlighting system
- Collapsible code blocks
- Accept/cancel controls

---------
Co-authored-by: Debanjum <debanjum@gmail.com>
## Summary
- Enable Khoj to operate computers: Add experimental computer operator
functionality that allows Khoj to interact with desktop environments,
browsers, and terminals to accomplish complex tasks
- Multi-environment support: Implement computer environments with GUI,
file system, and terminal access. Can control host computer or Docker
container computer
## Key Features
### Computer Operation Capabilities
- Desktop control (screenshots, clicking, typing, keyboard shortcuts)
- File editing and management
- Terminal/bash command execution
- Web browser automation
- Visual feedback via train-of-thought video playback
### Infrastructure & Architecture:
- Docker container (ghcr.io/khoj-ai/computer:latest) with Ubuntu 24.04,
XFCE desktop, VNC access
- Local computer environment support with pyautogui
- Modular operator agent system supporting multiple environment types
- Trajectory compression and context management for long-running tasks
### Model Integration:
- Anthropic models only (Claude Sonnet 4, Claude 3.7 Sonnet, Claude Opus
4)
- OpenAI and binary operator agents temporarily disabled
- Enhanced caching and context management for operator conversations
### User Experience:
- `/operator` command or just ask Khoj to use operator tool to invoke
computer operation
- Integrate with research mode for extended 30+ minute task execution
- Video of computer operation in train of thought for transparency
### Configuration
- Set `KHOJ_OPERATOR_ENABLED=True` in `docker-compose.yml`
- Requires Anthropic API key
- Computer container runs on port 5900 (VNC)
- You can seek through the train of thought video of computer operation or
follow it in live mode.
- Interleaves video with normal text thoughts.
- Video available of old interactions and currently streaming message.
- Add type guards for action.path in drag vs text editor actions
- Added type guards for Union type attribute access
- Fixed variable naming conflicts between drag and text editor cases
- Resolved remaining typing issues in OpenAI, Anthropic agents
- Type guard without requiring another code indent level
- Create reusable method to call model
- Fix to summarize messages on operator run.
- Mark assistant tool calls with role = assistant, not environment
- Try to fix message format when loading after interrupts.
Does not work well yet
Previously CTRL+A would get triggered instead of ctrl+a. CTRL+A is
equivalent to ctrl+shift+a. This wasn't intended; shift should only be
added explicitly when required.
Now key combos like ctrl+a on computer firefox etc. work as expected
Track research and operator results at each nested iteration step
using python object references + async events bubbled up from nested
iterators.
Instantiates operator with interrupted operator messages from research
or normal mode.
Reflects actual interaction trajectory as closely as possible to agent
including conversation history, partial operator trajectory and new
query for fine grained, corrigible steerability.
Research mode continues with operator tool directly if previous
iteration was an interrupted operator run.
Partial state reload after an interrupt drops Khoj messages. So the
assumption that there will always be a Khoj message after a user
message is broken. That is, there can now be multiple user messages
preceding a Khoj message.
This change allows user queries to still be extracted for chat
history even if no Khoj message follows.
Minor logic update to only include non-image inferred queries for
gemini, anthropic models as well, instead of just for openai models.
Apart from that the extracted function should be functionally the same.
We were passing operator results as a simple dictionary. Strongly
typing it makes sense as operator results become more complex.
Storing operator results with trajectory on interrupts will allow
restarting an interrupted operator run with agent messages of the
interrupted trajectory loaded into operator agents
This allows:
- Each operator agent to own its summarization prompt, which it can
tune if it wants
- The outer operator loop to pass an override summarize prompt when it
invokes the summarize func, though it does not have to
It had become broken at some point due to refactoring. The cache
control was getting added and removed right after in add_action_results
What we actually wanted to do is clear the old cache breakpoint and
put a new one at the latest operator tool result message.
This should improve operator speed and lower costs with anthropic
models.
- Generalize building pyautogui into executable python code snippet.
This should work across docker and local. And should be easier to
extend to operate a remote computer over the network as well.
- Create dockerfile for pyautogui operate-able containerized computer
Previously it could only operate a (playwright) browser. Now
- The operator logic and naming has been updated assuming
multiple environment types can be operated
- The operator entrypoint is now at __init__.py to simplify imports
and the entrypoint function is called operate_environment
- All operator agents have been updated to select their system prompts
and tools based on the environment they'll operate
- Khoj can now save and restore research from partial state
This triggers an interrupt that saves the partial research. Then,
when a new query is sent, it loads the previous partial research as
context and continues, using the new user query to orient
its future research
- Support natural interrupt and send query behavior from web app
This triggers an abort and send when a user sends a chat message
while khoj is in the middle of some previous research.
This interrupt mechanism enables a more natural, interactive
research flow
- Just send your new query. If a query was running previously, it'd
be interrupted and the new query would start processing. This improves on
the previous 2-click interrupt and send ux.
- Utilizes partial research for the interrupted query, so you can now
redirect khoj's research direction. This is useful if you need to
share more details, change khoj's research direction in any way or
complete research. Khoj's train of thought can be helpful for this.
- Track operator, research context in ChatMessage
- Track query field in (document) context field of ChatMessage
This allows validating chat message before inserting into DB
These seem to be a new class of errors showing up. Explicitly using
django timezone functions to add awareness to datetime fields stored
in DB seems to mitigate the issue.
Related #1180
- Engage reasoning when using claude 4 models
- Allow claude 4 models as monolithic operator agents
- Ease identifying which anthropic models can reason, operate GUIs
- Track costs, set default context window of claude 4 models
- Handle stop reason on calls to new claude 4 models
- Normalize conversation_id type to str instead of str or UUID
- Do not pass conversation_id to agenerate_chat_response as
the associated conversation is also being passed. So can get its id
directly.
## Overview
1. Create base framework to compose different operators and environments
for Khoj to operate.
2. Enable Khoj to operate a web browser using anthropic, openai, gemini
or open-source models
**Note**: *This is an alpha level feature release. It is meant for local
testing by contributors and self-hosters.*
## Capabilities
- Have Khoj operate a web browser to complete tasks that require actions
and visual feedback.
- Experiment with any vision model as operator. Khoj supports monolithic
and binary operators
- Monolithic operators rely on a single model like claude or openai to
both reason and ground operator actions
- Binary operators allow bootstrapping a fully local operator. It can
use any vision model for visual reasoning when paired with a capable
visual grounding model.
## Limitations
- In general, it is slower, more expensive and less comprehensive than
standard Khoj for research
## Setup
1. Install Khoj with playwright by either
- running `pip install khoj[local]`
- installing playwright separately via `pip install playwright` and
`playwright install chromium`
2. Set `KHOJ_OPERATOR_ENABLED` env var to true (i.e
`KHOJ_OPERATOR_ENABLED=true`)
3. Start Khoj (e.g `USE_EMBEDDED_DB="true" khoj --anonymous-mode -vv`)
4. Add the necessary chat model(s) with `vision enabled` via your [Khoj
Admin Panel](http://localhost:42110/server/admin)
- To use Anthropic claude: `claude-3.7-sonnet*` chat model is required
with vision enabled
- To use Openai operator: `gpt-4o` chat model is required with vision
enabled
- For other operator configurations: a chat model named `ui-tars-1.5` is
required with vision enabled
This can technically be any visual grounding model served via an openai
compatible api. I've just tested with ui-tars-1.5-7b deployed to an HF
inference endpoint for now. See [deployment
instructions](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
5. Set your desired vision chat model via [user
settings](http://localhost:42110/settings) to use as operator.
6. Run your queries with either the `/operator` slash command or by just
asking Khoj in your query to use the operator tool. You can run the
operator in research mode as well
### Advanced Usage
- Reuse Browser Session
- Why: Have Khoj operate web services you've logged into. E.g manage
your gmail, github, social media etc.
- Setup
1. Start Chromium or Edge in Remote Debugging mode. For example, on Mac
you can start Edge by running the following in your terminal:
`/Applications/Microsoft\ Edge.app/Contents/MacOS/Microsoft\ Edge
--remote-debugging-port=9222`
2. Connect Khoj to that browser instance by setting the environment
variable `KHOJ_CDP_URL` to its URL.
By default you'd set `KHOJ_CDP_URL="http://localhost:9222"`
## Architecture
### Operator Agents
| Type | Design |
|----- |-----|
| Monolithic | <img
src="https://github.com/user-attachments/assets/7a96440f-1732-482b-9bd9-0920cb0c60890"
width=400> |
| Binary | <img
src="https://github.com/user-attachments/assets/c5d101c0-3475-43c2-a301-daa943cde190"
width=400> |
The generic grounding agent has not been tested properly, but it should
at least be aligned with the interface used by the ui-tars
grounding agent, which has been tested.
It sometimes outputs coordinates as a string rather than a list. Make the
parser more robust to those kinds of errors.
Share error with operator agent to fix/iterate on instead of exiting
the operator loop.
- Encourage grounder to adhere to the reasoner's action instruction
- Encourage reasoner to explore other actions when stuck in a loop
Previously we seemed to be forcing it too strongly to choose the
"single most important" next action. So it may not have been exploring
other actions to achieve the objective on initial failure.
- Do not catch error messages just to re-throw them. That results in a
confusing "exception happened during handling of an exception"
stacktrace and makes it harder to debug
- Log an error when action_results.content isn't set or is empty to
debug this operator run error
Goto and back functions are chosen by the visual reasoning model for
increased reliability in selecting those tools. The ui-tars grounding
model seems too tuned to use a specific set of tools.
Documentation about this is currently limited and confusing. But it seems
like the reasoning item should be kept if a computer_call follows, else
dropped.
Add a noop placeholder for the reasoning item to prevent termination of
the operator run on a response with just reasoning.
The reasoning messages in openai cua need to be passed back or some
such. Else it throws a missing response with required id error.
Folks are confused about the expected behavior for this online as well.
The documentation to handle this seems sparse and unclear.
Show natural language, formatted text for each action. Previously we
were just showing json dumps of the actions taken.
Pass screenshot at each step for openai, anthropic and binary operator agents
Use text and image field in json passed to client for rendering both.
Show actions, env screenshot after actions applied in train of thought.
Showing the post action application screenshot seems more intuitive.
Previously we were showing the screenshot used to decide the next action.
This pre-action screenshot was being shown after the next
action was decided (in train of thought). This was misleading about
the actual ordering of events.
Rendered response is now a structured payload (dict) passing image
and text to be rendered up from operator to clients for rendering of
train of thought.
Operator is still early in development. To enable it:
- Set the KHOJ_OPERATOR_ENABLED environment variable to true
- Run any one of the commands below:
- `pip install khoj[local]`
- `pip install khoj[dev]`
- `pip install playwright`
Grounding agent does not have the full context and capabilities to
make this call. Only let reasoning agent make termination decision.
Add a wait action instead when grounder requests termination.
UI tars grounder doesn't like calling non-standard functions like
goto, back.
Directly parse visual reasoner instruction to bypass uitars grounder
model.
At least for goto and back functions grounding isn't necessary, so
this works well.
Previously the grounding agent would be reset on every call. So it
only saw the most recent instruction and screenshot to make its next
action suggestion.
This change allows the visual grounders to see past instructions and
actions to prevent looping and encourage more exploratory action
suggestions when stuck or seeing errors.
Split visual grounder into two implementations:
- A ui-tars specific visual grounder agent. This uses the canonical
implementation of ui-tars with specialized system prompt and action
parsing.
- Fallback to generic visual grounder utilizing tool-use and served over
any openai compatible api. This was previously being used for our
ui-tars implementation as well.
Adds the results of each action as a separate item in message content.
Previously we were adding this as a single larger text blob. This
change adds structure to simplify post processing (e.g truncation).
The updated add_action_results should also require less work to
generalize if we pass tool call history to grounding model as
action results in valid openai format.
The initial user query isn't updated during an operator run. So set it
when initializing the operator agent. Instead of passing it on every
call to act.
Pass summarize prompt directly to the summarize function. Let it
construct the summarize message to query vision model with.
Previously it was being passed to the add_action_results func, as the
previous implementation did not use a separate summarize func.
Also rename chat_model to vision_model for a more pertinent var name.
These changes make the code cleaner and implementation more readable.
For some reason the page.go_back() action in playwright had a much
higher propensity to timeout. Use goto instead to reduce these page
traversal timeouts.
This requires tracking navigation history.
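A hedged sketch of emulating back via goto with tracked navigation history, using playwright's async API (class and attribute names are illustrative):

```python
class BrowserNavigator:
    def __init__(self, page):
        self.page = page
        self.history: list[str] = []

    async def goto(self, url: str):
        # Track where we came from so back can be emulated via goto
        self.history.append(self.page.url)
        await self.page.goto(url)

    async def back(self):
        # Avoid page.go_back(), which had a much higher propensity to timeout
        if self.history:
            await self.page.goto(self.history.pop())
```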
Only let the visual reasoner handle terminating the operator run.
Previously the grounder was also able to trigger termination.
Make catching the termination by the reasoner more robust
The previous browser_operator.py file had become pretty massive and
unwieldy. This change breaks it apart into separate files for
- the abstract environment and operator agent base
- the concrete agents: anthropic, openai and binary
- the concrete environment browser operator
- the operator actions used by agents and environment
- This operator works with models served over an openai compatible api
- It uses separate vision models to reason and ground actions.
This improves flexibility in the operator agents that can be created.
We no longer need our operator agent to rely on monolithic models that
can both reason over visual data and ground their actions.
We can create operator agent from 2 separate models:
1. To reason over screenshots to suggest natural language next action
2. To ground those suggestion into visually grounded actions
This allows us to create fully local operators or operators combining
the best visual reasoner with the best visual grounder models.
Inform it that it can only control a single playwright browser page.
Previously it was assuming it was operating a whole browser, so it would
have trouble navigating to different pages.
Improve handling of error in action parsing
Remove each OperatorAgent's specific code from leaking out into the
operator. The Operator just calls the standard OperatorAgent functions.
Each OperatorAgent's specific logic is handled by the OperatorAgent
internally.
This improves the separation of responsibility between the Operator,
OperatorAgent and the Environment.
- Make environment pass screenshot data in agent agnostic format
- Have operator agents providers format image data to their AI model
specific format
- Add environment step type to distinguish image vs text content
- Clearly mark major steps in the operator iteration loop
- Handle anthropic models returning computer tool actions as normal
tool calls by normalizing next action retrieval from response for it
- Remove unused ActionResults fields
- Remove unnecessary placeholders in content of action results, like
for screenshot data
Decouple applying action on Environment from next action decision by
OperatorAgent
- Create an abstract Environment class with a `step` method
and a standardized set of supported actions for each concrete Environment
- Wrap playwright page into a concrete Environment class
- Create abstract OperatorAgent class with an abstract `act` method
- Wrap Openai computer Operator into a concrete OperatorAgent class
- Wrap Claude computer Operator into a concrete OperatorAgent class
Handle new browser tabs opened by the agent's actions.
Previously some link clicks would open in new tab. This is out of the
browser operator's context and so the new page cannot be interacted
with by the browser operator.
This change catches new page opens and opens them in the context page
instead.
Give the Khoj browser operator access to browser with existing
context (auth, cookies etc.) by starting it with CDP enabled.
Process:
1. Start Browser with CDP enabled:
`Edge/Chromium/Chrome --remote-debugging-port=9222`
2. Set the KHOJ_CDP_URL env var to the CDP url of the browser to use.
3. Start Khoj and ask it to get browser based work done with operator
+ research mode
The github run_eval workflow sets OPENAI_BASE_URL to an empty string.
The ai model api created during initialization for openai models then
gets set to an empty string rather than None or the actual openai base url.
This tries to call the llm at an empty string base url instead of the
default openai api base url, which obviously fails.
The fix is to map empty base urls to the actual openai api base url.
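A minimal sketch of the mapping, assuming the openai python client; `or None` coerces the empty string so the client falls back to the default api base:

```python
import os

from openai import AsyncOpenAI

# An empty OPENAI_BASE_URL becomes None, so the client uses the
# default openai api base url instead of calling ""
base_url = os.getenv("OPENAI_BASE_URL") or None
client = AsyncOpenAI(base_url=base_url)
```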
Previously all exceptions were being caught. So retry logic wasn't
getting triggered.
Exception catching had been added to close the llm thread when threads
instead of async were being used for final response generation.
This isn't required anymore since moving to async. And we can now
re-enable retry on failures.
Raise error if response is empty to retry llm completion.
Send images as png to non-openai models served via an openai compatible
api, as more models support png than webp.
Continue storing images as webp on server for efficiency.
Convert to png at the openai api layer and only for non-openai models
served via an openai compatible api.
Enable using vision models like ui-tars (via llama.cpp server), grok.
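A sketch of the conversion at the openai api layer, assuming Pillow and base64-encoded image payloads (the function name is illustrative):

```python
import base64
from io import BytesIO

from PIL import Image


def webp_b64_to_png_b64(webp_b64: str) -> str:
    # Images stay stored as webp on the server; convert to png only when
    # sending to non-openai models served via an openai compatible api
    image = Image.open(BytesIO(base64.b64decode(webp_b64)))
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode()
```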
### Major
* Do more granular truncation on hitting context limits
* Pack research iterations as list of message content instead of
separate messages
* Update message truncation logic to truncate items in message content
list
* Make researcher aware of number of web, doc queries allowed per
iteration
### Minor
* Prompt web page reader to extract quantitative data as is from pages
* Track gemini 2.0 flash lite cost. Reduce max prompt size for 4o-mini
* Ensure time to first token logged only once per chat response
* Upgrade tenacity to respect min_time passed to exponential backoff
with jitter function
Fix for the issue is in tenacity 9.0.0. But older langchain required
tenacity <9.0.0.
Explicitly pin version of langchain sub packages to avoid indexing
and doc parsing breakage.
- Construct tool description dynamically based on configurable query
count, as sketched after this list
- Inform the researcher how many webpage reads, online searches and
document searches it can perform per iteration when it has to decide
which next tool to use and the query to send to the tool AI.
- Pass the query counts to perform from the research AI down to the
tool AIs
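For illustration, the dynamically constructed tool description could be built along these lines (names and wording are assumptions, not the actual prompt):

```python
def describe_online_search_tool(max_searches: int, max_webpages: int) -> str:
    # Constructed dynamically so the researcher knows its per-iteration budget
    return (
        "Search the internet for up to date information. "
        f"You can run up to {max_searches} online searches and "
        f"read up to {max_webpages} webpages in this iteration."
    )
```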
Time to first token log lines were shown multiple times if the new chunk
being streamed was empty for some reason.
This change makes the logic robust to empty chunks being received.
Previously research iterations and conversation logs were added to a
single user message. This prevented truncating each past iteration
separately on hitting context limits. So the whole past research
context had to be dropped on hitting context limits.
This change splits each research iteration into a separate item in a
message content list.
It uses the ability for message content to be a list, that is
supported by all major ai model apis like openai, anthropic and gemini.
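A hedged sketch of the packing, in the message content list format accepted by the major AI model APIs (the iteration rendering and variable names are illustrative):

```python
def render_iteration(iteration: dict) -> str:
    # Illustrative: serialize one research iteration to text
    return f"Tool: {iteration['tool']}\nQuery: {iteration['query']}\nResult: {iteration['result']}"


def pack_iterations(previous_iterations: list[dict]) -> dict:
    # Each past iteration becomes its own content item, so items can be
    # truncated individually on hitting context limits
    return {
        "role": "assistant",
        "content": [
            {"type": "text", "text": render_iteration(iteration)}
            for iteration in previous_iterations
        ],
    }
```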
The change in message format seen by pick next tool chat actor:
- New Format
- System: System Message
- User/Assistant: Chat History
- User: Raw Query
- Assistant: Iteration History
- Iteration 1
- Iteration 2
- User: Query with Pick Next Tool Nudge
- Old Format
- User: System + Chat History + Previous Iterations Message
- User: Query
- Collateral Changes
The construct_structured_message function has been updated to always
return a list[dict[str, Any]].
Previously it'd only use a list if attached_file_context was set or a
vision model with images was used, for wider compatibility with other
openai compatible apis
Previously the research agent would have a hard time getting
quantitative data extracted by the web page reader tool AI.
This change aims to encourage the web page reader tool to extract
relevant data in verbatim form for higher granularity research and
responses.
Code tool should see code context and webpage tool should see online
context during research runs
Fix to include code context from past conversations to answer queries.
Add all queries to the tool chat history when no specific tool is
provided to limit extracting inferred queries for.
- Use much larger read, connect timeout if llm served over local url
- Use larger timeout duration than default (5s) for online llms too
This matches the timeout duration increase for calls to the gemini api
- Improve overall flow of the contribute section of Readme
- Fix where to look for good first issues. The contributors board is outdated. It's easier to maintain and view good-first-issue items with issue tags directly.
Co-authored-by: Debanjum <debanjum@gmail.com>
Fallback to assuming the user is not subscribed if no user is passed.
This allows the user arg to actually be optional in the async
send_message_to_model_wrapper function
### Major
All reasoning models return thoughts differently due to lack of
standardization.
We normalize thoughts by reasoning models and providers to ease handling
within Khoj.
The model thoughts are parsed during research mode when generating final
response.
These model thoughts are returned by the chat API and shown in train of
thought shown on web app.
Thoughts are enabled for Deepseek, Anthropic, Grok and Qwen3 reasoning
models served via API.
Gemini and Openai reasoning models do not show their thoughts via
standard APIs.
### Minor
- Fix ability to use Deepseek reasoner for intermediate stages of chat
- Enable handling Qwen3 reasoning models
Previously the Deepseek reasoner couldn't be used via API for completion
because the additional formatting constraints it requires were not being
applied in this function.
The formatting fix was only being applied in the chat completion endpoint.
DeepSeek reasoners return reasoning in the reasoning_content field.
Create an async stream processor to parse the reasoning out when using
the deepseek reasoner model.
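A hedged sketch of such a stream processor, assuming the openai python client's async streaming interface:

```python
async def aparse_reasoner_stream(stream):
    # DeepSeek reasoners stream thoughts in the non-standard
    # reasoning_content field of each delta, separate from content
    thoughts, content = "", ""
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            thoughts += delta.reasoning_content
        elif getattr(delta, "content", None):
            content += delta.content
    return thoughts, content
```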
The Qwen3 reasoning models return thoughts within <think></think> tags
before response.
This change parses the thoughts out from final response from the
response stream and returns as structured response with thoughts.
These thoughts aren't passed to client yet
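A minimal sketch of pulling the thoughts out of the final response (the regex-based approach is illustrative):

```python
import re


def split_thoughts(response: str) -> tuple[str, str]:
    # Qwen3 reasoning models emit <think>...</think> before the response
    match = re.match(r"\s*<think>(.*?)</think>\s*", response, re.DOTALL)
    if not match:
        return "", response
    return match.group(1).strip(), response[match.end():]
```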
OpenAI API doesn't support thoughts via chat completion by default.
But there are thinking models served via OpenAI compatible APIs like
deepseek and qwen3.
Add stream handlers and modified response types that can contain
thoughts as well apart from content returned by a model.
This can be used to instantiate stream handlers for different model
types like deepseek, qwen3 etc served over an OpenAI compatible API.
Recent changes enabled free tier users to switch free tier chat models
per conversation or the default.
This change enables free tier users to generate responses with their
conversation specific chat model.
Related: #725, #1151
Reason
---
- Simplify code and logic to stream chat response by solely relying on
asyncio event loop.
- Reduce overhead of managing threads to increase efficiency and
throughput (where possible).
Details
---
- Use async/await with no threading when generating chat response via
OpenAI, Gemini, Anthropic AI model APIs
- Use threading for offline chat model as llama-cpp doesn't support
async streaming yet
# PR Summary
This small PR resolves the deprecation warnings on `datetime` in
Python3.12+. You can find them in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/14538833837/job/40792624987#step:9:134):
```python
/__w/khoj/khoj/src/khoj/processor/content/images/image_to_entries.py:61: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp_now = datetime.utcnow().timestamp()
```
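The fix swaps the naive utcnow() call for its timezone-aware equivalent, as the deprecation message suggests:

```python
from datetime import datetime, timezone

# Before (deprecated on Python 3.12+)
timestamp_now = datetime.utcnow().timestamp()

# After (timezone-aware, same epoch timestamp)
timestamp_now = datetime.now(timezone.utc).timestamp()
```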
Overview
---
Enable free tier users to chat with any AI model made available on free tier
of production deployments like [Khoj cloud](https://app.khoj.dev).
Previously model switching was completely disabled for users on free tier.
Details
---
- Track price tier of each Chat, Speech, Image, Voice AI model in DB
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
Previously the model switching APIs and UX fields on web app were
completely disabled for free tier users
Rely on deepthought flag to control reasoning effort of low/high for
the grok model
This is different from the openai reasoning models which support
low/medium/high and for which we use low/medium effort based on the
deepthought flag
Note: grok is accessible over an openai compatible API
Disregard chart types as we are not using rich chart rendering
and they are duplicates of the chart images that are rendered.
Disregard text output associated with generated image files.
Added a “Troubleshooting & Tips” section to the GCP Vertex documentation.
This section provides guidance for self-hosted users on common issues
they may encounter when setting up Google Vertex AI integration in Khoj.
Topics covered include permissions, region compatibility, prompt size
limits, API key testing, and secure key management with environment
variables. The goal is to improve the onboarding experience and reduce
setup errors for contributors and self-hosters using Vertex AI models
like Claude and Gemini.
Signed-off-by: brightally6@gmail.com
Server:
- Rate limit based on unverified email before creating user
- Check email address for deliverability before creating user
- Track rate limit for unverified email in new non-user keyed table
Web app:
- Show error in login popup to user on failure/throttling
- Simplify login popup logic by moving magic link handling logic
into EmailSigninContext instead of passing require props via parent
- Set chatSidebar prompt, Setting name fields to empty str if value null
- Track if agent modified in chatSidebar to simplify code, fix looping
- Suppress spurious dark mode hydration warnings on the web app
- Set key for chatMessage parent to get UX efficiently updated by react
- Let only root next.js layout handle html, body tags, not child layouts
Previously the sidebar could recurse on opening the chat page (from home?)
due to the child modelSelector component updating the parent chatSidebar
prop, which was passed back down to it in a loop.
The chatSidebar decides if agent has been modified in a single
useEffect and enables the Save button accordingly.
- Track agent modification wrt agent info received from server in
chatSidebar instead.
- Reduce modelSelector's mandate to just notify
when the user changes the model.
- Fix to infer, show & update agent state from chat sidebar on web app
This logic is fragile and convoluted because:
- the default agent chat model is dynamically determined.
- need to disambiguate tools not set vs none set vs all set by user
The default agent's tool selection is stored as undefined to show
not set scenario, which allows for all tools to be dynamically
used by agent.
But the user can also set no tools or all tools for their agents.
All 3 scenarios are handled differently.
- Track tools to be displayed vs tools to be stored
This is triggered by mismatch between "dark" class present on server
sent layout but not in client sent layout on initial render.
That mismatch exists because the server applies dark-mode styling
early to avoid FOUC flickering of UX.
Related 43e032e
Remove html, body elements from child page layouts. Let only the root
layout handle it.
Next.js router structure mounts child layouts inside parent layouts,
as defined by their directory hierarchy. So the html, body component
should only be defined in the parent layout.
This avoids the child layout mounting its html, body component within
the actual root layout's existing html, body component.
Previously the chat model associated with the default agent was always
the first chat model populated on the server. This doesn't match
behavior of the rest of the system, where the server chat settings is
preferred over the user chat settings over the first chat model.
This change brings the default agent's chat model in line with the
preference order used in the rest of the system.
Previous change to fallback to the default agent was not functional. It
would error out if the conversation agent wasn't set when trying to
get conversation.agent.slug for calling the aget_agent_by_slug func
We were previously relying on an older, unmaintained version of
pgvector docker image, ankane/pgvector.
Moving to new docker image requires selecting from tags based on the
pg major version (14, 15, 16 or 17).
This change uses pg15 tag to resolve image pull.
Note: we use postgres 15 for khoj docker images currently
Fixes #1154
Issue introduced in commit 5a3c7b1.
Usage of KHOJ_DOMAIN
---
KHOJ_DOMAIN is tri-state for local, official and other production deployments:
- If KHOJ_DOMAIN is unset (for local):
- sets CSRF cookie to localhost
- adds khoj.dev variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds app.khoj.dev variants to CORS origins
- If KHOJ_DOMAIN is set to empty (for official):
- sets CSRF to khoj.dev
- adds khoj.dev variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds app.khoj.dev variants to CORS origins
- If KHOJ_DOMAIN is set (for other prod deployments):
- sets CSRF cookie to KHOJ_DOMAIN
- adds KHOJ_DOMAIN variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds KHOJ_DOMAIN variants to CORS origins
Related #1137, #1152. Resolves #1123
Unsure why this error triggers on every request to the Django admin
panel these days but all the requests are completing fine and the
client is clearly not aborting the request when the RequestAborted
exception is raised.
Suppress these errors for now via middleware to prevent them from
unnecessarily cluttering up the server logs and confusing folks.
Related #1152
### Improve Gemini usage
- Allow text tool to give agent ability to terminate research
- Set default context for gemini 2 flash models
2x context window for small, commercial models to 120K
- Default temperature of Gemini models to 1.0 to reduce repetition
### Improve evaluation harness
- Add more knobs to control eval workflow
- Allow running eval with any chat model served over an openai compatible api
- Control random sampling from eval set
- Auto read web page
- Use embedded postgres instead of postgres server for eval workflow
- Use Gemini 2.0 flash as evaluator. Set seed for evaluator to reduce decision variance
Previously Gemini 2 flash and flash lite were using a context window of
10K by default, as no defaults had been added for them.
Increase default context for small commercial models to 120K from 60K,
as they are cheaper and faster than their pro model equivalents at 60K context.
We'd moved the research planner to only use tools in the enum of the
schema. This enum tool enforcement prevented the model from terminating
research by setting the tool field to empty.
Fix the issue by adding a text tool to the research tools enum and telling
the model to use that to terminate research and start the response instead.
Make the research planner consistently select the tool before the query,
as the model should tune its query for the selected tool. It already got
space to think about which tool to use in the scratchpad.
- Control auto read webpage via eval workflow. Prefix env var with KHOJ_.
Default to false, as that is the default going to be used in prod
going forward.
- Set openai api key via input param in manual eval workflow runs
- Simplify evaluating other chat models available over openai
compatible api via eval workflow.
- Mask input api key as secret in workflow.
- Discard unnecessary null setting of env vars.
- Control randomization of samples in eval workflow, as sketched below.
If randomization is turned off, it'll take the first SAMPLE_SIZE
items from the eval dataset instead of a random collection of
SAMPLE_SIZE items.
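A sketch of that sampling knob (function and variable names are illustrative):

```python
import random


def select_eval_samples(dataset: list, sample_size: int, randomize: bool) -> list:
    # With randomization off, take the first sample_size items for
    # reproducible comparisons across eval runs
    if randomize:
        return random.sample(dataset, min(sample_size, len(dataset)))
    return dataset[:sample_size]
```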
This PR implements a new feature request template with a few UX/UI improvements.
Key changes:
- Use of GitHub forms.
- Provide info notes for submitters about feature request submission rules.
- Adds a few handy fields like "Describe the feature" or "Use Case"
Overall, with a template like this, feature requests will be more structured and meaningful.
Only setup speech to text and text to image models served via openai
compatible APIs when explicitly specified during initialization.
This avoids setup of whisper and dalle when an openai compatible API
is being setup instead of the openai API itself.
- Specify min, max number of list items expected in AI response via JSON schema enforcement. Used by Gemini models
- Warn and drop invalid/empty messages when format messages for Gemini models
- Make Gemini response adhere to the order of the schema property definitions
- Improve agent creation safety checker by using response schema, better prompt
Without explicitly using the property ordering field, gemini returns
responses in alphabetically sorted property order.
We want the model to respect the schema property definition order.
This ensures control during development to maintain response quality.
For example in CoT make it fill scratchpad before answers.
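For example, a chain-of-thought response schema can pin the scratchpad before the answer via Gemini's property ordering field (the schema contents are illustrative):

```python
response_schema = {
    "type": "OBJECT",
    "properties": {
        "scratchpad": {"type": "STRING"},
        "answer": {"type": "STRING"},
    },
    "required": ["scratchpad", "answer"],
    # Without this, gemini returns properties in alphabetical order
    "propertyOrdering": ["scratchpad", "answer"],
}
```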
Require at least 1 item in lists. Otherwise gemini flash will
sometimes return an empty list. For chat actors where max items is
known, set that as well.
OpenAI API does not support specifying min, max items in response
schema lists, so drop those properties when response schema is
passed. Add other enforcements to response schema to comply with
response schema format expected by OpenAI API.
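A minimal sketch of stripping the unsupported list constraints before passing a response schema to the OpenAI API:

```python
def drop_list_constraints(schema: dict) -> dict:
    # The OpenAI API rejects minItems/maxItems in response schemas,
    # so strip them recursively before sending
    return {
        key: drop_list_constraints(value) if isinstance(value, dict) else value
        for key, value in schema.items()
        if key not in ("minItems", "maxItems")
    }
```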
Previously we were setting message content part with empty text. This
results in error from Gemini API. Warn and drop such messages instead.
Log empty message content found during construction to root-cause the
issue but allow Khoj to respond without the offending messages in
context for call to Gemini API.
New
- Support Firecrawl as a online search provider
Improve
- Fallback to other enabled online search providers on failure
- Speed up online search with Jina by excluding webpage content in search results
Fix
- Fix Jina webpage reader. Improve it to include generated alt text to each image on webpage
- Truncate online query to Serper if query exceeds max supported length
Previously a query to serper longer than the max supported length would
throw an error instead of returning at least some results.
Truncating the online search query to serper to the max supported length
mitigates that issue.
- Improve webpage read to include image alt text
- Improve Jina webpage search to not include each page content
- Use POST instead of GET for web search, webpage read with Jina
This avoids installing pgserver on linux arm64 docker builds, which it
doesn't currently support and isn't required to support as Khoj docker
images can use standard postgres server made available via our
docker-compose.yml
Use pgserver python package as an embedded postgres db,
installed directly as a khoj python package dependency.
This significantly simplifies self-hosting with just a `pip install khoj`.
No need to also install postgres separately.
Still use standard postgres server for multi-user, production use-cases.
- Update default anthropic chat models to latest good models.
- Now that Google supports a good text to image model, suggest adding
that if the Google AI API is set up on first run.
Previously the agent slug was not considered on create even when passed
explicitly in the agent creation step.
This made the default agent slug different until the next run, when it
was updated after creation. And it didn't allow chat to work on first run.
The fix to use the agent slug when explicitly passed allows users to
chat on first run.
Previously messages got Anthropic specific formatting done before
being passed to Anthropic (chat) completion functions.
Move the code to format messages of type list[ChatMessage] into Anthropic
specific format down to the Anthropic (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with the normalized list[ChatMessage] type of chat messages across AI API
providers
Previously we'd always request up to 3 webpage urls via the prompt but
read only one of the requested webpage urls.
This would degrade the quality of research and default mode, as the model
may request reading up to 3 webpage links but get only one of the
requested webpages read.
This change passes the number of webpages to read down to the AI model
dynamically via the updated prompt. So number of webpages requested to
be read should mostly be same as number of webpages actually read.
Note: For now, the max webpages to read is kept same as before at 1.
Previously the research mode planner ignored the current agent or
conversation specific chat model the user was chatting with. Only the
server chat settings, user default chat model, first created chat model
were considered to decide the planner chat model.
This change considers the agent chat model to be used for the planner
as well. The actual chat model picked is decided by the existing
prioritization of server > agent > user > first chat model.
This change enables the creator of a shared conversation to stop sharing the conversation publicly.
### Details
1. Create an API endpoint to enable the owner of the shared conversation to unshare it
2. Unshare a public conversation from the title pane of the public conversation on the web app
Only show the unshare button on public conversations created by the
currently logged in user. Otherwise hide the button
Set conversation.isOwner = true only if currently logged in user
shared the current conversation.
This isOwner information is passed by the get shared conversation API
endpoint
Previously messages passed to gemini (chat) completion functions
got a little Gemini specific formatting mixed in.
These functions expect a message of type list[ChatMessage] to work
with prompt tracer etc.
Move the code to format messages of type list[ChatMessage] into gemini
specific format down to the gemini (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with the normalized list[ChatMessage] type of chat messages across
providers
This is analogous to how we enable extended thinking for claude models
in research mode.
Default to medium effort irrespective of deepthought for openai
reasoning models as high effort is currently flaky with regular
timeouts and low effort isn't great.
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
- Anthropic expects a 0-1 range. Gemini & OpenAI expect a 0-2 range (see the sketch after this list)
- Anneal temperature to explore reasoning trajectories but respond factually
- Default send_message_to_model and extract_question temps to the same
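A hedged sketch of normalizing a shared temperature across providers; the halving is an assumed mapping for illustration, not necessarily the exact annealing Khoj applies:

```python
def scale_temperature(temperature: float, provider: str) -> float:
    # Gemini & OpenAI accept a 0-2 range; Anthropic only accepts 0-1
    if provider == "anthropic":
        return min(temperature / 2, 1.0)
    return min(temperature, 2.0)
```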
Enable configuring a Khoj AI model API for Vertex AI using GCP credentials.
Specifically use the api key & api base url fields of the AI Model API
associated with the current chat model to extract the gcp region, gcp
project id & credentials. This helps create an AnthropicVertex client.
The api key field should contain the GCP service account keyfile as a
base64 encoded string.
The api base url field should be of the form
`https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`
Accepting GCP credentials via the AI model API makes it easy to use
across local and cloud environments. As it bypasses the need for a
separate service account key file on the Khoj server.
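A hedged sketch of deriving the client from those two fields; the parsing details are assumptions, and it presumes the anthropic SDK accepts google credentials directly:

```python
import base64
import json

from anthropic import AnthropicVertex
from google.oauth2 import service_account


def vertex_client_from_ai_model_api(api_key: str, api_base_url: str) -> AnthropicVertex:
    # api key field: base64 encoded GCP service account keyfile
    keyfile = json.loads(base64.b64decode(api_key))
    credentials = service_account.Credentials.from_service_account_info(
        keyfile, scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    # api base url: https://{region}-aiplatform.googleapis.com/v1/projects/{project}
    region = api_base_url.removeprefix("https://").split("-aiplatform", 1)[0]
    project_id = api_base_url.rsplit("/projects/", 1)[1]
    return AnthropicVertex(region=region, project_id=project_id, credentials=credentials)
```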
- The 3.4.1 release of sentence transformers fixes offline load latency
of sentence transformer models (and Khoj) by avoiding a call to HF
- The 4.50.0 release of transformers results in a
jax error (unexpected keyword argument 'flatten_with_keys') on load.
Previously google auth library was explicitly installed only for the
cloud variant of Khoj to minimize packages installed for non
production use-cases.
But it was being implicitly installed as a dependency of an explicit
package in the default installation anyway.
Making the dependency on google auth package explicit simplifies
the conditional import of google auth in code while not incurring any
additional cost in terms of space or complexity.
Reaching >94% in research mode on SimpleQA. When answers can be
researched online, it becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.
- Fix regression: Inline images were not getting passed to the AI
models since #992
- Format inline images passed to Gemini models correctly
- Format inline images passed to Anthropic models correctly
Verified vision working with inline and url images for OpenAI,
Anthropic and Gemini models.
Resolves #1112
Previously on a slow connection you'd see the agent dropdown flicker
from undefined to the Khoj default agent on phones and other thin screens.
This is unnecessary and jarring. Populate with the default agent to remove
this issue
Previously the chat input area didn't allow inputting text while Khoj is
researching and generating response.
This change allows the user to add their next text while Khoj
responds. This should speed up interaction cycles as user can have
their next query ready to send when Khoj finishes its response.
- Trigger
Gemini 2.0 Flash doesn't always follow JSON schema in research prompt
- Details
- Use json schema to enforce generate online queries format
- Use json schema to enforce research mode tool pick format
- Support constraining Gemini model output to specified response schema
- Support constraining OpenAI model output to specified response schema
- Only enforce json output in supported AI model APIs
- Simplify OpenAI reasoning model specific arguments to OpenAI API
Previously OpenAI reasoning models didn't support stream_options and
response_format
Add reasoning_effort arg for calls to OpenAI reasoning models via API.
Right now it defaults to medium but can be changed to low or high
Previously we were encoding E2B code execution text output content as b64.
This was breaking
- The AI model's ability to see the content of the file
- Downloading the output text file with appropriately encoded content
Issue created when adding the E2B code sandbox in #1120
* Implement better bug issue template
* Fix IDs in new bug issue template
* Reduce, reorder and improve field descriptions in the bug issue template
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Claude 3.7 Sonnet is Anthropic's first reasoning model. It provides a
single model/api capable of standard and extended thinking. Utilize
the extended thinking in Khoj's research mode.
Increase default max output tokens to 8K for Anthropic models.
# Improve Code Tool, Sandbox
- Improve code gen chat actor to output code in inline md code blocks
- Stop code sandbox on request timeout to allow sandbox process restarts
- Use tenacity retry decorator to retry executing code in sandbox
- Add retry logic to code execution and add health check to sandbox container
- Add E2B as an optional code sandbox provider
# Improve Gemini Chat Models
- Default to non-zero temperature for all queries to Gemini models
- Default to Gemini 2.0 flash instead of 1.5 flash on setup
- Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code gen chat actor to improve correct code gen success,
especially for smaller models & models with limited json mode support
Allow specify code blocks inline with reasoning to try improve
code quality
Infer input files based on user file paths referenced in code.
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template
- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
- Batch sync files by size to try not exceed API request payload size limits
- Fix force sync of large vaults from Obsidian
- Add API endpoint to delete all indexed files by file type
- Fix to also delete file objects when call DELETE content source API
Previously if you tried to force sync a vault with more than 1000
files it would only end up keeping the last batch because the PUT API
call would delete all previous entries.
This change calls DELETE for all previously indexed data first, followed by
a PATCH to index current vault on a force sync (regenerate) request.
This ensures that files from previous batches are not deleted.
- Delete file objects on deleting content by source via API
Previously only entries were deleted, not the associated file objects
- Add new db adapter to delete multiple file objects (by name)
- Set KHOJ_ALLOWED_DOMAIN to the domain that Khoj is accessible on
from the host machine. This can be the internal IP or domain of the
host machine.
It can be used by your load balancer/reverse_proxy to access Khoj.
For example, if the load balancer service is in the khoj docker
network, KHOJ_ALLOWED_DOMAIN will be `server` (i.e. the service name).
- Set KHOJ_DOMAIN to your externally accessible domain or IP to avoid
CSRF trusted origin or unset cookie issues when trying to access the
khoj admin panel.
Resolves #1114
- Also, when in a not-subscribed state, fall back to the default model when chatting with an agent
- With conversion, create a brand new agent from inside the chat view that can be managed separately
Make it easier to determine which model you're chatting with, and to effortlessly update said model from within a given chat.
In this change, we introduce a side bar that allows users to quickly change their chat model, tools, custom instructions, and file filters, directly within the chat view. This removes the need for setting up custom agents for simple instructions and mitigates the requirement to go to the settings page to verify the chat model in action.
The settings page will still configure a per-user *default*, but the sidebar will allow for greater customization based on the needs of a conversation.
We also extend the chat model to include more attributes that help users make decisions about model selection, including `strengths` and `description`. This can help people quickly understand which model might work best for their use case.
Currently, it's rather opaque and difficult to substantially browse through the uploaded knowledge base. Effectively, you can only do this through the small file modal in the settings page.
Update to include all indexed files in the search page for viewing & deletion. Function to delete all files is still in the settings page.
Add a migration that associates file objects with `entry`s using a foreign key. Add a migration command that deletes dangling fileobjects.
## Description
Added file path autocompletion to enhance the search experience in Khoj. Users can now easily search for specific files by typing "file:" followed by the file name, with real-time suggestions appearing as they type.
## Changes
- Added new API endpoint `/api/file-suggestions` to get file path suggestions
- Enhanced search UI with dropdown suggestions for file paths
- Implemented debounced search to optimize API calls
- Added keyboard (Enter) and mouse click support for selecting suggestions
## Features
- Type "file:" to trigger file path suggestions
- Real-time filtering of suggestions as you type
- Top 10 alphabetically sorted suggestions
- Case-insensitive matching
- Keyboard and mouse interaction support
- Clear visual feedback with a dropdown UI
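A hedged sketch of what the suggestion endpoint could look like with FastAPI; the in-memory path list stands in for the real index:

```python
from fastapi import APIRouter, Query

api = APIRouter()

indexed_file_paths: list[str] = []  # stand-in for the user's indexed file paths


@api.get("/api/file-suggestions")
async def file_suggestions(q: str = Query(default="")) -> list[str]:
    # Case-insensitive match, top 10 alphabetically sorted suggestions
    matches = [path for path in indexed_file_paths if q.lower() in path.lower()]
    return sorted(matches)[:10]
```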
A hidden agent basically allows each individual conversation to maintain custom settings, via an agent that's not exposed to the traditional functionalities allotted for manually created agents (e.g., browsing, maintenance in agents page).
This will be hooked up to the front-end such that any conversation that's initiated with the default agent can then be given custom settings, which in the background creates a hidden agent. This allows us to repurpose all of our existing agents infrastructure for chat-level customization.
Previously emails with URL special characters would not get
successfully identified for login. Account creation was fine as the
email was in the POST request body. But login with such emails did not
work because the query params were not escaped before being sent to
the server.
This change escapes both the code and email in the login URL sent to
the server. So login with emails containing special characters like
`email+khoj@gmail.com` works. It fixes both the login URL sent by the
web app directly and the magic link emailed to users.
This change also fixes the accessibility issue of having a DialogTitle
in DialogContent for screen readers.
Resolves #1090
We've been having issues generating diagrams of any degree of complexity with Excalidraw. By contrast, LLMs are able to handle Mermaid.js syntax a lot better, as it's much more forgiving and has a simpler declarative style. Refer to https://mermaid.js.org/.
Update so that new diagrams are generated with Mermaid.js, while old diagrams generated with Excalidraw can still be viewed.
The fix for #1082 pushed adding the `data:image/webp;base64` prefix
to base64 images down into the server image gen API. But the code in the
Obsidian and Desktop clients still adds these prefixes.
This change stops the Desktop and Obsidian clients from adding the prefix
as it is being handled by the API now. It should resolve showing
images inline in those clients as well
Previously we'd use the general openai client, even if the image
generation model has a different api key and base url set.
This change uses the openai config of the image generation model when
set. Otherwise it falls back to using the first openai api provider set
This change adds the ability to use OpenAI, Azure OpenAI or any embedding model exposed behind an OpenAI compatible API (like Ollama, LiteLLM, vLLM etc.).
Khoj previously only supported HuggingFace embedding models running locally on device or via the HuggingFace inference API endpoint. This allows using commercial embedding models to index your content with Khoj.
Previously if the call to this tool AI failed, the API call would
fail ungracefully on the server. This would leave the client hanging in
a weird state (e.g. with the spinner running on the web app and no
indication of the issue).
Also, do not show filters in query debug log lines when the query has no filters
- Background
Access to the clipboard API is disabled by certain browsers in
non-localhost http scenarios for security reasons.
So the copy API key button doesn't work when khoj is self-hosted
with authentication enabled at a non-localhost domain
- Change
This change enables copying API keys by manual text highlight + copy
when the copy button is disabled
Resolves #1070
This change makes Automations (and possibly other entrypoints) use the configured OpenAI-compatible server if that has been set. Without this change it tries to use the hardcoded OpenAI provider.
- One current issue in the Khoj application is that managing the files referenced as the user's knowledge base is slightly opaque, and they are difficult to access
- Add a migration for associating the fileobjects directly with the Entry objects, making it easier to get data via foreign key
- Add the new page that shows all indexed files in the search view, also allowing you to upload new docs directly from that page
- Support new APIs for getting / deleting files
- Just say we're using the default config. The old khoj.yml settings mechanism
isn't standard, so not having a configured khoj.yml isn't a concern
- Deep link to desktop download instead of the whole download page as
android etc. are also on it, which don't help with syncing docs
OpenAI chat models deployed on Azure are (ironically) not OpenAI API compatible endpoints.
This change enables using OpenAI chat models deployed on Azure with Khoj.
The github integration has not been tested and may still be broken.
This change at least makes it easier to add repositories without
needing a PAT token if/when it does work.
- Add the mermaid package and apply front-end parsing for interpreting the diagrams. Retain processing of the excalidraw type for backwards compatibility
- Translated comments from French to English for better accessibility and understanding.
- Updated CSS comment for loading animation to reflect the change in language.
- Enhanced code readability by ensuring consistent language usage across multiple files.
- Improved user experience by clarifying the purpose of various functions and settings in the codebase.
Fix AttributeError: 'NoneType' object has no attribute 'model_extra'
* `cost = chunk.usage.model_extra.get("estimated_cost", 0) if hasattr(chunk, "usage") and chunk.usage else 0`  # Estimated costs returned by DeepInfra API
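A hedged sketch of the none-safe access pattern, assuming an OpenAI-style streaming chunk whose `usage.model_extra` may itself be `None`:

```python
def extract_estimated_cost(chunk) -> float:
    """Safely read DeepInfra's estimated_cost from an OpenAI-style stream chunk."""
    usage = getattr(chunk, "usage", None)
    # guard both usage and model_extra being None before calling .get()
    if usage and usage.model_extra:
        return usage.model_extra.get("estimated_cost", 0)
    return 0
```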
- Encode article urls in filename indexed in Khoj KB
Makes it easier for humans to compare and trace retrieval performance
by looking at logs than by using a content hash (which was previously
explored)
- Added visual loading indicators to the search modal for improved user experience during search operations.
- Implemented logic to check if search results correspond to files in the vault, with color-coded results for better clarity.
- Refactored the getSuggestions method to handle loading states and abort previous requests if necessary.
- Updated CSS styles to support new loading animations and result file status indicators.
- Improved the renderSuggestion method to display file status and provide feedback for files not in the vault.
- Added a new setting for users to configure the sync interval in minutes, allowing for more flexible automatic synchronization.
- Introduced methods to start and restart the synchronization timer based on the configured interval.
- Updated the synchronization logic to use the user-defined interval instead of a fixed 60 minutes.
- Improved code readability and organization by refactoring the sync timer logic.
- Added a new setting to manage sync folders, allowing users to specify which folders to sync or to sync the entire vault.
- Implemented a modal for folder suggestions to facilitate folder selection.
- Updated the folder list display to show currently selected folders with options to remove them.
- Improved CSS styles for chat interface and folder list for better user experience.
- Refactored code for consistency and readability across multiple files.
- Introduced a new command 'Sync new changes' to allow users to manually synchronize new modifications.
- The command updates the content index without regenerating it, ensuring only new changes are synced.
- User-triggered notifications are displayed upon successful sync.
- Format AI response to send in automation email
- Let Khoj infer chat query based on user automation query
- Decide if automation emails should be sent based on response
- Fix the `to_notify_or_not` decider AI
- Ask reason before decision to improve to_notify decider AI
- Show error message on web app when fail to create/update automation
Previously we sent the AI response directly. This change post
processes the AI response with the awareness that it is to be sent to
the user as an email to improve rendering and quality of the emails.
This tries to decouple the automation query from the chat query. So
the chat model doesn't have to know it is running in an automation
context or figure out how to notify the user or send the automation
response. It just has to respond to the AI generated `query_to_run`
corresponding to the `scheduling_request` automation by the user.
For example, a `scheduling_request` of `notify me when X happens`
results in the automation calling the chat api with a `query_to_run`
like `tell me about X` and deciding whether to notify based on information
gathered about X from the scheduled run. If these two are not
decoupled, the chat model may respond with how it can notify about X
instead of just asking about it.
Swap query_to_run with scheduling_request on the automation web page
- Rather than chunky generic cards, make the suggested actions more action oriented, around the problem a user might want to solve. Give them follow-up options. Design still in progress.
- It was assumed everywhere, de facto, that if authenticatedData is null the user is not logged in. This isn't true because the data can still be loading. Update the hook to send additional states.
- Bonus: Delete model picker code and a slew of unused imports.
- Add support for seeing all steps of the agent modification flow via tabs at the top of the modal
- Separate knowledge base & tool selection into two separate parts
Use the [shadcn sidebar](https://ui.shadcn.com/docs/components/sidebar#sidebarmenusub) across Khoj and standardize how the side panel experience works across the app. This helps us generalize the code better, while re-using the same components.
Deprecates current usage of the chat history side panel, replacing it with the new `appSidebar.tsx` component.
We'll eventually move out the `Manage Files` section and move it into a separate panel dedicated to chat-level controls.
New conversations have a slug, but older conversations may not. This
change allows those older conversations to still be shareable by using
a random uuid to construct their url instead
- Some of the instructions were duplicated (e.g. illegal, harmful)
- The requested return format was inconsistent
- The safety prompt felt overly prudish, which lowered its utility.
Make it laxer for now, add checks later if required
- Print hash in CI to ease verifying that the CI-built python package
matches the khoj package published on PyPI
- Newer pypi publish github action should speed up workflow by ~30s
Major
---
Previously we couldn't enable research mode or use other slash
commands in automated tasks.
This change separates determining if a chat query is triggered via
automated task from the (other) conversation commands to run the query
with.
This unlocks the ability to enable research mode in automations
apart from other variations like /image or /diagram etc.
Minor
---
- Clean the code to get data sources and output formats
- Have some examples shown on automations page run in research mode now
This allows online search to work out of the box again
for self-hosting users, as no auth/API key setup is required.
Docker users do not need to change anything in their setup flow.
Direct installers can setup Searxng locally or use public instances if
they do not want to use any of the other providers (like Jina, Serper)
Resolves #749. Resolves #990
The welcome, feedback and automation emails were still using the Khoj
domain, which wouldn't work for self-hosted users with their Resend key
Resolves #1004
- Previous name was incorrectly plural though it defined only a single model
- Rename chat model table field to name
- Update documentation
- Update referencing functions and variables to match the new name
- Move khoj message border to left like in web ui
- Remove user, sender emojis and name
- Ensure conversation title always at top of chat sessions view,
even if no chat sessions loaded yet, instead of causing layout shift
on chat sessions load
Improve handling of multiple output modes
- Use the generated descriptions / inferred queries to supply context to the model about what it's created and give a richer response
- Stop sending the generated image in user message. This seemed to be confusing the model more than helping.
- Collect generated assets in a structured objects to provide model context. This seems to help it follow instructions and separate responsibility better
- Also, rename the OpenAI converse method to converse_openai to follow patterns with other providers
Previously ids were used (by default) for model display strings.
This made it hard to select chat model options, server chat settings
etc. in the admin panel dropdowns.
This change uses more recognizable names for the DB objects to ease
selection in dropdowns and display in general on the admin panel.
* Rename OpenAIProcessorConversationConfig to the more apt AiModelAPI
The DB model name had drifted from what it is being used for:
a general chat api provider that supports chat api providers like
anthropic and google chat models apart from openai based chat models.
This change renames the DB model and updates the docs to remove this
confusion.
The Ai Model Api name covers most use-cases including chat, stt, image generation etc.
Simplify the desktop app
- Make the desktop app mainly a file-syncing client for users who have lots of documents that they need to share with Khoj. This is because the web app provides a fairly robust chat client which can be used by anyone on their computer.
- The chat client in the desktop app had significantly drifted from our current brand / theme, and didn't provide enough value-add to update. Later, we will make it easier to install the existing web app as a desktop PWA.
Currently, Khoj has terminal states with respect to what assets it outputs. We limit it to image, text, and excalidraw diagram. This limitation is unnecessary and provides undue constraints for creating more dynamic, diverse experiences. For instance, we may want the chat view to morph for document editing or generation, in which case having limited output modes would be a detriment.
This change allows us to emit generated assets and then continue on to more text generation in final response. It forces text response for all messages. It adds a new stream event, GENERATED_ASSETS, which holds content that the AI is emitting in response to the query.
Overview
- The default django admin panel UI looks pretty dated and didn't
have any Khoj specific branding
- Used the Unfold Django admin panel theme for a modern look
- Used the Khoj logo and name in Admin panel title, headings, favicons
Details:
All models shown on Admin panel need to inherit from unfold's
ModelAdmin to get styling applied. So
- Make all models on Admin panel inherit from unfold's ModelAdmin
- Subclassed UserAdmin to inherit from unfold's ModelAdmin
- Deregistered the unused Auth Group model from the Admin panel
We can add it back when it's actually used. Avoids confusion for now
- Explicitly register DjangoJobExecution on admin panel and again make
it inherit from the unfold.admin.ModelAdmin
- Use bubblewrap generated splash screen, notification icons from
1200x1200 high res khoj icon in assets.khoj.dev.
- Discard bubblewrap generated launcher icons.
- Fix orientation to respect phone orientation settings. The previous "any" value did not.
- Add 512, 192 Khoj maskable icons to web app manifest for android rendering
- Add id, categories etc suggested by pwabuilder
- Use higher quality icon images for splash screen than what
bubblewrap creates by default
Encode api key in header, POST request body and GET query param for
search as utf-8 to avoid the multibyte char in request issue when
making API calls from khoj.el to khoj server.
Resolves #935
### Issue
- Paths with / are converted to \\ on Windows by the `Path` operator.
- The `markdown_to_entries` module was trying to normalize file paths with `Path` for some reason.
This would store the file paths in the DB Entry differently than the file to entries map if Khoj ran on Windows.
That'd result in a KeyError when trying to look up the entry file path from `file_to_text_map` in the `text_to_entries:update_embeddings()` function.
### Fix
- Removing the unnecessary OS-dependent Path normalization in `markdown_to_entries` should keep the file path storage consistent across the `file_to_text_map` var, `FileObjectAdaptor` and `Entry` DB tables on Windows for Markdown files as well.
This issue will affect users hosting the Khoj server on Windows and attempting to index markdown files.
Resolves #984
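The OS-dependent behavior is easy to reproduce with `pathlib` alone:

```python
from pathlib import PurePosixPath, PureWindowsPath

# the same logical path renders differently per OS flavor of Path
print(PurePosixPath("notes/todo.md"))    # notes/todo.md
print(PureWindowsPath("notes/todo.md"))  # notes\todo.md
# storing one form in the DB while keying on the other raises a KeyError on lookup
```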
- Add type hints to improve maintainability of the stabilized indexing code
- It shouldn't be necessary to wrap string variables in an f-string
This change aims to improve code quality. It should not affect
functionality.
Issue
- Paths with / are converted to \\ on Windows by the Path operator.
- The markdown to entries method for some reason was doing this.
This would store the file paths in the DB entry differently than the file
to entries map. Resulting in a KeyError when trying to look up the
entry file path from file_to_text_map in the
text_to_entries:update_embeddings() function.
Fix
- Removing the unnecessary OS-dependent Path normalization in
markdown_to_entries should keep the file path storage consistent
across file_to_text_map var, FileObjectAdaptor, Entry DB tables on
Windows for Markdown files as well
This issue would only affect users hosting the Khoj server on Windows
and attempting to index markdown files.
Resolves #984
- Added it to the Django migrations so that it auto-triggers when someone updates their server and starts it up again for the first time. This will require that they update their clients as well in order to view/consume image content.
- Remove server-side references in the code that parse the text-to-image intent, as they will no longer be necessary given the chat logs will be migrated
This avoids unnecessarily throwing an internal server error when the
user tries to sign up using multiple mechanisms (e.g. first by email, then
by google oauth)
- Improve escaping to load complex json objects
- Fallback to a more forgiving [json5](https://json5.org/) loader if `json.loads` cannot parse complex json str
This should reduce failures to pick research tool and run code by agent
The JSON5 spec is more flexible, so try loading with a fast json5 parser if
the stricter json.loads from the standard library can't load the
raw complex json string into a python dictionary/list
Gemini doesn't work well when trying to output json objects. Using it
to output raw json strings with complex, multi-line structures
requires more intense clean-up of raw json string for parsing
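A minimal sketch of the fallback, assuming `pyjson5` as the fast json5 parser:

```python
import json

import pyjson5  # assumed fast json5 parser dependency

def load_complex_json(raw: str):
    """Parse raw LLM output as strict json, falling back to the laxer json5 spec."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return pyjson5.loads(raw)
```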
- Use pre-built wheels for torch and llama-cpp-python
- Install and link musl as it's used by llama-cpp-python pre-built
wheel instead of glibc
- Join Install git and Install Dependencies steps in pytest workflow
To remove unnecessary steps
- Building the arm64 image on an ubuntu arm64 runner reduces the `yarn build`
step time by 75%, from 12mins to 3mins.
- This is because no QEMU emulation of arm64 on x86 is required now
- Parallelizing x64 and arm64 platform builds halves build time on top
- Revert to the standard ubuntu-latest runner as the large x64 runner
doesn't give much more speed improvement
This results in an effective additional 50%-66% reduction in build time
on top of #987.
So a full dockerize workflow run now takes *10 mins* vs the previous 35+ mins.
This is a total *72% improvement* in max dockerize run time.
Get additional speed improvements when docker layer cache hit.
## Objective
Improve build speed and size of khoj docker images
## Changes
### Improve docker image build speeds
- Decouple web app and server build steps
- Build the web app and server in parallel
- Cache docker layers for reuse across dockerize github workflow runs
- Split Docker build layers for improved cacheability (e.g separate `yarn install` and `yarn build` steps)
### Reduce size of khoj docker images
- Use an up-to-date `.dockerignore` to exclude unnecessary directories
- Do not install cuda python packages for cpu builds
### Improve web app builds
- Use consistent mechanism to get fonts for web app
- Make tailwind extensions production instead of dev dependencies
- Make next.js create production builds for the web app (via `NODE_ENV=production` env var)
The current fix should improve Khoj responses when charts are in the response
context. It truncates code context before sharing it with response chat actors.
Previously Khoj would respond saying it could not create a chart
but then have a generated chart in its response in default mode.
The truncate code context step was added to the research chat actor for
decision making, but it wasn't added to the conversation response
generation chat actors.
When khoj generated charts with code for its response, the images in
the context would exceed context window limits.
So the truncation logic would drop all past context, including chat
history and context gathered for the current response.
This would result in the chat response generator 'forgetting' everything for
the current response when code generated images or charts were in the response context.
It needs to be used across routers and processors. Its living in the
run_code tool makes it hard to use in other chat provider contexts
due to circular dependency issues created by the
send_message_to_model_wrapper func
Previous changes to depend on just the PROMPTRACE_DIR env var instead
of the KHOJ_DEBUG or verbosity flag were partial/incomplete.
This fix adds all the changes required to only depend on the
PROMPTRACE_DIR env var to enable/disable prompt tracing in Khoj.
Pass your domain cert files via the --sslcert, --sslkey cli args.
For example, to start khoj at https://example.com, you'd run:
KHOJ_DOMAIN=example.com khoj --sslcert example.com.crt --sslkey example.com.key --host example.com
This sets up ssl certs directly with khoj without requiring a
reverse proxy like nginx to serve khoj behind an https endpoint for
simple setups. More complex setups should, of course, still use a
reverse proxy for efficient request processing
- Track, return cost and usage metrics in chat api response
Track input, output token usage and cost of interactions with
openai, anthropic and google chat models for each call to the khoj chat api
- Collect, display and store costs & accuracy of eval run currently in progress
This provides more insight into eval runs during execution
instead of having to wait until the eval run completes.
- Previously when the settings list became long the dropdown height would
overflow the screen height. Now its max height is clamped and it y-scrolls
- Previously the dropdown content would take the width of its content. This
meant the menu could sometimes be less wide than the button, which
felt strange. Now the dropdown content is at least as wide as the parent button
- Track input, output token usage and cost for interactions
via chat api with openai, anthropic and google chat models
- Get usage metadata from OpenAI using stream_options (see the sketch after this list)
- Handle openai proxies that do not support passing usage in response
- Add new usage, end response events returned by chat api.
- This can be optionally consumed by clients at a later point
- Update streaming clients to mark message as completed after new
end response event, not after end llm response event
- Ensure usage data from final response generation step is included
- Pass usage data after llm response complete. This allows gathering
token usage and cost for the final response generation step across
streaming and non-streaming modes
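A minimal sketch of pulling usage metadata from an OpenAI stream via `stream_options`, assuming the openai python SDK:

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries usage metadata
)
for chunk in stream:
    if chunk.usage:  # usage is None on all but the final chunk
        print(chunk.usage.prompt_tokens, chunk.usage.completion_tokens)
```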
- Previously online chat actors, director tests only worked with openai.
This change allows running them for any supported online provider
including Google, Anthropic and OpenAI.
- Enable online/offline chat actor, director in two ways:
1. Explicitly setting KHOJ_TEST_CHAT_PROVIDER environment variable to
google, anthropic, openai, offline
2. Implicitly by the first API key found from openai, google or anthropic.
- Default offline chat provider to use Llama 3.1 3B for faster, lower
compute test runs
- Set output mode to single string. Specify output schema in prompt
- Both of these should encourage the model to select only 1 output mode
instead of encouraging it in the prompt too many times
- The output schema should also improve schema following in general
- Standardize variable, func name of io selector for readability
- Fix chat actors to test the io selector chat actor
- Make chat actor return sources, output separately for better
disambiguation, at least during tests, for now
- Evaluate khoj on 200 random questions from each of the Google FRAMES and OpenAI SimpleQA benchmarks across *general*, *default* and *research* modes
- Run eval with Gemini 1.5 Flash as test giver and Gemini 1.5 Pro as test evaluator models
- Trigger eval workflow on release or manually
- Make dataset, khoj mode and sample size configurable when triggered via manual workflow
- Enable Web search, webpage read tools during evaluation
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
Previously, we'd replace the generated message with an error message
when message generation stopped via stop button on chat page of web app.
So the partially generated message (which could be useful) gets lost.
This change just stops generation, while keeping the generated
response so any useful information from the partially generated
message can be retrieved.
- Allows managing chat models in the OpenAI proxy service like Ollama.
- Removes need to manually add, remove chat models from Khoj Admin Panel
for these OpenAI compatible API services when enabled.
- Khoj still maintains the chat model configs within Khoj, so they can
be configured via the Khoj admin panel as usual.
Previously Jina search didn't need an API key. Now that it does,
we re-use the API key set in the Jina web scraper config,
otherwise fall back to using JINA_API_KEY from the environment, if
either is present.
Resolves #978
- Integrate with Ollama or other openai compatible APIs by simply
setting the `OPENAI_API_BASE` environment variable in docker-compose etc.
- Update docs on integrating with Ollama, openai proxies on first run
- Auto populate all chat models supported by openai compatible APIs
- Auto set vision enabled for all commercial models
- Minor
- Add huggingface cache to khoj_models volume. This is where chat
models and (now) sentence transformer models are stored by default
- Reduce verbosity of yarn install of the web app. Otherwise we hit the docker
log size limit & it stops showing remaining logs after the web app install
- Suggest `ollama pull <model_name>` to start it in background
- Update the latest initialization with the new claude 3.5 sonnet and haiku models
- Update to set vision enabled for google and anthropic models by
default. Previously we didn't support this, but we've supported it for a
month or two now
- Just load the raw csv from OpenAI bucket. Normalize it into FRAMES format
- Improve docstring for frames datasets as well
- Log the load dataset perf timer at info level
- Explicitly adding a slash command is a higher priority intent than
research mode being enabled in the background. Respect that for a
more intuitive UX flow.
- Explicit slash commands do not currently work in research mode.
You have to turn research mode off to use other slash commands. This
is strange and unnecessary, given the intent priority is clear.
Previously errors would get eaten up and the model wouldn't see
anything. And the model wouldn't be allowed to re-run the same query-tool
combination in the next iteration.
This update should give it insight into why it didn't get a result. So
it can make an informed (hopefully better) decision on what to do next.
And re-run the previous query if appropriate.
Previously when a call to the online search API etc. failed, it'd error out
of the response to the query in research mode. Khoj should skip tool use that
iteration but continue trying to respond.
Previously chatml messages were just strings, now they can be lists of
strings or lists of dicts as well.
- Use json serialization to manage their variations and truncate them
before printing for context.
- Put logic in single function for use across chat models
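A small sketch of the idea (the function name and truncation limit are assumptions):

```python
import json

def truncate_message_content(content, max_chars: int = 500) -> str:
    """Uniformly serialize string or list-shaped chatml content, then truncate."""
    serialized = content if isinstance(content, str) else json.dumps(content)
    return serialized[:max_chars]
```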
- Default to evaluation decision of None when either agent or
evaluator llm fails. This fixes accuracy calculations on errors
- Fix showing color for decision True
- Enable arg flags to specify output results file paths
Previously chatml messages were just strings.
Since gemini and anthropic models always have messages as lists of
strings, truncate those strings instead of the list of message content
Removing binary data and truncating large data in output files
generated by code runs should improve speed and cost of research mode
runs with large or binary output files.
Previously binary data in code results was passed around in iteration
context during research mode. This made the context inefficient
because models have limited efficiency and reasoning capabilities over
b64 encoded image (and other binary) data and would hit context limits
leading to unnecessary truncation of other useful context
Also remove image data when logging output of code execution
- Allow passing user files as input into code sandbox for analysis
- Update prompt to give more example of complex, multi-line code
- Simplify logic for model. Run one program at a time,
instead of allowing model to run multiple programs in parallel
- Show code generated charts and docs in the Reference pane of the web app and make them downloadable
- Add a border below heading
- Show code snippet in pre block
- Overflow-x when the reference side panel is open to allow seeing the whole
text via x-scroll
- Align header, body position of reference cards with each other
- Only show filename in doc reference cards at message bottom.
Show full file path in hover and reference side panel
- Improve rendering code reference with better icons, smaller text and
different line clamps for better visibility
- Show code output files as sub card of code card in reference section
- Allow downloading files generated by code instead of rendering it in
chat message directly
- Show executed code before online references in reference panel
- Fix to render code generated chart with images, excalidraw diagrams
- Fix to save code context to chat history in image, diagram output modes
- Fix bug in image markdown being wrapped twice in markdown syntax
- Render newlines in code references shown on the chat page of the web app
Previously newlines weren't getting rendered. This made the code
executed by Khoj hard to read in references. This change fixes that.
`dangerouslySetInnerHTML` usage is justified as the rendered code
snippet is being sanitized by DOMPurify before rendering.
- Run one program at a time, instead of allowing model to pass
multiple programs to run in parallel to simplify logic for model
- Update prompt to give more example of complex, multi-line code
- Allow passing user files as input into code sandbox for analysis
- Log code execution timer at info level to evaluate execution latencies
in production
- Type the generated code for easier processing by caller functions
Support including file attachments in the chat message
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).
This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode, as even with
different online search queries you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in the client response.
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generating a
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
These deduplications will improve the speed, cost & quality of research mode
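A minimal sketch of the global deduplication idea, assuming online results keyed by query with `link`-identified organic results (the field names are assumptions):

```python
def deduplicate_online_results(iteration_results: list[dict]) -> dict:
    """Keep only first-seen links across all research iterations."""
    seen_links: set = set()
    deduped: dict = {}
    for online_results in iteration_results:
        for query, results in online_results.items():
            unique = [r for r in results if r.get("link") not in seen_links]
            seen_links.update(r["link"] for r in unique if "link" in r)
            if unique:
                deduped.setdefault(query, []).extend(unique)
    return deduped
```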
Previously the whole research mode response would fail if the pick
next tool call to chat model failed. Now instead of it completely
failing, the researcher actor is told to try again in next iteration.
This allows for a more graceful degradation in answering a research
question even if a (few?) calls to the chat model fail.
The Jina search API returns the content of all webpages in search results.
Previously the code wouldn't remove content beyond the max_webpages_to_read
limit set. Now, webpage content in organic results is explicitly
removed beyond the requested max_webpages_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
This fixes chat with old chat sessions. Fixes an issue where old Whatsapp
users can't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to user as server wouldn't be
able to handle HTTP exception in the middle of streaming.
Catch exception and render it as LLM response message instead for
visibility into command rate limiting to user on client
Log rate limit messages for all rate limit events on the server as info
messages
Convert exception messages into first person responses by Khoj to
prevent breaking the fourth wall and provide more details on what
happened and possible ways to resolve them.
Previously the batch start index wasn't being passed so all batches
started in parallel were showing the same processing example index
This change doesn't impact the evaluation itself, just the index shown
of the example currently being evaluated
- The document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use an UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.
- Why
We need better, automated evals to measure performance shifts of Khoj
across prompt, model and capability changes.
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of AI agents. It's a good starter benchmark to evaluate Khoj.
- Details
This PR adds an eval script to evaluate Khoj responses on the FRAMES
benchmark prompts against the ground truth provided by it.
The script allows configuring sample size, batch size, and sampling queries from the
eval dataset.
Gemini is used as an LLM Judge to auto grade Khoj responses vs ground truth
data from the benchmark.
This was previously required, but now it's only useful for more
advanced settings, not typical for self-hosting users.
With recent updates, the user's selected chat model is used for both
Khoj's train of thought and response. This makes it easy to
switch your preferred chat model directly from the user settings
page and not have to update this in the admin panel as well.
Reflect these code changes in the docs, by removing the unnecessary
step for self-hosted users to create a server chat setting when using
an OpenAI proxy service like Ollama, LiteLLM etc.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script for a new model needs to be updated to accommodate full regeneration.
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of an agent.
The script uses Gemini as an LLM Judge to evaluate Khoj responses to
the FRAMES benchmark prompts against the ground truth provided by it.
- Improve chat actors and their prompts for research mode.
- Add documentation to enable the code tool when self-hosting Khoj
- Edit Chat Messages
- Store Turn Id in each chat message.
- Expose API to delete chat message.
- Expose delete chat message button to turn delete chat message from web app
- Set LLM Generation Seed for Reproducible Debugging and Testing
- Setting a seed for LLM generation is supported by Llama.cpp and OpenAI models.
This can (somewhat) constrain LLM output
- Getting fixed responses for fixed inputs helps test and debug longer reasoning chains like those used in advanced reasoning
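For example, a minimal sketch with the OpenAI SDK, which accepts a `seed` parameter on chat completions:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the research plan."}],
    seed=42,        # fixed seed makes repeated runs (mostly) reproducible
    temperature=0,  # low temperature further constrains output variability
)
print(response.choices[0].message.content)
```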
## Overview
Khoj can now go into research mode and use a python code interpreter. These are experimental features that are being released early for feedback and testing.
- Research mode allows Khoj to dynamically select the tools it needs to best answer the question. It is also allowed more iterations to get to a satisfactory answer. Its more dynamic train of thought is shown to improve visibility into its thinking.
- Adding the ability for Khoj to use a python code interpreter is an adjacent capability. It can help Khoj do some data analysis and generate charts for you. A sandboxed python environment to run code is provided using [cohere-terrarium](https://github.com/cohere-ai/cohere-terrarium?tab=readme-ov-file), [pyodide](https://pyodide.org/).
## Analysis
Research mode (significantly?) improves Khoj's information retrieval for more complex queries requiring multi-step lookups but takes longer to run. It can research for longer, requiring less back-n-forth with the user to find an answer.
Research mode gives most gains when used with more advanced chat models (like o1, 4o, new claude sonnet and gemini-pro-002). Smaller models improve their response quality but tend to get into repetitive loops more often.
## Next Steps
- Get community feedback on research mode. What works, what fails, what is confusing, what'd be cool to have.
- Tune Khoj's capabilities for longer autonomous runs and to generalize across a larger range of model sizes
## Miscellaneous Improvements
- Khoj's train of thought is saved and shown for all messages, not just the latest one
- Render charts generated by Khoj and code running using the code tool on the web app
- Align chat input color to currently selected agent color
- Dedent code for readability
- Use a better name for the in-research-mode check
- Continue to remove inferred summarize command when multiple files in
file filter even when not in research mode
- Continue to show select information source train of thought.
It was removed by mistake earlier
## Overview
Use git to capture prompt traces of khoj's train of thought. View, analyze and debug them using your favorite git client (e.g vscode, magit).
- Each commit captures an interaction with an LLM
The commit writes the query, response and system message each to a separate file in the repo.
The commit message captures the chat model, Khoj version and other metadata
- Each conversation turn can have multiple interactions with an LLM (e.g Khoj's train of thought)
- Each new conversation turn forks from and merges back into its conversation branch
- Each new conversation branches from the user branch
- Each new user branches from root commit on the main branch
## Usage
1. Set `KHOJ_DEBUG=true` or start khoj in very verbose mode with `khoj -vv` to turn on prompt tracing
2. Chat with Khoj as usual
3. Open the promptrace git repo to view the generated prompt traces using your favorite git porcelain.
The Khoj prompt trace git repo is created at `/tmp/khoj_promptrace` by default. You can configure the prompt trace directory by setting the `PROMPTRACE_DIR` environment variable.
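Besides a git porcelain, the trace repo can also be scripted over; a small sketch using `gitpython`, assuming the default trace location:

```python
import git  # gitpython, used by khoj to write prompt traces

repo = git.Repo("/tmp/khoj_promptrace")
for commit in repo.iter_commits(max_count=5):
    # each commit is one interaction with an LLM
    print(commit.hexsha[:8], commit.summary)
```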
## Implementation
- Add utility functions to capture prompt traces using git (via `gitpython`)
- Make each model provider in Khoj commit their LLM interactions with promptrace
- Weave chat metadata from chat API through all chat actors and commit it to the prompt trace
- Update the online query generator prompt to match the formatting of
extract questions
- Separate iteration results by newline
- Improve webpage and online tool descriptions
- Allow server to start if loading embedding model fails with an error.
This allows fixing the embedding model config via admin panel.
Previously server failed to start if embedding model was configured
incorrectly. This prevented fixing the model config via admin panel.
- Convert boolean strings in the config json to actual booleans when passed
via admin panel as json, before passing to model and query configs
- Only create the default model if no search model is configured by admin.
Return the first created search model if one has been configured by admin.
Models were getting a bit confused about who is searching for whose
information. Using third person to explicitly call out on whose behalf
these searches are running seems to perform better across
models (gemini's, gpt etc.), even if the role of the message is user.
Use a placeholder for newlines in json object values until the json is parsed
and values extracted. This is useful when the research mode model outputs
multi-line codeblocks in queries etc.
The Anthropic API doesn't have the ability to enforce a response with a valid
json object, unlike all the other model types.
While the model will usually adhere to json output instructions,
this step is meant to more strongly encourage it to just output a json
object when a response_type of json_object is requested.
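One common way to do this, sketched below as an assumption about the approach rather than the exact implementation, is to prefill the assistant turn with `{` so the model continues the json object:

```python
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Return a json object with keys 'query' and 'mode'."},
        # prefilled assistant turn nudges claude to emit only the json body
        {"role": "assistant", "content": "{"},
    ],
)
response_json = "{" + message.content[0].text
```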
Separate conversation history with user from the conversation history
between the tool AIs and the researcher AI.
Tool AIs don't need the top level conversation history; that context is
meant for the researcher AI.
The invoked tool AIs need previous attempts at using the tool in this
research run's iteration history to better tune their next run.
Or at least that is the hypothesis, to break the models' looping.
Models weren't generating a diverse enough set of questions. They'd do
minor variations on the original query. What is required is asking
queries from a bunch of different lenses to retrieve the requisite
information.
This prompt update shows the AIs the breadth of questions to ask by
example and instruction. Seems like performance improved, based on vibes
- Improve mobile friendliness with new research mode toggle, since chat input area is now taking up more space
- Remove clunky title from the suggestion card
- Fix fk lookup error for agent.creator
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- The model's ability to differentiate data from the user query.
That should improve response quality and reduce prompt injection
probability
- Truncation logic, making it simpler and more robust.
When the context window is hit, we can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until we reach the current user query.
The complex, brittle logic to extract the user query from context in
the last user message isn't required.
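A minimal sketch of that pop-to-truncate loop, assuming messages ordered oldest-first with the current user query last and a `count_tokens` callable:

```python
def truncate_chat_history(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
    """Drop the oldest messages (context, user, assistant per turn) until the
    conversation fits the context window; the current user query is kept."""
    while len(messages) > 1 and sum(count_tokens(m["content"]) for m in messages) > max_tokens:
        messages.pop(0)
    return messages
```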
Align context passed to offline chat model with other chat models
- Pass context in separate message for better separation between user
query and the shared context
- Pass filename in context
- Add online results for webpage conversation command
A context role was added to allow changing message truncation order based
on the context role as well.
Revert it for now since this is not currently being done.
Previously the model would rarely read webpages after a webpage search. Need
the model to read webpages more regularly for deeper research and to stop
getting stuck in repetitive online search loops
Previously passing online results as a json dump in prompts was less
readable for humans, and I'm guessing less readable for
models (trained on human data) as well?
- Start from this branch's src/khoj/routers/api_chat.py
Add the tracer to all old and new chat actors that don't have it set
when they are called.
- Update the new chat actors, like pick next tool etc., to use the tracer too
- Conflicts:
Combine both sides of the conflict in all 3 files below
- src/khoj/processor/conversation/utils.py
- src/khoj/routers/helpers.py
- src/khoj/utils/helpers.py
- Message train of thought forks and merges from its conversation branch
- Conversation branches from user branch
- User branches from root commit on the main branch
- Weave chat tracer metadata from api endpoint through all chat actors
and commit it to the prompt trace
### Major
- Give Vision to Anthropic models in Khoj
### Minor
- Reuse logic to format messages for chat with anthropic models
- Make the get image from url function more versatile and reusable
- Encourage output mode chat actor to output only json and nothing else
- Use a single standard search model across the server. There's diminishing benefits for having multiple user-customizable search models.
- We may want to add server-level customization for specific tasks
- Store the search model used to generate a given entry on the `Entry` object
- Remove user-facing APIs and view
- Add a management command for migrating the default search model on the server
In a future PR (after running the migration), we'll also remove the `UserSearchModelConfig`
The latest claude model wanted to say more than just give the json output.
The updated prompt encourages the model to output just json. This is
similar to what is already being done for other prompts
It was previously added under the google utils. Now it can be used by
other conversation processors as well.
The updated function
- can get both base64 encoded and PIL formatted images from a url
- will also return the media type of the image in the response
* Create explicit flow to enable the free trial
The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.
* Use the Subscription Type enum instead of hardcoded strings everywhere
* Use length of free trial in the frontend code as well
Had temporarily updated the default selected agent to last used.
Revert for now as
1. The previous logic was buggy. It didn't select the default agent
even when the last used agent was the default agent. Fixing this would
require more work.
2. It may be too early anyway to set the default agent to last used.
Adding div elements to the message to render degraded the text copied to
the clipboard for messages with user uploaded images.
This change fixes that by separating the message to render from the message
for the clipboard. It ensures differently formatted forms of the user
images are added to the two, allowing proper rendering while still
having decently formatted text copied to the clipboard
Add a newline instead of sending the message when the Enter key is hit on
mobile displays. On phones the shift key doesn't exist and the send button
is easily clickable.
Limit Enter-to-send to computers, i.e. larger displays, which are expected
to have full fledged keyboards.
## Overview
Allow quickly selecting, switching agents from agents pane on home page of web app
## Details
- Show all agents in carousel on home screen agent pane of web app
- Smart Sort
1. Pin default agent as first for ease of access
2. Show used agents by MRU for ease of access
3. Shuffle unused agents for discoverability
- Select most recently used agent to chat with by default
- Push smart sort logic down to API
- Common logic can be reused across clients
- Agent sort was previously done in web app
- Focus on chat input on agent select
- Double click agent on home page to open edit agent card on agents page
## Overview
- Add vision support for Gemini models in Khoj
- Allow sharing multiple images as part of user query from the web app
- Handle multiple images shared in query to chat API
- Remove border from agent detail hover card on home page
- Do not wrap long agent names in agent pills on home page
- Handle scenario where chatInputRef is null
Add support for generating dynamic diagrams in flow with Excalidraw (https://github.com/excalidraw/excalidraw). This happens in three steps:
1. Default information collection & intent determination step.
2. Improving the overall guidance of the prompt for generating a JSON, Excalidraw-compatible declaration.
3. Generation of the diagram to output to the final UI.
Add support in the web UI.
Previously only notes context from chat history was included.
This change includes online context from chat history for the model to use
for response generation.
This can reduce the need for online lookups by reusing previous online
context for faster responses. But it will increase overall response time
when not reusing past online context, due to the faster context buildup per
conversation.
Unsure if inclusion of this context is preferable. If not, both notes and
online context should be removed.
Marking the context message with the assistant role doesn't translate well
across chat models. E.g.
- Gemini can't handle consecutive messages by role = model well
- Claude will merge consecutive messages by the same role. With the current
message ordering, the context message would get merged into the
previous assistant response. And if the context message is moved after the
user query, the truncation logic will have to hop and skip while doing
deletions
- GPT seems to handle consecutive roles of any type fine
Using context role = user generalizes better across chat models for
now and aligns with previous behavior.
Improve separation of note snippets and show its origin file in notes
prompt to have more readable, contextualized text shared with model.
Previously the references dict was passed directly as a string.
The documents didn't look well formatted and were less intelligible.
- Passing file path along with notes snippets will help contextualize
the notes better.
- Better formatting should help with making notes more readable by the
chat model.
- Double click on agent to open edit agent card
- Focus on chat input pane when agent selected/clicked
for quick, smooth agent switch and message flow
- Hover on agent to see agent detail card on non-mobile displays
- Use debounce to only show when hover on card for a bit
- Default to None for the input_tools and output_modes so that they can be managed in the admin panel
- Hold off on showing off all Public Agents until we have a better experience for user profiles etc.
Have get agents API return agents ordered intelligently
- Put the default agent first
- Sort used agents by most recently chatted with agent for ease of access
- Randomly shuffle the remaining unused agents for discoverability
This change wraps the agent pane in a scroll area with all agents shown.
It allows selecting an agent to chat with directly from the home
screen without breaking flow and having to jump to the agents page.
The previous flow was not convenient for quickly and consistently starting a
chat with one of your standard agents.
This was because a random subset of agents was shown on the home page.
To start a chat with an agent not shown on home screen load, you had to
open the agents page and initiate the conversation from there.
One limitation of this methodology is that localStorage has a limit on how much data it can hold. Should add more graceful error handling here as well.
Currently experiencing difficulty with instruction following when an image is shared. The model is more likely to try and output an image. Update to make a clearer distinction.
- Put the attached images display div inside the same parent div as
the text area
- Keep the attachment, microphone/send message buttons aligned with
the text area. So the attached images just show up at the top of the
text area but everything else stays at the same horizontal height as
before.
- This improves the UX by
- Ensuring that the attached images do not obscure the agents pane
above the chat input area
- The attached images visually look like they are inside the actual
input area, rather than floating above it. So the visual aligns
with the semantics
Previously the web app only expected a single image to be shared by
the user as part of their query.
This change allows sharing multiple images from the web app.
Closes #921
Previously Khoj could respond to a single shared image at a time.
This change updates the chat API to accept multiple images shared by
the user and send them to the appropriate chat actors, including the
openai response generation chat actor, for getting an image aware
response
Khoj shouldn't refuse to respond to the user if web lookups fail.
It should transparently mention that online search etc. failed,
but try to respond as best as it can without those references.
This change ensures a response to the user's query is attempted even
when web info retrieval fails.
The huggingface endpoint can be flaky. Khoj shouldn't refuse to
respond to the user if document search fails.
It should transparently mention that document lookup failed,
but try to respond as best as it can without the document references.
This change provides graceful failover when inference endpoint
requests fail, either when encoding the query or when reranking retrieved docs
- The advanced chat model should also fall back to the user chat model if set
- Get conversation config should fall back to the user chat model if set
These assume no server chat model settings are configured
Update regex to also include any links to code generated images that
aren't explicitly meant to be displayed inline. This allows folks to
download the image (unlike the non-functional link created by the
model)
Previously Khoj would start answering the previous query. This may be
because the prompt uses 'User' for prompts in the chat history but was
using 'Q' for the current user prompt.
Make the number of webpages to read automatically on search_online
configurable via an argument.
Default it to 1, so other callers of the function
are unaffected.
But the iterative chat director can still decide which, if
any, webpages to read based on the online search it performs
This change allows the iterative director to dive deeper into its
research as the data extracted contains relevant links from the webpage
The previous summarization prompt didn't extract relevant links from the
webpage, which limited further exploration from webpages
Move construct_chat_history and ChatEvent enum into conversation.utils
and move send_message_to_model_wrapper to conversation.helper to
modularize code. And start thinning out the bloated routers.helper
- conversation.util components are shared functions that conversation
child packages can use.
- conversation.helper components can't be imported by conversation
packages but helper can use these child packages
This division allows better modularity while avoiding circular
import dependencies
Create python code executing chat actor
- The chat actor generates python code within sandbox constraints
- Run the generated python code in Cohere's Terrarium, a pyodide
based sandbox, accessible at the sandbox url
- Create a more dynamic reasoning agent that can evaluate information, understand what it doesn't know and make moves to get that information
- Lots of hacks and code that need to be reverted later on before submission
> To save time for both you and us, try to follow these guidelines before submitting a new issue:
> 1. Check if there is an existing issue tracking your bug on our Github.
> 2. When unsure if your issue is an actual bug, first discuss it on a [Github discussion](https://github.com/khoj-ai/khoj/discussions/new?category=q-a) or the **#bugs** channel of our [Discord server](https://discord.gg/b6gUdpKr).
> These steps help avoid opening issues that are duplicates or not actual bugs.
- type:"checkboxes"
id:"server"
attributes:
label:"Server"
description:"With which Khoj server are you experiencing the issue? (select at least one)"
options:
- label:"Cloud (https://app.khoj.dev)"
required:false
- label:"Self-Hosted Docker"
required:false
- label:"Self-Hosted Python package"
required:false
- label:"Self-Hosted source code"
required:false
validations:
required:true
- type:"checkboxes"
id:"clients"
attributes:
label:"Clients"
description:"With which Khoj client(s) are you experiencing the issue? (select at least one)"
options:
- label:"Web browser"
required:false
- label:"Desktop/mobile app"
required:false
- label:"Obsidian"
required:false
- label:"Emacs"
required:false
- label:"WhatsApp"
required:false
validations:
required:true
- type:"checkboxes"
id:"os"
attributes:
label:"OS"
description:"On which operating system do you experience the issue? (select at least one)"
options:
- label:"Windows"
required:false
- label:"macOS"
required:false
- label:"Linux"
required:false
- label:"Android"
required:false
- label:"iOS"
required:false
validations:
required:true
- type:"input"
id:"version"
attributes:
label:"Khoj version"
description:"Use `/help` command on the chat page to find the server version.\n
If using the cloud service, you can input, latest.\n
If self-hosting - please make sure to run the latest version of Khoj before reporting any issues as it may have already been fixed."
validations:
required:true
- type:"textarea"
id:"description"
attributes:
label:"Describe the bug"
description:"What is the problem? A clear and concise description of the bug."
validations:
required:true
- type:"textarea"
id:"current"
attributes:
label:"Current Behavior"
description:"What actually happened?\n\n
Please include full errors, uncaught exceptions, stack traces, screenshots and other relevant logs."
validations:
required:true
- type:"textarea"
id:"expected"
attributes:
label:"Expected Behavior"
description:"What did you expect to happen?"
validations:
required:true
- type:"textarea"
id:"reproduction"
attributes:
label:"Reproduction Steps"
description:"Detail the steps needed to reproduce the issue. This can include a self-contained, concise snippet of
code, if applicable.\n\n
For more complex issues, provide a link to a repository with the smallest sample that reproduces
the bug.\n
If the issue can be replicated without code, please provide a clear, step-by-step description of
the actions or conditions necessary to reproduce it. Any screenshots are also appreciated.\n
Avoid including business logic or unrelated details, as this makes diagnosis more difficult.\n\n
Whether it's a sequence of actions, code samples, or specific conditions, ensure that the steps
are clear enough to be easily followed and replicated."
validations:
required:true
- type:"textarea"
id:"workaround"
attributes:
label:"Possible Workaround"
description:"If you find any workaround for this problem - please, provide it here."
validations:
required:false
- type:"textarea"
id:"context"
attributes:
label:"Additional Information"
description:"Anything else that might be relevant for troubleshooting this bug.\n
Providing context helps us come up with a solution that is most useful in the real-world use case.\n
For example if self-hosting, provide environment details like OS versin, Docker version etc."
validations:
required:false
- type:"input"
id:"discussion_link"
attributes:
label:"Link to Discord or Github discussion"
description:"Provide a link to the first message of bug's discussion on Discord or Github.\n
This will help to keep history of why this bug exists."
description:"Use this template to request new feature or suggest an idea for Khoj"
labels:["upgrade"]
body:
- type:"markdown"
attributes:
value:|
> [!IMPORTANT]
> To save time for both you and us, try follow these guidelines before submitting a feature request:
> 1. Check if there is an existing feature request that is similar to your on our Github.
> 2. We encourage you to first discuss your idea on a [Github discussion](https://github.com/khoj-ai/khoj/discussions/categories/ideas) or the **#ideas** channel of our [Discord server](https://discord.gg/b6gUdpKr).
> This step helps in understanding the new feature and determining if it's can be implemented at all.
Only proceed with this report if your idea was approved after the GitHub/Discord discussion.
- type:"textarea"
id:"description"
attributes:
label:"Describe the feature"
description:"A clear and concise description of the feature you are proposing."
validations:
required:true
- type:"textarea"
id:"use-case"
attributes:
label:"Use Case"
description:"Why do you need this feature? Provide real world use cases, the more the better."
validations:
required:true
- type:"textarea"
id:"solution"
attributes:
label:"Proposed Solution"
description:"Suggest how to implement the new feature. Please include prototype/sketch/reference implementation."
validations:
required:false
- type:"textarea"
id:"additional_info"
attributes:
label:"Additional Information"
description:"Any additional information you would like to provide - links, screenshots, etc."
validations:
required:false
- type:"input"
id:"discussion_link"
attributes:
label:"Link to Discord or Github discussion"
description:"Provide a link to the first message of feature request's discussion on Discord or Github.\n
This will help to keep history of why this feature request exists."
* Meet 🌶️ **[Pipali](https://pipali.ai)** - our [open-source](https://github.com/khoj-ai/pipali) AI coworker that runs on your computer.
* [Read](https://blog.khoj.dev/posts/evaluate-khoj-quality/) about Khoj's excellent performance on modern retrieval and reasoning benchmarks.
***
## Overview
[Khoj](https://khoj.dev) is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
- Chat with any local or online LLM (e.g llama3, qwen, gemma, mistral, gpt, claude, gemini, deepseek).
- Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
- Access it from your Browser, Obsidian, Emacs, Desktop, Phone or Whatsapp.
- Create agents with custom knowledge, persona, chat model and tools to take on any role.
You can see the full feature list [here](https://docs.khoj.dev/category/features).
To get started with self-hosting Khoj, [read the docs](https://docs.khoj.dev/get-started/setup).
## Enterprise
Khoj is available as a cloud service, on-premises, or as a hybrid solution. To learn more about Khoj Enterprise, [visit our website](https://khoj.dev/teams).
## Frequently Asked Questions (FAQ)
Q: Can I use Khoj without self-hosting?
Yes! You can use Khoj right away at [https://app.khoj.dev](https://app.khoj.dev) — no setup required.
Q: What kinds of documents can Khoj read?
Khoj supports a wide variety: PDFs, Markdown, Notion, Word docs, org-mode files, and more.
Q: How can I make my own agent?
Check out [this blog post](https://blog.khoj.dev/posts/create-agents-on-khoj/) for a step-by-step guide to custom agents.
For more questions, head over to our [Discord](https://discord.gg/BDgyabRM6e)!
## Contributors
Cheers to our awesome contributors! 🎉
Made with [contrib.rocks](https://contrib.rocks).
### Interested in Contributing?
Khoj is open source. It is sustained by the community and we’d love for you to join it! Whether you’re a coder, designer, writer, or enthusiast, there’s a place for you.
We are always looking for contributors to help us build new features, improve the project documentation, or fix bugs. If you're interested, please see our [Contributing Guidelines](https://docs.khoj.dev/contributing/development) and check out our [Contributors Project Board](https://github.com/orgs/khoj-ai/projects/4).
Why Contribute?
- Make an Impact: Help build, test and improve a tool used by thousands to boost productivity.
- Learn & Grow: Work on cutting-edge AI, LLMs, and semantic search technologies.
You can help us build new features, improve the project documentation, report issues and fix bugs. If you're a developer, please see our [Contributing Guidelines](https://docs.khoj.dev/contributing/development) and check out [good first issues](https://github.com/khoj-ai/khoj/contribute) to work on.
# Set KHOJ_OPERATOR_ENABLED=True in the server service environment variable to enable.
computer:
container_name: khoj-computer
image: ghcr.io/khoj-ai/khoj-computer:latest
# build:
# context: .
# dockerfile: computer.Dockerfile
ports:
- "5900:5900"
volumes:
- khoj_computer:/home/operator
server:
depends_on:
database:
condition: service_healthy
# Use the following line to use the latest version of khoj. Otherwise, it will build from source. Set this to ghcr.io/khoj-ai/khoj-cloud:latest if you want to use the prod image.
image: ghcr.io/khoj-ai/khoj:latest
# Uncomment the following line to build from source. This will take a few minutes. Comment the next two lines out if you want to use the official image.
# build:
# context: .
ports:
# change the port in the args in the build section,
# as well as the port in the command section to match
# uncomment line below to mount docker socket to allow khoj to use its computer.
# - /var/run/docker.sock:/var/run/docker.sock
# Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/
environment:
- POSTGRES_DB=postgres
- KHOJ_DEBUG=False
- KHOJ_ADMIN_EMAIL=username@example.com
- KHOJ_ADMIN_PASSWORD=password
# Uncomment lines below to use chat models by each provider.
# Default URL of Terrarium, the default Python sandbox used by Khoj to run code. Its container is specified above
- KHOJ_TERRARIUM_URL=http://sandbox:8080
# Uncomment line below to have Khoj run code in remote E2B code sandbox instead of the self-hosted Terrarium sandbox above. Get your E2B API key from https://e2b.dev/.
# - E2B_API_KEY=your_e2b_api_key
# Default URL of SearxNG, the default web search engine used by Khoj. Its container is specified above
- KHOJ_SEARXNG_URL=http://search:8080
# Uncomment line below to use with Ollama running on your local machine at localhost:11434.
# Change URL to use with other OpenAI API compatible providers like VLLM, LMStudio, DeepInfra, DeepSeek etc.
> Describes the Khoj settings configurable via the admin panel
By default, your admin panel is available at `http://localhost:42110/server/admin/`. You can access the admin panel by logging in with your admin credentials (this would be your `KHOJ_ADMIN_EMAIL` and `KHOJ_ADMIN_PASSWORD`). The admin panel allows you to configure various settings for your Khoj server.
## App Settings
### Agents
Add all the agents you want to use for your different use-cases like Writer, Researcher, Therapist etc.
- `Personality`: This is a prompt to tell the chat model how to tune the personality of the agent.
- `Chat model`: The chat model to use for the agent.
- `Name`: The name of the agent. This field helps give the agent a unique identity across the app.
- `Avatar`: Url to the agent's profile picture. It helps give the agent a unique visual identity across the app.
- `Style color`, `Style icon`: These fields help give the agent a unique, visually identifiable identity across the app.
- `Slug`: This is the agent name to use in urls.
- `Public`: Check this if the agent is expected to be visible to all users on this Khoj server.
### Chat Model Options
Add all the chat models you want to try, use and switch between for your different use-cases. For each chat model you add:
- `Chat model`: The name of an [OpenAI](https://platform.openai.com/docs/models), [Anthropic](https://docs.anthropic.com/en/docs/about-claude/models#model-names), [Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models) or [Offline](https://huggingface.co/models?pipeline_tag=text-generation&library=gguf) chat model.
- `Model type`: The chat model provider like `OpenAI`, `Google`.
- `Vision enabled`: Set to `true` if your model supports vision. This is currently only supported for vision capable OpenAI models like `gpt-4o`.
- `Max prompt size`, `Subscribed max prompt size`: These are optional fields. They are used to truncate the context to the maximum context size that can be passed to the model. This can help with accuracy and cost-saving.<br/>
- `Tokenizer`: This is an optional field. It is used to accurately count tokens and truncate context passed to the chat model to stay within the model's max prompt size.
To add a server chat setting:
- The `Advanced` field doesn't need to be set when self-hosting. When unset, the `Default` chat model is used for all users and the intermediate steps.
### AI Model API
These settings configure APIs to interact with AI models.
For each AI Model API you [add](http://localhost:42110/server/admin/database/aimodelapi/add):
- `Api key`: Set to your [OpenAI](https://platform.openai.com/api-keys), [Anthropic](https://console.anthropic.com/account/keys) or [Gemini](https://aistudio.google.com/app/apikey) API keys.
- `Name`: Give the configuration any friendly name like `OpenAI`, `Gemini`, `Anthropic`.
- `Api base url`: Set the API base URL. This is only relevant to set if you're using another OpenAI-compatible proxy server like [Ollama](/advanced/ollama) or [LMStudio](/advanced/lmstudio).


### Search Model Configs
Search models are used to generate vector embeddings of your documents for natural language search and chat. You can choose any [embeddings models on HuggingFace](https://huggingface.co/models?pipeline_tag=sentence-similarity) to create vector embeddings of your documents for natural language search and chat.
<img src="/img/example_search_model_admin_settings.png" alt="Example Search Model Settings" style={{width: 500}} />
### Text to Image Model Options
Add text to image generation models with these settings. Khoj currently supports text to image models available via OpenAI, Google or Replicate API.
- `api-key`: Set to your OpenAI, Google AI or Replicate API key
- `model`: Set the model name available over the selected model provider
- `model-type`: Set to the appropriate model provider
- `openai-config`: For image generation models available via OpenAI (compatible) API you can set the appropriate OpenAI Processor Conversation Settings instead of specifying the `api-key` field above
### Speech to Text Model Options
Add speech to text models with these settings. Khoj currently only supports whisper models.
### Voice Model Options
Add text to speech models with these settings. Khoj currently supports models from [ElevenLabs](https://elevenlabs.io/).
### Reflective Questions
This is a static list of starter question suggestions for each user. It is not currently used in any client app. It used to be shown on the web app home page. We may turn it into a dynamic list of starter questions personalized to each user, say based on their recent conversations or synced knowledge base.
## User Data
- Users, Entrys, Conversations, Subscriptions, Github configs, Notion configs, User search configs, User conversation configs, User voice configs
- Process Locks: Persistent Locks for Automations
- Client Applications:
Client applications allow you to set up third party applications that can query your Khoj server using a client application ID + secret. The secret would go in a bearer token.
This is only helpful for self-hosted users or teams. If you're using [Khoj Cloud](https://app.khoj.dev), both Magic Links and Google OAuth work.
:::
By default, most of the instructions for self-hosting Khoj assume a single user, and so the default configuration is to run in anonymous mode. However, if you want to enable authentication, you can do so either with [Magic Links](#using-magic-links) or [Google OAuth](#using-google-oauth) as shown below. This can be helpful to make Khoj securely accessible to you and your team.
:::tip[Note]
Remove the `--anonymous-mode` flag from your khoj start up command or docker-compose file to enable authentication.
:::
## Using Magic Links
The most secure way to do this is to integrate with [Resend](https://resend.com).
1. Setup your account at https://resend.com
2. Set an environment variable for `RESEND_API_KEY`. You can get your API key [here](https://resend.com/api-keys).
3. Set an environment variable for `RESEND_EMAIL`. This is the email address that will show up in your `from` field when sending magic links.
This will allow you to automatically send sign-in links to users who want to log in.
It's still possible to use the magic links feature without Resend, but you'll need to manually send the magic links to users who want to log in.
## Manually sending magic links
1. The user will have to enter their email address in the login popup shown at http://localhost:42110/?v=app.
They'll click `Get Login Link`. Without the Resend API key, this will just create an unverified account for them in the backend
<img src="/img/magic_link.png" alt="Magic link login form" width="400"/>
2. You can get their magic link using the admin panel
Go to the [admin panel](http://localhost:42110/server/admin/database/khojuser/). You'll see a list of users. Search for the user you want to send a magic link to. Tick the checkbox next to their row, and use the action drop down at the top to 'Get email login URL'. This will generate a magic link that you can send to the user, which will appear at the top of the admin interface.
| Get email login URL | Retrieved login URL |
|---------------------|---------------------|
| <img src="/img/admin_get_emali_login.png" alt="Get user magic sign in link" width="400" />| <img src="/img/admin_successful_login_url.png" alt="Successfully retrieved a login URL" width="400" />|
3. Send the magic link to the user. They can click on it to log in.
Once they click on the link, they'll automatically be logged in. They'll have to repeat this process for every new device they want to log in from, but they shouldn't have to repeat it on the same device.
A given magic link can only be used once. If the user tries to use it again, they'll be redirected to the login page to get a new magic link.
## Using Google OAuth
For this method, you'll need to use the prod version of the Khoj package. You can install it as below:
<Tabs groupId="server" queryString>
<TabItem value="docker" label="Docker">
Update your `docker-compose.yml` to use the prod image
```bash
image: ghcr.io/khoj-ai/khoj-cloud:latest
```
</TabItem>
<TabItem value="pip" label="Pip">
```bash
pip install khoj[prod]
```
</TabItem>
</Tabs>
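One gotcha with the pip route: in zsh, square brackets are glob characters, so the extras spec usually needs quoting.
```bash
# Quote the extras so zsh doesn't try to glob the brackets
pip install 'khoj[prod]'
```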
To set up your self-hosted Khoj with Google Auth, you need to create a project in the Google Cloud Console and enable the Google Auth API.
To implement this, you'll need to:
1. [Create authorization credentials](https://developers.google.com/identity/sign-in/web/sign-in) for your application.
2. Open your [Google cloud console](https://console.developers.google.com/apis/credentials) and create a configuration like below for the relevant `OAuth 2.0 Client IDs` project:
3. Configure these environment variables: `GOOGLE_CLIENT_SECRET`, and `GOOGLE_CLIENT_ID`. You can find these values in the Google cloud console, in the same place where you configured the authorized origins and redirect URIs.
That's it! That should be all you have to do. Now, when you reload Khoj without `--anonymous-mode`, you should be able to use your Google account to sign in.
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you can directly use any of the pre-configured AI models.
:::
Khoj can use Google's Gemini and Anthropic's Claude family of AI models from [Vertex AI](https://cloud.google.com/vertex-ai) on Google Cloud. Explore Anthropic and Gemini AI models available on Vertex AI's [Model Garden](https://console.cloud.google.com/vertex-ai/model-garden).
## Setup
1. Follow [these instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#before_you_begin) to use models on GCP Vertex AI.
2. Create a GCP service account with Vertex AI access and download its JSON key file.
3. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add) on your Khoj admin panel.
- **Name**: `Google Vertex` (or whatever friendly name you prefer).
- **Api Key**: `base64 encoded json keyfile` from step 2 (see the sketch after this list).
- **Api Base Url**: `https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`
- MODEL_GCP_REGION: A region the AI model is available in. For example `us-east5` works for [Claude](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#regions).
- YOUR_GCP_PROJECT_ID: Get your project id from the [Google cloud dashboard](https://console.cloud.google.com/home/dashboard)
4. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
- **Name**: `claude-3-7-sonnet@20250219`. Any Claude or Gemini model on Vertex's Model Garden should work.
- **Model Type**: `Anthropic` or `Google`
- **Ai Model API**: *the Google Vertex Ai Model API you created in step 3*
- **Max prompt size**: `60000` (replace with the max prompt size of your model)
- **Tokenizer**: *Do not set*
5. Select the chat model on [your settings page](http://localhost:42110/settings) and start a conversation.
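The `base64 encoded json keyfile` from step 2 can be generated like this (a sketch; the keyfile name is illustrative):
```bash
# GNU coreutils (Linux); on macOS use: base64 -i service-account.json
base64 -w0 service-account.json
```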
## Troubleshooting & GCP AI Tips
- Permission Denied?
Ensure your service account has the `Vertex AI User` role and that the API is enabled in your GCP project.
- Region Errors?
Double-check that the model you're trying to use is supported in your selected region. Some Claude or Gemini models are restricted to specific zones like `us-east5` or `us-central1`.
- Prompt Size Limitations
The "Max prompt size" should align with the limits defined in the model documentation. Exceeding it can silently fail or truncate inputs.
- Testing the API Key
Before adding it to Khoj, you can verify that your key works by making a simple curl request to Vertex AI (see the sketch below). This helps debug auth issues early.
- Use Environment Variables
For better security, consider using environment variables to manage sensitive keys and inject them at runtime.
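A minimal sketch of such a request against a Gemini model on Vertex AI, with the project id, region and model name as placeholders you'd swap for your own:
```bash
# Uses your gcloud credentials to verify project and model access
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_GCP_PROJECT_ID/locations/us-central1/publishers/google/models/gemini-1.5-flash:generateContent" \
  -d '{"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}'
```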
If you encounter any issues, the [Khoj Discord](https://discord.gg/BDgyabRM6e) is a great place to ask for help!
Khoj does not work with LM Studio anymore. Khoj leverages [json mode](https://platform.openai.com/docs/guides/structured-outputs#json-mode) extensively but LMStudio's API seems to have dropped support for json mode. [1](https://x.com/lmstudio/status/1770135858709975547), [2](https://lmstudio.ai/docs/api/structured-output)
:::
:::info
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you're limited to our first-party models.
:::
## Setup
1. Install [LM Studio](https://lmstudio.ai/) and download your preferred Chat Model
2. Go to the Server Tab on LM Studio, Select your preferred Chat Model and Click the green Start Server button
3. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add/) on your Khoj admin panel
   - **Name**: `lmstudio`
   - **Api Key**: `any string`
   - **Api Base Url**: `http://localhost:1234/v1/` (default for LMStudio)
4. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
   - **Name**: `llama3.1` (replace with the name of your local model)
   - **Model Type**: `Openai`
   - **Ai Model Api**: *the lmstudio Ai Model Api you created in step 3*
   - **Max prompt size**: `20000` (replace with the max prompt size of your model)
   - **Tokenizer**: *Do not set for OpenAI, mistral, llama3 based models*
5. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you can use our first-party supported models.
:::
:::info
Khoj can directly run local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). The integration with Ollama is useful to run Khoj on Docker and have the chat models use your GPU or to try new models via CLI.
:::
Ollama allows you to run [many popular open-source LLMs](https://ollama.com/library) locally from your terminal.
For folks comfortable with the terminal, Ollama's terminal based flows can ease setup and management of chat models.
Ollama exposes a local [OpenAI API compatible server](https://github.com/ollama/ollama/blob/main/docs/openai.md#models). This makes it possible to use chat models from Ollama with Khoj.
## Setup
:::info
Restart your Khoj server after first run or update to the settings below to ensure all settings are applied correctly.
:::
<Tabs groupId="type" queryString>
<TabItem value="first-run" label="First Run">
<Tabs groupId="server" queryString>
<TabItem value="docker" label="Docker">
1. Setup Ollama: https://ollama.com/
2. Download your preferred chat model with Ollama. For example,
```bash
ollama pull llama3.1
```
3. Uncomment `OPENAI_BASE_URL` environment variable in your downloaded Khoj [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml#:~:text=OPENAI_BASE_URL)
4. Start Khoj docker for the first time to automatically integrate and load models from the Ollama running on your host machine
```bash
# run below command in the directory where you downloaded the Khoj docker-compose.yml
docker-compose up
```
</TabItem>
<TabItem value="pip" label="Pip">
1. Setup Ollama: https://ollama.com/
2. Download your preferred chat model with Ollama. For example,
```bash
ollama pull llama3.1
```
3. Set `OPENAI_BASE_URL` environment variable to `http://localhost:11434/v1/` in your shell before starting Khoj for the first time
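A minimal sketch of that step, assuming the `llama3.1` model from above:
```bash
# Point Khoj at Ollama's OpenAI compatible API before first start
export OPENAI_BASE_URL=http://localhost:11434/v1/
# Optional: sanity check that the endpoint responds before starting Khoj
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Hello"}]}'
```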
This is only helpful for secure cross-device access to **self-hosted** Khoj. You **do not** need this if you're using [Khoj Cloud](https://app.khoj.dev).
:::
[Tailscale](https://tailscale.com) simplifies creating a private VPN using [Wireguard](https://www.wireguard.com/) and OAuth. So you can host and access services on your devices from anywhere.
The instructions below are one way to simply and securely access your self-hosted Khoj from your phone, laptop etc.
### Minimal Setup
1. Setup khoj on your preferred machine following the [standard steps](/get-started/setup)
2. Sign-up to [Tailscale](https://tailscale.com) and install the app on machines you want to access Khoj from. This usually includes your khoj server, your phone and laptop. Note the tailscale IP of your khoj server.
3. Start khoj on your server by including the flag `--host <your_server_tailscale_ip>`
4. Open `http://<your_server_tailscale_ip>:42110` to access khoj from any device on your tailscale network!
### HTTPS Certificate
:::info
Tailscale uses Wireguard to encrypt and route traffic between your machines. So HTTPS isn't required with Tailscale for secure access. HTTPS with Tailscale is only useful for browsers to not complain about security and block certain features like clipboard access unless HTTPS is enabled.
:::
1. Enable [MagicDNS](https://tailscale.com/kb/1081/magicdns#enabling-magicdns) and [HTTPS](https://tailscale.com/kb/1153/enabling-https) toggle on your tailscale admin console [DNS](https://login.tailscale.com/admin/dns) page. Note your unique tailscale domain name (usually ends with .ts.net)
2. Create an https certificate for your Khoj server by running the following command:
```bash
# Assuming the server is named, `server` and your tailnet is `black-forest.ts.net`
# Note path of the .crt and .key files generated
tailscale cert server.black-forest.ts.net
```
3. Start khoj to be served via https on the standard port
```bash
sudo KHOJ_DOMAIN=server.black-forest.ts.net \
khoj \
--sslcert /path/to/your/tailscale.crt \
--sslkey /path/to/your/tailscale.key \
--host=server.black-forest.ts.net \
--port 443
```
4. You should now be able to access khoj on `https://server.black-forest.ts.net` from any device on your private tailscale network!
Khoj natively supports local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). Using an OpenAI API proxy with Khoj may be useful for ease of setup, trying new models or using commercial LLMs via API.
:::
Khoj can use any OpenAI API compatible server including local providers like [Ollama](/advanced/ollama), [LMStudio](/advanced/lmstudio) and [LiteLLM](/advanced/litellm) and commercial providers like [HuggingFace](https://huggingface.co/docs/api-inference/tasks/chat-completion#using-the-api), [OpenRouter](https://openrouter.ai/docs/quick-start) etc.
Configuring this allows you to use non-standard, open or commercial, local or hosted LLM models for Khoj.
Combining them with Khoj can turn your favorite LLM into an AI agent, allowing you to chat with your docs, find answers from the internet, build custom agents and run automations.
For specific integrations, see our [Ollama](/advanced/ollama), [LMStudio](/advanced/lmstudio) and [LiteLLM](/advanced/litellm) guides.
## General Setup
1. Start your preferred OpenAI API compatible app locally or get API keys from commercial AI model providers.
2. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add) on your Khoj admin panel
   - **Name**: `any name`
   - **Api Key**: `any string`
   - **Api Base Url**: *The URL of your OpenAI Compatible API*
3. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
   - **Name**: `llama3` (replace with the name of your local model)
   - **Model Type**: `Openai`
   - **Ai Model Api**: *The AI Model API you created in step 2*
   - **Max prompt size**: `2000` (replace with the max prompt size of your model)
   - **Tokenizer**: *Do not set for OpenAI, mistral, llama3 based models*
4. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
Text [+1 (848) 800 4242](https://wa.me/18488004242) or scan the QR code below on your phone to chat with Khoj on WhatsApp.
Without any desktop clients, you can start chatting with Khoj on WhatsApp. Bear in mind you do need one of the desktop clients in order to share and sync your data with Khoj. The WhatsApp AI bot will work right away for answering generic queries and using Khoj in default mode.
In order to use Khoj on WhatsApp with your own data, you need to setup a Khoj Cloud account and connect your WhatsApp account to it. This is a one time setup and you can do it from the [Khoj Cloud config page](https://app.khoj.dev/settings).
If you hit usage limits for the WhatsApp bot, upgrade to [a paid plan](https://khoj.dev/#pricing) on Khoj Cloud.
<img src="https://khoj-web-bucket.s3.amazonaws.com/khojwhatsapp.png" alt="WhatsApp QR Code" width="300" height="300" />
We have more commands under development, including `/share` to upload documents.
## Source Code
You can find all of the code for the WhatsApp bot in the [flint repository](https://github.com/khoj-ai/flint). Like all of our code, it is open source and you can contribute to it.
Welcome to the development docs of Khoj! Thanks for your interest in being a contributor ❤️. Open source contributors are a cornerstone of the Khoj community. We welcome all contributions, big or small.
To get started with contributing, check out the official GitHub docs on [contributing to an open-source project](https://docs.github.com/en/get-started/exploring-projects-on-github/contributing-to-a-project).
git clone https://github.com/khoj-ai/khoj && cd khoj
You can optionally use `bun dev` to start a development server for the front-end which will be available at http://localhost:3000. This is especially useful if you're making changes to the front-end code, but not necessary for running Khoj. Note that streaming does not work on the dev server due to how it is handled with SSR in Next.js.
Always run `bun export` to test your front-end changes on http://localhost:42110 before creating a PR.
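A typical front-end iteration loop, collecting the commands mentioned above (the `bun install` step and repo path are assumptions based on the standard khoj repo layout):
```bash
cd src/interface/web
bun install   # install web app dependencies
bun dev       # dev server at http://localhost:3000; streaming won't work here
bun export    # build static assets served by Khoj at http://localhost:42110
```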
- Try reactivating the virtual environment and rerunning the `khoj` command.
- If it still doesn't work, repeat the installation process.
2. Python Package Missing
- Use `uv add xxx` and try running the `khoj` command.
3. Command `createdb` Not Recognized
- Make sure the path to the Postgres binaries is included in your PATH environment variable.
In whichever clients you're using for testing, you'll need to update the server URL.
## Validate
### Before Making Changes
1. Install Khoj. Setup Git Hooks for Validation
```shell
./scripts/dev_setup.sh
```
- Installs the khoj server and web app by default. Use the `--full` flag to install the Khoj Desktop and Obsidian client apps as well.
- Git hooks format code and check types automatically on every commit and push respectively.
- Note 1: If [pre-commit](https://pre-commit.com/#intro) didn't already get installed, [install it](https://pre-commit.com/#install) via `pip install pre-commit`
- Note 2: To run the pre-commit changes manually, use `pre-commit run --hook-stage manual --all` before creating PR
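For convenience, the two notes above collected as commands:
```bash
# Install pre-commit if the dev setup script didn't already
pip install pre-commit
# Run the pre-commit checks manually before creating a PR
pre-commit run --hook-stage manual --all
```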
### Before Creating PR
:::tip[Note]
You should be in an active virtual environment for Khoj in order to run the unit tests and linter. The `dev_setup.sh` script will automatically create and activate it for you.
:::
1. Ensure that you have a [Github Issue](https://github.com/khoj-ai/khoj/issues) that can be linked to the PR. If not, create one. Make sure you've tagged one of the maintainers in the issue. This will ensure that the maintainers are notified of the PR and can review it. It's best to discuss the code design on an existing issue or Discord thread before creating a PR. This helps get your PR merged faster.
:::warning[Unmaintained]
The Github integration is not maintained. We are considering deprecating it. It doesn't seem to be used by many folks and it's cumbersome for us to maintain.
:::
The Github integration allows you to index as many repositories as you want. It's currently default configured to index all Markdown/Org/Text files in each repository. For large repositories, this takes a fairly long time, but it works well for smaller projects.
# Configure your settings
## Use the Github plugin
1. Generate a [classic PAT (personal access token)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) from [Github](https://github.com/settings/tokens) with `repo` and `admin:org` scopes at least.
2. Navigate to [https://app.khoj.dev/settings#github](https://app.khoj.dev/settings/content/github) to configure your Github settings. Enter in your PAT, along with details for each repository you want to index.
3. Click `Save`. Go back to the settings page and click `Configure`.
4. Go to [https://app.khoj.dev/](https://app.khoj.dev/) and start searching!
Go to https://app.khoj.dev/settings to connect your Notion workspace(s) to Khoj.
4. In the first step, you generated an API key. Use the newly generated API Key in your Khoj settings, by default at [http://localhost:42110/settings#notion](http://localhost:42110/settings#notion). Click `Save`.
5. Click `Configure` in http://localhost:42110/settings to index your Notion workspace(s).
That's it! You should be ready to start searching and chatting. Make sure you've configured your [chat settings](/get-started/setup#use-khoj).
There are several ways you can get started with sharing your data with the Khoj AI.
- Drag and drop your documents via [the web UI](/clients/web/#upload-documents). This is best if you have a one-off document you need to interact with.
- You can use the [search page](https://app.khoj.dev/search) to upload documents that become indexed & searchable by your second brain. Click on "Add Documents" to get started.
- Use the desktop app to [upload and sync your documents](/clients/desktop). This is best if you have a lot of documents on your computer or you need the docs to stay in sync.
- Setup the sync options for either [Obsidian](/clients/obsidian) or [Emacs](/clients/emacs) to automatically sync your documents with Khoj. This is best if you are already using these tools and want to leverage Khoj's AI capabilities.
- Configure your [Notion](/data-sources/notion_integration) or [Github](/data-sources/github_integration) to sync with Khoj. By providing your credentials, you can keep the data synced in the background.
Khoj supports a variety of features, including search and chat with a wide range of AI models.
- **Works online or offline**: Chat using online or offline AI chat models
#### General
- **Cloud or Self-Host**: Use [cloud](https://app.khoj.dev) to use Khoj anytime from anywhere or [self-host](/get-started/setup) for privacy
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Index your Org-mode, Markdown, PDF, plaintext files, Github repos and Notion pages
Khoj is available as a [Desktop app](/clients/desktop), [Emacs package](/clients/emacs), [Obsidian plugin](/clients/obsidian) and [Web app](/clients/web).

### Supported Data Sources
Khoj can understand your word, PDF, org-mode, markdown, plaintext files, and [Notion pages](/data-sources/notion_integration).
See [the setup guide](/get-started/setup.mdx) to configure your chat models.
- **On Web**: Open [/chat](https://app.khoj.dev/chat) in your web browser
- **On Obsidian**: Search for *Khoj: Chat* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Run `M-x khoj <user-query>`
2. Enter your queries to chat with Khoj. Use [slash commands](#commands) and [query filters](/miscellaneous/query-filters) to change what Khoj uses to respond
#### Details
2. These notes, the last few messages and associated metadata are passed to the enabled chat model along with your query to generate a response
#### Conversation File Filters
You can use conversation file filters to limit the notes used in the chat response. To do so, use the left panel in the web UI. Alternatively, you can also use [query filters](/miscellaneous/query-filters) to limit the notes used in the chat response.
Khoj can generate and run simple Python code as well. This is useful if you want to have Khoj do some data analysis, generate plots and reports. LLMs by default aren't skilled at complex quantitative tasks. Code generation & execution can come in handy for such tasks.
Khoj automatically infers when to use the code tool. You can also tell it explicitly to use the code tool or use the `/code` [slash command](https://docs.khoj.dev/features/chat/#commands) in your chat.
## Setup (Self-Hosting)
### Terrarium Sandbox
Use [Cohere's Terrarium](https://github.com/cohere-ai/cohere-terrarium) to host the code sandbox locally on your machine for free.
To run with Docker, use our [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to automatically setup the Terrarium code sandbox, or start it manually like this:
```bash
docker pull ghcr.io/khoj-ai/terrarium:latest
docker run -d -p 8080:8080 ghcr.io/khoj-ai/terrarium:latest
```
To run from source, check [these instructions](https://github.com/khoj-ai/cohere-terrarium?tab=readme-ov-file#development).
#### Verify
Verify that it's running by evaluating a simple Python expression:
```bash
curl -X POST -H "Content-Type: application/json" \
  --url http://localhost:8080 \
  --data-raw '{"code": "1 + 1"}' \
  --no-buffer
```
### E2B Sandbox
[E2B](https://e2b.dev/) allows Khoj to run code on a remote but versatile sandbox with support for more python libraries. This is [not free](https://e2b.dev/pricing).
To have Khoj use E2B as the code sandbox:
1. Generate an API key on [their dashboard](https://e2b.dev/dashboard).
2. Set the `E2B_API_KEY` environment variable to it on the machine running your Khoj server.
- When using our [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml), uncomment and set the `E2B_API_KEY` env var in the `docker-compose.yml` file.
3. Now restart your Khoj server to switch to using the E2B code sandbox.
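For a non-Docker setup, step 2 is just an environment variable (the key value is an illustrative placeholder):
```bash
export E2B_API_KEY=your_e2b_api_key
# then restart the khoj server so it switches to the E2B sandbox
```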
To generate images, you just need to provide a prompt to Khoj in which the image generation intent is clear.
## Setup (Self-Hosting)
You have a couple of image generation options.
### Image Generation Models
We support most state of the art image generation models, including Ideogram, Flux, and Stable Diffusion. These will run using [Replicate](https://replicate.com). Here's how to set them up:
1. Get a Replicate API key [here](https://replicate.com/account/api-tokens).
2. Create a new [Text to Image Model](http://localhost:42110/server/admin/database/texttoimagemodelconfig/). Set the `type` to `Replicate`. Use any of the model names you see [on this list](https://replicate.com/pricing#image-models). We recommend the model name `black-forest-labs/flux-1.1-pro` from [Replicate](https://replicate.com/black-forest-labs/flux-1.1-pro).
### OpenAI
1. Get [an OpenAI API key](https://platform.openai.com/settings/organization/api-keys).
2. Setup your OpenAI API key, if you haven't already. See instructions [here](/get-started/setup#add-chat-models)
3. Create a text to image config at http://localhost:42110/server/admin/database/texttoimagemodelconfig/. Use model name `dall-e-3` to use OpenAI for image generation. Make sure to set the `Ai model api` field to the OpenAI AI model api you set up in step 2.
Try it out yourself! https://app.khoj.dev
## Self-Hosting
Online search works out of the box even when self-hosting. Khoj uses [JinaAI's reader API](https://jina.ai/reader/) to search online and read webpages by default. No API key setup is necessary.
### Search
Online search can work even with self-hosting! You have a few options (see the example after this list for setting the API keys):
- If you're using Docker, online search should work out of the box with [searxng](https://github.com/searxng/searxng) using our standard `docker-compose.yml`.
- To get production-grade, fast online search, set the `SERPER_DEV_API_KEY` environment variable to your [Serper.dev](https://serper.dev/) API key. These search results include additional context like answer box, knowledge graph etc.
- To use an open, self-hostable search provider, set the `FIRECRAWL_API_KEY` environment variable to your [Firecrawl](https://firecrawl.dev) API key. These search results do not include scraped social media content.
- To use the Exa search provider, set the `EXA_API_KEY` environment variable to your [Exa](https://exa.ai) API key.
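Here is a minimal sketch of setting these keys when running the server directly. All values are placeholders, and you only need the one for the provider you actually use:
```shell
# Placeholder values; set only the key(s) for your chosen search provider
export SERPER_DEV_API_KEY=your_serper_api_key
export FIRECRAWL_API_KEY=your_firecrawl_api_key
export EXA_API_KEY=your_exa_api_key
```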
### Webpage Reading
Out of the box, you **don't have to do anything to enable webpage reading**. Khoj will automatically read webpages by using the `requests` library. To get faster, more readable webpages for Khoj, you can use the following options:
- For an open, self-hostable webpage reader, you can use [Firecrawl](https://www.firecrawl.dev/). Create a new [Webscraper](http://localhost:42110/server/admin/database/webscraper/add/). Add your Firecrawl API key in the Api Key field and set the type to Firecrawl.
- For advanced webpage reading, you can use [Olostep](https://www.olostep.com/). It can read a wider variety of webpages. Create a new [Webscraper](http://localhost:42110/server/admin/database/webscraper/add/). Add your Olostep API key in the Api Key field and set the type to Olostep.
- For fast webpage reading, you can use [Exa](https://exa.ai). Create a new [Webscraper](http://localhost:42110/server/admin/database/webscraper/add/). Add your Exa API key in the Api Key field and set the type to Exa.
Take advantage of super fast search to find relevant notes and documents from your knowledge base.
1. Open Khoj Search
- **On Web**: Open https://app.khoj.dev/ in your web browser
- **On Obsidian**: Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Run `M-x khoj <user-query>`
2. Query using natural language to find relevant entries from your knowledge base. Use [query filters](/miscellaneous/query-filters) to limit entries to search
A bi-encoder model is used to create meaning vectors (aka vector embeddings) of your documents and search queries.
1. When you sync your documents with Khoj, it uses the bi-encoder model to create and store meaning vectors of (chunks of) your documents
2. When you initiate a natural language search, the bi-encoder model converts your query into a meaning vector and finds the most relevant document chunks for that query by comparing their meaning vectors.
3. The slower but higher-quality cross-encoder model is then used to re-rank these documents for your given query.
### Setup (Self-Hosting)
You are **not required** to configure the search model config when self-hosting. Khoj sets up a decent default local search model config for general use.
You may want to configure this if you need better multi-lingual search, want to experiment with different, newer models, or if the default models do not work for your use-case.
You can use bi-encoder models downloaded locally [from Huggingface](https://huggingface.co/models?library=sentence-transformers), served via the [HuggingFace Inference API](https://endpoints.huggingface.co/), OpenAI API, Azure OpenAI API or any OpenAI compatible API like Ollama, LiteLLM etc. Follow the steps below to configure your search model:
1. Open the [SearchModelConfig](http://localhost:42110/server/admin/database/searchmodelconfig/) page on your Khoj admin panel.
2. Hit the Plus button to add a new model config or click the id of an existing model config to edit it.
3. Set the `biencoder` field to the name of the bi-encoder model supported [locally](https://huggingface.co/models?library=sentence-transformers) or via the API you configure.
4. Set the `Embeddings inference endpoint api key` to your OpenAI API key and `Embeddings inference endpoint type` to `OpenAI` to use an OpenAI embedding model.
5. Also set the `Embeddings inference endpoint` to your Azure OpenAI or OpenAI compatible API URL to use the model via those APIs.
6. Ensure the search model config you want to use is the **only one** that has the `name` field set to `default`[^1].
7. Save the search model configs and restart your Khoj server to start using your new, updated search config.
:::info
You will need to re-index all your documents if you want to use a different bi-encoder model.
:::
:::info
You may need to tune the `Bi encoder confidence threshold` field for each bi-encoder to get an appropriate number of documents for chat with your knowledge base.
Confidence here is a normalized measure of semantic distance between your query and documents. The confidence threshold limits the documents returned to chat that fall within the distance specified in this field. It can take values between 0.0 (exact overlap) and 1.0 (no meaning overlap).
:::
[^1]: Khoj uses the first search model config named `default` it finds on startup as the search model config for that session
You can also click on the speaker icon next to any message to hear it out loud.
Voice chat will automatically be configured when you initialize the application. The default configuration will run locally. If you want to use the OpenAI whisper API for voice chat, you can set it up by following these steps:
1. Set up your OpenAI API key. See instructions [here](/get-started/setup#add-chat-models).
2. Create a new configuration at http://localhost:42110/server/admin/database/speechtotextmodeloptions/. We recommend the value `whisper-1` and model type `Openai`.
If you want to use the Text to Speech feature, you can set it up by following these steps:
1. Set up your account on [ElevenLabs.io](https://elevenlabs.io/).
2. Configure your API key in your environment variables with the key `ELEVEN_LABS_API_KEY` (see the example below).
3. (Optional) Create a new [Voice model option](http://localhost:42110/server/admin/database/voicemodeloption/) with a specific voice ID from whichever voice you want to use. You can explore the options [here](https://elevenlabs.io/app/voice-library).
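For example, when launching the Khoj server from a shell (placeholder value shown):
```shell
# Placeholder value; use your actual ElevenLabs API key
export ELEVEN_LABS_API_KEY=your_elevenlabs_api_key
```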
Welcome to the Khoj Docs! This is the best place to get set up and explore Khoj's features.
- Khoj is an open source, personal AI
- You can [chat](/features/chat) with it about anything. It'll use files you shared with it to respond, when relevant. It can also access information from the public internet.
- Quickly [find](/features/search) relevant notes and documents using natural language
- It understands pdf, plaintext, markdown, org-mode files, and [notion pages](/data-sources/notion_integration).
- Access it from your [Emacs](/clients/emacs), [Obsidian](/clients/obsidian), the [Khoj desktop app](/clients/desktop), or [any web browser](/clients/web)
- Use our [cloud](https://app.khoj.dev) instance to access your Khoj anytime from anywhere, [self-host](/get-started/setup) on consumer hardware for privacy
Here's what to consider if you're using Khoj, whether self-hosted or on our cloud:
- If you're self-hosting, you can opt out of telemetry by following [these instructions](/miscellaneous/telemetry).
Self-hosting isn't for everyone, so we've still taken steps to make Khoj privacy-friendly, even if you choose to use our [cloud offering](https://app.khoj.dev). Here's what to consider when using Khoj Cloud:
1. Your embeddings are generated by an open source model within our own dedicated endpoint [hosted on AWS with Huggingface](https://huggingface.co/inference-endpoints/dedicated). There's zero persistent memory to the Huggingface Inference endpoints (it's stateless).
1. Your embeddings and the associated raw text are stored in a secure Postgres DB in our private AWS cloud. Your data is sharded on a unique user ID. We store the raw text in your files to improve file syncing and provide context when you chat with Khoj.
1. When you use the single-sign-on option with Google, we only receive your name, a link to your profile photo, and your email address.
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
These are the general setup instructions for self-hosted Khoj.
You can install the Khoj server using either [Docker](?server=docker) or [Pip](?server=pip).
:::info[First Run]
Restart your Khoj server after the first run to ensure all settings are applied correctly.
:::
<Tabs groupId="server" queryString>
<TabItem value="macos" label="MacOS">
<h3>Prerequisites</h3>
- *Option 1*: Click here to install [Docker Desktop](https://docs.docker.com/desktop/install/mac-install/). Make sure you also install the [Docker Compose](https://docs.docker.com/desktop/install/mac-install/) tool.
- *Option 2*: Use [Homebrew](https://brew.sh/) to install Docker and Docker Compose.
```shell
brew install --cask docker
brew install docker-compose
```
<h3>Setup</h3>
1. Download the Khoj docker-compose.yml file [from Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml)
2. Configure the environment variables in the `docker-compose.yml`
- Set `KHOJ_ADMIN_PASSWORD`, `KHOJ_DJANGO_SECRET_KEY` (and optionally the `KHOJ_ADMIN_EMAIL`) to something secure (see the tip after step 3 for generating a secret). This allows you to customize Khoj later via the admin panel.
- Set `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY` to your API key if you want to use OpenAI, Anthropic or Gemini commercial chat models respectively.
- Uncomment `OPENAI_BASE_URL` to use [Ollama](/advanced/ollama?type=first-run&server=docker#setup) running on your host machine. Or set it to the URL of your OpenAI compatible API like vLLM or [LMStudio](/advanced/lmstudio).
3. Start Khoj by running the following command in the same directory as your docker-compose.yml file.
```shell
cd ~/.khoj
docker-compose up
```
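If you need a quick way to generate a strong value for `KHOJ_DJANGO_SECRET_KEY` or `KHOJ_ADMIN_PASSWORD`, something like this works, assuming `openssl` is available on your machine:
```shell
# Prints a random 64-character hex string to use as your secret
openssl rand -hex 32
```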
</TabItem>
<TabItem value="windows" label="Windows">
<h3>Prerequisites</h3>
1. Install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) by running the following command in PowerShell:
```shell
wsl --install
```
2. Install [Docker Desktop](https://docs.docker.com/desktop/install/windows-install/) with **[WSL2 backend](https://docs.docker.com/desktop/wsl/#turn-on-docker-desktop-wsl-2)** (default)
<h3>Setup</h3>
1. Download the Khoj docker-compose.yml file [from Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml)
2. Configure the environment variables in the `docker-compose.yml`
- Set `KHOJ_ADMIN_PASSWORD`, `KHOJ_DJANGO_SECRET_KEY` (and optionally the `KHOJ_ADMIN_EMAIL`) to something secure. This allows you to customize Khoj later via the admin panel.
- Set `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY` to your API key if you want to use OpenAI, Anthropic or Gemini commercial chat models respectively.
- Uncomment `OPENAI_BASE_URL` to use [Ollama](/advanced/ollama) running on your host machine. Or set it to the URL of your OpenAI compatible API like vLLM or [LMStudio](/advanced/lmstudio).
3. Start Khoj by running the following command in the same directory as your docker-compose.yml file.
```shell
# Windows users should use their WSL2 terminal to run these commands
cd ~/.khoj
docker-compose up
```
</TabItem>
</Tabs>
:::info[Remote Access]
By default Khoj is only accessible on the machine it is running on. To access Khoj from a remote machine see [Remote Access Docs](/advanced/remote).
:::
Your setup is complete once you see `🌖 Khoj is ready to engage` in the server logs on your terminal.
</TabItem>
<TabItem value="pip" label="Pip">
<h3>1. Install Khoj Server</h3>
- Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine
- Check [llama-cpp-python setup](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#supported-backends) if you hit any llama-cpp issues with the installation
Run the following command in your terminal to install the Khoj server.
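The server is published as the `khoj` package on PyPI:
```shell
# Install the Khoj server package from PyPI
pip install khoj
```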
Run the following command from your terminal to start the Khoj service.
```shell
USE_EMBEDDED_DB="true" khoj --anonymous-mode
```
`--anonymous-mode` allows access to Khoj without requiring login. This is usually fine for local-only, single-user setups. If you need authentication follow the [authentication setup docs](/advanced/authentication).
<h4>First Run</h4>
On the first run of the above command, you will be prompted to:
1. Create an admin account with an email and secure password
2. Customize the chat models to enable
- Keep your [OpenAI](https://platform.openai.com/api-keys), [Anthropic](https://console.anthropic.com/account/keys), [Gemini](https://aistudio.google.com/app/apikey) API keys and [OpenAI](https://platform.openai.com/docs/models), [Anthropic](https://docs.anthropic.com/en/docs/about-claude/models#model-names), [Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models), [Offline](https://huggingface.co/models?pipeline_tag=text-generation&library=gguf) chat model names handy to set any of them up during first run.
3. Your setup is complete once you see `🌖 Khoj is ready to engage` in the server logs on your terminal!
:::tip[Auto Start]
To start Khoj automatically in the background use [Task scheduler](https://www.w
:::
You can now open the web app at http://localhost:42110 and start interacting!<br />
Nothing else is necessary, but you can customize your setup further by following the steps below.
### Add Chat Models
<h4>Login to the Khoj Admin Panel</h4>
Go to http://localhost:42110/server/admin and login with the admin credentials you setup during installation.
Ensure you are using **localhost, not 127.0.0.1**, to access the admin panel to avoid the CSRF error.
:::
:::info[CSRF Trusted Origin or Unset Cookie Error]
If using a load balancer or reverse proxy in front of your Khoj server, set the environment variable `KHOJ_ALLOWED_DOMAIN` to your internal IP or domain to avoid this error (see the example below).
If unset, it defaults to `KHOJ_DOMAIN`.
:::
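For example, with a hypothetical internal domain behind your reverse proxy:
```shell
# Hypothetical internal domain; replace with your own
export KHOJ_ALLOWED_DOMAIN=khoj.internal.example.com
```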
:::info[DISALLOWED HOST or Bad Request (400) Error]
You may hit this if you try to access Khoj exposed on a custom domain (e.g. 192.168.12.3 or example.com) or over HTTP.
Set the environment variables `KHOJ_DOMAIN=your-external-ip-or-domain` and `KHOJ_NO_HTTPS=True` if required to avoid this error (see the example below).
:::
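A minimal sketch, reusing the example LAN address from the note above:
```shell
# Example LAN IP from the note above; replace with your own domain or IP
export KHOJ_DOMAIN=192.168.12.3
export KHOJ_NO_HTTPS=True
```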
Set up which chat model you'd want to use. Khoj supports local and online chat models.
:::tip
Using Ollama? See the [Ollama Integration](/advanced/ollama) section for more custom setup instructions.
:::
1. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add) in the server admin settings.
- Add your [OpenAI API key](https://platform.openai.com/api-keys)
- Give the configuration a friendly name like `OpenAI`
- (Optional) Set the API base URL. It is only relevant if you're using another OpenAI-compatible proxy server like [Ollama](/advanced/ollama) or [LMStudio](/advanced/lmstudio).<br />

2. Create a new [chat model](http://localhost:42110/server/admin/database/chatmodel/add)
- Set the `chat-model` field to an [OpenAI chat model](https://platform.openai.com/docs/models). Example: `gpt-4o`.
- Make sure to set the `model-type` field to `OpenAI`.
- If your model supports vision, set the `vision enabled` field to `true`. This is currently only supported for OpenAI models with vision capabilities.
@@ -270,22 +266,22 @@ Using Ollama? See the [Ollama Integration](/advanced/ollama) section for more cu

</TabItem>
<TabItem value="anthropic" label="Anthropic">
1. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add) in the server admin settings.
- Add your [Anthropic API key](https://console.anthropic.com/account/keys)
- Give the configuration a friendly name like `Anthropic`. Do not configure the API base url.
2. Create a new [chat model](http://localhost:42110/server/admin/database/chatmodel/add)
- Set the `chat-model` field to an [Anthropic chat model](https://docs.anthropic.com/en/docs/about-claude/models#model-names). Example: `claude-3-5-sonnet-20240620`.
- Set the `model-type` field to `Anthropic`.
- Set the `ai model api` field to the Anthropic AI Model API you created in step 1.
</TabItem>
<TabItem value="gemini" label="Gemini">
1. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add) in the server admin settings.
- Add your [Gemini API key](https://aistudio.google.com/app/apikey)
- Give the configuration a friendly name like `Gemini`. Do not configure the API base url.
2. Create a new [chat model](http://localhost:42110/server/admin/database/chatmodel/add)
- Set the `chat-model` field to a [Google Gemini chat model](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models). Example: `gemini-2.5-flash`.
- Set the `model-type` field to `Google`.
- Set the `ai model api` field to the Gemini AI Model API you created in step 1.
</TabItem>
<TabItem value="offline" label="Offline">
Offline chat stays completely private and can work without internet using any open source chat model.
:::info
- A Nvidia, AMD GPU or a Mac M1+ machine would significantly speed up chat responses
:::
1. Install any OpenAI API compatible local AI model server like [llama-cpp-server](https://github.com/ggml-org/llama.cpp/tree/master/tools/server), Ollama, vLLM etc.
2. Add an [ai model api](http://localhost:42110/server/admin/database/aimodelapi/add/) on the admin panel
- Set the `api url` field to the URL of your local AI model provider like `http://localhost:11434/v1/` for Ollama
3. Restart the Khoj server to load the models available on your local AI model provider (see the check below)
- If that doesn't work, you'll need to manually add the available [chat models](http://localhost:42110/server/admin/database/chatmodel/add) in the admin panel.
4. Set the newly added chat model as your preferred model in your [User chat settings](http://localhost:42110/settings)
5. [Start chatting](http://localhost:42110) with your local AI!
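To sanity check that your local server exposes an OpenAI compatible API before restarting Khoj, you can list its models. This assumes your server implements the standard `/v1/models` endpoint; Ollama's default port is shown, so adjust the URL for your setup:
```shell
# Lists models served by an OpenAI API compatible server (Ollama default port shown)
curl http://localhost:11434/v1/models
```
</TabItem>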
We don't send any personal information or any information from/about your content.
## Disable Telemetry
If you're self-hosting Khoj, you can opt out of telemetry at any time by setting the `KHOJ_TELEMETRY_DISABLE` environment variable to `True` via your shell or `docker-compose.yml` (example below).
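For example, when running the server directly from a shell:
```shell
# Opt out of telemetry before starting the Khoj server
export KHOJ_TELEMETRY_DISABLE=True
```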
If you have any questions or concerns, please reach out to us on [Discord](https://discord.gg/BDgyabRM6e).
{"name":"Khoj","short_name":"Khoj","display":"standalone","start_url":"/","description":"The open, personal AI for your digital brain. You can ask Khoj to draft a message, paint your imagination, find information on the internet and even answer questions from your documents.","theme_color":"#ffffff","background_color":"#ffffff","icons":[{"src":"/static/assets/icons/khoj_lantern_128x128.png","sizes":"128x128","type":"image/png"},{"src":"/static/assets/icons/khoj_lantern_256x256.png","sizes":"256x256","type":"image/png"}],"screenshots":[{"src":"/static/assets/samples/phone-remember-plan-sample.png","sizes":"419x900","type":"image/png","form_factor":"narrow","label":"Remember and Plan"},{"src":"/static/assets/samples/phone-browse-draw-sample.png","sizes":"419x900","type":"image/png","form_factor":"narrow","label":"Browse and Draw"},{"src":"/static/assets/samples/desktop-remember-plan-sample.png","sizes":"1260x742","type":"image/png","form_factor":"wide","label":"Remember and Plan"},{"src":"/static/assets/samples/desktop-browse-draw-sample.png","sizes":"1260x742","type":"image/png","form_factor":"wide","label":"Browse and Draw"}]}