This bug was introduced in 05d4e19cb, version 1.42.2, during migration
to save deeply typed ChatMessageModel. As the ChatMessageModel did
not use the right field name for organic results (since the start).
Previously it did not matter as it was storing to DB irrespective but
now the mapping of dictionary to ChatMessageModel drops that field
before save to conversation in DB.
This was resulting in organic context being lost on page reload and
only being shown on first response.
Not sure why but it some cases when interacting with o3 (which needs
non-streaming) the stream_options seems to be set.
Cannot reproduce but hopefully dropping the stream_options explicitly
should resolve this issue.
Related 985a98214
Older package (like 1.84.0) seem to always pass reasoning_effort
argument to openai api, which now seems to be throwing unexpected
request argument error when used with non-reasoning models (like
4o-mini).
There had been a regression that made all agents display the default
chat model instead of the actual chat model associated with the agent.
This change resolves that issue by prioritizing agent specific chat
model from DB (over user or server chat model).
- Fix code context data type for validation on server. This would
prevent the chat message from being written to history
- Handle null code results on web app
We now pass deeply typed chat messages throughout the application to
construct tool specific chat history views since 05d4e19cb.
This ChatMessageModel didn't allow intent.query to be unset. But
interrupted research iteration history can have unset query. This
changes allows makes intent.query optional.
It also uses message by user entry to populate user message in tool
chat history views. Using query from khoj intent was an earlier
shortcut used to not have to deal with message by user. But that
doesn't scale to current scenario where turns are not always required
to have a single user, assistant message pair.
Specifically a chat history can now contain multiple user messages
followed by a single khoj message. The new change constructs a chat
history that handles this scenario naturally and makes the code more
readable.
Also now only previous research iterations that completed are
populated. Else they do not serve much purpose.
Clean non useful slash commands to make chat API more maintanable.
- App version, chat model via /help is visible in other parts of the
UX. Asking help questions with site:docs.khoj.dev filter isn't used
or known to folks
- /summarize is esoterically tuned. Should be rewritten if add back.
It wasn't being used by /research already
- Automations can be configured via UX. It wasn't being shown in UX
already
The chat actor (and director) tests haven't been looked into in a long
while. They'd gone stale in how they were calling thee functions. And
what was required to run them. Now the online chat actor tests work
again.
Using model specific extract questions was an artifact from older
times, with less guidable models.
New changes collate and reuse logic
- Rely on send_message_to_model_wrapper for model specific formatting.
- Use same prompt, context for all LLMs as can handle prompt variation.
- Use response schema enforcer to ensure response consistency across models.
Extract questions (because of its age) was the only tool directly within
each provider code. Put it into helpers to have all the (mini) tools
in one place.
- Rename GET /api/automations to GET /api/automation
- Rename POST /api/trigger/automation to POST /api/automation/trigger
- Update calls to the automations API from the web app.
- Add context based on information provided rather than conversation
commands. Let caller handle passing appropriate context to ai
provider converse methods
Increase timeout to 180 (from 120s previous) and graceful timeout to
90 (from 30s default) to reduce
Increase default gunicorn workers and make it configurable to better
utilize (v)CPUs. This is manually configured (instead of using
multiprocessing.cpu_count()) as VMs/containers may read cpu count of
host machine instead of their VMs/containers.
The chat dictionary is an artifact from earlier non-db chat history
storage. We've been ensuring new chat messages have valid type before
being written to DB for more than 6 months now.
Move to using the deeply typed chat history helps avoids null refs,
makes code more readable and easier to reason about.
Next Steps:
The current update entangles chat_history written to DB
with any virtual chat history message generated for intermediate
steps. The chat message type written to DB should be decoupled from
type that can be passed to AI model APIs (maybe?).
For now we've made the ChatMessage.message type looser to allow
for list[dict] type (apart from string). But later maybe a good idea
to decouple the chat_history recieved by send_message_to_model from
the chat_history saved to DB (which can then have its stricter type check)
- Converts response schema into a anthropic tool call definition.
- Works with simple enums without needing to rely on $defs, $refs as
unsupported by Anthropic API
- Do not force specific tool use as not supported with deep thought
This puts anthropic models on parity with openai, gemini models for
response schema following. Reduces need for complex json response
parsing on khoj end.
There seems to be a more standard mechanism of specifying launch.json
params for devcontainers. Previous mechanism to write launch.json to
.vscode/launch.json in post creation step does not work.
Improve default launch.json to include khoj admin username, password
with placeholder values to get started with local development faster.
Define dockerfile for devcontainer to pre-built server, web app
dependencies during dev container image creation stage. So install on
dev container startup is sped up as no need to install dependencies.
## Description
This PR introduces significant improvements to the Obsidian Khoj
plugin's chat interface and editing capabilities, enhancing the overall
user experience and content management functionality.
## Features
### 🔍 Enhanced Communication Mode
I've implemented radio buttons below the chat window for easier
communication mode selection. The modes are now displayed as emojis in
the conversation for a cleaner interface, replacing the previous
text-based system (e.g., /default, /research). I've also documented the
search mode functionality in the help command.
#### Screenshots
- Radio buttons for mode selection
- Emoji display in conversations

### 💬 Revamped Message Interaction
I've redesigned the message buttons with improved spacing and color
coding for better visual differentiation. The new edit button allows
quick message modifications - clicking it removes the conversation up to
that point and copies the message to the input field for easy editing or
retrying questions.
#### Screenshots
- New message styling and color scheme

- Edit button functionality

### 🤖 Advanced Agent Selection System
I've added a new chat creation button with agent selection capability.
Users can now choose from their available agents when starting a new
chat. While agents can't be switched mid-conversation to maintain
context, users can easily start fresh conversations with different
agents.
#### Screenshots
- Agent selection dropdown

### 👁️ Real-Time Context Awareness
I've added a button that gives Khoj access to read Obsidian opened tabs.
This allows Khoj to read open notes and track changes in real-time,
maintaining a history of previous versions to provide more contextual
assistance.
#### Screenshots
- Window access toggle

### ✏️ Smart Document Editing
Inspired by Cursor IDE's intelligent editing and ChatGPT's Canvas
functionality, I've implemented a first version of a content creation
system we've been discussing. Using a JSON-based modification system,
Khoj can now make precise changes to specific parts of files, with
changes previewed in yellow highlighting before application.
Modification code blocks are neatly organized in collapsible sections
with clear action summaries. While this is just a first step, it's
working remarkably well and I have several ideas for expanding this
functionality to make Khoj an even more powerful content creation
assistant.
#### Screenshots
- JSON modification preview
- Change highlighting system
- Collapsible code blocks
- Accept/cancel controls

---------
Co-authored-by: Debanjum <debanjum@gmail.com>
## Summary
- Enable Khoj to operate computers: Add experimental computer operator
functionality that allows Khoj to interact with desktop environments,
browsers, and terminals to accomplish complex tasks
- Multi-environment support: Implement computer environments with GUI,
file system, and terminal access. Can control host computer or Docker
container computer
## Key Features
### Computer Operation Capabilities
- Desktop control (screenshots, clicking, typing, keyboard shortcuts)
- File editing and management
- Terminal/bash command execution
- Web browser automation
- Visual feedback via train-of-thought video playback
### Infrastructure & Architecture:
- Docker container (ghcr.io/khoj-ai/computer:latest) with Ubuntu 24.04,
XFCE desktop, VNC access
- Local computer environment support with pyautogui
- Modular operator agent system supporting multiple environment types
- Trajectory compression and context management for long-running tasks
### Model Integration:
- Anthropic models only (Claude Sonnet 4, Claude 3.7 Sonnet, Claude Opus
4)
- OpenAI and binary operator agents temporarily disabled
- Enhanced caching and context management for operator conversations
### User Experience:
- `/operator` command or just ask Khoj to use operator tool to invoke
computer operation
- Integrate with research mode for extended 30+ minute task execution
- Video of computer operation in train of thought for transparency
### Configuration
- Set `KHOJ_OPERATOR_ENABLED=True` in `docker-compose.yml`
- Requires Anthropic API key
- Computer container runs on port 5900 (VNC)
- You can seek through the train of thought video of computer operation or
follow it in live mode.
- Interleaves video with normal text thoughts.
- Video available of old interactions and currently streaming message.
- Add type guards for action.path in drag vs text editor actions
- Added type guards for Union type attribute access
- Fixed variable naming conflicts between drag and text editor cases
- Resolved remaining typing issues in OpenAI, Anthropic agents
- Type guard without requiring another code indent level
- Create reusable method to call model
- Fix to summarize messages on operator run.
- Mark assistant tool calls with role = assistant, not environment
- Try fix message format when load after interrupts.
Does not work well yet
Previously CTRL+A would get triggered instead of ctrl+a. CTRL+A is
equivalent to ctrl+shift+a. This isn't intended and should be
called directly when required.
Now key combos like ctrl+a on computer firefox etc. work as expected
Track research and operator results at each nested iteration step
using python object references + async events bubbled up from nested
iterators.
Instantiates operator with interrupted operator messages from research
or normal mode.
Reflects actual interaction trajectory as closely as possible to agent
including conversation history, partial operator trajectory and new
query for fine grained, corrigible steerability.
Research mode continues with operator tool directly if previous
iteration was an interrupted operator run.
Since partial state reload after interrupt drops Khoj messages. The
assumption that there will always be a Khoj message after a user
message is broken. That is, there can now be multiple user messages
preceding a Khoj user message now.
This change allow for user queries to still be extracted for chat
history even if no khoj message follow.
Minor logic update to only include non image inferred queries for
gemini, anthropic models as well instead of just for openai models.
Apart from that the extracted function should be functionally same.
We were passing operator results as a simple dictionary. Strongly
typing it makes sense as operator results becomes more complex.
Storing operator results with trajectory on interrupts will allow
restarting interrupted operator run with agent messages of interrupted
trajectory loaded into operator agents
This allows:
- Each operator agent to own its summarization prompt. That it can
tune if it wants
- The outer operator loop to pass an override summarize prompt when it
invokes the summarize func but it does not have to
It had become broken at some point due to refactoring. The cache
control was getting added and removed right after in add_action_results
What we actually wanted to do is clear the old cache breakpoint and
put a new one at the latest operator tool result message.
This should improve operator speed and lower costs with anthropic
models.
- Generalize building pyautogui into executable python code snippet.
This should work across docker and local. And should be easier to
extend to operate a remote computer over the network as well.
- Create dockerfile for pyautogui operate-able containerized computer
Previously it could only operate a (playwright) browser. Now
- The operator logic and naming has been updated assuming
multiple environment types can be operated
- The operator entrypoint is now at __init__.py to simplify imports
and the entrypoint function is called operate_environment
- All operator agents have been updated to select their system prompts
and tools based on the environment they'll operate
- Khoj can now save and restore research from partial state
This triggers an interrupt that saves the partial research, then
when a new query is sent it loads the previous partial research as
context and continues utilizing with the new user query to orient
its future research
- Support natural interrupt and send query behavior from web app
This triggers an abort and send when a user sends a chat message
while khoj is in the middle of some previous research.
This interrupt mechanism enables a more natural, interactive
research flow
- Just send your new query. If a query was running previously it'd
be interrupted and new query would start processing. This improves on
the previous 2 click interrupt and send ux.
- Utilizes partial research for interrupted query, so you can now
redirect khoj's research direction. This is useful if you need to
share more details, change khoj's research direction in anyway or
complete research. Khoj's train of thought can be helpful for this.
- Track operator, research context in ChatMessage
- Track query field in (document) context field of ChatMessage
This allows validating chat message before inserting into DB
These seem to be a new class of errors showing up. Explicitly using
django timezone functions to add awareness to date time files stored
in DB seems to mitigate the issue.
Related #1180
- Engage reasoning when using claude 4 models
- Allow claude 4 models as monolithic operator agents
- Ease identifying which anthropic models can reason, operate GUIs
- Track costs, set default context window of claude 4 models
- Handle stop reason on calls to new claude 4 models
- Normalize conversation_id type to str instead of str or UUID
- Do not pass conversation_id to agenerate_chat_response as
the associated conversation is also being passed. So can get its id
directly.
## Overview
1. Create base framework to compose different operators and environments
for Khoj to operate.
2. Enable Khoj to operate a web browser using anthropic, openai, gemini
or open-source models
**Note**: *This is an alpha level feature release. It is meant for local
testing by contributors and self-hosters.*
## Capabilities
- Have Khoj operate a web browser to complete tasks that require actions
and visual feedback.
- Experiment with any vision model as operator. Khoj supports monolithic
and binary operator
- Monolithic operators rely on a single models like claude, openai to
both reason and ground operator actions
- Binary operators allow bootstrapping a fully local operator. It can
use any vision model for visual reasoning when paired with a capable
visual grounding model.
## Limitations
- In general, it is slower, more expensive and less comprehensive than
standard Khoj for research
## Setup
1. Install Khoj with playwright by either
- running `pip install khoj[local]`
- installing playwright separately via `pip install playwright` and
`playwright install chromium`
2. Set `KHOJ_OPERATOR_ENABLED` env var to true (i.e
`KHOJ_OPERATOR_ENABLED=true`)
3. Start Khoj (e.g `USE_EMBEDDED_DB="true" khoj --anonymous-mode -vv`)
4. Add the necessary chat model(s) with `vision enabled` via your [Khoj
Admin Panel](http://localhost:42110/server/admin)
- To use Anthropic claude: `claude-3.7-sonnet*` chat model is required
with vision enabled
- To use Openai operator: `gpt-4o` chat model is required with vision
enabled
- For other operator configurations: a chat model named `ui-tars-1.5` is
required with vision enabled
This can technically be any visual grounding model served via an openai
compatible api. I've just tested with ui-tars-1.5-7b deployed to an HF
inference endpoint for now. See [deployment
instructions](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
5. Set your desired vision chat model via [user
settings](http://localhost:42110/settings) to use as operator.
6. Run your queries with either the `/operator` slash command or by just
asking Khoj in your query to use the operator tool. You can combine run
operator in research mode a well
### Advanced Usage
- Reuse Browser Session
- Why: Have Khoj operate web services you've logged into. E.g manage
your gmail, github, social media etc.
- Setup
1. Start Chromium or Edge in Remote Debugging mode. For example, on Mac
you can start Edge by running the following in your terminal:
`/Applications/Microsoft\ Edge.app/Contents/MacOS/Microsoft\ Edge
--remote-debugging-port=9222`
4. Connect Khoj to that browser instance by setting the environment
variable `KHOJ_CDP_URL` to its URL.
By default you'd set `KHOJ_CDP_URL="http://localhost:9222"`
## Architecture
### Operator Agents
| Type | Design |
|----- |-----|
| Monolithic | <img
src="https://github.com/user-attachments/assets/7a96440f-1732-482b-9bd9-0920cb0c60890"
width=400> |
| Binary | <img
src="https://github.com/user-attachments/assets/c5d101c0-3475-43c2-a301-daa943cde190"
width=400> |
The generic grounding agent has not been tested properly but at least
it should be aligned with the interface being used by the ui-tars
grounding agent which has been tested.
It sometimes outputs coordinates in string rather than list. Make
parser more robust to those kind of errors.
Share error with operator agent to fix/iterate on instead of exiting
the operator loop.
- Encourage grounder to adhere to the reasoners action instruction
- Encourage reasoner to explore other actions when stuck in a loop
Previously seemed to be forcing it too strongly to choose
"single most important" next action. So may not have been exploring
other actions to achieve objective on initial failure.
- Do not catch errors messages just to re-throw them. Results in
confusing exception happened during handling of an exception
stacktrace. Makes it harder to debug
- Log error when action_results.content isn't set or empty to
debug this operator run error
Goto and back functions are chosen by the visual reasoning model for
increased reliability in selecting those tools. The ui-tars grounding
models seems too tuned to use a specific set of tools.
Documentation about this is currently limited, confusing. But it seems
like reasoning item should be kept if computer_call after, else drop.
Add noop placeholder for reasoning item to prevent termination of
operator run on response with just reasoning.
The reasoning messages in openai cua needs to be passed back or some
such. Else it throws missing response with required id error.
Folks are confused about expected behavior for this online as well.
The documentation to handle this seems to be sparse, unclear.
Show natural language, formatted text for each action. Previously we
were just showing json dumps of the actions taken.
Pass screenshot at each step for openai, anthropic and binary operator agents
Use text and image field in json passed to client for rendering both.
Show actions, env screenshot after actions applied in train of thought.
Showing the post action application screenshot seems more intuitive.
Previously we were showing the screenshot used to decide next action.
This pre action application screenshot was being shown after next
action decided (in train of thought). This was anyway misleading to
the actual ordering of event.
Rendered response is now a structured payload (dict) passing image
and text to be rendered up from operator to clients for rendering of
train of thought.
Operator is still early in development. To enable it:
- Set KHOJ_OPERATOR_ENABLE environment variable to true
- Run any one of the commands below:
- `pip install khoj[local]'
- `pip install khoj[dev]'
- `pip install playwright'
Grounding agent does not have the full context and capabilities to
make this call. Only let reasoning agent make termination decision.
Add a wait action instead when grounder requests termination.
UI tars grounder doesn't like calling non-standard functions like
goto, back.
Directly parse visual reasoner instruction to bypass uitars grounder
model.
At least for goto and back functions grounding isn't necessary, so
this works well.
Previously the grounding agent would be reset on every call. So it
only saw the most recent instruction and screenshot to make its next
action suggestion.
This change allows the visual grounders to see past instructions and
actions to prevent looping and encourage more exploratory action
suggestions by it when stuck or see errors.
Split visual grounder into two implementations:
- A ui-tars specific visual grounder agent. This uses the canonical
implementation of ui-tars with specialized system prompt and action
parsing.
- Fallback to generic visual grounder utilizing tool-use and served over
any openai compatible api. This was previously being used for our
ui-tars implementation as well.
Adds the results of each action in a separate item in message content.
Previously we were adding this as a single larger text blob. This
changes adds structure to simplify post processing (e.g truncation).
The updated add_action_results should also require less work to
generalize if we pass tool call history to grounding model as
action results in valid openai format.
The initial user query isn't updated during an operator run. So set it
when initializing the operator agent. Instead of passing it on every
call to act.
Pass summarize prompt directly to the summarize function. Let it
construct the summarize message to query vision model with.
Previously it was being passed to the add_action_results func as
previous implementation that did not use a separate summarize func.
Also rename chat_model to vision_model for a more pertinent var name.
These changes make the code cleaner and implementation more readable.
For some reason the page.go_back() action in playwright had a much
higher propensity to timeout. Use goto instead to reduce these page
traversal timeouts.
This requires tracking navigation history.
Only let the visual reasoner handle terminating the operator run.
Previously the grounder was also able to trigger termination.
Make catching the termination by the reasoner more robust
The previous browser_operator.py file had become pretty massive and
unwieldy. This change breaks it apart into separate files for
- the abstract environment and operator agent base
- the concrete agents: anthropic, openai and binary
- the concrete environment browser operator
- the operator actions used by agents and environment
- This operator works with model served over an openai compatible api
- It uses separate vision models to reason and ground actions.
This improves flexibility in the operator agents that can be created.
We do not know need our operator agent ot rely on monolithic models to
can both reason over visual data and ground their actions.
We can create operator agent from 2 separate models:
1. To reason over screenshots to suggest natural language next action
2. To ground those suggestion into visually grounded actions
This allows us to create fully local operators or operators combining
the best visual reasoner with the best visual grounder models.
Inform it can only control a single playwright browser page.
Previously it was assuming it is operating a whole browser, so would
have trouble navigating to different pages.
Improve handling of error in action parsing
Remove each OperatorAgent specific code from leaking out into the
operator. The Oprator just calls the standard OperatorAgent functions.
Each AgentOperator specific logic is handled by the OperatorAgent
internally.
The improve the separation of responsibility between the operator,
OperatorAgent and the Environment.
- Make environment pass screenshot data in agent agnostic format
- Have operator agents providers format image data to their AI model
specific format
- Add environment step type to distinguish image vs text content
- Clearly mark major steps in the operator iteration loop
- Handle anthropic models returning computer tool actions as normal
tool calls by normalizing next action retrieval from response for it
- Remove unused ActionResults fields
- Remove unnnecessary placeholders to content of action results like
for screenshot data
Decouple applying action on Environment from next action decision by
OperatorAgent
- Create an abstract Environment class with a `step' method
and a standardized set of supported actions for each concrete Environment
- Wrap playwright page into a concrete Environment class
- Create abstract OperatorAgent class with an abstract `act' method
- Wrap Openai computer Operator into concrete OperatorAgent class
- Wrap Claude computer Operator into a concrete OperatorAgent class
Handle interaction between Agent's action
Previously some link clicks would open in new tab. This is out of the
browser operator's context and so the new page cannot be interacted
with by the browser operator.
This change catches new page opens and opens them in the context page
instead.
Give the Khoj browser operator access to browser with existing
context (auth, cookies etc.) by starting it with CDP enabled.
Process:
1. Start Browser with CDP enabled:
`Edge/Chromium/Chrome --remote-debugging-port=9222'
2. Set the KHOJ_CDP_URL env var to the CDP url of the browser to use.
3. Start Khoj and ask it to get browser based work done with operator
+ research mode
The github run_eval workflow sets OPENAI_BASE_URL to empty string.
The ai model api created during initialization for openai models gets
set to empty string rather than None or the actual openai base url
This tries to call llm at to empty string base url instead of the
default openai api base url, which obviously fails.
Fix is to map empty base url's to the actual openai api base url.
Previously all exceptions were being caught. So retry logic wasn't
getting triggered.
Exception catching had been added to close llm thread when threads
instead of async was being used for final response generation.
This isn't required anymore since moving to async. And we can now
re-enable retry on failures.
Raise error if response is empty to retry llm completion.
Send image as png to non-openai models served via an openai compatible
api. As more models support png than webp.
Continue storing images as webp on server for efficiency.
Convert to png at the openai api layer and only for non-openai models
served via an openai compatible api.
Enable using vision models like ui-tars (via llama.cpp server), grok.
### Major
* Do more granular truncation on hitting context limits
* Pack research iterations as list of message content instead of
separate messages
* Update message truncation logic to truncate items in message content
list
* Make researcher aware of number of web, doc queries allowed per
iteration
### Minor
* Prompt web page reader to extract quantitative data as is from pages
* Track gemini 2.0 flash lite cost. Reduce max prompt size for 4o-mini
* Ensure time to first token logged only once per chat response
* Upgrade tenacity to respect min_time passed to exponential backoff
with jitter function
Fix for issue is in tenacity 9.0.0. But older langchain required
tenacity <0.9.0.
Explicitly pin version of langchain sub packages to avoid indexing
and doc parsing breakage.
- Construct tool description dynamically based on configurable query
count
- Inform the researcher how many webpage reads, online searches and
document searches it can perform per iteration when it has to decide
which next tool to use and the query to send to the tool AI.
- Pass the query counts to perform from the research AI down to the
tool AIs
Time to first token Log lines were shown multiple times if new chunk
bein streamed was empty for some reason.
This change makes the logic robust to empty chunks being recieved.
Previously research iterations and conversation logs were added to a
single user message. This prevented truncating each past iteration
separately on hitting context limits. So the whole past research
context had to be dropped on hitting context limits.
This change splits each research iteration into a separate item in a
message content list.
It uses the ability for message content to be a list, that is
supported by all major ai model apis like openai, anthropic and gemini.
The change in message format seen by pick next tool chat actor:
- New Format
- System: System Message
- User/Assistant: Chat History
- User: Raw Query
- Assistant: Iteration History
- Iteration 1
- Iteration 2
- User: Query with Pick Next Tool Nudge
- Old Format
- User: System + Chat History + Previous Iterations Message
- User: Query
- Collateral Changes
The construct_structured_message function has been updated to always
return a list[dict[str, Any]].
Previously it'd only use list if attached_file_context or vision model
with images for wider compatibility with other openai compatible api
Previously the research agent would have a hard time getting
quantitative data extracted by the web page reader tool AI.
This change aims to encourage the web page reader tool to extract
relevant data in verbatim form for higher granularity research and
responses.
Code tool should see code context and webpage tool should see online
context during research runs
Fix to include code context from past conversations to answer queries.
Add all queries to tool chat history when no specific tool to limit
extracting inferred queries for provided.
- Use much larger read, connect timeout if llm served over local url
- Use larger timeout duration than default (5s) for online llms too
This matches timeout duration increase calls to gemini api
- Improve overall flow of the contribute section of Readme
- Fix where to look for good first issues. The contributors board is outdated. Easier to maintain and view good-first-issue with issue tags directly.
Co-authored-by: Debanjum <debanjum@gmail.com>
Fallback to assume not a subscribed user if user not passed.
This allows user arg to be actually optional in the async
send_message_to_model_wrapper function
### Major
All reasoning models return thoughts differently due to lack of
standardization.
We normalize thoughts by reasoning models and providers to ease handling
within Khoj.
The model thoughts are parsed during research mode when generating final
response.
These model thoughts are returned by the chat API and shown in train of
thought shown on web app.
Thoughts are enabled for Deepseek, Anthropic, Grok and Qwen3 reasoning
models served via API.
Gemini and Openai reasoning models do not show their thoughts via
standard APIs.
### Minor
- Fix ability to use Deepseek reasoner for intermediate stages of chat
- Enable handling Qwen3 reasoning models
Previously Deepseek reasoner couldn't be used via API for completion
because of the additional formatting constrains it required was being
applied in this function.
The formatting fix was being applied in the chat completion endpoint.
DeepSeek reasoners returns reasoning in reasoning_content field.
Create an async stream processor to parse the reasoning out when using
the deepseek reasoner model.
The Qwen3 reasoning models return thoughts within <think></think> tags
before response.
This change parses the thoughts out from final response from the
response stream and returns as structured response with thoughts.
These thoughts aren't passed to client yet
OpenAI API doesn't support thoughts via chat completion by default.
But there are thinking models served via OpenAI compatible APIs like
deepseek and qwen3.
Add stream handlers and modified response types that can contain
thoughts as well apart from content returned by a model.
This can be used to instantiate stream handlers for different model
types like deepseek, qwen3 etc served over an OpenAI compatible API.
Recent changes enabled free tier users to switch free tier chat models
per conversation or the default.
This change enables free tier users to generate responses with their
conversation specific chat model.
Related: #725, #1151
Reason
---
- Simplify code and logic to stream chat response by solely relying on
asyncio event loop.
- Reduce overhead of managing threads to increase efficiency and
throughput (where possible).
Details
---
- Use async/await with no threading when generating chat response via
OpenAI, Gemini, Anthropic AI model APIs
- Use threading for offline chat model as llama-cpp doesn't support
async streaming yet
# PR Summary
This small PR resolves the deprecation warnings on `datetime` in
Python3.12+. You can find them in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/14538833837/job/40792624987#step:9:134):
```python
/__w/khoj/khoj/src/khoj/processor/content/images/image_to_entries.py:61: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp_now = datetime.utcnow().timestamp()
```
Overview
---
Enable free tier users to chat with any AI model made available on free tier
of production deployments like [Khoj cloud](https://app.khoj.dev).
Previously model switching was completely disabled for users on free tier.
Details
---
- Track price tier of each Chat, Speech, Image, Voice AI model in DB
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
Previously the model switching APIs and UX fields on web app were
completely disabled for free tier users
Rely on deepthought flag to control reasoning effort of low/high for
the grok model
This is different from the openai reasoning models which support
low/medium/high and for which we use low/medium effort based on the
deepthought flag
Note: grok is accessible over an openai compatible API
Disregard chart types as not using rich chart rendering
and they are duplicate of chart images that are rendered
Disregard text output associated with generated image files
Added a “Troubleshooting & Tips” section to the GCP Vertex documentation.
This section provides guidance for self-hosted users on common issues
they may encounter when setting up Google Vertex AI integration in Khoj.
Topics covered include permissions, region compatibility, prompt size
limits, API key testing, and secure key management with environment
variables. The goal is to improve the onboarding experience and reduce
setup errors for contributors and self-hosters using Vertex AI models
like Claude and Gemini.
Signed off by: brightally6@gmail.com
Server:
- Rate limit based on unverified email before creating user
- Check email address for deliverability before creating user
- Track rate limit for unverified email in new non-user keyed table
Web app:
- Show error in login popup to user on failure/throttling
- Simplify login popup logic by moving magic link handling logic
into EmailSigninContext instead of passing require props via parent
- Set chatSidebar prompt, Setting name fields to empty str if value null
- Track if agent modified in chatSidebar to simplify code, fix looping
- Suppress spurious dark mode hydration warnings on the web app
- Set key for chatMessage parent to get UX efficiently updated by react
- Let only root next.js layout handle html, body tags, not child layouts
Previously the sidebar could recurse on opening chat page (from home?)
due to child modelSelector component updating parent chatSidebar prop
which was passed back down to it in a loop.
The chatSidebar decides if agent has been modified in a single
useEffect and enables the Save button accordingly.
- Track agent modification wrt agent info received from server in
chatSidebar instead.
- Reduce modelSelector's mandate to just notify
when the user changes the model.
- Fix to infer, show & update agent state from chat sidebar on web app
This logic is fragile and convoluted because:
- the default agent chat model is dynamically determined.
- need to disambiguate tools not set vs none set vs all set by user
The default agent's tool selection is stored as undefined to show
not set scenario, which allows for all tools to be dynamically
used by agent.
But the user can also set no tools or all tools for their agents.
All 3 scenarios are handled differently.
- Track tools to be displayed vs tools to be stored
This is triggered by mismatch between "dark" class present on server
sent layout but not in client sent layout on initial render.
That mismatch exists because the server applies dark-mode styling
early to avoid FOUC flickering of UX.
Related 43e032e
Remove html, body elements from child page layouts. Let only the root
layout handle it.
Next.js router structure mounts child layouts inside parent layouts,
as defined by their directory hierarchy. So the html, body component
should only be defined in the parent layout.
This avoids the child layout mounting its html, body component within
the actual root layout's existing html, body component.
Previously the chat model associated with the default agent was always
the first chat model populated on the server. This doesn't match
behavior of the rest of the system, where the server chat settings is
preferred over the user chat settings over the first chat model.
This change brings the default agent's chat model in line with the
preference order used in the reset of the system.
Previous change to fallback to default agent was not functional. It
would error out if the conversation agent wasn't set when trying to
get conversation.agent.slug for calling aget_agent_by_slug func
We were previously relying on an older, unmaintained version of
pgvector docker image, ankane/pgvector.
Moving to new docker image requires selecting from tags based on the
pg major version (14, 15, 16 or 17).
This change uses pg15 tag to resolve image pull.
Note: we use postgres 15 for khoj docker images currently
Fixes#1154
Issue introduced in commit 5a3c7b1.
Usage of KHOJ_DOMAIN
---
KHOJ_DOMAIN is tri-state for local, official and other production deployments:
- If KHOJ_DOMAIN is unset (for local):
- sets CSRF cookie to localhost
- adds khoj.dev variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds app.khoj.dev variants to CORS origins
- If KHOJ_DOMAIN is set to empty (for official):
- sets CSRF to khoj.dev
- adds khoj.dev variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds app.khoj.dev variants to CORS origins
- If KHOJ_DOMAIN is set (for other prod deployments):
- sets CSRF cookie to KHOJ_DOMAIN
- adds KHOJ_DOMAIN variants to ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS
- adds KHOJ_DOMAIN variants to CORS origins
Related #1137, #1152Resolves#1123
Unsure why this error triggers on every request to the Django admin
panel these days but all the requests are completing fine and the
client is clearly not aborting the request when the RequestAborted
exception is raised.
Suppress these errors for now via middleware to prevent them from
unnecessarily cluttering up the server logs and confusing folks.
Related #1152
### Improve Gemini usage
- Allow text tool to give agent ability to terminate research
- Set default context for gemini 2 flash models
2x context window for small, commercial models to 120K
- Default temperature of Gemini models to 1.0 to reduce repetition
### Improve evaluation harness
- Add more knobs to control eval workflow
- Allow running eval with any chat model served over an openai compatible api
- Control random sampling from eval set
- Auto read web page
- Use embedded postgres instead of postgres server for eval workflow
- Use Gemini 2.0 flash as evaluator. Set seed for evaluator to reduce decision variance
Previously Gemini 2 flash and flash lite were using context window of
10K by default as no defaults were added for it.
Increase default context for small commercial models to 120K from 60K
as cheaper and faster than their pro models equivalents at 60K context.
We'd moved research planner to only use tools in enum of schema. This
enum tool enforcement prevented model from terminating research by
setting tool field to empty.
Fix the issue by adding text tool to research tools enum and tell
model to use that to terminate research and start response instead.
Make research planner consistently select tool before query. As the
model should tune it's query for the selected tool. It got space to
think about tool to use in the scratchpad already.
- Control auto read webpage via eval workflow. Prefix env var with KHOJ_
Default to false as it is the default that is going to be used in prod
going forward.
- Set openai api key via input param in manual eval workflow runs
- Simplify evaluating other chat models available over openai
compatible api via eval workflow.
- Mask input api key as secret in workflow.
- Discard unnecessary null setting of env vars.
- Control randomization of samples in eval workflow.
If randomization is turned off, it'll take the first SAMPLE_SIZE
items from the eval dataset instead of a random collection of
SAMPLE_SIZE items.
This PR implements a new feature request template with a few UX/UI improvements.
Key changes:
- Use of GitHub forms.
- Provide note info for a submitter about feature request submitting rules.
- Adds a few handy fields like "Describe the feature" or "Use Case"
Overall, with a template like this feature requests will be more structured and meaningful.
Only setup speech to text and text to image models served via openai
compatible APIs when explicitly specified during initialization.
This avoids setup of whisper and dalle when an openai compatible API
is being setup instead of the openai API itself.
- Specify min, max number of list items expected in AI response via JSON schema enforcement. Used by Gemini models
- Warn and drop invalid/empty messages when format messages for Gemini models
- Make Gemini response adhere to the order of the schema property definitions
- Improve agent creation safety checker by using response schema, better prompt
Without explicitly using the property ordering field, gemini returns
responses in alphabetically sorted property order.
We want the model to respect the schema property definition order.
This ensures control during development to maintain response quality.
For example in CoT make it fill scratchpad before answers.
Require at least 1 item in lists. Otherwise gemini flash will
sometimes return an empty list. For chat actors where max items is
known, set that as well.
OpenAI API does not support specifying min, max items in response
schema lists, so drop those properties when response schema is
passed. Add other enforcements to response schema to comply with
response schema format expected by OpenAI API.
Previously we were setting message content part with empty text. This
results in error from Gemini API. Warn and drop such messages instead.
Log empty message content found during construction to root-cause the
issue but allow Khoj to respond without the offending messages in
context for call to Gemini API.
New
- Support Firecrawl as a online search provider
Improve
- Fallback to other enabled online search providers on failure
- Speed up online search with Jina by excluding webpage content in search results
Fix
- Fix Jina webpage reader. Improve it to include generated alt text to each image on webpage
- Truncate online query to Serper if query exceeds max supported length
Previously query to serper with longer than max supported would throw
error instead of returning at least some results.
Truncating the onlien search query to serper to max supported length
mitigates that issue.
- Improve webpage read to include image alt text
- Improve Jina webpage search to not include each page content
- Use POST instead of GET for web search, webpage read with Jina
This avoids installing pgserver on linux arm64 docker builds, which it
doesn't currently support and isn't required to support as Khoj docker
images can use standard postgres server made available via our
docker-compose.yml
Use pgserver python package as an embedded postgres db,
installed directly as a khoj python package dependency.
This significantly simplifies self-hosting with just a `pip install khoj'.
No need to also install postgres separately.
Still use standard postgres server for multi-user, production use-cases.
- Update default anthropic chat models to latest good models.
- Now that Google supports a good text to image model. Suggest adding
that if Google AI API is setup on first run.
Previously agent slug was not considered on create even when passed
explicitly in agent creation step.
This made the default agent slug different until next run when it was
updated after creation. And didn't allow chat to work on first run
The fix to use the agent slug when explicitly passed allows users to
chat on first run.
Previously messages got Anthropic specific formatting done before
being passed to Anthropic (chat) completion functions.
Move the code to format messages of type list[ChatMessage] into Anthropic
specific format down to the Anthropic (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with normalized list[ChatMesssage] type of chat messages across AI API
providers
Previously we'd always request up to 3 webpage url via the prompt but
read only one of the requested webpage url.
This would degrade quality of research and default mode. As model may
request reading upto 3 webpage links but get only one of the requested
webpages read.
This change passes the number of webpages to read down to the AI model
dynamically via the updated prompt. So number of webpages requested to
be read should mostly be same as number of webpages actually read.
Note: For now, the max webpages to read is kept same as before at 1.
Previously the research mode planner ignored the current agent or
conversation specific chat model the user was chatting with. Only the
server chat settings, user default chat model, first created chat model
were considered to decide the planner chat model.
This change considers the agent chat model to be used for the planner
as well. The actual chat model picked is decided by the existing
prioritization of server > agent > user > first chat model.
This change enables the creator of a shared conversation to stop sharing the conversation publicly.
### Details
1. Create an API endpoint to enable the owner of the shared conversation to unshare it
2. Unshare a public conversations from the title pane of the public conversation on the web app
Only show the unshare button on public conversations created by the
currently logged in user. Otherwise hide the button
Set conversation.isOwner = true only if currently logged in user
shared the current conversation.
This isOwner information is passed by the get shared conversation API
endpoint
Previously messages passed to gemini (chat) completion functions
got a little of Gemini specific formatting mixed in.
These functions expect a message of type list[ChatMessage] to work
with prompt tracer etc.
Move the code to format messages of type list[ChatMessage] into gemini
specific format down to the gemini (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with normalize list[ChatMesssage] type of chat messages across
providers
This is analogous to how we enable extended thinking for claude models
in research mode.
Default to medium effort irrespective of deepthought for openai
reasoning models as high effort is currently flaky with regular
timeouts and low effort isn't great.
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
- Anthropic expects a 0-1 range. Gemini & OpenAI expect a 0-2 range
- Anneal temperature to explore reasoning trajectories but respond factually
- Default send_message_to_model and extract_question temps to the same
Enable configuring a Khoj AI model API for Vertex AI using GCP credentials.
Specifically use the api key & api base url fields of the AI Model API
associated with the current chat model to extract gcp region, gcp
project id & credentials. This helps create a AnthropicVertex client.
The api key field should contain the GCP service account keyfile as a
base64 encoded string.
The api base url field should be of the form
`https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`
Accepting GCP credentials via the AI model API makes it easy to use
across local and cloud environments. As it bypasses the need for a
separate service account key file on the Khoj server.
- The 3.4.1 release of sentence tranformer fixes offline load latency
of sentence transformer models (and Khoj) by avoiding call to HF
- The 4.50.0 release of transformers is resulting in
jax error (unexpected keyword argument 'flatten_with_keys') on load.
Previously google auth library was explicitly installed only for the
cloud variant of Khoj to minimize packages installed for non
production use-cases.
But it was being implicitly installed as a dependency of an explicit
package in the default installation anyway.
Making the dependency on google auth package explicit simplifies
the conditional import of google auth in code while not incurring any
additional cost in terms of space or complexity.
Reaching >94% in research mode on SimpleQA. When answers can be
researched online, it becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.
- Fix regression: Inline images were not getting passed to the AI
models since #992
- Format inline images passed to Gemini models correctly
- Format inline images passed to Anthropic models correctly
Verified vision working with inline and url images for OpenAI,
Anthropic and Gemini models.
Resolves#1112
Previously on slow connection you'd see the agent dropdown flicker
from undefined to Khoj default agent on phones and other thin screens.
This is unnecessary and jarring. Populate with default agent to remove
this issue
Previously the chat input area didn't allow inputting text while Khoj is
researching and generating response.
This change allows the user to add their next text while Khoj
responds. This should speed up interaction cycles as user can have
their next query ready to send when Khoj finishes its response.
- Trigger
Gemini 2.0 Flash doesn't always follow JSON schema in research prompt
- Details
- Use json schema to enforce generate online queries format
- Use json schema to enforce research mode tool pick format
- Support constraining Gemini model output to specified response schema
- Support constraining OpenAI model output to specified response schema
- Only enforce json output in supported AI model APIs
- Simplify OpenAI reasoning model specific arguments to OpenAI API
Previously OpenAI reasoning models didn't support stream_options and
response_format
Add reasoning_effort arg for calls to OpenAI reasoning models via API.
Right now it defaults to medium but can be changed to low or high
Previously was encoding E2B code execution text output content as b64.
This was breaking
- The AI model's ability to see the content of the file
- Downloading the output text file with appropriately encoded content
Issue created when adding E2B code sandbox in #1120
* Implement better bug issue template
* Fix IDs in new bug issue template
* Reduce, reorder and improve field descriptions in the bug issue template
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Claude 3.7 Sonnet is Anthropic's first reasoning model. It provides a
single model/api capable of standard and extended thinking. Utilize
the extended thinking in Khoj's research mode.
Increase default max output tokens to 8K for Anthropic models.
# Improve Code Tool, Sandbox
- Improve code gen chat actor to output code in inline md code blocks
- Stop code sandbox on request timeout to allow sandbox process restarts
- Use tenacity retry decorator to retry executing code in sandbox
- Add retry logic to code execution and add health check to sandbox container
- Add E2B as an optional code sandbox provider
# Improve Gemini Chat Models
- Default to non-zero temperature for all queries to Gemini models
- Default to Gemini 2.0 flash instead of 1.5 flash on setup
- Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code gen chat actor to improve correct code gen success,
especially for smaller models & models with limited json mode support
Allow specify code blocks inline with reasoning to try improve
code quality
Infer input files based on user file paths referenced in code.
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template
- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
- Batch sync files by size to try not exceed API request payload size limits
- Fix force sync of large vaults from Obsidian
- Add API endpoint to delete all indexed files by file type
- Fix to also delete file objects when call DELETE content source API
Previously if you tried to force sync a vault with more than 1000
files it would only end up keeping the last batch because the PUT API
call would delete all previous entries.
This change calls DELETE for all previously indexed data first, followed by
a PATCH to index current vault on a force sync (regenerate) request.
This ensures that files from previous batches are not deleted.
- Delete file objects on deleting content by source via API
Previously only entries were deleted, not the associated file objects
- Add new db adapter to delete multiple file objects (by name)
- Set KHOJ_ALLOWED_DOMAIN to the domain that Khoj is accessible on
from the host machine. This can be the internal i.p or domain of the
host machine.
It can be used by your load balancer/reverse_proxy to access Khoj.
For example, if the load balancer service is in the khoj docker
network, KHOJ_DOMAIN will be `server' (i.e service name).
- Set KHOJ_DOMAIN to your externally accessible DOMAIN or I.P to avoid
CSRF trusted origin or unset cookie issue when trying to access the
khoj admin panel.
Resolves#1114
- Also, when in not subscribed state, fallback to the default model when chatting with an agent
- With conversion, create a brand new agent from inside the chat view that can be managed separately
Make it easier to determine which model you're chatting with, and to effortlessly update said model from within a given chat.
In this change, we introduce a side bar that allows users to quickly change their chat model, tools, custom instructions, and file filters, directly within the chat view. This removes the need for setting up custom agents for simple instructions and mitigates the requirement to go to the settings page to verify the chat model in action.
The settings page will still configure a per-user *default*, but the sidebar will allow for greater customization based on the needs of a conversation.
We also extend the chat model to include more attributes that help users make decisions about model selection, including `strengths` and `description`. This can help people quickly understand which model might work best for their use case.
Currently, it's rather opaque and difficult to substantially browse through the uploaded knowledge base. Effectively, you can only do this through the small file modal in the settings page.
Update to include all indexed files in the search page for viewing & deletion. Function to delete all files is still in the settings page.
Add a migration that associates file objects with `entry`s using a foreign key. Add a migration command that deletes dangling fileobjects.
## Description
Added file path autocompletion to enhance the search experience in Khoj. Users can now easily search for specific files by typing "file:" followed by the file name, with real-time suggestions appearing as they type.
## Changes
- Added new API endpoint `/api/file-suggestions` to get file path suggestions
- Enhanced search UI with dropdown suggestions for file paths
- Implemented debounced search to optimize API calls
- Added keyboard (Enter) and mouse click support for selecting suggestions
## Features
- Type "file:" to trigger file path suggestions
- Real-time filtering of suggestions as you type
- Top 10 alphabetically sorted suggestions
- Case-insensitive matching
- Keyboard and mouse interaction support
- Clear visual feedback with a dropdown UI
A hidden agent basically allows each individual conversation to maintain custom settings, via an agent that's not exposed to the traditional functionalities allotted for manually created agents (e.g., browsing, maintenance in agents page).
This will be hooked up to the front-end such that any conversation that's initiated with the default agent can then be given custom settings, which in the background creates a hidden agent. This allows us to repurpose all of our existing agents infrastructure for chat-level customization.
Previously emails with url special characters would not get
successfully identified for login. Account creation was fine due to
email being in POST request body. But login with such emails did not
work due to query params not being escaped before being sent to server
This change escapes both the code and email in login URL sent to
server. So login with emails containing special characters like
`email+khoj@gmail.com' works. It fixes both the URL web app sent by
web app directly and the magic link sent to users to their email
This change also fixes accessibility issue of having a DialogTitle in
DialogContent for screen readers.
Resolves#1090
We've been having issues generating diagrams with Excalidraw that are any degree of complexity. By contrast, LLMs are able to handle Mermaid.js syntax a lot better, as it's much more forgiving and has a simpler declarative style. Refer to https://mermaid.js.org/.
Update so that new diagrams are generated with Mermaid.js, while old diagrams generated with Excalidraw can still be viewed.
Fix for #1082 pushed down adding the `data:image/webp;base64' prefix
of the base64 images to the server image gen API. But the code on the
Obsidian and Desktop client still add these prefixes.
This change stops the Desktop, Obsidian clients from adding the prefix
as it is being handled by the API now. It should resolve showing
images inline in those clients as well
Previously we'd use the general openai client, even if the image
generation model has a different api key and base url set.
This change uses the openai config of the image generation models when
set. Otherwise it fallbacks to use the first openai api provider set
This change adds the ability to use OpenAI, Azure OpenAI or any embedding model exposed behind an OpenAI compatible API (like Ollama, LiteLLM, vLLM etc.).
Khoj previously only supported HuggingFace embedding models running locally on device or via HuggingFaceW inference API endpoint. This allows using commercial embedding models to index your content with Khoj.
Previously if the call to this tool AI failed, the API call would
non-gracefully fail on server. This would leave the client hanging in
a wierd state (e.g with spinner running on web app with no indication
of issue).
Also do not show filters in query debug log lines when no filters in query
- Background
Access to the clipboard API is disabled by certain browsers in non
localhost http scenarios for security reasons.
So the copy API key button doesn't work when khoj is self-hosted
with authentication enabled at a non localhost domain
- Change
This change enables copying API keys by manual text highlight + copy
if copy button is disabled
Resolves#1070
This change makes Automations (and possibly other entrypoints) use the configured OpenAI-compatible server if that has been set. Without this change it tries to use the hardcoded OpenAI provider.
- One current issue in the Khoj application is that managing the files being referenced as the user's knowledge base is slightly opaque and difficult to access
- Add a migration for associating the fileobjects directly with the Entry objects, making it easier to get data via foreign key
- Add the new page that shows all indexed files in the search view, also allowing you to upload new docs directly from that page
- Support new APIs for getting / deleting files
- Just say using default config. This old khoj.yml settings mechamism
isn't standard, so not having a configured khoj.yml isn't a concern
- Deep link to desktop download instead of the whole download page as
android etc. are also on it, which don't help with syncing docs
OpenAI chat models deployed on Azure are (ironically) not OpenAI API compatible endpoints.
This change enables using OpenAI chat models deployed on Azure with Khoj.
The github integration has not been tested and may still be broken.
This change at least makes it easier to add repositories without
needing a PAT token if/when it does work.
- Add the mermaid package and apply front-end parsing for interpreting the diagrams. Retain processing of the excalidraw type for backwards compatibility
- Translated comments from French to English for better accessibility and understanding.
- Updated CSS comment for loading animation to reflect the change in language.
- Enhanced code readability by ensuring consistent language usage across multiple files.
- Improved user experience by clarifying the purpose of various functions and settings in the codebase.
Fix AttributeError: 'NoneType' object has no attribute 'model_extra'
* cost = chunk.usage.model_extra.get("estimated_cost", 0) if hasattr(chunk, "usage") and chunk.usage else 0 # Estimated costs returned by DeepInfra API
- Encode article urls in filename indexed in Khoj KB
Makes it easier for humans to compare, trace retrieval performance
by looking at logs than using content hash (which was previously
explored)
- Added visual loading indicators to the search modal for improved user experience during search operations.
- Implemented logic to check if search results correspond to files in the vault, with color-coded results for better clarity.
- Refactored the getSuggestions method to handle loading states and abort previous requests if necessary.
- Updated CSS styles to support new loading animations and result file status indicators.
- Improved the renderSuggestion method to display file status and provide feedback for files not in the vault.
- Added a new setting for users to configure the sync interval in minutes, allowing for more flexible automatic synchronization.
- Introduced methods to start and restart the synchronization timer based on the configured interval.
- Updated the synchronization logic to use the user-defined interval instead of a fixed 60 minutes.
- Improved code readability and organization by refactoring the sync timer logic.
- Added a new setting to manage sync folders, allowing users to specify which folders to sync or to sync the entire vault.
- Implemented a modal for folder suggestions to facilitate folder selection.
- Updated the folder list display to show currently selected folders with options to remove them.
- Improved CSS styles for chat interface and folder list for better user experience.
- Refactored code for consistency and readability across multiple files.
- Introduced a new command 'Sync new changes' to allow users to manually synchronize new modifications.
- The command updates the content index without regenerating it, ensuring only new changes are synced.
- User-triggered notifications are displayed upon successful sync.
- Format AI response to send in automation email
- Let Khoj infer chat query based on user automation query
- Decide if automation emails should be sent based on response
- Fix the `to_notify_or_not` decider AI
- Ask reason before decision to improve to_notify decider AI
- Show error message on web app when fail to create/update automation
Previously we sent the AI response directly. This change post
processes the AI response with the awareness that it is to be sent to
the user as an email to improve rendering and quality of the emails.
This tries to decouple the automation query from the chat query. So
the chat model doesn't have to know it is running in an automation
context or figure how to notify user or send automation response. It
just has to respond to the AI generated `query_to_run' corresponding
to the `scheduling_request` automation by the user.
For example, a `scheduling_request' of `notify me when X happens'
results in the automation calling the chat api with a `query_to_run`
like `tell me about X` and deciding if to notify based on information
gathered about X from the scheduled run. If these two are not
decoupled, the chat model may respond with how it can notify about X
instead of just asking about it.
Swap query_to_run with scheduling_request on the automation web page
- Rather than chunky generic cards, make the suggested actions more action oriented, around the problem a user might want to solve. Give them follow-up options. Design still in progress.
- De facto, was being assumed everywhere if authenticatedData is null, that it's not logged in. This isn't true because the data can still be loading. Update the hook to send additional states.
- Bonus: Delete model picker code and a slew of unused imports.
- Add support for seeing all steps of the agent modification flow via tabs at the top of the modal
- Separate knowledge base & tool selection into two separate parts
Use the [shadcn sidebar](https://ui.shadcn.com/docs/components/sidebar#sidebarmenusub) across Khoj and standardize how the side panel experience works across the app. This helps us generalize the code better, while re-using the same components.
Deprecates current usage of the chat history side panel, replacing it with the new `appSidebar.tsx` component.
We'll eventually move out the `Manage Files` section and move it into a separate panel dedicated to chat-level controls.
- Rather than chunky generic cards, make the suggested actions more action oriented, around the problem a user might want to solve. Give them follow-up options. Design still in progress.
- De facto, was being assumed everywhere if authenticatedData is null, that it's not logged in. This isn't true because the data can still be loading. Update the hook to send additional states.
- Bonus: Delete model picker code and a slew of unused imports.
New conversation have a slug, but older conversation may not. This
change allows those older conversations to still be shareable by using
a random uuid for constructing their url instead
- Some of the instructions were duplicated (e.g illegal, harmful)
- Return format requested was inconsistent
- Safety prompt felt overtly prudish which lowered their utility.
Make it laxer for now, add checks later if required
- Print hash in CI to ease verifying ci built python package matches
khoj package published on pypi
- Newer pypi publish github action should speed up workflow by ~30s
Major
---
Previously we couldn't enable research mode or use other slash
commands in automated tasks.
This change separates determining if a chat query is triggered via
automated task from the (other) conversation commands to run the query
with.
This unlocks the ability to enable research mode in automations
apart from other variations like /image or /diagram etc.
Minor
---
- Clean the code to get data sources and output formats
- Have some examples shown on automations page run in research mode now
This allows online search to work out of the box again
for self-hosting users, as no auth/api key setup required.
Docker users do not need to change anything in their setup flow.
Direct installers can setup Searxng locally or use public instances if
they do not want to use any of the other providers (like Jina, Serper)
Resolves#749. Resolves#990
This allows online search to work out of the box again for
self-hosting users, as no auth/api key setup required.
Docker users do not need to change anything in their setup flow.
Direct installers can setup searxng locally or use public instances if
they do not want to use any of the other providers (like Jina, Serper)
Resolves#749. Resolves#990
The welcome, feedback and automation emails were still using the Khoj
domain, which wouldnt work for self-hosted users with their RESEND key
Resolves#1004
- Previous was incorrectly plural but was defining only a single model
- Rename chat model table field to name
- Update documentation
- Update references functions and variables to match new name
- Move khoj message border to left like in web ui
- Remove user, sender emojis and name
- Ensure conversation title always at top of chat sessions view,
even if no chat sessions loaded yet, instead of causing layout shift
on chat sessions load
Improve handling of multiple output modes
- Use the generated descriptions / inferred queries to supply context to the model about what it's created and give a richer response
- Stop sending the generated image in user message. This seemed to be confusing the model more than helping.
- Collect generated assets in a structured objects to provide model context. This seems to help it follow instructions and separate responsibility better
- Also, rename the open ai converse method to converse_openai to follow patterns with other providers
- Use the generated descriptions / inferred queries to supply context to the model about what it's created and give a richer response
- Stop sending the generated image in user message. This seemed to be confusing the model more than helping.
- Also, rename the open ai converse method to converse_openai to follow patterns with other providers
Previously id were used (by default) for model display strings.
This made it hard to select chat model options, server chat settings
etc. in the admin panel dropdowns.
This change uses more recognizable names for the DB objects to ease
selection in dropdowns and display in general on the admin panel.
* Rename OpenAIProcessorConversationConfig to more apt AiModelAPI
The DB model name had drifted from what it is being used for,
a general chat api provider that supports other chat api providers like
anthropic and google chat models apart from openai based chat models.
This change renames the DB model and updates the docs to remove this
confusion.
Using Ai Model Api we catch most use-cases including chat, stt, image generation etc.
Simplify the desktop app
- Make the desktop app mainly a file-syncing client for users who have lots of documents that they need to share with Khoj. This is because the web app provides a fairly robust chat client which can be used by anyone on their computer.
- The chat client in the desktop app had significantly drifted from our current brand / them, and didn't provide enough value add to update. Later, we will make it easier to install the existing web app as a desktop PWA.
Currently, Khoj has terminal states with respect to what assets it outputs. We limit it to image, text, and excalidraw diagram. This limitation is unnecessary and provides undue constraints for creating more dynamic, diverse experiences. For instance, we may want the chat view to morph for document editing or generation, in which case having limited output modes would be a detriment.
This change allows us to emit generated assets and then continue on to more text generation in final response. It forces text response for all messages. It adds a new stream event, GENERATED_ASSETS, which holds content that the AI is emitting in response to the query.
Overview
- The default django admin panel UI looks pretty dated and didn't
have any Khoj specific branding
- Used the Unfold Django admin panel theme for a modern look
- Used the Khoj logo and name in Admin panel title, headings, favicons
Details:
All models shown on Admin panel need to inherit from unfold's
ModelAdmin to get styling applied. So
- Make all models on Admin panel inherit from unfold's ModelAdmin
- Subclassed UserAdmin to inherit from unfold's ModelAdmin
- Deregistered the unused Auth Group model from the Admin panel
We can add it back when its actually used. Avoid confusion for now
- Explicitly register DjangoJobExecution on admin panel and again make
it inherit from the unfold.admin.ModelAdmin
- Make the desktop app mainly a file-syncing client for users who have lots of documents that they need to share with Khoj. This is because the web app provides a fairly robust chat client which can be used by anyone on their computer.
- The chat client in the desktop app had significantly drifted from our current brand / them, and didn't provide enough value add to update. Later, we will make it easier to install the existing web app as a desktop PWA.
- Use bubblewrap generated splash screen, notification icons from
1200x1200 high res khoj icon in assets.khoj.dev.
- Discard bubblewrap generated launcher icons.
- Fix orientation to respect phone orientation settings. "any" was not.
- Add 512, 192 Khoj maskable icons to web app manifest for android rendering
- Add id, categories etc suggested by pwabuilder
- Use higher quality icon images for splash screen than what
bubblewrap creates by default
Encode api key in header, POST request body and GET query param for
search as utf-8 to avoid the multibyte char in request issue when
making API calls from khoj.el to khoj server.
Resolves#935
### Issue
- Path with / are converted to \\ on Windows using the `Path' operator.
- The `markdown_to_entries' module was trying to normalize file paths with`Path' for some reason.
This would store the file paths in DB Entry differently than the file to entries map if Khoj ran on Windows.
That'd result in a KeyError when trying to look up the entry file path from `file_to_text_map' in the `text_to_entries:update_embeddings()' function.
### Fix
- Removing the unnecessary OS dependent Path normalization in `markdown_to_entries' should keep the file path storage consistent across `file_to_text_map' var, `FileObjectAdaptor', `Entry' DB tables on Windows for Markdown files as well.
This issue will affect users hosting Khoj server on Windows and attempting to index markdown files.
Resolves#984
- Add type hints to improve maintainability of stabilzed indexing code
- It shouldn't be necessary to wrap string variables in an f-string
This change aims to improve code quality. It should not affect
functionality.
Issue
- Path with / are converted to \\ on Windows using the Path operator.
- The markdown to entries method for some reason was doing this.
This would store the file paths in DB entry differently than the file
to entries map. Resulting in a KeyError when trying to look up the
entry file path from file_to_text_map in the
text_to_entries:update_embeddings() function.
Fix
- Removing the unnecessary OS dependendent Path normalization in
markdown_to_entries should keep the file path storage consistent
across file_to_text_map var, FileObjectAdaptor, Entry DB tables on
Windows for Markdown files as well
This issue would only affect users hosting Khoj server on Windows and
attempting to index markdown files.
Resolves#984
- Added it to the Django migrations so that it auto-triggers when someone updates their server and starts it up again for the first time. This will require that they update their clients as well in order to view/consume image content.
- Remove server-side references in the code that allow to parse the text-to-image intent as it will no longer be necessary, given the chat logs will be migrated
This avoid unnecessarily throwing an internal server error when the
user tries to sign-up using multiple mechanisms (e.g first by email, then
by google oauth)
- Improve escaping to load complex json objects
- Fallback to a more forgiving [json5](https://json5.org/) loader if `json.loads` cannot parse complex json str
This should reduce failures to pick research tool and run code by agent
JSON5 spec is more flexible, try to load using a fast json5 parser if
the stricter json.loads from the standard library can't load the
raw complex json string into a python dictionary/list
Gemini doesn't work well when trying to output json objects. Using it
to output raw json strings with complex, multi-line structures
requires more intense clean-up of raw json string for parsing
- Use pre-built wheels for torch and llama-cpp-python
- Install and link musl as it's used by llama-cpp-python pre-built
wheel instead of glibc
- Join Install git and Install Dependencies steps in pytest workflow
To remove unnecessary steps
- Building arm64 image on an ubuntu arm64 runner reduces `yarn build'
step time by 75% from 12mins to 3mins.
- This is because no QEMU emulation for arm64 on x86 is required now
- Parallelizing x64 and arm64 platform builds halves build time on top
- Revert to use standard ubuntu-latest runner as large x64 runner
doesn't give much more speed improvements
This results an effective additional 50%-66% reduction in build time
on top of #987.
So a full dockerize workflow run now takes *10 mins* vs previous 35+mins.
This is a total of *72% improvement* in max dockerize run time.
Get additional speed improvements when docker layer cache hit.
## Objective
Improve build speed and size of khoj docker images
## Changes
### Improve docker image build speeds
- Decouple web app and server build steps
- Build the web app and server in parallel
- Cache docker layers for reuse across dockerize github workflow runs
- Split Docker build layers for improved cacheability (e.g separate `yarn install` and `yarn build` steps)
### Reduce size of khoj docker images
- Use an up-to-date `.dockerignore` to exclude unnecessary directories
- Do not installing cuda python packages for cpu builds
### Improve web app builds
- Use consistent mechanism to get fonts for web app
- Make tailwind extensions production instead of dev dependencies
- Make next.js create production builds for the web app (via `NODE_ENV=production` env var)
The current fix should improve Khoj responses when charts in response
context. It truncates code context before sharing with response chat actors.
Previously Khoj would respond with it not being able to create chart
but than have a generated chart in it's response in default mode.
The truncate code context was added to research chat actor for
decision making but it wasn't added to conversation response
generation chat actors.
When khoj generated charts with code for its response, the images in
the context would exceed context window limits.
So the truncation logic to drop all past context, including chat
history, context gathered for current response.
This would result in chat response generator 'forgetting' all for the
current response when code generated images, charts in response context.
It needs to be used across routers and processors. It being in
run_code tool makes it hard to be used in other chat provider contexts
due to circular dependency issues created by
send_message_to_model_wrapper func
Previous changes to depend on just the PROMPTRACE_DIR env var instead
of KHOJ_DEBUG or verbosity flag was partial/incomplete.
This fix adds all the changes required to only depend on the
PROMPTRACE_DIR env var to enable/disable prompt tracing in Khoj.
Pass your domain cert files via the --sslcert, --sslkey cli args.
For example, to start khoj at https://example.com, you'd run command:
KHOJ_DOMAIN=example.com khoj --sslcert example.com.crt --sslkey
example.com.key --host example.com
This sets up ssl certs directly with khoj without requiring a
reverse proxy like nginx to serve khoj behind https endpoint for
simple setups. More complex setups should, of course, still use a
reverse proxy for efficient request processing
- Track, return cost and usage metrics in chat api response
Track input, output token usage and cost of interactions with
openai, anthropic and google chat models for each call to the khoj chat api
- Collect, display and store costs & accuracy of eval run currently in progress
This provides more insight into eval runs during execution
instead of having to wait until the eval run completes.
Collect, display and store running costs & accuracy of eval run.
This provides more insight into eval runs during execution instead of
having to wait until the eval run completes.
- Previously when settings list became long the dropdown height would
overflow screen height. Now it's max height is clamped and y-scroll
- Previously the dropdown content would take width of content. This
would mean the menu could sometimes be less wide than the button. It
felt strange. Now dropdown content is at least width of parent button
- Track input, output token usage and cost for interactions
via chat api with openai, anthropic and google chat models
- Get usage metadata from OpenAI using stream_options
- Handle openai proxies that do not support passing usage in response
- Add new usage, end response events returned by chat api.
- This can be optionally consumed by clients at a later point
- Update streaming clients to mark message as completed after new
end response event, not after end llm response event
- Ensure usage data from final response generation step is included
- Pass usage data after llm response complete. This allows gathering
token usage and cost for the final response generation step across
streaming and non-streaming modes
- Previously online chat actors, director tests only worked with openai.
This change allows running them for any supported onlnie provider
including Google, Anthropic and Openai.
- Enable online/offline chat actor, director in two ways:
1. Explicitly setting KHOJ_TEST_CHAT_PROVIDER environment variable to
google, anthropic, openai, offline
2. Implicitly by the first API key found from openai, google or anthropic.
- Default offline chat provider to use Llama 3.1 3B for faster, lower
compute test runs
- Set output mode to single string. Specify output schema in prompt
- Both thesee should encourage model to select only 1 output mode
instead of encouraging it in prompt too many times
- Output schema should also improve schema following in general
- Standardize variable, func name of io selector for readability
- Fix chat actors to test the io selector chat actor
- Make chat actor return sources, output separately for better
disambiguation, at least during tests, for now
- Evaluate khoj on random 200 questions from each of google frames and openai simpleqa benchmarks across *general*, *default* and *research* modes
- Run eval with Gemini 1.5 Flash as test giver and Gemini 1.5 Pro as test evaluator models
- Trigger eval workflow on release or manually
- Make dataset, khoj mode and sample size configurable when triggered via manual workflow
- Enable Web search, webpage read tools during evaluation
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
Previously, we'd replace the generated message with an error message
when message generation stopped via stop button on chat page of web app.
So the partially generated message (which could be useful) gets lost.
This change just stops generation, while keeping the generated
response so any useful information from the partially generated
message can be retrieved.
- Allows managing chat models in the OpenAI proxy service like Ollama.
- Removes need to manually add, remove chat models from Khoj Admin Panel
for these OpenAI compatible API services when enabled.
- Khoj still mantains the chat models configs within Khoj, so they can
be configured via the Khoj admin panel as usual.
Previously Jina search didn't API key. Now that it does need API key,
we should re-use the API key set in the Jina web scraper config,
otherwise fallback to using JINA_API_KEY from environment variable, if
either is present.
Resolves#978
- Integrate with Ollama or other openai compatible APIs by simply
setting `OPENAI_API_BASE' environment variable in docker-compose etc.
- Update docs on integrating with Ollama, openai proxies on first run
- Auto populate all chat models supported by openai compatible APIs
- Auto set vision enabled for all commercial models
- Minor
- Add huggingface cache to khoj_models volume. This is where chat
models and (now) sentence transformer models are stored by default
- Reduce verbosity of yarn install of web app. Otherwise hit docker
log size limit & stops showing remaining logs after web app install
- Suggest `ollama pull <model_name>` to start it in background
- Update to latest initialize with new claude 3.5 sonnet and haiku models
- Update to set vision enabled for google and anthropic models by
default. Previously we didn't support but we've supported this for a
month or two now
- Just load the raw csv from OpenAI bucket. Normalize it into FRAMES format
- Improve docstring for frames datasets as well
- Log the load dataset perf timer at info level
- Explictly adding a slash command is a higher priority intent than
research mode being enabled in the background. Respect that for a
more intuitive UX flow.
- Explicit slash commands do not currently work in research mode.
You've to turn research mode off to use other slash commands. This
is strange, unnecessary given intent priority is clear.
- JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically.
- Make slight improvements to tool selection indicators
Previously errors would get eaten up but the model wouldn't see
anything. And the model wouldn't be allowed re-run the same query-tool
combination in the next iteration.
This update should give it insight into why it didn't get a result. So
it can make an informed (hopefully better) decision on what to do next.
And re-run the previous query if appropriate.
Previously when call to online search API etc. failed, it'd error out
of response to query in research mode. Khoj should skip tool use that
iteration but continue to try respond.
Previously chatml messages were just strings, now they can be list of
strings or list of dicts as well.
- Use json seriallization to manage their variations and truncate them
before printing for context.
- Put logic in single function for use across chat models
- Default to evaluation decision of None when either agent or
evaluator llm fails. This fixes accuracy calculations on errors
- Fix showing color for decision True
- Enable arg flags to specify output results file paths
Previously chatml messages were just strings.
Since gemini, anthropic models always have messages as list of
strings, truncate those strings instead of the list of message content
Removing binary data and truncating large data in output files
generated by code runs should improve speed and cost of research mode
runs with large or binary output files.
Previously binary data in code results was passed around in iteration
context during research mode. This made the context inefficient
because models have limited efficiency and reasoning capabilities over
b64 encoded image (and other binary) data and would hit context limits
leading to unnecessary truncation of other useful context
Also remove image data when logging output of code execution
- Allow passing user files as input into code sandbox for analysis
- Update prompt to give more example of complex, multi-line code
- Simplify logic for model. Run one program at a time,
instead of allowing model to run multiple programs in parallel
- Show Code generated charts and docs in Reference pane of web app and make them downloaded
- Add a border below heading
- Show code snippet in pre block
- Overflow-x when reference side panel open to allow seeing whole text
via x-scroll
- Align header, body position of reference cards with each other
- Only show filename in doc reference cards at message bottom.
Show full file path in hover and reference side panel
- Improve rendering code reference with better icons, smaller text and
different line clamps for better visibility
- Show code output files as sub card of code card in reference section
- Allow downloading files generated by code instead of rendering it in
chat message directly
- Show executed code before online references in reference panel
- Fix to render code generated chart with images, excalidraw diagrams
- Fix to save code context to chat history in image, diagram output modes
- Fix bug in image markdown being wrapped twice in markdown syntax
- Render newline in code references shown on chat page of web app
Previously newlines weren't getting rendered. This made the code
executed by Khoj hard to read in references. This changes fixes that.
`dangerouslySetInnerHTML' usage is justified as rendered code
snippet is being sanitized by DOMPurify before rendering.
- Run one program at a time, instead of allowing model to pass
multiple programs to run in parallel to simplify logic for model
- Update prompt to give more example of complex, multi-line code
- Allow passing user files as input into code sandbox for analysis
- Log code execution timer at info level to evaluate execution latencies
in production
- Type the generated code for easier processing by caller functions
Support including file attachments in the chat message
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).
This will ensure only unique online references are shown in all
clients.
The duplication issue was exacerbated in research mode as even with
different online search queries, you can get previously seen results.
This change does a global deduplication across all online results seen
across research iterations before returning them in client reponse.
- Deduplicate online, doc search queries across research iterations.
This avoids running previously run online, doc searches again and
dedupes online, doc context seen by model to generate response.
- Deduplicate online search queries generated by chat model for each
user query.
- Do not pass online, docs, code context separately when generate
response in research mode. These are already collected in the meta
research passed with the user query
- Improve formatting of context passed to generate research response
- Use xml tags to delimit context. Pass per iteration queries in each
iteration result
- Put user query before meta research results in user message passed
for generating response
This deduplications will improve speed, cost & quality of research mode
Previously the whole research mode response would fail if the pick
next tool call to chat model failed. Now instead of it completely
failing, the researcher actor is told to try again in next iteration.
This allows for a more graceful degradation in answering a research
question even if a (few?) calls to the chat model fail.
Jina search API returns content of all webpages in search results.
Previously code wouldn't remove content beyond max_webpages_to_read
limit set. Now, webpage content in organic results aree explicitly
removed beyond the requested max_webpage_to_read limit.
This should align behavior of online results from Jina with other
online search providers. And restrict llm context to a reasonable size
when using Jina for online search.
This fixes chat with old chat sessions. Fixes issue with old Whatsapp
users can't chat with Khoj because chat history doc context was
stored as a list earlier
Command rate limit wouldn't be shown to user as server wouldn't be
able to handle HTTP exception in the middle of streaming.
Catch exception and render it as LLM response message instead for
visibility into command rate limiting to user on client
Log rate limmit messages for all rate limit events on server as info
messages
Convert exception messages into first person responses by Khoj to
prevent breaking the third wall and provide more details on wht
happened and possible ways to resolve them.
Previously the batch start index wasn't being passed so all batches
started in parallel were showing the same processing example index
This change doesn't impact the evaluation itself, just the index shown
of the example currently being evaluated
- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
- Why
We need better, automated evals to measure performance shifts of Khoj
across prompt, model and capability changes.
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of AI agents. It's a good starter benchmark to evaluate Khoj.
- Details
This PR adds an eval script to evaluate Khoj responses on the the FRAMES
benchmark prompts against the ground truth provided by it.
Script allows configuring sample size, batch size, sampling queries from the
eval dataset.
Gemini is used as an LLM Judge to auto grade Khoj responses vs ground truth
data from the benchmark.
This was previously required, but now it's only usefuly for more
advanced settings, not typical for self-hosting users.
With recent updates, the user's selected chat model is used for both
Khoj's train of thought and response. This makes it easy to
switch your preferred chat model directly from the user settings
page and not have to update this in the admin panel as well.
Reflect these code changse in the docs, by removing the unnecessary
step for self-hosted users to create a server chat setting when using
an OpenAI proxy service like Ollama, LiteLLM etc.
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models.
- The migration script fo ra new model needs to be updated to accommodate full regeneration.
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of an agent.
The script uses Gemini as an LLM Judge to evaluate Khoj responses to
the FRAMES benchmark prompts against the ground truth provided by it.
- Improve chat actors and their prompts for research mode.
- Add documentation to enable the code tool when self-hosting Khoj
- Edit Chat Messages
- Store Turn Id in each chat message.
- Expose API to delete chat message.
- Expose delete chat message button to turn delete chat message from web app
- Set LLM Generation Seed for Reproducible Debugging and Testing
- Setting seed for LLM generation is supported by Llama.cpp and OpenAI models.
This can (somewhat) restrain LLM output
- Getting fixed responses for fixed inputs helps test, debug longer reasoning chains like used in advanced reasoning
## Overview
Khoj can now go into research mode and use a python code interpreter. These are experimental features that are being released early for feedback and testing.
- Research mode allows Khoj to dynamically select the tools it needs to best answer the question. It is also allowed more iterations to get to a satisfactory answer. Its more dynamic train of thought is shown to improve visibility into its thinking.
- Adding ability for Khoj to use a python code interpreter is an adjacent capability. It can help Khoj do some data analysis and generate charts for you. A sandboxed python to run code is provided using [cohere-terrarium](https://github.com/cohere-ai/cohere-terrarium?tab=readme-ov-file), [pyodide](https://pyodide.org/).
## Analysis
Research mode (significantly?) improves Khoj's information retrieval for more complex queries requiring multi-step lookups but takes longer to run. It can research for longer, requiring less back-n-forth with the user to find an answer.
Research mode gives most gains when used with more advanced chat models (like o1, 4o, new claude sonnet and gemini-pro-002). Smaller models improve their response quality but tend to get into repetitive loops more often.
## Next Steps
- Get community feedback on research mode. What works, what fails, what is confusing, what'd be cool to have.
- Tune Khoj's capabilities for longer autonomous runs and to generalize across a larger range of model sizes
## Miscellaneous Improvements
- Khoj's train of thought is saved and shown for all messages, not just the latest one
- Render charts generated by Khoj and code running using the code tool on the web app
- Align chat input color to currently selected agent color
- Dedent code for readability
- Use better name for in research mode check
- Continue to remove inferred summarize command when multiple files in
file filter even when not in research mode
- Continue to show select information source train of thought.
It was removed by mistake earlier
## Overview
Use git to capture prompt traces of khoj's train of thought. View, analyze and debug them using your favorite git client (e.g vscode, magit).
- Each commit captures an interaction with an LLM
The commit writes the query, response and system message each to a separate file in the repo.
The commit message captures the chat model, Khoj version and other metadata
- Each conversation turn can have multiple interactions with an LLM (e.g Khoj's train of thought)
- Each new conversation turn forks from and merges back into its conversation branch
- Each new conversation branches from the user branch
- Each new user branches from root commit on the main branch
## Usage
1. Set `KHOJ_DEBUG=true` or start khoj in very verbose mode with `khoj -vv` to turn on prompt tracing
2. Chat with Khoj as usual
3. Open the promptrace git repo to view the generated prompt traces using your favorite git porcelain.
The Khoj prompt trace git repo is created at `/tmp/khoj_promptrace` by default. You can configure the prompt trace directory by setting the `PROMPTRACE_DIR`environment variable.
## Implementation
- Add utility functions to capture prompt traces using git (via `gitpython`)
- Make each model provider in Khoj commit their LLM interactions with promptrace
- Weave chat metadata from chat API through all chat actors and commit it to the prompt trace
- Match the online query generator prompt to match the formatting of
extract questions
- Separate iteration results by newline
- Improve webpage and online tool descriptions
- Allow server to start if loading embedding model fails with an error.
This allows fixing the embedding model config via admin panel.
Previously server failed to start if embedding model was configured
incorrectly. This prevented fixing the model config via admin panel.
- Convert boolean string in config json to actual booleans when passed
via admin panel as json before passing to model, query configs
- Only create default model if no search model configured by admin.
Return first created search model if its been configured by admin.
Models were getting a bit confused about who is search for who's
information. Using third person to explicitly call out on who's behalf
these searches are running seems to perform better across
models (gemini's, gpt etc.), even if the role of the message is user.
Use placeholder for newline in json object values until json parsed
and values extracted. This is useful when research mode models outputs
multi-line codeblocks in queries etc.
Anthropic API doesn't have ability to enforce response with valid json
object, unlike all the other model types.
While the model will usually adhere to json output instructions.
This step is meant to more strongly encourage it to just output json
object when response_type of json_object is requested.
Separate conversation history with user from the conversation history
between the tool AIs and the researcher AI.
Tools AIs don't need top level conversation history, that context is
meant for the researcher AI.
The invoked tool AIs need previous attempts at using the tool in this
research runs iteration history to better tune their next run.
Or at least that is the hypothesis to break the models looping.
Models weren't generating a diverse enough set of questions. They'd do
minor variations on the original query. What is required is asking
queries from a bunch of different lenses to retrieve the requisite
information.
This prompt updates shows the AIs the breadth of questions to by
example and instruction. Seem like performance improved based on vibes
- Improve mobile friendliness with new research mode toggle, since chat input area is now taking up more space
- Remove clunky title from the suggestion card
- Fix fk lookup error for agent.creator
Overview
---
- Put context into separate user message before sending to chat model.
This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
This should improve model response quality
Details
---
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Align context passed to offline chat model with other chat models
- Pass context in separate message for better separation between user
query and the shared context
- Pass filename in context
- Add online results for webpage conversation command
Context role was added to allow change message truncation order based
on context role as well.
Revert it for now since currently this is not currently being done.
Previously model would rarely read webpages after webpage search. Need
the model to webpages more regularly for deeper research and to stop
getting stuck in repetitive online search loops
Previous passing of online results as json dump in prompts was less
readable for humans, and I'm guessing less readable for
models (trained on human data) as well?
- Start from this branches src/khoj/routers/api_chat.py
Add tracer to all old and new chat actors that don't have it set
when they are called.
- Update the new chat actors like apick next tool etc to use tracer too
- Conflicts:
Combine both sides of the conflict in all 3 files below
- src/khoj/processor/conversation/utils.py
- src/khoj/routers/helpers.py
- src/khoj/utils/helpers.py
- Message train of thought forks and merges from its conversation branch
- Conversation branches from user branch
- User branches from root commit on the main branch
- Weave chat tracer metadata from api endpoint through all chat actors
and commit it to the prompt trace
### Major
- Give Vision to Anthropic models in Khoj
### Minor
- Reuse logic to format messages for chat with anthropic models
- Make the get image from url function more versatile and reusable
- Encourage output mode chat actor to output only json and nothing else
- Use a single standard search model across the server. There's diminishing benefits for having multiple user-customizable search models.
- We may want to add server-level customization for specific tasks
- Store the search model used to generate a given entry on the `Entry` object
- Remove user-facing APIs and view
- Add a management command for migrating the default search model on the server
In a future PR (after running the migration), we'll also remove the `UserSearchModelConfig`
Latest claude model wanted to say more than just give the json output.
The updated prompt encourages the model to ouput just json. This is
similar to what is already being done for other prompts
It was previously added under the google utils. Now it can be used by
other conversation processors as well.
The updated function
- can get both base64 encoded and PIL formatted images from url
- will return the media type of the image as well in response
* Create explicit flow to enable the free trial
The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.
* Use the Subscription Type enum instead of hardcoded strings everywhere
* Use length of free trial in the frontend code as well
Had temporarily updated the default selected agent to last used.
Revert for now as
1. The previous logic was buggy. It didn't select the default agent
even when the last used agent was the default agent. Which would
require more work.
2. It maybe too early anyway to set the default agent to last used.
Adding div elements to message to render degraded text copied to
clipboard for messages with user uploaded images.
This change fixes that by separating message to render from message
for clipboard. It ensures differently formatted forms of the user
images are added to the two to allow proper rendering while still
having decently formatted text copied to clipboard
Add newline instead of sending message when hit Enter key on mobile
displays. As on phones shift key doesn't exist and send button is easily
clickable.
Limit hitting Enter key to send message to computers = larger display
= expected to have full fledged keyboards.
## Overview
Allow quickly selecting, switching agents from agents pane on home page of web app
## Details
- Show all agents in carousel on home screen agent pane of web app
- Smart Sort
1. Pin default agent as first for ease of access
2. Show used agents by MRU for ease of access
3. Shuffle unused agents for discoverability
- Select most recently used agent to chat with by default
- Push smart sort logic down to API
- Common logic can be reused across clients
- Agent sort was previously done in web app
- Focus on chat input on agent select
- Double click agent on home page to open edit agent card on agents page
## Overview
- Add vision support for Gemini models in Khoj
- Allow sharing multiple images as part of user query from the web app
- Handle multiple images shared in query to chat API
- Remove border from agent detail hover card on home page
- Do not wrap long agent names in agent pills on home page
- Handle scenario where chatInputRef is null
Add support for generating dynamic diagrams in flow with Excalidraw (https://github.com/excalidraw/excalidraw). This happens in three steps:
1. Default information collection & intent determination step.
2. Improving the overall guidance of the prompt for generating a JSON, Excalidraw-compatible declaration.
3. Generation of the diagram to output to the final UI.
Add support in the web UI.
Previously only notes context from chat history was included.
This change includes online context from chat history for model to use
for response generation.
This can reduce need for online lookups by reusing previous online
context for faster responses. But will increase overall response time
when not reusing past online context, as faster context buildup per
conversation.
Unsure if inclusion of context is preferrable. If not, both notes and
online context should be removed.
The document, online search context are now passed as separate user
messages to chat model, instead of being added to the final user message.
This will improve
- Models ability to differentiate data from user query.
That should improve response quality and reduce prompt injection
probability
- Make truncation logic simpler and more robust
When context window hit, can simply pop messages to auto truncate
context in order of context, user, assistant message for each
conversation turn in history until reach current user query
The complex, brittle logic to extract user query from context in
last user message isn't required.
Marking the context message with assistant role doesn't translate well
across chat models. E.g
- Gemini can't handle consecutive messages by role = model well
- Claude will merge consecutive messages by same role. In current
message ordering the context message will result get merged into the
previous assistant response. And if move context message after user
query. The truncation logic will have to hop and skip while doing
deletions
- GPT seems to handle consecutive roles of any type fine
Using context role = user generalizes better across chat models for
now and aligns with previous behavior.
Improve separation of note snippets and show its origin file in notes
prompt to have more readable, contextualized text shared with model.
Previously the references dict was being directly passed as a string.
The documents don't look well formatted and are less intelligible.
- Passing file path along with notes snippets will help contextualize
the notes better.
- Better formatting should help with making notes more readable by the
chat model.
- Double click on agent to open edit agent card
- Focus on chat input pane when agent selected/clicked
for quick, smooth agent switch and message flow
- Hover on agent to see agent detail card on non-mobile displays
- Use debounce to only show when hover on card for a bit
- Default to None for the input_tools and output_modes so that they can be managed in the admin panel
- Hold off on showing off all Public Agents until we have a better experience for user profiles etc.
Have get agents API return agents ordered intelligently
- Put the default agent first
- Sort used agents by most recently chatted with agent for ease of access
- Randomly shuffle the remaining unused agents for discoverability
This change wraps the agent pane in a scroll area with all agents shown.
It allows selecting an agent to chat with directly from the home
screen without breaking flow and having to jump to the agents page.
The previous flow was not convenient to quickly and consistently start
chat with one of your standard agents.
This was because a random subet of agents were shown on the home page.
To start chat with an agent not shown on home screen load you had to
open the agents page and initiate the conversation from there.
Exposes a transient switch with available agents as selectable options
in the Khoj chat sub-menu.
Currently shows agent slugs instead of agent names as options. This
isn't the cleanest but gets the job done for now.
Only new conversations with a different agent can be started. Existing
conversations will continue with the original agent it was created with.
The ability to switch the conversation's agent doesn't exist on the
server yet.
One limitation of this methodology is that localStorage has a limit in how much data it can take. Should add more graceful error handling here as well.
Currently experiencing difficulty instruction following when an image is shared. It's more likely to try and output an image. Update to make a clearer distinction.
- Put the attached images display div inside the same parent div as
the text area
- Keep the attachment, microphone/send message buttons aligned with
the text area. So the attached images just show up at the top of the
text area but everything else stays at the same horizontal height as
before.
- This improves the UX by
- Ensuring that the attached images do not obscure the agents pane
above the chat input area
- The attached images visually look like they are inside the actual
input area, rather than floating above it. So the visual aligns
with the semantics
Previously the web app only expected a single image to be shared by
the user as part of their query.
This change allows sharing multiple images from the web app.
Closes#921
Previously Khoj could respond to a single shared image at a time.
This changes updates the chat API to accept multiple images shared by
the user and send it to the appropriate chat actors including the
openai response generation chat actor for getting an image aware
response
Recent changes made Khoj try respond even when document lookup fails.
This change missed handling downstream effects of a failed document
lookup, as the defiltered_query was null and so the text response
didn't have the user query to respond to.
This code initializes defiltered_query to original user query to
handle that.
Also response_type wasn't being passed via
send_message_to_model_wrapper_sync unlike in the async scenario
## Overview
### New
- Support using Firecrawl(https://firecrawl.dev) to read web pages
- Add, switch and re-prioritize web page reader(s) to use via the admin panel
### Speed
- Improve response speed by aggregating web page read, extract queries to run only once for each web page
### Response Resilience
- Fallback through enabled web page readers until web page read
- Enable reading web pages on the internal network for self-hosted Khoj running in anonymous mode
- Try respond even if web search, web page read fails during chat
- Try respond even if document search via inference endpoint fails
### Fix
- Return data sources to use if exception in data source chat actor
## Details
### Configure web page readers to use
- Only the web scraper set in Server Chat Settings via the Django admin panel, if set
- Otherwise use the web scrapers added via the Django admin panel (in order of priority), if set
- Otherwise, use all the web scrapers enabled by settings API keys via environment variables (e.g `FIRECRAWL_API_KEY', `JINA_API_KEY' env vars set), if set
- Otherwise, use Jina to web scrape if no scrapers explicitly defined
For self-hosted setups running in anonymous-mode, the ability to directly read webpages is also enabled by default. This is especially useful for reading webpages in your internal network that the other web page readers will not be able to access.
### Aggregate webpage extract queries to run once for each distinct web page
Previously, we'd run separate webpage read and extract relevant
content pipes for each distinct (query, url) pair.
Now we aggregate all queries for each url to extract information from
and run the webpage read and extract relevant content pipes once for
each distinct URL.
Even though the webpage content extraction pipes were previously being
run in parallel. They increased the response time by
1. adding more ~duplicate context for the response generation step to read
2. being more susceptible to variability in web page read latency of the parallel jobs
The aggregated retrieval of context for all queries for a given
webpage could result in some hit to context quality. But it should
improve and reduce variability in response time, quality and costs.
This should especially help with speed and quality of online search
for offline or low context chat models.
- Simplifies changing order in which web scrapers are invoked to read
web page by just changing their priority number on the admin panel.
Previously you'd have to delete/, re-add the scrapers to change
their priority.
- Add help text for each scraper field to ease admin setup experience
- Friendlier env var to use Firecrawl's LLM to extract content
- Remove use of separate friendly name for scraper types.
Reuse actual name and just make actual name better
The other webpage scrapers will not work for internal webpages. Try
access those urls directly if they are visible to the Khoj server over
the network.
Only enable this by default for self-hosted, single user setups.
Otherwise ability to scan internal network would be a liability!
For use-cases where it makes sense, the Khoj server admin can
explicitly add the direct webpage scraper via the admin panel
- Set up scrapers via API keys, explicitly adding them via admin panel
or enabling only a single scraper to use via server chat settings.
- Use validation to ensure only valid scrapers added via admin panel
Example API key is present for scrapers that require it etc.
- Modularize the read webpage functions to take api key, url as args
Removes dependence on constants loaded in online_search. Functions
are now mostly self contained
- Improve ability to read webpages by using the speed, success rate of
different scrapers. Optimal configuration needs to be discovered
This should reduce webpage read and response generation time.
Previously, we'd run separate webpage read and extract relevant
content pipes for each distinct (query, url) pair.
Now we aggregate all queries for each url to extract information from
and run the webpage read and extract relevant content pipes once for
each distinct url.
Even though the webpage content extraction pipes were previously being
in parallel. They increased response time by
1. adding more context for the response generation chat actor to
respond from
2. and by being more susceptible to page read and extract latencies of
the parallel jobs
The aggregated retrieval of context for all queries for a given
webpage could result in some hit to context quality. But it should
improve and reduce variability in response time, quality and costs.
Set the FIRECRAWL_TO_EXTRACT environment variable to true to have
Firecrawl scrape and extract content from webpage using their LLM
This could be faster, not sure about quality as LLM used is obfuscated
Firecrawl is open-source, self-hostable with a default hosted service
provided, similar to Jina.ai. So it can be
1. Self-hosted as part of a private Khoj cloud deployment
2. Used directly by getting an API key from the Firecrawl.dev service
This is as an alternative to Olostep and Jina.ai for reading webpages.
Khoj shouldn't refuse to respond to user if web lookups fail.
It should transparently mention that online search etc. failed.
But try respond as best as it can without those references
This change ensures a response to the users query is attempted even
when web info retrieval fails.
The huggingface endpoint can be flaky. Khoj shouldn't refuse to
respond to user if document search fails.
It should transparently mention that document lookup failed.
But try respond as best as it can without the document references
This changes provides graceful failover when inference endpoint
requests fail either when encoding query or reranking retrieved docs
- Remove unused subscribed variable from the chat API
- Unexpectedly dropped client app logging when migrated API chat to do
advanced streaming in july
- Only set addedFiles to selectedFiles when selectedFiles is an array
- Only set seleectedFiles, addedFiles to API response json when
response succeeded. Previously we set it to response json
on errors as well. This made the variables into json objects instead
of arrays on API call failure
- Check if selectedFiles, addedFiles are arrays before running
operations on them. Previously the addedFiles.includes was where the
code would fail
finish_reason (google.ai.generativelanguage_v1beta.types.Candidate.FinishReason):
Optional. Output only. The reason why the
model stopped generating tokens.
If empty, the model has not stopped generating
the tokens.
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
Khoj shouldn't refuse to respond to user if web lookups fail.
It should transparently mention that online search etc. failed.
But try respond as best as it can without those references
This change ensures a response to the users query is attempted even
when web info retrieval fails.
The huggingface endpoint can be flaky. Khoj shouldn't refuse to
respond to user if document search fails.
It should transparently mention that document lookup failed.
But try respond as best as it can without the document references
This changes provides graceful failover when inference endpoint
requests fail either when encoding query or reranking retrieved docs
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
- Advanced chat model should also fallback to user chat model if set
- Get conversation config should falback to user chat model if set
These assume no server chat model settings is configured
# Overview
- Default to use user chat models for train of thought when no server chat settings created by admins
- Default to not create server chat settings on first run
# Details
This change simplifies switching chat models for self-hosted setups
by just changing the chat model on the user settings page.
It falls back to use the user chat model for train of thought
if server chat settings have not been created on the admin panel.
Server chat settings, when set, controls the chat model used
for Khoj's train of thought and the default user chat model.
Previously a self-hosted user had to update
1. the server chat settings in the admin panel and
2. their own user chat model in the user settings panel
to completely switch to a different chat model
for both train of thought & response generation respectively
You can still set server chat settings via the admin panel
to use a different chat model for train of thought vs response generation.
But this is only useful for advanced, multi-user setups.
Update regex to also include any links to code generated images that
aren't explicitly meant to be displayed inline. This allows folks to
download the image (unlike the fake link that doesn't work created by
model)
Previously Khoj would start answering the previous query. This maybe
because the prompt uses User for prompt in chat history but was using
Q for current user prompt.
Make webpages to read automatically on search_online configurable via
a argument.
Set it to default to 1, so other callers of the function
are unaffected.
But iterative chat director can still decide which, if
any, webpages to read based on the online search it performs
This change allows the iterative director to dive deeper into its
research as the data extracted contains relevant links from the webpage
Previous summarization prompt didn't extract relevant links from the
webpage which limited further explorations from webpages
Move construct_chat_history and ChatEvent enum into conversation.utils
and move send_message_to_model_wrapper to conversation.helper to
modularize code. And start thinning out the bloated routers.helper
- conversation.util components are shared functions that conversation
child packages can use.
- conversation.helper components can't be imported by conversation
packages but it can use these child packages
This division allows better modularity while avoiding circular
import dependencies
Create python code executing chat actor
- The chat actor generate python code within sandbox constraints
- Run the generated python code in the cohere terrarium, pyodide
based sandbox accessible at sandbox url
- Create a more dynamic reasoning agent that can evaluate information and understand what it doesn't know, making moves to get that information
- Lots of hacks and code that needs to be reversed later on before submission
Update chat actors to use user's chat model for train of thought. This
requires passing the user info as argument to all the chat actors.
Whether the user is subscribed or not can be inferred from the user
info being passed, so it doesn't need to be passed as a separate
argument to chat actor functions
Let send_message_to_model function infer chat model instead of passing
it as an argument from some chat actors. Better if this logic can be
done in a single place.
Server chat settings can be set for advanced self-hosted or multi-user
cloud setups. They are not necessary anymore as we fallback to use the
users chat model for train of thought now
Fallback to use user chat model for train of thought if server chat
settings not defined.
This simplifies switching chat models for single-user, self-hosted
setups by just changing the chat model on the user settings page.
Server chat settings, when set, controls the default user chat model
and the chat model that is used for Khoj's train of thought.
Previously a self-hosted user had to update both the server chat
settings in the admin panel and their own user chat model in the user
settings panel to explicitly switch to a different chat model (i.e to
switch to a new model for both train of thought & response generation)
You can still set server chat settings to use a different chat
model for train of thought and response generation. But this is only
necessary for advanced self-hosted or cloud hosted setups of Khoj.
Previously you had to refresh the page to see the updated data on
reopening the agents edit card after a save operation.
Now you see the latest saved agent data on reopening the agents edit
card. This should avoid confusion on whether the data was saved
correctly
If a public or protected agent is made private. Other users who were
having conversation with that agent will have to carry on their
conversation using default agent instead
Loading the embeddings model, even locally seems to be taking much
longer. Use timer to track visibility into embedding, cross-encoder
model load times
We should start disambiguating the the max input from output size. Max
prompt size should only be used for the max input context to an LLM.
If required max_output_tokens should be set as a separate new field
Currently, the personality of the agent is only included in the final response that it returns to the user. Historically, this was because models were quite bad at navigating the additional context of personality, and there was a bias towards having more control over certain operations (e.g., tool selection, question extraction).
Going forward, it should be more approachable to have prompts included in the sub tasks that Khoj runs in order to response to a given query. Make this possible in this PR. This also sets us up for agent creation becoming available soon.
Create custom agents in #928
Agents are useful insofar as you can personalize them to fulfill specific subtasks you need to accomplish. In this PR, we add support for using custom agents that can be configured with a custom system prompt (aka persona) and knowledge base (from your own indexed documents). Once created, private agents can be accessible only to the creator, and protected agents can be accessible via a direct link.
Custom tool selection for agents in #930
Expose the functionality to select which tools a given agent has access to. By default, they have all. Can limit both information sources and output modes.
Add new tools to the agent modification form
## Overview
Add user country code as context for doing online search with serper.dev API.
This should find more user relevant results from online searches by Khoj
## Details
### Major
- Default to using system clock to infer user timezone on js clients
- Infer country from timezone when only timezone received by chat API
- Localize online search results to user country when location available
### Minor
- Add `__str__` func to `LocationData` class to deduplicate location string generation
Make all the scroll actions just use requestAnimationFrame instead of
setTimeout. It better aligns with browser rendering loop, so better
for UX changes than setTimeout
Using system clock to infer user timezone on clients makes Khoj
more robust to provide location aware responses.
Previously only ip based location was used to infer timezone via API.
This didn't provide any decent fallback when calls to ipapi failed or
Khoj was being run in offline mode
Timezone is easier to infer using clients system clock. This can be
used to infer user country name, country code, even if ip based
location cannot be inferred.
This makes using location data to contextualize Khoj's responses more
robust. For example, online search results are retrieved for user's
country, even if call to ipapi.co for ip based location fails
Get country code to server chat api from i.p location check on clients.
Use country code to get country specific online search results via Serper.dev API
Previously the location string from location data was being generated
wherever it was being used.
By adding a __str__ representation to LocationData class, we can
dedupe and simplify the code to get the location string
- Use tabs for GPU/CPU type khoj being install on
- Update CMAKE flags to use to install Khoj with correct GPU support
Previous flags used DLLAMA, this has been updated to use DGGML now
in llama.cpp
Remove unnecessary "Inferred Query" heading prefix to image generation prompt
used by Khoj. The inferred query in chat message has a heading of it's
own, so avoid two headings for the image prompt
The problem was the tool tip was visible on hover, but it was slow, so before the tool tip popped up, the user would click on the button and this stopped the tool tip from popping up.
So i reduced the popup delay to 10ms. now as soon as user hovers over the button, they will see that its a feature coming soon!
Improve Scrolling on Chat page of Web app
- Details
1. Only auto scroll Khoj's streamed response when scroll is near bottom of page
Allows scrolling to other messages in conversation while Khoj is formulating and streaming its response
2. Add button to scroll to bottom of the chat page
3. Scroll to most recent conversation turn on conversation first load
It's a better default to anchor to most recent conversation turn (i.e most recent user message)
4. Smooth scroll when Khoj's chat response is streamed
Previously the scroll would jitter during response streaming
5. Anchor scroll position when fetch and render older messages in conversation
Allow users to keep their scroll position when older messages are fetched from server and rendered
Resolves#758
* Update the conversation_id primary key field to be a uuid
- update associated API endpoints
- this is to improve the overall application health, by obfuscating some information about the internal database
- conversation_id type is now implicitly a string, rather than an int
- ensure automations are also migrated in place, such that the conversation_ids they're pointing to are now mapped to the new IDs
* Update client-side API calls to correctly query with a string field
* Allow modifying of conversation properties from the chat title
* Improve drag and drop file experience for chat input area
* Use a phosphor icon for the copy to clipboard experience for code snippets
* Update conversation_id parameter to be a str type
* If django_apscheduler is not in the environment, skip the migration script
* Fix create automation flow by storing conversation id as string
The new UUID used for conversation id can't be directly serialized.
Convert to string for serializing it for later execution
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
## Improve
- Intelligently initialize a decent default set of chat model options
- Create non-interactive mode. Auto set default server configuration on first run via Docker
## Fix
- Make RapidOCR dependency optional as flaky requirements causing docker build failures
- Set default openai text to image model correctly during initialization
## Details
Improve initialization flow during first run to remove need to configure Khoj:
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the `{OPENAI,GEMINI,ANTHROPIC}_API_KEY' env var is set
- Used when server run via Docker as user input cannot be processed to configure server during first run
- Do not ask for `max_tokens', `tokenizer' for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
The chat model initialize interaction flow is fairly similar across
the chat model providers.
This should simplify adding new chat model providers and reduce
chances of bugs in the interactive chat model initialization flow.
This is an initial pass to add documentation for all the knobs
available on the Khoj Admin panel.
It should shed some light onto what each admin setting is for and how
they can be customized when self hosting.
Resolves#831
- Improve Self Hosting Docker Instructions
- Ask to Install Docker Desktop to not require separate
docker-compose install and unify the instruction across OS
- To Self Host on Windows, ask to use Docker Desktop with WSL2 backend
- Use nested Tab grouping to split Docker vs Pip Self Host Instructions
- Reduce Self Host Setup Steps in Documentation after code simplification
- First run now avoids need to configure Khoj via admin panel
- So move the chat model config steps into optional post setup
config section
- Improve Instructions to Configure chat models on First Run
- Compress configuring chat model providers into a Tab Group
- Add Documentation for Remote Access under Advanced Self Hosting
Given the LLM landscape is rapidly changing, providing a good default
set of options should help reduce decision fatigue to get started
Improve initialization flow during first run
- Set Google, Anthropic Chat models too
Previously only Offline, Openai chat models could be set during init
- Add multiple chat models for each LLM provider
Interactively set a comma separated list of models for each provider
- Auto add default chat models for each provider in non-interactive
model if the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var is set
- Do not ask for max_tokens, tokenizer for offline models during
initialization. Use better defaults inferred in code instead
- Explicitly set default chat model to use
If unset, it implicitly defaults to using the first chat model.
Make it explicit to reduce this confusion
Resolves#882
This should configure Khoj with decent default configurations via
Docker and avoid needing to configure Khoj via admin page to start
using dockerized Khoj
Update default max prompt size set during khoj initialization
as online chat model are cheaper and offline chat models have larger
context now
RapidOCR depends on OpenCV which by default requires a bunch of GUI
paramters. This system package dependency set (like libgl1) is flaky
Making the RapidOCR dependency optional should allow khoj to be more
resilient to setup/dependency failures
Trade-off is that OCR for documents may not always be available and
it'll require looking at server logs to find out when this happens
This reverts commit c9665fb20b.
Revert "Fix handling for new conversation in agents page"
This reverts commit 3466f04992.
Revert "Add a unique_id field for identifiying conversations (#914)"
This reverts commit ece2ec2d90.
- This allows triggering khoj chat from the browser addressbar
- So now if you add Khoj to your browser bookmark with
- URL: https://app.khoj.dev/?q=%s
- Keyword: khoj
- Then you can type "khoj what is the news today" to trigger Khoj to
quickly respond to your query. This avoids having to open the Khoj web
app before asking your question
* Add a unique_id field to the conversation object
- This helps us keep track of the unique identity of the conversation without expose the internal id
- Create three staged migrations in order to first add the field, then add unique values to pre-fill, and then set the unique constraint. Without this, it tries to initialize all the existing conversations with the same ID.
* Parse and utilize the unique_id field in the query parameters of the front-end view
- Handle the unique_id field when creating a new conversation from the home page
- Parse the id field with a lightweight parameter called v in the chat page
- Share page should not be affected, as it uses the public slug
* Fix suggested card category
Previously Khoj would stop in the middle of response generation when
the safety filters got triggered at default thresholds. This was
confusing as it felt like a service error, not expected behavior.
Going forward Khoj will
- Only block responding to high confidence harmful content detected by
Gemini's safety filters instead of using the default safety settings
- Show an explanatory, conversational response (w/ harm category)
when response is terminated due to Gemini's safety filters
- Support using image generation models like Flux via Replicate
- Modularize the image generation code
- Make generate better image prompt chat actor add composition details
- Generate vivid images with DALLE-3
Enables using any image generation model on Replicate's Predictions
API endpoints.
The server admin just needs to add text-to-image model on the
server/admin panel in organization/model_name format and input their
Replicate API key with it
Create db migration (including merge)
Set sender email using `RESEND_EMAIL` environment variable for magic link sent via Resend API for authentication . It was previously hard-coded. This prevented hosting Khoj on other domains.
Resolves#908
- Major
- The new O1 series doesn't seem to support streaming, response_format enforcement,
stop words or temperature currently.
- Remove any markdown json codeblock in chat actors expecting json responses
- Minor
- Override block display styling of links by Katex in chat messages
Strip any json md codeblock wrapper if exists before processing
response by output mode, extract questions chat actor. This is similar
to what is already being done by other chat actors
Useful for succesfully interpreting json output in chat actors when
using non (json) schema enforceable models like o1 and gemma-2
Use conversation helper function to centralize the json md codeblock
removal code
This happens sometimes when LLM respons contains [\[1\]] kind of links
as reference. Both markdown-it and katex apply styling.
Katex's span uses display: block which makes the rendering of these
references take up a whole line by themselves.
Override block styling of spans within an `a' element to prevent such
chat message styling issues
* Add functions to chat with Google's gemini model series
* Gracefully close thread when there's an exception in the gemini llm thread
* Use enums for verifying the chat model option type
* Add a migration to add the gemini chat model type to the db model
* Fix chat model selection verification and math prompt tuning
* Fix extract questions method with gemini. Enforce json response in extract questions.
* Add standard stop sequence for Gemini chat response generation
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
Additional logging was enabled to debug automation failures in
production since migration chat API to use POST request method (from
earlier GET).
Redirect from http to https was default to use GET instead of POST
method to call /api/chat on redirect. This has been resolved now
Get information sources and get output mode don't actually see the
images. They just get placeholder text to indicate that the user
attached an image to their message for context
- Make train of thought icons to be top aligned, next to the
their intermediate step heading
- Add margin bottom to ordered, unordered lists in chat message,
similar to how it is already added for paragraphs
# Summary of Changes
* New UI to show preview of image uploads
* ChatML message changes to support gpt-4o vision based responses on images
* AWS S3 image uploads for persistent image context in conversations
* Database changes to have `vision_enabled` option in server admin panel while configuring models
* Render previously uploaded images in the chat history, show uploaded images for pending msgs
* Pass the uploaded_image_url through to subqueries
* Allow image to render upon first message from the homepage
* Add rendering support for images to shared chat as well
* Fix some UI/functionality bugs in the share page
* Convert user attached images for chat to webp format before upload
* Use placeholder to attached image for data source, response mode actors
* Update all clients to call /api/chat as a POST instead of GET request
* Fix copying chat messages with images to clipboard
TLDR; Add vision support for openai models on Khoj via the web UI!
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
Limit file types to sync with Khoj from Obsidian to:
- Avoid hitting per user index-able data limits, especially for folks on the Khoj cloud free tier. E.g by excluding images in Obsidian vault from being synced
- Improve context used by Khoj to generate responses
When user exceeds data sync limits. Show error notice with
- Link to web app settings page to upgrade subscription
- Link to Khoj plugin settings in Obsidian to configure file types to
sync from vault to Khoj
Previously chat stream iterator wasn't closed when response streaming
for offline chat model threw an exception.
This would require restarting the application. Now application doesn't
hang even if current response generation fails with exception
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so makes
sense to upgrade defaults and tests to work with it
- Background
Llama.cpp allows enforcing response as json object similar to OpenAI
API. Pass expected response format to offline chat models as well.
- Overview
Enforce json output to improve intermediate step performance by
offline chat models. This is especially helpful when working with
smaller models like Phi-3.5-mini and Gemma-2 2B, that do not
consistently respond with structured output, even when requested
- Details
Enforce json response by extract questions, infer output offline
chat actors
- Convert prompts to output json objects when offline chat models
extract document search questions or infer output mode
- Make llama.cpp enforce response as json object
- Result
- Improve all intermediate steps by offline chat actors via json
response enforcement
- Avoid the manual, ad-hoc and flaky output schema enforcement and
simplify the code
This is a more robust way to extract json output requested from
gemma-2 (2B, 9B) models which tend to return json in md codeblocks.
Other models should remain unaffected by this change.
Also removed request to not wrap json in codeblocks from prompts. As
code is doing the unwrapping automatically now, when present
- Allow free tier users to have unlimited chats with default chat model. It'll only be rate-limited and at the same rate as subscribed users
- In the server chat settings, replace the concept of default/summarizer models with default/advanced chat models. Use the advanced models as a default for subscribed users.
- For each `ChatModelOption' configuration, allow the admin to specify a separate value of `max_tokens' for subscribed users. This allows server admins to configure different max token limits for unsubscribed and subscribed users
- Show error message in web app when hit rate limit or other server errors
Currently, the search model config display for admins only shows the id of the search model config, which is not very informative.
The changes enhances the admin console by displaying the name of the search model config (name), as well as the bi-encoder model (bi_encoder) and cross-encoder model (cross_encoder) along the id.
Previously `force' was passed as a query param to the single indexing API. After the recent API updates, it is meant to select the API method to use (PATCH vs PATCH). Converting `force' argument to a bool fixes implementing this new behavior
- Major
- Improve doc search actor performance on vague, random or meta questions
- Pass user's name to document and online search actors prompts
- Minor
- Fix and improve openai chat actor tests
- Remove unused max tokns arg to extract qs func of doc search actor
- Issue
Previously the doc search actor wouldn't extract good search queries
to run on user's documents for broad, vague questions.
- Fix
The updated extract questions prompt shows and tells the doc search
actor on how to deal with such questions
The doc search actor's temperature was also increased to support more
creative/random questions. The previous temp of 0 was meant to
encourage structured json output. But now with json mode, a low temp is
not necessary to get json output
- Use temperature of 0 by default for extract questions offline chat
actor
- Use temperature of 0.2 for send_message_to_model_offline (this is
the default temperature set by llama.cpp)
* Add ability to cycle through the chat history in the chat input on Obsidian (similar to terminal history navigation)
* Add mod key shortcut to cycle through chat history in chat input
* Add shortcut help text in chat input placeholder
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
### Overview
Support exclude file filter in user search queries
### Details
- All of the exclude file filter terms need to be satisfied
- Any one of the include file filter terms should be satisfied
### Example
- **Search Query**: *what happened yesterday? -file:"tasks.org" -file:"work.md" file:"diary.org" file:"journal.org*
- **Behavior**: Query will try find relevant notes in any of `journal.org` or `diary.org` and not in `tasks.org` and not in `work.md`
### Details
* Add support for exclusion file filters
* Translate file filter to valid Django DB entry filter regex
* Exclude all files when multiple exclude file filter in query
Previously we were applying an "Or" filter, which would exclude any
file mentioned in a query with multiple exclude file filter.
This is not what we naturally mean when we ask excluding a file in a query
* Rename, rearrange, deduplicate and add file filter tests
Closes#728
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
Previously required the automation page to be refreshed to see updates
to the automation in the edit automation card. This would be seen when
user tries to edit an automation multiple times (without a page refresh)
Previously, the code incorrectly treated all non-nil values as true, leading to
the index being re-indexed with the force flag whenever the user selected to
update the index.
- Pass the new conversation id as kwarg for the scheduled_chat function
- For edit automations, re-use the original conversation id
- Parse images correctly for image automations
- Use color to provide visual feedback when hover, click on feedback
buttons
- Use color to provide visual feedback when hover on speech, copy
buttons click
- Add cooldown period before being able to send feedback on that message again.
Avoids inadvertent multiple consecutive clicks on feedback buttons
- Since the .gitignore will ignore any of the assets in the src/ folder when building the package wheel, we need to output the static assets to another folder just for the python pypi package. Use /compiled for this.
- Auto focus on email input on login screen for smoother login experience
- Use file icon associated with search page results. Improve search bar
- Show logged in user's email in nav menu for context
- Use previous icons with eyes for search, agents and automations items in nav menu
Hierarchical documents like org-mode, markdown have their ancestry
shown in first line. Remove it to show cleaner, deduplicated reference
text from org-mode, markdown files
Utilize chat footer space more efficiently. This is especially useful
on small screens
- Send button is anyway only enabled when there is text in chat input
- Otherwise voice message button is better to show by default
- Remove invalid call to styles.main
- Remove unnecessary top padding above side pane to keep side pane at
consistent position across web app
- Use same pageLayout styles and styling structure on agent like
automation
- Vertically center automation section and page title on it's row
- Fix applying flex vs grid with tailwind
- Remove x axis footer padding on small screens to preserve space,
keep equal spacing between footer items
- Add 1rem margin to buttons to not have overlap in boundary
- Add 1rem y-axis padding to chat footer to not have focus boundary
leave the footer boundary on smaller screens
Installing Khoj as PWA was supported in previous web UX as well. This
just adds link to the existing webmanifest to continue support for
installing Khoj as PWA with new web UX
Previously the rename wasn't updating the chat session title. We'd
have to refresh the page or side pane to get latest chat session names
after rename action.
Previously the footer's right border wasn't visible on small screens
due to usage of w-full
Use mr-1 on send button instead of px-1 on chat input parent to
eualize chat footer buttons spacing
- Show informative toast messages on copy, delete of API keys
- Onle show API keys card in non anonymous mode. API keys aren't
required (and is disabled on server side) in anon mode. Not showing
card at all in anon mode reduces chance of unnecessary confusion
Style profile pircture button on nav menu
- Use primary colored ring around subscribed user profile on nav menu
- Use gray colored ring around non-subscribed user profile on nav menu
- Use upper case initial as profile pic for user with no profile pic
- Click anywhere on nav menu item to trigger action
Previously the actual clickable area was smaller than the width of
the nav menu item
- Move the nav menu into the chat history side panel component, so that they both show up on one line
- Update all pages to use it with the new formatting
- in mobile, present the sidebar button, home button, and profile button evenly centered in the middle
- Pass userConfig from Home as prop to chatBodyData component with
loading state
- Pass loading state of userConfig to allow components to handle
rendering dependent elements once it is loaded
- Move the nav menu into the chat history side panel component, so that they both show up on one line
- Update all pages to use it with the new formatting
- in mobile, present the sidebar button, home button, and profile button evenly centered in the middle
Use updated format for HTTP streamed responses from the Khoj server in the new chat UX
Remove references to the websocket connected field, as websocket use has been deprecated
Otherwise the Khoj's chat response is filling up in between the
streamed message and already rendered references section at the bottom
of the message
Define OnlineContext type to simplify typing online context param
across other interfaces and functions
- There were some state mismatches in configuring a whatsapp number. This commit fixes those issues and uses an external library for phone number validation
- Note that the SSR for next doesn't support rendering on the client-side, so it'll only update it one big chunk
- Fix unique key error in the chatmessage history for incoming messages
- Remove websocket value usage in the chat history side panel
- Remove other websocket code from the chat page
- Why
Profile section and billing section looked too empty (1 card each).
Combining them makes the setting page look more complete. Shows
subscription options early on
- Details
- Made Futurist text orange
- Made Unsubscribe a down button instead of cloud slash
- Updated toast title to subscription
- Improve Futurist trial title and description
- Remove now unnecessary button to Save in Card with dropdown
- Use toast to show success, failure (not working)
- Rename language to search, Move it to features section. Add icon to
the card
- Update references in new and old web client settings
- Arrange new client settings props and add header comments similar to
- config response for code readability
- Add a lot more suggestions cards, improve mobile rendering of suggestion cards, improve alignment of chat input, shift message when starts recording voice, remove dead code
* Converted navigation menu into a dropdown menu
* Moved collapsed side panel menu icons into top row
* Auto refresh when conversation is deleted to update side panel and route back to main page if deletion is on current conversation
* Highlight the current conversation in the side panel
* Dynamic homepage messages with current day and time of day.
* `colorutils` upgraded to have more expansive tailwind color options and dynamic class name generation.
* Converted create agent button alert into shadcn `ToolTip`
* Colored lines and icons for agents in chat window
* Cleaned up border styling in dark mode
* fixed three dot menu in side panel to be more easier to click
* Add the KhojLogo import in the nav menu and use a default user profile icon when not authenticated
* Get rid of custom --box-shadow CSS variable
* Pass the agent metadat through the chat body data in order to style the send button
* Add login to the unauthenticated login view, redirecto to home if conversation history not loaded
* Set a max height for the input text area
* Simplify tailwind class names
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
## Major: Breaking Changes
- Move API endpoints under /configure/<type>/model to /api/model/<type>
- Move API endpoints under /api/configure/content/ to /api/content/
- Accept file deletion requests by clients during sync
- Split /api/v1/index/update into /api/content PUT, PATCH API endpoints
## Minor: Create New API Endpoint
- Create API endpoints to get user content configurations
Related: #852
- Use new form of passing doc references to now passing chat actor
test
- Fix message list generation from conversation logs provided
Strangely the parent conversation_log gets passed down to
message_to_log func when the kwarg is not explicitly specified
Changes for new agents page
- Modernized agent cards
- Responsive design to support mobile users
- Button for users to create their own agents (coming soon)
- Optimized to use tailwind and icon utils
- Side panel added for quick access to conversations
Previous logic was more brittle to break with simple unbalanced
'{' or '}' string present in the event data. This method of trying to
identify valid json obj was fairly brittle. It only allowed json
objects or processed event as raw strings.
Now we buffer chunk until we see our unicode magic delimiter and only
then process it.
This is much less likely to break based on event data and the
delimiter is more tunable if we want to reduce rendering breakage
likelihood further
- Add support for text to speech, speech to text. Add loading and responsive indicators to reflect state.
- When streaming for speech to text, show incremental transcription in the message input field
- When streaming text to speech, and a pause button in the chat message to allow user to stop playback
* V1 of the new automations page
Implemented:
- Shareable
- Editable
- Suggested Cards
- Create new cards
- added side panel new conversation button
- Implement mobile-friendly view for homepage
- Fix issue of new conversations being created when selected agent is changed
- Improve center of the homepage experience
- Fix showing agent during first chat experience
- dark mode gradient updates
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
- Deduplicate code to collect chat telemetry by relying on
end_llm_response event
- Log time to first token and total chat response time for latency
analysis of Khoj as an agent. Not just the latency of the LLM
- Remove duplicate timer in the image generation path
Do not need response generator to stuff compiled references in chat
stream using "### compiled references:" separator.
References are now sent to clients as structured json while streaming
## Overview
- Gemma 2 is a new open model family by Google. They've released a 9B, 29B param model. A 2B model is also expected.
- It performs really well on the Chatbot arena and shows good performance when testing within Khoj as well.
- Llama.cpp support for Gemma 2 architecture seems to have stabilized
- If Gemma 2 performs well in further testing, it can be made the default offline chat model for Khoj
- Once the 2B param model is released, the model size to download can be automatically chosen based on (V)RAM available
## Major
- Support Gemma 2 for Offline Chat
- Improve and fix chat model prompts for better, consistent context
## Minor
- Fix and improve offline chat actor, director tests
- Improve offline chat truncation to consider chat message delimiter tokens
Previously loading animation would be at top of message. Moving it to
bottom is more intuitve and easier to track.
Remove white-space: pre from list elements. It was adding too much y
axis padding to chat messages (and train of thought)
- Details
Only return notes refs, online refs, inferred queries and generated
response in non-streaming mode. Do not return train of throught and
other status messages
Incorporate missing logic from old chat API router into new one.
- Motivation
So we can halve chat API code by getting rid of the duplicate logic
for the websocket router
The deduplicated code:
- Avoids inadvertant logic drift between the 2 routers
- Improves dev velocity
- Overview
Use simpler HTTP Streaming Response to send status messages, alongside
response and references from server to clients via API.
Update web client to use the streamed response to show train of thought,
stream response and render references.
- Motivation
This should allow other Khoj clients to pass auth headers and recieve
Khoj's train of thought messages from server over simple HTTP
streaming API.
It'll also eventually deduplicate chat logic across /websocket and
/chat API endpoints and help maintainability and dev velocity
- Details
- Pass references as a separate streaming message type for simpler
parsing. Remove passing "### compiled references" altogether once
the original /api/chat API is deprecated/merged with the new one
and clients have been updated to consume the references using this
new mechanism
- Save message to conversation even if client disconnects. This is
done by not breaking out of the async iterator that is sending the
llm response. As the save conversation is called at the end of the
iteration
- Handle parsing chunked json responses as a valid json on client.
This requires additional logic on client side but makes the client
more robust to server chunking json response such that each chunk
isn't itself necessarily a valid json.
- Convert functions in SSE API path into async generators using yields
- Validate image generation, online, notes lookup and general paths of
chat request are handled fine by the web client and server API
* Update the automations UI to be a more suitable color distribution based on new designs
* Use accented colors for the metadata, update dark mode colors
* Update form to use icons as well and render more pretty inline form labels
Pull out /api/configure/phone API endpoints into /api/phone for
more concise and sufficiently explanatory API path
Refactor Flow
1. Rename /api/configure/phone -> /api/phone
Now the API to configure all the AI models is under /api/models.
This provides better organization and API hierarchy. The /configure
url segment was redundant.
- Rename POST /api/phone to PATCH /api/phone
- Rename GET /api/configure to GET /api/settings
Refactor Flow
1. Move out POST /user/name to main api.py
2. Rename /api/configure/<type>/model -> /api/model/<type>
3. Rename @api_configure to @api_mode
4. Rename file api_config.py to api_model.py
Pull out /api/configure/content API endpoints into /api/content to
allow for more logical organization of API path hierarchy
This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics along with the
path.
The /configure URL path segment was either
- redundant (e.g POST /configure/notion) or
- incorrect (e.g GET /configure/files)
Some example of naming improvements:
- GET /configure/types -> GET /content/types
- GET /configure/files -> GET /content/files
- DELETE /configure/files -> DELETE /content/files
This should also align, merge better the the content indexing API
triggered via PUT, PATCH /content
Refactor Flow
1. Rename /api/configure/types -> /api/content/types
2. Rename /api/configure -> /api
3. Move /api/content to api_content from under api_config
- Remove unused full_corpus boolean. The full_corpus=False code path
wasn't being used (accept for in a test)
- The full_corpus=True code path used was ignoring file deletion
requests sent by clients during sync. Unclear why this was done
- Added unit test to prevent regression and show file deletion by
clients during sync not ignored now
- This utilizes PUT, PATCH HTTP method semantics to remove need for
the "regenerate" query param and "/update" url suffix
- This should make the url more succinct and API request intent more
understandable by using existing HTTP method semantics
- Add day of week to system prompt of openai, anthropic, offline chat models
- Pass more context to offline chat system prompt to
- ask follow-up questions
- know where to find information about khoj (itself)
- Fix output mode selection prompt. Log error if model does not select
valid option from list of valid output modes provided
- Use consistent names for question, answers passed to
extract_questions_offline prompt
- Log which model extracts question, what the offline chat model sees
as context. Similar to debug log shown for openai models
- Pass system message as the first user chat message as Gemma 2
doesn't support system messages
- Use gemma-2 chat format
- Pass chat model name to generic, extract questions chat actors
Used to figure out chat template to use for model
For generic chat actor argument was anyway available but not being
passed, which is confusing
### Major
- Reuse get config data logic across config pages on web client
- Make config api endpoint urls and response fields consistent
- Rename API path /api/config to /api/configure
- Move Web, Desktop client settings page to be under `/settings` from the previous `/config` url path
### Minor
- Pass isMobileWidth prop to SidePanel via chat share interface
- Turn prettier off instead of throwing error for now
- Do no explicitly add line-clamp plugin as it's in Tailwind by default
- Update references to the settings page to use new url across docs
and code
- Rename desktop and web settings page to settigns.html instead of
config[ure].html
- Use name, id for every [search|chat|voice|pain]_model_option
- Rename current_model_state field to more intuitive enabled_content_source
- Update references to the update fields in config.html
- Deprecate khoj-assistant pypi package. Use more accurate and
succinct pypi project name, khoj
- Update references to sye khoj pypi package in docs and code instead
of the legacy khoj-assistant pypi package
- Update pypi workflow to publish to both khoj, khoj-assistant for now
- Update stale python 3.9 support mentioned in our pyproject. Can't
support python 3.9 as depend on latest django which support >=3.10
- Put logic to get config data, detailed or basic into router helpers module
- Use the get config func across the config pages on web clients
- Put configure content and get_notion_auth_url funcs in router helper
module to avoid circular import
Migrates the Automations page to React, mostly keeping the overall design consistent with organization. Use component library, with some changes in color. Add easier management with straightforward form and editing experience.
Use system preference for determining dark mode if not explicitly set.
### Behavior
- Close chat sessions side panel on click open a chat session
- Show agent profile card with description when hover on agent in chat view
- Show action bar on last chat message without hover
- Show chat message action buttons without hover on mobile interfaces
- Show chat message timestamp on hover in chat view
- Show text descriptions of chat message action buttons on hover
- Render inline png, webp images generated by Khoj in chat view
### Fixes
- Do not render references with broken links in chat view
- Fix closing side panel on mobile when click open a chat session
- Only open side panel as drawer in mobile view
- Constrain chat messages to stay within view port across screen sizes
### Styling: Spacing, Sizing, Mobile Friendly
- Make Khoj icon appropriately sized and side panel arrow bold
- Conversations list should resize to take max space on side panel
- Make loading message, styling configurable. Do not show agent when no data
- Improve Train of Thought icons spacing and loading circle
- Improve mobile friendly styling of chat session side panel
- Improve styling of chat input, references UI across screen sizes
- Center cursor in chat input. See upto 2 lines for multi-line context
### Miscellaneous
- Add code formatter for web interface with prettier
- Updated references panel
- Use subtle coloring for chat cards
- Chat streaming with train of thought
- Side panel with limited sessions, expandable
- Manage conversation file filters easily from the side panel
- Updated nav menu, easily go to agents/automations/profile
- Upload data from the chat UI (on click attachment icon)
- Slash command pop-up menu, scrollable and selectable
- Dark mode-enabled
- Mostly mobile friendly
- Pass Loading message, class name via props to both inline and normal
loading spinners
- Pass loading conversation message to loading spinner when chat
history is being fetched
- Create profile card componennt. Use it for agent profile card
- Pass agent persona from khoj server via API
- Put link to agent profile page in the hover card to make it 2 clicks
away. Othewise inadvertent clicks on agent in chat view lead away to
agent page
- Use tailwind line-clamp extension to clamp card to first two lines
- Reuse class name when get slash command icons
- Previous chat input styling didn't have the cursor centered in the
chat input text area. But it did allow seeing multi line chat inputs
for context
- Major
- Ask for prompt in prose
- Remove seed from SD3 image generation to improve diversity of output
for a given prompt
Otherwise for conversations with similar sounding
prompts, the images would be almost exactly the same. This maybe
another indicator of SD3's inability to capture detailed
instructions
- Consistently use "prompt" wording instead of "query" in improved
image generation prompts.
Previously a mix of those terms were being used, which could confuse
the chat model
- Minor
- Add day of week to prompt
- Remove 2-5 sentence limit on instructions to SD3. It seems to be
able to follow longer instructions just with less fidelity than
DALLE. And the 2-5 sentence instruction limit wasn't being adhered to
- Improve ability to edit, improve the image based on follow-up
instructions by the user
- Align prompts for DALLE and SD3. Only difference is to wrap text to
be rendered in quotes for SD3. This improves it's ability to render
requested text. DALLE cannot render text as well or consistently
- Because we're using a FastAPI api framework with a Django ORM, we're running into some interesting conditions around connection pooling and clean-up. We're ending up with a large pile-up of open, stale connections to the DB recurringly when the server has been running for a while. To mitigate this problem, given starlette and django run in different python threads, add a middleware that will go and call the connection clean up method in each of the threads.
- Issue
The Khoj docker build would fail with `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`. This was required by the Khoj RapidOCR python package dependency.
- Fix
A minimal set of system packages have been added to resolve this issue.
- Transcribe on holding Ctrl+s keyboard shortcut
- Transcribe on holding the transcribe button pressed via mouse too
- Make the transcribe button robust to inadvertent touches by using timeout
- Do not transcribe, trigger auto-send on silences. Silence detection
is super rudimentary, just blocks standard emanations by whisper
when no speech
### Fix
- Fix degrade in speed when indexing large files
- Resolve org-mode indexing bug by splitting current section only once by heading
- Improve summarization by fixing formatting of text in indexed files
### Improve
- Improve scaling user, admin flows to delete all entries for a user
- Split once by heading (=first_non_empty) to extract current section body
Otherwise child headings with same prefix as current heading will
cause the section split to go into infinite loop
- Also add check to prevent getting into recursive loop while trying
to split entry into sub sections
Adding files to the DB for summarization was slow, buggy in two ways:
- We were updating same text of modified files in DB = no of chunks
per file times
- The `" ".join(file_content)' code was breaking each character in the
file content by a space. This formats the original file content
incorrectly before storing in the DB
Because this code ran in the main file indexing path, it was slowing down
file indexing. Knowledge bases with larger files were impacted more strongly
- Delete entries by batch to improve efficiency of query at scale
- Share code to delete all user entries between it's async, sync methods
- Add indicator to show when files being deleted on web config page
- Add CSP to web config pages. Load phone no. validation js, css from S3
- Construct config page elements on Web via DOM scripting
- Disable CSP in Khoj Obsidian as it interferes with Obsidian functionality
- Other miscellaneous voice message level improvements (rate limit, listening animation)
The Khoj CSP interferes with other Obsidian features and plugins as
CSP is applied page wide.
For now chat message sanitization via Dompurify should suffice.
Enable CSP when can scope it to only the Khoj Obsidian plugin.
Maybe better to fallback to non-summarize behavior if summarize intent
is just inferred but we can't actually summarize because the single
file added to conversation isn't satisfied
- Simplify quick jump between Khoj side pane and main editor view using keyboard shortcuts
- Enable voice chat in Obsidian to make interactions with Khoj more seamless
- Overview
Khoj wil be able to do online search out of the box, even for self-hosted users
- Default to Jina search, reader API when no Serper.dev, Olostep API keys
- Run online searches in parallel to process multiple queries faster
- Details
- Jina provides a [reader API](https://github.com/jina-ai/reader) for online search and web page reading
It requires no API key. This provides a good default to enable
online search for self-hosted readers requiring no additional setup.
- Jina search API also returns webpage contents with the results, so
just use those directly when Jina Search API used instead of
trying to read webpages separately. The extract relevant content from
webpage step using a chat model is still used from the
`read_webpage_and_extract_content' func in this case.
- Parse search results from Jina search API into same format as
Serper.dev for accurate rendering of online references by clients
- Run online searches in parallel with AsyncIO to process multiple queries faster
- Support Stable Diffusion 3 via API
Server Admin needs to setup model similar to DALLE-3 via Django Admin Panel
- Use shorter prompt generator to prompt SD3 to create better images
- Allow users to set paint model to use from web client config page
Update offline, openai chat actor, director tests to not require
Serper to run the online command tests
Update documentation for self-hosted online search to mention no setup
is required by default. But improvements can be made by using
Serper.dev or Olostep
Jina AI provides a search and webpage reader API that doesn't require
an API key. This provides a good default to enable online search for
self-hosted readers requiring no additional setup.
Jina search API also returns webpage contents with the results, so
just use those directly when Jina Search API used instead of
trying to read webpages separately. The extract relvant content from
webpage step using a chat model is still used from the
`read_webpage_and_extract_content' func in this case.
Parse search results from Jina search API into same format as
Serper.dev for accurate rendering of online references by clients
- Added support for uploading .jpeg, .jpg, and .png files to Khoj from Web, Desktop app
- Updating indexer to generate raw text and entries using RapidOCR
- Details
* added support for indexing images via ocr
* fixed pyproject.toml
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/processor/content/images/image_to_entries.py
Co-authored-by: Debanjum <debanjum@gmail.com>
* removed redudant try except blocks
* updated desktop js file to support image formats
* added tests for jpg and png
* Fix processing for image to entries files
* Update unit tests with working image indexer
* Change png test from version verificaition to open-cv verification
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Co-authored-by: sabaimran <narmiabas@gmail.com>
This should improve fluidity of keyboard interactions with Khoj on
Obsidian.
Open Khoj chat view via keybinding or command pallete and ask
question using only the keyboard, with no mouse clicks required
- Automatically carry out voice chats with Khoj from within Obsidian
When send voice message, Khoj will auto respond with voice as well
- Listen to past Khoj messages as speech
- Add circular loading spinner to use while message is being converted
to speech
* Add a leader election mechanism to circumvent runtime issues for multiple schedulers
- Reduce the load on the DB and risk of issues on the service side by limiting the execution environment to one elected leader at a given time. This one is responsible for managing all of the execution of the jobs, though all workers are capable of adding and removing jobs
* Set a max duration for the schedule leader position (12 hrs), add some error if automation not added successfully
* rough sketch of desktop shortcuts. many bugs to fix still
* working MVP of desktop shortcut khoj
* UI fixes
* UI improvements for editable shortcut message
* major rendering fix to prevent clipboard text from getting lost
* UI improvements and bug fixes
* UI upgrades: custom top bar, edit sent message and color matching
* removed debug javascript file
* font reverted to Noto Sans
* cleaning up the code and removing diffs
* UX fixes
* cleaning up unused methods from html
* front end for button to send user back to main window to continue conversation
* UX fix for window and continue conversation support added
* migrated common js functions into chatutils.js
* Fix window closing issue in macos by
1. Use a helper function to determine if the window is open by seeing if there's a browser window with shortcut.html loaded
2. Use the event listener on the window to handle teardown
* removed extra comment and renamed continue convo button
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
- Add an experimental feature used for fact-checking falsifiable statements with customizable models. See attached screenshot for example. Once you input a statement that needs to be fact-checked, Khoj goes on a research spree to verify or refute it.
- Integrate frontend libraries for [Tailwind](https://tailwindcss.com/) and [ShadCN](https://ui.shadcn.com/) for easier UI development. Update corresponding styling for some existing UI components.
- Add component for model selection
- Add backend support for sharing arbitrary packets of data that will be consumed by specific front-end views in shareable scenarios
Initialize our migration to use Next.js for front-end views via Agents. This includes setup for getting authenticated users, reading in available agents, setting up a pop-up modal when you're clicking on an agent, and allowing users to start new conversations with agents.
Best attempt at an in-place migration, though there are some noticeable differences.
Also adds view for chat that are not being used, but in experimental phase.
Previously the text_to_image helper would only trigger the image
generation flow if OpenAI client was setup. This is not required
anymore as offline chat model + sd3 API works. So remove that check
- Add instructions for self-hosted users with info, warning boxes to
avoid, fix common issues when setting up Khoj server
- Create new Advanced Self Hosting section
- Extract Advanced Self-Hosting Sections from the Advanced Page and
move them to separate Pages under Advanced Self Hosting section
- Improve OpenAI Proxy Docs
- Put Ollama setup as a section under OpenAI API Proxy page instead
of a separate page
- Add Section to use Khoj with chat model from LM Studio
- Update LiteLLM docs to use chat model from LM Studio
This handles updates from manifest.json minAppVersion field to the
versions.json file.
The minAppVersion field is for the minimum Obsidian app version
supported by a Khoj plugin version
Khoj will find and display notes similar to the current entry in the side pane when
1. find similar is open in side pane and
2. cursor has moved to a new entry
### Major
- Find similar notes to current note at cursor automatically in background
- Only show headings of search result and increase default results count
### Minor
- Pass absolute path of file to index from khoj.el emacs client
- Update help message to only show the smaller set of new keybindings
- Fix edge cases in loading some chat sessions
To improve the developer experience for front-end development, we're migrating to Next.js. In order to do this migration page-by-page, we're using static site generation via Next.js. This also helps us avoid making cross site requests from front-end to back-end for the time being, while giving a ramp to separating out server and client if needed for scale down the road.
Dev instructions for using the next.js setup are in the added README.
This adds scaffolding for including the built files in the python package as well as the docker images. Docker setup has been tested locally. In order to verify the build is working as expected, we can navigate to the {khoj_host}:42110/experimental and verify that the experiment page comes up.
This setup works with serving static files included in the src/interface/web folder from the Django app. The key bit for understanding the setup is in the yarn export command in package.json.
- Just use in-built `npm version' command to update desktop, obsidian version
- Upgrade by major, minor or patch version using new -t flag in script
E.g bump_version -t minor
* Enable speech to text responses in khoj chat
- Current issue: reads out all the markdown formatting, plus waits for the whole result to be streamed before playing it
* Extract content from markdown-formatted text
* Add a loader for while you're waiting for Khoj's response
* Add user configuration option for chat model options, allow server side configuration for option list
* Join up APIs, views, admin pages to allow configuring custom voice models
When create new conversation session, automatically request query. As
that is expected next action after creating new session
Pass session-id to khoj-chat to allow reuse from
create-new-conversation func
When delete conversation session, do not call load chat session.
Unnecessary action.
Use thread-last to improve code flow in new, delete conversation funcs
* Add magic link email sign-in option
* Adding backend routes and model changes to keep state of email verification code and status
* Test and fix end to end email verification flow
* Add documentation for how to use the magic link sign-in when self-hosting Khoj
* Add magic link sign in to public conversation page
Previously the cursor would move to the Khoj side pane on opening it.
This would break user's flow, especially when find similar triggers
automatically
New behavior maintains smoother update of auto find similar without
disrupting user browsing
Previously it would show complete result body this would make the
result width variable and hard to track all the returned results
Showing just heading makes it easier to track
- Call find similar on current element if point has moved to new
element
- Delete the first result from find-similar search results as that'll
be the current note (which is trivially most similar to itself)
- Determine find-similar based text formating at the rendering layer
rather than at the top level find-similar func
* Uses entire file text and summarizer model to generate document summary.
* Uses the contents of the user's query to create a tailored summary.
* Integrates with File Filters #788 for a better UX.
To add multiple allowed Khoj domains pass them as a comma separated
list of domains via the KHOJ_DOMAIN environment variable
Resolve comment in issue #662
Given img src enforcement via CSP required loosening. Soft enforce it
via a regex replace of img HTML elements if the src isn't from the
whitelisted set of source prefixes.
Currently allowed source prefixes are
- app: for local images
- data: for inline generated images
- https://generated.khoj.dev: for cloud generated images
- Create and use a function to convert markdown to sanitized html
- Remove unused Latex delimiter handling as Katex isn't used in
Khoj chat on Obsidian
- Open Khoj in Emacs Side pane
Open Khoj chat, search in right pane to allow for ambient engagement
- Improve Khoj Chat
- Show online references used for chat
- Make chat API call async to not block user interactions
- Fix loading chat history, references in khoj.el chat buffer
- Improve Khoj Search, Find Similar functions
- Make calls to Khoj search API async to not block user interactions
- Support Conversation Sessions
- Create transient menu to open, create, delete conversation sessions from the Khoj Emacs client
- C-x o to switch to search org content conflicts with switch buffer shortkey
This is more apparent in the async search scenario as it prevents
perform other actions while async search is in progress
- Also switching content type wouldn't scale to all the content types
Khoj will support without causing more conflicting keybinding
Khoj side pane occupies a vertically split bottom right side pane.
If the bottom right window is not a vertical split, create a new
vertical split pane for khoj, otherwise reuse the existing window
- Fix getting file filters for not found conversations
- Allow iamge rendering in automation emails
- Fix nearest 15th minute calculation in automations creation
* added support for uploading multiple files at a time.
* optimized multiple file upload to use a batch upload
* allowing files to upload even if there is one unsupported file
See the currently active window in context while doing chat, search
or find similar operations in a side pane.
This is similar to how we've moved Khoj on Obsidian into the side pane
as well
# Major
- Disambiguate Text output mode to disambiguate from Default data source lookup
- Fix showing headings in intermediate step in generating chat response
- Remove "Path" prefix from org ancestor heading in compiled entry
# Minor
- Fix OpenAI chat actor, director unit tests
* Add language-specific syntax highlighting via highlight.js
- Add highlight.js to our assets CDN for fast load and compliance with the CSP
- See other stylesheets options here: https://cdnjs.com/libraries/highlight.js
* Bonus: set min-height to prevent increasing length of the sessions pane
* Fix references rendering and add highlight.js in public conversation
* Fix multilingual font rendering; fallback to an Arabic language font which contains more Asian characters. Close#756
* Tune font-sizes and styling to accomodate new fonts with old sizing
- Move connection-status styling out from inline html into css block
- Remove start typing chat-input height jitter
- align new-conversation button, text
- use relative font sizes instead of absolute font sizes in most places
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
* UI update for file filtered conversations
* Interactive file menu #UI to add/remove files on each conversation as references.
* Backend changes implemented to load selected file filters from a conversation into the querying process.
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Previously if default output was selected by Khoj, we'd end up doing
an documents search as well, even when Khoj selected internet or
general data source to lookup.
This update disambiguates the default information mode from the text
output mode. To avoid doing documents search when not deemed necessary
by Khoj
Prevent XSS attacks by enforcing Content-Security-Policy (CSP) in apps.
Do not allow loading images, other assets from untrusted domains.
- Only allow loading assets from trusted domains
like 'self', khoj.dev, ipapi for geolocation, google (fonts, img)
- images from khoj domain, google (for profile pic)
- assets from khoj domain
- Do not allow iframe src
- Allow unsafe-inline script and styles for now as markdown-it escapes html
in user, khoj chat
- Add hostURL to CSP of the Desktop, Obsidian apps
Given web client is served by khoj server, it doesn't need to
explicitly allow for khoj.dev domain. So if user self-hosting, it'll
automatically allow the domain in the CSP (via 'self')
Whereas the Obsidian, Desktop clients allow configure the server URL.
Note *switching server URL breaks CSP until app is reloaded*
* The command menu (triggered by "/") now has a clickable list of possible commands, that automatically fill into the chat when pressed.
* The `/help` command now searches `khoj.dev` pages to provide useful assistance to the user.
---------
Co-authored-by: raghavt3 <raghavt3@illinois.edu>
Co-authored-by: sabaimran <65192171+sabaimran@users.noreply.github.com>
### Details
- **Chat with Khoj from right pane on Obsidian**
- Modal was too ephemeral, couldn't have it open for reference, quick jump to Khoj chat
- **Stream intermediate steps taken by Khoj** for generating response to the chat pane
Gives more transparency into Khoj 'thinking' process, e.g internet, notes searches performed, documents read etc.
The feedback allows us to tune our messages to elicit better responses by Khoj
- Add ability to **copy message to clipboard, paste chat messages directly into current file**
- Jump to **Search**, **Find Similar** functions from navigation bar on the Khoj Obsidian side pane
- Improve spacing, use consistent colors in chat message references and buttons
Resolves#789, #754
* Updating the API / UI to support sharing of automations
* Allow people to see the automations even when not logged in, and add an overlay effect
* Handle unauthenticated users taking actions
* Support showing pre-filled automation details on the config automations page
* Redirect user to login if they try to add an automation while unauthenticated
- Dedupe the code to add action buttons to chat messages
- Update the renderIncrementalMessage function to also add the action
buttons to newly generated chat messages by Khoj
* Disable automation recurrence at minute level frequency
* Set a max lifetime for django's connections to the db
* Disable any automation that has a non-numeric first digit (i.e., recuring on the minute level)
* Re-enable automations
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
Previously clicking inline links would open the URL directly in the
Desktop app. This was strange and it didn't provide any way to go back
to Khoj desktop app UI from the opened link
- Don't propagate max_tokens to the openai chat completion method. the max for the newer models is fixed at 4096 max output. The token limit is just used for input
* Add support for chatting with Anthropic's suite of models
- Had to use a custom class because there was enough nuance with how the anthropic SDK works that it would be better to simply separate out the logic. The extract questions flow needed modification of the system prompt in order to work as intended with the haiku model
- Pass file path of reference along with the compiled reference in
list of references returned by chat API converts
- Update the structure of references from list of strings to list of
dictionary (containing 'compiled' and 'file' keys)
- Pull out the compiled reference from the new references data struct
wherever it was is being used
Simplify, reuse, standardize code to render messages with references
in the obsidian, web and desktop clients. Specifically:
- Reuse function to create reference section, dedupe code
- Create reusable function to generate image markdown
- Simplify logic to render message with references
- Setup websocket using Khoj web app as reference.
- Moved the geolocating code to chat view out from the general pane
view
- Use loading spinner from web instead of the thinking emoji
It'll replace any highlighted text with the chat message or if not
text is highlighted, it'll insert the chat message at the last cursor
position in the active file
- Jump to chat, show similar actions from nav menu of Khoj side pane
- Add chat, search icons from web, desktop app
- Use lucide icon for find similar (for now)
- Match proportions of find similar icon to khoj other icons via css, js
- Use KhojPaneView abstract class to allow reuse of common functionality like
- Creating the nav bar header in side pane views
- Loading geo-location data for chat context
This should make creating new views easier
* Update suggested automations
* add a schedule picker when creating an automation
* Create a new conversation in flow of the automation scheduling in order to send a preview and deliver more consistent results
* Start adding in scaffolding to manually trigger a test job for an automation
* Add support for manually triggering automations for testing
* Schedule automation asynchronously
* Update styling of the preview button
* Improve admin lookup experience and prevent jobs from being scheduled to run every minute of everyday
* Ignore mypy issues on job info short description
### Description and Rationale for Changes
This feature includes thumbs up and thumbs down buttons on Khoj's chat responses that provide automated feedback. When a thumbs up/down button is clicked, the code sends an email to team@khoj.dev with the following:
* user query
* khoj's response
* whether the sentiment of the user was good or bad.
This is critical in improving Khoj's nondeterministic LLM model for a better user experience.
### List of Changes
* new endpoint in `api_chat.py` (/feedback) that can be used to trigger mail sending).
* thumbs up and thumbs down buttons implemented in `chat.html`
* new function in `routers/email.py` to handle feedback email sending via resend
* `feedback.html` template for a formatted email with the feedback.
---------
Co-authored-by: mythicalcow <mythicalcow@linux.myguest.virtualbox.org>
Co-authored-by: sabaimran <narmiabas@gmail.com>
* Improve the automations UX
- Add suggested jobs to elimiinate some of the cold start problem
- Make each of the tasks cards that are clickable/editable
* Hide suggested automations that have already been added
* Add a footer and reapply styling when a save action is taken on a card
- Allows having it open on the side as you traverse your Obsidian notes
- Allow faster time to response, having responses visible for context
- Enables ambient interactions
* Make conversations optionally shareable
- Shared conversations are viewable by anyone, without a login wall
- Can share a conversation from the three dot menu
- Add a new model for Public Conversation
- The rationale for a separate model is that public and private conversations have different assumptions. Separating them reduces some of the code specificity on our server-side code and allows us for easier interpretation and stricter security. Separating the data model makes it harder to accidentally view something that was meant to be private
- Add a new, read-only view for public conversations
- Add parsing logic for LaTeX-format math equations in the web chat
- Add placeholder delimiters when converting the markdown to HTML in order to avoid removing the escaped characters
- Add the `<!DOCTYPE html>` specification to the page
## Support Scheduling Automations (#695)
1. Detect when user intends to schedule a task, aka reminder
- Support new `reminder` output mode to the response type chat actor
- Show examples of selecting the reminder output mode to the response type chat actor
2. Extract schedule time (as cron timestring) and inferred query to run from user message
3. Use APScheduler to call chat with inferred query at scheduled time
## Make Automations Persistent (#714)
- Make scheduled jobs persistent and work in multiple worker setups
- Add new operation Scheduled Job to Operation enum of ProcessLock
## Add UX to Configure Scheduled Tasks (#715)
- Add section in settings page to view, delete your scheduled tasks
- Add API endpoints to get and delete user scheduled tasks
## Make Automations more Robust. Improve UX (#718)
- Decouple Task Run from User Notification
- Make Scheduling more Robust
- Use JSON mode to get parse-able output from chat model
- Make timezone calculation programmatic on server instead of asking chat model
- Use django-apscheduler to handle apscheduler and django interfacing
- Improve automation UX. Move it out into separate top level page
- Allow creating, modifying automations from the automations page
- Infer cron from natural language client side to avoid roundtrip
- Pass user and calling_url to the scheduled chat too when modifying
params of automation
- Update to use user timezone even when update job via API
- Move timezone string to timezone object calculation into the
schedule automation method
- Previously it was a section in the settings page. Move it to
independent, top-level page to improve visibility of feature
- Calculate crontime from natural language on web client before
sending it to to server for saving new/updated schedule to disk.
- Avoids round-trip of call to chat model
- Convert POST /api/automation API endpoint into a direct request for
automation with query_to_run, subject and schedule provided via the
automation page. This allows more granular control to create automation
- Make the POST automations endpoint more robust; runs validation
checks, normalizes parameters
- Make the config page content use the same top level 3-column layout
as the khoj-header-wrapper
This ensures the content is aligned with heading pane width
- Let cards and other settings sections scale to the width of their
grid element. This utilizes more of the screen space and does it
consistently across the different settings pages
- Create new POST API endpoint to create automations
- Use it in the settings page on the web interface to create
new automations
This simplified managing automations from the setting page by allowing
both delete and create from the same page
- Render crontime string in natural language in message & settings UI
- Show more fields in tasks web config UI
- Add link to the tasks settings page in task scheduled chat response
- Improve task variables names
Rename executing_query to query_to_run. scheduling_query to
scheduling_request
- Make timezone aware scheduling programmatic, instead of asking the
chat model to do the conversion. This removes the need for
scratchpad and may let smaller models handle the task as well
- Make chat model infer subject for email. This should make the
notification email more readable
- Improve email by using subject in email subject, task heading. Move
query to email final paragraph, which is where task metadata should
go
- Using inferred_query directly was brittle (like previous job id)
- And process lock id had a limited size, so wouldn't work for larger
inferred query strings
- Pass timezone string from ipapi to khoj via clients
- Pass this data from web, desktop and obsidian clients to server
- Use user tz to render next run time of scheduled task in user tz
This takes the load of the task scheduling chat actor / prompt from
having to artifically differentiate query to create scheduled task
from a scheduled task run.
- The task scheduling actor was having trouble calculating the
timezone. Giving the actor a scratchpad to improve correctness by
thinking step by step
- Add more examples to reduce chances of the inferred query looping to
create another reminder instead of running the query and sharing
results with user
- Improve task scheduling chat actor test with more tests and
by ensuring unexpected words not present in response
There's a difference between running a scheduled task and notifying
the user about the results of running the scheduled task.
Decide to notify the user only when the results of running the
scheduled task satisfy the user's requirements.
Use sync version of send_message_to_model_wrapper for scheduled tasks
- Store scheduled job state in Postgres so job schedules persist
across app restarts
- Use Process Locks to only allow single worker to process a given job
type. This prevents duplicating job runs across all workers
- Detect when user intends to schedule a task, aka reminder
Add new output mode: reminder. Add example of selecting the reminder
output mode
- Extract schedule time (as cron timestring) and inferred query to run
from user message
- Use APScheduler to call chat with inferred query at scheduled time
- Handle reminder scheduling from both websocket and http chat requests
- Support constructing scheduled task using chat history as context
Pass chat history to scheduled query generator for improved context
for scheduled task generation
Previously the make delete API response failed, after deleting token.
Required a page refresh to see that the API token was actually gone.
This was happening because the response type of the delete token API
endpoint isn't a string, so it failed FastAPI response validation
checks.
- Allow self-hosted users to customize their open ai base url. This allows you to easily use a proxy service and extend support for other models.
- This also includes a migration that associates any existing openai chat model configuration with an openai processor configuration
- Make changing model a paid/subscriber feature
- Removes usage of langchain's OpenAI wrapper for better control over parsing input/output
- Allow passing completion args through completion_with_backoff
- Pass model_kwargs in a separate arg to simplify this
- Pass model in `model_name' kwarg from the send_message_to_model func
`model_name' kwarg is used by langchain, not `model' kwarg
- Make valid file extension checking case insensitive on Desktop app
- Skip indexing non-existent folders on Desktop app
- Pass auth headers to fix lazy load of chat messages on Desktop app
- Set chat-message height to height of content in web, desktop
Previous cross-encoder model was a few years old, newer models should
have improved in quality. Model size increases by 50% compared to
previous for better performance, at least on benchmarks
Most newer, better embeddings models add a query, docs prefix when
encoding. Previously Khoj admins couldn't configure these, so it
wasn't possible to use these newer models.
This change allows configuring the kwargs passed to the query, docs
encoders by updating the search config in the database.
Improve tool, online search, webpage links, docs search chat actor
prompts. Ensure works with hermes-2-pro and llama-3.
Be more specific about generating JSON and not saying anything else.
- Improve extract question prompts to explicitly request JSON list
- Use llama-3 chat format if HF repo_id mentions llama-3. The
llama-cpp-python logic for detecting when to use llama-3 chat format
isn't robust enough currently
* Changed the styling of the link that takes a user to the settings page into a button
* added an indicator that shows if a user is connected to the server or not
* made a class name more descriptive and also made the text in first run message more intuitive
* changed the command to install dependencies in the README.md
* changed the class name of the first run message text to be more descriptive
* added icons in the desktop UI that shows if a file is synced successfully or not
* made the link class name in the homepage more descriptive
* fixed the hover issue on status box in the chat header pane
* fixed hovering issue on status box on macOS
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window
- User configured max tokens limits weren't being passed to
`send_message_to_model_wrapper'
- One of the load offline model code paths wasn't reachable. Remove it
to simplify code
- When max prompt size isn't set infer max tokens based on free VRAM
on machine
- Use min of app configured max tokens, vram based max tokens and
model context window
To access the Khoj admin panel from a non HTTPS custom domain the
`KHOJ_NO_SSL' and `KHOJ_DOMAIN' env vars need to be explictly set.
See the updated setup docs for details.
Resolves#662
### Store Generated Images as WebP
- 78bac4ae Add migration script to convert PNG to WebP references in database
- c6e84436 Update clients to support rendering webp images inline
- d21f22ff Store Khoj generated images as webp instead of png for faster loading
### Lazy Fetch Chat Messages to Improve Time, Data to First Render
This is especially helpful for long conversations with lots of images
- 128829c4 Render latest msgs on chat session load. Fetch, render rest as they near viewport
- 9e558577 Support getting latest N chat messages via chat history API
### Intelligently set Context Window of Offline Chat to Improve Performance
- 4977b551 Use offline chat prompt config to set context window of loaded chat model
### Fixes
- 148923c1 Fix to raise error on hitting rate limit during Github indexing
- b8bc6bee Always remove loading animation on Desktop app if can't login to server
- 38250705 Fix `get_user_photo` to only return photo, not user name from DB
### Miscellaneous Improvements
- 689202e0 Update recommended CMAKE flag to enable using CUDA on linux in Docs
- b820daf3 Makes logs less noisy
- Reduces time to first render when loading long chat sessions
- Limits size of first page load, when loading long chat sessions
These performance improvements are maximally felt for large chat
sessions with lots of images generated by Khoj
Updated web and desktop app to support these changes for now
Previously you couldn't configure the n_ctx of the loaded offline chat
model. This made it hard to use good offline chat model (which these
days also have larger context) on machines with lower VRAM
- Show telemetry enabled/disabled state on init, not every 2 minutes
- Convert no docs synced logs to debug level instead of warning
Having synced docs isn't as important to use Khoj now, unlike before
- Magika on Desktop app was too bloated (100Mb to 250Mb) and broke
install for some reason. Not sure why it was causing the app install
to fail but do not have time to currently investigate
- Just use file extensions whitelist it's good enough for now. Let
server handle the deeper identification of file type
### Index more text file types
- Index all text, code files in Github repos. Not just md, org files
- Send more text file types from Desktop app and improve indexing them
- Identify file type by content & allow server to index all text files
### Deprecate Github Indexing Features
- Stop indexing commits, issues and issue comments in a Github repo
- Skip indexing Github repo on hitting Github API rate limit
### Fixes and Improvements
- **Fix indexing files in sub-folders from Desktop app**
- Standardize structure of text to entries to match other entry processors
- Show internet search, webpage read, image query, image generation steps
- Standardize, improve rendering of the intermediate steps on the web app
Benefits:
1. Improved transparency, allow users to see what Khoj is doing behind
the scenes and modify their query patterns to improve response quality
2. Reduced websocket connection keep alive timeouts for long running steps
- `file-type' doesn't handle mis-labelled files or files without
extensions well
- Only show supported file types in file selector dialog on Desktop app
Use Magika to get list of text file extensions. Combine with other
supported extensions to get complete list of supported file extensions.
Use it to limit selectable files in the File Open dialog.
Note: Folder selector will index text files with no extensions as well
* Don't trigger any re-indexing on server initailization
* Integrate Resend to send welcome emails when a new user signs up
- Only send if this is the first time they've signed in
- Configure welcome email with basic styling, as more complex designs don't work and style tag did not work
### Enable copying chat messages. Improve copy button behavior and styling
- Add button to copy chat messages on Desktop, Web apps
- Improve copy button's icon, hover color & click animation in Desktop, Web apps
### Improve Navigation, Chat Session Panes on Desktop, Web apps
- Dynamically generate navigation menu based on user info from server
- Create API endpoint to get authenticated user information
- Collapse navigation tabs into icons on mobile. Add spacing to them
- Add Chat navigation tab back to top pane on Web app
- Use proper icons for Search, Chat and Agents tab on navigation pane
### Miscellaneous Improvements
- Make current chat expand to full width when session panel collapsed on Desktop App
- Add chat session loading spinner to Desktop App (same as Web app)
### Fixes
- Show title bar in Khoj desktop app on Windows to simplify close, minimize etc.
- Only render first run setup message once if error or server not running
- Fix showing Search navigation tab from Agent pages on web client
The username and location in system prompt should disambiguate user
context from user's actual message for the chat model.
It doesn't need to be told to not mention the context or acknowledge
the context instructions in it's response, as it understands that this
information is just context and not part of the user's actual message.
- Move new conversation button to right of "Conversation" title
- Reduce size of chat message loading ellipsis animation
- Add loading animation for chat session
The `has_documents' flag wasn't being passed. So the search tab
always showing up as empty instead of being dynamically enabled if
documents had been indexed.
- `fs.readdir' func in node version 18.18.2 has buggy `recursive' option
See nodejs/node#48640, effect-ts/effect#1801 for details
- We were recursing down a folder in two ways on the Desktop app.
Remove `recursive: True' option to the `fs.readdirSync' method call
to recurse down via app code only
Add process_single_plaintext_file func etc with similar signatures as
org_to_entries and markdown_to_entries processors
The standardization makes modifications, abstractions easier to create
Sleep until rate limit passed is too expensive, as it keeps a
app worker occupied.
Ideally we should schedule job to contine after rate limit wait time
has passed. But this can only be added once we support jobs scheduling.
Normal indexing quickly Github hits rate limits. Purpose of exposing
Github indexer is for indexing content like notes, code and other
knowledge base in a repo.
The current indexer doesn't scale to index metadata given Github's
rate limits, so remove it instead of giving a degraded experience of
partially indexed repos
- Allow syncing more file types from desktop app to index on server
- Use `file-type' package to identify valid text file types on Desktop app
- Split plaintext entries into smaller logical units than a whole file
Since the text splitting upgrades in #645, compiled chunks have more
logical splits like paragraph, sentence.
Show those (potentially) smaller snippets to the user as references
- Tangential Fix:
Initialize unbound currentTime variable for error log timestamp
- Use Magika's AI for a tiny, portable and better file type
identification system
- Existing file type identification tools like `file' and `magic'
require system level packages, that may not be installed by default
on all operating systems (e.g `file' command on Windows)
## Major
- Parse markdown, org parent entries as single entry if fit within max tokens
- Parse a file as single entry if it fits with max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of para, sentence, word, character
## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to Jsonl converter from text to entry class, tests
- Dedupe code by using single func to process an org file into entries
Resolves#620
### Why
- Python 3.12 is the default Python on Ubuntu 24.04 LTS, Windows and Mac via Homebrew
- Python 3.12 has a bunch of improvements that can be explored with Khoj (e.g per core GIL for performance)
## Changes
- The latest PyTorch now supports Python 3.12
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
But it's an optional dependency, so only install it if python < 3.12
### Testing
- Verified Khoj installs fine on Windows and Mac with Python 3.12
- Verified Khoj chat works fine on Mac, Windows with Python 3.12
Resolves#522
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
It's an optional dependency anyway, so only install it if python < 3.12
- Run unit tests with python version 3.12 as well
Resolves#522
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is setup
* Fix references to the configure_content methods
`re.MULTILINE' should be passed to the `flags' argument, not the
`max_splits' argument of the `re.split' func
This was messing up the indexing by only allowing a maximum of
re.MULTILINE splits. Fixing this improves the search quality to
previous state
More content indexed per entry would result in an overall scores
lowering effect. Increase default search distance threshold to counter that
- Details
- Fix expected results post indexing updates
- Fix search with max distance post indexing updates
- Minor
- Remove openai chat actor test for after: operator as it's not expected anymore
- Major
- Do not split org file, entry if it fits within the max token limits
- Recurse down org file entries, one heading level at a time until
reach leaf node or the current parent tree fits context window
- Update `process_single_org_file' func logic to do this recursion
- Convert extracted org nodes with children into entries
- Previously org node to entry code just had to handle leaf entries
- Now it recieve list of org node trees
- Only add ancestor path to root org-node of each tree
- Indent each entry trees headings by +1 level from base level (=2)
- Minor
- Stop timing org-node parsing vs org-node to entry conversion
Just time the wrapping function for org-mode entry extraction
This standardizes what is being timed across at md, org etc.
- Move try/catch to `extract_org_nodes' from `parse_single_org_file'
func to standardize this also across md, org
These changes improve context available to the search model.
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with sparse, short heading/entry trees
Previously we'd always split markdown files by headings, even if a
parent entry was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to the search model to select appropriate entries for a query,
especially from short entry trees
Revert back to using regex to parse through markdown file instead of
using MarkdownHeaderTextSplitter. It was easier to implement the
logical split using regexes rather than bend MarkdowHeaderTextSplitter
to implement it.
- DFS traverse the markdown knowledge tree, prefix ancestry to each entry
These changes improve entry context available to the search model
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with small files
Previously we split all markdown files by their headings,
even if the file was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to select the appropriate entries for a given query for the search model,
especially from short knowledge trees
- Previous simplistic chunking strategy of splitting text by space
didn't capture notes with newlines, no spaces. For e.g in #620
- New strategy will try chunk the text at more natural points like
paragraph, sentence, word first. If none of those work it'll split
at character to fit within max token limit
- Drop long words while preserving original delimiters
Resolves#620
This was earlier used when the index was plaintext jsonl file. Now
that documents are indexed in a DB this func is not required.
Simplify org,md,pdf,plaintext to entries tests by removing the entry
to jsonl conversion step
- Convert extract_org_entries function to actually extract org entries
Previously it was extracting intermediary org-node objects instead
Now it extracts the org-node objects from files and converts them
into entries
- Create separate, new function to extract_org_nodes from files
- Similarly create wrapper funcs for md, pdf, plaintext to entries
- Update org, md, pdf, plaintext to entries tests to use the new
simplified wrapper function to extract org entries
- Move green server connected dot to the bottom. Show status when
disconnected from server
- Move "New conversation" button to right of the "Conversation" title
- Center alignment of the new conversation and connection status buttons
- Overview
- Extract more structured date variants (e.g with dot(.) & slash(/) separators, 2-digit year)
- Extract some natural, partial dates as well from entries
- Capability
Add ability to extract the following additional date forms:
- Natural Dates: 21st April 2000, February 29 2024
- Partial Natural Dates: March 24, Mar 2024
- Structured Dates: 20/12/24, 20.12.2024, 2024/12/20
Note: Previously only YYYY-MM-DD ISO-8601 structured date form was extracted for date filters
- Performance
Using regexes is MUCH faster than using the `dateparser' python library
It's a little crude but gives acceptable performance for large datasets
## Benefits
- Support all GGUF format chat models
- Support more GPUs like AMD, Nvidia, Mac, Vulcan (previously just Vulcan, Mac)
- Support more capabilities like larger context window, schema enforcement, speculative decoding etc.
## Changes
### Major
- Use llama.cpp for offline chat models
- Support larger context window
- Automatically apply appropriate chat template. So offline chat models not using llama2 format are now supported
- Use better default offline chat model, NousResearch/Hermes-2-Pro-Mistral-7B
- Enable extract queries actor to improve notes search with offline chat
- Update documentation to use llama.cpp for offline chat in Khoj
### Minor
- Migrate to use NouseResearch's Hermes-2-Pro 7B as default offline chat model in khoj.yml
- Rename GPT4AllChatProcessor to OfflineChatProcessor Config, Model
- Only add location to image prompt generator when location known
- Much faster than using dateparser
- It took 2x-4x for improved regex to extracts 1-15% more dates
- Whereas It took 33x to 100x for dateparser to extract 65% - 400% more dates
- Improve date extractor tests to test deduping dates, natural,
structured date extraction from content
- Extract some natural, partial dates and more structured dates
Using regex is much faster than using dateparser. It's a little
crude but should pay off in performance.
Supports dates of form:
- (Day-of-Month) Month|AbbreviatedMonth Year|2DigitYear
- Month|AbbreviatedMonth (Day-of-Month) Year|2DigitYear
Previously we just extracted dates in YYYY-MM-DD format from content
for date filterings during search.
Use dateparser to extract dates across locales and natural language
This should improve notes returned as context when chat searches
knowledge base with date filters
Fallback to regex for date parsing from content if dateparser fails
- Limit natural date extractor capabilities to improve performance
- Assume language is english
Language detection otherwise takes a REALLY long time
- Do not extract unix timestamps, timezone
- This isn't required, as just using date and approximating dates as UTC
- When setting up the default agent, configure every conversation that doesn't have an agent to use the Khoj agent
- Fix reverse migration for the locale removal migration
Previously we were skipping the extract questions step for offline
chat as default offline chat model wasn't good enough to output proper
json given the time it took to extract questions.
The new default offline chat models gives json much more regularly and
with date filters, so the extract questions step becomes useful given
the impact on latency
- How to pip install khoj to run offline chat on GPU
After migration to llama-cpp-python more GPU types are supported but
require build step so mention how
- New default offline chat model
- Where to get supported chat models from on HuggingFace
- Benefits of moving to llama-cpp-python from gpt4all:
- Support for all GGUF format chat models
- Support for AMD, Nvidia, Mac, Vulcan GPU machines (instead of just Vulcan, Mac)
- Supports models with more capabilities like tools, schema
enforcement, speculative ddecoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
chat models
- Load offline chat model when present on disk without requiring internet
- Load model onto GPU if not disabled and device has GPU
- Load model onto CPU if loading model onto GPU fails
- Create helper function to check and load model from disk, when model
glob is present on disk.
`Llama.from_pretrained' needs internet to get repo info from
HuggingFace. This isn't required, if the model is already downloaded
Didn't find any existing HF or llama.cpp method that looked for model
glob on disk without internet
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
* Customize default behaviors for conversations without agents or with default agents
* Add a new web client route for viewing all agents
* Use agent_id for getting correct agent
* Add web UI views for agents
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue
* Fix agent view
* Spruce up the 404 page and improve the overall layout for agents pages
* Create chat actor for directly reading webpages based on user message
- Add prompt for the read webpages chat actor to extract, infer
webpage links
- Make chat actor infer or extract webpage to read directly from user
message
- Rename previous read_webpage function to more narrow
read_webpage_at_url function
* Rename agents_page -> agent_page
* Fix unit test for adding the filename to the compiled markdown entry
* Fix layout of agent, agents pages
* Merge migrations
* Let the name, slug of the default agent be Khoj, khoj
* Fix chat-related unit tests
* Add webpage chat command for read web pages requested by user
Update auto chat command inference prompt to show example of when to
use webpage chat command (i.e when url is directly provided in link)
* Support webpage command in chat API
- Fallback to use webpage when SERPER not setup and online command was
attempted
- Do not stop responding if can't retrieve online results. Try to
respond without the online context
* Test select webpage as data source and extract web urls chat actors
* Tweak prompts to extract information from webpages, online results
- Show more of the truncated messages for debugging context
- Update Khoj personality prompt to encourage it to remember it's capabilities
* Rename extract_content online results field to webpages
* Parallelize simple webpage read and extractor
Similar to what is being done with search_online with olostep
* Pass multiple webpages with their urls in online results context
Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.
URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered
* Render webpage read in chat response references on Web, Desktop apps
* Time chat actor responses & chat api request start for perf analysis
* Increase the keep alive timeout in the main application for testing
* Do not pipe access/error logs to separate files. Flow to stdout/stderr
* [Temp] Reduce to 1 gunicorn worker
* Change prod docker image to use jammy, rather than nvidia base image
* Use Khoj icon when Khoj web is installed on iOS as a PWA
* Make slug required for agents
* Simplify calling logic and prevent agent access for unauthenticated users
* Standardize to use personality over tuning in agent nomenclature
* Make filtering logic more stringent for accessible agents and remove unused method:
* Format chat message query
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
### Overview
Khoj can now read website directly without needing to go through the search step first
### Details
- Parallelize simple webpage read and extractor
- Rename extract_content online results field to web pages
- Tweak prompts to extract information from webpages, online results
- Test select webpage as data source and extract web urls chat actors
- Render webpage read in chat response references on Web, Desktop apps
- Pass multiple webpages with their urls in online results context
- Support webpage command in chat API
- Add webpage chat command for read web pages requested by user
- Create chat actor for directly reading webpages based on user message
Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.
URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered
- Fallback to use webpage when SERPER not setup and online command was
attempted
- Do not stop responding if can't retrieve online results. Try to
respond without the online context
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
* Customize default behaviors for conversations without agents or with default agents
* Use agent_id for getting correct agent
* Merge migrations
* Simplify some variable definitions, add additional security checks for agents
* Rename agent.tuning -> agent.personality
- Use the conversation id of the retrieved conversation rather than the
potentially unset conversation id passed via API
- await creating new chat when no chat id provided and no existing
conversations exist
- Move some common methods into separate functions to make the UI components more efficient
- The normal HTTP-based chat connection will still work and serves as a fallback if the websocket is unavailable
- Convert to a model of calling the search API directly with a function call (rather than using the API method)
- Gracefully handle websocket connection disconnects
- Ensure that the rest of the response is still saved, as it is currently, if the user disconects from the client
- Setup unchangeable context at the beginning of the session when the connection is established (like location, username, etc)
The recently added after: operator to online search actor was too
restrictive, gave worse results than when just use natural language
dates in search query
### Improve
- Improve delete, rename chat session UX in Desktop, Web app
- Get conversation by title when requested via chat API
### Fix
- Allow unset locale for Google authenticating user
- Handle truncation when single long non-system chat message
- Fix setting chat session title from Desktop app
- Only create new chat on get if a specific chat id, slug isn't requested
Previously was assuming the system prompt is being always passed as
the first message. So expected there to be at least 2 messages in logs.
This broke chat actors querying with single long non system message.
A more robust way to extract system prompt is via the message role
instead
- Ask for Confirmation before deleting chat session in Desktop, Web app
- Save chat session rename on hitting enter in title edit input box
- No need to flash previous conversation cleared status message
- Move chat session delete button after rename button in Desktop app
- Add prompt for the read webpages chat actor to extract, infer
webpage links
- Make chat actor infer or extract webpage to read directly from user
message
- Rename previous read_webpage function to more narrow
read_webpage_at_url function
### Major
- Enforce json mode response from OpenAI chat actors prev using string lists
- Use `gpt-4-turbo-preview' as default chat model, extract questions actor
- Make Khoj read khoj website to respond with accurate, up-to-date information about itself
- Dedupe query in notes prompt. Improve OAI chat actor, director tests
### Minor
- Test data source, output mode selector, web search query chat actors
- Improve notes search actor to always create a non-empty list of queries
- Construct available data sources, output modes as a bullet list in prompts
- Use consistent agent name across static and dynamic examples in prompts
- Add actor's name to extract questions prompt to improve context for guidance
Previously only the notes references would get rendered post response
streaming when when both online and notes references were used to
respond to the user's message
- Allow passing response format type to OpenAI API via chat actors
- Convert in-context examples to use json objects instead of str lists
- Update actors outputting str list to request output to be json_object
- OpenAI's json mode enforces the model to output valid json object
- Remove stale tests
- Improve tests to pass across gpt-3.5 and gpt-4-turbo
- The haiku creation director was failing because of duplicate query in
instantiated prompt
- Remove the option for Notes search query generation actor to return
no queries. Whether search should be performed is decided before,
this step doesn't need to decide that
- But do not throw warning if the response is a list with no elements
- Add examples where user queries requesting information about Khoj
results in the "online" data source being selected
- Add an example for "general" to select chat command prompt
Previously the examples constructed from chat history used "Khoj" as
the agent's name but all 3 prompts using the func used static examples
with "AI:" as the pertinent agent's name
- Add example to read khoj.dev website for up-to-date info to setup,
use khoj, discover khoj features etc.
- Online search should use site: and after: google search operators
- Show example of adding the after: date filter to google search
- Give local event lookup example using user's current location in
query
- Remove unused select search content type prompt
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when Olostep proxy not setup
- Include search results & web page content in online context for chat response
### Minor
- Simplify, modularize and add type hints to online search functions
Previously if a web page was read for a sub-query, only the extracted
web page content was provided as context for the given sub-query. But
the google results themselves have relevant snippets. So include them
- Simplify content arg to `extract_relevant_info' function. Validate,
clean the content arg inside the `extract_relevant_info' function
- Extract `search_with_google' function outside the parent function
- Call the parent function a more appropriate `search_online' instead
of `search_with_google'
- Simplify the `search_with_google' function using list comprehension.
Drop empty search result fields from chat model context for response
to reduce cost and response latency
- No need to show stacktrace when unable to read webpage, basic error
is enough
- Add type hints to online search functions to catch issues with mypy
- Time reading webpage, extract info from webpage steps for perf
analysis
- Deduplicate webpages to read gathered across separate google
searches
- Use aiohttp to make API requests non-blocking, pair with asyncio to
parallelize all the online search webpage read and extract calls
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
- Trigger
SentenceTransformer Cross Encoder models now run fast on GPU enabled machines, including Mac ARM devices since UKPLab/sentence-transformers#2463
- Details
- Use cross-encoder to rerank search results by default on GPU machines and when using an inference server
- Only call search API when pause in typing search query on web, desktop apps
Wait for 300ms since stop typing before calling search API.
This smooths out UI jitter when rendering search results, especially
now that we're reranking for every search query on GPU enabled devices
Emacs already has 300ms debounce time. More convoluted to add
debounce time to Obsidian search modal, so not updating that yet
Latest sentence-transformer package uses GPU for cross-encoder. This
makes it fast enough to enable reranking on machines with GPU.
Enabling search reranking by default allows (at least) users with GPUs
to side-step learning the UI affordance to rerank results
(i.e hitting Cmd/Ctrl-Enter or ENTER).
### Issue
Previously deleting a chat session from the side panel on desktop, web app would sometimes result in also creating a new chat session
### Fix
`get_conversation_by_user' shouldn't return new conversation if
conversation with requested id not found.
It should only return new conversation if no specific conversation
is requested and no conversations found for user at all
### Miscellaneous Improvements
- Chat history load should be logged as call to that chat_history api,
not the "chat" api
- Show status updates of clearing conversation history in chat input
- Simplify web, desktop client code by removing unnecessary new variables
### Repro
- Delete a new chat, this calls loadChat via window.onload which
calls server /chat/history API endpoint with conversationId set to
that of just deleted conversation sporadically
The call to GET chat/history API with conversationId set occurs
when window.onload triggers before the conversationId is deleted
by the delete button after the DELETE /chat/history API call (via race)
- In such a scenario, get_conversation_by_user called by
chat/history API with conversationId of deleted conversation
returns a new conversation
- Fix
`get_conversation_by_user' shouldn't return new conversation if
conversation with requested id not found.
It should only return new conversation if no specific conversation
is requested and no conversations found for user at all
- Repro
- Delete a new chat, this calls loadChat via window.onload which
calls server /chat/history API endpoint with conversationId set to
that of just deleted conversation sporadically
The call to GET chat/history API with conversationId set occurs
when window.onload triggers before the conversationId is deleted
by the delete button after the DELETE /chat/history API call (via race)
- In such a scenario, get_conversation_by_user called by
chat/history API with conversationId of deleted conversation
returns a new conversation
- Miscellaneous
- Chat history load should be logged as call to that chat_history api,
not the "chat" api
- Show status updates of clearing conversation history in chat input
- Simplify web, desktop client code by removing unnecessary new variables
* Upload generated images to s3, if AWS credentials and bucket is available.
- In clients, render the images via the URL if it's returned with a text-to-image2 intent type
* Make the loading screen more intuitve, less jerky and update the programmatic copy button
* Update the loading icon when waiting for a chat response
* Add additional styling changes for showing UI changes when dragging file to the main screen
* Add a loading spinner when file upload is in progress, and don't index github/notion when indexing files
* Add an explicit icon for file uploading in the chat button menu
* Add appropriate dragover styling when picking a file from the file picker/browser
* Add a loading screen when retrieving chat history. Fix width of the chat window. Put attachment icon to the left of chat input
* Make major improvements to the image generation flow
- Include user context from online references and personal notes for generating images
- Dynamically select the modality that the LLM should respond with
- Retun the inferred context in the query response for the dekstop, web chat views to read
* Add unit tests for retrieving response modes via LLM
* Move output mode unit tests to the actor suite, rather than director
* Only show the references button if there is at least one available
* Rename aget_relevant_modes to aget_relevant_output_modes
* Use a shared method for generating reference sections, simplify some of the prompting logic
* Make out of space errors in the desktop client more obvious
- Open external links using the default link handler registered on OS
for the link type, e.g http:// -> firefox, mailto: thunderbird etc
- Confirm before opening non-http URL using an external app
- Improve render of inferred query in image chat messages in Web, Desktop apps
- Add inferred queries to image chat responses in Obsidian client
- Fix rendering images from Khoj response in Obsidian client
* Simplify and clarify prompt for selecting toolset dynamically
* Add error handling around call to OLOSTEP api
* Fix conversation admin page
* Skip adding none or empty entries in the chunking method
- Improve
- Only send files modified since their last sync for indexing on server from the Obsidian client
- Fix
- Invalidate static asset browser cache in Web client when Khoj version changes
Previously we'd send all files in vault and let the server
deduplicate.
This changes takes inspiration from the desktop app, and only pushes
files which were modified after their previous sync with the server.
This should reduce the processing load on the server
* Retrieve, create, and save conversations differently if they're coming from a client application
- Not all of our client apps will necessarily maintain state over the conversation IDs available to a user. For some (single-threaded conversations), it should just use a single conversation. Fix the code to do so
* Simplify conversation retrieval logic
* Keep 0 padding below chat response
* Add order_by sorting to retrieving the conversation without id
### Improvements to Chat UI on Web, Desktop apps
- Improve styling of chat session side panel
- Improve styling of chat message bubble in Desktop, Web app
- Add frosted, minimal chat UI to background of Login screen
- Improve PWA install experience of Khoj
### Fixes to Chat UI on Web, Desktop apps
- Fix creating new chat sessions from the Desktop app
- Only show 3 starter questions even when consecutive chat sessions created
### Other Improvements
- Update Khoj cloud trial period to a fortnight instead of a week
- Document using venv to handle dependency conflict on khoj pip install
Resolves#276
- Resolve PWA issues thrown by Chrome/Edge
- Add screenshot samples showcasing remember, browse and draw features
- This can provide a richer app store like experience when
installing Khoj PWA on Mobile or Desktop
- Add wide and narrow screenshots to show Mobile vs Desktop UX
- Add higher resolution favicon for PWA
- Use single web manifest instead of separate ones for Chat, Search
- Update manifest description with more details about Khoj features
Reset starter question suggestions before appending in web, desktop app
Otherwise previously it'd keep adding to existing starter question
suggestions on each new session creation if multiple consecutive new
chat sessions created.
This would result in more than the 3 expected starter questions being
displayed at a time
- Make collapse, expand toggle arrow point in the direction the action
will expand the side panel in
- Make the collapsed side panel reduce to a 1px sliver
- Improve rate limit error message wording
- Make the "too many requests" error message more robust. Should throw
that exception fix self.request >= self.subscribed_requests because
upgrading wouldn't fix this rate limiting
- Respect newline with pre-line but not for bullets to improve
formatting of responses by Khoj
- Respect bold font by loading tajawal font with other weights
- Reduce bottom margin in chat message bubble, its taking too much space
* Document original query when subqueries can't be generated
* Only add messages to the chat message log if it's non-empty
* When changing the search model, alert the user that all underlying data will be deleted
* Adding more clarification to the prompt input for username, location
* Check if has_more is in the notion results before getting next_cursor
* Update prompt template for user name/location, update confirmation message when changing search model
* Display given_name field only if it is not None
* Add default slugs in the migration script
* Ensure that updated_at is saved appropriately, make sure most recent chat is returned for default history
* Remove the bin button from the chat interface, given deletion is handled in the drop-down menus
* Refresh the side panel when a new chat is created
* Improveme tool retrieval prompt, don't let /online fail, and improve parsing of extract questions
* Fix ending chat response by offline chat on hitting a stop phrase
Previously the whole phrase wouldn't be in the same response chunk, so
chat response wouldn't stop on hitting a stop phrase
Now use a queue to keep track of last 3 chunks, and to stop responding
when hit a stop phrase
* Make chat on Obsidian backward compatible post chat session API updates
- Make chat on Obsidian get chat history from
`responseJson.response.chat' when available (i.e when using new api)
- Else fallback to loading chat history from
responseJson.response (i.e when using old api)
* Fix detecting success of indexing update in khoj.el
When khoj.el attempts to index on a Khoj server served behind an https
endpoint, the success reponse status contains plist with certs. This
doesn't mean the update failed.
Look for :errors key in status instead to determine if indexing API
call failed. This fixes detecting indexing API call success on the
Khoj Emacs client, even for Khoj servers running behind SSL/HTTPS
* Fix the mechanism for populating notes references in the conversation primer for both offline and online chat
* Return conversation.default when empty list for dynamic prompt selection, send all cmds in telemetry
* Fix making chat on Obsidian backward compatible post chat session API updates
New API always has conversation_id set, not `chat' which can be unset
when chat session is empty.
So use conversation_id to decide whether to get chat logs from
`responseJson.response.chat' or `responseJson.response' instead
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
- Remove unused git dependency from Docker images
- Move python packages used for test into dev dependency group
- Only enable API token, Whatsapp cards on Web UI when Stripe, Twilio setup
- Move production dependencies to prod python packages group
- Fix docs links in Khoj welcome chat message
This will reduce khoj dependencies to install for self-hosting users
- Move auth production dependencies to prod python packages group
- Only enable authentication API router if not in anonymous mode
- Improve error with requirements to enable authentication when not in
anonymous mode
* Add location metadata to chat history
* Add support for custom configuration of the user name
* Add region, country, city in the desktop app's URL for context in chat
* Update prompts to specify user location, rather than just location.
* Add location data to Obsidian chat query
* Use first word for first name, last word for last name when setting profile name
* Have Khoj dynamically select which conversation command(s) are to be used in the chat flow
- Intercept the commands if in default mode, and have Khoj dynamically guess which tools would be the most relevant for answering the user's query
* Remove conditional for default to enter online search mode
* Add multiple-tool examples in the prompt, make prompt for tools more specific to info collection
* Add chat sessions to the desktop application
* Increase width of the main chat body to 90vw
* Update the version of electron
* Render the default message if chat history fails to load
* Merge conversation migrations and fix slug setting
* Update the welcome message, use the hostURL, and update background color for chat actions
* Only update the window's web contents if the page is config
* Enable support for multiple chat sessions within the web client
- Allow users to create multiple chat sessions and manage them
- Give chat session slugs based on the most recent message
- Update web UI to have a collapsible menu with active chats
- Move chat routes into a separate file
* Make the collapsible side panel more graceful, improve some styling elements of the new layout
* Support modification of the conversation title
- Add a new field to the conversation object
- Update UI to add a threedotmenu to each conversation
* Get the default conversation if a matching one is not found by id
- Removed node-fetch dependency to work on mobile.
- Fix CORS issue for Khoj (streaming) chat on Obsidian mobile
- Verified Khoj plugin, search, chat work on Obsidian mobile.
## Details
### Major
- Allow calls to Khoj server from Obsidian mobile app to fix CORS issue
- Chat stream using default `fetch' not `node-fetch' in obsidian plugin
### Minor
- Load chat history after other elements in chat modal on Obsidian are rendered
- Scroll to bottom of chat modal on Obsidian across mobile & desktop
- Provide more context and instructions to offline chat on Khoj
- Upgrade offilne chat quality tests to support more use-cases
### Details
- Improve offline chat system prompt to think step by step
- Make offline chat model current date aware. Improve system prompts
- Fix actor, director tests using freeze time by ignoring transformers package
- Obsidian mobile uses capacitor js. Requests from it have origin as
http://localhost on Android and capacitor://localhost on iOS
- Allow those Obsidian mobile origins in CORS middleware of server
- Can now expect date awareness chat quality test to pass
- Prevent offline chat model from printing verbatim user Notes and
special tokens
- Make it ask follow-up questions if it needs more context
This allows using open or commerical, local or hosted LLM models that
are not supported in Khoj by default.
It also allows users to use other local LLM API servers that support
their GPU
Closes#407
- Put Whatsapp card back in Client section.
- Fixes side spacing on cards
- Improve Whatsapp card row gaps
- Hide notification banner on web app load. Previously it showed up as
a yellow dot on smaller displays
* Fix subscription state detection for users based on phone numbers, emails
* Fix unit tests for api_user4
* Use a single method for determining subscription from user
* Pass user object, rather than user.email for getting subscription state
* Fix license in pyproject.toml. Remove unused utils.state import
* Use single debug mode check function. Disable telemetry in debug mode
- Use single logic to check if khoj is running in debug mode.
Previously there were 3 different variants of the check
- Do not log telemetry if KHOJ_DEBUG is set to true. Previously didn't
log telemetry even if KHOJ_DEBUG set to false
* Respect line breaks in user, khoj chat messages to improve formatting
* Disable Whatsapp config section on web client if Twilio not configured
Simplify Whatsapp configuration status checking js by standardizing
external input to lower case
* Disable Phone API when Twilio not setup and rate limit calls to it
- Move phone api to separate router and only enable it if Twilio enabled
- Add rate-limiting to OTP and verification calls
* Add slugs for phone rate limiting
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
* Add a page about privacy and organize some of the documentation
* Add notice about telemetry
* Improve copy for privacy section, link to telemetry section
* Store rate limiter-related metadata in the database for more resilience
- This helps maintain state even between server restarts
- Allows you to scale up workers on your service without having to implement sticky routing
* Make the usage exceeded message less abrasive
* Fix rate limiter for specific conversation commands and improve the copy
* Add retries in case the embeddings API fails
* Improve error handling in the inference endpoint API request handler
- retry only if HTTP exception
- use logger to output information about errors
* Initailize changes to incporate web scraping logic after getting SERP results
- Do some minor refactors to pass a symptom prompt to the openai model when making a query
- integrate Olostep in order to perform the webscraping
* Fix truncation error with new line, fix typing in olostep code
* Use the authorization header for the token
* Add a small hint/indicator for how to use Khojs other modalities in the welcome prompt
* Add more detailed error message if Olostep query fails
* Add unit tests which invoke Olostep in chat director
* Add test for olostep tool
* Improve subqueries for online search and prompt generation for image
- Include conversation history so that subqueries or intermediate prompts are generated with the appropriate context
* Allow users to configure phone numbers with the Khoj server
* Integration of API endpoint for updating phone number
* Add phone number association and OTP via Twilio for users connecting to WhatsApp
- When verified, store the result as such in the KhojUser object
* Add a Whatsapp.svg for configuring phone number
* Change setup hint depending on whether the user has a number already connected or not
* Add an integrity check for the intl tel js dependency
* Customize the UI based on whether the user has verified their phone number
- Update API routes to make nomenclature for phone addition and verification more straightforward (just /config/phone, etc).
- If user has not verified, prompt them for another verification code (if verification is enabled) in the configuration page
* Use the verified filter only if the user is linked to an account with an email
* Add some basic documentation for using the WhatsApp client with Khoj
* Point help text to the docs, rather than landing page info
* Update messages on various callbacks and add link to docs page to learn more about the integration
## Major
### Move to single click audio chat UX on Obsidian, Desktop, Web clients
New default UX has 1 long-press on mobile, 2-click on desktop to send transcribed audio message
- New Audio Chat Flow
1. Record audio while microphone button pressed
2. Show auto-send 3s countdown timer UI for audio chat message
Provide a visual cue around send button for how long before audio
message is automatically sent to Khoj for response
3. Auto-send msg in 3s unless stop send message button clicked
- Why
- Removes the previous default of 3 clicks required to send audio message
The record > stop > send process to send audio messages was unclear and effortful
- Still allows stopping message from being sent, to make correction to transcribed audio
- Removes inadvertent long audio transcriptions if forget to press stop while recording
### Improve chat input pane actions & icons on Obsidian. Desktop, Web clients
- Use SVG icons in chat footer on web, desktop app
- Move delete icon to left of chat input. This makes it harder to inadvertently click it
- Add send button to chat input pane
- Color chat message send button to make it primary CTA
- Make chat footer shorter. Use no or round border on action buttons
## Minor
- Stop rendering empty starter questions element when no questions present
- Add round border, hover color to starter questions in web, desktop apps
- Fix auto resizing chat input box when transcribed text added
- Convert chat input into a text area in the Obsidian client
- Capabillity
New default UX has 1 long-press to send transcribed audio message
- Removes the previous default of 3 clicks required to send audio message
- The record > stop > send process to send audio messages was unclear
- Still allows stopping message from being sent, if users want to make
correction to transcribed audio
- Removes inadvertent long audio transcriptions if user forgets to
press stop when recording
- Changes
- Record audio while microphone button pressed
- Show auto-send 3s countdown timer UI for audio chat message
Provide a visual cue around send button for how long before audio
message is automatically sent to Khoj for response
- Auto-send msg in 3s unless stop send message button clicked
- Capabillity
New default UX has 1 long-press to send transcribed audio message
- Removes the previous default of 3 clicks required to send audio message
- The record > stop > send process to send audio messages was unclear
- Still allows stopping message from being sent, if users want to make
correction to transcribed audio
- Removes inadvertent long audio transcriptions if user forgets to
press stop when recording
- Changes
- Record audio while microphone button pressed
- Show auto-send 3s countdown timer UI for audio chat message
Provide a visual cue around send button for how long before audio
message is automatically sent to Khoj for response
- Auto-send msg in 3s unless stop send message button clicked
- Capabillity
New default UX has 1 long-press to send transcribed audio message
- Removes the previous default of 3 clicks required to send audio message
- The record > stop > send process to send audio messages was unclear
- Still allows stopping message from being sent, if users want to make
correction to transcribed audio
- Removes inadvertent long audio transcriptions if user forgets to
press stop when recording
- Changes
- Record audio while microphone button pressed
- Show auto-send 3s countdown timer UI for audio chat message
Provide a visual cue around send button for how long before audio
message is automatically sent to Khoj for response
- Auto-send msg in 3s unless stop send message button clicked
- Move delete icon to left of chat input. This makes it harder to
inadvertently click
- Add send button to chat footer. Enter being the only way to send
messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
inadvertently click
- Add send button to chat footer. Enter being the only way to send
messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
inadvertently click
- Add send button to chat footer. Enter being the only way to send
messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
* Add support for a first party client app
- Based on a client id and client secret, allow a first party app to call into the Khoj backend with a phone number identifier
- Add migration to add phone numbers to the KhojUser object
* Add plus in front of country code when registering a phone number.
- Decrease free tier limit to 5 (from 10)
- Return a response object when handling stripe webhooks
* Fix telemetry method which references authenticated user's client app
* Add better error handling for null phone numbers, simplify logic of authenticating user
* Pull the client_secret in the API call from the authorization header
* Add a migration merge to resolve phone number and other changes
The actual issue was that `get_or_create_user_by_email' tried to
create a subscription even if it already existed.
With updated logic:
- New subscription is only created when it doesn't already exist in
`get_or_create_user_by_email'
- `set_user_subscription' just updates the subscription state as
user subscription object creation is already managed by
`get_or_create_user_by_email'. So the other conditionals are
unnecessary
- Issue
Users with Dataview plugin would have error as its markdown
post-processor expects the sourcePath to be a string
This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview plugin error
- Fix
Pass a string as sourcePath to markdownRenderer to fix failing chat response
and stop throwing dataview errors on search
Resolves#614, Resolves#606
- Update health API to pass authenticated users their info
- Improve Khoj server status check in Khoj Obsidian client
- Show Khoj Obsidian commands even if no connection to server
- Show Khoj chat by default in Obsidian side pane instead of search
Server connection check can be a little flaky in Obsidian. Don't gate
the commands behind it to improve usability of Khoj.
Previously the commands would get disabled when server connection
check failed, even though server was actually accessible
- Update server connection status on every edit of khoj url, api key in
settings instead of only on plugin load
The error message was stale if connection fixed after changes in
Khoj plugin settings to URL or API key, like on plugin install
- Show better welcome message on first plugin install.
Include API key setup instruction
- Show logged in user email on Khoj settings page
- Issue: Users with Dataview plugin would have error as its markdown
post-processor expects the sourcePath to be a string
This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview error
- Fix: Pass a string as sourcePath to markdownRenderer to fix
failing chat response
Resolves#614, Resolves#606
* Add support for custom inference endpoints for the cross encoder model
- Since there's not a good out of the box solution, I've deployed a custom model/handler via huggingface to support this use case.
* Use langchain.community for pdf, openai chat modules
* Add an explicit stipulation that the api endpoint for crossencoder inference should be for huggingface for now
This allows Khoj clients to get email address associated with
user's API token for display in client UX
In anonymous mode, default user information is passed
### Major
- Short-circuit API rate limiter for unauthenticated user
Calls by unauthenticated users were failing at API rate limiter as it
failed to access user info object. This is a bug.
API rate limiter should short-circuit for unauthenicated users so a
proper Forbidden response can be returned by API
Add regression test to verify that unauthenticated users get 403
response when calling the /chat API endpoint
### Minor
- Remove trailing slash to normalize khoj url in obsidian plugin settings
- Move used /api/config API controllers into separate module
- Delete unused /api/beta API endpoint
- Fix error message rendering in khoj.el, khoj obsidian chat
- Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn
- Use /api/health for server up check instead of api/config/default
- Remove unused `khoj--post-new-config' method
- Remove the now unused /config/data GET, POST API endpoints
### Major
- Push 1000 files at a time from the Desktop client for indexing
- Push 1000 files at a time from the Obsidian client for indexing
- Test 1000 file upload limit to index/update API endpoint
### Minor
- Show relevant error message in desktop app, e.g when can't connect to server
- Pass indexed filenames in API response for client validation
- Collect files to index in single dict to simplify index/update controller
Resolves#573
- Only run /online command offline chat director test when `SERPER DEV_API_KEY' present
- Decode URL encoded query string in chat API endpoint before processing
- Make references and online_results optional params to converse_offline
- Pass max context length to fix using updated `GPT4All.list_gpu' method
* Support using hosted Huggingface inference endpoint for embeddings generation
* Since the huggingface inference endpoint is model-specific, make the URL an optional property of the search model config
* Handle ECONNREFUSED error in desktop app
* Drive API key via the search model config model and use more generic names
- Ensure langchain less than 0.2.0 is used, to prevent breaking
ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0
- Set subscription renewal date to a timezone aware datetime
- Use logger.warning instead of logger.warn as latter is deprecated
- Use `model_dump' not deprecated dict to get all configured content_types
Calls by unauthenticated users were failing at API rate limiter as it
failed to access user info object. This is a bug.
API rate limiter should short-circuit for unauthenicated users so a
proper Forbidden response can be returned by API
Add regression test to verify that unauthenticated users get 403
response when calling the /chat API endpoint
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
- 26f96e00 Use Khoj Client, Data sources diagrams in feature docs
- c82d34b6 Add Docs footer, nav pane links. Fix tagline, Remove announcement topbar
- d920e4d0 Make the docs overview page as the main docs landing page
- 80d1ad5b Fix image urls on docs overview page. Remove logo header in client docs
- Make the docs overview page available at docs.khoj.dev root instead of
under docs.khoj.dev/docs path
- Remove the new landing page, it is unnecessary.
- Remove /docs path prefix from links to internal doc pages
- Remove .md path suffix in internal doc pages for consistency
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
Make usage of the first offline/openai chat model as the default LLM
to use for background tasks more explicit
The idea is to use the default/first chat model for all background
activities, like user message to extract search queries to perform.
This is controlled by the server admin.
The chat model set by the user is used for user-facing functions like
generating chat responses
Previously you could only index org-mode files and directories from
khoj.el
Mark the `khoj-org-directories', `khoj-org-files' variables for
deprecation, since `khoj-index-directories', `khoj-index-files'
replace them as more appropriate names for the more general case
Resolves#597
- Add more accurate steps for building Khoj locally
- Remove outdated instructions
- Add specific steps to create a Github Issue
- Add instructions for Obsidian plugin development
- Fix streaming chat response in Obsidian client
- Fix first-run, chat error message in obsidian, desktop and web clients
- Set Khoj app version to latest version in Docker images
- Tag Khoj Docker image built on release with the `latest` tag
This align docker image release cadence with client, server releases
- Tag docker image with `tag_name' on release (i.e tag push)
- Else tag with 'pre' on push to master
- Else tag with 'dev' on push to PR branch
- Only tag the latest release with release tag
Previously the latest commit on master was being tagged with the
latest tag. This doesn't sync with the release cadence of the rest
of Khoj
This will allow troubleshooting by getting the actual khoj version
being used. Previously it was always set to a static 0.0.0 version
Command to build Khoj docker image with dynamically set current app version:
`docker-compose build server --build-arg VERSION=$(pipx run hatch version)'
- Honor this setting across the relevant places where embeddings are used
- Convert the VectorField object to have None for dimensions in order to make the search model easily configurable
- Improve the prompt before sending it for image generation
- Update the help message to include online, image functionality
- Improve styling for the voice, trash buttons
The source URL returned by OpenAI would expire soon. This would make
the chat sessions contain non-accessible images/messages if using
OpenaI image URL
Get base64 encoded image from OpenAI and store directly in
conversation logs. This resolves the image link expiring issue
- All search models are loaded into memory, and stored in a dictionary indexed by name
- Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option
- Move it out to conversation.utils from generate_chat_response function
- Log new optional intent_type argument to capture type of response
expected. This can be type responses by Khoj e.g speech, image. It
can be used to render responses by Khoj appropriately on clients
- Make user_message_time argument optional, set the time to now by
default if not passed by calling function
- Wasn't able to login to the admin panel when KHOJ_DEBUG was not True. Fix this error so self-hosted users can get unblocked from accessing the admin settings
- Don't force users to set their KHOJ_DJANGO_SECRET_KEY
- Show temporary status message when copied to clipboard
- Render chat responses as markdown in Desktop client
- Render chat responses as markdown in chat modal of Obsidian client
- Render references of new responses in chat modal on Obsidian client. Use new style for references
- Properly stop `mediaRecorder` stream to clear microphone in-use state
- Render newlines when references expanded in Web, Desktop and Obsidian clients
- Use new style references for Khoj chat modal in Obsidian
- Khoj Chat responses in Obsidian had regressed to not show references
for new questions after modal has been opened. Now even those are
rendered, and use new references style
- Render chat response as markdown while it's being streamed
- Create speech to text API endpoint
- Use OpenAI Whisper for ASR offline (by downloading Whisper model) or online (via OpenAI API)
- Add speech to text model configuration to Database
- Speak to Khoj from the Web, Desktop or Obsidian client
- Add transcription button with mic icon
- Collect audio recording on pressing mic
- Process and send audio recording to server for transcription
- Extract the functionality to flash status in chat input for reuse
- Allow server admin to configure offline speech to text model during
initialization
- Use offline speech to text model to transcribe audio from clients
- Set offline whisper as default speech to text model as no setup api key reqd
- Extract flashing status message in chat input placeholder into
reusable function
- Use emoji prefixes for status messages
- Improve alt text of transcribe button to indicate what the button does
- Conflicts:
- src/interface/desktop/chat.html
Combine and use common class names for speak component
- src/khoj/database/adapters/__init__.py
Combine imports
- src/khoj/interface/web/chat.html
Combine and use common class names for speak component
- src/khoj/routers/api.py
Combine imports
- Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed
- Delete any files that are being removed for adminstering the calculation
- Show current amount of data indexed in the config page
- Ignore errors in deleting network requests to khoj server
- Also delete open network connection to khoj server on auto reindex
Otherwise when server is unreachable a bunch of failed network
connections accrue in the processes list
- Make auto-update of content index user configurable from khoj.el
- Handle server unavailable error on auto-index schedule job in khoj.el
Resolves#567
- Append chat message to chat logs as TextNodes in web, desktop clients
- Simplify Code to Identify Files from Github, Notion on Web, Desktop Client
- Use file source to find entries from github, notion on web, desktop client
- Pass file source to clients via text search API response
- Make Django Logs Follow Khoj Log Format, Verbosity
- Handle image search setup related warning
- Format Django initializing outputs using Khoj logger format
- Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj
Ideally should rename model_directory to config_directory or some such
but the current image search code will need to be migrated soon. So
changing the variable name and creating a migration script for old
khoj.yml files using model-directory variable isn't worth it
Remove the explicity set of number of threads to use by pytorch. Use
the default used by it.
- Collect STDOUT from the `migrate', `collectstatic' commands and
output using the Khoj logger format and verbosity settings
- Only show Django `collectstatic' command output in verbose mode
- Fix showing the Initializing Khoj log line by moving it after logger
level set
This bug was causing the search results on the Obsidian client to be shown in the reverse order of their actual relevance.
It reversed since entry scores returned by Khoj server are a distance metric since the move to Postgres. So lesser distance is better. Previously higher score was better.
Previously it was only searching for PDF and Markdown files. This was
meant to show only content from current vault as results.
But it has not scaled well as other clients also allow syncing PDF and
markdown files now. So remove this content type filter for now.
A proper solution would limit by using file/dir filters on server or
client side.
- PyPi doesn't like to have files that start with numbers, however all of the generated django migration files start with numbers. To accommodate, skip this check.
- Refer to https://pypi.org/project/check-wheel-contents/ for documentation and recommendation
- Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
- c07401cf Fix, Improve chat config via CLI on first run by using defaults
- d61b0dd5 Add Khoj Django app package to sys path to load Django module via pip install
- 4e98acbc Update minimum pydantic version to one with model_validate function
### Overview
The parent hierarchy of org-mode entries can store important context.
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index
### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser
Resolves#85
- Update docs to show how to use Khoj Cloud
- Move self-hosting Khoj to separate section
- Add page to setup Desktop app
- Set default URL to Khoj Cloud URL in Obsidian, Emacs clients
- No Khoj server setup required to start using Khoj from Obsidian, Emacs
- Use tabs for install, upgrade in Emacs with different package
managers
- Use default subtitles in Khoj Docs
- Deduplicate query filters, remove backend setup instructions in
plugin pages
- Remove stale Setup demo on Khoj Obsidian plugin docs
- Upgrade FastAPI to >= latest version. Required upgrade of FastAPI.
Earlier version didn't support wrapping common query params in class
- Use per fixture app instead of a global FastAPI app in conftest
- Upgrade minimum required Django version
- Fix no notes chat director test with updated no notes message
No notes message was updated in commit 118f1143
- Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command
- Add it as an additional tool for doing Google searches
- Render the results appropriately in the chat web window
- Pass appropriate reference data down to the LLM
- By default assume the audience of this website is people looking to understand the featuer offering of Khoj, and then people who are looking to self-host
- Most important updates include the depedency requirement to setup Postgres when running/setting Khoj up locally
- Add instructiosn for Docker
- Shift to recommend desktop client and update instructions for how to configure Khoj for user
- Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials
- Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration
- Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration
- Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page
- Adds billing to allow users to subscribe to the cloud service easily
- Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server.
- Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration.
- Changes license to GNU AGPLv3
Resolves#467Resolves#488Resolves#303Resolves#345Resolves#195Resolves#280Resolves#461Closes#259Resolves#351Resolves#301Resolves#296
- Update test data to add deeper outline hierarchy for testing
hierarchy as context
- Update collateral tests that need count of entries updated, deleted
asserts to be updated
- Make search model configurable on server
- Update migration script to get search model from `khoj.yml` to Postgres
- Update first run message on Khoj Desktop and Web app landing page
- Other miscellaneous bug fixes
- Link to Django admin panel for user to create Chat Models on their
Khoj server
- This should only get hit when user is not using Khoj cloud, as Khoj
cloud would already have Chat models configured
- While sigmoid normalization isn't required for reranking.
Normalizing score to distance metrics for both encoder and cross
encoder scores is useful to reason about them
- Softmax wasn't required as don't need probabilities, sigmoid is good
enough to get distance metric
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
in code
- Now it's user configurable but with a default config generated by
default
### Overview
Prepare Khoj with multi-user, db support for Khoj Cloud release
### Details
- Add first run experience to configure Khoj via khoj CLI
- Improve Web app settings page: Move files data into content section card. Move content index update button(s) to content section
- Improve OpenAI chat prompts
- Push more general information for OpenAI models into system prompt
- Make it more aware of it's current capabilities
- Weaken asking follow-up questions
- Rate-limit calls to the chat API
- Add back search results quality threshold
- Normalize quality score definitions across cross_encoder, encoder to distance metric
- Remove reference to deprecated button
- Await result of the search query
- Fixed Langchain issue by allowing the Docker image to rebuild with a later package version
- During the migration, the confidence score stopped being used. It
was being passed down from API to some point and went unused
- Remove score thresholding for images as image search confidence
score different from text search model distance score
- Default score threshold of 0.15 is experimentally determined by
manually looking at search results vs distance for a few queries
- Use distance instead of confidence as metric for search result quality
Previously we'd moved text search to a distance metric from a
confidence score.
Now convert even cross encoder, image search scores to distance metric
for consistent results sorting
Remove the Results Count button from the web app. It's hanging weirdly
with not much context to its purpose.
Reintroduce it in the Search card when created under the Features section
Reduce user confusion by joining config update with index updation for
each content type.
So only a single click required to configure any content type instead
of two clicks on two separate pages
- Notes prompt doesn't need to be so tuned to question answering. User
could just want to talk about life. The notes need to be used to
response to those, not necessarily only retrieve answers from notes
- System and notes prompts were forcing asking follow-up questions a
little too much. Reduce strength of follow-up question asking
The Chat models sometime output reference notes directly in the chat
body in unformatted form, specifically as Notes:\n['. Prevent that.
Reference notes are shown in clean, formatted form anyway
- Show next sync time to make users aware of data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
previously did. Intent to manual sync should Save All
- Default to using app.khoj.dev as default Khoj URL to ease Cloud sync setup
- Add detailed chat intro message, mention download desktop app for docs sync
- Only show search in web app nav pane if user has documents indexed
- Hide download desktop app message in web app if synced files exist
- Mark generated profile pic with subscription circle in web app
- Make mutable syncing variable not a const
- Show next sync time to make users aware of data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
previously did. Intent to manual sync should Save All
- Default to using app.khoj.dev as default Khoj URL to ease setup
### Major
- Expose Billing via Stripe on Khoj Web app for Khoj Cloud subscription
- Expose card on web app config page to manage subscription to Khoj cloud
- Create API webhook, endpoints for subscription payments using Stripe
- Put Computer files to index into Card under Content section
- Show file type icons for each indexed file in config card of web app
- Enable deleting all indexed desktop files from Khoj via Desktop app
- Create config page on web app to manage computer files indexed by Khoj
- Track data source (computer, github, notion) of each entry
- Update content by source via API. Make web client use this API for config
- Store the data source of each entry in database
### Cleanup
- Set content enabled status on update via config buttons on web app
- Delete deprecated content config pages for local files from web client
- Rename Sync button, Force Sync toggle to Save, Save All buttons
### Fixes
- Prevent Desktop app triggering multiple simultaneous syncs to server
- Upgrade langchain version since adding support for OCR-ing PDFs
- Bubble up content indexing errors to notify user on client apps
- Add fields to mark users as subscribed to a specific plan and
subscription renewal date in DB
- Add ability to unsubscribe a user using their email address
- Expose webhook for stripe to callback confirming payment
Previously hitting configure or disable wouldn't update the state of
the content cards. It needed page refresh to see if the content was
synced correctly.
Now cards automatically get set to new state on hitting disable button
on card or global configure buttons
Lock syncing to server if a sync is already in progress.
While the sync save button gets disabled while sync is in progress,
the background sync job can still trigger a sync in parallel. This
sync lock prevents that
Remove the table of all files indexed by Khoj. This seems overkill and
doesn't match the UI semantics of the other data sources like Github,
Notion.
Create instead a data source card for computer files with the same
update, disable semantics of the Github and Notion data source cards
Users can disable each data source from its card on the main config page.
They can see/delete individual files indexed from the computer data source
once they click into the computer files data source card on the config page
This will be useful for updating, deleting entries by their data
source. Data source can be one of Computer, Github or Notion for now
Store each file/entries source in database
Major
- Ensure search results logic consistent across migration to DB, multi-user
- Manually verified search results for sample queries look the same across migration
- Flatten indexing code for better indexing progress tracking and code readability
Minor
- a4f407f Test memory leak on MPS device when generating vector embeddings
- ef24485 Improve Khoj with DB setup instructions in the Django app readme (for now)
- f212cc7 Arrange remaining text search tests in arrange, act, assert order
- 022017d Fix text search tests to test updated indexing log messages
The Langchain HuggingFaceEmbeddings wrapper doesn't support disabling
progressbar, not especially for only query but not documents.
This makes the logs noisy with encoding progressbar for each
incremental queries
No features of the Langchain wrapper for SentenceTransformer was
currently being used anyway for now, and we can always switch back to
it if required
Flatten the nested loops to improve visibilty into indexing progress
Reduce spurious logs, report the logs at aggregated level and update
the logging description text to improve indexing progress reporting
- Given the separation of the client and server now, the web UI will no longer support configuration of local file paths of data to index
- Expose a way to show all the files that are currently set for indexing, along with an option to delete all or specific files
- Update theme for Desktop, Web and Obsidian client apps to use lighter colors
- Show splash screen on starting Desktop app
- Make chat the landing page on Desktop and Web clients
- Simplify style of login page on Web app
- Add About page for Desktop app accessible from system tray menu
- Remove spurious whitespace in chat input box on page load being
added because text area element was ending on newline
- Do not insert newline in message when send message by hitting enter key
This would be more evident when send message with cursor in the
middle of the sentence, as a newline would be inserted at the cursor
point
- Remove chat message separator tokens from model output. Model
sometimes starts to output text in it's chat format
- Pass current khoj version from package.json to about page via
electron IPC between backend js and frontend page
- Update Khoj information in default About screen as well, in case
it's exposed anywhere else
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
with lantern flame shade
- Add water, leaf, flower color variables
Fix refactor bugs, CSRF token issues for use in production
* Add flags for samesite settings to enable django admin login
* Include tzdata to dependencies to work around python package issues in linux
* Use DJANGO_DEBUG flag correctly
* Fix naming of entry field when creating EntryDate objects
* Correctly retrieve openai config settings
* Fix datefilter with embeddings name for field
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
with lantern flame shade
- Add water, leaf, flower color variables
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
### ✨ New
- Create profile pic drop-down menu in navigation pane
Put settings page, logout action under drop-down menu
### ⚙️ Fix
- Add Key icon for API keys table on Web Client's settings page
### 🧪 Improve
- Rename `TextEmbeddings` to `TextEntries` for improved readability
- Rename `Db.Models` `Embeddings`, `EmbeddingsAdapter` to `Entry`, `EntryAdapter`
- Show truncated API key for identification & restrict table width for config page responsiveness
Previously pico.css font-families were being selected for the config
page. This was different from the fonts used by index.html, chat.html
This improves spacing issue of heading further
- Create dropdown menu. Put settings page, logout action under it
- Make user's profile picture the dropdown menu heading
- Create khoj.js to store shared js across web client
It currently stores the dropdown menu open, close functionality
- Put shared styling for khoj dropdown menu under khoj.css
- Use a function to generate API Key table row HTML, to dedup logic
- Show delete, copy icon hints on hover
- Reduce length of copied message to not expand table width
- Truncating API key helps keep the API key table width within width
of smaller width displays
Emoji icons have already been added to the Search, Chat and Settings
top navigation menu in the desktop client. This change adds these to
the web client as well
Improves readability as name has closer match to underlying
constructs
- Entry is any atomic item indexed by Khoj. This can be an org-mode
entry, a markdown section, a PDF or Notion page etc.
- Embeddings are semantic vectors generated by the search ML model
that encodes for meaning contained in an entries text.
- An "Entry" contains "Embeddings" vectors but also other metadata
about the entry like filename etc.
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
### ✨ New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database
### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme
### ⚙️ Fix
- Fix error handling when configure offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.
This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
- Offline chat models outputing gibberish when loaded onto some GPU.
GPU support with Vulkan in GPT4All seems a bit buggy
- This change mitigates the upstream issue by allowing user to
manually disable using GPU for offline chat
Closes#516
GPT4all now supports gguf llama.cpp chat models. Latest
GPT4All (+mistral) performs much at least 3x faster.
On Macbook Pro at ~10s response start time vs 30s-120s earlier.
Mistral is also a better chat model, although it hallucinates more
than llama-2
Ignore .org, .pdf etc. suffixed directories under `input-filter' from
being evaluated as files.
Explicitly filter results by input-filter globs to only index files,
not directory for each text type
Add test to prevent regression
Closes#448
On Windows, the default locale isn't utf8. Khoj had regressed to
reading files in OS specified locale encoding, e.g cp1252, cp949 etc.
It now explicitly uses utf8 encoding to read text files for indexing
Resolves#495, resolves#472
* Changed globbing. Now doesn't clobber a users glob if they want to add it, but will (if just given a directory), add a recursive glob.
Note: python's glob engine doesn't support `{}` globing, a future option is to warn if that is included.
* Fix typo in globformat variable
* Use older glob pattern for plaintext files
---------
Co-authored-by: Saba <narmiabas@gmail.com>
### Overview
- Add ability to push data to index from the Emacs, Obsidian client
- Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON
- Benefits of new mechanism
- No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests
- The whole response is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required
- Binary files don't need to be encoded on client and decoded on server
### Code Details
### Major
- Use multi-part form to receive files to index on server
- Use multi-part form to send files to index on desktop client
- Send files to index on server from the khoj.el emacs client
- Send content for indexing on server at a regular interval from khoj.el
- Send files to index on server from the khoj obsidian client
- Update tests to test multi-part/form method of pushing files to index
#### Minor
- Put indexer API endpoint under /api path segment
- Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
- Improve emoji, message on content index updated via logger
- Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user
- Improve indexing of binary files
- Let fs_syncer pass PDF files directly as binary before indexing
- Use encoding of each file set in indexer request to read file
- Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost
- Update indexer API endpoint URL to` index/update` from `indexer/batch`
Resolves#471#243
New URL query params, `force' and `t' match name of query parameter in
existing Khoj API endpoints
Update Desktop, Obsidian and Emacs client to call using these new API
query params. Set `client' query param from each client for telemetry
visibility
New URL follows action oriented endpoint naming convention used for
other Khoj API endpoints
Update desktop, obsidian and emacs client to call this new API
endpoint
Using fetch from Khoj Obsidian plugin was failing due to cross-origin
request and method: no-cors didn't allow passing x-api-key custom
header. And using Obsidian's request with multi-part/form-data wasn't
possible either.
- Keep state of previously synced files to identify files to be deleted
- Last synced files stored in settings for persistence of this data
across Obsidian reboots
Use the multi-part/form-data request to sync Markdown, PDF files in
vault to index on khoj server
Run scheduled job to push updates to value for indexing every 1 hour
This prevents Khoj from polling the Khoj server until explicitly
invoked via `khoj' entrypoint function.
Previously it'd make a request to the khoj server every time Emacs or
khoj.el was loaded
Closes#243
Previously lookback turns was set to a static 2. But now that we
support more chat models, their prompt size vary considerably.
Make lookback_turns proportional to max_prompt_size. The truncate_messages
can remove messages if they exceed max_prompt_size later
This lets Khoj pass more of the chat history as context for models
with larger context window
- Dedupe offline_chat_model variable. Only reference offline chat
model stored under offline_chat. Delete the previous chat_model
field under GPT4AllProcessorConfig
- Set offline chat model to use via config/offline_chat API endpoint
This provides flexibility to use non 1st party supported chat models
- Create migration script to update khoj.yml config
- Put `enable_offline_chat' under new `offline-chat' section
Referring code needs to be updated to accomodate this change
- Move `offline_chat_model' to `chat-model' under new `offline-chat' section
- Put chat `tokenizer` under new `offline-chat' section
- Put `max_prompt' under existing `conversation' section
As `max_prompt' size effects both openai and offline chat models
Pass user configured chat model as argument to use by converse_offline
The proper fix for this would allow users to configure the max_prompt
and tokenizer to use (while supplying default ones, if none provided)
For now, this is a reasonable start.
- Format extract questions prompt format with newlines and whitespaces
- Make llama v2 extract questions prompt consistent
- Remove empty questions extracted by offline extract_questions actor
- Update implicit qs extraction unit test for offline search actor
* Strip the incoming query from the slash conversation command before passing it to the model or for search
* Return q when content index not loaded
* Remove -n 4 from pytest ini configuration to isolate test failures
- Make `bump_version.sh' script set version for the Khoj desktop app too
- Sync Khoj desktop app authors, license, description and version with
the other interfaces and server
- Update description in packages metadata to match project subtitle on Github
- Pass payloads as unibyte. This was causing the request to fail for
files with unicode characters
- Suppress messages with file content in on index updates
- Fix rendering response from server on index update API call
- Extract code to populate body of index update HTTP request with files
Previously global state of `url-request-method' would affect the
kind of request made to api/config/data API endpoint as it wasn't
being explicitly being set before calling the API endpoint
This was done with the assumption that the default value of GET for
url-request-method wouldn't change globally
But in some cases, experientially, it can get changed. This was
resulting in khoj.el load failing as POST request was being made
instead which would throw error
Instead of using the previous method to push data as json payload of POST request
pass it as files to upload via the multi-part/form to the batch indexer API endpoint
- Add elisp variable to set API key to engage with the Khoj server
- Use multi-part form to POST the files to index to the indexer API
endpoint on the khoj server
Previously only the the last filter's terms were getting effectively
applied as the `filter.defilter' operation was being done on
`user_query' but was updating the `defiltered_query'
- This uses existing HTTP affordance to process files
- Better handling of binary file formats as removes need to url encode/decode
- Less memory utilization than streaming json as files get
automatically written to disk once memory utilization exceeds preset limits
- No manual parsing of raw files streams required
Use mailbox closed with flag down once content index completed.
Use standard, existing logger messages in new indexer messages, when
files to index sent by clients
- Improves user experience by aligning idle time with search latency
to avoid display jitter (to render results) while user is typing
- Makes the idle time configurable
Closes#480
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
Git tag tests/data files with the linguist-vendored attribute to
prevent github from including them in stats.
Otherwise Khoj is getting marked as an HTML project due to the
tardigrades html page in tests data, when it's primarily a python
project currently
- Overview
- Allow applying word, file or date filters on your knowledge base from the chat interface
- This will limit the portion of the knowledge base Khoj chat can use to respond to your query
- Make Khoj ask clarifying questions when answer not in provided context
- Add default conversation command to auto switch b/w general, notes modes
- Show filtered list of commands available with the currently input text
- Use general prompt when no references found and not in Notes mode
- Test general and notes slash commands in offline chat director tests
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
---------
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
* Update gpt4all tests to use md configuration
* Add a /help tooltip
* Add dynamic support for describing slash commands. Remove default and treat notes as the default type
---------
Co-authored-by: sabaimran <narmiabas@gmail.com>
The test workflow fails regularly with an OperationCancelled error.
This is an intermittent failure that gets resolved on running the
failed workflows a few times.
* Allow indexing to continue even if there's an issue parsing a particular org file
* Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files
* Change error of expected failure
* Add support for indexing plaintext files
- Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md)
- Add equivalent frontend views for setting up plaintext file indexing
- Update config, rawconfig, default config, search API, setup endpoints
* Add a nifty plaintext file icon to configure plaintext files in the Web UI
* Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist
* Add support for starting a new line with shift-enter
* Remove useless comments. Set font-size: medium.
* Update src/khoj/interface/web/chat.html
Update the styling to have the padding, margin and line-height like before.
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/interface/web/chat.html
Make the chat-body scroll to the bottom after resizing
Co-authored-by: Debanjum <debanjum@gmail.com>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the users default web browser. Now the web interface is just rendered within the app itself using PyQT's Webview. This gives it a more proper app like feel
- Opens settings page on first run and landing page after in GUI mode
Previously was only opening the GUI on linux after first run as it
doesn't have a system tray
- Both the views are from the web interface but are rendered within
the app instead of the browser
Build the Debian package using Ubuntu 20.04 instead of 22.04 as Ubuntu 20.04 comes pre-installed with glibc_2.31 unlike Ubuntu 22.04 which uses glibc_2.35
This should reduce chances of installation errors due to regex package
being built from source for python3.11
Previously, the regex dependency of dateparser = 1.1.1 didn't have a
wheel for python 3.11. This would trigger building the regex package
from scratch which would fail for a lot of folks
* Add checksums to verify the correct model is downloaded as expected
- This should help debug issues related to corrupted model download
- If download fails, let the application continue
* If the model is not download as expected, add some indicators in the settings UI
* Add exc_info to error log if/when download fails for llamav2 model
* Simplify checksum checking logic, update key name in model state for web client
- See if this fixes the issue with the workflows failing to install
system packages
- Make the build desktop app run on changes to the workflow file as well
# Incoming
## Major
### Fix Prompt Size Exceeded Issue
- Fix issues related to prompt size, Closes#386. Use the correct tokenizer to calculate whether the input needs to be truncated or not.
### Improve Llama 2 Model Download
- Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium
- Add better downloading logic to retry download if it failed, Closes#379
### Fix Segmentation Fault due to Race
- Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes#367.
- Add a loading symbol to the web chat UI when the model is thinking. Closes#392
### Improve Chat Response Latency
- Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes#363
### Fix Fake Dialogue Continuation
- Fix formatting of user query with offline chat, this was contributing to #398
- Stop Llama 2 from Creating Fake Dialogue Continuations. Closes#398
## Minor
- Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
- Add null check in `perform_chat_checks` method
- Add offline chat director unit tests
## Performance Analysis (Time to First Token)
| | v0.10.0 | this branch |
|-|-|-|
| Query 1 | 52s | 28s |
| Query 2 | 33s| 42s |
| Query 3 | 67s| 38s|
It would previously some times start generating fake dialogue with
it's internal prompt patterns of <s>[INST] in responses.
This is a jarring experience. Stop generation response when hit <s>
Resolves#398
- Use same batch_size in extract question actor as the chat actor
- Log final location the chat model is to be stored in, instead of
it's temp filename while it is being downloaded
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the cat model
Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
- Screenshots of khoj search, chat
- Put quickstart on landing page
- Put miscellaneous pages under separate section
- Move credits to separate page under miscellaneous
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
- Configure using Offline Chat from Emacs:
- Enable, Disable Offline Chat from Emacs
- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
Asymmetric search is the only search type used now in khoj.el. So
making distinction between between symmetric and asymmetric search
isn't necessary anymore
* Working example with LlamaV2 running locally on my machine
- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format
* Add appropriate prompts for extracting questions based on a query based on llama format
* Rename Falcon to Llama and make some improvements to the extract_questions flow
* Do further tuning to extract question prompts and unit tests
* Disable extracting questions dynamically from Llama, as results are still unreliable
OpenAI conversation processor schema had updated but conftest hadn't
been updated to reflect the same.
Update conftest setup of conversation processor to fix this
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
Create Schema Migrator and Reindex to Apply Index Corruption Fixes
- 83e1088 Manage `khoj.yml' config migrations on app start. Version the `khoj.yml' schema
- 429e1b4 Regenerate index to apply corruption fixes on first run of this khoj version
Otherwise users would need to manually re-index their contents with khoj
## Stabilize and Simplify Content Indexing
### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
- Refer: 1482fd4, 7669b85, 88d1a29
- ad41ef3 Make normalization of embeddings configurable to test this in c73feeb
### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods
Resolves#190
Ensure order of new embedding insertion on incremental update
does not affect the order and value of existing embeddings when
normalization is turned off
Asymmetric was older name used to differentiate between symmetric,
asymmetric search.
Now that text search just uses asymmetric search stick to simpler name
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence
Update all text_to_jsonl methods to use the above method for
generating index from scratch
- Current incorrect behavior:
All entries with duplicate compiled form are kept on regenerate
but on update only the last of the duplicated entries is kept
This divergent behavior is not ideal to prevent index corruption
across reconfigure and update
Reuse Search Models across Content Types to reduce Memory Consumption
- Memory consumption now only scales with search models used, not with content types.
Previously each content type had it's own copy of the search ML models.
That'd result in 300+ Mb per enabled text content type
- Split model state into 2 separate state objects, `search_models` and `content_index`.
This allows loading text_search and image_search models first
and then reusing them across all content_types in content_index
- The change should cut down memory utilization quite a bit for most users.
I see a >50% drop in memory utilization on my Khoj instance.
But this will vary for each user based on the amount of content indexed vs number of plugins enabled.
- This change does not solve the RAM utilization scaling with size of the index,
as the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
Wrap acquire/release locks in try/catch/finally when updating content
index and search models to prevent lock not being released on error
and causing a deadlock
* Add additional telemetry in order to understand which data sources are the most useful
* Make actions side by side in the configuration page
* Restore main run command
* Update links to point to wiki pages for Github, Notion integrations
* Stanardize nomenclature of the api_type to use _config suffix
Remove header fields that aren't actually helpful for understanding config usage
- Memory consumption now only scales with search models used, not with
content types as well. Previously each content type had it's own
copy of the search ML models. That'd result in 300+ Mb per enabled
content type
- Split model state into 2 separate state objects, `search_models' and
`content_index'.
This allows loading text_search and image_search models first and then
reusing them across all content_types in content_index
- This should cut down memory utilization quite a bit for most users.
I see a ~50% drop in memory utilization.
This will, of course, vary for each user based on the amount of
content indexed vs number of plugins enabled
- This does not solve the RAM utilization scaling with size of the index.
As the whole content index is still kept in RAM while Khoj is running
Should help with #195, #301 and #303
* Add a Github workflow that allows you to build dev versions of Desktop applications
* Add pull_request trigger for testing
* Fix errant open quote in Package Khoj App step
* Nix the release step, since this isn't associated with any tags
- Set retention period for uploaded artifacts to 1 day
* Remove pull_request trigger - limit to manual triggers and pushes to master
Just use a random static version for Khoj on the Docker as otherwise
the hatch vcs dynamic versioning requires the .git directory in the
docker image too
My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.
- Provide more details on what clicking configure, initialize buttons
or changing the results count slider does
- This shows up on user hovering over those buttons
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
* Add backend support for Notion data parsing
- Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token
- Make corresponding updates to the default config, raw config to support the new notion addition
* Add corresponding views to support configuring Notion from the web-based settings page
- Support backend APIs for deleting/configuring notion setup as well
- Streamline some of the index updating code
* Use defaults for search and chat queries results count
* Update pagination of retrieving pages from Notion
* Update state conversation processor when update is hit
* frequency_penalty should be passed to gpt through kwargs
* Add check for notion in render_multiple method
* Add headings to Notion render
* Revert results count slider and split Notion files by blocks
* Clean/fix misc things in the function to update index
- Use the successText and errorText variables appropriately
- Name parameters in function calls
- Add emojis, woohoo
* Clean up and further modularize code for processing data in Notion
* Add langchain static files and pytorch metadata to Khoj native app
* Add pillow static files, metadata & hidden imports to Khoj native app
* Fix path to web interface static files on Khoj native app
* Add tiktoken hidden imports to make chat work from Khoj native app
* Fix Khoj native app to run with GUI mode enabled
This got broken when we moved from using the --no-gui flag to using
--gui in https://github.com/khoj-ai/khoj/pull/263
* Update the /chat endpoint to conditionally support streaming
- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history
* Update khoj.el to use the /history endpoint
- Update corresponding unit tests to use stream=true
* Remove & from call to /chat for obsidian
* Abstract functions out into a helpers.py file and clean up some of the error-catching
- Deprecate the unused beta /answer and /search type identification endpoints and associated GPT functions
- Update extract_questions to use GPT4
- Update summarize method to default to GPT-3.5
- Update date filter to support quoting values in single quotes too. So now both dt>'2023-04-01' and dt>"2023-04-01" should work
- Remove "model" field from chat settings on the web interface
Deprecate usage of the older gpt3 models in-place of the newer chat
based models
- text-davinci-003 is only 50% cheaper than gpt4 and less reliable for
question extraction
- Using gpt-3.50turbo for summarization should reduce cost of chat
- Keep conversation.chat_session as a list instead of a string
- Update completion_with_backoff func to use ChatML format
- Fix testing gpt converse method after it started streaming responses
- Pass stop in model_kwargs dictionary and api key in openai_api_key
parameter to chat completion methods. This should resolve the arg
warning thrown by OpenAI module
The previous json parsing was failing to handle questions with date
filters
Fix the chat actor tests to run without throwing error with freezegun
complaining about importing transformers.local_llama model
Remove quote escapes from date filter examples provided to
extract_questions actor
- Before
Only the search interface had the results count configuration option
- After
- The results count is set on the settings page instead of the
search page
- Both search and chat can use the configured results count instead
of just search
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allow us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo isntance
- What
- Stream chat responses from OpenAI API to Web, Obsidian clients
- Implement using a callback function which manages a queue where new tokens can be placed as they come on. As the thread is read from, tokens are removed.
- When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
- When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
- Incrementally decode tokens on the front end and add them as they appear from the streamed response
- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat
Closes https://github.com/khoj-ai/khoj/issues/257
- I needed to installed node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
Removing unused content types will reduce khoj code to manage
- 0f993b3 Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type in Index plain text files #237.
This along with a file filter should suffice for searching through Ledger transactions
- c9db532 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.
Having a specific content type for niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.
Cleaning up that code will reduce number of content types for khoj to
manage.
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made settings page small in mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
- Break out of rendering list if at end of org block in org.js
- This would previous hang rendering results in web interface
Should try fix this upstream in org.js as well
- Previously Khoj could only support Python upto 3.10 due to pytorch.
But lots of folks had python 3.11 installed by default on their machines.
This required installing python 3.10 and dealing with virtual envs.
With Torch >= 2.0.1 now able to support python 3.11, at least one
class of installation troubles for Khoj should drop. See
https://github.com/pytorch/pytorch/issues/86566 for reference
- Preliminary testing indicates using the new torch 2.x may reduce
search time by 25% (from 80ms to 60ms on Mac M1)
- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
- Allow searching across asymmetric text content types using threads
- Query time on my Mac averages 95ms latency (140ms at 90 percentile) across (Org, Markdown, Github, PDF and Music content types)
- This is not too much more than search for a single content type (maybe max ~50% latency increase?). Encoding query is what takes most of the time anyway and that's just done once like before, threading adds some overhead
- An **average** of `95 ms` latency or `140ms` at **90th percentile** is inline with keeping an incremental search (search-as-you-type) experience
- Put logic to remove filter terms from query in a `defilter` method for each filter
- Encode query once during search to encode query once across all (asymmetric) content types
- Search across all content types via the web and emacs interfaces in [d5fb419](d5fb4196de) and [5c4eb95](5c4eb950d5) respectively
- Allow Khoj Chat to pull relevant data from across content types (without the perf hit). Khoj chat is only pulling data from a single content type currently
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
- Set image_search.query to async to use it with multi-threading
This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
- So when searching across content types (with content-type = "all")
org-mode results get rendered differently than markdown, PDF etc. results
- Set div class for each result separately instead of a single uber div
for styling. This allows styling div of each result based on the
content-type of that result
- No need to create placeholder "all" content type on web interface as
server is passing an all content type by itself
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
- Previous state
Ideally docker image should use latest app code available locally.
But this is better than the previous state where the latest Docker
image was being built using older khoj package published to pypi
This would happen because the workflow to publish the khoj-assistant
pypi package runs in parallel to the dockerize workflow so the latest
khoj pypi package isn't published before the latest docker image is
built on master
- Updated state
Now at least the docker image published via the dockerize github
workflow will be built using the latest khoj code on github
- Show success/failure status message much closer to the save button
Previously status message was shown on top of the page, which wasn't
always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
- Style all pages with consistent lantern theme styling
- Add navigation pane to all web interface pages
- a200af68b38d0625c42e2098d171c6ddab121bd2 Keep pico.css locally for offline usage
- cd8d069e6673b4db4c14f736c3d8af80bf94614d Highlight currently active tab in web interface
- Update config pages to use Lantern theme
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.
This results in search across all enabled asymmetric search text
content types
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
query and pass it to the query methods of each search types
TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
- Update API to return content from all enabled content types when type
is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
- Default is 30. So number of paginated requests required to get all
items (commits, files) will reduce by 67%
- No need to increase page size for the get tree Github API request from
`get_markdown_files'
Get tree Github API doesn't support pagination and return 100K items
in response. This should be way more than enough for our current
use-cases
- Previously wasn't prefixing "token" to PAT token in Auth header
This resulted in the request being considered unauthenticated
- Unauthenticated requests to Github API are limited to 60 requests/hour
Authenticated requests to Github API are allowed 5000 requests/hour
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
The Llama_Hub Github plugin is fairly limited.
The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
- Use the Github plugin on LlamaHub to read in markdown files from specified Github repository for indexing
- Update the desktop GUI application to take in the required parameters to read from Github
- Requires a classic PAT token for Github access
- Make API endpoints on Khoj server accept `client` as request parameter
- Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
- Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
- This improves latency of @general chat by avoiding unnecessary
compute
- It also avoids passing references in API response when they haven't
been used to generate the chat response. So interfaces don't have to
add logic to not render them unnecessarily
- **Introduce Khoj to LangChain**:
Call GPT with LangChain for Khoj Chat
- **Search (and Chat about) PDF files with Khoj**
- Create PDF to JSONL Processor: Convert PDF content into standardized JSONL format
- Expose PDF search type via Khoj server API
- Enable querying PDF files via Obsidian, Emacs and Web interfaces
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
- Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
- Match argument names passed to khoj openai completion funcs with
arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
- Fix bug where both LangChain and Khoj retry requests 6 times each.
So a total of 12 requests at >1minute intervals for each chat
response in case of OpenAI API being down
- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
between retries way too much. This slowed down chat response times
quite a bit when API was being flaky
- With these updates you'll know if call to chat API failed in under a
minute
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
creating agents and managing their state
### Objective:
Use telemetry to better understand Khoj usage.
This will motivate and prioritize work for Khoj.
Specific questions:
- Number of active deployments of khoj server
- How regularly is khoj used (hourly, daily, weekly etc)?
- How much is which feature used (chat, search)?
- Which UI interface is used most (obsidian, emacs, web ui)?
### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
- Khoj chat will now respond to general queries if:
1. no relevant reference notes available or
2. when explicitly induced by prefixing the chat message with "@general"
- Previously Khoj Chat would a lot of times refuse to respond to
general queries not answerable from reference notes or chat history
- Make chat quality tests more robust
- Add more equivalent chat response options refusing to answer
- Force haiku writing to not give any preable, just the haiku
- Simplifies switching between different OpenAI chat models. E.g GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
defaults to using gpt-3.5-turbo, unless chat-model field under
conversation processor updated in khoj.yml
Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context
- Set filename as top heading in compiled org, markdown entries
- Note: *Khoj was already indexing filenames in compiled markdown entries but they weren't set as top level headings but rather appended as bare text*. The updated structure should provide more schematic context of relevance
- Set entry heading as heading for compiled org, md entries, even if split by max tokens
- Snip prepended heading to avoid crossing model max_token limits
- Entries with no md headings should not get heading prefix prepended
Otherwise if heading > max_tokens than the search models will just see
a heading (with repeated filename) for each compiled entry and not
actual content.
100 characters should be sufficient to include filename (not path) and
entry heading. If longer rather truncate to pass entry unique text to
model for search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context
Test filename getting prepended as heading to compiled entry
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.
This limits search context required to retrieve these continuation
entries
- cl-push expects a generatlized variable. Else throws (setf quote)
undefined warning
- This results in the config call failing on calling khoj entrypoint
- Remove waiting for server message as it hides the messages from the
server
- Fix the nil message that were being rendered, by checking before
showing messages from server
- Consistently prefix messages from khoj with khoj.el
Previously khoj.el was calling the server configure API even when
config was same as before.
This had broken the khoj search as you type experience from emacs
Also show more details to user about what in khoj is being configured
Resolves#185, #199
- Issue
IndexName created from Obsidian Absolute Vault path wasn't replacing
windows path, drive separators with underscore. It was only
replacing unix path separators
- Fix
Also replace windows drive and path separators with _ while creating
IndexName in Khoj Obsidian plugin
Makes it easier to tell pip associated with which python is being
used. Easier to debug when users have different versions of python
installed (e.g 3.10 and 3.11)
Merge pull request #198 from debanjum/improve-khoj-search-for-markdown-obsidian
### Overview
- Copied Khoj Search Modal styling from Jim Prince's PR #135 with minor improvements
- Implements improvements to the Khoj Search in Markdown/Obsidian suggested by folks. Specifically:
- #133
- #134
- #142
### Changes
- 5673bd5 Keep original formatting in compiled text entry strings
- a2ab68a Include filename of markdown entries for search indexing
- 6712996 Create Note with Query as title from within Khoj Search Modal
- d3257cb Style the search result. Use Obsidian theme colors and font-size
- 4009148 For each result: snip it by lines, show filename, remove frontmatter
- Explicity split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
No point in dropping the formatting unnecessarily,
even if (say) the currrent search models don't account for it (yet)
Append originating filename to compiled string of each entry for
better search quality by providing more context to model
Update markdown_to_jsonl tests to ensure filename being added
Resolves#142
This follows expected behavior for obsidain search modals
E.g Ominsearch and default Obsidian search.
The note creation code is borrowed from Omnisearch.
Resolves#133
Merge pull request #196 from debanjum/create-chat-modal-for-obsidian
- Set your OpenAI API key in the Khoj Obsidian Settings
- Use Modal in Obsidian for Chat
- Style Chat Modal combining the Khoj Web interface and Obsidian theme style
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
- Modify it to work under a Obsidian modal
- So replace html, body styling from web interface to instead
styling new "khoj-chat" class attached to contentEl of modal
Merge pull request #193 from debanjum/simplify-khoj-server-setup-on-emacs
## Major Changes
- ae535a0 Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs
- 82eb4bf Setup Khoj server on opening khoj.el
- 99d19dc Start Khoj server from Emacs using khoj.el
- c92d791 Install Khoj server from Emacs using khoj.el
*This assumes you have python (<3.11) and pip installed in a system path*
### Sample Config
- Enable Khoj Chat by configuring you OpenAI API Key
- Specify Org Files, Directories to Index for Search (and Chat)
By default, your org-agenda-files (include archive files)) are indexed
- Invoke khoj by calling `C-c s`
``` emacs-lisp
(use-package khoj
:after org
:straight (khoj
:type git
:host github
:repo "debanjum/khoj"
:files ("src/interface/emacs/khoj.el"))
:bind ("C-c s" . 'khoj)
:config (setq
khoj-openai-api-key "<YOUR_OPENAI_API_KEY_FOR_KHOJ_CHAT>"
khoj-org-directories '("~/docs/notes" "~/docs/journals")
khoj-org-files '("~/docs/tasks.org" "~/docs/journal.org" "~/docs/archive.org")))
```
Converts paths to glob style regexes that will index all org files
recursively under the specified list of path
Should help setup for org-roam users from khoj.el
- khoj-auto-setup controls whether to automatically check for and
setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
method. Allows calling khoj-setup during package load via init.el
- Fix: Do not attempt to configure or wait for server ready if
user has said no to auto-setup request
- Fix logic to mark server started vs ready
- Previously the started/running vs ready variables defs were getting
intertwined
- Server started indicates server bootup has been triggered
- Server ready indicates server API ready to accept requests
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)
- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done
As easier to know when server ready to configure
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
Until then khoj features wouldn't work anyway, so avoids confusion
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
tenacity package. This is officially suggested and used by other
popular GPT based libraries
Merge pull request #192 from debanjum/improvements-to-khoj-chat-in-emacs
### Khoj Chat on Emacs Improvements
- d78454d Load Khoj Chat buffer before asking for query to provide context
- 93e2aff Use org footnotes to add references, allows jump to def on click
- 5e9558d Stylize reference links as superscripts and show definition on hover
- bc71c19 Use `m` or `C-x m` in-buffer keybindings to send messages to Khoj
### Khoj Chat Server Improvements
- 27217a3 Time chat API sub-components for performance analysis
- 508b217 Update Chat API, Logs, Interfaces to store, use references as list
- d4b3866 Truncate message logs to below max supported prompt size by chat model
- cf28f10 Register separate timestamps for user query and response by Khoj Chat
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
argument to generate_chatml_messages_with_context method
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
- Add continuation suffix, ..., when reference definition truncated
Merge pull request #191 from debanjum/create-chat-interface-on-emacs
- Render conversation history in a read-only org-mode buffer for Khoj Chat
- Add `chat` as a transient action in the Khoj transient menu
- Style chat messages as org-mode entries
- Put received date in property drawer and keep it hidden/folded by default
- Add Khoj chat response as child entry of the users associated question org entry
This allows folding back-n-forth between user and Khoj for easier viewing
- Render source notes snippets used as references for response as org-mode links
Hovering mouse on link or opening links shows reference note snippets used
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering
history or chat response from chat API reponse
- Trigger rendering of khoj chat history if Khoj chat buffer not
created for this session yet
- Use org-insert-link method to improve link rendering robustness
Previous simple mechanism to crete org-links would result in links
escaping out of formating. Use a user-facing org-mode method to
remove/reduce probability of this
- Replace newlines with space to render reference notes as links
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
Use `-map-indexed' method from dash
Merge pull request #189 from debanjum/add-search-actor-to-improve-notes-lookup-for-chat
### Introduce Search Actor
Search actor infers Search Queries from user's message
- Capabilities
- Use previous messages to add context to current search queries[^1]
This improves quality of responses in multi-turn conversations.
- Deconstruct users message into multiple search queries to lookup notes[^2]
- Use relative date awareness to add date filters to search queries[^3]
- Chat Director now does the following:
1. [*NEW*] Use Search Actor to generate search queries from user's message
2. Retrieve relevant notes from Knowledge Base using the Search queries
3. Pass retrieved relevant notes to Chat Actor to respond to user
### Add Chat Quality Tests
- Test Search Actor capabilities
- Mark Chat Director Tests for Relative Date, Multiple Search Queries as Expected Pass
### Give More Search Results as Context to Chat Actor
- Loosen search results score threshold to work better for searches with date filters
- Pass more search results (up to 5 from 2) as context to Chat Actor to improve inference
[^1]: Multi-Turn Example
Q: "When did I go to Mars?"
Search: "When did I go to Mars?"
A: "You went to Mars in the future"
Q: "How was that experience?"
Search: "How my Mars experience?"
*This gives better context for the Chat actor to respond*
[^2]: Deconstruct Example:
Is Alpha older than Beta? => What is Alpha's age? & When was Beta born?
[^3]: Date Example:
Convert user messages containing relative dates like last month, yesterday to date filters on specific dates like dt>="2023-03-01"
- Reasons:
- GPT can extract date aware search queries with date filters
better than ChatGPT given the same prompt.
- Need quality more than cost savings for now.
- Need to figure ways to improve prompt for ChatGPT before using it
Update Search Actor prompt with answers, more precise primer and
two more examples for context
Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
extract complete questions
- This should improve search results quality
- Example Expected Inferred Questions from User Message using History:
1. "What is the name of Arun's daughter?"
=> "What is the name of Arun's daughter"
2. "Where does she study?" =>
=> "Where does Arun's daughter study?" OR
=> "Where does Arun's daughter, Reena study?"
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
Answer time range based questions
Limit search to specified timeframe in question using date filter
E.g "What national parks did I visit last year?" adds
dt>="2022-01-01" dt<"2023-01-01" to Khoj search
Note: Temperature set to 0. Message to search queries should be deterministic
- Remove stale message_to_prompt test
It is too broad, reduces maintainability.
Remove as it doesn't really need its own test right now
- Setting skipif at module level for chat actor, director tests
reduces code duplication as earlier was using decorator on each chat
test
Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
Combine hand-written custom notes and PG essays with personal
content to bulk up notes count
Delete old documentation markdown as not a representative dataset for
application (which is more tuned for personal notes)
- Chat directors are broad agents.
- Chat directors orchestrate narrow actor agents to synthesize
final response for the user
- Agents are Prompts + ML Model
- Test Chat Director Capabilities
1. [X] Answer from retrieved notes
2. [X] Answer from chat history
3. [X] Answer general questions
4. [X] Carry out multi-turn conversation
5. [X] Say don't know when answer not in provided context
6. [X] Answers that require current date awareness
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
7. [X] Date-aware aggregation across multiple different notes
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
8. [X] Ask clarification questions if no unambiguous answer in provided context
9. [X] Retrieve answer from chat history beyond lookback window
This test is expected to fail as the chat director is not capable
of searching chat history yet. But the test allows assessing chat quality
10. [X] Retrieve context for answer using multiple independent
searches on knowledge base
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
- Index markdown test data as knowledge base. As easier to get good
markdown content (vs org)
- Setup markdown_content_config, processor_config and chat_client to
test chat API
- Mark chat quality tests, register custom mark for chat quality
- Filter unhelpful deprecation warnings from within dateparser library
- Error if tests use unregistered marks
- Chat actors are narrow agents (prompt + ML model)
Chat actors are different from the Chat director. who orchestrates
the narrow actor agents to synthesize final response to the user
- Test Chat Actor Capabilities
1. Answer from retrieved notes
2. Answer from chat history
3. Answer general questions
4. Carry out multi-turn conversation
5. Say don't know when answer not in provided context
6. Answers that require current date awareness
7. Date-aware aggregation across multiple different notes
8. Ask clarification questions if no unambiguous answer in provided context
This test is expected to fail as the chat is not capable of doing
this consistently yet. But having the test allows assessing chat quality
- Use Openai API Key from OPENAI_API_KEY environment variable
- Gitignore .env file, python virtualenv directory
Put OpenAI API Key in .env file to run chatbot tests via vscode
The .env file is default location for importing env vars
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
answering
- Make GPT be more reliable in looking at past conversations for
forming response
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
- Previously was asking GPT to summarize the notes
- Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
- Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
- Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now
## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
- Allows Khoj to say don't know if it can't find answer to query from notes
- Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
- Updates version in khoj.el and Obsidian manifest, package, versions
json files under interface and project root
- Create and tag release commit with updated files
- Creates commit with post-release version upgrade in files
- Use flags to specify whether to create a release or post-release commit
GPT still mostly says I don't know when answer not in notes or chats
But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
- Chat uses compiled form of search results, not the raw entries to
provide context for chat. The compiled snipped search results
themselves are unique and using multiple of them for context from
the same raw note is fine if they cross the score and rank thresholds
This should improve the context provided for chat
- Also apply score_threshold, no deduplication to the answers API
- Issue
The file path separator by khoj server and the Obsidian vault were
different on Windows
- Fix
Normalize file path to use forward slash(/) to find the matching
note file in the Obsidian vault for jump to it
Resolves#177
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat
For now it is only exposed via API. Later it will be expose in the
interfaces as well
Remove ability to select different chat types from the chat web
interface as there is only a single chat type
Stop appending answers to the conversation logs
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context
- Make GPT provide reason when it can't answer the question. Gives
user context to tune their questions
- Set context by either including last 2 chat messages from active
session or past 2 conversation summaries from conversation logs
- Set personality in system message
- Place personality system message before last completed back & forth
This may stop ChatGPT forgetting its personality as conversation progresses given:
- The conditioning based on system role messages is light
- If system message is too far back in conversation history, the
model may forget its personality conditioning
- If system message at end of conversation, the model can think its
the start of a new conversation
- Inserting the system message before last completed back & forth should
prevent ChatGPT from assuming its the start of a new conversation
while not losing personality conditioning from the system message
- Simplfy the Khoj Chat API to for now just answer from users notes
instead of trying to infer other potential interaction types.
- This is the default expected behavior from the feature anyway
- Use the compiled text of the top 2 search results for context
- Benefits of using ChatGPT
- Better model
- 1/10th the price
- No hand rolled prompt required to make GPT provide more chatty,
assistant type responses
- Improve GPT prompt
- Make GPT answer users query based on provided notes instead
of summarizing the provided notes
- Make GPT be truthful using prompt and reduced temperature
- Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
fit for default consumption yet
Previous behavior was resulting in a null reference error. As key for
the core content/search type was not present in current config
Fallback to using default config for unconfigured core content type
instead
See #165 for details
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
take time
- Convert all other info logs to be only shown in verbose mode
- Text before headings was not being indexed due to buggy orgnode
parsing logic
- Resolved indexing intro text from files with and without headings in
them
- Ensure intro text node has heading set to all title lines collected
from the file
Resolves#165
- Test /config/types API when no plugin configured, only plugin configured
and no content configured scenarios
- Do not throw null reference exception while configuring search types
when no plugin configured
- Do not throw null reference exception on calling /config/types API
when no plugin configured
Resolves bug introduced by #173
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`
Bug:
- Unconfigured processor and search_types are instantiated as None in
self.current_config
- While creating the desktop GUI, these null configs are attempted to
be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors
Fix:
Use default config to create their GUI elements for unconfigured
search and processor types
Resolves#167
## Enable Creating Content Plugins
### Goal
Index, Search text content not supported by default in Khoj using plugins
### Code Changes
- fcbbe8c Configure content plugins to index using `khoj.yml`
- Index content plugins from standardized JSONL format for ingestion
- 55a032e Add jsonl processor to index plugin content
- ab0d3a0 Index configured plugins on app start and via update API endpoint
- Expose plugin content types for usage by interfaces
- 47b58a2 Dynamically update available types on loading the Khoj server
- Expose indexed types via API (9d38ead). Simplify getting enabled types in Web (f3f2438), Emacs (1e43f1a) interfaces
- Search plugin content from the Web and Emacs Interfaces
- d91c7e2 Search plugin content via the search API
- Render plugin content on Web (88344f9) and Emacs (c2814fc) interfaces
- The Web, Emacs interfaces are general interfaces, they allow searching across all content types
- The Obsidian interface is currently tuned for only markdown content
It will be extended to render more content plugins later
### Testing
- fcbbe8c Add unit tests to test reading plugin config from khoj.yml
- 55a032e Add unit tests for the `JsonlToJsonl` processor
- 88a9ead Add unit tests to validate search, incremental update, force-update API works with plugin content types
- b09350c Add unit test to validate only configure search types returned by the new /api/config/types API endpoint
- Manually test the config read, indexing, search and update with local khoj
- Previously was return all core content types even if they had not been
setup
- Add test to validate only configured content types are returned by
the api/config/types API endpoint
- Remove need for interfaces to downcase content types returned by API
before using the type in search and other API endpoint
- Fix to check for search_type.name in plugin keys instead of value
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
- Instead of building the package locally like before
The issue started since moving to dynamic git based versioning with hatch-vcs
This should reduce image size of docker builds too
- Also move to ubuntu image since pyqt6 builds available on it, so do
not need to build it locally for image
- This s
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
different errors in different environments (venv, system and pre-commit)
- Run mypy on git push (not every commit) but for all files
- Running it on pre-commit, doesn't make sense as mypy wants to look
at all files, not just diff files
- But this is too time consuming to run every commit, so run on push
- Update development section documentation on installing, manually
running pre-commit for validation that includes running mypy checks
- Why
- pyprojects.toml is the python standards compliant config format
- allows collating python tooling configs into single standard file
- hatch(-ling) is a new lightweight build system for python packages
- Detailed Changes
- Replace setup.py, setuptools with pyproject.toml, hatchling for
khoj python config and build
- move pytest into optional development dependencies
- add more links to khoj in the project urls section
- add topic classifiers and keywords to find khoj package
- Delete setup.py, MANIFEST.in as moved to pyproject.toml based setup
- Update pypi workflow to set python package version in pyproject.toml
- Use Rich to render uvicorn, fastAPI logs as well
The previous CustomFormatter only worked on khoj logs
- Improve rendering stacktrace on errors using Rich
- Use emoji's to improve visual indicator of action step
- Rename to pypi instead of the more ambiguous publish name
Publish could mean publish docker image, publish to pypi, MELPA or
Obsidian plugin
- Update workflow badge, link pypi badge to khoj pypi package page
- Use pypa official github action to upload package to (test) pypi
instead of doing it manually using twine
- Upload python package artifact for easier access for testing.
As uploading to testpypi doesn't work for PRs by others from forked repos
- What
- The Emacs and Obsidian interfaces stay in their original
directories under src/
- src/khoj now only contains code meant for pypi packaging
- Benefits
- This avoids having to update khoj MELPA, Obsidian plugin config as
the Emacs, Obsidian code is under their original directories
- It separates the code in src/khoj meant for python packaging from
code for external interfaces like Emacs and Obsidian
- Why
The khoj pypi packages should be installed in `khoj' directory.
Previously it was being installed into `src' directory, which is a
generic top level directory name that is discouraged from being used
- Changes
- move src/* to src/khoj/*
- update `setup.py' to `find_packages' in `src' instead of project root
- rename imports to form `from khoj.*' in complete project
- update `constants.web_directory' path to use `khoj' directory
- rename root logger to `khoj' in `main.py'
- fix image_search tests to use the newly rename `khoj' logger
- update config, docs, workflows to reference new path `src/khoj'
- By default the obsidian plugin automatically configures the khoj
backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
manually by toggling the auto-configure setting off in the khoj
plugin settings
Resolves#156
- setup.py best practise recommends only specifying core dependencies,
not dependencies of core dependencies in it
- Latest sentence-transformer (version 2.2.2) correctly installs its
huggingface_hub dependency. Else application fails to start
### Background
1. Obsidian stores markdown notes as `utf8`[1]
2. By default, the python `open` command uses the OS locale encoding[2]
### Issue
Based on above background, if the OS locale encoding isn't `utf8` it causes the `UnicodeDecodeError: <locale_encoding> codec can't decode byte` error
### Fix
- Read markdown files as `utf8`
The Obsidian plugin is the main use-case for markdown files in khoj currently and that stores md files as `utf8`.
Do not assume utf8 for other content types like org-mode, beancount for now.
- Fail if error in reading file as utf8, instead of ignoring errors.
Would rather have user realize that their files are not going to get indexed correctly.
[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
- Khoj supports indexing subdirectories but the khoj docker config
wasn't updated to support the same
- This should also allow khoj docker users to index multiple separate
directory trees by mounting them into separate sub folders within
/data/<content-type>/.
For e.g /data/org/dir1, /data/org/dir2 etc in khoj_docker.yml
- Background
1. Obsidian stores markdown notes as utf8[1]
2. By default, the python `open' command uses the OS locale encoding[2]
This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error
- Fix
- Read markdown files as utf8
The Obsidian plugin is the main use-case for markdown files in
khoj currently and that stores md files as utf8.
Do not assume utf8 for other content types like org-mode, beancount for now.
- Fail if error in reading file as utf8, instead of ignoring errors.
Would rather have user realize that their files are not going to
get indexed correctly.
[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
Khoj plugin page from within Obsidian isn't recognized. Seems like it
needs an uppercase readme file only. So it doesn't show the Khoj
readme from within Obsidian itself.
- Update khoj.el test to reflect updated rendering logic
- Move ledger render function before image rendered to group functions
with similar logic closer
### Details
- b415f87 Split find and jump to notes code in `onChooseSuggestion' method
- 37063f6 Truncate query to 8k chars for find similar notes from Obsidian plugin
- 4456cf5 No need to use `then' or `finally' in `async' functions after an `await'
- 4070be6 Pass app object from plugin instance to child objects and functions
- c203c6a Use Sentence case for Find similar mote Obsidian command name
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
Do not reference global app object from child objects and funcs
directly.
It is only available for debugging purposes and access to it maybe
dropped in the future.
- Use ERT to test `khoj.el'
- Test extracting and rendering of Org, Markdown and Ledger entries from Khoj API response
- Automate `khoj.el' testing using Github workflow
- Fix, Simplify and Test the get text around point code for the "Find Similar" feature
Previously no query syntax helpers, like the "file:" prefix, were used
before checking if query contains file path.
This made query to image search brittle to misinterpretation and
pointless checking
Add test to verify search by image at file works as expected
### Overview
Find items of specified type similar to current text item at point
### Capabilities
- Support querying with text surrounding point in any text buffer
- Find similar items of specified content type indexed on Khoj
### Details
- Query using text in current section if in a `outline-mode` buffer (i.e markdown heading, org-mode entry text)
- Query using text in current paragraph if in non `outline-mode` buffer
- Search for items of `content-type` set in khoj transient menu
- Update last used khoj content-type and results from the
*find-similar* and *update* functions for later reuse
### Related
- Recently added [Find Similar Notes in Khoj Obsidian](https://github.com/debanjum/khoj/pull/122) as well
- Support querying with text surrounding point in any text buffer
Previously could only find items similar to org entry at point
- Find similar items of specified content type indexed on khoj
Previously only looked for similar org entries indexed on khoj
Now uses the content-type configured in khoj transient menu to find
items of the specified content type
- Details
- Generalize the get-current-org-entry-text func to get text for any
outline section
- Replace leading whitespaces from query text as well
- Create method to get current paragraph text from non-outline mode
buffers
- Update transient, find-similar funcs to pass, use content-type
configured in khoj transient menu
- Generalize query title creation logic to remove markdown headings
prefix (#) apart from org heading prefix (*) as well
- Update last used khoj content-type and results from the
find-similar and update funcs for later reuse
- Jump to top of results buffer after results rendered
- 9f0bd0a Build `khoj.el' and Run `package-lint', `checkdoc' and other melpa package quality checks
- 48ad3c5 Use default content types if fail to call backend on `khoj.el` load
- 5f446b1 Convert `khoj' entry point method to transient.el menu for richer configuration
- 9d64a00 Allow updating khoj content index from within `khoj.el'
Enable searching for notes similar to the current note being viewed
## Main Changes
- 39a18e2 Extend search modal to search for similar notes
- Hide input field on init, Trigger search on opening modal when in similar notes mode
- Set input to contents of current markdown file and get notes similar to it
- Re-rank, by default, when searching for similar notes
- Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
- information keybindings, rerank keybinding at bottom of modal
- fixed top level headings in search results
- search results snipped if greater than N words
- Previously top level headings would have get stripped of the
space between heading text and the prefix # symbols. That is,
`# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results
- Add unit tests to text_to_jsonl processors to prevent regression
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
- Add Setup OpenAI API key in Khoj Section to Miscellaneous
Refer all mentions of setting up your OpenAI API key to that section
- Add Demo Screenshot of Chat with Notes
- Put existing Miscellaneous Section under Beta API sub heading
- Fix to make Access Khoj on Mobile a Subsection of Advanceed Usage
- Trigger refresh of github image cache by adding ? at end of image paths
- Convert Troubleshooting Issues into Headings instead of Bullets
Allows them to be linked to more easily. E.g when pointing folks to
it in github issues etc
- Add index corruption issue and fix to the Troubleshooting section
### Overview
- Provide a chat interface to engage with and inquire your notes
- Simplify interacting with the beta `chat` and `summarize` APIs
### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes
**Note**:
- **You will need to add an API key from OpenAI to your khoj.yml**
- **Your query and top note from search result will be sent to OpenAI for processing**
## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
- The previous mechanism to trigger saving on shutdown event did not work
- Use scheduler to persist chat sessions to disk at a 5 minute interval
- This improve time granularity, fixed interval of saving chat logs
- It may lose ~5 minutes of chat history until mechanism to also
write on shutdown found/resolved
- Create conversation directory if it doesn't exist before attempting write
- Reset chat_session after writing it to disk
- Wrap messages into speech bubbles
- Color messages by khoj blue, sender grey
- Add those standard protrusions to the speech bubbles for fun
- Align bubbles left or right based on sender
- messages by khoj are left aligned, message by self are right aligned
- Put message metadata like sender and time under speech bubble
- use data-* attribute and ::after css pseudo-selector for this
- Update renderMessage func to accept time param, remove unused type_ param
Not all notes are in the past. Notes can be about stuff in the future.
Casting them to past tense gives the impression that they've already
happened / been done.
- Changes
- Use blue color for khoj heading font
- This fixes the title color issue
- Update background to lighter shade
- This fixes the body text color issue
- Update colors for todo, done, miscellaneous todo state, tag color
- This does not fix the color contrast issue but seems like an acceptable solution
- Using white text rather than black text on blue background
better even though the black text on blue background passes the
WCAG acceptable contrast score
- For details see blog post:
https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/
- Add border to tags to give them tag pills look and differntiate
from todo states
- Buttons and inputs
- Change background color of input fields like type dropdown,
update button and results count counter, to match background
color of page
- Add shadow on hover over button, dropdowns
Resolves#111
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
Do not make the whole page scroll, just the chat logs body div
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj
Allow jumping to search result from khoj plugin modal on Android too
### Details
- 1c813a6 Convert *Results Count* setting to `Slider` from `Text` in plugin settings pane
- 4e1abd1 Disable `Update` button in plugin settings while indexing vault
- 513c86c Set index file paths relative to current or default path on Khoj backend
- 4407e23 Only index current vault on Khoj. Remove `ObsidianVaultPath` setting from plugin
- 86a1e43 Return HTTP Exception on */api/update* API call failure
- 5af2b68 Update plugin notifications for errors. Remove notification for success
Previous mechanism of manually triggering getSuggestions,
renderSuggestions flow was corrupting traversing and opening
reranked search results in KhojModal
Emulate event that would anyway trigger the get & render of results in
modal. This lets obsidian core handle the flow without digging too
deep into obsidian cores handling of the flow. Lowers the chance of
breakage
We need the index file paths to make sense on the khoj backend server
Having path of index on backend relative to current vault directory
on frontend ignores the fact that the frontend maybe on a different
machine than the khoj backend server
Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
- Overview
Limits using Khoj with a single vault at a time. This is
automatically configured to the most recently opened vault.
Once directory filters are supported on backend, the plugin will be
updated to index multiple vault but search only current vault from
current vaults khoj obsidian plugin
- Code Details
- Remove setting to configure Vault directory from Khoj Obsidian plugin
- Automatically configure Khoj to index only current Vault.
- Overwrites any previous vaults that were intended to be indexed by
Khoj backend
- Force update of index after configuring vault
- Why
It's not helpful for now and can lead to more problems, confusion.
Once directory filters
- Previously the backend was just throwing backend error.
The frontend calling the /update API wasn't getting notified
- Now the frontend can react appropriately and make the issue
visible to the user
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
of showing notification
Notices bubbles cluttered the UI while typing updates to settings
- Show notification once index updated via settings pane button click
There was no notification on index updated, which usually takes time
on the backend
### Search Modal Enhancements
- b52cd85 Allow Reranking results using Keybinding from Khoj Search Modal
- 580f4ac Add hints to Modal for available Keybindings
- da49ea2 Add placeholder text to modal in Khoj Obsidian plugin
### Handle Failure to Connect to Khoj Backend
Load plugin but warn on failure to connect to Khoj backend
- f046a95 Track connectedToBackend as a setting. Use it across obsidian plugin to:
- Disable command if not connected to backend
- Trigger warning notice on clicking Khoj ribbon if not connected to backend
- Show warning at top of Khoj Obsidian plugin settings pane
- 768e874 Load obsidian plugin even if fail to connect to backend but show warning
- Allows user to see reason for failure to try resolve it
- Allows user to update Khoj URL settings to point to URL of Khoj server
### Miscellaneous
- 7991ab7 Add button in Obsidian plugin settings to force re-indexing your vault
- Useful if index gets corrupted
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button
- Call saveData after configureKhojBackend to ensure
connnectedToBackend setting saved after being (potentially) updated
in configureKhojBackend function
- Previously the plugin would not load if cannot connect to Khoj backend
- Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
Use the timer context manager in all places where code was being timed
- Benefits
- Deduplicate timing code scattered across codebase.
- Provides single place to manage perf timing code
- Use consistent timing log patterns
The query method had become too big.
Extract out filter, score, sort and deduplicate logic used by
text_search.query into separate methods.
This should improve readabilty of code.
- Changes
- Fix method signatures of BaseFilter subclasses.
Else typing information isn't translating to them
- Explicitly pass `entries: list[Entry]' as arg to `load' method
- Fix type of `raw_entries' arg to `apply' method
to list[Entry] from list[str]
- Rename `raw_entries' arg to `apply' method to `entries'
- Fix `raw_query' arg used in `apply' method of subclasses to `query'
- Set type of entries, corpus_embeddings in TextSearchModel
- Verification
Ran `mypy --config-file .mypy.ini src' to verify typing
- `torch.Tensor' is apparently a legacy tensor constructor
- Using that to create tensor on MPS devices throws error:
RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed
- `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine
This is unlike the more general chat API that combines summarization
of top search result and conversing with the OpenAI model
This should give faster summary results. As no intent categorization
API call required
- Use latest davinci model for tests
- Wrap prompt in triple quotes to improve legibilty
- `understand' method returns dictionary instead of string. Fix its test
- Fix prompt for new model to pass `chat_with_history' test
- Default to using `text-davinci-003' if conversation model not
explicitly configured by user. Stop using the older `davinci' and
`davinci-instruct' models
- Use `model' instead of `engine' as parameter.
Usage of `engine' parameter in OpenAI API is deprecated
- 2fe37a0 Make type of encoder to use for embeddings configurable via `khoj.yml'
- Previously `encoder_type' was set in the setup code of search_type
- All *encoders* were of type `SentenceTransformer'
- All *cross_encoders* were of type `CrossEncoder'
- Now the `encoder_type' can be configured via the new `encoder_type' field
in `TextSearchConfig' under `search_type` in `khoj.yml'
- All the specified `encoder-type' class needs is an `encode' method
that takes entries and returns embedding vectors
- 826f9dc Drop long words from compiled entries to be within max token limit of models
Long words (>500 characters) provide less useful context to models.
Dropping very long words allow models to create better embeddings by
passing more of the useful context from the entry to the model
- c0ae8ee Allow using OpenAI models for search in Khoj
To use OpenAI models for search in Khoj, in `~/.khoj/khoj.yml'
1. Set `encoder' to name of an OpenAI model. E.g *text-embedding-ada-002*
2. Set `encoder-type' to *src.utils.models.OpenAI*
3. Set `model-directory` to *null*, as this is an online model and
cannot be stored on the file system
- Init processor before search to instantiate `openai_api_key'
from `khoj.yml'. The key is used to configure search with openai models
- To use OpenAI models for search in Khoj
- Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002
- Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI'
- Set `model-directory' to `null', as online model cannot be stored on disk
Long words (>500 characters) provide less useful context to models.
Dropping very long words allow models to create better embeddings by
passing more of the useful context from the entry to the model
- Previously `model_type' was set in the setup of each `search_type'
- All encoders were of type `SentenceTransformer'
- All cross_encoders were of type `CrossEncoder'
- Now `encoder-type' can be configured via the new `encoder_type' field
in `TextSearchConfig' under `search-type` in `khoj.yml`.
- All the specified `encoder-type' class needs is an `encode' method
that takes entries and returns embedding vectors
- Ensure all tensors are on MPS device before doing operations across them
- Background
- GPU is used by default for Khoj on MacOS now
- Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now
- MPS should speed up search and indexing on MacOS
Fix usage warning for unescaped single quote in `khoj.el' docstring.
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
⛔ Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
⛔ Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
2023-01-05 16:47:38 +08:00
742 changed files with 113097 additions and 7317 deletions
> To save time for both you and us, try follow these guidelines before submitting a new issue:
> 1. Check if there is an existing issue tracking your bug on our Github.
> 2. When unsure if your issue is an actual bug, first discuss it on a [Github discussion](https://github.com/khoj-ai/khoj/discussions/new?category=q-a) or the **#bugs** channel of our [Discord server](https://discord.gg/b6gUdpKr).
> These steps avoid opening issues which are duplicate or not actual bugs.
- type:"checkboxes"
id:"server"
attributes:
label:"Server"
description:"With which Khoj server are you experiencing the issue? (select at least one)"
options:
- label:"Cloud (https://app.khoj.dev)"
required:false
- label:"Self-Hosted Docker"
required:false
- label:"Self-Hosted Python package"
required:false
- label:"Self-Hosted source code"
required:false
validations:
required:true
- type:"checkboxes"
id:"clients"
attributes:
label:"Clients"
description:"With which Khoj client(s) are you experiencing the issue? (select at least one)"
options:
- label:"Web browser"
required:false
- label:"Desktop/mobile app"
required:false
- label:"Obsidian"
required:false
- label:"Emacs"
required:false
- label:"WhatsApp"
required:false
validations:
required:true
- type:"checkboxes"
id:"os"
attributes:
label:"OS"
description:"On which operating system do you experience the issue? (select at least one)"
options:
- label:"Windows"
required:false
- label:"macOS"
required:false
- label:"Linux"
required:false
- label:"Android"
required:false
- label:"iOS"
required:false
validations:
required:true
- type:"input"
id:"version"
attributes:
label:"Khoj version"
description:"Use `/help` command on the chat page to find the server version.\n
If using the cloud service, you can input, latest.\n
If self-hosting - please make sure to run the latest version of Khoj before reporting any issues as it may have already been fixed."
validations:
required:true
- type:"textarea"
id:"description"
attributes:
label:"Describe the bug"
description:"What is the problem? A clear and concise description of the bug."
validations:
required:true
- type:"textarea"
id:"current"
attributes:
label:"Current Behavior"
description:"What actually happened?\n\n
Please include full errors, uncaught exceptions, stack traces, screenshots and other relevant logs."
validations:
required:true
- type:"textarea"
id:"expected"
attributes:
label:"Expected Behavior"
description:"What did you expect to happen?"
validations:
required:true
- type:"textarea"
id:"reproduction"
attributes:
label:"Reproduction Steps"
description:"Detail the steps needed to reproduce the issue. This can include a self-contained, concise snippet of
code, if applicable.\n\n
For more complex issues, provide a link to a repository with the smallest sample that reproduces
the bug.\n
If the issue can be replicated without code, please provide a clear, step-by-step description of
the actions or conditions necessary to reproduce it. Any screenshots are also appreciated.\n
Avoid including business logic or unrelated details, as this makes diagnosis more difficult.\n\n
Whether it's a sequence of actions, code samples, or specific conditions, ensure that the steps
are clear enough to be easily followed and replicated."
validations:
required:true
- type:"textarea"
id:"workaround"
attributes:
label:"Possible Workaround"
description:"If you find any workaround for this problem - please, provide it here."
validations:
required:false
- type:"textarea"
id:"context"
attributes:
label:"Additional Information"
description:"Anything else that might be relevant for troubleshooting this bug.\n
Providing context helps us come up with a solution that is most useful in the real-world use case.\n
For example if self-hosting, provide environment details like OS versin, Docker version etc."
validations:
required:false
- type:"input"
id:"discussion_link"
attributes:
label:"Link to Discord or Github discussion"
description:"Provide a link to the first message of bug's discussion on Discord or Github.\n
This will help to keep history of why this bug exists."
description:"Use this template to request new feature or suggest an idea for Khoj"
labels:["upgrade"]
body:
- type:"markdown"
attributes:
value:|
> [!IMPORTANT]
> To save time for both you and us, try follow these guidelines before submitting a feature request:
> 1. Check if there is an existing feature request that is similar to your on our Github.
> 2. We encourage you to first discuss your idea on a [Github discussion](https://github.com/khoj-ai/khoj/discussions/categories/ideas) or the **#ideas** channel of our [Discord server](https://discord.gg/b6gUdpKr).
> This step helps in understanding the new feature and determining if it's can be implemented at all.
Only proceed with this report if your idea was approved after the GitHub/Discord discussion.
- type:"textarea"
id:"description"
attributes:
label:"Describe the feature"
description:"A clear and concise description of the feature you are proposing."
validations:
required:true
- type:"textarea"
id:"use-case"
attributes:
label:"Use Case"
description:"Why do you need this feature? Provide real world use cases, the more the better."
validations:
required:true
- type:"textarea"
id:"solution"
attributes:
label:"Proposed Solution"
description:"Suggest how to implement the new feature. Please include prototype/sketch/reference implementation."
validations:
required:false
- type:"textarea"
id:"additional_info"
attributes:
label:"Additional Information"
description:"Any additional information you would like to provide - links, screenshots, etc."
validations:
required:false
- type:"input"
id:"discussion_link"
attributes:
label:"Link to Discord or Github discussion"
description:"Provide a link to the first message of feature request's discussion on Discord or Github.\n
This will help to keep history of why this feature request exists."
* Start any message with `/research` to try out the experimental research mode with Khoj.
* Anyone can now [create custom agents](https://blog.khoj.dev/posts/create-agents-on-khoj/) with tunable personality, tools and knowledge bases.
* [Read](https://blog.khoj.dev/posts/evaluate-khoj-quality/) about Khoj's excellent performance on modern retrieval and reasoning benchmarks.
***
## Overview
[Khoj](https://khoj.dev) is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
- Chat with any local or online LLM (e.g llama3, qwen, gemma, mistral, gpt, claude, gemini, deepseek).
- Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
- Access it from your Browser, Obsidian, Emacs, Desktop, Phone or Whatsapp.
- Create agents with custom knowledge, persona, chat model and tools to take on any role.
- Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
- Find relevant docs quickly and easily using our advanced semantic search.
- Generate images, talk out loud, play your messages.
- Khoj is open-source, self-hostable. Always.
- Run it privately on [your computer](https://docs.khoj.dev/get-started/setup) or try it on our [cloud app](https://app.khoj.dev).
You can see the full feature list [here](https://docs.khoj.dev/category/features).
## Self-Host
To get started with self-hosting Khoj, [read the docs](https://docs.khoj.dev/get-started/setup).
## Enterprise
Khoj is available as a cloud service, on-premises, or as a hybrid solution. To learn more about Khoj Enterprise, [visit our website](https://khoj.dev/teams).
## Frequently Asked Questions (FAQ)
Q: Can I use Khoj without self-hosting?
Yes! You can use Khoj right away at [https://app.khoj.dev](https://app.khoj.dev) — no setup required.
Q: What kinds of documents can Khoj read?
Khoj supports a wide variety: PDFs, Markdown, Notion, Word docs, org-mode files, and more.
Q: How can I make my own agent?
Check out [this blog post](https://blog.khoj.dev/posts/create-agents-on-khoj/) for a step-by-step guide to custom agents.
For more questions, head over to our [Discord](https://discord.gg/BDgyabRM6e)!
Khoj is open source. It is sustained by the community and we’d love for you to join it! Whether you’re a coder, designer, writer, or enthusiast, there’s a place for you.
Why Contribute?
- Make an Impact: Help build, test and improve a tool used by thousands to boost productivity.
- Learn & Grow: Work on cutting-edge AI, LLMs, and semantic search technologies.
You can help us build new features, improve the project documentation, report issues and fix bugs. If you're a developer, please see our [Contributing Guidelines](https://docs.khoj.dev/contributing/development) and check out [good first issues](https://github.com/khoj-ai/khoj/contribute) to work on.
*A natural language search engine for your personal notes, transactions and images*
## Table of Contents
- [Features](#Features)
- [Demos](#Demos)
- [Khoj in Obsidian](#khoj-in-obsidian)
- [Khoj in Emacs, Browser](#khoj-in-emacs-browser)
- [Interfaces](#Interfaces)
- [Architecture](#Architecture)
- [Setup](#Setup)
- [Install](#1-Install)
- [Configure](#2-Configure)
- [Run](#3-Run)
- [Use](#Use)
- [Interfaces](#Interfaces-1)
- [Query Filters](#Query-filters)
- [Upgrade](#Upgrade)
- [Khoj Server](#upgrade-khoj-server)
- [Khoj.el](#upgrade-khoj-on-emacs)
- [Khoj Obsidian](#upgrade-khoj-on-obsidian)
- [Troubleshoot](#Troubleshoot)
- [Advanced Usage](#advanced-usage)
- [Access Khoj on Mobile](#access-khoj-on-mobile)
- [Miscellaneous](#Miscellaneous)
- [Performance](#Performance)
- [Query Performance](#Query-performance)
- [Indexing Performance](#Indexing-performance)
- [Miscellaneous](#Miscellaneous-1)
- [Development](#Development)
- [Setup](#Setup)
- [Using Pip](#Using-Pip)
- [Using Docker](#Using-Docker)
- [Using Conda](#Test)
- [Test](#Test)
- [Credits](#Credits)
## Features
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search, indexing is done on your machine[\*](https://github.com/debanjum/khoj#miscellaneous)
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Search your Org-mode and Markdown notes, Beancount transactions and Photos
- **Multiple Interfaces**: Search using a [Web Browser](./src/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or the [API](http://localhost:8000/docs)
- Add this readme and [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs) as org-mode for Khoj to index
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation)
</details>
<details><summary>Analysis</summary>
- The results do not have any words used in the query
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter
These are the general setup instructions for Khoj.
Check the [Khoj Obsidian Readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#Setup) to setup Khoj with the Obsidian Plugin. Its simpler as it can skip the configure step below.
### 1. Install
```shell
pip install khoj-assistant
```
### 2. Start App
```shell
khoj
```
### 3. Configure
1. Enable content types and point to files to search in the First Run Screen that pops up on app start
2. Click `Configure` and wait. The app will download ML models and index the content for search
## Use
### Interfaces
- **Khoj via Obsidian**
- [Install](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) the Khoj Obsidian plugin
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- Open <http://localhost:8000/> via desktop interface or directly
- **Khoj via API**
- See the Khoj FastAPI [Swagger Docs](http://localhost:8000/docs), [ReDocs](http://localhost:8000/redocs)
### Query Filters
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
- Entries that contain term_to_include: `+"term_to_include"`
- Entries that contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
- Entries from April 1st 1984: `dt:"1984-04-01"`
- Entries after March 31st 1984: `dt>="1984-04-01"`
- Entries before April 2nd 1984 : `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
- Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
-`what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
- Adds all filters to the natural language query. It should return entries
- from the file *1984.org*
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
## Upgrade
### Upgrade Khoj Server
```shell
pip install --upgrade khoj-assistant
```
### Upgrade Khoj on Emacs
- Use your Emacs Package Manager to Upgrade
- See [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade) for details
### Upgrade Khoj on Obsidian
- Upgrade via the Community plugins tab on the settings pane in the Obsidian app
- See the [khoj plugin readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) for details
## Troubleshoot
- Symptom: Errors out complaining about Tensors mismatch, null etc
- Mitigation: Disable `image` search using the desktop GUI
- Symptom: Errors out with \"Killed\" in error message in Docker
- Fix: Increase RAM available to Docker Containers in Docker Settings
- Refer: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)
# Advanced Usage
## Access Khoj on Mobile
1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](tailscale.com/) on your personal server and phone
3. Open the Khoj web interface of the server from your phone browser. It should be `http://tailscale-url-of-server:8000` or `http://name-of-server:8000` if you've setup [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
4. Click the [Install/Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
5. Enjoy exploring your notes, transactions and images from your phone!
- The beta [chat](http://localhost:8000/api/beta/chat) and [search](http://localhost:8000/api/beta/search) API endpoints use [OpenAI API](https://openai.com/api/)
- It is disabled by default
- To use it add your `openai-api-key` via the app configure screen
- Warning: *If you use the above beta APIs, your query and top result(s) will be sent to OpenAI for processing*
## Performance
### Query performance
- Semantic search using the bi-encoder is fairly fast at \<50 ms
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
python3 -m pip install pyqt6 # As conda does not support pyqt6 yet
```
##### 3. Configure
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image``content-type` section
- Delete `content-type`, `processor` sub-sections irrelevant for your use-case
##### 4. Run
```shell
python3 -m src.main -vv
```
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
##### 5. Upgrade
```shell
cd khoj
git pull origin master
conda deactivate khoj
conda env update -f config/environment.yml
conda activate khoj
```
### Test
```shell
pytest
```
## Credits
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
# Set KHOJ_OPERATOR_ENABLED=True in the server service environment variable to enable.
computer:
container_name:khoj-computer
image:ghcr.io/khoj-ai/khoj-computer:latest
# build:
# context: .
# dockerfile: computer.Dockerfile
ports:
- "5900:5900"
volumes:
- khoj_computer:/home/operator
server:
image:ghcr.io/debanjum/khoj:latest
depends_on:
database:
condition:service_healthy
# Use the following line to use the latest version of khoj. Otherwise, it will build from source. Set this to ghcr.io/khoj-ai/khoj-cloud:latest if you want to use the prod image.
image:ghcr.io/khoj-ai/khoj:latest
# Uncomment the following line to build from source. This will take a few minutes. Comment the next two lines out if you want to use the official image.
# build:
# context: .
ports:
# If changing the local port (left hand side), no other changes required.
# If changing the remote port (right hand side),
# change the port in the args in the build section,
# If changing the remote port (right hand side),
# change the port in the args in the build section,
# as well as the port in the command section to match
- "8000:8000"
- "42110:42110"
extra_hosts:
- "host.docker.internal:host-gateway"
working_dir:/app
volumes:
- .:/app
# These mounted volumes hold the raw data that should be indexed for search.
# The path in your local directory (left hand side)
# points to the files you want to index.
# The path of the mounted directory (right hand side),
# must match the path prefix in your config file.
- ./tests/data/org/:/data/org/
- ./tests/data/images/:/data/images/
- ./tests/data/ledger/:/data/ledger/
- ./tests/data/music/:/data/music/
- ./tests/data/markdown/:/data/markdown/
# Embeddings and models are populated after the first run
# You can set these volumes to point to empty directories on host
# Default URL of Terrarium, the default Python sandbox used by Khoj to run code. Its container is specified above
- KHOJ_TERRARIUM_URL=http://sandbox:8080
# Uncomment line below to have Khoj run code in remote E2B code sandbox instead of the self-hosted Terrarium sandbox above. Get your E2B API key from https://e2b.dev/.
# - E2B_API_KEY=your_e2b_api_key
# Default URL of SearxNG, the default web search engine used by Khoj. Its container is specified above
- KHOJ_SEARXNG_URL=http://search:8080
# Uncomment line below to use with Ollama running on your local machine at localhost:11434.
# Change URL to use with other OpenAI API compatible providers like VLLM, LMStudio etc.
> Describes the Khoj settings configurable via the admin panel
By default, your admin panel is available at `http://localhost:42110/server/admin/`. You can access the admin panel by logging in with your admin credentials (this would be your `KHOJ_ADMIN_EMAIL` and `KHOJ_ADMIN_PASSWORD`). The admin panel allows you to configure various settings for your Khoj server.
## App Settings
### Agents
Add all the agents you want to use for your different use-cases like Writer, Researcher, Therapist etc.
-`Personality`: This is a prompt to tell the chat model how to tune the personality of the agent.
-`Chat model`: The chat model to use for the agent.
-`Name`: The name of the agent. This field helps give the agent a unique identity across the app.
-`Avatar`: Url to the agents profile picture. It helps give the agent a unique visual identity across the app.
-`Style color`, `Style icon`: These fields help give the agent a unique, visually identifiable identity across the app.
-`Slug`: This is the agent name to use in urls.
-`Public`: Check this if the agent is expected to be visible to all users on this Khoj server.
-`Managed by admin`: Check this if the agent is managed by admin, not by any user.
-`Creator`: The user who created the agent.
-`Tools`: The list of tools available to this agent. Tools include notes, image, online. This field is not currently configurable and only supports all tools (i.e `["*"]`)
### Chat Model Options
Add all the chat models you want to try, use and switch between for your different use-cases. For each chat model you add:
-`Chat model`: The name of an [OpenAI](https://platform.openai.com/docs/models), [Anthropic](https://docs.anthropic.com/en/docs/about-claude/models#model-names), [Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models) or [Offline](https://huggingface.co/models?pipeline_tag=text-generation&library=gguf) chat model.
-`Model type`: The chat model provider like `OpenAI`, `Offline`.
-`Vision enabled`: Set to `true` if your model supports vision. This is currently only supported for vision capable OpenAI models like `gpt-4o`
-`Max prompt size`, `Subscribed max prompt size`: These are optional fields. They are used to truncate the context to the maximum context size that can be passed to the model. This can help with accuracy and cost-saving.<br/>
-`Tokenizer`: This is an optional field. It is used to accurately count tokens and truncate context passed to the chat model to stay within the models max prompt size.

### Server Chat Settings
The server chat settings are used as:
1. The default chat models for subscribed (`Advanced` field) and unsubscribed (`Default` field) users.
2. The chat model for all intermediate steps like intent detection, web search etc. during chat response generation.
If a server chat setting is not added the first ChatModelOption in your config is used as the default chat model.
To add a server chat setting:
- Set your preferred default chat models in the `Default` fields of your [ServerChatSettings](http://localhost:42110/server/admin/database/serverchatsettings/)
- The `Advanced` field doesn't need to be set when self-hosting. When unset, the `Default` chat model is used for all users and the intermediate steps.
### AI Model API
These settings configure APIs to interact with AI models.
For each AI Model API you [add](http://localhost:42110/server/admin/database/aimodelapi/add):
-`Api key`: Set to your [OpenAI](https://platform.openai.com/api-keys), [Anthropic](https://console.anthropic.com/account/keys) or [Gemini](https://aistudio.google.com/app/apikey) API keys.
-`Name`: Give the configuration any friendly name like `OpenAI`, `Gemini`, `Anthropic`.
-`Api base url`: Set the API base URL. This is only relevant to set if you're using another OpenAI-compatible proxy server like [Ollama](/advanced/ollama) or [LMStudio](/advanced/lmstudio).

### Search Model Configs
Search models are used to generate vector embeddings of your documents for natural language search and chat. You can choose any [embeddings models on HuggingFace](https://huggingface.co/models?pipeline_tag=sentence-similarity) to create vector embeddings of your documents for natural language search and chat.
<imgsrc="/img/example_search_model_admin_settings.png"alt="Example Search Model Settings"style={{width:500}}/>
### Text to Image Model Options
Add text to image generation models with these settings. Khoj currently supports text to image models available via OpenAI, Stability or Replicate API
-`api-key`: Set to your OpenAI, Stability or Replicate API key
-`model`: Set the model name available over the selected model provider
-`model-type`: Set to the appropriate model provider
-`openai-config`: For image generation models available via OpenAI (compatible) API you can set the appropriate OpenAI Processor Conversation Settings instead of specifying the `api-key` field above
### Speech to Text Model Options
Add speech to text models with these settings. Khoj currently only supports whisper speech to text model via OpenAI API or Offline
### Voice Model Options
Add text to speech models with these settings. Khoj currently supports models from [ElevenLabs](https://elevenlabs.io/).
### Reflective Questions
This is a static list of starter question suggestions for each user. It is not currently used in any client app. It used to be shown on the web app home page. We may turn it into a dynamic list of starter questions personalized to each users, say based on their recent conversations or synced knowledge base.
## User Data
- Users, Entrys, Conversations, Subscriptions, Github configs, Notion configs, User search configs, User conversation configs, User voice configs
## Miscellaneous Data
- Process Locks: Persistent Locks for Automations
- Client Applications:
Client applications allow you to setup third party applications that can query your Khoj server using a client application ID + secret. The secret would go in a bearer token.
By default, most of the instructions for self-hosting Khoj assume a single user, and so the default configuration is to run in anonymous mode. However, if you want to enable authentication, you can do so either with with [Magic Links](#using-magic-links) or [Google OAuth](#using-google-oauth) as shown below. This can be helpful to make Khoj securely accessible to you and your team.
:::tip[Note]
Remove the `--anonymous-mode` flag from your khoj start up command or docker-compose file to enable authentication.
:::
## Using Magic Links
The most secure way to do this is to integrate with [Resend](https://resend.com).
1. Setup your account at https://resend.com
2. Set an environment variable for `RESEND_API_KEY`. You can get your API key [here](https://resend.com/api-keys).
3. Set an environment variable for `RESEND_EMAIL`. This is the email address that will show up in your `from` field when sending magic links.
This will allow you to automatically send sign-in links to users who want to log in.
It's still possible to use the magic links feature without Resend, but you'll need to manually send the magic links to users who want to log in.
## Manually sending magic links
1. The user will have to enter their email address in the login page at http://localhost:42110/login.
They'll click `Get Login Link`. Without the Resend API key, this will just create an unverified account for them in the backend
<img src="/img/magic_link.png" alt="Magic link login form" width="400"/>
2. You can get their magic link using the admin panel
Go to the [admin panel](http://localhost:42110/server/admin/database/khojuser/). You'll see a list of users. Search for the user you want to send a magic link to. Tick the checkbox next to their row, and use the action drop down at the top to 'Get email login URL'. This will generate a magic link that you can send to the user, which will appear at the top of the admin interface.
| Get email login URL | Retrieved login URL |
|---------------------|---------------------|
| <img src="/img/admin_get_emali_login.png" alt="Get user magic sign in link" width="400" />| <img src="/img/admin_successful_login_url.png" alt="Successfully retrieved a login URL" width="400" />|
3. Send the magic link to the user. They can click on it to log in.
Once they click on the link, they'll automatically be logged in. They'll have to repeat this process for every new device they want to log in from, but they shouldn't have to repeat it on the same device.
A given magic link can only be used once. If the user tries to use it again, they'll be redirected to the login page to get a new magic link.
## Using Google OAuth
For this method, you'll need to use the prod version of the Khoj package. You can install it as below:
<Tabs groupId="server" queryString>
<TabItem value="docker" label="Docker">
Update your `docker-compose.yml` to use the prod image
```bash
image: ghcr.io/khoj-ai/khoj-cloud:latest
```
</TabItem>
<TabItem value="pip" label="Pip">
```bash
pip install khoj[prod]
```
</TabItem>
</Tabs>
To set up your self-hosted Khoj with Google Auth, you need to create a project in the Google Cloud Console and enable the Google Auth API.
To implement this, you'll need to:
1. [Create authorization credentials](https://developers.google.com/identity/sign-in/web/sign-in) for your application.
2. Open your [Google cloud console](https://console.developers.google.com/apis/credentials) and create a configuration like below for the relevant `OAuth 2.0 Client IDs` project:
3. Configure these environment variables: `GOOGLE_CLIENT_SECRET`, and `GOOGLE_CLIENT_ID`. You can find these values in the Google cloud console, in the same place where you configured the authorized origins and redirect URIs.
That's it! That should be all you have to do. Now, when you reload Khoj without `--anonymous-mode`, you should be able to use your Google account to sign in.
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you can directly use any of the pre-configured AI models.
:::
Khoj can use Google's Gemini and Anthropic's Claude family of AI models from [Vertex AI](https://cloud.google.com/vertex-ai) on Google Cloud. Explore Anthropic and Gemini AI models available on Vertex AI's [Model Garden](https://console.cloud.google.com/vertex-ai/model-garden).
## Setup
1. Follow [these instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#before_you_begin) to use models on GCP Vertex AI.
3. Create a new [API Model API](http://localhost:42110/server/admin/database/aimodelapi/add) on your Khoj admin panel.
- **Name**: `Google Vertex` (or whatever friendly name you prefer).
- **Api Key**: `base64 encoded json keyfile` from step 2.
- **Api Base Url**: `https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`
- MODEL_GCP_REGION: A region the AI model is available in. For example `us-east5` works for [Claude](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#regions).
- YOUR_GCP_PROJECT_ID: Get your project id from the [Google cloud dashboard](https://console.cloud.google.com/home/dashboard)
4. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
- **Name**: `claude-3-7-sonnet@20250219`. Any Claude or Gemini model on Vertex's Model Garden should work.
- **Model Type**: `Anthropic` or `Google`
- **Ai Model API**: *the Google Vertex Ai Model API you created in step 3*
- **Max prompt size**: `60000` (replace with the max prompt size of your model)
- **Tokenizer**: *Do not set*
5. Select the chat model on [your settings page](http://localhost:42110/settings) and start a conversation.
## Troubleshooting & gcp AI Tips
- Permission Denied?
Ensure your service account has the `Vertex AI User` role and that the API is enabled in your GCP project.
- Region Errors?
Double-check that the model you're trying to use is supported in your selected region. Some Claude or Gemini models are restricted to specific zones like `us-east5` or `us-central1`.
- Prompt Size Limitations
The "Max prompt size" should align with the limits defined in the model documentation. Exceeding it can silently fail or truncate inputs.
- Testing the API Key
Before adding it to Khoj, you can verify that your key works by making a simple curl request to Vertex AI. This helps debug auth issues early.
- Use Environment Variables
For better security, consider using environment variables to manage sensitive keys and inject them at runtime during base64 encoding.
If you encounter any issues, the [Khoj Discord](https://discord.gg/BDgyabRM6e) is a great place to ask for help!
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you're limited to our first-party models.
:::
:::info
Khoj natively supports local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). Using an OpenAI API proxy with Khoj maybe useful for ease of setup, trying new models or using commercial LLMs via API.
:::
[LiteLLM](https://docs.litellm.ai/docs/proxy/quick_start) exposes an OpenAI compatible API that proxies requests to other LLM API services. This provides a standardized API to interact with both open-source and commercial LLMs.
Using LiteLLM with Khoj makes it possible to turn any LLM behind an API into your personal AI agent.
## Setup
1. Install LiteLLM
```bash
pip install litellm[proxy]
```
2. Start LiteLLM and use Mistral tiny via Mistral API
Khoj does not work with LM Studio anymore. Khoj leverages [json mode](https://platform.openai.com/docs/guides/structured-outputs#json-mode) extensively but LMStudio's API seems to have dropped support for json mode. [1](https://x.com/lmstudio/status/1770135858709975547), [2](https://lmstudio.ai/docs/api/structured-output)
:::
:::info
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you're limited to our first-party models.
:::
:::info
Khoj natively supports local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). Using an OpenAI API proxy with Khoj maybe useful for ease of setup, trying new models or using commercial LLMs via API.
:::
[LM Studio](https://lmstudio.ai/) is a desktop app to chat with open-source LLMs on your local machine. LM Studio provides a neat interface for folks comfortable with a GUI.
LM Studio can expose an [OpenAI API compatible server](https://lmstudio.ai/docs/local-server). This makes it possible to turn chat models from LM Studio into your personal AI agents with Khoj.
## Setup
1. Install [LM Studio](https://lmstudio.ai/) and download your preferred Chat Model
2. Go to the Server Tab on LM Studio, Select your preferred Chat Model and Click the green Start Server button
3. Create a new [AI Model API](http://localhost:42110/server/admin/database/aimodelapi/add/) on your Khoj admin panel
- **Name**: `lmstudio`
- **Api Key**: `any string`
- **Api Base Url**: `http://localhost:1234/v1/` (default for LMStudio)
4. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
- **Name**: `llama3.1` (replace with the name of your local model)
- **Model Type**: `Openai`
- **Ai Model Api**: *the lmstudio Ai Model Api you created in step 3*
- **Max prompt size**: `20000` (replace with the max prompt size of your model)
- **Tokenizer**: *Do not set for OpenAI, mistral, llama3 based models*
5. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you can use our first-party supported models.
:::
:::info
Khoj can directly run local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). The integration with Ollama is useful to run Khoj on Docker and have the chat models use your GPU or to try new models via CLI.
:::
Ollama allows you to run [many popular open-source LLMs](https://ollama.com/library) locally from your terminal.
For folks comfortable with the terminal, Ollama's terminal based flows can ease setup and management of chat models.
Ollama exposes a local [OpenAI API compatible server](https://github.com/ollama/ollama/blob/main/docs/openai.md#models). This makes it possible to use chat models from Ollama with Khoj.
## Setup
:::info
Restart your Khoj server after first run or update to the settings below to ensure all settings are applied correctly.
:::
<Tabs groupId="type" queryString>
<TabItem value="first-run" label="First Run">
<Tabs groupId="server" queryString>
<TabItem value="docker" label="Docker">
1. Setup Ollama: https://ollama.com/
2. Download your preferred chat model with Ollama. For example,
```bash
ollama pull llama3.1
```
3. Uncomment `OPENAI_BASE_URL` environment variable in your downloaded Khoj [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml#:~:text=OPENAI_BASE_URL)
4. Start Khoj docker for the first time to automatically integrate and load models from the Ollama running on your host machine
```bash
# run below command in the directory where you downloaded the Khoj docker-compose.yml
docker-compose up
```
</TabItem>
<TabItem value="pip" label="Pip">
1. Setup Ollama: https://ollama.com/
2. Download your preferred chat model with Ollama. For example,
```bash
ollama pull llama3.1
```
3. Set `OPENAI_BASE_URL` environment variable to `http://localhost:11434/v1/` in your shell before starting Khoj for the first time
- Open access to the Khoj port (default: 42110) from your OS and Network firewall.
:::warning[Use HTTPS certificate]
To expose Khoj on a custom domain over the public internet, use of an SSL certificate is strongly recommended. You can use [Let's Encrypt](https://letsencrypt.org/) to get a free SSL certificate for your domain.
To disable HTTPS, set the `KHOJ_NO_HTTPS` environment variable to `True`. This can be useful if Khoj is only accessible behind a secure, private network.
:::
:::info[Try Tailscale]
You can use [Tailscale](https://tailscale.com/) for easy, secure access to your self-hosted Khoj over the network.
1. Set `KHOJ_DOMAIN` to your machines [tailscale ip](https://tailscale.com/kb/1452/connect-to-devices#identify-your-devices) or [fqdn on tailnet](https://tailscale.com/kb/1081/magicdns#fully-qualified-domain-names-vs-machine-names). E.g `KHOJ_DOMAIN=100.4.2.0` or `KHOJ_DOMAIN=khoj.tailfe8c.ts.net`
2. Access Khoj by opening `http://tailscale-ip-of-server:42110` or `http://fqdn-of-server:42110` from any device on your tailscale network
Khoj uses an embedding model to understand documents. Multilingual embedding models improve the search quality for documents not in English. This affects both search and chat with docs experiences across Khoj.
To improve search and chat quality for non-english documents you can use a [multilingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br/>
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has decent search quality and speed for a consumer machine.
To use it:
1. Open [the search config](http://localhost:42110/server/admin/database/searchmodelconfig/) on your server's admin settings page. Either create a new search model, if none exists, or update the existing one. For example,
- Set the `bi_encoder` field to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
- Set the `cross_encoder` field to `mixedbread-ai/mxbai-rerank-xsmall-v1`
2. Regenerate your content index from all the relevant clients. This step is very important, as you'll need to re-encode all your content with the new model.
:::info[Note]
Modern search/embedding model like [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) expect a prefix to the query (or docs) string to improve encoding. Update the `bi_encoder_query_encode_config` field of your [embedding model](http://localhost:42110/server/admin/database/searchmodelconfig/) with `{prompt: <prefix-prompt>}` to improve the search quality of these models.
E.g. `{prompt: "Represent this query for searching documents"}`. You can pass any valid JSON object that the SentenceTransformer `encode` function accepts
This is only helpful for secure cross-device access to **self-hosted** Khoj. You **do not** need this if you're using [Khoj Cloud](https://app.khoj.dev).
:::
[Tailscale](https://tailscale.com) simplifies creating a private VPN using [Wireguard](https://www.wireguard.com/) and OAuth. So you can host and access services on your devices from anywhere.
The instructions below are one way to simply and securely access your self-hosted Khoj from your phone, laptop etc.
### Minimal Setup
1. Setup khoj on your preferred machine following the [standard steps](/get-started/setup)
2. Sign-up to [Tailscale](https://tailscale.com) and install the app on machines you want to access Khoj from. This usually includes your khoj server, your phone, laptop. Note the tailscale i.p of your khoj server.
3. Start khoj on your server by including the flag `--host <your_server_tailscale_ip>`
4. Open `http://<your_server_tailscale_ip>:42110` to access khoj from any device on your tailscale network!
### HTTPS Certificate
:::info
Tailscale uses Wireguard to encrypt and route traffic between your machines. So HTTPS isn't required with Tailscale for secure access. HTTPS with Tailscale is only useful for browsers to not complain about security and block certain features like clipboard access unless HTTPS is enabled.
:::
1. Enable [MagicDNS](https://tailscale.com/kb/1081/magicdns#enabling-magicdns) and [HTTPS](https://tailscale.com/kb/1153/enabling-https) toggle on your tailscale admin console [DNS](https://login.tailscale.com/admin/dns) page. Note your unique tailscale domain name (usually ends with .ts.net)
2. Create an https certificate for your Khoj server by running the following command:
```bash
# Assuming the server is named, `server` and your tailnet is `black-forest.ts.net`
# Note path of the .crt and .key files generated
tailscale cert server.black-forest.ts.net
```
3. Start khoj to be served via https on standard port
```bash
sudo KHOJ_DOMAIN=server.black-forest.ts.net \
khoj \
--sslcert /path/to/your/tailscale.crt \
--sslkey path/to/your/tailscale.key \
--host=server.black-forest.ts.net \
--port 443
```
4. You should now be able to access khoj on `https://server.black-forest.ts.net` from any device on your private tailscale network!
This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you're limited to our first-party models.
:::
:::info
Khoj natively supports local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). Using an OpenAI API proxy with Khoj maybe useful for ease of setup, trying new models or using commercial LLMs via API.
:::
Khoj can use any OpenAI API compatible server including local providers like [Ollama](/advanced/ollama), [LMStudio](/advanced/lmstudio) and [LiteLLM](/advanced/litellm) and commercial providers like [HuggingFace](https://huggingface.co/docs/api-inference/tasks/chat-completion#using-the-api), [OpenRouter](https://openrouter.ai/docs/quick-start) etc.
Configuring this allows you to use non-standard, open or commercial, local or hosted LLM models for Khoj.
Combine them with Khoj can turn your favorite LLM into an AI agent. Allowing you to chat with your docs, find answers from the internet, build custom agents and run automations.
For specific integrations, see our [Ollama](/advanced/ollama), [LMStudio](/advanced/lmstudio) and [LiteLLM](/advanced/litellm) setup docs. For general instructions to setup Khoj with an OpenAI API proxy see below.
## General Setup
1. Start your preferred OpenAI API compatible app locally or get API keys from commercial AI model providers.
3. Create a new [API Model API](http://localhost:42110/server/admin/database/aimodelapi/add) on your Khoj admin panel
- **Name**: `any name`
- **Api Key**: `any string`
- **Api Base Url**: *The URL of your Openai Compatible API*
3. Create a new [Chat Model](http://localhost:42110/server/admin/database/chatmodel/add) on your Khoj admin panel.
- **Name**: `llama3` (replace with the name of your local model)
- **Model Type**: `Openai`
- **Ai Model Api**: *The AI Model API you created in step 2*
- **Max prompt size**: `2000` (replace with the max prompt size of your model)
- **Tokenizer**: *Do not set for OpenAI, mistral, llama3 based models*
4. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
> Upload your knowledge base to Khoj and chat with your whole corpus
## Companion App
Share your files, folders with Khoj using the app.
Khoj will keep these files in sync to provide contextual responses when you search or chat.
## Setup
:::info[Self Hosting]
If you are self-hosting the Khoj server, update the *Settings* page on the Khoj Desktop app to:
- Set the `Khoj URL` field to your Khoj server URL. By default, use `http://127.0.0.1:42110`.
- Do not set the `Khoj API Key` field if your Khoj server runs in anonymous mode. For example, `khoj --anonymous-mode`
:::
1. Install the [Khoj Desktop app](https://khoj.dev/downloads) for your OS
2. Generate an API key on the [Khoj Web App](https://app.khoj.dev/settings#clients)
3. Set your Khoj API Key on the *Settings* page of the Khoj Desktop app
4. [Optional] Add any files, folders you'd like Khoj to be aware of on the *Settings* page and Click *Save*.
These files and folders will be automatically kept in sync for you
# Main App
You can also install the Khoj application on your desktop as a progressive web app.
1. Open the [Khoj Web App](https://app.khoj.dev) in Chrome.
2. Click on the install button in the address bar to install the app. You must be logged into your Chrome browser for this to work.

Alternatively, you can also install using this route:
1. Open the three-dot menu in the top right corner of the browser.
2. Go to 'Cast, Save, and Share' option.
3. Click on the "Open in Khoj" option.

Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.