Compare commits


1253 Commits
0.3.0 ... 1.4.0

Author SHA1 Message Date
Debanjum Singh Solanky
17107a0337 Release Khoj version 1.4.0 2024-01-23 10:18:31 +05:30
Debanjum Singh Solanky
f69eafe95a Update Readme with updated capabilties 2024-01-23 09:56:01 +05:30
sabaimran
679db51453 Add support for phone number authentication with Khoj (part 2) (#621)
* Allow users to configure phone numbers with the Khoj server

* Integration of API endpoint for updating phone number

* Add phone number association and OTP via Twilio for users connecting to WhatsApp

- When verified, store the result as such in the KhojUser object

* Add a Whatsapp.svg for configuring phone number

* Change setup hint depending on whether the user has a number already connected or not

* Add an integrity check for the intl tel js dependency

* Customize the UI based on whether the user has verified their phone number

- Update API routes to make nomenclature for phone addition and verification more straightforward (just /config/phone, etc).
- If user has not verified, prompt them for another verification code (if verification is enabled) in the configuration page

* Use the verified filter only if the user is linked to an account with an email

* Add some basic documentation for using the WhatsApp client with Khoj

* Point help text to the docs, rather than landing page info

* Update messages on various callbacks and add link to docs page to learn more about the integration
2024-01-22 18:14:58 -08:00
sabaimran
58bf917775 Update the font used across Khoj desktop and web to be Tajawal (#622) 2024-01-20 23:13:33 +05:30
Debanjum
679f0f24a4 Improve Chat Input Pane Actions. Move to 1 Click Audio Chat on Mobile (#624)
## Major
### Move to single click audio chat UX on Obsidian, Desktop, Web clients
  New default UX has 1 long-press on mobile, 2 clicks on desktop to send a transcribed audio message
  - New Audio Chat Flow
    1. Record audio while microphone button pressed
    2. Show auto-send 3s countdown timer UI for audio chat message
        Provide a visual cue around send button for how long before audio
        message is automatically sent to Khoj for response
    3. Auto-send msg in 3s unless stop send message button clicked
  - Why
    - Removes the previous default of 3 clicks required to send an audio message
      The record > stop > send process to send audio messages was unclear and effortful
    - Still allows stopping the message from being sent, to correct the transcribed audio
    - Removes inadvertent long audio transcriptions if users forget to press stop while recording

### Improve chat input pane actions & icons on Obsidian, Desktop, Web clients
- Use SVG icons in chat footer on web, desktop app
- Move delete icon to left of chat input. This makes it harder to inadvertently click it
- Add send button to chat input pane
- Color chat message send button to make it primary CTA
- Make chat footer shorter. Use no or round border on action buttons

## Minor
- Stop rendering empty starter questions element when no questions present
- Add round border, hover color to starter questions in web, desktop apps
- Fix auto resizing chat input box when transcribed text added
- Convert chat input into a text area in the Obsidian client
2024-01-20 21:52:56 +05:30
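The auto-send flow above is a cancellable timer: start a countdown once transcription finishes, send when it elapses, and abort if the user clicks stop. A minimal sketch in Python asyncio, for illustration only (the real clients are JavaScript/TypeScript); the `AutoSender` class and its `send` callback are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoSender:
    """Hypothetical helper: send a transcribed message after a delay unless cancelled."""

    def __init__(self, send: Callable[[str], Awaitable[None]], delay_s: float = 3.0):
        self.send = send
        self.delay_s = delay_s
        self._task: Optional[asyncio.Task] = None

    def schedule(self, message: str) -> None:
        # Start (or restart) the countdown for this transcribed message
        self.cancel()
        self._task = asyncio.create_task(self._countdown(message))

    async def _countdown(self, message: str) -> None:
        await asyncio.sleep(self.delay_s)
        await self.send(message)

    def cancel(self) -> bool:
        # Equivalent of the "stop send" button: abort a pending auto-send
        if self._task is not None and not self._task.done():
            self._task.cancel()
            return True
        return False


async def demo() -> list:
    sent: list = []

    async def send(msg: str) -> None:
        sent.append(msg)

    sender = AutoSender(send, delay_s=0.05)
    sender.schedule("first")   # left alone, so it is auto-sent after the delay
    await asyncio.sleep(0.1)
    sender.schedule("second")  # user clicks stop before the delay elapses
    sender.cancel()
    await asyncio.sleep(0.1)
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['first']
```

Restarting the countdown on every `schedule` call keeps at most one pending send, which matches the single send button in the chat footer.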
Debanjum Singh Solanky
ec3b837d00 Send audio message in 2-clicks on desktop to avoid holding down mic button 2024-01-20 21:40:38 +05:30
Debanjum Singh Solanky
f0daa45ae0 Move to single click audio chat UX on Obsidian client
- Capability
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 16:07:12 +05:30
Debanjum Singh Solanky
29a581d2b0 Move to single click audio chat UX on desktop app
- Capability
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 16:03:51 +05:30
Debanjum Singh Solanky
699e9ff878 Move to single click audio chat UX on web app
- Capability
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 15:56:46 +05:30
Debanjum Singh Solanky
26bd3533d8 Stop rendering empty starter questions element when no questions present 2024-01-20 11:39:58 +05:30
Debanjum Singh Solanky
7c8c475c3a Add round border, hover color to starter questions in web, desktop apps 2024-01-20 00:51:11 +05:30
Debanjum Singh Solanky
8a488b9e39 Fix auto resizing chat input box when transcribed text added 2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
07ca137bdf Convert chat input into a text area in the Obsidian client
This allows for better readability of multi-line messages by users.
The chat input is a text area in the other clients as well.
2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
d4552117f6 Add and improve chat input pane, actions, icons on Obsidian client
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click it
- Add send button to chat footer. Having Enter as the only way to send
  messages is unintuitive and outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
c0ad64d9a3 Add and improve chat input pane, actions, icons on desktop client
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click it
- Add send button to chat footer. Having Enter as the only way to send
  messages is unintuitive and outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-20 00:29:49 +05:30
Debanjum Singh Solanky
ea85ebdacb Add and improve chat input pane, actions, icons on web client
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click it
- Add send button to chat footer. Having Enter as the only way to send
  messages is unintuitive and outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-19 20:40:42 +05:30
sabaimran
039ed78253 Add support for a first-party client app to call into Khoj (Part 1) (#601)
* Add support for a first party client app
- Based on a client id and client secret, allow a first party app to call into the Khoj backend with a phone number identifier
- Add migration to add phone numbers to the KhojUser object
* Add plus in front of country code when registering a phone number.
- Decrease free tier limit to 5 (from 10)
- Return a response object when handling stripe webhooks
* Fix telemetry method which references authenticated user's client app
* Add better error handling for null phone numbers, simplify logic of authenticating user
* Pull the client_secret in the API call from the authorization header
* Add a migration merge to resolve phone number and other changes
2024-01-18 19:24:14 +05:30
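Pulling the client secret from the authorization header amounts to parsing the header value. A minimal sketch, assuming a `Bearer <client_secret>` scheme; the helper name and the header format are assumptions, since the commit does not specify them.

```python
from typing import Optional


def client_secret_from_auth_header(authorization: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return the secret from 'Bearer <secret>', else None."""
    if not authorization:
        return None
    # Split once on the first space: scheme, separator, credentials
    scheme, _, credentials = authorization.partition(" ")
    if scheme.lower() != "bearer" or not credentials.strip():
        return None
    return credentials.strip()
```

Returning `None` rather than raising lets the caller decide which error response (e.g. 401 or 403) fits its authentication flow.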
Debanjum Singh Solanky
9dfe1bb003 Fix updating subscription when invoice paid. Revert renewal_date logic
The actual issue was that `get_or_create_user_by_email' tried to
create a subscription even if it already existed.

With updated logic:
- New subscription is only created when it doesn't already exist in
  `get_or_create_user_by_email'
- `set_user_subscription' just updates the subscription state as
  user subscription object creation is already managed by
  `get_or_create_user_by_email'. So the other conditionals are
  unnecessary
2024-01-18 16:20:18 +05:30
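The fix separates creation from update: only the get-or-create path creates a subscription, and `set_user_subscription' only mutates an existing one. A minimal in-memory sketch of that split; the store, the `Subscription` fields and `get_or_create_subscription` are hypothetical simplifications, not the Khoj database models.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Subscription:
    # Hypothetical stand-in for the subscription record
    is_recurring: bool = False


@dataclass
class SubscriptionStore:
    # Hypothetical in-memory store keyed by user email
    by_email: Dict[str, Subscription] = field(default_factory=dict)


def get_or_create_subscription(store: SubscriptionStore, email: str) -> Subscription:
    # Create a subscription only if the user does not already have one
    if email not in store.by_email:
        store.by_email[email] = Subscription()
    return store.by_email[email]


def set_user_subscription(
    store: SubscriptionStore, email: str, is_recurring: bool
) -> Optional[Subscription]:
    # Only update state; creation is owned by the get-or-create path
    subscription = store.by_email.get(email)
    if subscription is None:
        return None
    subscription.is_recurring = is_recurring
    return subscription
```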
Debanjum Singh Solanky
9b1a66c969 Fix updating subscription renewal date when invoice paid 2024-01-18 14:46:10 +05:30
sabaimran
93d5cb128c Initialize embeddings to empty list before processing 2024-01-18 13:27:04 +05:30
Debanjum Singh Solanky
24af888c41 Release Khoj version 1.3.0 2024-01-18 11:42:13 +05:30
Debanjum Singh Solanky
2f1bb5c2c8 Upload Desktop App Artifacts to Github Release 2024-01-18 11:40:04 +05:30
sabaimran
e71ebb8068 Standardize issue templates and make them easier to use 2024-01-18 10:54:05 +05:30
sabaimran
efb4bd6780 Add a template for feature requests 2024-01-18 10:38:53 +05:30
sabaimran
6165ae56c2 Update bug report issue template
- collect info about OS, device, server, client, and prompt to include any relevant data
2024-01-18 10:35:02 +05:30
Debanjum
8b4dd16255 Fix markdownRenderer arg to allow chat responses in Obsidian plugin (#619)
- Issue
Users with the Dataview plugin would get an error, as its markdown
post-processor expects the sourcePath to be a string

This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview plugin error

- Fix
Pass a string as sourcePath to markdownRenderer to fix failing chat response
and stop throwing dataview errors on search

Resolves #614, Resolves #606
2024-01-18 10:18:31 +05:30
Debanjum
c8dbe8ee7b Improve server status check and message in Obsidian client (#617)
- Update health API to pass authenticated users their info
- Improve Khoj server status check in Khoj Obsidian client
- Show Khoj Obsidian commands even if no connection to server
- Show Khoj chat by default in Obsidian side pane instead of search
2024-01-18 10:17:35 +05:30
Debanjum Singh Solanky
f9420e1209 Show Khoj Obsidian commands even if no connection to server
Server connection check can be a little flaky in Obsidian. Don't gate
the commands behind it to improve usability of Khoj.

Previously the commands would get disabled when server connection
check failed, even though server was actually accessible
2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
36bf42a860 Show Khoj chat by default in Obsidian side pane instead of search 2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
aab75a6ead Improve Khoj server status check in Khoj Obsidian client
- Update server connection status on every edit of khoj url, api key in
  settings instead of only on plugin load

  The error message went stale if the connection was fixed by changing the
  Khoj URL or API key in the plugin settings, e.g. right after plugin install

- Show better welcome message on first plugin install.
  Include API key setup instruction

- Show logged in user email on Khoj settings page
2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
1a46734485 Fix markdownRenderer arg to allow chat responses in Obsidian plugin
- Issue: Users with the Dataview plugin would get an error, as its markdown
post-processor expects the sourcePath to be a string

This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview error

- Fix: Pass a string as sourcePath to markdownRenderer to fix
failing chat response

Resolves #614, Resolves #606
2024-01-18 10:02:50 +05:30
sabaimran
e9e49ea098 Allow custom inference endpoint for the crossencoder model (#616)
* Add support for custom inference endpoints for the cross encoder model
- Since there's not a good out of the box solution, I've deployed a custom model/handler via huggingface to support this use case.
* Use langchain.community for pdf, openai chat modules
* Add an explicit stipulation that the api endpoint for crossencoder inference should be for huggingface for now
2024-01-18 10:02:12 +05:30
Debanjum Singh Solanky
08012c71b1 Update Dockerfile with swig system package required by PyMuPDF 2024-01-17 19:24:27 +05:30
Debanjum Singh Solanky
870af19ba4 Update health API to pass authenticated users their info
This allows Khoj clients to get email address associated with
user's API token for display in client UX

In anonymous mode, default user information is passed
2024-01-17 13:38:57 +05:30
Debanjum
4d30f7d1d9 Short-circuit API rate limiter for unauthenticated users (#607)
### Major
- Short-circuit API rate limiter for unauthenticated user
  Calls by unauthenticated users were failing at API rate limiter as it
  failed to access user info object. This is a bug.
  
  API rate limiter should short-circuit for unauthenticated users so a
  proper Forbidden response can be returned by API
  
  Add regression test to verify that unauthenticated users get 403
  response when calling the /chat API endpoint
  
### Minor
- Remove trailing slash to normalize khoj url in obsidian plugin settings
- Move used /api/config API controllers into separate module
- Delete unused /api/beta API endpoint
- Fix error message rendering in khoj.el, khoj obsidian chat
- Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn
2024-01-17 00:59:52 +05:30
Debanjum Singh Solanky
d26a4ffcea Only run the OpenAI chat client, /online test when API keys are set 2024-01-17 00:36:03 +05:30
Debanjum Singh Solanky
2752e0d607 Update jinja2 and axios min supported package versions 2024-01-16 18:45:38 +05:30
Debanjum Singh Solanky
7039c202c8 Merge branch 'master' into short-circuit-api-rate-limiter 2024-01-16 18:18:34 +05:30
Debanjum Singh Solanky
8917228dbb Remove unused, deprecated /api/config/data API endpoints
- Use /api/health for server up check instead of api/config/default
- Remove unused `khoj--post-new-config' method
- Remove the now unused /config/data GET, POST API endpoints
2024-01-16 18:15:06 +05:30
Debanjum
51c59d0059 Remove the 1000 files limit when syncing from Desktop, Obsidian clients (#605)
### Major
- Push 1000 files at a time from the Desktop client for indexing
- Push 1000 files at a time from the Obsidian client for indexing
- Test 1000 file upload limit to index/update API endpoint

### Minor
- Show relevant error message in desktop app, e.g. when it can't connect to server
- Pass indexed filenames in API response for client validation
- Collect files to index in single dict to simplify index/update controller

Resolves #573
2024-01-16 17:59:26 +05:30
Debanjum Singh Solanky
6ded4c1d75 Merge branch 'master' into fix-1000-file-index-update-limit 2024-01-16 16:50:58 +05:30
sabaimran
c24389cff5 Add Algolia to documentation website for better search 2024-01-16 15:53:53 +05:30
Debanjum
45f892dfdd Fix Offline Chat without GPU and Decoding Chat Query before Processing
- Only run /online command offline chat director test when `SERPER_DEV_API_KEY' present
- Decode URL encoded query string in chat API endpoint before processing
- Make references and online_results optional params to converse_offline
- Pass max context length to fix using updated `GPT4All.list_gpu' method
2024-01-16 14:53:34 +05:30
Debanjum Singh Solanky
e0b381d523 Only run /online command offline chat director test when SERPER KEY present 2024-01-16 13:09:38 +05:30
Debanjum Singh Solanky
16175137e5 Decode URL encoded query string in chat API endpoint before processing 2024-01-16 13:09:28 +05:30
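Decoding a URL-encoded query string reverses the percent-encoding clients apply when placing the message in the URL. A minimal sketch using the standard library; the `decode_chat_query` name is hypothetical.

```python
from urllib.parse import unquote


def decode_chat_query(q: str) -> str:
    """Hypothetical helper: undo percent-encoding on a chat query before processing."""
    # unquote turns "%20" into " " and "%3F" into "?", but leaves "+" as-is
    return unquote(q)
```

`unquote` keeps a literal `+`, which matters for queries containing math or code; `urllib.parse.unquote_plus` would instead turn `+` into a space, as in HTML form encoding.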
Debanjum Singh Solanky
9fe1c8ae13 Make references and online_results optional params to converse_offline
Fixes all the failing GPT4All tests because they were missing the
online_results argument
2024-01-16 13:09:28 +05:30
Debanjum Singh Solanky
d74f8e03d3 Pass max context length to fix using updated GPT4All.list_gpu method
Its signature was updated in the GPT4All 2.1.0 PyPI release.

Resolves #610
2024-01-16 12:23:45 +05:30
Debanjum Singh Solanky
1ae6669fbf Correctly handle API response when no files to index 2024-01-16 11:57:40 +05:30
sabaimran
50575b749b Add option to use HuggingFace's inference endpoint for generating embeddings (#609)
* Support using hosted Huggingface inference endpoint for embeddings generation
* Since the huggingface inference endpoint is model-specific, make the URL an optional property of the search model config
* Handle ECONNREFUSED error in desktop app
* Drive API key via the search model config model and use more generic names
2024-01-16 08:58:24 +05:30
Debanjum Singh Solanky
ba37b28fb5 Improve batched error handling. Catch can't connect to server error
Break out of batch processing when unable to connect to server or
when requests throttled by server
2024-01-14 01:04:44 +05:30
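Breaking out of batch processing on connection or throttling errors avoids firing the remaining requests when they would fail the same way. A minimal sketch; `upload_in_batches` and `ServerThrottled` are hypothetical, and `upload` stands in for whatever posts one batch to the server.

```python
from typing import Callable, List, Sequence


class ServerThrottled(Exception):
    """Hypothetical error raised when the server responds with HTTP 429."""


def upload_in_batches(
    batches: Sequence[List[str]], upload: Callable[[List[str]], None]
) -> int:
    """Upload batches in order; return how many succeeded before stopping."""
    uploaded = 0
    for batch in batches:
        try:
            upload(batch)
        except (ConnectionError, ServerThrottled):
            # Remaining batches would fail the same way, so stop early
            break
        uploaded += 1
    return uploaded
```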
Debanjum Singh Solanky
7dfbcd2e5a Handle subscribe renew date, langchain, pydantic & logger.warn warnings
- Ensure langchain less than 0.2.0 is used, to prevent breaking
  ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0
- Set subscription renewal date to a timezone aware datetime
- Use logger.warning instead of logger.warn as latter is deprecated
- Use `model_dump' instead of the deprecated `dict' to get all configured content_types
2024-01-12 01:46:52 +05:30
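A timezone-aware renewal date matters because ordering comparisons between naive and aware datetimes raise `TypeError`. A minimal sketch of both the aware date and the `logger.warning` spelling; the helper names and the warning condition are hypothetical.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger(__name__)


def renewal_date_from_now(days: int) -> datetime:
    """Hypothetical helper: timezone-aware renewal date `days` days from now, in UTC."""
    if days <= 0:
        # logger.warning is the supported spelling; logger.warn is a deprecated alias
        logger.warning("Non-positive renewal period: %d days", days)
    return datetime.now(tz=timezone.utc) + timedelta(days=days)


def is_expired(renewal_date: datetime) -> bool:
    # Works only if renewal_date is aware; a naive value raises TypeError here
    return renewal_date < datetime.now(tz=timezone.utc)
```

In Django projects, `django.utils.timezone.now()` returns an aware datetime when `USE_TZ` is enabled, which serves the same purpose.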
Debanjum Singh Solanky
5f97357fe0 Delete unused /api/beta API endpoint 2024-01-12 01:11:05 +05:30
Debanjum Singh Solanky
bb1c1b39d8 Move /api/config API controllers into separate module for code modularity 2024-01-12 01:11:04 +05:30
Debanjum Singh Solanky
ba99089a12 Short-circuit API rate limiter for unauthenticated user
Calls by unauthenticated users were failing at API rate limiter as it
failed to access user info object. This is a bug.

API rate limiter should short-circuit for unauthenticated users so a
proper Forbidden response can be returned by API

Add regression test to verify that unauthenticated users get 403
response when calling the /chat API endpoint
2024-01-12 00:23:50 +05:30
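Short-circuiting means the limiter returns early when there is no authenticated user, instead of dereferencing a missing user object, so the authentication check can return 403 Forbidden. A minimal sliding-window sketch; `SlidingWindowRateLimiter`, `User` and `RateLimited` are hypothetical and not Khoj's actual implementation.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class User:
    id: int
    is_authenticated: bool = True


class RateLimited(Exception):
    """Raised when a user exceeds the request budget (maps to HTTP 429)."""


class SlidingWindowRateLimiter:
    """Hypothetical limiter allowing `requests` calls per `window_s` seconds per user."""

    def __init__(self, requests: int, window_s: float):
        self.requests = requests
        self.window_s = window_s
        self.history: Dict[int, Deque[float]] = defaultdict(deque)

    def __call__(self, user: Optional[User]) -> None:
        # Short-circuit: without an authenticated user there is nothing to limit;
        # the downstream auth check is responsible for returning 403 Forbidden
        if user is None or not user.is_authenticated:
            return
        now = time.monotonic()
        calls = self.history[user.id]
        # Drop timestamps that fell out of the window
        while calls and now - calls[0] > self.window_s:
            calls.popleft()
        if len(calls) >= self.requests:
            raise RateLimited(f"Too many requests for user {user.id}")
        calls.append(now)
```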
Debanjum Singh Solanky
b1269fdad2 Remove trailing slash to normalize khoj url in obsidian plugin settings 2024-01-11 21:56:36 +05:30
Debanjum Singh Solanky
ffdb291fe0 Fix error message rendering in khoj.el, khoj obsidian chat
- Fix failed to index error message in khoj.el
- Fix chat model not configured message in khoj obsidian chat
2024-01-11 21:55:54 +05:30
Debanjum Singh Solanky
af9ceb00a0 Show relevant error msg in desktop app, e.g. when can't connect to server 2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
43423432ce Pass indexed filenames in API response for client validation 2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
5f9ac5a630 Collect files to index in single dict to simplify index/update controller
Simplifies code while maintaining typing
2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
efe41aaaca Push 1000 files at a time from the Desktop client for indexing
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
2024-01-09 23:09:34 +05:30
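Splitting files into groups of 1000 is plain fixed-size chunking. A minimal sketch in Python (the desktop client itself is JavaScript); the `batched` helper name is hypothetical.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size file paths."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])
```

Each yielded group would then be sent as one request to the index/update endpoint. On Python 3.12+, `itertools.batched` offers the same chunking but yields tuples.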
sabaimran
02187b19bb Customize font styling for documentation 2024-01-08 08:50:42 +05:30
sabaimran
8389108653 Fix reference issue for demos in the main README 2024-01-08 08:29:51 +05:30
Debanjum
dbc59b2952 Fix, Improve Khoj Documentation Layout (#604)
- 26f96e00 Use Khoj Client, Data sources diagrams in feature docs
- c82d34b6 Add Docs footer, nav pane links. Fix tagline, Remove announcement topbar
- d920e4d0 Make the docs overview page as the main docs landing page
- 80d1ad5b Fix image urls on docs overview page. Remove logo header in client docs
2024-01-08 02:00:02 +05:30
Debanjum Singh Solanky
efc7b08cd9 Use Khoj Client, Data sources diagrams in feature docs 2024-01-08 01:58:57 +05:30
Debanjum Singh Solanky
c82d34b659 Add Docs footer, nav pane links. Fix tagline, Remove announcement topbar 2024-01-08 01:17:47 +05:30
Debanjum Singh Solanky
d920e4d0a7 Make the docs overview page the main docs landing page
- Make the docs overview page available at docs.khoj.dev root instead of
under docs.khoj.dev/docs path
  - Remove the new landing page, it is unnecessary.
- Remove /docs path prefix from links to internal doc pages
- Remove .md path suffix in internal doc pages for consistency
2024-01-08 01:13:42 +05:30
Debanjum Singh Solanky
80d1ad5b6f Fix image urls on docs overview page. Remove logo header in client docs 2024-01-08 00:30:31 +05:30
sabaimran
ce53bc52c5 Modify permissions of the GITHUB_TOKEN for publishing to gh-pages 2024-01-07 20:53:57 +05:30
sabaimran
740453fa18 Use documentation folder for building project and uploading data 2024-01-07 20:50:15 +05:30
sabaimran
2be7c84203 Enter documentation repository before running yarn build 2024-01-07 20:46:21 +05:30
sabaimran
ad95e88838 Update node version in github action 2024-01-07 20:41:24 +05:30
sabaimran
bd9aa578f4 Add a yarn.lock file and use for node.js setup 2024-01-07 20:36:02 +05:30
sabaimran
9b991eb4fe Migrate to using docusaurus, rather than docsify for documentation (#603)
* Add docusaurus documentation (to replace the docsify setup)
* Remove older docs
* Specify documentation as the gh pages build action working directory
2024-01-07 20:28:15 +05:30
Debanjum Singh Solanky
98081bc0d3 Update Uninstall Documentation for Khoj Server when Self Hosting 2024-01-06 01:37:29 +05:30
Debanjum Singh Solanky
5d52dc5b35 Fix spelling in the development documentation for Khoj 2024-01-04 19:24:58 +05:30
Debanjum Singh Solanky
b6d5392c0c Release Khoj version 1.2.1 2024-01-04 18:45:37 +05:30
Debanjum Singh Solanky
fca7a5ff32 Push 1000 files at a time from the Obsidian client for indexing
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
2024-01-04 18:43:22 +05:30
Debanjum Singh Solanky
4ded32cc64 Test 1000 file upload limit to index/update API endpoint
Due to a FastAPI limitation
2024-01-03 22:14:36 +05:30
Debanjum Singh Solanky
4a234c8db3 Use default offline/openai chat model to extract DB search queries
Make usage of the first offline/openai chat model as the default LLM
to use for background tasks more explicit

The idea is to use the default/first chat model for all background
activities, like extracting search queries from a user message.
This is controlled by the server admin.

The chat model set by the user is used for user-facing functions like
generating chat responses
2024-01-03 14:04:49 +05:30
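The policy above splits model selection in two: background tasks always use the first server-configured model, while user-facing chat uses the user's choice. A minimal sketch; `ChatModel` and both helpers are hypothetical, and falling back to the default when the user has no choice is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChatModel:
    name: str


def model_for_background_task(server_models: List[ChatModel]) -> ChatModel:
    # The server admin controls ordering; the first configured model is the default
    if not server_models:
        raise ValueError("No chat model configured on server")
    return server_models[0]


def model_for_user_chat(
    server_models: List[ChatModel], user_choice: Optional[ChatModel]
) -> ChatModel:
    # Assumption: fall back to the server default when the user has not chosen one
    if user_choice is not None:
        return user_choice
    return model_for_background_task(server_models)
```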
Debanjum Singh Solanky
e28adf2884 Also index pdf, markdown and plaintext files using khoj emacs client
Previously you could only index org-mode files and directories from
khoj.el

Mark the `khoj-org-directories', `khoj-org-files' variables for
deprecation, since `khoj-index-directories', `khoj-index-files'
replace them as more appropriate names for the more general case

Resolves #597
2024-01-03 11:46:17 +05:30
Debanjum Singh Solanky
5abaed9d08 Use user chosen OpenAI model to extract DB search questions from query
Previously Khoj was selecting the first OpenAI model configured on
server and not the OpenAI model configured by the user for themselves
2024-01-03 11:45:06 +05:30
Debanjum Singh Solanky
e582639efa Move contributing section back down in sidebar of documentation website 2024-01-03 11:40:14 +05:30
Debanjum Singh Solanky
05536aab6b Merge how users can share personal information in personality prompt 2024-01-03 11:40:14 +05:30
Liam Swayne
455f78b178 Replace var declarations with let declarations (#576)
* Replace var declaration with let declaration
2023-12-29 10:20:48 +05:30
sabaimran
79913d4c17 Add isort to the pre-commit configuration and apply it to the whole project (#595)
* Apply isort to the entire repository
* Fix missing import issues in text_to_entries
* Fix imports in migration files
2023-12-28 18:04:02 +05:30
sabaimran
738f050086 Merge pull request #587 from khoj-ai/features/search-model-options-custom
Support multiple search models, with ability for custom user config
2023-12-28 13:09:49 +05:30
sabaimran
442c913de3 Update telemetry state for search model only if one is found, fix alt text for language setting 2023-12-28 12:53:53 +05:30
sabaimran
d3ab3f1b70 Rename matrix_blog to web and move the language setting into the content section 2023-12-28 12:44:49 +05:30
sabaimran
6946e038c2 Merge pull request #596 from khoj-ai/chore/add-developer-documentation
Improve the developer documentation
2023-12-23 18:43:43 +05:30
sabaimran
00af6baeb6 Resolve merge conflicts with intro message in chat.html web view 2023-12-23 17:52:58 +05:30
sabaimran
c10602b6c5 Put contributing higher in the sidebar 2023-12-23 14:04:53 +05:30
sabaimran
fe415e1508 Add tip for using the good-first-issue tag in GH issues 2023-12-23 14:04:05 +05:30
sabaimran
3280715ca0 Update contributor guidelines
- Add more accurate steps for building Khoj locally
- Remove outdated instructions
- Add specific steps to create a Github Issue
- Add instructions for Obsidian plugin development
2023-12-23 14:00:52 +05:30
sabaimran
afec4394f9 Merge pull request #592 from ayushjha119/Fixed-Health-Check-to-Khoj-api
Fixed health check to khoj api
2023-12-23 13:04:50 +05:30
sabaimran
c50eb8a691 Fix mypy/pre-commit issues 2023-12-23 11:44:37 +05:30
Debanjum Singh Solanky
21c55b4c0d Release Khoj version 1.2.0 2023-12-22 21:43:47 +05:30
Debanjum Singh Solanky
e42111a8af Fix bump_version.sh to commit, clean-up after desktop app version bump 2023-12-22 21:42:03 +05:30
Debanjum Singh Solanky
6a8c1fe423 Sanitize rendering chat references in Web, Desktop and Obsidian clients
Use textContent instead of innerHTML to append references

Resolves #583
2023-12-22 18:11:49 +05:30
Debanjum
6879daccc6 Fix Chat Streaming on Obsidian, Docker Image Version and First-Run, Chat Error Messages in Clients (#589)
- Fix streaming chat response in Obsidian client
- Fix first-run, chat error message in obsidian, desktop and web clients
- Set Khoj app version to latest version in Docker images
- Tag Khoj Docker image built on release with the `latest` tag
  This aligns the Docker image release cadence with client and server releases
2023-12-22 04:13:01 -08:00
Debanjum Singh Solanky
074123b9b9 Merge cloud, local dockerize workflows
- Delete unused config directory
2023-12-22 17:11:52 +05:30
Debanjum Singh Solanky
d101297995 Use markdown formatted chat message in chat modal 2023-12-22 17:01:31 +05:30
Debanjum Singh Solanky
350fd89c8d Clear chat history html in Obsidian if getChatHistory works too 2023-12-22 17:01:31 +05:30
Debanjum Singh Solanky
8d1e988059 Update tagging of the docker image on release, push to master & PR
- Tag docker image with `tag_name' on release (i.e. tag push)
- Else tag with 'pre' on push to master
- Else tag with 'dev' on push to PR branch

- Only tag the latest release with release tag
  Previously the latest commit on master was being tagged with the
  latest tag. This doesn't sync with the release cadence of the rest
  of Khoj
2023-12-22 17:01:31 +05:30
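The tagging rules above form a short decision chain. A minimal Python rendering of that logic for clarity; the actual workflow is GitHub Actions configuration, and the `docker_image_tags` function and its parameters are hypothetical.

```python
from typing import List, Optional


def docker_image_tags(tag_name: Optional[str], branch: Optional[str]) -> List[str]:
    """Hypothetical rendering of the image tagging rules described in the commit."""
    if tag_name:
        # Release (tag push): version tag, and only releases get `latest`
        return [tag_name, "latest"]
    if branch == "master":
        # Push to master: pre-release image
        return ["pre"]
    # Push to a PR branch: development image
    return ["dev"]
```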
Debanjum Singh Solanky
b5ae64cb3c Dynamically set Khoj app version in the Dockerization Github workflows 2023-12-22 17:01:31 +05:30
Debanjum Singh Solanky
d3d47dce0b Allow setting Khoj app version during docker build via build-args
This will allow troubleshooting by getting the actual khoj version
being used. Previously it was always set to a static 0.0.0 version

Command to build Khoj docker image with dynamically set current app version:
`docker-compose build server --build-arg VERSION=$(pipx run hatch version)'
2023-12-22 16:47:13 +05:30
ayushjha119
e487ec5370 Fix app to API health check 2023-12-21 17:51:30 +05:30
Debanjum Singh Solanky
70607cbbbb Update FRE message to get any Khoj client to sync files with server 2023-12-21 15:23:47 +05:30
ayushjha119
b3d7d6a79d Use the Response class from fastapi.responses and set status_code to 200 2023-12-21 14:26:40 +05:30
sabaimran
e1aaff2053 Add more details about functionality in Khoj's intro message 2023-12-21 10:09:30 +05:30
sabaimran
a1211f40d7 Fix type declaration for the cross_encoder_model state variable. Update name of the new update API 2023-12-21 09:15:13 +05:30
sabaimran
089e4bee12 Fix unit tests with new search model configurations 2023-12-20 21:50:44 +05:30
Debanjum Singh Solanky
447c1b90e7 Fix streaming chat response in Obsidian client
- Convert renderIncrementalMessage to an async method as
  MarkdownRenderer is an async method

- Simplify code, remove unneeded JSON check
2023-12-20 14:51:19 +05:30
sabaimran
aa23da60a3 Add a notification banner to show temporary messages 2023-12-20 14:22:08 +05:30
Debanjum Singh Solanky
e04fe921eb Fix first-run, chat error message in obsidian, desktop and web clients
- Disable chat input field if getChatHistory had error as Khoj may not
  be setup correctly to chat
2023-12-20 14:03:07 +05:30
sabaimran
5ff9df9d4c Add support per user for configuring the preferred search model from the config page
- Honor this setting across the relevant places where embeddings are used
- Convert the VectorField object to have None for dimensions in order to make the search model easily configurable
2023-12-20 13:25:43 +05:30
sabaimran
0f6e4ff683 Add a model that specifies the user's search model configuration
- Update all endpoints that generate embeddings to use the new model. Incl. generating text embeddings, creating embeddings for a search query
2023-12-20 09:22:26 +05:30
sabaimran
6dd2b05bf5 Rebase with master 2023-12-19 21:02:49 +05:30
sabaimran
e3557cd8b7 Update the personality prompt to make Khoj aware that users can share data via the desktop app 2023-12-19 16:42:45 +05:30
sabaimran
927e477f68 Ignore typing error in custom action short description 2023-12-19 16:10:58 +05:30
sabaimran
946305d977 Add function to export conversations for debugging 2023-12-19 16:05:20 +05:30
sabaimran
903a01745f Use 0px for padding for input row buttons in web 2023-12-18 16:09:06 +05:30
sabaimran
1e14a24f06 Merge pull request #586 from khoj-ai/features/misc-image-and-online-improvements
Improvements to chat functionality and image generation
2023-12-17 23:28:08 +05:30
sabaimran
5b092d59f4 Ignore dict assignment typing error 2023-12-17 22:34:54 +05:30
sabaimran
03cb86ee46 Update typing and object assignment for new text to image method return 2023-12-17 21:28:33 +05:30
sabaimran
0288804f2e Render the inferred query along with the image that Khoj returns 2023-12-17 21:02:55 +05:30
sabaimran
49af2148fe Miscellaneous improvements to image generation
- Improve the prompt before sending it for image generation
- Update the help message to include online, image functionality
- Improve styling for the voice, trash buttons
2023-12-17 20:25:35 +05:30
sabaimran
7cb64cb2f9 Add telemetry for image generation conversation command 2023-12-17 18:25:03 +05:30
sabaimran
e9ea0195b0 Merge pull request #585 from khoj-ai/fix/image-generation-and-csrf-cookie
Fix image generation setup bug and CSRF cookie for admin login
2023-12-17 16:55:45 +05:30
sabaimran
09544dee09 Add TextToImageModelConfig to the admin page 2023-12-17 16:44:19 +05:30
sabaimran
0459666beb CSRF Cookie not set error in prod. Try fixing https forwarding for mitigation 2023-12-17 12:55:18 +05:30
sabaimran
61dde8ed89 If text to image config isn't set, send back an error message to the client 2023-12-17 12:54:50 +05:30
sabaimran
fefaa2271d Merge pull request #584 from khoj-ai/features/enforce-usage-limits-conversation-type
Add a ConversationCommand rate limiter for the chat endpoint
2023-12-17 11:20:35 +05:30
sabaimran
3065cea562 Address mypy typing issues 2023-12-16 09:24:26 +05:30
sabaimran
5f6dcf9f2e Add a rate limiter for the transcribe API endpoint 2023-12-16 09:18:56 +05:30
sabaimran
73a107690d Add a ConversationCommand rate limiter for the chat endpoint 2023-12-16 09:03:52 +05:30
sabaimran
9b961ed496 Merge pull request #580 from khoj-ai/fix-upgrade-chat-to-create-images
Support Image Generation with Khoj
2023-12-07 21:17:58 +05:30
Debanjum Singh Solanky
7504669f2b Fix rendering image on chat response in obsidian client 2023-12-05 03:48:07 -05:00
Debanjum Singh Solanky
408b7413e9 Use global openai client for transcribe, image 2023-12-05 03:36:33 -05:00
Debanjum Singh Solanky
162b219f2b Throw unsupported error when server not configured for image, speech-to-text 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
8f2f053968 Fix rendering image on chat response in web, desktop client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
d124266923 Reduce promise based nesting in chat JS func used in desktop, web client
Use async/await to reduce .then() based nesting to improve code
readability
2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
6e3f66c0f1 Use base64 encoded image instead of source URL for persistence
The source URL returned by OpenAI expires soon after generation. This
would leave chat sessions containing inaccessible images/messages if
the OpenAI image URL were stored

Get base64 encoded image from OpenAI and store directly in
conversation logs. This resolves the image link expiring issue
2023-12-05 01:51:14 -05:00
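A minimal sketch of how a base64-encoded image stored in the conversation log could be turned back into something a client can render. It assumes the client embeds the image as a `data:` URI and that the image is a PNG; `to_data_uri` is a hypothetical helper, not taken from the Khoj codebase.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Hypothetical helper: wrap raw image bytes in a self-contained data URI.

    Unlike a remote source URL, a data URI carries the image inline, so it
    does not expire when stored in a conversation log.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Because the image bytes travel inside the URI itself, rendering no longer depends on the provider keeping the original URL alive.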
Debanjum Singh Solanky
52c5f4170a Show generated images in the chat modal of the Khoj Obsidian plugin 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
8016a57b5e Show generated images in chat interface on Desktop client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
cc051ceb4b Show generated images in chat interface on Web client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
252b35b2f0 Support /image slash command to generate images using the chat API 2023-12-05 01:51:14 -05:00
sabaimran
ef21d78c99 Initial changes to support multiple search model configurations
- All search models are loaded into memory, and stored in a dictionary indexed by name
- Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option
2023-12-05 00:35:40 -05:00
Debanjum Singh Solanky
1d9c1333f2 Configure text to image models available on server
- Currently supports OpenAI text to image model, by default dall-e-3
- Allow setting the text to image model via CLI during server setup
2023-12-04 21:27:53 -05:00
Debanjum Singh Solanky
f0222f6d08 Make save_to_conversation_log helper function reusable
- Move it out to conversation.utils from generate_chat_response function
- Log new optional intent_type argument to capture the type of response
  expected, e.g. speech or image responses by Khoj. It can be used by
  clients to render responses by Khoj appropriately
- Make user_message_time argument optional, set the time to now by
  default if not passed by calling function
2023-12-04 19:42:12 -05:00
sabaimran
d2ddbef08f Use a unique name for the temp PDF generated 2023-12-04 19:27:00 -05:00
sabaimran
d20746613a Properly filter out empty PDFs for indexing 2023-12-04 16:15:17 -05:00
Debanjum Singh Solanky
316b7d471a Handle offline chat model retrieval when no internet
Offline chat shouldn't fail on retrieve_model when no internet,
if model was previously downloaded and usable offline
2023-12-04 13:46:25 -05:00
Debanjum Singh Solanky
2b09caa237 Make online results an optional argument to the gpt converse method 2023-12-04 12:15:29 -05:00
Debanjum Singh Solanky
7009793170 Migrate to OpenAI Python library >= 1.0 2023-12-03 18:16:00 -05:00
sabaimran
62a89f79b7 Merge pull request #577 from khoj-ai/fix/user-subscription-email-not-exists
Fix null exception when user does not exist for subscription
2023-12-03 15:14:31 -08:00
sabaimran
cc064ea57d Fix circular import issue 2023-12-03 17:46:44 -05:00
sabaimran
21f8d63e89 If a user subscribes to Khoj with an email address that's not present in the DB, create an account 2023-12-03 17:28:40 -05:00
sabaimran
c5d297a9ed Recursively search through folders for indexing 2023-12-03 16:17:28 -05:00
Debanjum Singh Solanky
a57d529f39 Fix path to system tray icon of Khoj desktop app 2023-12-03 00:12:50 -08:00
Debanjum Singh Solanky
106cdbe455 Release Khoj version 1.1.0 2023-11-30 20:09:08 -08:00
Debanjum Singh Solanky
10ce4ee11c Ignore null params type check for markdown renderer in Obsidian client 2023-11-30 20:09:08 -08:00
Debanjum
02f40785aa Merge Github workflows to dockerize for production (#575) 2023-11-30 18:49:16 -08:00
sabaimran
a5ffa2342f Add documentation for local setup and fix admin panel bugs
- Wasn't able to log in to the admin panel when KHOJ_DEBUG was not True. Fix this error so self-hosted users can access the admin settings
- Don't force users to set their KHOJ_DJANGO_SECRET_KEY
2023-11-30 17:55:27 -08:00
Debanjum Singh Solanky
9d4bfdf47c Merge Github workflows to dockerize for production 2023-11-30 17:18:13 -08:00
Debanjum Singh Solanky
d587632700 Clear result before render thinking placeholder emoji in Obsidian chat 2023-11-30 13:53:09 -08:00
Debanjum
a0686428ff Render Chat Responses as Markdown in Desktop, Obsidian Client (#571)
- Show temporary status message when copied to clipboard
- Render chat responses as markdown in Desktop client
- Render chat responses as markdown in chat modal of Obsidian client
- Render references of new responses in chat modal on Obsidian client. Use new style for references
- Properly stop `mediaRecorder` stream to clear microphone in-use state
- Render newlines when references expanded in Web, Desktop and Obsidian clients
2023-11-30 13:52:02 -08:00
Debanjum Singh Solanky
48719ee0dd Render newline separation in chat references to improve readability 2023-11-30 13:16:48 -08:00
Debanjum Singh Solanky
1a31a2efcf Render Khoj chat streaming response as md & show refs in Obsidian
- Use new style references for Khoj chat modal in Obsidian
- Khoj Chat responses in Obsidian had regressed to not show references
  for new questions after modal has been opened. Now even those are
  rendered, and use new references style
- Render chat response as markdown while it's being streamed
2023-11-30 13:02:00 -08:00
Debanjum Singh Solanky
0430fa67b6 Show temporary status message when copied to clipboard 2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky
491a1a949a Render chat responses as markdown in Desktop client too 2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky
20ef5bfc93 Properly stop mediaRecorder stream to clear microphone in-use state 2023-11-29 13:48:35 -08:00
Debanjum Singh Solanky
8faa63c3c6 Convert config page buttons to use stronger yellow 2023-11-28 19:55:43 -08:00
Debanjum Singh Solanky
de5aa5c32e Update pillow, aiohttp dependencies 2023-11-28 19:55:43 -08:00
sabaimran
fab57cc395 Fix pgvector installation instructions for Windows, Source 2023-11-28 14:46:09 -08:00
sabaimran
c4dcb51c91 Update headings for installation steps to indicate that local and docker setup are exclusive 2023-11-28 14:38:04 -08:00
Debanjum Singh Solanky
a6ca2076d5 Open link to Khoj app landing page from nav pane in current tab 2023-11-28 14:20:37 -08:00
Debanjum Singh Solanky
643e018947 Handle if user subscription field doesn't exist in telemetry func
Avoid null ref in the method when running Khoj server in anon mode
2023-11-28 14:15:14 -08:00
Debanjum Singh Solanky
110d7646fc Use milder yellow as primary Khoj theme color for chat, buttons etc. 2023-11-28 14:15:14 -08:00
sabaimran
18254850ab Set a default value for the khoj django secret key and add additional guidance for setting environment variables on first run 2023-11-28 09:39:44 -08:00
sabaimran
24b5aaef0a Merge pull request #569 from khoj-ai/features/enforce-subscription-status
Enforce subscription state on the chat API access
2023-11-27 16:12:26 -08:00
sabaimran
6290b463f5 Compute size of the indexed data only if explicitly requested to avoid heavy load on the DB 2023-11-27 12:05:00 -08:00
sabaimran
eb5e3096e0 Change subscribed scope to premium 2023-11-27 11:39:20 -08:00
sabaimran
6e1ba11e59 Resolve merge conflicts for rendering chat response 2023-11-27 11:33:13 -08:00
sabaimran
239b31bc85 Clarify some of the language in the chat configuration docs 2023-11-27 10:44:05 -08:00
sabaimran
309ba7234c Add instructions for setting up chat settings when locally hosting Khoj 2023-11-27 10:41:29 -08:00
sabaimran
5d8dbbdba4 Update instructions for Windows setup and add prerequisites for Docker 2023-11-27 10:32:02 -08:00
Debanjum Singh Solanky
71f2d54258 Render chat response as markdown while streaming on Web, Desktop clients 2023-11-26 20:27:10 -08:00
Debanjum Singh Solanky
9e714d032b Fix Khoj telemetry server. Add server_version column 2023-11-26 15:05:43 -08:00
Debanjum
ebeae543ee Speak to Khoj via Desktop, Web or Obsidian Client (#566)
- Create speech to text API endpoint
- Use OpenAI Whisper for ASR offline (by downloading Whisper model) or online (via OpenAI API)
- Add speech to text model configuration to Database
- Speak to Khoj from the Web, Desktop or Obsidian client
2023-11-26 14:32:11 -08:00
Debanjum Singh Solanky
b249bbb5b5 Limit max audio file size allowed for transcription on API endpoint 2023-11-26 14:19:46 -08:00
sabaimran
e438853b09 Add additional unit tests to verify behavior of unsubscribed/subscribed users 2023-11-26 13:09:00 -08:00
sabaimran
c18d52d1af Add contributors to the README 2023-11-26 12:05:36 -08:00
Debanjum Singh Solanky
a79604b601 Fix return types of offline, online transcribe methods for python 3.9 2023-11-26 06:26:34 -08:00
Debanjum Singh Solanky
06f99ceb3c Rename /api/speak API endpoint to /api/transcribe 2023-11-26 06:18:44 -08:00
Debanjum Singh Solanky
56a1a61c77 Remove unused button element retrieval code from web, desktop 2023-11-26 06:17:56 -08:00
Debanjum Singh Solanky
877532a167 Speak to Khoj from the Obsidian client
- Add transcription button with mic icon
- Collect audio recording on pressing mic
- Process and send audio recording to server for transcription
- Extract the functionality to flash status in chat input for reuse
2023-11-26 06:17:54 -08:00
Debanjum Singh Solanky
cc9eae5d18 Update default chat model to Mistral in GPT4AllProcessor config 2023-11-26 05:55:43 -08:00
Debanjum Singh Solanky
4636390f7f Transcribe speech to text offline with Whisper
- Allow server admin to configure offline speech to text model during
  initialization
- Use offline speech to text model to transcribe audio from clients
- Set offline Whisper as default speech to text model, as it requires no API key setup
2023-11-26 05:55:11 -08:00
Debanjum Singh Solanky
a0a7ab7ec8 Rename conversation.gpt4all package to conversation.offline 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
499adf86a0 Move transcription using OpenAI API into independent package 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
897170ab15 Use single db migration script for transcribe model, related updates 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
28090216f6 Show transcription error status in chatInput placeholder on web, desktop
- Extract flashing status message in chat input placeholder into
  reusable function
- Use emoji prefixes for status messages
- Improve alt text of transcribe button to indicate what the button does
2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
fc040825b2 Default to Offline chat with Mistral as minimal setup, no API key reqd. 2023-11-26 01:07:20 -08:00
Debanjum Singh Solanky
5a6547677c Add type of operation variable in latest migration 2023-11-26 00:38:52 -08:00
Debanjum Singh Solanky
3e252036c3 Remove whitespace: pre-line from chat html, since chat responses are now rendered as markdown 2023-11-26 00:27:29 -08:00
Debanjum Singh Solanky
b484795b8e Merge branch 'master' into add-speak-to-chat
- Conflicts:
  - src/interface/desktop/chat.html
    Combine and use common class names for speak component
  - src/khoj/database/adapters/__init__.py
    Combine imports
  - src/khoj/interface/web/chat.html
    Combine and use common class names for speak component
  - src/khoj/routers/api.py
    Combine imports
2023-11-26 00:26:21 -08:00
sabaimran
6233a957b4 Merge branch 'master' of github.com:khoj-ai/khoj into features/enforce-subscription-status 2023-11-25 22:46:10 -08:00
sabaimran
52b88de7f4 Indicate in the desktop if the user gets rate limited for indexing 2023-11-25 22:31:23 -08:00
Debanjum
e0a59cff68 Delete Conversation History from Web, Desktop, Obsidian Clients (#551)
Add delete button to clear conversation history from Web, Desktop and Obsidian Khoj clients

Resolves #523
2023-11-25 22:24:12 -08:00
Debanjum Singh Solanky
d0e294d8a5 Clear Conversation History from the Obsidian client
- Fix font color for Khoj chat responses in Obsidian. Previous color
  had too low a contrast to be readable
2023-11-25 22:16:13 -08:00
sabaimran
73e38fccf3 Explicitly set billing to off in the test for being able to index a large set of data 2023-11-25 20:48:32 -08:00
sabaimran
b2afbaa315 Add support for rate limiting the amount of data indexed
- Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed
- Delete any files that are being removed for administering the calculation
- Show current amount of data indexed in the config page
2023-11-25 20:28:04 -08:00
Debanjum Singh Solanky
07bf365c7c Clear any network connections to khoj server via khoj.el on reindex
- Ignore errors in deleting network requests to khoj server
- Also delete open network connection to khoj server on auto reindex
  Otherwise when server is unreachable a bunch of failed network
  connections accrue in the processes list
2023-11-25 20:19:41 -08:00
sabaimran
dd1badae81 Use userwithtoken.user when authenticating with an API key 2023-11-24 22:18:45 -08:00
sabaimran
48b9116195 Fix to use user rather than user_with_token in authenticated credentials 2023-11-24 22:18:00 -08:00
sabaimran
771f9bcfa1 If the user subscription was created over 7 days ago, then their trial is expired 2023-11-24 22:08:32 -08:00
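A minimal sketch of the trial-expiry rule described in the commit above, assuming the subscription stores a timezone-aware creation timestamp. `is_trial_expired` and `TRIAL_PERIOD` are hypothetical names for illustration, not Khoj's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed trial length, taken from the commit message ("over 7 days ago").
TRIAL_PERIOD = timedelta(days=7)


def is_trial_expired(subscription_created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: the trial expires once the subscription is older than 7 days."""
    now = now or datetime.now(timezone.utc)
    # Strictly "over" 7 days: exactly 7 days old still counts as within the trial.
    return now - subscription_created_at > TRIAL_PERIOD
```

Passing `now` explicitly keeps the check deterministic, which makes it straightforward to unit test.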
sabaimran
e5b1350523 Enforce API use limits depending on whether the server has billing enabled
and whether the given user is subscribed
2023-11-24 21:55:16 -08:00
sabaimran
9c868ee10b Use the state.billing_enabled field to determine whether to use the subscribed scope 2023-11-24 20:41:19 -08:00
sabaimran
69c8f45830 Use scopes to represent whether the user has a valid subscription in the middleware 2023-11-24 20:29:36 -08:00
Debanjum
25f3f2367e Handle Server Unavailable Error from Khoj.el (#568)
- Make auto-update of content index user configurable from khoj.el
- Handle server unavailable error on auto-index schedule job in khoj.el

Resolves #567
2023-11-24 16:46:07 -08:00
Debanjum Singh Solanky
138f4e3f3c Make auto-update of content index user configurable from khoj.el 2023-11-24 16:40:50 -08:00
Debanjum Singh Solanky
0885fc6c23 Handle server unavailable error on auto-index schedule job in khoj.el 2023-11-24 16:39:44 -08:00
sabaimran
c13953311a Add reflective questions to admin pages 2023-11-23 14:01:05 -08:00
sabaimran
c42ec32a95 Merge pull request #552 from khoj-ai/features/internet-enabled-search
Support internet-enabled, online searching using Serper.dev
2023-11-23 12:34:05 -08:00
sabaimran
e3b32e412c Merge pull request #556 from khoj-ai/features/reflective-suggested-questions
Add support for suggesting base questions to users
2023-11-23 11:57:02 -08:00
sabaimran
5fac39afed Fix PYTHONPATH reference in order to maintain appropriate package imports 2023-11-22 20:35:11 -08:00
sabaimran
c641b8df58 Update desktop package version 2023-11-22 17:54:53 -08:00
sabaimran
a1b2289074 Release Khoj version 1.0.1 2023-11-22 17:52:07 -08:00
sabaimran
e34db979b6 Add instructions for using the self hosted URL in clients 2023-11-22 17:32:43 -08:00
sabaimran
b1b037f0ea Fix URL configuration issues with reorganized subfolders 2023-11-22 17:03:33 -08:00
sabaimran
e0949e232b Import random in adapters file for selecting reflective question 2023-11-22 07:52:51 -08:00
sabaimran
256e8de40a Merge with features/internet-enabled-search 2023-11-22 07:25:24 -08:00
Debanjum Singh Solanky
fd60db766e Clear Conversation History from the Web Client 2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky
d5a4830761 Clear Conversation History from the Desktop Client 2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky
3096544cf2 Create API endpoint to clear user's chat history 2023-11-22 03:34:59 -08:00
Debanjum Singh Solanky
63675b3299 Speak to Khoj from the Desktop client
- Use icons to style speech to text recording state
2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky
2951fc92d7 Speak to Khoj from the Web client
- Use icons to style speech to text recording state
2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky
cc77bc4076 Create speech to text API endpoint. Use OpenAI whisper for ASR
- Wrap audio transcription in try/catch and delete audio file after
processing
- Use configured speech to text model, else handle error
2023-11-22 02:47:06 -08:00
Debanjum Singh Solanky
1ca99b6eb0 Add speech to text model configuration to Database 2023-11-22 02:24:31 -08:00
sabaimran
60c23d9e3a Add online search chat director tests 2023-11-21 23:08:36 -08:00
sabaimran
c652a7fd2d Move text_to_entries under the new content folder 2023-11-21 22:25:17 -08:00
sabaimran
1e2af083f0 Rename the data_sources module to content 2023-11-21 22:11:32 -08:00
sabaimran
4cb28aeffb Resolve merge conflicts with master 2023-11-21 22:07:41 -08:00
Debanjum Singh Solanky
4cdfe8fc4f Re-enable Khoj Obsidian plugin for Mobile, as Khoj cloud is available 2023-11-21 16:33:48 -08:00
Debanjum
5d9d50157e Clean Logs, Improve Message Rendering and Make Khoj Trusted Host Configurable (#561)
- Append chat message to chat logs as TextNodes in web, desktop clients

- Simplify Code to Identify Files from Github, Notion on Web, Desktop Client
  - Use file source to find entries from github, notion on web, desktop client
  - Pass file source to clients via text search API response

- Make Django Logs Follow Khoj Log Format, Verbosity
  - Handle image search setup related warning
  - Format Django initializing outputs using Khoj logger format

- Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj
2023-11-21 15:14:34 -08:00
sabaimran
458e794d00 Revert PYTHONPATH to what it was before 2023-11-21 14:40:57 -08:00
Debanjum Singh Solanky
9e736d4340 Use KHOJ_DOMAIN for CORS allow_origins list as well
- Default to app.khoj.dev
- Remove unnecessary any_path regex in allow_origins. It only cares
  about the host; paths are not set in the origin header
2023-11-21 14:02:04 -08:00
sabaimran
5469e81a87 Use full path for the static directory in FastAPI and reflect deeper nesting of the django app 2023-11-21 13:44:45 -08:00
sabaimran
d199c4c35f Resolve merge conflicts with master 2023-11-21 13:35:56 -08:00
Debanjum Singh Solanky
76d041f633 Use KHOJ_HOST env var to set allowed/trusted domains to host Khoj
Allows hosting Khoj behind other, non "khoj.dev" domains
2023-11-21 13:11:45 -08:00
Debanjum Singh Solanky
90d463c12a Append chat message to chat logs as TextNodes in web, desktop clients 2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
befcbcdd5d Use file source to find entries from github, notion on web, desktop client
This is a more robust mechanism of identification than via file name
including github or notion domain names
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
3f0de45ec6 Pass file source to clients via text search API response
Source of entry stored in DB is now passed to clients for processing
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
4aec581306 Handle image search setup related warning
Ideally should rename model_directory to config_directory or some such
but the current image search code will need to be migrated soon. So
changing the variable name and creating a migration script for old
khoj.yml files using model-directory variable isn't worth it

Remove the explicit setting of the number of threads used by PyTorch.
Use its default instead.
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
b06628ee31 Format Django initializing outputs using Khoj logger format
- Collect STDOUT from the `migrate', `collectstatic' commands and
  output using the Khoj logger format and verbosity settings

- Only show Django `collectstatic' command output in verbose mode

- Fix showing the Initializing Khoj log line by moving it after logger
  level set
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
6d9091bef5 Disable isort for now 2023-11-21 13:03:18 -08:00
sabaimran
341abf03ff Handle none for search_type and use equals comparator rather than in for determining Notion type 2023-11-21 12:55:09 -08:00
Debanjum Singh Solanky
19e042037a Run isort with black profile to avoid conflicts between the two 2023-11-21 12:52:07 -08:00
sabaimran
2bb989e9d8 Resolve merge conflicts and fix some import ordering 2023-11-21 12:30:43 -08:00
sabaimran
244b76ffed Add isort for automatic import sorting and skip main.py because it's a drama queen 👑 2023-11-21 12:20:41 -08:00
Debanjum
8a0d92e2d7 Fix Connectivity Check in Obsidian Client (#559) from dtkav/bugfix-local-connectivity-check
Check connection to Khoj server for self-hosted server. This check had regressed during the cloud rearchitecture
2023-11-21 12:05:16 -08:00
sabaimran
0e6f09b241 Merge pull request #562 from khoj-ai/fix/pypi-package-app-not-included
Fix PyPi package app reference issue
2023-11-21 11:54:46 -08:00
sabaimran
61f6b8c0d4 Ignore-check step failed due to unrecognized code. Try using capital letters for indicator 2023-11-21 11:33:43 -08:00
sabaimran
38144a7a69 pull_request path should be src/khoj rather than src/ 2023-11-21 11:33:07 -08:00
Debanjum
e5130fb3f3 Fix ranking search results on Obsidian (#560)
This bug was causing the search results on the Obsidian client to be shown in the reverse order of their actual relevance.

The order was reversed because, since the move to Postgres, entry scores returned by the Khoj server are a distance metric, so a lower distance is better. Previously a higher score was better.
2023-11-21 11:32:47 -08:00
sabaimran
333cb3445c Use colon rather than equals to indicate typing 2023-11-21 11:28:51 -08:00
Debanjum Singh Solanky
645fd96634 Search across all content types from Khoj Obsidian client
Previously it was only searching for PDF and Markdown files. This was
meant to show only content from current vault as results.

But it has not scaled well as other clients also allow syncing PDF and
markdown files now. So remove this content type filter for now.

A proper solution would limit by using file/dir filters on server or
client side.
2023-11-21 11:19:33 -08:00
sabaimran
a1460a5bf9 Set operations to typed empty list in migration file 2023-11-21 11:14:40 -08:00
sabaimran
8932fc0c36 Ignore w004 check to bypass pypi warnings for check-wheel-contents
- PyPi doesn't like to have files that start with numbers, however all of the generated django migration files start with numbers. To accommodate, skip this check.
- Refer to https://pypi.org/project/check-wheel-contents/ for documentation and recommendation
2023-11-21 11:12:50 -08:00
sabaimran
71e794c26f Remove the sys.append line in the main.py file, as it's not required 2023-11-21 10:57:21 -08:00
sabaimran
a474c31e02 Move the django app into the src/khoj folder for better organization and functionality
- Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
2023-11-21 10:56:04 -08:00
Debanjum Singh Solanky
c89bd49973 Fix ranking search results on Obsidian
It's reversed since the score of entries is now a distance metric on
the Khoj server. So a lower distance is better. Previously a higher
score was better
2023-11-21 01:24:59 -08:00
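The fix above was made in the TypeScript Obsidian client. The Python sketch below only illustrates the underlying idea: when scores are distances, results must be sorted in ascending order. `SearchResult` and `rank_by_distance` are hypothetical names, not Khoj's API.

```python
from typing import List, NamedTuple


class SearchResult(NamedTuple):
    entry: str
    score: float  # distance from the query embedding: lower means more relevant


def rank_by_distance(results: List[SearchResult]) -> List[SearchResult]:
    # Ascending sort puts the smallest distance (most relevant entry) first.
    # Sorting in descending order, as for similarity scores, would reverse relevance.
    return sorted(results, key=lambda r: r.score)
```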
Debanjum
6d8e889917 Improve Self Hosted Khoj Setup (#557)
- c07401cf Fix, Improve chat config via CLI on first run by using defaults
- d61b0dd5 Add Khoj Django app package to sys path to load Django module via pip install
- 4e98acbc Update minimum pydantic version to one with model_validate function
2023-11-20 17:25:53 -08:00
Daniel Grossmann-Kavanagh
f142999bce fix khoj local server usage 2023-11-20 17:07:30 -08:00
Debanjum Singh Solanky
c07401cf76 Fix, Improve chat config via CLI on first run by using defaults
- Fix setting prompt size for online chat
- Generally improve chat config via CLI by using default chat model,
  prompt size for online and offline chat
2023-11-20 17:01:20 -08:00
sabaimran
b142de15a8 Merge branch 'features/internet-enabled-search' of github.com:khoj-ai/khoj into features/reflective-suggested-questions 2023-11-20 15:56:09 -08:00
sabaimran
a9623ef85a Add requisite imports in order to instantiate offline model in adapters file 2023-11-20 15:27:42 -08:00
sabaimran
a8f13f334f Fix merging issues with base after popping the stash 2023-11-20 15:22:50 -08:00
sabaimran
8fa0b69c67 Resolve merge issue with adapters methods 2023-11-20 15:21:06 -08:00
sabaimran
fee99779bf Add subqueries for internet-connected search results and update client-side code accordingly
- Add a wrapper method to help make direct queries to the LLM and determine any intermediate responses needed for handling the request
2023-11-20 15:19:15 -08:00
Debanjum Singh Solanky
d61b0dd55c Add Khoj Django app package to sys path to load Django module via pip install 2023-11-20 14:55:00 -08:00
Debanjum Singh Solanky
4e98acbca7 Update minimum pydantic version to one with model_validate function 2023-11-20 14:52:37 -08:00
sabaimran
b8e6883a81 Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search 2023-11-19 16:20:08 -08:00
sabaimran
237195e20e Make all name-related fields nullable within the GoogleUser 2023-11-19 14:22:32 -08:00
sabaimran
4def8cce36 Merge pull request #541 from asim-shrestha/patch-1
Add test separators
2023-11-19 14:14:34 -08:00
Debanjum
71799add0b Index Parent Headings of Org-Mode Entries to Improve Search Context (#548)
### Overview
The parent hierarchy of org-mode entries can store important context. 
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index

### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser

Resolves #85
2023-11-19 13:18:19 -08:00
sabaimran
e398a76779 Fix test word filter 2023-11-19 13:14:58 -08:00
sabaimran
33a9304428 Resolve merge conflicts 2023-11-19 12:57:55 -08:00
sabaimran
cfd76b8472 Add open graph links to configure Khoj Docs preview 2023-11-19 12:16:59 -08:00
sabaimran
ef5e9d66c1 Resolve merge conflicts in dependency imports 2023-11-19 11:42:20 -08:00
Debanjum Singh Solanky
c3465d6982 Release Khoj version 1.0.0 2023-11-19 09:50:25 -08:00
Debanjum
736744be3a Update documentation to reflect new multi-user config scenario (#550)
- Update docs to show how to use Khoj Cloud
- Move self-hosting Khoj to separate section
- Add page to setup Desktop app
- Set default URL to Khoj Cloud URL in Obsidian, Emacs clients
2023-11-18 18:22:46 -08:00
Debanjum Singh Solanky
d0e84385f2 Simplify links in Khoj docs to use page_name.md with no prefixes
This allows jumping to page via VSCode IDE and on docs website
2023-11-18 18:17:46 -08:00
Debanjum Singh Solanky
fc65d8a9fe Add documentation page for the Khoj Desktop client 2023-11-18 18:17:35 -08:00
Debanjum Singh Solanky
35b469e488 Simplify setup, features since Khoj cloud in docs
- No Khoj server setup required to start using Khoj from Obsidian, Emacs
- Use tabs for install, upgrade in Emacs with different package
  managers
- Use default subtitles in Khoj Docs
- Deduplicate query filters, remove backend setup instructions in
  plugin pages
- Remove stale Setup demo on Khoj Obsidian plugin docs
2023-11-18 17:25:52 -08:00
Debanjum Singh Solanky
e1bf1f0e86 Update default Khoj server URL to Khoj cloud on Emacs, Obsidian clients 2023-11-18 16:25:45 -08:00
Debanjum Singh Solanky
8775ce730a Use URL fragments to allow jumping to config page sections on Web app 2023-11-18 16:25:45 -08:00
sabaimran
a5613cb08a Merge pull request #554 from khoj-ai/fix/issues-with-prod-chat
Fix misc. issues with chat configuration
2023-11-18 14:45:06 -08:00
sabaimran
f792b1e301 Remove already defined identical function 2023-11-18 14:08:50 -08:00
sabaimran
e2fff5dc47 Don't explicitly use value to get the model type value 2023-11-18 14:01:01 -08:00
sabaimran
a8a25ceac2 Honor user's chat settings when running the extract questions phase
- Add marginally better error handling when GPT gives a malformed response to the extract questions method
- Remove debug log lines
2023-11-18 13:31:51 -08:00
sabaimran
67156e6aec Add new logs for debugging issues with chat references 2023-11-18 12:10:50 -08:00
sabaimran
5de2ab6098 Change parse_obj calls to use model_validate per new pydantic specification 2023-11-18 12:10:36 -08:00
sabaimran
ebdb423d3e Merge pull request #553 from khoj-ai/features/validation-errors
Update types of base config models for pydantic 2.0
2023-11-18 00:42:56 -08:00
sabaimran
6d249645a6 Fix interpretation of the default search type 2023-11-18 00:04:18 -08:00
sabaimran
f180b2ba94 Resolve mypy errors for various data types 2023-11-17 23:26:15 -08:00
sabaimran
3328a41f08 Update types of base config models for pydantic 2.0 2023-11-17 23:08:52 -08:00
sabaimran
f688529150 Update the default configuration for the AppConfig 2023-11-17 19:26:31 -08:00
sabaimran
11ccb92755 Fix formatting of welcome message to use markdown 2023-11-17 18:55:59 -08:00
Debanjum Singh Solanky
ca87b4ede9 Wrap common API query parameters into shared class to deduplicate code
- Upgrade FastAPI to the latest version. The earlier version didn't
  support wrapping common query params in a class

- Use per fixture app instead of a global FastAPI app in conftest

- Upgrade minimum required Django version

- Fix no notes chat director test with updated no notes message
  No notes message was updated in commit 118f1143
2023-11-17 18:43:49 -08:00
sabaimran
262f3ccb59 Resolve mypy issues with formatting 2023-11-17 17:11:00 -08:00
sabaimran
a7e00898cb Fix rendering even when no online context references are returned 2023-11-17 16:41:28 -08:00
sabaimran
0fcf234f07 Add support for using serper.dev for online queries
- Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command
- Add it as an additional tool for doing Google searches
- Render the results appropriately in the chat web window
- Pass appropriate reference data down to the LLM
2023-11-17 16:19:11 -08:00
Debanjum Singh Solanky
33ad9b8e64 Update text search test since indexing ancestor hierarchy added 2023-11-17 15:26:55 -08:00
Debanjum Singh Solanky
55785d50c3 Use title, when present, as root ancestor of entries instead of file path 2023-11-17 15:03:27 -08:00
sabaimran
bfbe273ffd Add some styling to the copy button for programmatic output 2023-11-17 12:18:35 -08:00
sabaimran
9ddf3b58c3 Use the markdown parser for rendering the chat messages in the web interface 2023-11-17 12:14:02 -08:00
sabaimran
a0b12b001a Provide in-line rendering when output matches certain views 2023-11-17 11:04:36 -08:00
sabaimran
ec06d2c446 Move data indexer files into a separate folder under processor. Update assoc UTs 2023-11-16 17:19:55 -08:00
Debanjum Singh Solanky
68ac1e0193 Automate Desktop app builds on new release or push to master branch 2023-11-16 16:09:03 -08:00
sabaimran
45a42faec8 Make adjectives more positive for api token generation 2023-11-16 15:55:35 -08:00
sabaimran
3934633947 Update references to all documentation to reflect instructions for managed service
- By default assume the audience of this website is people looking to understand the feature offering of Khoj, and then people who are looking to self-host
2023-11-16 15:26:03 -08:00
sabaimran
7688228b9c Update docs to reflect new setup processes and instructions based on rearchitecture
- Most important updates include the dependency requirement to set up Postgres when running/setting Khoj up locally
- Add instructions for Docker
- Shift to recommend desktop client and update instructions for how to configure Khoj for user
2023-11-16 12:56:42 -08:00
sabaimran
118f1143ff When user tries using the notes slash command without having any data indexed 2023-11-16 12:52:39 -08:00
sabaimran
e8a13f0813 Add multi-user support to Khoj and use Postgres for backend storage (#549)
- Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials
- Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration
- Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration
- Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page
- Adds billing to allow users to subscribe to the cloud service easily
- Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server.
- Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration.
- Changes license to GNU AGPLv3

Resolves #467 
Resolves #488 
Resolves #303 
Resolves #345 
Resolves #195 
Resolves #280 
Resolves #461 
Closes #259 
Resolves #351
Resolves #301
Resolves #296
2023-11-16 11:48:01 -08:00
sabaimran
1466aef554 Change license to GNU AGPLv3 from GNU GPLv3
- This enforces that upstream consumers of this code should open source their software for any network-distributed services
2023-11-16 11:14:06 -08:00
sabaimran
36d200580b Use a different name for the production-config containers 2023-11-16 10:28:28 -08:00
sabaimran
ba633c4015 Only build the production docker image when pushing to master 2023-11-16 09:24:57 -08:00
Debanjum Singh Solanky
ddb07def0d Test search uses ancestor headings as context for improved results
- Update test data to add deeper outline hierarchy for testing
  hierarchy as context
- Update collateral tests that need count of entries updated, deleted
  asserts to be updated
2023-11-16 03:05:19 -08:00
Debanjum Singh Solanky
74403e3536 Add ancestor headings of each org-mode entry to their compiled form
Resolves #85
2023-11-16 02:54:41 -08:00
Debanjum Singh Solanky
305c25ae1a Track ancestor headings for each org-mode entry in org-node parser 2023-11-16 02:39:14 -08:00
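The ancestor tracking described in 305c25ae1a and 74403e3536, together with using the title as root ancestor (55785d50c3), can be sketched with a heading stack. This is a minimal illustration, not the org-node parser itself: the function name `compile_with_ancestors`, the `(level, heading)` input shape and the `" / "` separator are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def compile_with_ancestors(
    headings: List[Tuple[int, str]], file_path: str, title: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: prefix each heading with its ancestor path.

    headings are (level, text) pairs in document order, e.g. (2, "Khoj").
    The root ancestor is the title when present, else the file path.
    """
    root = title or file_path
    stack: List[Tuple[int, str]] = []  # current chain of open ancestor headings
    compiled: List[str] = []
    for level, heading in headings:
        # Close any headings at the same or deeper level; they are siblings or cousins, not ancestors
        while stack and stack[-1][0] >= level:
            stack.pop()
        compiled.append(" / ".join([root] + [h for _, h in stack] + [heading]))
        stack.append((level, heading))
    return compiled
```

For headings `Projects > Khoj > Search` followed by a level-2 `Emacs`, the `Emacs` entry compiles to `notes.org / Projects / Emacs`, so a search for "Projects" context can still match it.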
Debanjum
208ddddc6a Make Search Model Configurable on Server (#544)
- Make search model configurable on server
- Update migration script to get search model from `khoj.yml` to Postgres
- Update first run message on Khoj Desktop and Web app landing page
- Other miscellaneous bug fixes
2023-11-16 00:11:58 -08:00
Debanjum Singh Solanky
cc05013715 Update first run message on Web app with Chat models setup instructions
- Link to Django admin panel for user to create Chat Models on their
  Khoj server
- This should only get hit when user is not using Khoj cloud, as Khoj
  cloud would already have Chat models configured
2023-11-15 22:44:24 -08:00
Debanjum Singh Solanky
6c1693b8f4 Update first run message on Desktop app with API token setup instructions
- Open Web app settings in the default browser via link click
- Open Desktop app settings via link click
2023-11-15 22:44:11 -08:00
Debanjum Singh Solanky
922983bd53 Set max cos distance to 0.18. Test search API query with max distance 2023-11-15 20:26:21 -08:00
Debanjum Singh Solanky
18dbad5edb Use Sigmoid to normalize cross-encoder score between 0-1
- Sigmoid normalization isn't required for reranking, but
  normalizing both encoder and cross-encoder scores to a distance
  metric is useful to reason about them
- Softmax wasn't required as we don't need probabilities; sigmoid is
  good enough to get a distance metric
2023-11-15 19:31:59 -08:00
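A sketch of the sigmoid normalization described in 18dbad5edb, assuming the distance is taken as `1 - sigmoid(logit)` so that, like the encoder's cosine distance, lower means more relevant. The exact mapping Khoj uses is not stated in the commit; the function names and sample entries are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function: avoids overflow in math.exp for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_encoder_distance(logit: float) -> float:
    # Assumption: distance = 1 - sigmoid(logit), bounded in [0, 1], lower is better
    return 1.0 - sigmoid(logit)


# Rerank hypothetical (entry, raw cross-encoder logit) pairs by ascending distance
scored = [("note-a", -2.0), ("note-b", 3.5), ("note-c", 0.4)]
reranked = sorted(scored, key=lambda pair: cross_encoder_distance(pair[1]))
```

Because sigmoid is monotonic, the rerank order is the same as sorting by the raw logit, which is why the normalization isn't needed for reranking itself. Unlike softmax, it maps each score independently, so a result's distance doesn't depend on the other candidates in the batch.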
sabaimran
0da4db4310 Merge pull request #547 from khoj-ai/features/fix-api-token-generator
Update the return type of the API token generator
2023-11-15 19:23:18 -08:00
sabaimran
ea144de438 Merge with master 2023-11-15 18:34:46 -08:00
sabaimran
6b17aeb32d Resolve merge conflicts in auth.py with remove KhojApiUser import 2023-11-15 17:32:53 -08:00
Debanjum Singh Solanky
348cc0cf0e Use better name for DB adapter func to create user by Google token 2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky
08a057bdd5 Rename SearchModel to SearchModelConfig DB model, Require Cross-Encoder 2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky
0679b2a7bd Use embeddings model store from state in text to entries
No need to instantiate it separately. In all other places we're
using the embeddings model store in global state anyway
2023-11-15 17:31:50 -08:00
sabaimran
f88a5867b4 Allow dockerize step to run for prod from PR temporarily 2023-11-15 17:31:50 -08:00
sabaimran
245a9cbf63 Fix return type of the update_or_create method 2023-11-15 17:31:50 -08:00
sabaimran
10be8dfad9 Rename dockerize dev action to be more accurate 2023-11-15 17:31:50 -08:00
sabaimran
70f5d0ed3c Add a dev workflow for GitHub actions, change the production workflow to only kick off when pushed to master 2023-11-15 17:31:50 -08:00
sabaimran
bbae7dd83c Update logic for creating a new user to use aupdate_or_create 2023-11-15 17:31:50 -08:00
sabaimran
154de8c629 Update format for return type of the generate token method 2023-11-15 17:31:12 -08:00
sabaimran
cf74fa4a70 Allow dockerize step to run for prod from PR temporarily 2023-11-15 17:04:48 -08:00
sabaimran
8e62af77b9 Update format for return type of the generate token method 2023-11-15 17:03:01 -08:00
sabaimran
4a487aff23 Fix return type of the update_or_create method 2023-11-15 14:35:42 -08:00
sabaimran
992e54c218 Rename dockerize dev action to be more accurate 2023-11-15 14:09:28 -08:00
sabaimran
99f5a6082e Add a dev workflow for GitHub actions, change the production workflow to only kick off when pushed to master 2023-11-15 14:07:25 -08:00
sabaimran
b63856ecb4 Update logic for creating a new user to use aupdate_or_create 2023-11-15 12:50:39 -08:00
sabaimran
b8e7488a95 Use a more permissive distance filter for search results from notes 2023-11-15 11:13:47 -08:00
sabaimran
d06b2cf24b Downgrade pyproject.toml to avert dependency conflict 2023-11-15 10:47:54 -08:00
sabaimran
05b7542115 Remove config lock from the state 2023-11-15 10:44:45 -08:00
sabaimran
ecd005cac0 Check if search model is already in DB before creating a new one 2023-11-15 10:41:35 -08:00
Debanjum Singh Solanky
9c6e7bdea2 Upgrade server, desktop app dependencies to resolve CVE bugs 2023-11-15 01:47:53 -08:00
Debanjum Singh Solanky
5a6ab9cc85 Fix failing client tests 2023-11-15 00:17:44 -08:00
Debanjum Singh Solanky
8f200cf53f Remove unused parameter from configure_search_type method 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
f8e5e118e1 Only create KhojUser on login if doesn't already exist 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
3d8d6145f2 Add search model config from khoj.yml to Postgres DB via migration script 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
4af194d74b Make search model configurable on server
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
  in code
- Now it's user configurable but with a default config generated by
  default
2023-11-14 19:09:35 -08:00
Debanjum
b734984d6d Fix, Improve Khoj with multi-user, db support for Khoj Cloud Release (#539)
### Overview
Prepare Khoj with multi-user, db support for Khoj Cloud release

### Details
- Add first run experience to configure Khoj via khoj CLI 
- Improve Web app settings page: Move files data into content section card. Move content index update button(s) to content section
- Improve OpenAI chat prompts
  - Push more general information for OpenAI models into system prompt
  - Make it more aware of its current capabilities
  - Weaken asking follow-up questions
- Rate-limit calls to the chat API
- Add back search results quality threshold
  - Normalize quality score definitions across cross_encoder, encoder to distance metric
- Remove reference to deprecated button
- Await result of the search query
- Fixed Langchain issue by allowing the Docker image to rebuild with a later package version
2023-11-14 16:55:34 -08:00
Debanjum Singh Solanky
e98141f4c3 Subscribe default user to standard plan with a far away renewal date
Self hosted users in anonymous mode have all capabilities unlocked
2023-11-14 16:31:39 -08:00
Debanjum Singh Solanky
9d30fda26d Deduplicate, improve name of prompt templates for GPT4All chat models
- Do not pass unused rerank_results parameter to text_search.query method
2023-11-14 16:31:09 -08:00
Debanjum Singh Solanky
795ec9eb55 Add KHOJ_ prefix to server admin credentials environment variables 2023-11-14 16:13:13 -08:00
sabaimran
ee005de662 Rename django files URL to server instead of django 2023-11-14 12:36:38 -08:00
sabaimran
75e5a6b6de Remove all the example mounted volumes as they're no longer required in the new architecture 2023-11-14 12:31:24 -08:00
sabaimran
20ce3d0c78 Update default docker compose configuration with Khoj local mode 2023-11-14 12:21:26 -08:00
sabaimran
8c36079f74 Add a first run experience to initialize the admin user if none exists and setup chat models 2023-11-13 21:07:12 -08:00
Debanjum Singh Solanky
e9adb58c16 Rate limit calls to the /chat API per user, per day/minute 2023-11-13 19:41:46 -08:00
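The per-user, per-minute and per-day limit from e9adb58c16 can be sketched as a sliding-window counter. This is a minimal in-memory illustration under stated assumptions: the class name `ChatRateLimiter`, the limit values and the in-memory store are hypothetical, since the commit doesn't say how Khoj stores request counts. A deployment running several gunicorn workers (as the production image in #549 does) would need shared storage such as the Postgres DB instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class ChatRateLimiter:
    """Hypothetical sliding-window limiter enforcing several windows per user."""

    def __init__(self, limits: List[Tuple[int, float]], clock: Callable[[], float] = time.monotonic):
        self.limits = limits  # (max_requests, window_seconds) pairs, e.g. per minute and per day
        self.clock = clock
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)  # user -> request timestamps

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        history = self.calls[user_id]
        # Drop timestamps older than the longest window; no limit can count them any more
        longest = max(window for _, window in self.limits)
        while history and now - history[0] >= longest:
            history.popleft()
        # Reject if any window is already full; rejected calls are not recorded
        for max_requests, window in self.limits:
            if sum(1 for t in history if now - t < window) >= max_requests:
                return False
        history.append(now)
        return True
```

Usage: `limiter = ChatRateLimiter([(10, 60), (100, 86400)])`, then return an HTTP 429 from the chat endpoint when `limiter.allow(user_id)` is `False`. Checking all windows before recording keeps a call rejected by the daily limit from also consuming minute quota.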
Debanjum Singh Solanky
33a8eb0470 Log when new user is created 2023-11-13 19:37:24 -08:00
sabaimran
603f838115 Block input text field when waiting for chat response 2023-11-11 17:14:37 -08:00
Asim Shrestha
0bfc094e18 Add test separators 2023-11-11 17:08:58 -08:00
Debanjum Singh Solanky
9c321ac070 Fix cross encoder to use softmax to convert it to a distance metric 2023-11-11 16:12:24 -08:00
sabaimran
8a824167cf Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references 2023-11-11 12:59:31 -08:00
sabaimran
fa428932a8 Update URL for downloading the desktop application 2023-11-11 12:59:15 -08:00
Debanjum Singh Solanky
941c7f23a3 Only get text search results above confidence threshold via API
- During the migration, the confidence score stopped being used. It
  was being passed down from API to some point and went unused

- Remove score thresholding for images, as the image search confidence
  score is different from the text search model's distance score

- Default score threshold of 0.15 is experimentally determined by
  manually looking at search results vs distance for a few queries

- Use distance instead of confidence as metric for search result quality
  Previously we'd moved text search to a distance metric from a
  confidence score.

  Now convert even cross encoder, image search scores to distance metric
  for consistent results sorting
2023-11-11 04:11:33 -08:00
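The thresholding in 941c7f23a3 amounts to keeping only results whose distance is within a cutoff and sorting them closest first. A minimal sketch, assuming an inclusive `<=` comparison and a hypothetical `SearchResult` shape; the 0.15 default is the value given in the commit (a later commit, 922983bd53, sets the max cos distance to 0.18).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    entry: str
    distance: float  # lower is a closer match, for both encoder and cross-encoder scores


def filter_by_distance(results: List[SearchResult], max_distance: float = 0.15) -> List[SearchResult]:
    # Hypothetical helper: drop results beyond the threshold, then sort closest first
    kept = [r for r in results if r.distance <= max_distance]
    return sorted(kept, key=lambda r: r.distance)
```

Using one distance convention for every model is what makes a single threshold and a single ascending sort work across the encoder and cross-encoder stages.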
Debanjum Singh Solanky
e44e6df221 Reduce data dumped in console log from web, desktop app 2023-11-11 02:05:07 -08:00
Debanjum Singh Solanky
f044a89d50 Show status in Save, Reinitialize button of config page on web app
- Show non-transient error message in status element if action fails
- On success, just show temporary success message within button
2023-11-11 02:04:58 -08:00
Debanjum Singh Solanky
f17d9da36c Move Configure, Reinitialize buttons into the Content section on Web app
Remove the Results Count button from the web app. It's hanging weirdly
with not much context to its purpose.

Reintroduce it in the Search card when created under the Features section
2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky
325cb0f7fb Show message in Save button of Github, Notion config save in web app
Show the success, failure message only temporarily. Previously it
stuck around after clicking save until page refresh
2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky
b34d4fa741 Save config, update index on save of Github, Notion config in web app
Reduce user confusion by joining config update with the index update for
each content type.

So only a single click required to configure any content type instead
of two clicks on two separate pages
2023-11-11 00:33:49 -08:00
Debanjum Singh Solanky
c4364b9100 Weaken asking follow-up qs and q&a mode in notes prompt to OpenAI models
- Notes prompt doesn't need to be so tuned to question answering. The user
could just want to talk about life. The notes should be used to
respond to those messages, not only to retrieve answers from notes

- System and notes prompts were forcing asking follow-up questions a
  little too much. Reduce strength of follow-up question asking
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
cba371678d Stop OpenAI chat from emitting reference notes directly in chat body
The Chat models sometimes output reference notes directly in the chat
body in unformatted form, specifically as Notes:\n['. Prevent that.
Reference notes are shown in clean, formatted form anyway
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
8585976f37 Revert "Use notes in system prompt, rather than in the user message"
This reverts commit e695b9ab8c.
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
b6441683c6 Increase reference text on 1st expansion to 3 lines and 140 characters 2023-11-10 23:36:43 -08:00
sabaimran
55c97241b5 Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references 2023-11-10 22:38:34 -08:00
sabaimran
e2e96f9aa4 Add default settings to let new users be subscribed on trial
- Add the default user to a subscription trial
- Update associated unit tests
2023-11-10 22:38:28 -08:00
Debanjum Singh Solanky
501e7606a0 Increase reference text on 1st expansion to 3 lines and 140 characters 2023-11-10 21:27:04 -08:00
sabaimran
0a950d9382 Fix checker to determine if obsidian client is connected 2023-11-10 19:21:58 -08:00
sabaimran
c736604366 Merge with remote 2023-11-10 17:50:15 -08:00
sabaimran
b0b07bde6c Allow chat reference to expand enough to show the whole reference, rather than constraining the height 2023-11-10 17:49:20 -08:00
sabaimran
14f8c151c8 Fix return type of the generate_chat_response method 2023-11-10 17:48:54 -08:00
Debanjum Singh Solanky
45b8670c25 Fix return type hint for generate_chat_response func 2023-11-10 17:34:19 -08:00
Debanjum Singh Solanky
c9c0ba67c6 Fix chat_client configurations for OpenAI chat director tests 2023-11-10 17:29:23 -08:00
Debanjum Singh Solanky
9b6c5ddba4 Update action row padding in cards on config page of web app 2023-11-10 16:53:25 -08:00
sabaimran
54d4fd0e08 Add chat_model data for logging selected models to telemetry 2023-11-10 16:46:34 -08:00
sabaimran
e695b9ab8c Use notes in system prompt, rather than in the user message 2023-11-10 15:09:33 -08:00
sabaimran
cec932d88a Update prompt so that GPT is more context aware with its capabilities 2023-11-10 14:37:11 -08:00
sabaimran
262a8574d1 Add a test to verify that a user without data successfully returns a response to the /search endpoint 2023-11-10 14:00:58 -08:00
sabaimran
e62788ad79 Await result for determining if user has entries 2023-11-10 13:51:56 -08:00
sabaimran
1a56344f12 Remove the old syncData reference as it no longer exists 2023-11-10 10:10:07 -08:00
Debanjum
a348f1a6ab Reduce Desktop App UX Save, Sync Confusion (#538)
- Show next sync time to make users aware that data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
  previously did, as a manual sync is intended to Save All
- Default to using app.khoj.dev as default Khoj URL to ease Cloud sync setup
- Add detailed chat intro message, mention download desktop app for docs sync
- Only show search in web app nav pane if user has documents indexed
- Hide download desktop app message in web app if synced files exist
- Mark generated profile pic with subscription circle in web app
2023-11-10 00:57:45 -08:00
Debanjum Singh Solanky
39ad1c6ce6 Release Khoj version 0.14.0
Fix Khoj subtitle in manifest of Khoj Obsidian plugin
2023-11-10 00:28:33 -08:00
Debanjum Singh Solanky
745d6bfeed Add detailed intro message, mention download desktop app for docs sync 2023-11-10 00:20:28 -08:00
Debanjum Singh Solanky
6eb7df717c Only show search in web app nav pane if user has documents indexed 2023-11-09 19:14:54 -08:00
Debanjum Singh Solanky
c0789dc57b Use email to get_user_subscription from DB and other DB adapters
- Needing user subscription requires chaining function
- Simplify get_file_sources DB adapter
2023-11-09 19:09:57 -08:00
Debanjum Singh Solanky
841ed95521 Move active user profile halo check into nav pane macro on web app 2023-11-09 18:05:19 -08:00
Debanjum Singh Solanky
ddac693762 Hide download desktop app message in web app if synced files exist 2023-11-09 17:47:00 -08:00
Debanjum Singh Solanky
30a9674f25 Mark generated profile pic with subscription circle in web app 2023-11-09 15:22:38 -08:00
Debanjum Singh Solanky
d6e6ed1cfa Keep single Save button, Show next sync, default to prod Khoj URL in Desktop app
- Make mutable syncing variable not a const
- Show next sync time to make users aware that data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
  previously did, as a manual sync is intended to Save All
- Default to using app.khoj.dev as default Khoj URL to ease setup
2023-11-09 14:04:58 -08:00
Debanjum Singh Solanky
e1f0128576 Change config migration script to update to 0.15.0 version
The next release, 0.14.0, won't contain the migration to Postgres
2023-11-09 12:21:58 -08:00
Debanjum Singh Solanky
17cbbb0b01 Use Consistent Environment Variable for KHOJ_DEBUG 2023-11-09 11:01:28 -08:00
Debanjum Singh Solanky
391db80499 Improve subscribed user profile pictures and nav pane selection
- Add yellow halo around subscribed user profile
- Fix highlighting current page in header nav pane
2023-11-09 00:57:05 -08:00
Debanjum Singh Solanky
605058c72a Allow null user profile picture from Google OAuth in DB
- Fix width of the profile picture generated for user
- Ignore unused Stripe webhook events
2023-11-09 00:46:59 -08:00
Debanjum
1d3bdf8fdb Create Billing integration. Improve Settings pages on Desktop, Web apps (#537)
### Major
- Expose Billing via Stripe on Khoj Web app for Khoj Cloud subscription
  - Expose card on web app config page to manage subscription to Khoj cloud
  - Create API webhook, endpoints for subscription payments using Stripe
- Put Computer files to index into Card under Content section
  - Show file type icons for each indexed file in config card of web app
  - Enable deleting all indexed desktop files from Khoj via Desktop app
  - Create config page on web app to manage computer files indexed by Khoj
- Track data source (computer, github, notion) of each entry
  - Update content by source via API. Make web client use this API for config
  - Store the data source of each entry in database

### Cleanup
- Set content enabled status on update via config buttons on web app
- Delete deprecated content config pages for local files from web client
- Rename Sync button, Force Sync toggle to Save, Save All buttons

### Fixes
- Prevent Desktop app triggering multiple simultaneous syncs to server
- Upgrade langchain version since adding support for OCR-ing PDFs
- Bubble up content indexing errors to notify user on client apps
2023-11-08 19:55:35 -08:00
Debanjum Singh Solanky
a2609973b8 Disable Subscription if Stripe environment not setup
Deduplicate DJANGO_SECRET_KEY and KHOJ_DJANGO_SECRET_KEY to the latter
name, as the KHOJ prefix marks it as Khoj app specific
2023-11-08 19:39:32 -08:00
Debanjum Singh Solanky
09e1235832 Auto update billing card UI on (re/un-)subscribe click on web app
Previously required a page load to see the updated billing state after
clicking resubscribe or unsubscribe buttons
2023-11-08 18:38:12 -08:00
Debanjum Singh Solanky
8b8bb15866 Keep sync state in memory, initialized to false in Desktop app
Prevent deadlock if desktop app killed in middle of syncing
2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
c043eb54ae Use typed entry source instead of raw str to map source to conf in api.py 2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
8178004e6d Move Subscription data into separate table in DB. Merge migrations 2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
3bb10128ef Move subscription API to separate, independent router 2023-11-08 16:20:27 -08:00
Debanjum Singh Solanky
ec1395d072 Clean, merge subscription update events, API and functions
- Reduce webhook triggers for subscription updates
- Merge subscription update API endpoint, functions for (re/un-)subscribe
2023-11-08 15:55:20 -08:00
Debanjum Singh Solanky
ef5c13f968 Keep user subscription state. Update it when user has unsubscribed 2023-11-08 12:08:36 -08:00
Debanjum Singh Solanky
c52affc6d9 Get Khoj Cloud Subscription URL via environment variable 2023-11-08 12:07:53 -08:00
sabaimran
609d358b1a Use sql datetime comparison for detecting validity of subscription renewal date
- Update the unsubscribe endpoint to use query params
- Use subscription id to process unsubscribe endpoint, rather than the customer id
2023-11-07 19:17:36 -08:00
sabaimran
98cf095b65 Fix bug for rendering chat references in LLM response 2023-11-07 16:44:41 -08:00
sabaimran
0e1cdb6536 Add additional error handling for processing unknown Stripe events and fix typo in STRIPE_SIGNING env variable 2023-11-07 16:43:05 -08:00
sabaimran
08c86927cb Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into fix-improve-config-page-on-desktop-and-web-app 2023-11-07 12:46:49 -08:00
sabaimran
cec54e3a8a Merge pull request #536 from khoj-ai/features/update-chat-ui
Update the chat UI to have richer representation of the references
2023-11-07 12:34:57 -08:00
Debanjum Singh Solanky
f466751f4d Expose card on web app config page to manage subscription to Khoj cloud 2023-11-07 10:21:00 -08:00
Debanjum Singh Solanky
9aaf475c8a Create API webhook, endpoints for subscription payments using Stripe
- Add fields to mark users as subscribed to a specific plan and
  subscription renewal date in DB
- Add ability to unsubscribe a user using their email address
- Expose webhook for stripe to callback confirming payment
2023-11-07 10:20:51 -08:00
Debanjum Singh Solanky
156421d30a Show file type icons for each indexed file in config card of web app 2023-11-07 05:48:44 -08:00
Debanjum Singh Solanky
045c2252d6 Set content enabled status on update via config buttons on web app
Previously hitting configure or disable wouldn't update the state of
the content cards. It needed page refresh to see if the content was
synced correctly.

Now cards automatically get set to new state on hitting disable button
on card or global configure buttons
2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
7c424e0d5f Enable deleting all indexed desktop files from Khoj via Desktop app 2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
779fa531a5 Prevent Desktop app triggering multiple simultaneous syncs to server
Lock syncing to server if a sync is already in progress.

Although the Save button is disabled while a sync is in progress,
the background sync job can still trigger a sync in parallel. This
sync lock prevents that
2023-11-07 05:28:13 -08:00
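The sync lock in 779fa531a5, and the in-memory sync state initialized to false in 8b8bb15866, follow a skip-if-busy pattern. The Desktop app implements this in JavaScript; the sketch below restates the idea in Python's asyncio for illustration only, and the `SyncGuard` class and `push` callback are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class SyncGuard:
    """Hypothetical guard: at most one sync to the server runs at a time."""

    def __init__(self, push: Callable[[], Awaitable[None]]):
        self._push = push
        self._syncing = False  # in memory only, so it is False again after an app restart

    async def sync(self) -> bool:
        # Check-and-set has no await in between, so it is atomic within one event loop
        if self._syncing:
            return False  # a manual or background sync is already in progress; skip
        self._syncing = True
        try:
            await self._push()
            return True
        finally:
            # Release even if the push fails, so later syncs are not blocked
            self._syncing = False
```

Keeping the flag in memory rather than persisting it means a sync interrupted by killing the app cannot leave the lock stuck on the next launch, which is the deadlock 8b8bb15866 avoids.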
Debanjum Singh Solanky
404d47f1a1 Bubble up content indexing errors to notify user on client apps 2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
6e957584ac Create config page on web app to manage computer files indexed by Khoj
Remove the table of all files indexed by Khoj. This seems overkill and
doesn't match the UI semantics of the other data sources like Github,
Notion.

Create instead a data source card for computer files with the same
update, disable semantics of the Github and Notion data source cards

Users can disable each data source from its card on the main config page.

They can see/delete individual files indexed from the computer data source
once they click into the computer files data source card on the config page
2023-11-07 04:42:53 -08:00
Debanjum Singh Solanky
d527b644f4 Update content by source via API. Make web client use this API for config 2023-11-07 03:41:19 -08:00
Debanjum Singh Solanky
9ab327a2b6 Store the data source of each entry in database
This will be useful for updating, deleting entries by their data
source. Data source can be one of Computer, Github or Notion for now

Store the source of each file/entry in the database
2023-11-07 02:18:48 -08:00
Debanjum Singh Solanky
c82cd0862a Delete deprecated content config pages for local files from web client
The desktop app now manages syncing local computer files to index
The server only manages "cloud" data source like github and notion.
2023-11-06 23:55:37 -08:00
Debanjum Singh Solanky
9f47fc8e34 Upgrade langchain version since adding support for OCR-ing PDFs 2023-11-06 21:58:33 -08:00
Debanjum Singh Solanky
97cf8339aa Rename Sync button, Force Sync toggle to Save, Save All buttons 2023-11-06 21:57:37 -08:00
Debanjum Singh Solanky
a08b152358 Improve log messages in text_entries and memory leak unit test 2023-11-06 19:27:31 -08:00
sabaimran
6c8689e4ae Update corresponding chat UX in the desktop client as well 2023-11-06 16:18:41 -08:00
sabaimran
e01ecf1419 /s/references/reference to fix bug of jumping references 2023-11-06 16:12:25 -08:00
Debanjum
38f24a037d Improve Indexing Text Entries (#535)
Major
- Ensure search results logic consistent across migration to DB, multi-user
- Manually verified search results for sample queries look the same across migration
- Flatten indexing code for better indexing progress tracking and code readability

Minor
- a4f407f Test memory leak on MPS device when generating vector embeddings
- ef24485 Improve Khoj with DB setup instructions in the Django app readme (for now)
- f212cc7 Arrange remaining text search tests in arrange, act, assert order
- 022017d Fix text search tests to test updated indexing log messages
2023-11-06 16:01:53 -08:00
sabaimran
270f7b3eb3 Update the chat UI to have richer representation of the references 2023-11-05 15:46:43 -08:00
sabaimran
81a615d7dd Merge pull request #534 from khoj-ai/features/code-config-cleanup
Small fixes and update config UI to manage indexed data
2023-11-05 15:45:45 -08:00
sabaimran
8ebb12820c Add OCR runtime dependencies to prod Dockerfile as well 2023-11-05 15:40:05 -08:00
sabaimran
d697d752c2 Use repeat rather than manually specify auto in grid-template-rows
Co-authored-by: Debanjum <debanjum@gmail.com>
2023-11-05 15:23:42 -08:00
sabaimran
3d6e8d53fe Try adding dependencies for libgl in order to run OCR in github action unit tests 2023-11-05 15:09:40 -08:00
sabaimran
5f1e37fff0 Adjust indentation for css property 2023-11-05 14:33:23 -08:00
sabaimran
fdd727712f Rename test files from x_to_jsonl to x_to_entries 2023-11-05 14:33:07 -08:00
Debanjum Singh Solanky
a4f407f595 Test memory leak on MPS device when generating vector embeddings
Slope threshold of 2.0 determined qualitatively on local Mac device
Minor unused import and clean-up
2023-11-05 03:48:54 -08:00
Debanjum Singh Solanky
ef24485ada Improve Khoj with DB setup instructions in the Django app readme (for now) 2023-11-05 02:04:52 -08:00
Debanjum Singh Solanky
f212cc7174 Arrange remaining text search tests in arrange, act, assert order 2023-11-05 02:04:52 -08:00
Debanjum Singh Solanky
022017dd0f Fix text search tests to test updated indexing log messages 2023-11-05 02:04:52 -08:00
sabaimran
084a8becc5 Fix bug to prevent default in chat trigger 2023-11-04 20:13:33 -07:00
Debanjum Singh Solanky
5489e98b9c Do not index org heading entries by default
This is to maintain the previous default behavior
2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky
34b5a86d1d Use SentenceTransformer to disable progress bar when encoding query
The Langchain HuggingFaceEmbeddings wrapper doesn't support disabling
the progress bar, especially not for queries alone while keeping it for documents.

This makes the logs noisy with an encoding progress bar for each
incremental query

None of the features of the Langchain wrapper for SentenceTransformer
were being used anyway, and we can always switch back to
it if required
2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky
dc9946fc03 Flatten nested loops, improve progress reporting in text_to_jsonl indexer
Flatten the nested loops to improve visibility into indexing progress

Reduce spurious logs, report the logs at aggregated level and update
the logging description text to improve indexing progress reporting
2023-11-04 20:09:25 -07:00
sabaimran
88eeee3f4b Move try/catch for import one line later 2023-11-04 19:46:47 -07:00
sabaimran
dbaa892665 Flip catching modulenotfound to import error exception 2023-11-04 19:34:10 -07:00
sabaimran
8c3d5a49da Add try/except around image extraction step 2023-11-04 19:27:18 -07:00
sabaimran
fdfab39942 Update the config UI to show all files indexed with option to delete
- Given the separation of the client and server now, the web UI will no longer support configuration of local file paths of data to index
- Expose a way to show all the files that are currently set for indexing, along with an option to delete all or specific files
2023-11-04 19:03:34 -07:00
sabaimran
800bb4f458 Remove references to demo
- The demo setting is no longer necessary for the time being, as we won't have any more demo instances
2023-11-04 17:17:04 -07:00
sabaimran
b5972e9311 Use OCR to extract image text in PDFs 2023-11-04 17:15:28 -07:00
sabaimran
d1d210605e Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj 2023-11-04 14:29:34 -07:00
sabaimran
3678aa5614 Add tests to validate expected behaviors in the multi-user scenario 2023-11-04 14:29:30 -07:00
Debanjum
12b5ef6540 Improve Theming of Web, Desktop and Obsidian Client App (#532)
- Update theme for Desktop, Web and Obsidian client apps to use lighter colors
- Show splash screen on starting Desktop app
- Make chat the landing page on Desktop and Web clients
- Simplify style of login page on Web app
- Add About page for Desktop app accessible from system tray menu
2023-11-04 12:29:56 -07:00
Debanjum Singh Solanky
8273bf26b7 Fix multi-line chat input and output render on web, desktop clients
- Remove spurious whitespace in chat input box on page load being
  added because text area element was ending on newline
- Do not insert newline in message when send message by hitting enter key
  This would be more evident when send message with cursor in the
  middle of the sentence, as a newline would be inserted at the cursor
  point
- Remove chat message separator tokens from model output. Model
  sometimes starts to output text in its chat format
2023-11-04 01:09:35 -07:00
Debanjum Singh Solanky
2f1756cc15 Do not use icon for each file, folder to index in desktop app.
Other minor fixes based on PR feedback
2023-11-04 00:13:10 -07:00
Debanjum Singh Solanky
e8f568d79c Make splash screen wider, opaque and fix its spinner radius
Radius should be such that final spin doesn't extend out of the circle
Opaque background improves contrast for better visual
2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
3ef05f4803 Use css var for main font color in search, chat page of desktop app 2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
a19cbde2d7 Add About page for Khoj to Desktop app. Expose it via system tray
- Pass current khoj version from package.json to about page via
  electron IPC between backend js and frontend page
- Update Khoj information in default About screen as well, in case
  it's exposed anywhere else
2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
a327294ee9 Rename khoj.js to utils.js in web and desktop client apps 2023-11-03 18:13:37 -07:00
Debanjum Singh Solanky
db57eeaefe Console log a welcome message on loading Desktop client 2023-11-03 05:15:41 -07:00
Debanjum Singh Solanky
6fae6fb2a4 Merge branch 'features/multi-user-support-khoj' into improve-client-app-theming 2023-11-03 04:58:41 -07:00
Debanjum Singh Solanky
4cd76311ad Slow down spinning at end of splash sequence. Make animation bigger 2023-11-03 04:28:17 -07:00
Debanjum Singh Solanky
34661c33a2 Show splash screen on starting desktop app 2023-11-03 03:19:08 -07:00
Debanjum Singh Solanky
126d3f4563 Render each file, folder to index row with icon in desktop app
Make the file, folders to index look less like an editable field
2023-11-03 02:48:42 -07:00
Debanjum Singh Solanky
80ae132cad Update Desktop, Obsidian client color theme to lighter yellow
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
  with lantern flame shade
- Add water, leaf, flower color variables
2023-11-03 02:48:42 -07:00
sabaimran
fb6ebd19fc Fix refactor bugs, CSRF token issues for use in production (#531)
* Add flags for samesite settings to enable django admin login
* Add tzdata to dependencies to work around Python package issues on Linux
* Use DJANGO_DEBUG flag correctly
* Fix naming of entry field when creating EntryDate objects
* Correctly retrieve openai config settings
* Fix datefilter with embeddings name for field
2023-11-02 23:02:38 -07:00
Debanjum Singh Solanky
345856e7be Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj
Merge changes to use latest GPT4All with GPU, GGUF model support into
khoj multi-user support rearchitecture branch
2023-11-02 22:44:25 -07:00
Debanjum Singh Solanky
041074ccd6 Make chat the landing page for the desktop app
Chat, unlike search, doesn't require knowledge base indexing setup.
So you can get started with chat much faster.
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
3801105b2a Make chat the landing page for the web app
Chat, unlike search, doesn't require knowledge base indexing setup.
So you can get started with chat much faster.
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
0d4e7d46c2 Fix color and size of profile picture circle in nav pane 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
4fbe8ac6b1 Console log a welcome message on loading web client 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
9fc6c97139 Use Khoj standard font family, weight in web client settings page 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
b6f07099cd Simplify login page styling on web client
- Center all elements: icon, text and button
- Use khoj icon not logo-text
- Simplify login title text
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
7b7f6d3bc8 Update web client theme to a lighter color scheme
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
  with lantern flame shade
- Add water, leaf, flower color variables
2023-11-02 20:42:21 -07:00
sabaimran
fe860aaf83 Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj 2023-11-02 14:56:01 -07:00
sabaimran
2c9496bcf1 Add additional null checks in the migrate_server_pg script 2023-11-02 14:55:58 -07:00
sabaimran
20df0f5330 Use url_path_for for creating the login page URL in the application 2023-11-02 14:55:14 -07:00
sabaimran
fd11b78552 Fix migration script error when openai not available (#530) 2023-11-02 11:28:08 -07:00
sabaimran
fe6720fa06 [Multi-User Part 8]: Make conversation processor settings server-wide (#529)
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all users reuse those settings.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
2023-11-02 10:43:27 -07:00
Debanjum
0fb81189ca [Multi-User Part 7]: Improve Sign-In UX & Rename DB Models for Readability (#528)
### New
- Create profile pic drop-down menu in navigation pane
  Put settings page, logout action under drop-down menu

### ⚙️ Fix
- Add Key icon for API keys table on Web Client's settings page

### 🧪 Improve
- Rename `TextEmbeddings` to `TextEntries` for improved readability
- Rename `Db.Models` `Embeddings`, `EmbeddingsAdapter` to `Entry`, `EntryAdapter`
- Show truncated API key for identification & restrict table width for config page responsiveness
2023-11-01 18:05:20 -07:00
Debanjum Singh Solanky
12b3eeae9e Use Khoj fonts on config page of web and desktop apps too
Previously pico.css font-families were being selected for the config
page. These differed from the fonts used by index.html, chat.html

This further improves the spacing of headings
2023-11-01 17:50:50 -07:00
Debanjum Singh Solanky
022d695309 Switch to narrow view below width of 700px on web client
This makes the dropdown menu align better to the profile picture in
mobile view
2023-11-01 17:49:44 -07:00
Debanjum Singh Solanky
6a0adfbfbb Default to profile picture with Initial if user has no profile picture 2023-11-01 17:49:44 -07:00
Tuan Nguyen
354605e73e Autofocus chat input when opening chat (#524) 2023-11-01 16:09:45 -07:00
Debanjum Singh Solanky
d92a2d03a7 Rename Files, Classes from X_To_JSONL to more appropriate X_To_Entries
These content processors are converting content into entries in DB
instead of entries in JSONL file
2023-11-01 14:51:33 -07:00
Debanjum Singh Solanky
2ad2055bcb Remove user null check in API controllers that require authentication 2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky
7ac5a4766d Match spacing of navigation header pane in config vs search/chat pages 2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky
2e3a4a6a9b Use Jinja macro to deduplicate navigation header HTML 2023-11-01 14:38:12 -07:00
Debanjum Singh Solanky
c631b61a81 Put colors shared by index, chat html into khoj css global variables 2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky
f585a71744 Put logout, settings under dropdown menu with logged in user's profile picture
- Create dropdown menu. Put settings page, logout action under it
- Make user's profile picture the dropdown menu heading
- Create khoj.js to store shared js across web client
  It currently stores the dropdown menu open, close functionality
- Put shared styling for khoj dropdown menu under khoj.css
2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky
58a7171911 Show truncated API key for identification & restrict table width
- Use a function to generate API Key table row HTML, to dedup logic
- Show delete, copy icon hints on hover
- Reduce length of copied message to not expand table width
- Truncating the API key keeps the API key table within the width
  of smaller displays
2023-10-31 23:10:26 -07:00
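A minimal sketch of the truncation idea, written in Python for illustration even though the web client itself is HTML/JS. It assumes a "first few ... last few characters" display format; the commit does not say which characters are shown, and `truncate_api_key` and `visible` are hypothetical names.

```python
def truncate_api_key(api_key: str, visible: int = 4) -> str:
    """Return a shortened, display-only form of an API key.

    Shows the first and last `visible` characters joined by "..." so the
    key stays identifiable without widening the table column.
    """
    if len(api_key) <= 2 * visible:
        # Too short to truncate meaningfully; show it unchanged.
        return api_key
    return f"{api_key[:visible]}...{api_key[-visible:]}"
```

For example, `truncate_api_key("kk-abcdefghijklmnop")` returns `"kk-a...mnop"`. The full key would still be what the copy action places on the clipboard; only the rendered cell is shortened.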
Debanjum Singh Solanky
9cebd7f856 Add emoji icons to Search, Chat, Settings items in nav menu of Web client
Emoji icons have already been added to the Search, Chat and Settings
top navigation menu in the desktop client. This change adds these to
the web client as well
2023-10-31 22:38:44 -07:00
Debanjum Singh Solanky
f77336ba61 Add key icon for API keys table in Web client config page 2023-10-31 19:01:09 -07:00
Debanjum Singh Solanky
87e6b1eab9 Rename TextEmbeddings to TextEntries for improved readability
Improves readability as name has closer match to underlying
constructs
2023-10-31 18:55:59 -07:00
Debanjum Singh Solanky
bcbee05a9e Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter
Improves readability as name has closer match to underlying
constructs

- Entry is any atomic item indexed by Khoj. This can be an org-mode
  entry, a markdown section, a PDF or Notion page etc.

- Embeddings are semantic vectors generated by the search ML model
  that encode the meaning contained in an entry's text.

- An "Entry" contains "Embeddings" vectors but also other metadata
  about the entry like filename etc.
2023-10-31 18:50:54 -07:00
sabaimran
54a387326c [Multi-User Part 6]: Address small bugs and upstream PR comments (#518)
- 08654163cb: Add better parsing for XML files
- f3acfac7fb: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd62c0: Chunk embeddings generation in order to avoid large memory load
- e02d751eb3: Addresses comments from PR #498 
- a3f393edb4: Addresses comments from PR #503 
- 66eb078286: Addresses comments from PR #511 
- Address various items in https://github.com/khoj-ai/khoj/issues/527
2023-10-31 17:59:53 -07:00
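Chunking embeddings generation (7d43cd62c0) can be sketched as feeding the model fixed-size batches instead of every entry at once. This is an illustrative sketch only: `embed_fn` stands in for whatever encoder call the server makes, and the default `batch_size` of 200 is a hypothetical value, not the one Khoj uses.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def embed_in_chunks(
    texts: Sequence[str],
    embed_fn: Callable[[list[str]], list],
    batch_size: int = 200,  # hypothetical default
) -> list:
    """Embed texts batch by batch, preserving input order."""
    embeddings: list = []
    for batch in batched(texts, batch_size):
        # Only one batch is passed to the model at a time, which bounds the
        # model's peak working memory; the accumulated results still grow
        # with len(texts).
        embeddings.extend(embed_fn(list(batch)))
    return embeddings
```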
sabaimran
5f3f6b7c61 [Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514)
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
2023-10-26 13:15:31 -07:00
Debanjum
9acc722f7f [Multi-User Part 4]: Authenticate using API Tokens (#513)
### New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database

### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme

### ⚙️ Fix
- Fix error handling when configure offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
2023-10-26 12:33:03 -07:00
sabaimran
4b6ec248a6 [Multi-User Part 3]: Separate chat sessions based on authenticated users (#511)
- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.

This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
2023-10-26 11:37:41 -07:00
sabaimran
a8a82d274a [Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503)
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
2023-10-26 10:17:29 -07:00
sabaimran
216acf545f [Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498)
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
2023-10-26 09:42:29 -07:00
Debanjum Singh Solanky
9677eae791 Expose CLI flag to disable using GPU for offline chat model
- Offline chat models output gibberish when loaded onto some GPUs.
  GPU support with Vulkan in GPT4All seems a bit buggy

- This change mitigates the upstream issue by allowing user to
  manually disable using GPU for offline chat

Closes #516
2023-10-25 17:51:46 -07:00
Debanjum Singh Solanky
5bb14a05a0 Update system requirements in docs for offline chat models 2023-10-22 19:04:23 -07:00
Debanjum Singh Solanky
0f1ebcae18 Upgrade to latest GPT4All. Use Mistral as default offline chat model
GPT4All now supports GGUF llama.cpp chat models. The latest
GPT4All (with Mistral) is at least 3x faster.

On a MacBook Pro, responses start in ~10s vs 30s-120s earlier.
Mistral is also a better chat model, although it hallucinates more
than Llama 2
2023-10-22 19:04:23 -07:00
sabaimran
6dc0df3afb Pin pytorch version to 2.0.1 in order to avoid exit code 139 in Docker container (#512) 2023-10-20 14:10:21 -07:00
sabaimran
963cd165eb Resolve merge conflicts 2023-10-19 14:39:05 -07:00
Simon Butler
e3f8a95784 Update emacs.md (#510)
Minor correction for emacs-lisp in minimal install
2023-10-19 12:28:08 -07:00
Debanjum
d93395ae48 Set >=6GB RAM required for offline chat
Llama v2 7B with 4-bit quantization technically needs ~3.5GB RAM (7B * 0.5 byte); in practice, a system with 6GB of RAM should suffice
2023-10-18 12:05:54 -07:00
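The ~3.5GB figure is just parameter count times bytes per parameter. A quick check of that arithmetic (the helper name is illustrative):

```python
def quantized_weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Approximate bytes needed to hold the model weights alone."""
    return num_params * bits_per_param / 8


# 7B parameters at 4 bits (0.5 byte) each.
weights_gb = quantized_weight_bytes(7e9, 4) / 1e9
print(f"{weights_gb:.1f} GB")  # 3.5 GB
```

This counts only the weights; the context cache, the Python runtime and the rest of the OS need memory too, which is why the practical recommendation is higher.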
Debanjum Singh Solanky
8346e1193c Release Khoj version 0.13.0 2023-10-18 03:43:54 -07:00
Debanjum Singh Solanky
6631fc38db Delete plaintext config via API. Catch any offline model loading exception 2023-10-18 03:37:45 -07:00
Debanjum Singh Solanky
53abd1a506 Mark sync completed on desktop client, even when no files to send
Previously Sync spinner on desktop config screen would hang when no
files to send to server & the Sync button had been manually triggered
2023-10-18 01:30:56 -07:00
Debanjum Singh Solanky
71b0012e8c Set offline chat config to default value if unset on server load 2023-10-18 00:59:43 -07:00
Debanjum Singh Solanky
cf1cdc3fe1 Disambiguate input_filter variable names in fs_syncer functions 2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky
e3cd8b4150 Only index files returned by input-filter globs in fs_syncer
Ignore .org, .pdf etc. suffixed directories under `input-filter' from
being evaluated as files.

Explicitly filter results by input-filter globs to only index files,
not directories, for each text type

Add test to prevent regression

Closes #448
2023-10-17 23:32:10 -07:00
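A minimal sketch of the "files only" filtering described above, using the standard `glob` and `os.path.isfile`. The function name is hypothetical and fs_syncer's actual code may differ.

```python
import glob
import os


def files_matching(input_filters: list[str]) -> list[str]:
    """Expand glob patterns and keep only regular files.

    A directory whose name ends in a matched suffix (e.g. "notes.org/")
    would otherwise be returned by glob and treated as a file.
    """
    matches: set[str] = set()
    for pattern in input_filters:
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path):
                matches.add(path)
    return sorted(matches)
```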
Debanjum Singh Solanky
51363d280d Do not configure khoj server for pull based indexing from khoj.el
Do not make khoj server pull updates to the index on Obsidian plugin load.
The index is now updated on push from the plugin instead.
2023-10-17 21:47:19 -07:00
Debanjum Singh Solanky
d9d133dfb9 Read text files as utf-8, instead of default os locale
On Windows, the default locale isn't utf8. Khoj had regressed to
reading files in the OS-specified locale encoding, e.g. cp1252, cp949 etc.

It now explicitly uses utf8 encoding to read text files for indexing

Resolves #495, resolves #472
2023-10-17 21:47:19 -07:00
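In Python the fix amounts to passing the encoding explicitly rather than relying on the platform default. A minimal sketch (the helper name is illustrative):

```python
from pathlib import Path


def read_text_file(path: str) -> str:
    # Without an explicit encoding, open()/read_text() fall back to the
    # locale's preferred encoding (e.g. cp1252 on many Windows systems),
    # which can raise UnicodeDecodeError or garble non-ASCII text.
    return Path(path).read_text(encoding="utf-8")
```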
Debanjum
3d4576ae38 Fix encoding binary files for sync from the Desktop, Obsidian client (#506)
- Fix encoding binary files like PDFs for sync from Desktop client
- Fix encoding binary files like PDFs for sync from Obsidian client
2023-10-17 15:37:22 -07:00
Debanjum Singh Solanky
c8293998d9 Fix encoding binary files like PDFs for sync from Obsidian client
Use readBinary to read binary files like PDFs instead of read
2023-10-17 15:08:30 -07:00
sabaimran
ba60c869c9 Fix encoding binary files like PDFs for sync from Desktop client
Use readFileSync, Buffer to pass appropriately formatted binary data
2023-10-17 15:08:23 -07:00
Andrew Spott
3d7381446d Changed globbing. Now doesn't clobber a users glob if they want to a… (#496)
* Changed globbing. Now doesn't clobber a user's glob if they want to add it, but will (if just given a directory) add a recursive glob.

Note: Python's glob engine doesn't support `{}` globbing; a future option is to warn if that is included.

* Fix typo in globformat variable

* Use older glob pattern for plaintext files

---------

Co-authored-by: Saba <narmiabas@gmail.com>
2023-10-17 11:26:06 -07:00
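The globbing rule in this PR can be sketched as follows. This is an illustration, not the merged code: `to_glob` is a hypothetical name, and treating `*`, `?` and `[` as "the user already wrote a glob" is an assumption.

```python
import os


def to_glob(path: str, extension: str) -> str:
    """Turn a user-supplied path into a glob pattern.

    - If the path already contains glob characters, keep it unchanged.
    - If it is a directory, append a recursive pattern for the extension.
    - Otherwise treat it as a single file path.
    """
    if any(ch in path for ch in "*?["):
        return path
    if os.path.isdir(path):
        # "**" only recurses when glob.glob(..., recursive=True) is used.
        return os.path.join(path, "**", f"*{extension}")
    return path
```

Brace patterns such as `*.{md,org}` pass through unchanged here but, as noted above, Python's `glob` will not expand them.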
sabaimran
2646c8554d Provide a default value to offline_chat configuration of the conversation processor 2023-10-17 10:35:22 -07:00
Debanjum Singh Solanky
b8976426eb Update offline chat model config schema used by Emacs, Obsidian clients
The server uses a new schema for the conversation config. The Emacs,
Obsidian clients need to use this schema to update the conversation
config
2023-10-17 07:01:35 -07:00
Debanjum
ecc6fbfeb2 Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499)
### Overview
- Add ability to push data to index from the Emacs, Obsidian client
- Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON
  - Benefits of new mechanism
    - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests
    - The whole request is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically written to disk to conserve memory if required
    - Binary files don't need to be encoded on client and decoded on server

### Code Details
### Major
- Use multi-part form to receive files to index on server
- Use multi-part form to send files to index on desktop client
- Send files to index on server from the khoj.el emacs client
  - Send content for indexing on server at a regular interval from khoj.el
- Send files to index on server from the khoj obsidian client
- Update tests to test multi-part/form method of pushing files to index

#### Minor
- Put indexer API endpoint under /api path segment
- Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
- Improve emoji, message on content index updated via logger
- Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user
- Improve indexing of binary files
  - Let fs_syncer pass PDF files directly as binary before indexing
  - Use encoding of each file set in indexer request to read file 
- Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost
- Update indexer API endpoint URL to `index/update` from `indexer/batch`

Resolves #471 #243
2023-10-17 06:05:15 -07:00
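To show what the clients now send on the wire, here is a minimal Python sketch of a multipart/form-data body. The real clients rely on their platform's built-in form encoders; the form field name `files` is a hypothetical placeholder, and this sketch does not escape quotes or non-ASCII characters in filenames.

```python
import uuid


def build_multipart(files: dict[str, tuple[bytes, str]]) -> tuple[bytes, str]:
    """Encode files as a multipart/form-data body.

    `files` maps filename -> (raw bytes, MIME type). Returns the body and
    the Content-Type header value, which carries the boundary.
    """
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for filename, (content, mime_type) in files.items():
        chunks.append(f"--{boundary}\r\n".encode())
        # "files" is a hypothetical form field name.
        chunks.append(
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode()
        )
        chunks.append(f"Content-Type: {mime_type}\r\n\r\n".encode())
        chunks.append(content)  # binary content goes in as-is, no base64 step
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

Because each part carries its own `Content-Type`, a PDF can travel as raw bytes next to a UTF-8 Markdown file, which is the benefit over the previous JSON payload.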
Debanjum Singh Solanky
7b1c62ba53 Mark test_get_configured_types_via_api unit test as flaky
It passes locally on running individually but fails when run in
parallel on local or CI
2023-10-17 05:56:00 -07:00
Debanjum Singh Solanky
6a4f1b2188 Add more client, request details in logs by index/update API endpoint 2023-10-17 05:43:29 -07:00
Debanjum Singh Solanky
5efae1ad55 Update indexer API endpoint query params for force, content type
New URL query params, `force' and `t' match name of query parameter in
existing Khoj API endpoints

Update Desktop, Obsidian and Emacs client to call using these new API
query params. Set `client' query param from each client for telemetry
visibility
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
84654ffc5d Update indexer API endpoint URL to index/update from indexer/batch
New URL follows action oriented endpoint naming convention used for
other Khoj API endpoints

Update desktop, obsidian and emacs client to call this new API
endpoint
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
e347823ff4 Log telemetry for index updates via push to API endpoint 2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
05be6bd877 Clicking Update Index in Obsidian settings should push files to index
Use the indexer/batch API endpoint to regenerate content index rather
than the previous pull based content indexing API endpoint
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
13a3122bf3 Stop configuring server to pull files to index from Obsidian client
Obsidian client now pushes vault files to index instead
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
99a2c934a3 Add CORS policy to allow requests from khoj apps, obsidian & localhost
Using fetch from the Khoj Obsidian plugin was failing due to the cross-origin
request, and `mode: 'no-cors'` didn't allow passing the x-api-key custom
header. Using Obsidian's request with multi-part/form-data wasn't
possible either.
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
541cd59a49 Let fs_syncer pass PDF files directly as binary before indexing
Avoids unneeded base64 encoding/decoding to pass pdf contents
for indexing from fs_syncer to pdf_to_jsonl
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
d27dc71dfe Use encoding of each file set in indexer request to read file
Get encoding type from multi-part/form-request body for each file
Read text files as utf-8 and pdfs, images as binary
2023-10-17 04:58:12 -07:00
Debanjum Singh Solanky
8e627a5809 Pass any files to be deleted to indexer API via Khoj Obsidian plugin
- Keep state of previously synced files to identify files to be deleted
- Last synced files stored in settings for persistence of this data
  across Obsidian reboots
2023-10-17 03:34:49 -07:00
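The deletion logic reduces to a set difference between the last synced file list and the current one. The plugin itself is TypeScript; this Python sketch only illustrates the idea, and the assumption that every current file is re-sent on each sync is ours, not stated in the commit.

```python
def plan_sync(last_synced: set[str], current_files: set[str]) -> tuple[set[str], set[str]]:
    """Return (files to send for indexing, files to delete from the index).

    Assumes every current file is re-sent; only files that were synced
    last time but no longer exist are marked for deletion.
    """
    to_delete = last_synced - current_files
    return set(current_files), to_delete
```

After the server accepts the request, `current_files` would be persisted as the new `last_synced`, which is why the plugin stores it in its settings across Obsidian restarts.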
Debanjum Singh Solanky
f2e293a149 Push Vault files to index to Khoj server using Khoj Obsidian plugin
Use the multi-part/form-data request to sync Markdown, PDF files in
vault to index on khoj server

Run scheduled job to push vault updates for indexing every 1 hour
2023-10-17 03:05:30 -07:00
Debanjum Singh Solanky
6baaaaf91a Test request body of multi-part form to update content index from khoj.el 2023-10-16 23:54:32 -07:00
Debanjum Singh Solanky
79b3f8273a Make khoj.el send files to be deleted from index to server 2023-10-16 23:53:02 -07:00
Debanjum Singh Solanky
5dc399b32e Document system requirements to run offline chat
Closes #375
2023-10-16 19:39:06 -07:00
Debanjum Singh Solanky
f64fa06e22 Initialize the Khoj Transient menu on first run instead of load
This prevents Khoj from polling the Khoj server until explicitly
invoked via `khoj' entrypoint function.

Previously it'd make a request to the khoj server every time Emacs or
khoj.el was loaded

Closes #243
2023-10-16 19:11:46 -07:00
Debanjum
b4949f7f0b Improve Offline Chat Model Experience (#494)
- Make offline chat model user configurable. Use the `filename` of any [GPT4All supported model](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json)
- Run GPT4All Chat Model on GPU, when available, via [GPT4All Vulkan support](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan)
- Use default Llama 2 supported by GPT4All
- Make `tokenizer` and `max-prompt-size` of chat model user configurable. E.g When using chat models not in [this pre-defined list](https://github.com/khoj-ai/khoj/blob/master/src/khoj/processor/conversation/utils.py) that support larger context window or a different tokenizer.

Closes #406, #418
2023-10-16 17:44:49 -07:00
Debanjum Singh Solanky
644c3b787f Scale no. of chat history messages to use as context with max_prompt_size
Previously lookback turns was set to a static 2. But now that we
support more chat models, their prompt sizes vary considerably.

Make lookback_turns proportional to max_prompt_size. truncate_messages
can later remove messages if they exceed max_prompt_size

This lets Khoj pass more of the chat history as context for models
with larger context window
2023-10-16 17:22:28 -07:00
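A sketch of "lookback proportional to prompt size". The divisor `tokens_per_turn=500` and the `min_turns` floor are hypothetical values chosen for illustration; the commit only says the relationship is proportional.

```python
def lookback_turns(max_prompt_size: int, tokens_per_turn: int = 500, min_turns: int = 1) -> int:
    """Number of past chat turns to include as context.

    Scales with the model's prompt budget instead of a fixed 2 turns.
    Over-long histories are still trimmed later by message truncation.
    """
    return max(min_turns, max_prompt_size // tokens_per_turn)
```

With these assumed numbers, a 2,000-token model gets 4 turns of history and a 16,000-token model gets 32.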
Debanjum Singh Solanky
90e1d9e3d6 Pin gpt4all to 1.0.12 as next version will introduce breaking changes 2023-10-16 10:57:16 -07:00
Debanjum Singh Solanky
1a9023d396 Update Chat Actor test to not incept with prior world knowledge 2023-10-15 17:22:44 -07:00
Debanjum Singh Solanky
df1d74a879 Use max_prompt_size, tokenizer from config for chat model context stuffing 2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky
116595b351 Use chat_model specified in new offline_chat section of config
- Dedupe offline_chat_model variable. Only reference offline chat
  model stored under offline_chat. Delete the previous chat_model
  field under GPT4AllProcessorConfig

- Set offline chat model to use via config/offline_chat API endpoint
2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky
feb4f17e3d Update chat config schema. Make max_prompt, chat tokenizer configurable
This provides flexibility to use non 1st party supported chat models

- Create migration script to update khoj.yml config
  - Put `enable_offline_chat' under new `offline-chat' section
    Referring code needs to be updated to accommodate this change
  - Move `offline_chat_model' to `chat-model' under new `offline-chat' section
  - Put chat `tokenizer` under new `offline-chat' section
  - Put `max_prompt' under existing `conversation' section
    As `max_prompt' size affects both openai and offline chat models
2023-10-15 16:35:11 -07:00
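A rough sketch of the config migration described above, operating on the conversation section as a plain dict. The exact key spellings in the new `offline-chat` section (e.g. `enable-offline-chat`) are assumptions; check the actual migration script before relying on them.

```python
# Hypothetical old-key -> new-key mapping inside the `offline-chat` section.
OFFLINE_CHAT_KEYS = {
    "enable_offline_chat": "enable-offline-chat",
    "offline_chat_model": "chat-model",
    "tokenizer": "tokenizer",
}


def migrate_conversation_config(conversation: dict) -> dict:
    """Move offline chat settings into a nested `offline-chat` section."""
    migrated = dict(conversation)  # shallow copy; the input is left untouched
    offline_chat = dict(migrated.get("offline-chat") or {})
    for old_key, new_key in OFFLINE_CHAT_KEYS.items():
        if old_key in migrated:
            offline_chat[new_key] = migrated.pop(old_key)
    migrated["offline-chat"] = offline_chat
    # `max_prompt` deliberately stays at the conversation level, since it
    # applies to both OpenAI and offline chat models.
    return migrated
```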
sabaimran
c125995d94 [Multi-User]: Part 0 - Add support for logging in with Google (#487)
* Add concept of user authentication to the request session via GoogleUser
2023-10-14 19:39:13 -07:00
Debanjum Singh Solanky
247e75595c Use AutoTokenizer to support more tokenizers 2023-10-14 16:54:52 -07:00
Saba
ff2dbadc9d Use computed plaintext_content to set file content rather than calling f.read again 2023-10-14 13:28:34 -07:00
Debanjum Singh Solanky
1ad8b150e8 Add default tokenizer, max_prompt as fallback for non-default offline chat models
Pass user configured chat model as argument to use by converse_offline

The proper fix for this would allow users to configure the max_prompt
and tokenizer to use (while supplying default ones, if none provided)
For now, this is a reasonable start.
2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky
56bd69d5af Improve Llama v2 extract questions actor and associated prompt
- Format the extract questions prompt with newlines and whitespace
- Make llama v2 extract questions prompt consistent

- Remove empty questions extracted by offline extract_questions actor
- Update implicit qs extraction unit test for offline search actor
2023-10-13 22:48:56 -07:00
sabaimran
09bb3686cc Strip the incoming query from the slash conversation command (#500)
* Strip the incoming query from the slash conversation command before passing it to the model or for search
* Return q when content index not loaded
* Remove -n 4 from pytest ini configuration to isolate test failures
2023-10-13 21:11:23 -07:00
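Stripping the slash command means splitting a leading command token off before the query reaches search or the model. A minimal sketch; the command set below is hypothetical and stands in for whatever commands the server recognizes.

```python
from typing import Optional

KNOWN_COMMANDS = ("/notes", "/general", "/help")  # hypothetical command set


def split_command(query: str) -> tuple[Optional[str], str]:
    """Separate a leading slash command from the user's actual question."""
    stripped = query.strip()
    for command in KNOWN_COMMANDS:
        # Require an exact match or a following space so "/notesy" is not "/notes".
        if stripped == command or stripped.startswith(command + " "):
            return command, stripped[len(command):].strip()
    return None, stripped
```

For example, `split_command("/notes when is my dentist appointment?")` returns `("/notes", "when is my dentist appointment?")`, so only the question text is used for search.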
Debanjum Singh Solanky
96c0b21285 Sync desktop app package.json with other Khoj clients metadata
- Make `bump_version.sh' script set version for the Khoj desktop app too
- Sync Khoj desktop app authors, license, description and version with
  the other interfaces and server
- Update description in packages metadata to match project subtitle on Github
2023-10-13 20:43:55 -07:00
sabaimran
80fb56b8a5 Sync desktop app package version with the other releases 2023-10-13 19:23:00 -07:00
Debanjum Singh Solanky
b669aa2395 Clean and fix the content indexing code in the Emacs client
- Pass payloads as unibyte. This was causing the request to fail for
  files with unicode characters
- Suppress messages with file content on index updates
- Fix rendering response from server on index update API call
- Extract code to populate body of index update HTTP request with files
2023-10-13 18:00:37 -07:00
Debanjum Singh Solanky
bea196aa30 Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
Previously the global state of `url-request-method' would affect the
kind of request made to the api/config/data API endpoint, as it wasn't
being explicitly set before calling the API endpoint

This was done with the assumption that the default value of GET for
url-request-method wouldn't change globally

But in some cases, empirically, it can get changed. This was
resulting in khoj.el load failing, as a POST request was being made
instead, which would throw an error
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
292f0420ad Send content for indexing on server at a regular interval from khoj.el
- Allow indexing frequency to be configurable by user
- Ensure there is only one khoj indexing timer running
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
bed3aff059 Update tests to test multi-part/form method of pushing files to index
Instead of using the previous method to push data as json payload of POST request
pass it as files to upload via the multi-part/form to the batch indexer API endpoint
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
fc99431754 Send files to index on server from the khoj.el emacs client
- Add elisp variable to set API key to engage with the Khoj server
- Use multi-part form to POST the files to index to the indexer API
  endpoint on the khoj server
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
68018ef397 Use multi-part form to send files to index on desktop client
- Add typing for variables in for loop and other minor formatting clean-up
- Assume utf8 encoding for text files and binary for image, pdf files
2023-10-12 20:58:49 -07:00
Debanjum Singh Solanky
7190b3811d Remove all filter terms in user query from defiltered_query
Previously only the last filter's terms were effectively removed,
as each `filter.defilter' operation was run on the original
`user_query' and overwrote `defiltered_query'
2023-10-12 20:56:17 -07:00
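The bug and its fix in miniature: each filter must run on the output of the previous one, not on the raw query. The regex filters below are illustrative stand-ins for Khoj's file and date filters, not their real implementations.

```python
import re
from typing import Callable

Defilter = Callable[[str], str]


def remove_terms(pattern: str) -> Defilter:
    """Build a hypothetical defilter that strips terms matching pattern."""
    compiled = re.compile(pattern)

    def defilter(query: str) -> str:
        # Drop the filter's terms and collapse leftover whitespace.
        return " ".join(compiled.sub("", query).split())

    return defilter


def buggy_defilter(user_query: str, filters: list[Defilter]) -> str:
    defiltered_query = user_query
    for f in filters:
        # Bug: every pass starts from the raw user_query, so only the
        # last filter's removal survives.
        defiltered_query = f(user_query)
    return defiltered_query


def fixed_defilter(user_query: str, filters: list[Defilter]) -> str:
    defiltered_query = user_query
    for f in filters:
        # Fix: chain each filter on the already-defiltered query.
        defiltered_query = f(defiltered_query)
    return defiltered_query


file_filter = remove_terms(r'file:"[^"]+"')
date_filter = remove_terms(r'dt[<>=]+"[^"]+"')
```

With the query `meetings file:"work.org" dt>="last week"`, the buggy version leaves `meetings file:"work.org"`, while the fixed version returns `meetings`.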
Debanjum Singh Solanky
72f8fde7ef Run pytests in parallel on multiple CPU cores using pytest-xdist for speed 2023-10-12 20:56:17 -07:00
Debanjum Singh Solanky
60e9a61647 Use multi-part form to receive files to index on server
- This uses existing HTTP affordance to process files
  - Better handling of binary file formats, as it removes the need to url encode/decode
  - Less memory utilization than streaming json as files get
    automatically written to disk once memory utilization exceeds preset limits
  - No manual parsing of raw files streams required
2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky
9ba173bc2d Improve emoji, message on content index updated via logger
Use mailbox closed with flag down once content index completed.

Use standard, existing logger messages in new indexer messages, when
files to index sent by clients
2023-10-11 17:12:03 -07:00
Debanjum Singh Solanky
6aa69da3ef Put indexer API endpoint under /api path segment
Update FastAPI app router and desktop app to use the new url path to the
batch indexer API endpoint

All api endpoints should exist under /api path segment
2023-10-09 21:35:58 -07:00
Debanjum Singh Solanky
148e8f468f Restrict openai package version below 1.0.0 to avoid breaking changes 2023-10-09 19:30:58 -07:00
Debanjum Singh Solanky
f6f7a62d80 Wait for user to stop typing to trigger search from khoj.el in Emacs
- Improves user experience by aligning idle time with search latency
  to avoid display jitter (to render results) while user is typing

- Makes the idle time configurable

Closes #480
2023-10-06 12:44:45 -07:00
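"Wait for the user to stop typing" is a debounce. In Emacs this is usually done with an idle timer (`run-with-idle-timer`); the Python sketch below only illustrates the timing logic with explicit timestamps. The class name and `idle_seconds` parameter are hypothetical.

```python
from typing import Optional


class Debouncer:
    """Fire an action only after input has been idle for `idle_seconds`."""

    def __init__(self, idle_seconds: float) -> None:
        self.idle_seconds = idle_seconds
        self._last_keystroke: Optional[float] = None
        self._pending = False

    def on_keystroke(self, now: float) -> None:
        # Each keystroke restarts the idle window.
        self._last_keystroke = now
        self._pending = True

    def should_fire(self, now: float) -> bool:
        """True once per burst of typing, after the idle window elapses."""
        if (
            self._pending
            and self._last_keystroke is not None
            and now - self._last_keystroke >= self.idle_seconds
        ):
            self._pending = False
            return True
        return False
```

Picking `idle_seconds` close to the search latency, as the commit suggests, avoids re-rendering results on every keystroke.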
sabaimran
5c4f0d42b7 Return new default config in API endpoint 2023-10-06 12:30:09 -07:00
sabaimran
052b25af0a Update default configuration passed to Khoj clients to circumvent validation issues 2023-10-06 12:29:15 -07:00
Debanjum Singh Solanky
a85ff941ca Make offline chat model user configurable
Only GPT4All supported Llama v2 models will work given the prompt
structure is not currently configurable
2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky
d1ff812021 Run GPT4All Chat Model on GPU, when available
GPT4All now supports running models on GPU via Vulkan
2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky
13b16a4364 Use default Llama 2 supported by GPT4All
Remove custom logic to download custom Llama 2 model.
This was added as GPT4All didn't support Llama 2 when it was added to Khoj
2023-10-03 19:01:54 -07:00
sabaimran
4a5ed7f06c Update Khoj package version for Electron, Desktop app (#492)
* Address package upgrade for Electron application
* Update package version for Electron desktop application
2023-10-03 12:21:32 -07:00
sabaimran
3f962a55c3 Fix Linux Desktop Application (#491)
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
2023-10-03 11:43:19 -07:00
sabaimran
63b3696af0 Release Khoj version 0.12.3 2023-09-26 22:41:11 -07:00
sabaimran
d2f9bca1cf Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian 2023-09-26 22:33:44 -07:00
sabaimran
2f18383349 Release Khoj version 0.12.2 2023-09-26 11:59:47 -07:00
sabaimran
588f35b6e9 Add max prompt size for gpt-3.5-turbo-16k 2023-09-26 10:57:35 -07:00
sabaimran
99f9c3f8e2 Update setup instructions 2023-09-26 09:40:36 -07:00
sabaimran
4e370d7a18 Release Khoj version 0.12.1 2023-09-26 09:24:53 -07:00
sabaimran
3675aa348a Update naming of Khoj in manifest.json for Obsidian 2023-09-26 09:24:36 -07:00
sabaimran
4b6d8af218 Update metadata in manifest.json 2023-09-26 09:19:56 -07:00
sabaimran
a82d1becc3 Release Khoj version 0.12.0 2023-09-26 09:17:56 -07:00
sabaimran
38f0df3d53 Remove unused icons from electron app folder 2023-09-26 07:56:29 -07:00
sabaimran
29a64be939 Deprecate desktop build instructions from old setup 2023-09-25 22:02:02 -07:00
sabaimran
99995b2497 Add basic instructions for setting up the Khoj desktop interface 2023-09-25 21:08:14 -07:00
sabaimran
5e16074b92 Fix comparison for search type in plugins mode 2023-09-25 10:57:17 -07:00
sabaimran
efe5e09c3a Use jammy for docker base image due to dependency issue with arm64 image 2023-09-18 15:38:18 -07:00
sabaimran
6df728c445 Move bash command in Dockerfile into single line 2023-09-18 15:13:11 -07:00
sabaimran
96a9fa07f0 Fix conf test setup for offline chat 2023-09-18 15:05:15 -07:00
sabaimran
2dd15e9f63 Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483)
- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
2023-09-18 14:41:26 -07:00
sabaimran
8141be97f6 Update date filter test to use compiled rather than raw key 2023-09-18 11:24:56 -07:00
sabaimran
b225d1188c Fix formatting of gpt.py 2023-09-18 11:09:02 -07:00
Jonny-GM
34b202b868 More lenient date searching (#481)
* Modify DateFilter to use compiled entry key
* Instruct search to include date in query
* Minor prompt change
* Prompt fix
2023-09-18 10:46:00 -07:00
sabaimran
16874e1953 Provide force fallback for regeneration 2023-09-12 16:35:07 -07:00
sabaimran
9f42a1a036 Propagate flags to configure index command 2023-09-11 10:33:44 -07:00
sabaimran
343854752c Improve docker builds for local hosting (#476)
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
2023-09-08 17:07:26 -07:00
sabaimran
dccfae3853 Remove PySide dependency and deprecate desktop builds (#475)
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
2023-09-07 11:36:27 -07:00
sabaimran
76562f4250 Add front-end Electron application for Khoj local file syncing (#473)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
2023-09-06 12:04:18 -07:00
bholagabbar
205dc90746 Fix notion title bug (#474)
* Update notion_to_jsonl.py
* Fix try-catch block
2023-09-05 10:47:42 -07:00
sabaimran
922222a813 Fix anyio package version to avoid backwards compatibility issue with start_blocking_portal method 2023-08-31 14:14:13 -07:00
sabaimran
4854258047 Move to a push-first model for retrieving embeddings from local files (#457)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
2023-08-31 12:55:17 -07:00
sabaimran
92cbfef7ab Skip plaintext file indexing if there's a parsing issue and log the file 2023-08-29 14:34:08 -07:00
sabaimran
74409c2c64 Release Khoj version 0.11.4 2023-08-29 11:44:35 -07:00
sabaimran
1b85958bcc trim chat input start 2023-08-28 19:18:10 -07:00
sabaimran
e592f6eac8 Release Khoj version 0.11.3 2023-08-28 14:46:03 -07:00
sabaimran
7c35da9fc4 Fix bug in /chat endpoint for general and update dependencies 2023-08-28 14:12:11 -07:00
Debanjum Singh Solanky
c93dcc948a Exclude tests data file from programming stats on Github
Git tag tests/data files with the linguist-vendored attribute to
prevent github from including them in stats.

Otherwise Khoj is getting marked as an HTML project due to the
tardigrades html page in tests data, when it's primarily a python
project currently
2023-08-28 11:00:52 -07:00
Debanjum Singh Solanky
59ffd1dc94 Document slash command and query filter in docs for chat and search 2023-08-28 11:00:52 -07:00
sabaimran
bc09143856 Release Khoj version 0.11.2 2023-08-28 10:16:13 -07:00
Debanjum
bc5e60defb Filter knowledge base used by chat to respond (#469)
- Overview
  - Allow applying word, file or date filters on your knowledge base from the chat interface
  - This will limit the portion of the knowledge base Khoj chat can use to respond to your query
2023-08-28 09:32:33 -07:00
Debanjum Singh Solanky
01b310635e Enable passing search query filters via chat and test it 2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky
794bad8bcb Make date_filter.extract_date_range method always return a list type 2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky
d5a2de6222 Add method to extract filter terms from query to all filters
- Test the get_filter_term method in all 3 word, file, date filters
- Make the existing can_filter method by default in base filter abstract class
2023-08-28 00:55:28 -07:00
Debanjum
150105505b Add Default chat command. Make Khoj ask clarifying questions (#468)
- Make Khoj ask clarifying questions when answer not in provided context
- Add default conversation command to auto switch b/w general, notes modes
- Show filtered list of commands available with the currently input text
- Use general prompt when no references found and not in Notes mode
- Test general and notes slash commands in offline chat director tests
2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky
319f066aec Test general and notes slash commands in offline chat director tests 2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
eb6cd4f8d0 Use general prompt when no references found and not in Notes mode 2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
edffbad837 Make Khoj ask clarifying questions when answer not in provided context
Previously it would just refuse to answer instead of asking for clarification. This improves
the chat quality score for the existing director tests
2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
75c1016ec0 Show filtered list of commands available with the currently input text 2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky
74605f6159 Add default conversation command to auto switch b/w general, notes modes
This was the default behavior but behavior regressed when adding slash
commands in PR #463
2023-08-28 00:46:10 -07:00
sabaimran
cbc978ea08 Update help links for notion, github to point to the main docs 2023-08-27 15:02:55 -07:00
sabaimran
b45e1d8c0d Fix plaintext HTML parsing and rendering (#464)
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model from trying to respond using its general world
knowledge alone, without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2023-08-27 11:24:30 -07:00
Debanjum
7919787fb7 Use Slash Commands and Add Notes Slash Command (#463)
* Store conversation command options in an Enum

* Move to slash commands instead of using @ to specify general commands

* Calculate conversation command once & pass it as arg to child funcs

* Add /notes command to respond using only knowledge base as context

This prevents the chat model from trying to respond using its general world
knowledge alone, without any references pulled from the indexed
knowledge base

* Test general and notes slash commands in openai chat director tests

* Update gpt4all tests to use md configuration

* Add a /help tooltip

* Add dynamic support for describing slash commands. Remove default and treat notes as the default type

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2023-08-26 18:11:18 -07:00
sabaimran
e64357698d Skip indexing single bad markdown, plaintext file (#460) 2023-08-23 15:34:56 -07:00
sabaimran
84bd579077 Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445. 2023-08-19 20:02:57 -07:00
sabaimran
f9e09ba490 Do not try downloading model from GPT4All if the user is not connected to the internet 2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky
3ff4e19dd2 Release Khoj version 0.11.1 2023-08-16 22:53:29 -07:00
sabaimran
4fb8c2c5e1 Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread 2023-08-16 21:27:05 -07:00
Debanjum Singh Solanky
34d5cd2bd8 Increase pytests workflow timeout duration to reduce intermittent failures
The test workflow fails regularly with an OperationCancelled error.
This is an intermittent failure that gets resolved on running the
failed workflows a few times.
2023-08-16 20:00:36 -07:00
sabaimran
4e03dfea43 Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446) 2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky
3c58ab5fcb Unmark Python 3.8 as supported in khoj-assistant pypi package 2023-08-16 00:58:59 -07:00
Debanjum Singh Solanky
26c3977fb9 Remove info hint to reindex khoj on unexpected search results
The index corruption issue was resolved a while ago in #325 and
hasn't cropped up again
2023-08-16 00:58:59 -07:00
sabaimran
def909a913 Revert "Open Web interface within Desktop app in GUI mode" (#444) 2023-08-15 23:26:28 -07:00
sabaimran
6562ec6531 Release Khoj version 0.11.0 2023-08-14 19:25:03 -07:00
sabaimran
064b2fbc4a Add a link to the FAQ in our docs (#438)
* Add a link to faq.khoj.dev in the docs
2023-08-14 15:05:08 -07:00
sabaimran
0ea901c7c1 Allow indexing to continue even if there's an issue parsing a particular org file (#430)
* Allow indexing to continue even if there's an issue parsing a particular org file
* Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files
* Change error of expected failure
2023-08-14 07:56:33 -07:00
sabaimran
7b907add77 Add support for indexing plaintext files (#420)
* Add support for indexing plaintext files
- Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md)
- Add equivalent frontend views for setting up plaintext file indexing
- Update config, rawconfig, default config, search API, setup endpoints
* Add a nifty plaintext file icon to configure plaintext files in the Web UI
* Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist
2023-08-09 15:44:40 -07:00
Debanjum Singh Solanky
84d774ea34 Retain desktop builds for 3 days to allow user tests
Upgrade minimum tiktoken version to work for encoding gpt4
2023-08-08 23:02:13 -07:00
Ellen7ions
26bddcb65c Add support for starting a new line with shift-enter (#412)
* Add support for starting a new line with shift-enter
* Remove useless comments. Set font-size: medium.
* Update src/khoj/interface/web/chat.html
Update the styling to have the padding, margin and line-height like before.
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/interface/web/chat.html
Make the chat-body scroll to the bottom after resizing
Co-authored-by: Debanjum <debanjum@gmail.com>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky
97609e4995 Use 500px png of khoj logo instead of svg for much smaller asset size
The khoj logo svg was 1.3MB. The 500px png of it is 38KB.
Given all usages of khoj-logo are below 230px this should work fine
2023-08-07 18:27:11 -07:00
Debanjum
14a816d173 Open Web interface within Desktop app in GUI mode (#429)
Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the user's default web browser. Now the web interface is just rendered within the app itself using a PyQt webview. This gives it a more app-like feel
2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky
378b96ec1b Open the khoj app window maximized on startup 2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky
ea734ba1c8 Open app in native view on starting it in GUI mode instead of on web browser
- Opens settings page on first run and landing page after in GUI mode
  Previously was only opening the GUI on linux after first run as it
  doesn't have a system tray
- Both the views are from the web interface but are rendered within
  the app instead of the browser
2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
9c494705a8 Open the search, chat or config view in app from the system tray menu 2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
cc36b87345 Render the web interface directly within the desktop app as a webview 2023-08-07 13:41:12 -07:00
Debanjum
c832e456e0 Merge pull request #427 from Comprehensive-Jason/master-2
Update obsidian/manifest.json
2023-08-07 10:46:35 -07:00
Jason Qin
3ef1b7073d Update obsidian/manifest.json
Closes #426
2023-08-07 10:41:39 -07:00
sabaimran
738cf650b3 Explicitly set Khoj to use the default locale of the user (#425)
- Explicitly set locale using `locale.setlocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).
2023-08-07 09:23:24 -07:00
Debanjum
cc951450fb Build Khoj Debian package on Ubuntu 20.04 to work with Glibc 2.31 (#424)
Build the Debian package using Ubuntu 20.04 instead of 22.04 as Ubuntu 20.04 comes pre-installed with glibc_2.31 unlike Ubuntu 22.04 which uses glibc_2.35
2023-08-06 23:21:51 -07:00
Debanjum Singh Solanky
75c16432a4 Loosen dateparser dependency to get python3.10 wheel for regex package
This should reduce chances of installation errors due to regex package
being built from source for python3.11

Previously, the regex dependency of dateparser = 1.1.1 didn't have a
wheel for python 3.11. This would trigger building the regex package
from scratch which would fail for a lot of folks
2023-08-06 22:48:40 -07:00
Debanjum Singh Solanky
8b41eb9f14 Create Pypi package on Ubuntu 20.04 LTS as well 2023-08-06 21:34:38 -07:00
Debanjum Singh Solanky
1cbacf20dc Build Khoj Debian package on Ubuntu 20.04 to work with glibc 2.31 2023-08-06 20:02:42 -07:00
Jason Qin
0bb5c808e5 Update manifest.json (#422)
Mark plugin as desktop-only in Obsidian to stop 'fails to load' messages in Obsidian Mobile
2023-08-06 14:21:32 -07:00
Muftawo
c8ef619090 fixed reference link to landing page (#417)
* Fixed zsh error no matches found
* Fixed home page 404 error
2023-08-04 10:38:14 -07:00
Debanjum
952bd39536 Merge pull request #413 from Muftawo/update-docs
updated the setup file path to fix the 404 error
2023-08-03 10:44:17 -07:00
Muftawo
18e94d9e60 updated the setup file path to fix the 404 error 2023-08-03 13:35:16 +00:00
sabaimran
78012b8111 Avoid null ref issue when setting model state for web UI. Closes #410 2023-08-03 00:39:06 -07:00
sabaimran
0baed742e4 Add checksums to verify the correct model is downloaded as expected (#405)
* Add checksums to verify the correct model is downloaded as expected
- This should help debug issues related to corrupted model download
- If download fails, let the application continue
* If the model is not downloaded as expected, add some indicators in the settings UI
* Add exc_info to error log if/when download fails for llamav2 model
* Simplify checksum checking logic, update key name in model state for web client
2023-08-02 23:26:52 -07:00
sabaimran
6aa998e047 Add note about system requirements for Linux - debian installation. Closes #378. 2023-08-02 10:57:36 -07:00
sabaimran
d00f51b531 Fix the minimum version of the transformers library required to address #404 2023-08-02 09:10:14 -07:00
Debanjum Singh Solanky
e6e3acdbe4 Release Khoj version 0.10.1 2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky
7c1d70aa17 Bump GPT4All response generation batch size to 512 from 256
A batch size of 512 performs ~20% better on an XPS with no GPU and 16GB
RAM. Seems worth the tradeoff for now
2023-08-01 23:34:02 -07:00
Debanjum Singh Solanky
e42fd8ae91 Make desktop app workflow apt update before install of linux packages
- See if this fixes the issue with the workflows failing to install
system packages

- Make the build desktop app run on changes to the workflow file as well
2023-08-01 23:15:13 -07:00
Debanjum
16c6bfce8e Improve Quality and Reliability of Offline Chat (#393)
# Incoming
## Major
### Fix Prompt Size Exceeded Issue
- Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not.

### Improve Llama 2 Model Download
- Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium
- Add better downloading logic to retry download if it failed, Closes #379 

### Fix Segmentation Fault due to Race
- Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367.
- Add a loading symbol to the web chat UI when the model is thinking. Closes #392

### Improve Chat Response Latency
- Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363

### Fix Fake Dialogue Continuation
- Fix formatting of user query with offline chat, this was contributing to #398
- Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398

## Minor
- Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
- Add null check in `perform_chat_checks` method
- Add offline chat director unit tests

## Performance Analysis (Time to First Token)
|  | v0.10.0 | this branch |
|-|-|-|
| Query 1 | 52s | 28s |
| Query 2 | 33s| 42s |
| Query 3 | 67s| 38s|
2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky
44292afff2 Put offline model response generation behind the chat lock as well
Not just the chat response streaming
2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky
1812473d27 Extract new schema version for each migration script into a variable
This should ease readability, as it indicates which version this
migration script will update the schema to once applied
2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky
b9937549aa Simplify migration scripts management. Make them use static version
- Only make them update config when their run conditions are satisfied
- Use static schema version to simplify reasoning about run conditions
2023-08-01 21:28:20 -07:00
Debanjum Singh Solanky
185a1fbed7 Remove old chat setup timer. It is mislabelled, irrelevant since streaming 2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
95acb1583d Update local Chat Actor and Director tests expected to fail 2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
c2b7a14ed5 Fix context, response size for Llama 2 to stay within max token limits
Create regression test to ensure it does not throw the prompt size
exceeded context window error
2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
6e4050fa81 Make Llama 2 stop generating response on hitting specified stop words
It would previously sometimes start generating fake dialogue with
its internal prompt patterns of <s>[INST] in responses.

This is a jarring experience. Stop generating the response when it hits <s>

Resolves #398
2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
aa6846395d Fix offline model migration script to run for version < 0.10.1
- Use same batch_size in extract question actor as the chat actor
- Log final location the chat model is to be stored in, instead of
  its temp filename while it is being downloaded
2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine
49abb9df9c Fix typo in orgnode.py (#397)
Fix spelling of Ouput in org parser property drawer comment to Output.
2023-08-01 19:54:57 -07:00
sabaimran
d8fa967b43 Update chat actor unit tests for greater accuracy and benchmarking 2023-08-01 12:24:43 -07:00
sabaimran
f409e16137 Update some of the extract question prompts for llamav2 2023-08-01 12:23:36 -07:00
sabaimran
b11b00a9ff Add log line for time to first response 2023-08-01 10:57:38 -07:00
sabaimran
778df6be71 Add a logline when the offline model migration script runs 2023-08-01 09:27:42 -07:00
sabaimran
48363ec861 Add additional check for chat_messages length in UT 2023-08-01 09:25:52 -07:00
sabaimran
3a5d93d673 Add migration script for getting the new offline model 2023-08-01 09:25:05 -07:00
sabaimran
90efc2ea7a Update comments and add explanations 2023-08-01 09:24:03 -07:00
sabaimran
f7e03f6d63 Switch spinner snake case -> camel case 2023-08-01 08:52:25 -07:00
sabaimran
1c52a6993f add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources
- This also solves #367
2023-08-01 00:23:17 -07:00
sabaimran
6c3074061b Disable the input bar when chat response is in flight 2023-08-01 00:21:39 -07:00
sabaimran
c14cbe926a Add a loading symbol to web chat. Closes #392 2023-07-31 23:35:48 -07:00
sabaimran
8054bdc896 Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU) 2023-07-31 23:25:08 -07:00
sabaimran
e55e9a7b67 Fix unit tests and truncation logic 2023-07-31 21:37:59 -07:00
sabaimran
2335f11b00 Add better error handling for download processes incase of failure 2023-07-31 21:07:38 -07:00
sabaimran
95c7b07c20 Make the fake message longer 2023-07-31 20:55:19 -07:00
sabaimran
8dd5756ce9 Add new director tests for the offline chat model with llama v2 2023-07-31 20:24:52 -07:00
sabaimran
209975e065 Resolve merge conflicts: let Khoj fail if the model tokenizer is not found 2023-07-31 19:12:26 -07:00
sabaimran
2d6c3cd4fa Misc. quality improvements for Llama V2
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
2023-07-31 19:11:20 -07:00
sabaimran
ca195097d7 Update chat hint message at first run 2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky
ded606c7cb Fix format of user query during general conversation with Llama 2 2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky
48e5ac0169 Do not drop system message when truncating context to max prompt size
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the chat model

Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
2023-07-31 17:21:14 -07:00
Saba
02e216c135 Clarify usage in telemetry.md 2023-07-30 22:37:20 -07:00
Saba
7eabf8ab0f Add instructions for installing the desktop app and opting out of telemetry 2023-07-30 22:26:52 -07:00
sabaimran
88ef86ad5c Fix typing issues for mypy (#372) 2023-07-30 19:27:48 -07:00
sabaimran
ca2c942b65 Add typing to compiled_references and inferred_queries 2023-07-30 19:10:30 -07:00
sabaimran
dbb54cfcfa Merge branch 'master' of github.com:khoj-ai/khoj 2023-07-30 18:52:17 -07:00
sabaimran
3646fd1449 Add a warning to indicate that Khoj is not configured to work with personal data sources 2023-07-30 18:52:10 -07:00
sabaimran
996832dc72 Allow user to chat even if content types aren't configured - use empty references 2023-07-30 18:47:45 -07:00
Debanjum
41d36a5ecc Merge pull request #371 from felixonmars/patch-1
Correct typos in setup.md in the Khoj documentation
2023-07-30 18:37:22 -07:00
Felix Yan
f4fdfe8d8c Correct typos in setup.md 2023-07-31 03:32:56 +03:00
Debanjum Singh Solanky
28df08b907 Fix configure openai processor for khoj docker
Store khoj search models and embeddings in default location in docker
container under /root/.khoj
2023-07-30 02:07:33 -07:00
Debanjum Singh Solanky
dffbfee62b Fix sample khoj docker config to index test data using new schema 2023-07-30 01:48:18 -07:00
Debanjum Singh Solanky
53810a0ff7 Create khoj config dir if non-existent, before writing to khoj env file 2023-07-30 01:35:36 -07:00
Debanjum Singh Solanky
56394d2879 Update demo video to configure offline chat via the web interface 2023-07-29 19:17:40 -07:00
Debanjum Singh Solanky
b32673db8e Fix link to Docs website in Khoj readme on Github 2023-07-29 12:50:39 -07:00
Debanjum Singh Solanky
a3d1212e79 Align docs landing page with updated github readme
- Screenshots of khoj search, chat
- Put quickstart on landing page
- Put miscellaneous pages under separate section
- Move credits to separate page under miscellaneous
2023-07-29 12:42:36 -07:00
Debanjum Singh Solanky
d7205aed36 Update docs with setup instructions for Offline and Online Chat 2023-07-29 11:18:12 -07:00
Debanjum
0404e33437 Add screenshots, style content in README 2023-07-29 01:22:48 -07:00
sabaimran
f65d157244 Release Khoj version 0.10.0 2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky
f76af869f1 Do not log the gpt4all chat response stream in khoj backend
Stream floods stdout and does not provide useful info to user
2023-07-28 19:14:04 -07:00
sabaimran
5ccb01343e Add Offline chat to Obsidian (#359)
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
2023-07-28 18:47:56 -07:00
Debanjum
b3c1507708 Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs
- Configure using Offline Chat from Emacs: 
- Enable, Disable Offline Chat from Emacs

- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
2023-07-28 18:06:58 -07:00
sabaimran
9f78db0579 Let Offline chat override OpenAI API settings (#362)
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky
ebfbef1f68 Configure using offline chat from Emacs
Closes #358
2023-07-28 16:07:33 -07:00
Debanjum Singh Solanky
9b1048caf7 Remove asymmetric from name of remaining text search tests
Asymmetric search is the only search type used now in khoj.el. So
making a distinction between symmetric and asymmetric search
isn't necessary anymore
2023-07-28 15:33:22 -07:00
sabaimran
12cfb48f16 Fix gpt4all import error in Desktop builds (#356)
* Add gpt4all to imports via sysconfig path
2023-07-28 11:54:18 -07:00
Debanjum
4b0639cfbd Merge pull request #354 from ducksblock/master
Fix #353: Remove references to localhost:8000 in docs
2023-07-28 11:00:12 -07:00
ducksblock
cbecd7b66f Fix #353: Remove references to localhost:8000 2023-07-28 13:57:00 +05:30
sabaimran
702486dab7 Add gpt4all for copying metadata 2023-07-27 22:22:24 -07:00
sabaimran
29081f4429 Adjust parameters for offline chat 2023-07-27 22:22:09 -07:00
sabaimran
124d97c26d Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352)
* Working example with LlamaV2 running locally on my machine

- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format

* Add appropriate prompts for extracting questions based on a query based on llama format

* Rename Falcon to Llama and make some improvements to the extract_questions flow

* Do further tuning to extract question prompts and unit tests

* Disable extracting questions dynamically from Llama, as results are still unreliable
2023-07-27 20:51:20 -07:00
sabaimran
55965eea7d Delete FUNDING.yml
Instead of this file, use an organization-level file: https://github.com/khoj-ai/.github
2023-07-27 15:28:47 -07:00
sabaimran
925177b150 Update FUNDING.yml
Change to use a single organization (remove list brackets)
2023-07-27 15:19:20 -07:00
sabaimran
78197bb5c3 Create FUNDING.yml
- Add github sponsor information directly to khoj project. Closes #302
2023-07-27 15:16:45 -07:00
Debanjum Singh Solanky
da3f4dc7e4 Fix test config to run OpenAI Chat Actor, Director tests
OpenAI conversation processor schema had updated but conftest hadn't
been updated to reflect the same.

Update conftest setup of conversation processor to fix this
2023-07-27 11:30:04 -07:00
Debanjum Singh Solanky
715d56d4f0 Use new schema to update khoj.yml config from khoj.el 2023-07-26 17:34:16 -07:00
sabaimran
8b2af0b5ef Add support for our first Local LLM 🤖🏠 (#330)
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
2023-07-26 16:27:08 -07:00
sabaimran
23d77ee338 Fix import issues in desktop image builds (#343) 2023-07-26 15:45:52 -07:00
Justin Bassett-Green
8dcc21052f Add chat-model param in sample config yml and document (#341)
* add chat-model config param to docs

* add chat-model param to sample config yml
2023-07-22 16:53:08 -07:00
Debanjum Singh Solanky
5bb42e56a8 Fix formatting of khoj test config and unused references in conftests 2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky
7722a9c347 Default to using the gpt-3.5-turbo model for chat from khoj.el 2023-07-22 00:29:26 -07:00
Saba
36d25c4f1d Center the title, add table headers 2023-07-21 23:36:38 -07:00
Saba
01b6a10cd1 Simplify readme 2023-07-21 23:30:44 -07:00
sabaimran
4ce072c4b3 Make the README on our Github minimal (#334)
* Make the README on our Github minimal
* Add a bit of formatting and more background
2023-07-21 23:29:04 -07:00
Debanjum Singh Solanky
4089e38283 Fix links to demos and screenshots in docs 2023-07-21 20:01:19 -07:00
Debanjum Singh Solanky
89ad362758 Update Screenshots and Demos in Docs 2023-07-21 15:22:35 -07:00
Debanjum Singh Solanky
f0d4a4cf9a Revert "Make configure_content functional. Do not pass content index state to it."
This reverts commit 2ddee7e745 as it
broke partial updates of the content index for just the specified
content types
2023-07-21 13:59:09 -07:00
sabaimran
82c725817e Merge branch 'master' of github.com:khoj-ai/khoj 2023-07-21 13:24:05 -07:00
sabaimran
596e11ec6d Use the same function for computing entries for IDs regardless of whether it has prev entries 2023-07-21 13:23:56 -07:00
Saba
634f0b4cc4 Fix docs indexing issue 2023-07-21 08:30:00 -07:00
Debanjum Singh Solanky
c28755ccd2 Fix diff blocks, links, remove footnotes & rearrange sections in docs
Extract performance into a separate section instead of shoving it under search
Create page for web interface
2023-07-21 00:58:30 -07:00
Debanjum Singh Solanky
2ddee7e745 Make configure_content functional. Do not pass content index state to it. 2023-07-20 23:24:08 -07:00
Debanjum
e92bc0e2e6 Create CNAME to make Docs accessible at docs.khoj.dev 2023-07-20 23:24:08 -07:00
sabaimran
1610d2ebd9 📝 Add a documentation base for Khoj! (#333)
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky
3e59be7f1d Release Khoj version 0.9.0 2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky
d078e7b1f6 Clean up search type usage in khoj server, tests and Readme 2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky
4d910936b7 Fix triggering index update on khoj server from khoj.el 2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky
5c7d7f558d Make AI model used for Khoj chat configurable from khoj.el
- Fix bug. Set the unused model-name to a standard default value
2023-07-18 19:57:54 -07:00
Debanjum
5f2be2a9bb Merge pull request #298 from HyunggyuJang/patch-1
Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config
2023-07-18 17:54:11 -07:00
Debanjum
3a1c5a6dab Merge pull request #329 from khoj-ai/create-schema-migration-func-and-reindex-to-fix-corruption
Create Schema Migrator and Reindex to Apply Index Corruption Fixes

- 83e1088 Manage `khoj.yml' config migrations on app start. Version the `khoj.yml' schema
- 429e1b4 Regenerate index to apply corruption fixes on first run of this khoj version
   Otherwise users would need to manually re-index their contents with khoj
2023-07-18 16:43:17 -07:00
Debanjum Singh Solanky
429e1b4b48 Regenerate index to apply corruption fixes on first run of new khoj 2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky
83e1088d42 Manage khoj.yml config migrations on app start. Version the schema
- Add version to khoj.yml schema
  Versioning the khoj.yml config schema will simplify future migrations
2023-07-18 16:10:10 -07:00
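A minimal sketch of how a versioned khoj.yml could be migrated on app start, assuming integer schema versions and a hypothetical registry of migration functions. `MIGRATIONS`, `migrate_config` and `_v0_to_v1` are illustrative names, not Khoj's actual code. Configs without a version field are treated as version 0.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


def _v0_to_v1(config: Config) -> Config:
    # Hypothetical first migration: stamp an unversioned config as version 1.
    config["version"] = 1
    return config


# Maps the version a migration upgrades FROM to the function that performs it.
MIGRATIONS: Dict[int, Callable[[Config], Config]] = {0: _v0_to_v1}
LATEST_VERSION = 1


def migrate_config(config: Config) -> Config:
    """Apply pending migrations in order until the config reaches LATEST_VERSION."""
    version = config.get("version", 0)  # unversioned configs count as version 0
    while version < LATEST_VERSION:
        config = MIGRATIONS[version](config)
        version = config["version"]
    return config
```

With this shape, a later schema change only needs one more registered function and a bump of LATEST_VERSION.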
Debanjum Singh Solanky
71e8ddd9a2 Check if PDF is configured before showing it as an option in khoj.el 2023-07-17 15:49:20 -07:00
Debanjum
d00c5da8b7 Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing
## Stabilize and Simplify Content Indexing

### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
  - Refer: 1482fd4, 7669b85, 88d1a29 
  - ad41ef3 Make normalization of embeddings configurable to test this in c73feeb

### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method 
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods

Resolves #190
2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky
3e3a1ecbc8 Start app even if server init fails to let user fix it
Show stacktrace on error to help debugging
2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky
ef6a0044f4 Drop embeddings of deleted text entries from index
Previously the deleted embeddings would continue to be in the index,
even after the entry was deleted
2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky
c73feebf25 Test index embeddings are stable on incremental update & no norm
Ensure order of new embedding insertion on incremental update
does not affect the order and value of existing embeddings when
normalization is turned off
2023-07-16 02:22:28 -07:00
Debanjum Singh Solanky
ad41ef3991 Make normalizing embeddings configurable 2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky
1482fd4d4d Test index is stable sorted on incremental update with new entry
Ensure order of new embedding, entry insertion on incremental update
is stable
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
b02323ade6 Improve name of text search test functions
Asymmetric was older name used to differentiate between symmetric,
asymmetric search.

Now that text search just uses asymmetric search stick to simpler name
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
89c7819cb7 Unify logic to generate embeddings from scratch and incrementally
This simplifies the `compute_embeddings' method and avoids potential
later divergence in handling the index regenerate vs update scenarios
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6a0297cc86 Stable sort new entries when marking entries for update 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
7669b85da6 Test index is stable sorted on regenerate with new entry 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6e70b914c2 Remove unused dump_jsonl method
The entries index is stored in gzipped jsonl files for each content type
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
9bcca43299 Use single func to handle indexing from scratch and incrementally
Previous regenerate mechanism did not deduplicate entries with same key
So entries looked different between regenerate and update
Having single func, mark_entries_for_update, to handle both scenarios
will avoid this divergence

Update all text_to_jsonl methods to use the above method for
generating index from scratch
2023-07-16 01:45:53 -07:00
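The single-function approach can be sketched as follows. This is an illustrative sketch, not Khoj's mark_entries_for_update: entries are plain strings, `key` is an assumed dedup key, duplicates keep their first occurrence, entries that no longer exist are dropped, and new entries are appended in sorted key order. A full regenerate is simply the case where `previous_entries` is empty, so both paths share the same logic.

```python
from typing import Callable, Dict, List, Set


def merge_entries_for_update(
    current_entries: List[str],
    previous_entries: List[str],
    key: Callable[[str], str] = lambda entry: entry,
) -> List[str]:
    """Hypothetical helper: one code path for regenerating and incrementally updating."""
    # Deduplicate current entries by key, keeping the first occurrence (an assumption).
    current_by_key: Dict[str, str] = {}
    for entry in current_entries:
        current_by_key.setdefault(key(entry), entry)

    # Keep previously indexed entries that still exist, in their original order,
    # so their positions (and embeddings) stay stable. Deleted entries are dropped.
    kept: List[str] = []
    kept_keys: Set[str] = set()
    for entry in previous_entries:
        k = key(entry)
        if k in current_by_key and k not in kept_keys:
            kept.append(entry)
            kept_keys.add(k)

    # Append genuinely new entries in a stable, sorted order.
    new = [current_by_key[k] for k in sorted(current_by_key) if k not in kept_keys]
    return kept + new
```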
Debanjum Singh Solanky
1673bb5558 Add todo state to compiled form of each org-mode entry 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
88d1a29a84 Test index is stable for duplicate entries across regenerate, update
- Current incorrect behavior:
  All entries with duplicate compiled form are kept on regenerate
  but on update only the last of the duplicated entries is kept

This divergent behavior is undesirable, as consistent handling is needed to prevent index corruption
across reconfigure and update
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
da98b92dd4 Create helper function to test value, order of entries & embeddings
This helper should be used to observe if the current embeddings are
stable sorted on regenerate and incremental update of index in text
search tests
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
7ad96036b0 Improve lock name to config_lock instead of search_index_lock
It is used to lock updates to all app config state, including processor
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
58d86d7876 Use single func to configure server via API and on server start
Improve error messages on failure to configure server components
2023-07-16 01:45:53 -07:00
sabaimran
a15711e635 Fix null type checks in get /config 2023-07-15 15:53:56 -07:00
sabaimran
e590d75b20 Start Khoj even when config is not valid (#320)
* Add icon to indicate bad config, start Khoj even if there was an issue setting up the index
2023-07-15 14:11:54 -07:00
sabaimran
49ab201c30 Fix issues importing PySide in Docker container (#322)
* Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode
2023-07-15 13:33:13 -07:00
sabaimran
ba47f2ab39 Merge branch 'master' of github.com:debanjum/khoj 2023-07-14 22:28:05 -07:00
sabaimran
874cffd256 Add additional support for parsing notion workspaces 2023-07-14 22:27:56 -07:00
Debanjum
52f68167ce Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication
Reuse Search Models across Content Types to reduce Memory Consumption

- Memory consumption now only scales with search models used, not with content types. 
  Previously each content type had its own copy of the search ML models.
  That'd result in 300+ Mb per enabled text content type

- Split model state into 2 separate state objects, `search_models` and `content_index`. 
  This allows loading text_search and image_search models first
  and then reusing them across all content_types in content_index

- The change should cut down memory utilization quite a bit for most users.
  I see a >50% drop in memory utilization on my Khoj instance. 
  But this will vary for each user based on the amount of content indexed vs number of plugins enabled.

- This change does not solve the RAM utilization scaling with size of the index,
  as the whole content index is still kept in RAM while Khoj is running

Should help with #195, #301 and #303
2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky
f08e9539f1 Release lock after updating index even if update fails to prevent deadlock
Wrap acquire/release locks in try/catch/finally when updating content
index and search models to prevent lock not being released on error
and causing a deadlock
2023-07-14 16:57:27 -07:00
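The acquire/try/finally pattern described in this commit looks roughly like this in Python. `config_lock` mirrors the lock name from 7ad96036b0; `update_index` and its `update_fn` argument are hypothetical stand-ins for the actual index update code.

```python
import logging
import threading
from typing import Callable

config_lock = threading.Lock()
logger = logging.getLogger(__name__)


def update_index(update_fn: Callable[[], None]) -> bool:
    """Run update_fn while holding config_lock; always release the lock, even on error."""
    config_lock.acquire()
    try:
        update_fn()
        return True
    except Exception as e:
        logger.error(f"Failed to update index: {e}", exc_info=True)
        return False
    finally:
        # Without this finally block, an exception in update_fn would leave the
        # lock held and every later update would block forever (a deadlock).
        config_lock.release()
```

Writing `with config_lock:` around the body gives the same release-on-error guarantee more idiomatically.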
sabaimran
37f7f9fd1d Add additional telemetry for system understanding (#316)
* Add additional telemetry in order to understand which data sources are the most useful
* Make actions side by side in the configuration page
* Restore main run command
* Update links to point to wiki pages for Github, Notion integrations
* Standardize nomenclature of the api_type to use _config suffix

Remove header fields that aren't actually helpful for understanding config usage
2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky
b9fb656657 Update Tests to setup both content_index, search_models before testing
This is required by the updated structure of Khoj setup

- Add content_config pytest fixture, pass bi_encoder from
  search_models.[text|image]_search
2023-07-14 01:29:48 -07:00
Debanjum Singh Solanky
86e2bec9a0 Reuse Search Models across Content Types to Reduce Memory Consumption
- Memory consumption now only scales with search models used, not with
  content types as well. Previously each content type had it's own
  copy of the search ML models. That'd result in 300+ Mb per enabled
  content type

- Split model state into 2 separate state objects, `search_models' and
  `content_index'.
  This allows loading text_search and image_search models first and then
  reusing them across all content_types in content_index

- This should cut down memory utilization quite a bit for most users.
  I see a ~50% drop in memory utilization.

  This will, of course, vary for each user based on the amount of
  content indexed vs number of plugins enabled

- This does not solve the RAM utilization scaling with size of the index,
  as the whole content index is still kept in RAM while Khoj is running

Should help with #195, #301 and #303
2023-07-14 01:27:22 -07:00
sabaimran
c2249eadb2 Add a Github workflow that allows you to build dev versions of Desktop applications (#309)
* Add a Github workflow that allows you to build dev versions of Desktop applications
* Add pull_request trigger for testing
* Fix errant open quote in Package Khoj App step
* Nix the release step, since this isn't associated with any tags
- Set retention period for uploaded artifacts to 1 day
* Remove pull_request trigger - limit to manual triggers and pushes to master
2023-07-13 22:11:39 -07:00
Debanjum
b2718d330c Merge pull request #304 from migrate-from-pyqt-to-pyside
Migrate from PyQT6 to PySide6
2023-07-13 11:54:47 -07:00
sabaimran
31e933207f Set default values for sys.stdout if they're unavailable 2023-07-12 22:22:49 -07:00
Debanjum Singh Solanky
9c76150895 Migrate from PyQT6 to PySide6 2023-07-11 18:43:44 -07:00
Debanjum
83ed8561ee Reduce size of Docker image and build it from local code
- Improvements
  - Install Khoj on Docker from local code instead of pulling from Github
  - Reduce Khoj Docker image size by 2Gb by not caching installed pip packages. Refer [issue comment](https://github.com/khoj-ai/khoj/issues/148#issuecomment-1627443570)
2023-07-11 01:30:06 -07:00
HyunggyuJang
88c42b3043 Encode data as utf-8
otherwise it will complain, see 1c85531090
2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky
6308388dfc Install Khoj on Docker from local app instead of pulling from github
Just use a random static version for Khoj on the Docker as otherwise
the hatch vcs dynamic versioning requires the .git directory in the
docker image too
2023-07-11 00:41:05 -07:00
Debanjum Singh Solanky
802472cd99 Reduce Khoj Docker image size by 2Gb by not caching pip packages
Resolve #148
2023-07-10 23:27:02 -07:00
Debanjum Singh Solanky
f664a74e77 Update Khoj server to run on non standard port, 42110 instead of 8000
Resolves #295
2023-07-10 21:27:58 -07:00
Debanjum Singh Solanky
bfd516c1a4 Deprecate (unmaintained) support to setup Khoj via Conda 2023-07-10 21:27:58 -07:00
Debanjum Singh Solanky
58c2c3b71a Add Documentation to Release Khoj 2023-07-10 21:27:58 -07:00
sabaimran
effb52f859 Fix demo rendering with the new header 2023-07-10 21:16:19 -07:00
sabaimran
55f5be7b03 Release Khoj version 0.8.2 2023-07-10 14:39:32 -07:00
sabaimran
9a63f89f33 Merge branch 'master' of github.com:debanjum/khoj 2023-07-10 14:31:19 -07:00
sabaimran
53809298c0 Release Khoj version 0.8.1 2023-07-10 14:30:04 -07:00
tjsousa
5b37e988e6 Allow using configured GPT chat model (#292)
My account doesn't have gpt-4 enabled, and chat wouldn't work because extract_questions always used its default model, even though the caller could pass in the configured model.
2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky
75ff871217 Release Khoj version 0.8.0 2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky
979088b3dc Add tooltip helper text on web settings page buttons
- Provide more details on what clicking configure, initialize buttons
  or changing the results count slider does
- This shows up on user hovering over those buttons
2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky
255781e135 Use relative link on logo to jump to correct page on local and cloud 2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky
b2d229c116 Move header pane style to base khoj.css for reuse. Fix logo size 2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky
f4cef377ca Add details to run, configure Khoj via Web in Readme 2023-07-10 12:10:20 -07:00
Debanjum Singh Solanky
20cb314171 Open the Khoj config page in the browser on first run 2023-07-10 12:10:20 -07:00
sabaimran
07cf5a214a Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293) 2023-07-10 10:33:04 -07:00
sabaimran
7364bac8ae Make the header take up less space
- Use a single row for the header
- Needed custom styling for each page because each of them is different in subtle ways, unfortunately
2023-07-09 22:31:37 -07:00
sabaimran
62704cac09 Add a plugin which allows users to index their Notion pages (#284)
* For the demo instance, re-instate the scheduler, but infrequently for api updates
- In constants, determine the cadence based on whether it's a demo instance or not
- This allows us to collect telemetry again. This will also allow us to save the chat session
* Conditionally skip updating the index altogether if it's a demo instance
* Add backend support for Notion data parsing
- Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token
- Make corresponding updates to the default config, raw config to support the new notion addition
* Add corresponding views to support configuring Notion from the web-based settings page
- Support backend APIs for deleting/configuring notion setup as well
- Streamline some of the index updating code
* Use defaults for search and chat queries results count
* Update pagination of retrieving pages from Notion
* Update state conversation processor when update is hit
* frequency_penalty should be passed to gpt through kwargs
* Add check for notion in render_multiple method
* Add headings to Notion render
* Revert results count slider and split Notion files by blocks
* Clean/fix misc things in the function to update index
- Use the successText and errorText variables appropriately
- Name parameters in function calls
- Add emojis, woohoo
* Clean up and further modularize code for processing data in Notion
2023-07-09 15:29:26 -07:00
Debanjum
77755c0284 Fix Packaging the Khoj Desktop Apps (#289)
* Add langchain static files and pytorch metadata to Khoj native app

* Add pillow static files, metadata & hidden imports to Khoj native app

* Fix path to web interface static files on Khoj native app

* Add tiktoken hidden imports to make chat work from Khoj native app

* Fix Khoj native app to run with GUI mode enabled

This got broken when we moved from using the --no-gui flag to using
--gui in https://github.com/khoj-ai/khoj/pull/263
2023-07-09 10:21:16 -07:00
sabaimran
4c135ea316 Make streaming optional for the /chat endpoint (#287)
* Update the /chat endpoint to conditionally support streaming

- If streams are enabled, return the threadgenerator as it does currently
- If stream is disabled, return a JSON response with the response/compiled references separated out
- Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian
- Rename chat/init/ to chat/history

* Update khoj.el to use the /history endpoint

- Update corresponding unit tests to use stream=true

* Remove & from call to /chat for obsidian

* Abstract functions out into a helpers.py file and clean up some of the error-catching
2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky
0a86220d42 Use default values, delete content config on disable and update state 2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky
362063f5fe By default, connect to Khoj server over IPv4 from Obsidian plugin 2023-07-07 20:36:16 -07:00
Debanjum Singh Solanky
571e8c2548 Add rerank, index corruption hint on search page of web interface
Similar to the hint already in the Obsidian search modal
Closes #272
2023-07-07 20:36:16 -07:00
Debanjum
4b79d8216f Move remaining chat actors to use OpenAI chat models
- Deprecate the unused beta /answer and /search type identification endpoints and associated GPT functions
- Update extract_questions to use GPT4
- Update summarize method to default to GPT-3.5
- Update date filter to support quoting values in single quotes too. So now both dt>'2023-04-01' and dt>"2023-04-01" should work
- Remove "model" field from chat settings on the web interface
2023-07-07 18:53:05 -07:00
Debanjum Singh Solanky
61e131f95c Hide unused model field from chat settings on web interface 2023-07-07 18:43:53 -07:00
Debanjum Singh Solanky
af30d01e85 Move to newer chat models to extract questions & summarize chats
Deprecate usage of the older gpt3 models in-place of the newer chat
based models
- text-davinci-003 is only 50% cheaper than gpt4 and less reliable for
  question extraction
- Using gpt-3.5-turbo for summarization should reduce the cost of chat

- Keep conversation.chat_session as a list instead of a string
- Update completion_with_backoff func to use ChatML format
2023-07-07 17:32:27 -07:00
Debanjum Singh Solanky
171ce19e1f Update date filter to allow quoting values in single quotes 2023-07-07 17:13:47 -07:00
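One way to accept both quote styles is a regex with a backreference, so the closing quote must match the opening one. This pattern and the `extract_date_filters` helper are assumptions for illustration, not the date filter's actual regex; the set of operators (`>=`, `<=`, `>`, `<`, `=`, `:`) is likewise assumed.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: "dt", an operator, then a date wrapped in ' or ".
# Longer operators come first so ">=" is not matched as ">".
# The backreference \2 requires the closing quote to match the opening quote.
DATE_FILTER_REGEX = re.compile(r"""dt(>=|<=|>|<|=|:)(["'])(.*?)\2""")


def extract_date_filters(query: str) -> List[Tuple[str, str]]:
    """Return (operator, date_string) pairs found in the query."""
    return [(op, date) for op, _quote, date in DATE_FILTER_REGEX.findall(query)]
```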
Debanjum Singh Solanky
e588f7c528 Deprecate unused beta search and answer API endpoints 2023-07-07 16:38:07 -07:00
Debanjum Singh Solanky
c9fc4d1296 Revert to using cross-encoder to improve search results used by chat 2023-07-07 15:31:34 -07:00
Debanjum Singh Solanky
11f0a9f196 Fix chat tests since streaming. Pass args correctly to chat methods
- Fix testing gpt converse method after it started streaming responses
- Pass stop in model_kwargs dictionary and api key in openai_api_key
  parameter to chat completion methods. This should resolve the arg
  warning thrown by OpenAI module
2023-07-07 15:23:44 -07:00
Debanjum Singh Solanky
48870d9170 Fix parsing questions generated by extract_questions actor into list
The previous json parsing was failing to handle questions with date
filters

Fix the chat actor tests to run without throwing error with freezegun
complaining about importing transformers.local_llama model

Remove quote escapes from date filter examples provided to
extract_questions actor
2023-07-07 15:18:55 -07:00
Debanjum Singh Solanky
279662620b Move results count to settings page on web. Use it for search & chat
- Before
  Only the search interface had the results count configuration option

- After
  - The results count is set on the settings page instead of the
    search page
  - Both search and chat can use the configured results count instead
    of just search
2023-07-07 14:08:08 -07:00
Debanjum Singh Solanky
2ec8da89e8 Remove Update button from Khoj Search page on the Web interface
The settings page on the Khoj web interface already has a configure
button. Don't need the Update button on the search page as well
2023-07-07 12:49:58 -07:00
Debanjum Singh Solanky
bf427cd8dd Set no. of results used to generate chat response from Khoj Emacs 2023-07-07 12:34:50 -07:00
Debanjum Singh Solanky
1d77fe712c Set no. of results used to generate chat response from Khoj Obsidian 2023-07-07 12:32:32 -07:00
Debanjum Singh Solanky
2f31de5ed5 Set no. of references to use for chat configurable in Chat API 2023-07-07 12:29:36 -07:00
Debanjum Singh Solanky
d97682fdac Use tooltip, placeholders to guide Khoj setup via web settings page 2023-07-06 21:37:48 -07:00
Debanjum Singh Solanky
f5cf09424b Use more descriptive field names for content type settings on Khoj web
Resolves #281
2023-07-06 20:47:39 -07:00
Debanjum Singh Solanky
a2c668268f Use node-fetch >=3.1.0 in khoj obsidian plugin to avoid security vulnerability 2023-07-06 13:05:52 -07:00
sabaimran
d688ddf92c Re-instate the scheduler for the demo instances (#279)
* For the demo instance, re-instate the scheduler, but infrequently for api updates

- In constants, determine the cadence based on whether it's a demo instance or not
- This allows us to collect telemetry again. This will also allow us to save the chat session

* Conditionally skip updating the index altogether if it's a demo instance
2023-07-06 11:01:32 -07:00
Debanjum Singh Solanky
8f36572a9b Improve typing, null checks in controllers and gpt functions 2023-07-05 20:49:25 -07:00
Debanjum Singh Solanky
41ac1e24c9 Add docs for a pre-emptive setup of Khoj for later offline usage
Closes #151
2023-07-05 20:48:51 -07:00
Debanjum
6c2a8a5bce ️ Stream Responses by Khoj Chat on Web, Obsidian
- What
   - Stream chat responses from OpenAI API to Web, Obsidian clients
      - Implement using a callback function which manages a queue where new tokens can be placed as they come in. As the thread is read from, tokens are removed.
      - When the final token has been processed, add the `compiled_references` to the queue to be rendered by the `chat` client
      - When the thread has been closed, save the accumulated conversation log in the user's history using a `partial func`
      - Incrementally decode tokens on the front end and add them as they appear from the streamed response

- Why
This significantly reduces perceived latency and OpenAI API request timeouts for Chat

Closes https://github.com/khoj-ai/khoj/issues/257
2023-07-05 20:02:11 -07:00
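A simplified sketch of the queue-backed generator described in this commit, assuming a producer thread that pushes model tokens one at a time. `ThreadGenerator`, `REFERENCES_MARKER` and `fake_llm_stream` are illustrative; the marker string and the JSON encoding of references are assumptions, not the format Khoj's clients actually parse.

```python
import json
import queue
import threading
from typing import Callable, Iterator, List

# Hypothetical delimiter separating the streamed answer from its references.
REFERENCES_MARKER = "### compiled references:"


class ThreadGenerator:
    """Iterator over tokens that a producer thread pushes into a queue as they arrive."""

    _SENTINEL = object()  # signals the consumer that the stream is over

    def __init__(self, compiled_references: List[str], on_complete: Callable[[str], None]):
        self.tokens: "queue.Queue[object]" = queue.Queue()
        self.compiled_references = compiled_references
        # e.g. a functools.partial that saves the finished chat turn to history
        self.on_complete = on_complete
        self.response = ""

    def __iter__(self) -> Iterator[str]:
        while True:
            item = self.tokens.get()  # blocks until the producer puts something
            if item is self._SENTINEL:
                return
            yield item

    def send(self, token: str) -> None:
        # Called by the model callback for each new token.
        self.response += token
        self.tokens.put(token)

    def close(self) -> None:
        # Save the full answer, emit the references, then end the stream.
        self.on_complete(self.response)
        self.tokens.put(REFERENCES_MARKER + json.dumps(self.compiled_references))
        self.tokens.put(self._SENTINEL)


def fake_llm_stream(generator: ThreadGenerator, tokens: List[str]) -> None:
    # Stand-in for the streaming OpenAI callback.
    for token in tokens:
        generator.send(token)
    generator.close()


if __name__ == "__main__":
    saved: List[str] = []
    gen = ThreadGenerator(["notes.org"], on_complete=saved.append)
    threading.Thread(target=fake_llm_stream, args=(gen, ["Hel", "lo"])).start()
    for chunk in gen:
        print(chunk, end="", flush=True)  # render tokens as they arrive
    print()
```

The consumer sees each token as soon as the producer queues it, which is what lowers perceived latency compared to waiting for the full completion.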
Debanjum Singh Solanky
e111eda6ae Make client, app_config optional in telemetry logger for correct typing 2023-07-05 18:57:38 -07:00
Debanjum Singh Solanky
e562114f6b Improve comments, var names in js for chat streaming on web interface 2023-07-05 18:57:27 -07:00
Debanjum Singh Solanky
46269ddfd3 Fix chat logging messages to get context without flooding logs 2023-07-05 18:27:06 -07:00
Debanjum Singh Solanky
0ba838b53a Show temp status message in Khoj Obsidian chat while Khoj is thinking
- Scroll to bottom after adding temporary status message and
references too
2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky
8271abe729 Use optional chaining operator to extract khojBannerSubmit from conditional 2023-07-05 18:02:43 -07:00
Debanjum Singh Solanky
c12ec1fd03 Show temp status message in Khoj web chat while Khoj is thinking
- Scroll to bottom after adding temporary status message and
references too
2023-07-05 18:02:30 -07:00
sabaimran
257a421e45 Bonus: add try-catch logic around telemetry upload in case of JSON serializability issues 2023-07-05 15:12:18 -07:00
sabaimran
4e6b66b139 Add support for streaming chat response from OpenAI to Obsidian
- I needed to install node-fetch to accomplish this, as the built-in request object from Obsidian doesn't seem to support streaming and the built-in fetch object is very sensitive to any and all cross origin requests
2023-07-05 15:01:22 -07:00
sabaimran
3ff5074cf5 Log the end-to-end time of generating a streamed response from OpenAI 2023-07-05 14:59:44 -07:00
sabaimran
68e635cc32 Remove additional comments and debug statements 2023-07-05 11:33:56 -07:00
sabaimran
67a8795b1f Clean-up commented out code 2023-07-05 11:24:40 -07:00
sabaimran
79b1b1d350 Save streamed chat conversations via partial function passed to the ThreadGenerator 2023-07-04 17:33:52 -07:00
sabaimran
afd162de01 Add reference notes to result response from GPT when streaming is completed
- NOTE: results are still not being saved to conversation history
2023-07-04 12:47:50 -07:00
sabaimran
8f491d72de Initial code with chat streaming working (warning: messy code) 2023-07-04 10:14:39 -07:00
Debanjum Singh Solanky
5889eceba4 Make text selectable in Khoj chat modal on Obsidian
Previously the text in the Khoj chat modal couldn't be copied as it
was not selectable

Resolves #206
2023-07-03 23:24:04 -07:00
sabaimran
89354def9b Update request timeout window to 20 seconds 2023-07-03 22:28:18 -07:00
sabaimran
b1940519c3 Log error if unable to decode chunk from Github 2023-07-03 16:29:32 -07:00
Debanjum Singh Solanky
ecf9730cd7 Disable Chat, Search on Web if Khoj not configured & show next steps 2023-07-03 16:04:32 -07:00
sabaimran
017e8c1aef Skip indexing a PDF that has an indexing error (#274) 2023-07-03 15:55:11 -07:00
sabaimran
a6f313589e Release Khoj version 0.7.1 2023-07-03 12:26:41 -07:00
Debanjum Singh Solanky
70f6b8266c Upgrade minimum supported pydantic version 2023-07-03 12:22:56 -07:00
sabaimran
8bfd5828e6 Remove deprecation notice since we're opening the web UI by default 2023-07-03 12:01:09 -07:00
sabaimran
92d81d3b16 Initialize the search.model field to SearchModels() and fix Reinitialize API call (#273) 2023-07-03 11:32:44 -07:00
sabaimran
61403138d5 Merge pull request #269 from khoj-ai/features/simplify-configuration-steps
Simplify some common configuration steps
2023-07-03 00:16:51 -07:00
sabaimran
ea3dc2cfa3 Simplify rendering of content type pages and logic of selecting config 2023-07-03 00:15:29 -07:00
sabaimran
260272dca2 Check if state.config is populated before configuring via the update method 2023-07-03 00:10:56 -07:00
sabaimran
bf8914d0c8 Fix default config initialization for chat.html 2023-07-03 00:00:47 -07:00
Debanjum
faad1297f4 Drop Support for Org Music, Ledger Content Types
Removing unused content types will reduce khoj code to manage

- 0f993b3 Drop support for Ledger as a separate content type
   Khoj will soon get a generic text indexing content type in Index plain text files #237.
   This along with a file filter should suffice for searching through Ledger transactions

- c9db532 Remove unused org-music as an indexable content type from Khoj
   Org-music was just a custom content type that worked with org-music.
   It was mostly only useful for me.
2023-07-02 17:48:29 -07:00
Debanjum Singh Solanky
0f993b332e Drop support for Ledger as a separate content type
Khoj will soon get a generic text indexing content type. This along
with a file filter should suffice for searching through Ledger
transactions, if required.

Having a specific content type for a niche use-case like ledger isn't
useful. Removing unused content types will reduce khoj code to manage.
2023-07-02 16:57:49 -07:00
sabaimran
fa218ff5aa Fix call to update for Reinitialize button 2023-07-02 16:31:30 -07:00
sabaimran
a8b83da872 Merge branch 'master' of github.com:debanjum/khoj into features/simplify-configuration-steps 2023-07-02 16:21:54 -07:00
Debanjum Singh Solanky
c9db5321e7 Remove unused org-music as an indexable content type from Khoj
Org-music was just a custom content type that worked with org-music.
It was mostly only useful for me.

Cleaning up that code will reduce number of content types for khoj to
manage.
2023-07-02 16:21:21 -07:00
sabaimran
77a45f4215 Merge pull request #265 from khoj-ai/fix/obsidian-setup-issues
Fix configuration setup logic in Obsidian
2023-07-02 16:21:18 -07:00
sabaimran
b86a3bb0c5 Merge branch 'master' of github.com:debanjum/khoj into fix/obsidian-setup-issues 2023-07-02 16:21:05 -07:00
sabaimran
a52c1c8380 Use built-in app.vault to determine whether there are any PDF files within 2023-07-02 16:20:43 -07:00
sabaimran
eff1436857 Overwrite existing PDFs in Obsidian as well, make if-block more legible 2023-07-02 16:17:25 -07:00
Debanjum Singh Solanky
30459ee4ba Fix Khoj subtitle in desktop entry, pyproject, cli and Obsidian Readme 2023-07-02 16:09:07 -07:00
sabaimran
feac71ce1e Merge pull request #268 from khoj-ai/fix/threading-issue-in-update-api
Add try-except-finally blocks around configure calls in /update
2023-07-02 16:08:29 -07:00
sabaimran
1a1b044d12 Simplify settings pages for configuration
- Add one-click disablement
- Remove fields that probably don't need to be edited (our implementation details)
- Add a green tick if a given field is configured
2023-07-02 16:04:05 -07:00
sabaimran
e4c445f805 Add try-except-finally blocks around configure calls in /update 2023-07-02 13:35:02 -07:00
sabaimran
4b02a8c788 Fix PDF setup in Obsidian plugin and force Obsidian configuration for markdown 2023-07-02 12:37:24 -07:00
sabaimran
b6772d8fc3 Merge pull request #264 from khoj-ai/fix/remove-guidance-for-desktop-gui
Escape special characters in the URL when adding a link to the remote file
2023-07-02 09:14:08 -07:00
sabaimran
2a7e4f2b71 Escape special characters in the URL when adding a link to the remote file 2023-07-02 09:13:28 -07:00
sabaimran
4915b7214d Merge pull request #263 from khoj-ai/fix/remove-guidance-for-desktop-gui
[Fix] Remove the default behavior of using GUI for Khoj
2023-07-01 21:37:11 -07:00
sabaimran
c747562897 Update the GUI to just be a simple box with a button for the web UI 2023-07-01 20:37:21 -07:00
sabaimran
bab7f39d47 Move logic to open the web browser into the GUI section 2023-07-01 20:11:27 -07:00
sabaimran
36537606da Update unit test and preserve prior operational ordering in main.py 2023-07-01 20:02:35 -07:00
sabaimran
ea9ae4ae28 Configure Khoj to automatically open the browser to its web home page when Khoj is up 2023-07-01 19:46:31 -07:00
sabaimran
d2083dd395 Remove bespoke processing for GithubToJsonl file demo 2023-07-01 19:09:22 -07:00
sabaimran
a71440f62a Update the guidance in the error message if config is not set 2023-07-01 19:09:00 -07:00
sabaimran
7db97d8aa9 Fix: don't try to render the search_type.ALL 2023-07-01 19:08:19 -07:00
sabaimran
f0f6390366 Make --no-gui the default behavior of Khoj and update corresponding documentation 2023-07-01 19:07:59 -07:00
Debanjum Singh Solanky
2fbc609233 Add content write permission to jobs in github release workflow 2023-07-01 06:23:45 -07:00
Debanjum Singh Solanky
d77e05c279 Release Khoj version 0.7.0 2023-07-01 05:44:22 -07:00
Debanjum Singh Solanky
32d73500ba Update Khoj Github Plugin details in main Readme 2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky
30d87a9a01 Update color of Khoj chat in Obsidian plugin to Lantern theme 2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky
51826d28d6 Ensure clicking Update in Khoj Obsidian indexes PDF files too 2023-07-01 02:18:47 -07:00
sabaimran
dac2d14380 Handle file names appropriately for md files and render commits in github results 2023-07-01 01:20:58 -07:00
sabaimran
dbe713604d Fix error in tests for markdown_to_jsonl 2023-07-01 00:49:40 -07:00
sabaimran
931aab4464 Handle case for when headers value is None 2023-07-01 00:37:30 -07:00
sabaimran
d01afb3ee4 Fix path issues for URL-based markdown files 2023-07-01 00:25:11 -07:00
sabaimran
01aa285d7b Merge pull request #260 from khoj-ai/features/add-demo-views-for-khoj
Add demo view for Khoj
2023-06-30 21:57:43 -07:00
sabaimran
31655447e7 Add the sign-up list to the chat page as well and update copy 2023-06-30 21:43:01 -07:00
sabaimran
cebaa51c2f Merge branch 'master' of github.com:debanjum/khoj into features/add-demo-views-for-khoj 2023-06-30 20:39:02 -07:00
sabaimran
796102c74e Add separate configuration if the given Khoj instance is meant for demo
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS that made the settings page small on mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
2023-06-30 20:38:55 -07:00
sabaimran
a443af3a71 Merge pull request #256 from khoj-ai/features/improve-telemetry
Add additional request headers to improve telemetry
2023-06-30 20:35:41 -07:00
sabaimran
db3026739d Resolve diffs in api.py to make /chat endpoint async with new request parameter 2023-06-30 00:25:37 -07:00
sabaimran
ef72508914 Try/catch around github file decoding, await call to search in chat API, fix img width 2023-06-30 00:23:21 -07:00
Debanjum Singh Solanky
b950889f47 Fix org-mode web renderer to handle results containing list in block
- Break out of rendering list if at end of org block in org.js
- This would previously hang rendering of results in the web interface

Should try to fix this upstream in org.js as well
2023-06-29 19:01:25 -07:00
sabaimran
780c769567 Add additional request headers to improve telemetry 2023-06-29 18:51:24 -07:00
sabaimran
6c10d68262 Merge pull request #253 from khoj-ai/features/github-issues-indexing
Support indexing Github issues as well as corresponding comments
2023-06-29 16:02:47 -07:00
sabaimran
b2dd946c6d Rename issue to entry method for accuracy 2023-06-29 15:23:50 -07:00
Debanjum Singh Solanky
51dfa48e2b Have Khoj support Python 3.11 as Pytorch supports it now
- Previously Khoj could only support Python up to 3.10 due to pytorch.
  But lots of folks had python 3.11 installed by default on their machines.

  This required installing python 3.10 and dealing with virtual envs.

  With Torch >= 2.0.1 now able to support python 3.11, at least one
  class of installation troubles for Khoj should drop. See
  https://github.com/pytorch/pytorch/issues/86566 for reference

- Preliminary testing indicates using the new torch 2.x may reduce
  search time by 25% (from 80ms to 60ms on Mac M1)

- Update Docs to no longer mention that python <=3.10 is required
- Update Github test workflow to run khoj tests with python 3.11 too
2023-06-29 15:13:26 -07:00
sabaimran
65bf894302 Interpret org files as a list and put them in separate divs. Update styling of search results to separate into cards 2023-06-29 15:12:48 -07:00
Debanjum Singh Solanky
d212298573 Make Configure button on web interface incrementally update by default
We should add a way to force index everything.

But force indexing should not be the default when the user is just trying
to update the content to index
2023-06-29 14:52:51 -07:00
Debanjum Singh Solanky
da2de21339 Only return requested result count even if search in multiple content types
- Set results_count to default value at start so it is an int, never None
2023-06-29 14:49:05 -07:00
sabaimran
77672ac0ae Demarcate different results with a border box
- Add back support for searching by type Github
- Remove custom class name in markdown js file
2023-06-29 14:14:25 -07:00
sabaimran
6edc32f2f4 Accept current changes to include issues in rendering flow 2023-06-29 12:25:29 -07:00
Debanjum
f272d4503e Search across all Asymmetric Text Content Types in Parallel
- Allow searching across asymmetric text content types using threads
   - Query time on my Mac averages 95ms latency (140ms at 90 percentile) across (Org, Markdown, Github, PDF and Music content types)
   - This is not much slower than searching a single content type (maybe a ~50% latency increase at most?). Encoding the query takes most of the time anyway, and that is still done only once like before; threading adds some overhead
   - An **average** of `95 ms` latency or `140ms` at **90th percentile** is in line with keeping an incremental search (search-as-you-type) experience
- Put logic to remove filter terms from query in a `defilter` method for each filter
- Encode the query only once during search and reuse it across all (asymmetric) content types
- Search across all content types via the web and emacs interfaces in [d5fb419](d5fb4196de) and [5c4eb95](5c4eb950d5) respectively
- Allow Khoj Chat to pull relevant data from across content types (without the perf hit). Khoj chat is only pulling data from a single content type currently
2023-06-29 12:21:27 -07:00
sabaimran
b41c14b258 Use *.markdown in the khoj_docker.yml 2023-06-29 11:55:18 -07:00
sabaimran
e6053951f0 In chat conftest fixtures, use *.markdown rather than *.md 2023-06-29 11:53:47 -07:00
sabaimran
ab7dabe74f Explicitly use Union type for function parameters for lint checks 2023-06-29 11:44:30 -07:00
sabaimran
601b738135 Bonus: Rename all md files to markdown for cleanliness 2023-06-29 11:27:47 -07:00
sabaimran
fecf6700d2 Limit small image rendering to just the avatar images 2023-06-29 11:27:18 -07:00
sabaimran
70e550250a Add an additional data source for issues from Github repositories + quality of life updates
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
2023-06-29 10:59:54 -07:00
Debanjum Singh Solanky
5f2717cc4b Use logger.warning since logger.warn is deprecated 2023-06-28 22:15:27 -07:00
Debanjum Singh Solanky
5f7eaa7ded Add trio, move freezegun, factory-boy to project test dependencies 2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
56ce97ef9e Use async/await in tests for query method of text and image search
The text, image search query method has become async. So async/await
is required to get results correctly in tests etc
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
f516d127c8 Update client tests to expect "all" as a valid new content type 2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
b1767f93d6 Get any configured asymmetric search model to encode query for search
- Set image_search.query to async to use it with multi-threading
  This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
8eae7c898c Put each result under org heading when query for "all" content type in khoj.el
- Add "all" as default content type when no content type retrieved
  from server
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
630bf995f1 Style each result based on its content type in same view on Khoj web
- So when searching across content types (with content-type = "all")
  org-mode results get rendered differently than markdown, PDF etc. results

- Set div class for each result separately instead of a single uber div
  for styling. This allows styling div of each result based on the
  content-type of that result

- No need to create placeholder "all" content type on web interface as
  server is passing an all content type by itself
2023-06-28 22:07:01 -07:00
Debanjum Singh Solanky
1773a78339 Fix createRequestUrl method signature to fetch results from khoj web 2023-06-28 12:10:45 -07:00
Debanjum Singh Solanky
212b1a96c8 Create "all" search type for search across all content types on khoj server
Allows moving logic to handle search across all content types to
server from clients
2023-06-28 11:34:26 -07:00
Debanjum Singh Solanky
0636ceaf14 Merge branch 'master' of github.com:khoj-ai/khoj into parallelize-search-across-all-asymmetric-text-content-types
Conflicts:
- src/khoj/routers/api.py: Use theirs
2023-06-27 16:10:32 -07:00
Debanjum Singh Solanky
510bb7e684 Use typing union in text_search for python 3.8 compatible type hinting 2023-06-27 15:59:50 -07:00
Debanjum Singh Solanky
1b11d5723d Extract search request URL builder into js function in web interface 2023-06-27 15:50:41 -07:00
Debanjum Singh Solanky
09f739b8cc Null check config, log warning instead of error when configuring search 2023-06-27 15:48:48 -07:00
sabaimran
c0d35bafdd Merge pull request #250 from khoj-ai/features/github-multi-repo-and-more
Support multiple Github repositories and support indexing of multiple file types
2023-06-27 15:14:49 -07:00
sabaimran
9d62d66a77 Simplify construction of repo shorthand in GithubToJsonl 2023-06-27 15:05:03 -07:00
sabaimran
2697c7a186 Update org tests to use new method, update Github configuration in tests 2023-06-27 15:04:48 -07:00
sabaimran
227169ebde Support configuration of multiple Github repositories in the settings interface
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
2023-06-27 14:10:09 -07:00
sabaimran
37a1f15c38 Add backend support for indexing multiple repositories
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
2023-06-27 12:06:15 -07:00
Debanjum Singh Solanky
5da6a5e669 Build docker image using latest khoj from git master
- Previous state
  Ideally docker image should use latest app code available locally.
  But this is better than the previous state where the latest Docker
  image was being built using older khoj package published to pypi

  This would happen because the workflow to publish the khoj-assistant
  pypi package runs in parallel to the dockerize workflow so the latest
  khoj pypi package isn't published before the latest docker image is
  built on master

- Updated state
  Now at least the docker image published via the dockerize github
  workflow will be built using the latest khoj code on github
2023-06-26 20:16:07 -07:00
sabaimran
ddd550e6f4 Add call to use X-CSRFToken in relevant POST methods 2023-06-26 12:38:00 -07:00
sabaimran
35e24d7851 Fix null checking in state for content config API and telemetry API 2023-06-26 11:37:34 -07:00
sabaimran
5e39421f56 Merge branch 'master' of github.com:debanjum/khoj 2023-06-25 11:41:47 -07:00
sabaimran
4410a3bb4b Limit max width of the pre tag to 100% of the screen width 2023-06-25 11:41:15 -07:00
sabaimran
ffe66b848a Use a single column template for config plugins when in mobile 2023-06-25 11:27:41 -07:00
Debanjum Singh Solanky
b1890aa050 Null check intermediary objects when config not fully initialized 2023-06-24 15:34:18 -07:00
Debanjum Singh Solanky
946af0889d Improve showing status message on saving config via web interface
- Show success/failure status message much closer to the save button
  Previously status message was shown on top of the page, which wasn't
  always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
2023-06-24 00:49:57 -07:00
Debanjum Singh Solanky
40d1abfe50 Update the new /config APIs to configure Khoj for first time users
- Setup state.config and sub-components from unset state
- Setup search types with default settings
2023-06-24 00:45:30 -07:00
Debanjum Singh Solanky
05a3c81adb Add beautiful as dependency to pass pytests 2023-06-23 15:10:09 -07:00
Debanjum Singh Solanky
edabede93a Fix post configuration state update on error or success on config html 2023-06-23 14:52:25 -07:00
Debanjum
98642e01b5 Update Web Interface with Lantern Theme
- Style all pages with consistent lantern theme styling
  - Add navigation pane to all web interface pages
  - a200af68b38d0625c42e2098d171c6ddab121bd2 Keep pico.css locally for offline usage
  - cd8d069e6673b4db4c14f736c3d8af80bf94614d Highlight currently active tab in web interface
- Update config pages to use Lantern theme
2023-06-23 14:39:25 -07:00
Debanjum Singh Solanky
4744d69221 Resolve button name, anchor tag feedback. Add status message to settings page
- Use "Configure" name for settings config action
- Use more standard anchor tag instead of button
- Add configure status message
2023-06-23 09:48:38 -07:00
Debanjum Singh Solanky
26abafa658 Highlight currently active tab in web interface for orientation 2023-06-22 00:33:28 -07:00
Debanjum Singh Solanky
2728c714d7 Put pico.css in local assets. Move common css styling into khoj.css 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
20a37697de Add Khoj header with navigation pane to Search and Chat Interfaces 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
c467a0cbb0 Update UI of config sub pages to use khoj lantern theme styling 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
0ce2ec590a Update main config page on khoj server to match khoj lantern theme 2023-06-21 20:25:25 -07:00
Debanjum Singh Solanky
d30a9ddd33 Use Khoj Logo on Search, Chat pages of Web Interface 2023-06-21 12:34:53 -07:00
Debanjum Singh Solanky
6d4aad57e1 Use new Khoj Lantern Logo in Web, Emacs, Obsidian UIs and Docs 2023-06-21 01:57:22 -07:00
Debanjum Singh Solanky
69d4fa6525 Rename project links across repo from debanjum/khoj to khoj-ai/khoj 2023-06-21 00:13:21 -07:00
Debanjum Singh Solanky
5c4eb950d5 Search across all content types via khoj.el on Emacs
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.

This results in search across all enabled asymmetric search text
content types
2023-06-20 23:39:56 -07:00
Debanjum Singh Solanky
2cd3e799d3 Improve null and type checks 2023-06-20 23:30:59 -07:00
Debanjum Singh Solanky
d5fb4196de Update web interface to allow querying all content types at once 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
5c7c8d1f46 Use async/await to fix parallelization of search across content types 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
1192e49307 Pass default value matching argument types expected by text_search methods 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
0144e610d6 Only search across content types that work with asymmetric search 2023-06-20 22:21:46 -07:00
Debanjum Singh Solanky
f6a7aa6c96 Style Khoj chat on web interface with new lantern theme
- Color khoj chat message with new yellow theme color
- Update Khoj chat emoji to lantern
- Add page type to title of pages on web interface
2023-06-20 01:39:33 -07:00
Debanjum Singh Solanky
6d94d6e75a Encode the asymmetric, symmetric search queries in parallel for speed
Use timer to measure time to encode queries and total search time
2023-06-20 01:18:17 -07:00
Debanjum Singh Solanky
d292dc03b3 Use new Khoj Logotype in Web interface 2023-06-20 01:13:06 -07:00
Debanjum Singh Solanky
db07362ca3 Encode user query as same across search types to speed up query time
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
  query and pass it to the query method of each search type

TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
2023-06-19 23:29:54 -07:00
Debanjum Singh Solanky
285d17af2a Search in parallel across all enabled content types requested via API
- Update API to return content from all enabled content types when type
  is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
2023-06-19 23:29:06 -07:00
Debanjum Singh Solanky
79d325fbb6 Fix triggering @general queries in Khoj Chat 2023-06-19 23:05:33 -07:00
Debanjum Singh Solanky
e97a20d70c Set conversation type if query param set, else return chat history
Only initialize variables if query is not empty, to avoid unnecessary
compute, variable null checks etc.

Fixes #230
2023-06-19 19:59:16 -07:00
sabaimran
6224dce49d Merge pull request #228 from debanjum/features/pretty-config-page
Update the config page to be more usable
2023-06-19 18:11:35 -07:00
sabaimran
4722a2c16d Add Github configuration page and success notifications 2023-06-18 10:06:45 -07:00
sabaimran
668135c763 Merge branch 'master' of github.com:debanjum/khoj into features/pretty-config-page 2023-06-18 08:35:09 -07:00
sabaimran
81183a1fe1 Address misc PR comments and update logo in all clients
- Rename the new logo to accurately reflect its size (e.g., 128x128)
- Update the icns file for Mac
- Update nomenclature in settings pages
2023-06-18 08:34:58 -07:00
Debanjum Singh Solanky
a44cde2865 Show hint to re-index vault if wonky results in Obsidian search modal
Remove spurious indentation in Obsidian styles.css

Resolves #207
2023-06-18 04:53:51 -07:00
Debanjum Singh Solanky
595cc5b0f5 Use printer icon for PDF logs. Only split lines if file at web link in web interface 2023-06-18 02:26:03 -07:00
Debanjum
e06be395f9 Use Github REST API and Index Commit Messages off Github Repository
- Migrate to Github REST API instead of Llama Hub to index Markdown Docs in Github Repository
- Index Commit Messages from Github Repository as well
2023-06-18 14:51:32 +05:30
Debanjum Singh Solanky
e31a540a5e Get all md files recursively in repository by passing recursive param
Previously the `get_markdown_files' method was only getting files at
root of the repository

Fix, improve logger messages in github to jsonl processor
2023-06-18 01:47:15 -07:00
Debanjum Singh Solanky
6fdac24416 Set page size to 100 to reduce requests required to Github API to 1/3
- Default is 30. So number of paginated requests required to get all
  items (commits, files) will reduce by 67%

- No need to increase page size for the get tree Github API request from
  `get_markdown_files'

  The get tree Github API doesn't support pagination and returns up to 100K items
  in response. This should be way more than enough for our current
  use-cases
2023-06-18 01:44:36 -07:00
Debanjum Singh Solanky
87975e589a Fix passing auth token to Github API to increase rate limits by x85
- Previously wasn't prefixing "token" to PAT token in Auth header
  This resulted in the request being considered unauthenticated

- Unauthenticated requests to Github API are limited to 60 requests/hour
  Authenticated requests to Github API are allowed 5000 requests/hour
2023-06-18 01:19:26 -07:00
Debanjum Singh Solanky
9c70af960c Extract logic to get file content from Github into a separate method 2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky
10d4c38ce9 Extract Wait for rate limit reset logic into a function for reuse 2023-06-18 01:06:46 -07:00
sabaimran
aad7f825e0 Remove music configuration 2023-06-17 21:23:56 -07:00
sabaimran
5f97afbfac Ignore type checks from mypy in subindexed fields 2023-06-17 16:53:36 -07:00
sabaimran
c2d46de8bc Add endpoint for regenerating directly from the config page and add music content-type 2023-06-17 15:47:33 -07:00
sabaimran
ded3100caf Update the configuration page to make config management easier
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky
3f24e53b6e Render URL as link in web interface if file param of result is a web link 2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky
63ec84ad78 Store Github URL of Markdown files on Github in file jsonl param 2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky
0c1c7583b5 Handle pagination, API rate limits. Get all commits from Github repo 2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky
31d17d0b22 Index commit messages from repository with the github plugin 2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky
c29c141a7e Use Github Rest API to index Markdown files in Github Repository
The Llama_Hub Github plugin is fairly limited.

The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
2023-06-17 02:16:13 -07:00
Debanjum
9f00a366ab Add a Github plugin to index content from a Github repository
- Use the Github plugin on LlamaHub to read in markdown files from specified Github repository for indexing
- Update the desktop GUI application to take in the required parameters to read from Github
- Requires a classic PAT token for Github access
2023-06-17 12:28:47 +05:30
Saba
ac96f43b1b Remove try-catch specific to Github plugin; consolidate GUI logic 2023-06-16 23:46:25 -07:00
Saba
07ade2262a Set default value of pat_token in conftest.py to be empty string 2023-06-13 17:03:03 -07:00
Saba
751edfefe5 Add separate unit test for github. Will only run if a PAT token is set 2023-06-13 16:55:58 -07:00
Saba
3a61919344 Fix failing unit tests by hard-coding model presence of expected search types 2023-06-13 16:32:47 -07:00
Saba
019d3732de Rename orgmode_search to org_search 2023-06-13 16:06:54 -07:00
Saba
08d79f5ba4 Unify types used in Github and other text-based configs. Fix typing issues 2023-06-13 15:52:36 -07:00
Saba
a6cd96a6a9 Add a Github plugin which can be used to read from a Github repository 2023-06-13 14:40:06 -07:00
Debanjum
c68cde4803 Log clients calling API endpoints on Khoj server
- Make API endpoints on Khoj server accept `client` as request parameter
  - Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
  - Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
2023-06-09 18:36:49 +05:30
sabaimran
59fa48036f Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size
Pass truncated message as string in ChatMessage when exceeding max prompt size
2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky
139a3ba060 Update server to log new server version field to telemetry db 2023-06-08 14:14:21 +05:30
Saba
c5666e0404 Move factory dependencies to optional settings 2023-06-06 23:26:24 -07:00
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
948ba6ddca Remove unused logger 2023-06-05 19:01:03 -07:00
Saba
6212d7c2e8 Remove debug line 2023-06-05 19:00:25 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky
eb6175e9b0 Update description field in webmanifest of Khoj, Khoj Chat PWA 2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky
bb2363f324 Set client request param when calling khoj server APIs from Web 2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky
caab55fbdd Set client request param when calling khoj server APIs from Obsidian 2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky
de2494154f Set client request param when calling khoj server APIs from Emacs 2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky
168c11cea7 Make server API endpoints accept client as query param
- The chat, search and update API will accept client as request param.
- This will allow logging the client from which these APIs were called.
2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky
8617cf1389 Push telemetry to Posthog to grok Khoj usage 2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky
d13db2e666 Make old telemetry server forward requests to new server 2023-06-05 13:06:45 +05:30
Saba
5f4223efb4 Increase timeout to OpenAI call 2023-06-04 20:49:47 -07:00
Saba
0e63a90377 Fix the mechanism to retrieve the message content 2023-06-04 20:25:37 -07:00
Saba
f0efe0177e Pass truncated message as string in ChatMessage when exceeding max prompt size 2023-06-04 19:33:46 -07:00
Debanjum
f6ceb22373 Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 15:05:34 +05:30
Saba
068ee0ac5e Swap elif with else, as usage of this method does not use openai_api_key 2023-06-04 02:25:08 -07:00
Saba
6508379d7b Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky
7af8a56434 Remove filename from reference before rendering references in khoj.el
Fixes bug where the actual reference heading on the next line jumped out
of the references footnote section
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
ec280067ef Do not retrieve relevant notes when having a general chat with Khoj
- This improves latency of @general chat by avoiding unnecessary
  compute
- It also avoids passing references in API response when they haven't
  been used to generate the chat response. So interfaces don't have to
  add logic to not render them unnecessarily
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
90439a8db1 Update Khoj subtitle to AI personal assistant for your digital brain 2023-06-02 10:42:44 +05:30
Debanjum
e022910f31 Search PDF files with Khoj. Integrate with LangChain
- **Introduce Khoj to LangChain**: 
    Call GPT with LangChain for Khoj Chat
- **Search (and Chat about) PDF files with Khoj**
  - Create PDF to JSONL Processor: Convert PDF content into standardized JSONL format
  - Expose PDF search type via Khoj server API
  - Enable querying PDF files via Obsidian, Emacs and Web interfaces
2023-06-02 10:20:26 +05:30
Debanjum Singh Solanky
e9ed7a19fd Update search prompt to extract PDF search type. Fix extract_question prompt 2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky
89fbfce20a Mention PDF are also supported in Khoj Readme 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
bbe3bf9733 Render PDF search results in Khoj Obsidian interface
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
  - Ensure combined results are sorted by score across both types
- Jump to the PDF file when selecting its PDF search result from the modal
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
e3892945d4 Render PDF search results in Khoj.el Emacs interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
85144006a1 Render PDF search results in khoj web interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
d63194c3a9 Create tests for PDF to JSONL processor 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts
throwing typing error for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
1b3effd8e6 Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor 2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky
1cd9ecd449 Truncate last message if still over max supported prompt size by model 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
ed4d0f9076 Simplify argument names used in khoj openai completion functions
- Match argument names passed to khoj openai completion funcs with
  arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
703a7c89c0 Reduce retry count and request timeout for faster response or failure
- Fix bug where both LangChain and Khoj retry requests 6 times each.
  So a total of 12 requests at >1minute intervals for each chat
  response in case of OpenAI API being down

- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
  between retries way too much. This slowed down chat response times
  quite a bit when API was being flaky

- With these updates you'll know if call to chat API failed in under a
  minute
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
18081b3bc6 Use LangChain to call GPT over API 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
277d2f5c96 Do not add "Notes:" suffix to chat messages when no notes retrieved
This was causing spurious "Notes:" suffix being added to Khoj Chat in
response
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
334be4e600 Use LangChain to call OpenAI for Khoj Chat
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
  using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
  creating agents and managing their state
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity, cleanliness. Reduces bloat in GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
b484953bb3 Import app state correctly to generate embeddings with OpenAI model
Resolves #216
2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
9cfaaf0941 Update docs to configure khoj.yml for using OpenAI model for embeddings 2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
a0d0dbaca7 Fix link to Khoj Obsidian Demo video in Readmes 2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum
dc495babb3 Add Telemetry to Understand Khoj Usage
### Objective: 
Use telemetry to better understand Khoj usage.
This will motivate and prioritize work for Khoj.

Specific questions:
- Number of active deployments of khoj server
- How regularly is khoj used (hourly, daily, weekly etc)?
- How much is which feature used (chat, search)?
- Which UI interface is used most (obsidian, emacs, web ui)?

### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky
55d72231b3 Generate docker image for telemetry server using Github workflow 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
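Why co-prime intervals reduce overlap can be seen with a bit of arithmetic: two periodic jobs that start together next run at the same moment after the least common multiple of their intervals. The minute values below are illustrative assumptions, not the intervals Khoj actually schedules.

```python
from math import gcd


def next_overlap(interval_a: int, interval_b: int) -> int:
    """Minutes until two jobs that start together run at the same time again (their LCM)."""
    return interval_a * interval_b // gcd(interval_a, interval_b)


# Intervals sharing a factor collide often; co-prime intervals collide rarely.
print(next_overlap(60, 30))  # 60
print(next_overlap(60, 61))  # 3660
```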
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
134cce9d32 Batch upload telemetry data at regular interval instead of while querying 2023-05-17 16:08:21 +05:30
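A minimal sketch of batching telemetry instead of uploading while querying. The `TelemetryBatcher` class and its `upload` callback are hypothetical names for illustration; the commit does not show Khoj's actual implementation.

```python
import time
from typing import Callable, Dict, List, Optional


class TelemetryBatcher:
    """Buffer telemetry events and upload them in one batch per interval (hypothetical helper)."""

    def __init__(self, upload: Callable[[List[Dict]], None], interval_seconds: float = 3600.0):
        self.upload = upload
        self.interval_seconds = interval_seconds
        self.buffer: List[Dict] = []
        self.last_upload = time.monotonic()

    def log(self, event: Dict) -> None:
        # Logging only appends to memory, so API queries don't wait on network calls.
        self.buffer.append(event)

    def maybe_flush(self, now: Optional[float] = None) -> bool:
        """Upload buffered events if the interval has elapsed. Returns True if uploaded."""
        now = time.monotonic() if now is None else now
        if not self.buffer or now - self.last_upload < self.interval_seconds:
            return False
        batch, self.buffer = self.buffer, []
        self.upload(batch)
        self.last_upload = now
        return True
```

A scheduler would call `maybe_flush()` periodically, keeping the upload cost off the /search, /chat and /update request path.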
Debanjum Singh Solanky
3ede919c66 Log usage of /search, /chat, /update API endpoints to telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
f2e89f6f46 Add khoj app helper methods to log app usage to a telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
9ca61d62ff Enable/disable logging telemetry by setting bool in khoj.yml config
Usage telemetry is logged by default, unless explicitly disabled in
khoj.yml
2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky
131b8407b5 Allow Khoj Chat to respond to general queries not in reference notes
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes available or
  2. when explicitly induced by prefixing the chat message with "@general"

- Previously Khoj Chat would often refuse to respond to
  general queries not answerable from reference notes or chat history

- Make chat quality tests more robust
  - Add more equivalent chat response options refusing to answer
  - Force haiku writing to not give any preamble, just the haiku
2023-05-12 18:42:40 +08:00
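The two conditions for a general response described above can be sketched as a small routing function. `choose_chat_mode` and its return values are hypothetical; Khoj may instead express this logic inside its prompts.

```python
from typing import List, Tuple


def choose_chat_mode(message: str, references: List[str]) -> Tuple[str, str]:
    """Return (mode, cleaned_message) for a chat message (hypothetical helper)."""
    prefix = "@general"
    if message.startswith(prefix):
        # Explicitly induced: strip the prefix and answer from general knowledge.
        return "general", message[len(prefix):].strip()
    if not references:
        # No relevant notes retrieved: answer generally instead of refusing.
        return "general", message
    return "notes", message
```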
Debanjum Singh Solanky
cc75f986b2 Test text search index only updates on changes to text content 2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky
f9ccce430e Allow configuring OpenAI chat model for Khoj chat
- Simplifies switching between different OpenAI chat models, e.g. GPT-4
- It was previously hard-coded to use gpt-3.5-turbo. Now it only
  defaults to gpt-3.5-turbo, unless the chat-model field under the
  conversation processor is updated in khoj.yml
2023-05-03 23:01:13 +08:00
Debanjum
f0253e2cbb Include Filename, Entry Heading in All Compiled Entries to Improve Search Context
Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context

- Set filename as top heading in compiled org, markdown entries
  - Note: *Khoj was already indexing filenames in compiled markdown entries, but they weren't set as top level headings; they were appended as bare text*. The updated structure should give the search models more structured context for judging relevance
- Set entry heading as heading for compiled org, md entries, even if split by max tokens
- Snip prepended heading to avoid crossing model max_token limits
- Entries with no md headings should not get heading prefix prepended
2023-05-03 22:59:30 +08:00
Debanjum Singh Solanky
6b535cc345 Snip prepended heading to avoid crossing model max_token limits
Otherwise, if the heading > max_tokens, then the search models will just see
a heading (with repeated filename) for each compiled entry and not the
actual content.

100 characters should be sufficient to include the filename (not path) and
entry heading. If longer, rather truncate it so the entry's unique text is
still passed to the model for search context
2023-05-03 22:53:13 +08:00
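A minimal sketch of snipping the prepended heading to 100 characters so the entry text still reaches the model. `prepend_heading` is a hypothetical name; only the 100-character limit comes from the commit above.

```python
MAX_HEADING_CHARS = 100  # Limit chosen in the commit above


def prepend_heading(heading: str, entry_text: str, max_chars: int = MAX_HEADING_CHARS) -> str:
    """Prefix an entry with its heading, truncated so it can't crowd out the entry text."""
    snipped = heading[:max_chars]
    return f"{snipped}\n{entry_text}"
```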
Debanjum Singh Solanky
02aeee60aa Set filename as top heading of org entries for better search context
Previously filename was only being appended to markdown entries.

Test filename getting prepended to compiled entry as heading
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
94825a70b9 Set heading of md entries to improve search context for long entries
Otherwise if a markdown entry is longer than max_tokens, the split
entries (apart from first one) do not get their heading context set
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
5de04621b5 Set filename as top heading of md entries for better search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context

Test filename getting prepended as heading to compiled entry
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
0e3fb59e09 Entries with no md headings should not get heading prefix prepended
Entries from files with no headings would previously get prefixed
with a markdown heading prefix (#)
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
45a991d75c Prepend entry heading to all compiled org snippets to improve search context
Compiled snippets split by max tokens (apart from the first) did not
get the heading as context.

This limited the search context available to retrieve these continuation
entries
2023-05-03 22:50:31 +08:00
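The fix above, giving every continuation chunk its heading, can be sketched as follows. This assumes whitespace-separated words stand in for model tokens and that the heading is not counted against `max_tokens`; `split_with_heading` is a hypothetical helper, not Khoj's code.

```python
from typing import List


def split_with_heading(heading: str, body: str, max_tokens: int) -> List[str]:
    """Split body into chunks of at most max_tokens words, prefixing every chunk with heading.

    Words approximate tokens here; a real model tokenizer would count differently.
    """
    words = body.split(" ")
    chunks = []
    for start in range(0, len(words), max_tokens):
        chunk = " ".join(words[start:start + max_tokens])
        chunks.append(f"{heading}\n{chunk}")
    return chunks
```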
Debanjum Singh Solanky
3386cc92b5 Fix khoj server config update in khoj.el by unquoting list to cl-push to
- cl-push expects a generalized variable. Otherwise it throws a (setf quote)
  undefined warning
- This results in the config call failing when calling the khoj entrypoint
2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky
948a4274e4 Fix documentation strings and simplify not null checks 2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky
731ef5688f Use cl-pushnew to fix byte-compile errors with using add-to-list 2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky
f046523b33 Improve khoj.el messages to convey state of khoj server
- Remove the "waiting for server" message as it hides the messages from the
  server
- Fix the nil messages that were being rendered, by checking messages from
  the server before showing them
- Consistently prefix messages from khoj with khoj.el
2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky
76df393eb5 Only call khoj server configure API from khoj.el when config updated
Previously khoj.el was calling the server configure API even when the
config was unchanged.
This had broken the khoj search-as-you-type experience from Emacs

Also show more details to user about what in khoj is being configured
2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
ceae06ae9d Fix khoj.el compilation warnings around unused variables 2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
8269adf849 Refactor khoj-setup in khoj.el for readability. No functional change 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
865d12b6f2 Fix escaping quote in chat references to prevent it breaking out of html 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
26cb878327 Add Yarn lockfile for Khoj Obsidian 2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky
e3180d63e6 Sync Khoj Obsidian Tagline with Khoj tagline 2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky
62e6e09521 Release Khoj version 0.6.1 2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky
b079fb31bc Replace Windows path separators in indexName configured via Khoj Obsidian
Resolves #185, #199

- Issue
  IndexName created from Obsidian Absolute Vault path wasn't replacing
  windows path, drive separators with underscore. It was only
  replacing unix path separators

- Fix
  Also replace windows drive and path separators with _ while creating
  IndexName in Khoj Obsidian plugin
2023-04-17 16:55:33 +07:00
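The separator replacement described in the fix can be sketched in Python for illustration; the Obsidian plugin itself is TypeScript, and `to_index_name` is a hypothetical name rather than the plugin's function.

```python
import re


def to_index_name(vault_path: str) -> str:
    """Replace Unix (/) and Windows (\\) path separators and drive (:) separators with _."""
    return re.sub(r"[\\/:]", "_", vault_path)
```

For example, `C:\Users\me\Vault` becomes `C__Users_me_Vault`, which is safe to use as a file or index name on every platform.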
Debanjum Singh Solanky
d90df966a9 Make khoj logger use utf-8 encoding when writing to khoj log file
Resolve logger error issue mentioned in #199
2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky
dc3f399f91 Fix to get score associated with SearchResponse in result as string 2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky
d5000c63e1 Update Readmes to use python -m pip install khoj-assistant
Makes it easier to tell which Python installation pip is associated with.
Easier to debug when users have different versions of Python
installed (e.g. 3.10 and 3.11)
2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky
453c84ab79 Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes 2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky
35aa06067f Release Khoj version 0.6.0
Upload styles.css via release workflow
2023-03-31 18:13:16 +07:00
Debanjum
8f4e5d3d83 Improve Styling of Khoj Search Modal on Obsidian and Indexing of Markdown
Merge pull request #198 from debanjum/improve-khoj-search-for-markdown-obsidian

### Overview
- Copied Khoj Search Modal styling from Jim Prince's PR #135 with minor improvements
- Implements improvements to the Khoj Search in Markdown/Obsidian suggested by folks. Specifically:
  - #133
  - #134
  - #142

### Changes
- 5673bd5 Keep original formatting in compiled text entry strings
- a2ab68a Include filename of markdown entries for search indexing
- 6712996 Create Note with Query as title from within Khoj Search Modal
- d3257cb Style the search result. Use Obsidian theme colors and font-size
- 4009148 For each result: snip it by lines, show filename, remove frontmatter
2023-03-30 14:15:23 +07:00
Debanjum Singh Solanky
5673bd5b96 Keep original formatting in compiled text entry strings
- Explicitly split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
  No point in dropping the formatting unnecessarily,
  even if (say) the current search models don't account for it (yet)
2023-03-30 14:02:46 +07:00
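The difference between splitting on any whitespace and splitting explicitly on a space is what preserves the formatting here. The sample text below is made up for illustration.

```python
text = "* Heading\n- item one\n- item two"

# str.split() with no argument splits on any whitespace run, so newlines are lost.
print(" ".join(text.split()))  # * Heading - item one - item two
# Splitting on a literal space keeps newlines inside the pieces, so joining restores the text.
print(" ".join(text.split(" ")) == text)  # True
```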
Debanjum Singh Solanky
a2ab68a7a2 Include filename of markdown entries for search indexing
Append originating filename to compiled string of each entry for
better search quality by providing more context to model

Update markdown_to_jsonl tests to ensure the filename is added

Resolves #142
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
67129964a7 Create Note with Query as title from within Khoj Search Modal
This follows the expected behavior of Obsidian search modals,
e.g. Omnisearch and the default Obsidian search.

The note creation code is borrowed from Omnisearch.

Resolves #133
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
d3257cb24e Style the search result. Use Obsidian theme colors and font-size
Based on PR #135
2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky
40091489c0 For each result: snip it by lines, show filename, remove frontmatter
Based on PR #135
Resolves #134
2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky
240db7b4f0 Add screenshot of Khoj chat on Obsidian to Readme. Fix links 2023-03-30 02:49:05 +07:00
Debanjum Singh Solanky
234be96e53 Fix processor key used to configure chat model in khoj obsidian 2023-03-30 01:47:09 +07:00
Debanjum
53d421f9c6 Create Chat Modal for Obsidian Plugin
Merge pull request #196 from debanjum/create-chat-modal-for-obsidian

- Set your OpenAI API key in the Khoj Obsidian Settings
- Use Modal in Obsidian for Chat
- Style Chat Modal combining the Khoj Web interface and Obsidian theme style
2023-03-30 01:37:07 +07:00
Debanjum Singh Solanky
c8c0cfd10e Add Chat features, setup and usage to Khoj Obsidian plugin Readme 2023-03-30 00:32:24 +07:00
Debanjum Singh Solanky
7ecae224e7 Configure OpenAI API Key from the Khoj plugin setting in Obsidian 2023-03-29 23:54:08 +07:00
Debanjum Singh Solanky
3d616c8d65 Use Obsidian font sizes. Improve input field, reference indexing
- Give the input field more space. It was too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
2023-03-29 22:13:55 +07:00
Debanjum Singh Solanky
23bd737f6b Use chat input element to send message on Enter. No send button required 2023-03-29 22:13:30 +07:00
Debanjum Singh Solanky
81e98c3079 Scroll to bottom of modal on open and message send 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
59ff1ae27f Use obsidian theme colors for bg, text. Restrict css namespace via prefix 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
001ac7b5eb Style Obsidian Chat Modal like Khoj Chat Web Interface
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
  - Modify it to work under an Obsidian modal
  - So replace html, body styling from web interface to instead
    styling new "khoj-chat" class attached to contentEl of modal
2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
112f388ada Render references next to chat responses by khoj in chat modal 2023-03-28 18:11:03 +07:00
Debanjum Singh Solanky
1d3d949962 Render conversation logs on page load 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
cd46a17e5f Add Khoj Chat Modal, Command in Khoj Obsidian to Chat using API 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
c0972e09e6 Rename KhojModal to KhojSearchModal, a more specific name for it
In preparation to introduce Khoj chat in Obsidian
2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
64fff1d372 Release Khoj version 0.5.0 2023-03-28 03:35:59 +07:00
Debanjum Singh Solanky
7478d08803 Update main readme to mention chat features 2023-03-27 22:02:53 +07:00
Debanjum Singh Solanky
fc218508f9 Update khoj.el docs and Emacs Readme for chat, simplified setup 2023-03-27 22:02:47 +07:00
Debanjum
87090531da Install, Start and Configure Khoj Server from Emacs
Merge pull request #193 from debanjum/simplify-khoj-server-setup-on-emacs

## Major Changes
- ae535a0 Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs
- 82eb4bf Setup Khoj server on opening khoj.el
- 99d19dc Start Khoj server from Emacs using khoj.el
- c92d791 Install Khoj server from Emacs using khoj.el
  *This assumes you have python (<3.11) and pip installed in a system path*

### Sample Config
- Enable Khoj Chat by configuring your OpenAI API Key
- Specify Org Files, Directories to Index for Search (and Chat)
  By default, your org-agenda-files (including archive files) are indexed
- Invoke khoj by calling `C-c s`

``` emacs-lisp
(use-package khoj
  :after org
  :straight (khoj
             :type git
             :host github
             :repo "debanjum/khoj"
             :files ("src/interface/emacs/khoj.el"))
  :bind ("C-c s" . 'khoj)
  :config (setq
           khoj-openai-api-key "<YOUR_OPENAI_API_KEY_FOR_KHOJ_CHAT>"
           khoj-org-directories '("~/docs/notes" "~/docs/journals")
           khoj-org-files '("~/docs/tasks.org" "~/docs/journal.org" "~/docs/archive.org")))
```
2023-03-27 18:49:43 +07:00
Debanjum Singh Solanky
83a7ccd729 Fix docstrings and method ordering in khoj.el 2023-03-27 18:33:09 +07:00
Debanjum Singh Solanky
5c2327ee4f Configure org directories to index from khoj.el
Converts paths to glob patterns that will index all org files
recursively under the specified list of paths

Should help set up khoj.el for org-roam users
2023-03-27 18:30:53 +07:00
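The directory-to-glob conversion can be sketched in Python for illustration; khoj.el performs it in Emacs Lisp, and `directories_to_globs` is a hypothetical name.

```python
from typing import List


def directories_to_globs(directories: List[str]) -> List[str]:
    """Turn each directory into a recursive glob matching every .org file beneath it."""
    return [f"{directory.rstrip('/')}/**/*.org" for directory in directories]
```

Python's own `glob.glob(os.path.expanduser(pattern), recursive=True)` would expand such a pattern, because `**` only matches nested directories when `recursive=True` is passed.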
Debanjum Singh Solanky
6e8a40906d Allow disabling automatic server setup. Fix server start vs ready logic
- khoj-auto-setup controls whether to automatically check for and
  set up the khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
  method. Allows calling khoj-setup during package load via init.el

- Fix: Do not attempt to configure or wait for server ready if
  user has said no to auto-setup request
- Fix logic to mark server started vs ready
  - Previously the started/running vs ready variables defs were getting
    intertwined
  - Server started indicates server bootup has been triggered
  - Server ready indicates server API ready to accept requests
2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky
526a927bce Fix org entry extraction test, variable prefixed with khoj in khoj.el
Discovered via failing build and test workflows on Github
2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky
7243059507 Track index update asynchronously via moon phase progressbar in khoj.el 2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky
8a9055f918 Restrict server messages shown in echo area to main server files 2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky
ae535a06eb Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs 2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky
36b17d4ae0 Generalize the directory from config extraction elisp method 2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky
924424c754 Throw actionable exceptions when content types or chat not configured 2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky
359a2cacef Fix khoj--server-running to work with unconfigured or external server
- If the khoj server was started outside Emacs, khoj--server-ready should be set
to true by the khoj--server-running method (instead of waiting for a proc msg)

- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky
d7fb9a596e Auto configure server before loading khoj-menu
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done

As easier to know when server ready to configure
2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky
8a21aff438 Make khoj.el server start, stop, restart, setup methods interactive
No need to erase temporary buffers before working on them
2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky
cb40a96c85 Index configured org files from khoj.el
- Set `khoj-org-files-index' to list of files to index
- Defaults to indexing org-agenda-files
- Uses khoj server api to configure org files to index
2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky
50760acc37 Wait for Khoj server to get ready before opening khoj.el transient menu
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
  Until then khoj features wouldn't work anyway, so avoids confusion
2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky
82eb4bfd0d Setup Khoj server on opening khoj from within Emacs
- Create helper methods to check, stop, restart, setup khoj server
- (Ask to) setup khoj server on calling khoj main entrypoint function
2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky
99d19dcf43 Start Khoj server from Emacs using khoj.el 2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky
c92d79118a Install Khoj server from Emacs using khoj.el 2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky
e281a498b4 Style Khoj search org buffer via elisp instead of in-buffer settings 2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky
4f655d20ae Style Khoj chat directly via elisp instead of via in-buffer settings 2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky
f6ff7b1beb Render footnote reference links as superscript for Khoj Chat on Emacs 2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky
285a2b86d2 Use aiohttp version 3.8.4 as 4.x breaks docker image build 2023-03-26 05:33:02 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
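Retrying with exponential backoff can be sketched with only the standard library. This is not the tenacity API the commit uses: `retry_with_backoff` and its parameters are hypothetical, and the injectable `sleep` exists only to make the sketch testable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on retry_on exceptions with exponential backoff and random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: re-raise the last error.
            # The upper bound doubles per attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With tenacity itself, the equivalent is its `retry` decorator configured with `wait_random_exponential` and `stop_after_attempt`.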
Debanjum
0aebf624fc Improve Khoj Chat in Emacs, Server
Merge pull request #192 from debanjum/improvements-to-khoj-chat-in-emacs

### Khoj Chat on Emacs Improvements
- d78454d Load Khoj Chat buffer before asking for query to provide context
- 93e2aff Use org footnotes to add references, allows jump to def on click
- 5e9558d Stylize reference links as superscripts and show definition on hover
- bc71c19 Use `m` or `C-x m` in-buffer keybindings to send messages to Khoj

### Khoj Chat Server Improvements
- 27217a3 Time chat API sub-components for performance analysis
- 508b217 Update Chat API, Logs, Interfaces to store, use references as list
- d4b3866 Truncate message logs to below max supported prompt size by chat model
- cf28f10 Register separate timestamps for user query and response by Khoj Chat
2023-03-25 05:49:27 +07:00
Debanjum Singh Solanky
ff846f05c5 Clean-up khoj.el based on linting helpers and manual review 2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
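A minimal sketch of truncating the message log to fit a prompt budget by dropping the oldest messages first. The commit counts tokens with tiktoken; to stay dependency-free, this sketch assumes a pluggable `count_tokens` that defaults to a rough word count, and `truncate_messages` is a hypothetical name.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def truncate_messages(
    messages: List[Message],
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[Message]:
    """Drop the oldest messages until the rest fit in max_prompt_tokens.

    messages is ordered oldest to newest; the newest message is always kept.
    """
    kept = list(messages)
    total = sum(count_tokens(m["content"]) for m in kept)
    while len(kept) > 1 and total > max_prompt_tokens:
        total -= count_tokens(kept.pop(0)["content"])
    return kept
```

With tiktoken installed, a closer count comes from `encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")` and passing `lambda text: len(encoding.encode(text))` as `count_tokens`.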
Debanjum Singh Solanky
4725416fbd Use shortcut keybindings in buffer to ease sending messages to Khoj 2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky
508b2176b7 Update Chat API, Logs, Interfaces to store, use references as list
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
b08745b541 Keep chat messages at 1 empty line visible distance in khoj.el
- Clean redundant concat, format string
- Improve variable name to emojified sender
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
27217a330d Time chat API sub-components for performance analysis
Time the search query extraction, search and response generation
components
2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky
5e9558d39d Stylize references shown as footnote links in chat messages
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
  - Add continuation suffix, ..., when reference definition truncated
2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky
cf28f104c7 Register separate timestamps for user query and response by Khoj Chat 2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky
93e2aff786 Add references as org footnotes instead of links 2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky
d78454d4ad Load Khoj Chat buffer before asking for query to provide context 2023-03-24 13:43:46 +07:00
Debanjum
4070d13a96 Create Khoj Chat Interface in Emacs
Merge pull request #191 from debanjum/create-chat-interface-on-emacs

- Render conversation history in a read-only org-mode buffer for Khoj Chat
- Add `chat` as a transient action in the Khoj transient menu
- Style chat messages as org-mode entries
  - Put received date in property drawer and keep it hidden/folded by default
  - Add Khoj chat response as child entry of the users associated question org entry
    This allows folding back-n-forth between user and Khoj for easier viewing
  - Render source notes snippets used as references for response as org-mode links
    Hovering mouse on link or opening links shows reference note snippets used
2023-03-22 16:32:40 -06:00
Debanjum Singh Solanky
863933daaa Resolve build issues found by melpazoid 2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky
e9ca04af0d Require dash, org to run ERT tests for khoj.el 2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky
06df394d6c Style chat messages as org-mode entries in Emacs
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
  - Improves color coding for readability
  - Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky
364e6c11af Render chat history from API in chat buffer on first run
- Generalize the render-chat-response method to handle rendering
  history or chat response from the chat API response

- Trigger rendering of khoj chat history if Khoj chat buffer not
  created for this session yet
2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky
36b52fdd0a Properly escape reference links before rendering
- Use org-insert-link method to improve link rendering robustness
  The previous simple mechanism to create org-links would result in links
  escaping out of formatting. Use a user-facing org-mode method to
  remove/reduce the probability of this

- Replace newlines with space to render reference notes as links
2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky
72f63a6ef7 Add basic chat interface for Khoj on Emacs
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as an org-mode list in format:
  - [sender-name]: *[message]*
    - /[receive-date]/
- Add references as org links with context visible on hover,
  but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
  Use `-map-indexed' method from dash
2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky
e4d67694e1 Add search to method, variable names meant for khoj search in khoj.el
In preparation to introduce Khoj chat in Emacs
2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky
98e5ea4940 Fix name of default encoder to replace in multi-lingual model setup docs 2023-03-21 20:38:17 -06:00
Debanjum Singh Solanky
2f6284872d Mention Khoj needs Python version 3.10 or lower in docs 2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky
a9b81975f2 Fix encoder model name to configure multilingual search in Readme
See comment in issue #98 for stale model name comment
2023-03-19 17:27:53 -06:00
Debanjum
b351cfb8a0 Add Search Actor to Improve Querying Notes for Khoj Chat
Merge pull request #189 from debanjum/add-search-actor-to-improve-notes-lookup-for-chat

### Introduce Search Actor
Search actor infers Search Queries from user's message
- Capabilities
  - Use previous messages to add context to current search queries[^1]
    This improves quality of responses in multi-turn conversations. 
  - Deconstruct users message into multiple search queries to lookup notes[^2]
  - Use relative date awareness to add date filters to search queries[^3]

- Chat Director now does the following:
  1. [*NEW*] Use Search Actor to generate search queries from user's message
  2. Retrieve relevant notes from Knowledge Base using the Search queries
  3. Pass retrieved relevant notes to Chat Actor to respond to user

### Add Chat Quality Tests 
- Test Search Actor capabilities
- Mark Chat Director Tests for Relative Date, Multiple Search Queries as Expected Pass

### Give More Search Results as Context to Chat Actor
- Loosen search results score threshold to work better for searches with date filters
- Pass more search results (up to 5 from 2) as context to Chat Actor to improve inference

[^1]: Multi-Turn Example
Q: "When did I go to Mars?"
Search: "When did I go to Mars?"
A: "You went to Mars in the future"
Q: "How was that experience?"
Search: "How my Mars experience?"
*This gives better context for the Chat actor to respond* 
[^2]: Deconstruct Example: 
Is Alpha older than Beta? => What is Alpha's age? & When was Beta born?

[^3]: Date Example: 
Convert user messages containing relative dates like last month, yesterday to date filters on specific dates like dt>="2023-03-01"
2023-03-18 18:02:12 -06:00
Debanjum Singh Solanky
601ff2541b Revert to using GPT to extract search queries from users message
- Reasons:
  - GPT can extract date aware search queries with date filters
    better than ChatGPT given the same prompt.
  - Need quality more than cost savings for now.
  - Need to figure ways to improve prompt for ChatGPT before using it
2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky
e28526bbc9 Extract search queries from users message using ChatGPT as Search Actor
- Reasons
  - ChatGPT should be better at following instructions than GPT
  - At 1/10th the cost, it's much cheaper than using older GPT models
2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky
939d7731da Fix-up Search Actor GPT's response for decoding it as valid JSON 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f63fd0995e Pass more search results as context to Chat Actor to improve inference 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
10836dedee Search should return user message if GPT response is not valid JSON
Previously this would throw if the GPT response was not valid JSON. Better to
return the original message to use for search instead
2023-03-18 16:30:55 -06:00
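The fallback can be sketched as follows, assuming the Search Actor is asked to reply with a JSON list of query strings. That response format and the name `parse_search_queries` are assumptions for illustration.

```python
import json
from typing import List


def parse_search_queries(gpt_response: str, user_message: str) -> List[str]:
    """Return the queries in gpt_response, or [user_message] if it is not usable JSON."""
    try:
        queries = json.loads(gpt_response)
    except json.JSONDecodeError:
        # Not valid JSON: search with the user's original message instead of throwing.
        return [user_message]
    # Also fall back if the JSON is not a non-empty list of strings.
    if not isinstance(queries, list) or not queries or not all(isinstance(q, str) for q in queries):
        return [user_message]
    return queries
```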
Debanjum Singh Solanky
08f5fb315f Add answers to context for Search Actor to generate relevant queries
Update Search Actor prompt with answers, more precise primer and
two more examples for context

Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f09bdd515b Expect Chat Director can extract relative dates using new Search Actor 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
36c7389b46 Test Search Actor generating search query from Chat History 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
2600cc9d4d Test Search Actor extracting relative dates & multiple questions 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
45cb510421 Loosen search results score threshold used by chat for more context 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
d871e04a81 Use past user messages, inferred questions as context to extract questions
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
  extract complete questions
- This should improve search results quality

- Example Expected Inferred Questions from User Message using History:
  1. "What is the name of Arun's daughter?"
    => "What is the name of Arun's daughter"
  2. "Where does she study?" =>
    => "Where does Arun's daughter study?" OR
    => "Where does Arun's daughter, Reena study?"
2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky
1a5d1130f4 Generate search queries from message to answer users chat questions
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
   E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
   Answer time range based questions
   Limit search to specified timeframe in question using date filter
   E.g "What national parks did I visit last year?" adds
   dt>="2022-01-01" dt<"2023-01-01" to Khoj search

Note: Temperature set to 0. Message to search queries should be deterministic
2023-03-18 16:28:51 -06:00
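In Khoj the Search Actor model produces these date filters. The deterministic sketch below only illustrates the filter expected for "last year", reproducing the example above; `last_year_filter` is a hypothetical helper.

```python
from datetime import date


def last_year_filter(today: date) -> str:
    """Build a date filter covering the calendar year before `today`."""
    start = date(today.year - 1, 1, 1)
    end = date(today.year, 1, 1)
    return f'dt>="{start.isoformat()}" dt<"{end.isoformat()}"'


print(last_year_filter(date(2023, 3, 18)))  # dt>="2022-01-01" dt<"2023-01-01"
```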
Debanjum Singh Solanky
d0f14d3f85 Test usage of = in date filter queries 2023-03-16 14:52:59 -06:00
Debanjum Singh Solanky
dfb277ee37 Set skipif at module level if OpenAI API key not set for chat tests
- Remove stale message_to_prompt test
  It is too broad, reduces maintainability.
  Remove as it doesn't really need its own test right now
- Setting skipif at module level for chat actor, director tests
  reduces code duplication, as earlier a decorator was used on each chat
  test
2023-03-16 12:23:52 -06:00
Debanjum
e75e13d788 Create Tests to Measure Chat Quality, Capabilities
Create Rubric to Test Chat Quality and Capabilities

### Issues
- Previously, whether changes improved the quality of Khoj Chat was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities

### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
   - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
   - Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
   - Measure quality at 2 separate layers
     - **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
     - **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, searches through the user-provided knowledge base (i.e. notes, ledger, images) etc. to respond appropriately to the user's message. This is what the `/api/chat` API exposes.
   - Mark desired but not currently available capabilities as expected to fail <br />
     This still allows measuring the chat capability score/percentage, while only failing, on any changes to chat, the capability tests that were passing before
2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky
4e15b4e411 Create test notes dataset for chat testing
Combine hand-written custom notes and PG essays with personal
content to bulk up notes count

Delete old documentation markdown, as it is not a representative dataset for the
application (which is more tuned for personal notes)
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
1b4d562700 Test Chat Director Capabilities: Answer from notes, chat history etc
- Chat directors are broad agents.
  - Chat directors orchestrate narrow actor agents to synthesize
    final response for the user
  - Agents are Prompts + ML Model

- Test Chat Director Capabilities
  1. [X] Answer from retrieved notes
  2. [X] Answer from chat history
  3. [X] Answer general questions
  4. [X] Carry out multi-turn conversation
  5. [X] Say don't know when answer not in provided context
  6. [X] Answers that require current date awareness
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  7. [X] Date-aware aggregation across multiple different notes
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  8. [X] Ask clarification questions if no unambiguous answer in provided context
  9. [X] Retrieve answer from chat history beyond lookback window
     This test is expected to fail as the chat director is not capable
     of searching chat history yet. But the test allows assessing chat quality
 10. [X] Retrieve context for answer using multiple independent
         searches on knowledge base
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
b6d63137f1 Setup Pytest fixture for conversation processor to test chat API
- Index markdown test data as knowledge base, as it is easier to get good
  markdown content (vs org)
- Setup markdown_content_config, processor_config and chat_client to
  test chat API
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
3f719c9e17 Rename Chat Model+Prompt tests to chat actor tests 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
7526a50dd4 Extract conversation processor utility funcs from gpt.py into utils.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
7c4d546039 Configure tests to mark chat quality tests & filter unhelpful warnings
- Mark chat quality tests, register custom mark for chat quality
- Filter unhelpful deprecation warnings from within dateparser library
- Error if tests use unregistered marks
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
c1128a1ad8 Test Chat Actor Capabilities; ability to answer from notes, chat logs etc
- Chat actors are narrow agents (prompt + ML model)
  Chat actors are different from the Chat director, who orchestrates
  the narrow actor agents to synthesize the final response to the user

- Test Chat Actor Capabilities
  1. Answer from retrieved notes
  2. Answer from chat history
  3. Answer general questions
  4. Carry out multi-turn conversation
  5. Say don't know when answer not in provided context
  6. Answers that require current date awareness
  7. Date-aware aggregation across multiple different notes
  8. Ask clarification questions if no unambiguous answer in provided context
     This test is expected to fail as the chat is not capable of doing
     this consistently yet. But having the test allows assessing chat quality

- Use OpenAI API key from OPENAI_API_KEY environment variable
- Gitignore .env file, python virtualenv directory
  Put OpenAI API key in .env file to run chatbot tests via VS Code
  The .env file is the default location for importing env vars
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
9306cd901a Clean up chat tests to work with updated chat methods in gpt.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
24ddebf3ce Make converse prompt more precise. Fix default arg vals in gpt methods
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
  answering
- Make GPT more reliably look at past conversations when
  forming its response
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
8609e3129e Fix, improve displaying chat messages, sources by Khoj in web interface
Pretty print JSON in conversation logs
2023-03-14 11:24:47 -06:00
Debanjum
6c0e82b2d6 Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
  - Previously was asking GPT to summarize the notes
  - Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
  - Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
  - Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now

## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
  - Allows Khoj to say don't know if it can't find answer to query from notes
  - Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky
cccd225247 Deduplicate and simplify logic to render chat message with reference 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
b9caad458e Type score_threshold with union, not |, to support python <3.10 2023-03-10 18:58:11 -06:00
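The reason behind this change can be checked directly: annotations are evaluated when a function is defined, and the `X | None` syntax (PEP 604) only works from Python 3.10. A small sketch; the `search` signature below is hypothetical, not Khoj's.

```python
import sys
from typing import Optional, Union


def pep604_unions_supported() -> bool:
    """Return True if `X | None` can be evaluated on this interpreter."""
    try:
        float | None  # TypeError on Python < 3.10: `type | NoneType` is undefined there
        return True
    except TypeError:
        return False


# Hypothetical search signature. Annotations are evaluated when the function is
# defined, so `score_threshold: float | None = None` would raise TypeError at import
# time on Python < 3.10 (unless `from __future__ import annotations` is used).
def search(query: str, score_threshold: Union[float, None] = None) -> str:
    return f"{query} (threshold={score_threshold})"


print(pep604_unions_supported() == (sys.version_info >= (3, 10)))  # True
print(Union[float, None] == Optional[float])  # True
```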
Debanjum Singh Solanky
198d9af8cf Update Readme to reflect Khoj Chat out of Beta 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
a71f168273 Move the chat API out of beta. Save chat sessions at 15min intervals 2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky
bcc0bed9db Upgrade bump_version script to handle release and post-release commit
- Updates version in khoj.el and Obsidian manifest, package, versions
  json files under interface and project root
- Create and tag release commit with updated files
- Creates commit with post-release version upgrade in files
- Use flags to specify whether to create a release or post-release commit
2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
8bb8824d0c Bump khoj versions in obsidian, emacs files 2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
e16d0b6d7e Open reference notes used for chat on mobile too (by clicking)
Requires clicking the reference as hover doesn't work on mobile
2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky
c3c7b8a951 Make Khoj chat a separate Progressive Web App (PWA) for easier access 2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky
3838f9d8e3 Remove explicitly asking GPT to say I don't know in prompt for now
GPT still mostly says I don't know when answer not in notes or chats

But with this it's more inclined to answer general questions not in
chats or notes, while informing the user that the information is not from
existing chats or notes
2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky
f7b8cdd02e Log prompts being passed to GPT for debugging 2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky
2739a492b4 Log message metadata along with Khoj message instead of user message
References should be attached to Khoj's chat message rather than the
user's message in the chat interface
2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
87d1e1341d Show reference notes used as response context in chat interface 2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
280061e1fa Do not deduplicate search results used for chat context
- Chat uses the compiled form of search results, not the raw entries, to
  provide context for chat. The compiled search result snippets
  themselves are unique, and using multiple of them for context from
  the same raw note is fine if they cross the score and rank thresholds

  This should improve the context provided for chat

- Also apply score_threshold, no deduplication to the answers API
2023-03-06 23:51:31 -06:00
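A sketch of selecting chat context from compiled search results without deduplication, assuming each hit carries the id of its raw entry and a score where higher means more relevant. `SearchHit` and `chat_context` are illustrative names, not Khoj's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:  # hypothetical result type for this sketch
    raw_entry_id: int  # source note the snippet was compiled from
    compiled: str      # compiled snippet text that is passed to GPT
    score: float       # assumed: higher means more relevant


def chat_context(hits: List[SearchHit], score_threshold: float, top_k: int) -> List[str]:
    """Compiled snippets that clear both the rank (top_k) and score thresholds.

    Hits are deliberately *not* deduplicated by raw_entry_id, so several
    snippets compiled from the same note can all be used as context.
    """
    ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)[:top_k]
    return [hit.compiled for hit in ranked if hit.score >= score_threshold]


hits = [
    SearchHit(1, "Trip: booked flights", 0.9),
    SearchHit(1, "Trip: packing list", 0.8),
    SearchHit(2, "Groceries", 0.2),
    SearchHit(3, "Visa checklist", 0.7),
]
print(chat_context(hits, score_threshold=0.5, top_k=3))
# ['Trip: booked flights', 'Trip: packing list', 'Visa checklist']
```

Both snippets from note 1 survive, which is the behavior the commit argues for.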
Debanjum Singh Solanky
672f61529e Make getting deduped search results configurable via Search API 2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky
4fb628975c Fix jumping to note from Khoj Obsidian search modal result on Windows
- Issue
  The file path separator by khoj server and the Obsidian vault were
  different on Windows
- Fix
  Normalize file path to use forward slash(/) to find the matching
  note file in the Obsidian vault for jump to it

Resolves #177
2023-03-05 21:07:54 -06:00
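The fix lives in the Obsidian plugin (TypeScript); the same idea is sketched here in Python for consistency with the rest of the project. `find_vault_file` and its suffix-matching rule are assumptions about how an absolute server path could be matched to a vault-relative path.

```python
from typing import List, Optional


def normalize_path(server_path: str) -> str:
    """Use forward slashes, the separator used by Obsidian vault paths, on every OS."""
    return server_path.replace("\\", "/")


def find_vault_file(server_path: str, vault_files: List[str]) -> Optional[str]:
    """Hypothetical matcher: map an absolute server path to a vault-relative file."""
    normalized = normalize_path(server_path)
    for vault_file in vault_files:
        # Require a separator before the match so "othernotes/a.md" != "notes/a.md"
        if normalized == vault_file or normalized.endswith("/" + vault_file):
            return vault_file
    return None


print(find_vault_file(r"C:\Users\me\Vault\daily\2023-03-05.md", ["ideas.md", "daily/2023-03-05.md"]))
# daily/2023-03-05.md
```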
Debanjum Singh Solanky
b6cdc5c7cb Do not expose answer API as a chat type in chat web interface or API
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat

For now it is only exposed via API. Later it will be exposed in the
interfaces as well

Remove ability to select different chat types from the chat web
interface as there is only a single chat type

Stop appending answers to the conversation logs
2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky
7f994274bb Support multi-turn conversations in chat mode
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context

- Make GPT provide reason when it can't answer the question. Gives
  user context to tune their questions
2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
d73042426d Support filtering for results above threshold score in search API 2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
45f461d175 Keep search results passed to GPT as context in conversation logs
This will be useful to
1. Show source references used to arrive at answer
2. Carry out multi-turn conversations
2023-03-05 16:00:19 -06:00
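A sketch of what keeping the context with each logged message enables: references can be shown next to Khoj's reply and reused in later turns. The log field names (`by`, `message`, `context`, `created`) and both helpers are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, List


def log_exchange(chat_log: List[Dict], user_message: str, khoj_message: str, references: List[str]) -> List[Dict]:
    """Append one user/Khoj exchange, keeping the search results used as context.

    Field names are assumed for this sketch.
    """
    created = datetime.now().isoformat(timespec="seconds")
    chat_log.append({"by": "you", "message": user_message, "created": created})
    # References belong to Khoj's reply, so they can be shown alongside it later
    chat_log.append({"by": "khoj", "message": khoj_message, "context": references, "created": created})
    return chat_log


def recent_references(chat_log: List[Dict], lookback: int = 2) -> List[str]:
    """Collect references used by the last few messages, for multi-turn context."""
    recent = chat_log[-lookback:] if lookback > 0 else []
    references: List[str] = []
    for entry in recent:
        references.extend(entry.get("context", []))
    return references


log = log_exchange([], "When is my trip?", "On 5th April.", ["* Trip to Tokyo"])
print(recent_references(log))  # ['* Trip to Tokyo']
```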
Debanjum Singh Solanky
7cad1c9428 Only use past chat message, not session summaries as chat context
Passing chat messages for the currently active session along with summaries
of past sessions isn't as useful currently
2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky
ad1f1cf620 Improve and simplify Khoj Chat using ChatGPT
- Set context by either including last 2 chat messages from active
  session or past 2 conversation summaries from conversation logs

- Set personality in system message
- Place personality system message before last completed back & forth
  This may stop ChatGPT forgetting its personality as conversation progresses given:
  - The conditioning based on system role messages is light
  - If system message is too far back in conversation history, the
    model may forget its personality conditioning
  - If system message at end of conversation, the model can think it's
    the start of a new conversation
  - Inserting the system message before last completed back & forth should
    prevent ChatGPT from assuming it's the start of a new conversation
    while not losing personality conditioning from the system message

- Simplify the Khoj Chat API to, for now, just answer from the user's notes
  instead of trying to infer other potential interaction types.
  - This is the default expected behavior from the feature anyway
  - Use the compiled text of the top 2 search results for context

- Benefits of using ChatGPT
  - Better model
  - 1/10th the price
  - No hand rolled prompt required to make GPT provide more chatty,
    assistant type responses
2023-03-05 01:24:13 -06:00
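The message ordering described above, sketched with the OpenAI chat message format (`role`/`content` dicts). `build_messages` and its `lookback` parameter are hypothetical; with the default of 2 context messages the system message simply comes first.

```python
from typing import Dict, List

Message = Dict[str, str]  # OpenAI chat format: {"role": ..., "content": ...}


def build_messages(history: List[Message], personality: str, user_query: str, lookback: int = 2) -> List[Message]:
    """Hypothetical helper: put the system message just before the last completed exchange."""
    context = history[-lookback:] if lookback > 0 else []
    split = max(len(context) - 2, 0)  # start of the last completed back & forth
    system = {"role": "system", "content": personality}
    user = {"role": "user", "content": user_query}
    return context[:split] + [system] + context[split:] + [user]


history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
]
print([m["content"] for m in build_messages(history, "You are Khoj", "q", lookback=4)])
# ['u1', 'a1', 'You are Khoj', 'u2', 'a2', 'q']
```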
Debanjum Singh Solanky
9d42b5d60d Use multiple compiled search results for more relevant context to GPT
Increase temperature to allow GPT to collect answer across multiple
notes
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
c3b624e351 Introduce improved answer API and prompt. Use by default in chat web interface
- Improve GPT prompt
  - Make GPT answer the user's query based on provided notes instead
    of summarizing the provided notes
  - Make GPT be truthful using prompt and reduced temperature
  - Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
  fit for default consumption yet
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
7184508784 Mention Python and Pip need to be installed in Main and Emacs Readme 2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky
211e460398 Output date filter from cache log at debug level. Remove unused imports
Other logs not directly useful to user have already been converted
to debug log levels in 1ae4016. Just forgot to convert this log line too
2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky
c823f46d89 Test error on missing fields in ContentConfig pulled from Khoj.yml
Resolves #9
2023-03-02 15:35:39 -06:00
Debanjum Singh Solanky
b6dbe4dd1d Do not try to retrieve an unconfigured core content type in Config GUI
Previous behavior was resulting in a null reference error, as the key for
the core content/search type was not present in the current config

Fallback to using default config for unconfigured core content type
instead

See #165 for details
2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky
1ae40163a9 Show user friendly information logs by default for context
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
  take time
- Convert all other info logs to be only shown in verbose mode
2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky
fe03ba3dce Index intro text before headings in org files
- Text before headings was not being indexed due to buggy orgnode
  parsing logic
- Resolved indexing intro text from files with and without headings in
  them
- Ensure intro text node has heading set to all title lines collected
  from the file

Resolves #165
2023-03-01 12:11:33 -06:00
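A simplified sketch of indexing the intro text of an org file, assuming headings are lines starting with one or more `*` followed by whitespace and titles come from `#+TITLE:` lines. This is not Khoj's orgnode parser; `extract_intro` is a hypothetical helper.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^\*+\s")  # org headings: one or more '*' then whitespace


def extract_intro(org_text: str) -> Tuple[str, str]:
    """Hypothetical helper: (heading, body) for text before the first org heading.

    The heading joins every #+TITLE line in the file, so files without any
    headings still produce a titled entry for their text.
    """
    titles: List[str] = []
    intro: List[str] = []
    before_first_heading = True
    for line in org_text.splitlines():
        if HEADING.match(line):
            before_first_heading = False
        elif line.lower().startswith("#+title:"):
            titles.append(line.split(":", 1)[1].strip())
        elif before_first_heading:
            intro.append(line)
    return " ".join(titles), "\n".join(intro).strip()


print(extract_intro("#+TITLE: Khoj\nIntro line.\n* Heading\nBody"))  # ('Khoj', 'Intro line.')
```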
Debanjum Singh Solanky
ed177db2be Emojify step names in workflows. Stop publishing to TestPyPi from PR 2023-03-01 10:56:39 -06:00
Debanjum Singh Solanky
7ad251b8ef Log and Continue on OSError while collating dates for date filters
Log to understand if the error or date can be handled better
Mitigates #172
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
2bed4c3b50 Fix configuring search types & /config/types API when no plugin configured
- Test /config/types API when no plugin configured, only plugin configured
  and no content configured scenarios
- Do not throw null reference exception while configuring search types
  when no plugin configured
- Do not throw null reference exception on calling /config/types API
  when no plugin configured

Resolves bug introduced by #173
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
8914dbd073 Fix creating GUI panels for unconfigured search, processor types
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`

Bug:
- Unconfigured processor and search_types are instantiated as None in
  self.current_config
- While creating the desktop GUI, these null configs are attempted to
  be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors

Fix:
Use default config to create their GUI elements for unconfigured
search and processor types

Resolves #167
2023-03-01 01:20:58 -06:00
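A sketch of the fallback described in the Fix, assuming configs are plain dicts where unconfigured types are missing or `None`. `panel_config` and the `input-files` key are illustrative.

```python
from typing import Any, Dict, Optional

Config = Dict[str, Any]


def panel_config(current: Dict[str, Optional[Config]], default: Dict[str, Config], content_type: str) -> Config:
    """Hypothetical helper: settings used to draw one GUI panel.

    Unconfigured types can be missing or present as None in the current config,
    so fall back to the default config instead of indexing into None.
    """
    configured = current.get(content_type)
    return configured if configured is not None else default[content_type]


default = {"org": {"input-files": None}, "markdown": {"input-files": None}, "ledger": {"input-files": None}}
current = {"org": None, "markdown": {"input-files": ["notes.md"]}}
print(panel_config(current, default, "org"))  # {'input-files': None}
```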
Debanjum
e77a5ffc83 Merge pull request #173 from debanjum/enable-creating-content-plugins
## Enable Creating Content Plugins

### Goal
Index, Search text content not supported by default in Khoj using plugins

### Code Changes
- fcbbe8c Configure content plugins to index using `khoj.yml`
- Index content plugins from standardized JSONL format for ingestion
  - 55a032e Add jsonl processor to index plugin content
  - ab0d3a0 Index configured plugins on app start and via update API endpoint
- Expose plugin content types for usage by interfaces
  - 47b58a2 Dynamically update available types on loading the Khoj server
  - Expose indexed types via API (9d38ead). Simplify getting enabled types in Web (f3f2438), Emacs (1e43f1a) interfaces
- Search plugin content from the Web and Emacs Interfaces
  - d91c7e2 Search plugin content via the search API
  - Render plugin content on Web (88344f9) and Emacs (c2814fc) interfaces
    - The Web, Emacs interfaces are general interfaces, they allow searching across all content types
    - The Obsidian interface is currently tuned for only markdown content
      It will be extended to render more content plugins later

### Testing
- fcbbe8c Add unit tests to test reading plugin config from khoj.yml
- 55a032e Add unit tests for the `JsonlToJsonl` processor
- 88a9ead Add unit tests to validate search, incremental update, force-update API works with plugin content types
- b09350c Add unit test to validate only configured search types are returned by the new /api/config/types API endpoint
- Manually test the config read, indexing, search and update with local khoj
2023-02-28 22:23:25 -06:00
Debanjum Singh Solanky
b09350c052 Fix to return only enabled content types via the new config/types API
- Previously returned all core content types, even if they had not been
  set up
- Add test to validate only configured content types are returned by
  the api/config/types API endpoint
2023-02-28 22:08:26 -06:00
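A sketch of the filtering behind the endpoint, assuming unconfigured core types are `None` in the content config and plugins live in a separate, possibly absent, mapping. `enabled_types` is a hypothetical helper, not the endpoint's actual code.

```python
from typing import Any, Dict, List, Optional


def enabled_types(content_config: Dict[str, Optional[Any]], plugins: Optional[Dict[str, Any]] = None) -> List[str]:
    """Hypothetical helper: content types a /api/config/types-style endpoint returns.

    Only core types that are actually configured are returned, followed by
    any configured plugin types. No plugins configured (None) is not an error.
    """
    core = [name for name, config in content_config.items() if config is not None]
    return core + list(plugins or {})


print(enabled_types({"org": {"input-files": ["a.org"]}, "markdown": None}, {"jira": {}}))  # ['org', 'jira']
```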
Debanjum Singh Solanky
b177adf3a7 Return value of search_type in /config/type API endpoint
- Remove need for interfaces to downcase content types returned by API
  before using the type in search and other API endpoints
- Fix to check for search_type.name in plugin keys instead of value
2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky
ede6eb6879 Re-enable testing search and update API with image content type
It may have been disabled due to issues with image search earlier
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
88a9eadfba Use client pytest fixture to test API with plugin type configured 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
ab501a56c9 Create pytest fixture to configure app with plugin, search types 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
f944408e69 Update content_config pytest fixture to index plugin content 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
88344f9ed2 Improve rendering search results of plugin content types on web interface
Render only the entry from plugin search response instead of raw json
Use the results-ledger styling for results-plugin styling
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
c2814fce58 Improve rendering search results of plugin content types in khoj.el
Render only the entry from plugin search response instead of raw json
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
f3f24387ec Use new config/types API to set enabled content types on web interface 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
1e43f1a12e Use new config/types API to set enabled content types in khoj.el menu 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
9d38eadd42 Return enabled content types via api/config/types API endpoint
Simplifies dynamically populating enabled content types for interfaces
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
68bd5d9ebc Configure API routes after setting up search types while configuring server
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
d91c7e2761 Search for plugin content via the search API 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
47b58a2a4d Configure, use dynamically instantiated SearchType enum on app start
The SearchType is now dynamically populated with core and configured
plugin types

Use the new dynamic SearchType enum from state.py across codebase
2023-02-28 20:25:51 -06:00
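Python's `Enum` functional API can build such an enum at runtime. A sketch with a hypothetical subset of core types; member values equal their names, so an API can return `.value` directly.

```python
from enum import Enum
from typing import Iterable, Type

# Hypothetical subset of core search types, for illustration only
CORE_SEARCH_TYPES = ["org", "markdown", "ledger", "image"]


def build_search_type(plugin_names: Iterable[str]) -> Type[Enum]:
    """Create a SearchType enum holding the core types plus configured plugin types."""
    members = {name: name for name in CORE_SEARCH_TYPES}
    members.update({name: name for name in plugin_names})
    return Enum("SearchType", members)


SearchType = build_search_type(["jira"])
print([search_type.value for search_type in SearchType])  # ['org', 'markdown', 'ledger', 'image', 'jira']
print(SearchType("jira"))  # SearchType.jira
```

Because the enum is only complete after the config is read, anything that validates against it (such as API query params) has to be set up afterwards.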
Debanjum Singh Solanky
ab0d3a08e2 Index configured plugins on app start and via update API endpoint 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
55a032e8c4 Add processor to index entries from jsonl files for plugins
- Read, merge entries from input jsonl files and filters
- Mark new, modified entries for update
2023-02-24 02:54:12 -06:00
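A sketch of merging JSONL entries and picking the ones that need (re)indexing, assuming an entry counts as new or modified when the hash of its canonical JSON is not already in the index. The helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Set


def merge_jsonl(files: Iterable[Iterable[str]]) -> List[Dict]:
    """Merge entries from several JSONL sources (e.g. open file handles), skipping blank lines."""
    return [json.loads(line) for lines in files for line in lines if line.strip()]


def entry_hash(entry: Dict) -> str:
    # Canonical JSON (sorted keys) so key order doesn't change the hash
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def entries_to_update(entries: List[Dict], indexed_hashes: Set[str]) -> List[Dict]:
    """New or modified entries: those whose hash isn't in the existing index."""
    return [entry for entry in entries if entry_hash(entry) not in indexed_hashes]


files = [['{"entry": "a", "file": "x"}\n'], ["\n", '{"entry": "b", "file": "y"}\n']]
entries = merge_jsonl(files)
indexed = {entry_hash({"file": "x", "entry": "a"})}
print(entries_to_update(entries, indexed))  # [{'entry': 'b', 'file': 'y'}]
```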
Debanjum Singh Solanky
fcbbe8c759 Read content plugin configs from Khoj config YAML
Configure external text content plugins via the Khoj YAML
Reuse existing TextContentConfig definition for external text content plugins
2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky
f57d7bf5ad Use pypi khoj to fix docker builds and dockerize github workflow
- Instead of building the package locally like before
  The issue started after moving to dynamic git-based versioning with hatch-vcs
  This should reduce image size of docker builds too

- Also move to ubuntu image since pyqt6 builds are available on it, so it does
  not need to be built locally for the image

- This s
2023-02-19 01:57:01 -06:00
Debanjum Singh Solanky
fada617faa Fix TOC links, Add how to auto start Khoj server to Readme
Rename tools directory to more standard scripts directory
2023-02-18 23:51:02 -06:00
Debanjum Singh Solanky
61b6ee2857 Use helper script to bump khoj pre-release versions 2023-02-17 20:31:51 -06:00
364 changed files with 80966 additions and 5704 deletions


@@ -6,4 +6,5 @@ docs/
tests/
build/
dist/
scripts/
*.egg-info/

.gitattributes (2 changed lines, vendored, normal file)

@@ -0,0 +1,2 @@
# Exclude tests data file from programming stats on Github
tests/data/** linguist-vendored

.github/ISSUE_TEMPLATE/bug-report.md (42 changed lines, vendored, normal file)

@@ -0,0 +1,42 @@
---
name: Bug Report
about: Create a bug to help fix something that might not be working correctly
title: "[FIX]"
labels: fix
assignees: ''
---

## Describe the bug
A clear and concise description of what the bug is. Please include what you were expecting to happen vs. what actually happened.

## To Reproduce
Steps to reproduce the behavior:

## Screenshots
If applicable, add screenshots to help explain your problem.

## Platform
- Server:
  - [ ] Cloud-Hosted (https://app.khoj.dev)
  - [ ] Self-Hosted Docker
  - [ ] Self-Hosted Python package
  - [ ] Self-Hosted source code
- Client:
  - [ ] Obsidian
  - [ ] Emacs
  - [ ] Desktop app
  - [ ] Web browser
  - [ ] WhatsApp
- OS:
  - [ ] Windows
  - [ ] macOS
  - [ ] Linux
  - [ ] Android
  - [ ] iOS

### If self-hosted
- Server Version [e.g. 1.0.1]:

## Additional context
Add any other context about the problem here.


@@ -0,0 +1,11 @@
---
name: Feature Request
about: Suggest an idea to help make Khoj a better tool
title: "[IDEA]"
labels: "upgrade"
assignees: ''
---
## Describe the feature you'd like
A clear and concise description of what you want to happen. Include any relevant links or screenshots or inspiration.


@@ -24,16 +24,16 @@ jobs:
       - name: Set up Python 3.9
         uses: actions/setup-python@v1
         with: { python-version: 3.9 }
-      - name: Install
+      - name: ⏬️ Install Dependencies
         run: |
           python -m pip install --upgrade pip
           sudo apt-get install emacs && emacs --version
           git clone https://github.com/riscy/melpazoid.git ~/melpazoid
           pip install ~/melpazoid
-      - name: Run
+      - name: 🌡️ Validate Khoj.el
         env:
           # Khoj recipe from https://github.com/melpa/melpa/pull/8321/files
-          RECIPE: (khoj :fetcher github :repo "debanjum/khoj" :files ("src/interface/emacs/*.el"))
+          RECIPE: (khoj :fetcher github :repo "khoj-ai/khoj" :files ("src/interface/emacs/*.el"))
           EXIST_OK: true
           LOCAL_REPO: ${{ github.workspace }}
         run: echo $GITHUB_REF && make -C ~/melpazoid

.github/workflows/desktop.yml (99 changed lines, vendored, normal file)

@@ -0,0 +1,99 @@
name: desktop

on:
  push:
    tags:
      - "*"
    branches:
      - 'master'
    paths:
      - src/interface/desktop/**
      - .github/workflows/desktop.yml

jobs:
  build:
    name: 🖥️ Build, Release Desktop App
    runs-on: ubuntu-latest
    env:
      TODESKTOP_ACCESS_TOKEN: ${{ secrets.TODESKTOP_ACCESS_TOKEN }}
      TODESKTOP_EMAIL: ${{ secrets.TODESKTOP_EMAIL }}
    defaults:
      run:
        shell: bash
        working-directory: src/interface/desktop
    steps:
      - name: ⬇️ Checkout Code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: ⤵️ Install Node
        uses: actions/setup-node@v3
        with:
          node-version: "lts/*"

      - name: ⚙️ Setup Desktop Build
        run: |
          yarn
          npm install -g @todesktop/cli
          sed -i "s/\"id\": \"\"/\"id\": \"${{ secrets.TODESKTOP_ID }}\"/g" todesktop.json

      - name: ⚙️ Build Desktop App
        run: |
          npx todesktop build

      - name: 📦 Release Desktop App
        if: startsWith(github.ref, 'refs/tags/')
        run: |
          npx todesktop release --latest --force

      - name: ⤵️ Get Desktop Apps
        if: startsWith(github.ref, 'refs/tags/')
        run: |
          build_no=`npx todesktop builds --latest | tail -n 1 | awk -F'/' '{print $NF}'`
          sleep 900 # wait for 15 minutes for the build to be available
          wget https://download.khoj.dev/builds/$build_no/mac/dmg/arm64 -O khoj-${{ github.ref_name }}-arm64.dmg
          wget https://download.khoj.dev/builds/$build_no/mac/dmg/x64 -O khoj-${{ github.ref_name }}-x64.dmg
          wget https://download.khoj.dev/builds/$build_no/windows/nsis/x64 -O khoj-${{ github.ref_name }}-x64.exe
          wget https://download.khoj.dev/builds/$build_no/linux/deb/x64 -O khoj-${{ github.ref_name }}-x64.deb
          wget https://download.khoj.dev/builds/$build_no/linux/appImage/x64 -O khoj-${{ github.ref_name }}-x64.AppImage

      - name: ⏫ Upload Mac ARM App
        if: startsWith(github.ref, 'refs/tags/')
        uses: actions/upload-artifact@v3
        with:
          if-no-files-found: warn
          name: khoj-${{ github.ref_name }}-arm64.dmg
          path: src/interface/desktop/khoj-${{ github.ref_name }}-arm64.dmg

      - name: ⏫ Upload Mac x64 App
        if: startsWith(github.ref, 'refs/tags/')
        uses: actions/upload-artifact@v3
        with:
          if-no-files-found: warn
          name: khoj-${{ github.ref_name }}-x64.dmg
          path: src/interface/desktop/khoj-${{ github.ref_name }}-x64.dmg

      - name: ⏫ Upload Windows App
        if: startsWith(github.ref, 'refs/tags/')
        uses: actions/upload-artifact@v3
        with:
          if-no-files-found: warn
          name: khoj-${{ github.ref_name }}-x64.exe
          path: src/interface/desktop/khoj-${{ github.ref_name }}-x64.exe

      - name: ⏫ Upload Debian App
        if: startsWith(github.ref, 'refs/tags/')
        uses: actions/upload-artifact@v3
        with:
          if-no-files-found: warn
          name: khoj-${{ github.ref_name }}-x64.deb
          path: src/interface/desktop/khoj-${{ github.ref_name }}-x64.deb

      - name: ⏫ Upload Linux App Image
        if: startsWith(github.ref, 'refs/tags/')
        uses: actions/upload-artifact@v3
        with:
          if-no-files-found: warn
          name: khoj-${{ github.ref_name }}-x64.AppImage
          path: src/interface/desktop/khoj-${{ github.ref_name }}-x64.AppImage


@@ -8,23 +8,47 @@ on:
- master
paths:
- src/khoj/**
- config/**
- pyproject.toml
- Dockerfile
- prod.Dockerfile
- docker-compose.yml
- .github/workflows/dockerize.yml
workflow_dispatch:
inputs:
tag:
description: 'Docker image tag'
default: 'dev'
khoj:
description: 'Build Khoj docker image'
type: boolean
default: true
khoj-cloud:
description: 'Build Khoj cloud docker image'
type: boolean
default: true
env:
DOCKER_IMAGE_TAG: ${{ github.ref == 'refs/heads/master' && 'latest' || github.ref_name }}
# Tag Image with tag name on release
# else with user specified tag (default 'dev') if triggered via workflow
# else with 'pre' (if push to master)
DOCKER_IMAGE_TAG: ${{ github.ref_type == 'tag' && github.ref_name || github.event_name == 'workflow_dispatch' && github.event.inputs.tag || 'pre' }}
jobs:
build:
name: Build Docker Image, Push to Container Registry
name: Publish Khoj Docker Images
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
image:
- 'local'
- 'cloud'
steps:
- name: Checkout Code
uses: actions/checkout@v3
with:
# Get all history to correctly infer Khoj version using hatch
fetch-depth: 0
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
@@ -36,12 +60,36 @@ jobs:
username: ${{ github.repository_owner }}
password: ${{ secrets.PAT }}
- name: Build and Push Docker Image
- name: Get App Version
id: hatch
run: echo "version=$(pipx run hatch version)" >> $GITHUB_OUTPUT
- name: 📦 Build and Push Docker Image
uses: docker/build-push-action@v2
if: (matrix.image == 'local' && github.event_name == 'workflow_dispatch') && github.event.inputs.khoj == 'true' || (matrix.image == 'local' && github.event_name == 'push')
with:
context: .
file: Dockerfile
platforms: linux/amd64, linux/arm64
push: true
tags: ghcr.io/${{ github.repository }}:${{ env.DOCKER_IMAGE_TAG }}
tags: |
ghcr.io/${{ github.repository }}:${{ env.DOCKER_IMAGE_TAG }}
${{ github.ref_type == 'tag' && format('ghcr.io/{0}:latest', github.repository) || '' }}
build-args: |
PORT=8000
VERSION=${{ steps.hatch.outputs.version }}
PORT=42110
- name: 📦️⛅️ Build and Push Cloud Docker Image
uses: docker/build-push-action@v2
if: (matrix.image == 'cloud' && github.event_name == 'workflow_dispatch') && github.event.inputs.khoj-cloud == 'true' || (matrix.image == 'cloud' && github.event_name == 'push')
with:
context: .
file: prod.Dockerfile
platforms: linux/amd64
push: true
tags: |
ghcr.io/${{ github.repository }}-cloud:${{ env.DOCKER_IMAGE_TAG }}
${{ github.ref_type == 'tag' && format('ghcr.io/{0}-cloud:latest', github.repository) || '' }}
build-args: |
VERSION=${{ steps.hatch.outputs.version }}
PORT=42110
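The `DOCKER_IMAGE_TAG` expression above relies on GitHub Actions' `&&` binding tighter than `||`, and on both operators returning operand values, like Python's `and`/`or`. A Python mirror of the same logic, for illustration only; `docker_image_tag` is a hypothetical name.

```python
def docker_image_tag(ref_type: str, ref_name: str, event_name: str, input_tag: str) -> str:
    # Mirrors: github.ref_type == 'tag' && github.ref_name
    #          || github.event_name == 'workflow_dispatch' && github.event.inputs.tag
    #          || 'pre'
    return (ref_type == "tag" and ref_name) or (event_name == "workflow_dispatch" and input_tag) or "pre"


print(docker_image_tag("tag", "1.4.0", "push", ""))  # 1.4.0
print(docker_image_tag("branch", "master", "push", ""))  # pre
print(docker_image_tag("branch", "master", "workflow_dispatch", "dev"))  # dev
```

Note the fallthrough: a manual run with an empty `tag` input also gets `pre`, since an empty string is falsy in both languages.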


@@ -0,0 +1,47 @@
name: dockerize telemetry server

on:
  push:
    branches:
      - master
    paths:
      - src/telemetry/**
      - .github/workflows/dockerize_telemetry_server.yml
  pull_request:
    branches:
      - master
    paths:
      - src/telemetry/**
      - .github/workflows/dockerize_telemetry_server.yml
  workflow_dispatch:

env:
  DOCKER_IMAGE_TAG: ${{ github.ref == 'refs/heads/master' && 'latest' || github.event.pull_request.number }}

jobs:
  build:
    name: Build Docker Image, Push to Container Registry
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.PAT }}

      - name: 📦 Build and Push Docker Image
        uses: docker/build-push-action@v2
        with:
          context: src/telemetry
          file: src/telemetry/Dockerfile
          push: true
          tags: ghcr.io/${{ github.repository }}-telemetry:${{ env.DOCKER_IMAGE_TAG }}
          secrets: |
            "POSTHOG_API_KEY=${{ secrets.POSTHOG_API_KEY }}"


@@ -0,0 +1,46 @@
name: build and deploy github pages for documentation
on:
  push:
    branches:
      - 'master'

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  deploy:
    environment:
      name: github-pages
      url: https://docs.khoj.dev
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      # 👇 Build steps
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: 18.x
          cache: yarn
          cache-dependency-path: documentation/yarn.lock
      - name: Install dependencies
        run: |
          cd documentation
          yarn install --frozen-lockfile --non-interactive
      - name: Build
        run: |
          cd documentation
          yarn build
      # 👆 Build steps
      - name: Setup Pages
        uses: actions/configure-pages@v3
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v2
        with:
          # 👇 Specify build output path
          path: documentation/build
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v2

.github/workflows/pre-commit.yml (48 changed lines, vendored, normal file)

@@ -0,0 +1,48 @@
name: pre-commit

on:
  pull_request:
    paths:
      - src/**
      - tests/**
      - config/**
      - pyproject.toml
      - .pre-commit-config.yml
      - .github/workflows/test.yml
  push:
    branches:
      - master
    paths:
      - src/khoj/**
      - tests/**
      - config/**
      - pyproject.toml
      - .pre-commit-config.yml
      - .github/workflows/test.yml

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11

      - name: ⏬️ Install Dependencies
        run: |
          sudo apt update && sudo apt install -y libegl1
          python -m pip install --upgrade pip

      - name: ⬇️ Install Application
        run: pip install --upgrade .[dev]

      - name: 🌡️ Validate Application
        run: pre-commit run --hook-stage manual --all


@@ -21,7 +21,7 @@ on:
jobs:
publish:
name: Publish Python Package to PyPI
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v3
with:
@@ -32,7 +32,7 @@ jobs:
with:
python-version: '3.10'
- name: Install Application
- name: ⬇️ Install Application
run: python -m pip install --upgrade pip && pip install --upgrade .
- name: ⚙️ Build Python Package
@@ -45,10 +45,10 @@ jobs:
# Build PyPi Package
pipx run build
- name: 👀 Validate Python Package
- name: 🌡️ Validate Python Package
run: |
# Validate PyPi Package
pipx run check-wheel-contents dist/*.whl
pipx run check-wheel-contents dist/*.whl --ignore W004
pipx run twine check dist/*
- name: ⏫ Upload Python Package Artifacts
@@ -62,10 +62,3 @@ jobs:
uses: pypa/gh-action-pypi-publish@v1.6.4
with:
password: ${{ secrets.PYPI_API_KEY }}
- name: 📦 Publish Python Package to Test PyPI
if: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
uses: pypa/gh-action-pypi-publish@v1.6.4
with:
password: ${{ secrets.PYPI_API_KEY }}
repository_url: https://test.pypi.org/legacy/


@@ -13,8 +13,10 @@ on:
jobs:
publish_obsidian_plugin:
name: Publish Obsidian Plugin
name: 💎 Publish Obsidian Plugin
runs-on: ubuntu-latest
permissions:
contents: write
defaults:
run:
shell: bash
@@ -27,26 +29,33 @@ jobs:
with:
node-version: "lts/*"
- name: Build Obsidian Plugin
- name: ⚙️ Build Obsidian Plugin
run: |
yarn
yarn run build --if-present
- name: Upload Obsidian Plugin main.js
- name: Upload Obsidian Plugin main.js
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: main.js
path: src/interface/obsidian/main.js
- name: Upload Obsidian Plugin manifest.json
- name: Upload Obsidian Plugin manifest.json
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: manifest.json
path: src/interface/obsidian/manifest.json
- name: Create Release
- name: ⏫ Upload Obsidian Plugin styles.css
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: styles.css
path: src/interface/obsidian/styles.css
- name: 🌈 Create Release
uses: softprops/action-gh-release@v1
if: startsWith(github.ref, 'refs/tags/')
with:
@@ -54,109 +63,4 @@ jobs:
files: |
src/interface/obsidian/main.js
src/interface/obsidian/manifest.json
publish_desktop_apps:
name: Publish Desktop Apps
strategy:
matrix:
include:
- os: ubuntu-latest
extension: deb
- os: macos-latest
extension: dmg
- os: windows-latest
extension: exe
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.9
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Install Dependencies
shell: bash
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
sudo apt install libegl1 libxcb-xinerama0 python3-tk -y
fi
python -m pip install --upgrade pip
pip install pyinstaller
- name: Install Khoj App
run: |
pip install --upgrade .
- name: Package Khoj App
shell: bash
run: |
# Setup Environment for Reproducible Builds
export PYTHONHASHSEED=42
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
pyinstaller --noconfirm Khoj.spec
if [ "$RUNNER_OS" == "Windows" ]; then
mv dist/Khoj.exe dist/khoj_"$GITHUB_REF_NAME"_amd64.exe
fi
- name: Create Mac App DMG
if: matrix.os == 'macos-latest'
run: |
# Install Mac DMG Creator
brew install create-dmg
# Copy app to separate dmg folder
mkdir -p dist/dmg && cp -r dist/Khoj.app dist/dmg
# Create disk image with the app
create-dmg \
--volname "Khoj" \
--volicon "src/khoj/interface/web/assets/icons/favicon.icns" \
--window-pos 200 120 \
--window-size 600 300 \
--icon-size 100 \
--icon "Khoj.app" 175 120 \
--hide-extension "Khoj.app" \
--app-drop-link 425 120 \
"dist/khoj_"$GITHUB_REF_NAME"_amd64.dmg" \
"dist/dmg/"
- uses: ruby/setup-ruby@v1
if: matrix.os == 'ubuntu-latest'
with:
ruby-version: '3.0'
- name: Create Debian Package
if: matrix.os == 'ubuntu-latest'
shell: bash
env:
DEBIAN_PACKAGE_VERSION: ${{ inputs.version }}
run: |
# Install Debian Packager
gem install fpm
# Copy app files into expected output directory structure
mkdir -p package/opt package/usr/share/applications package/usr/share/icons/hicolor/128x128/apps
cp -r dist/Khoj package/opt/Khoj
cp src/khoj/interface/web/assets/icons/favicon-128x128.png package/usr/share/icons/hicolor/128x128/apps/Khoj.png
cp Khoj.desktop package/usr/share/applications
# Fix permissions to be usable by non-root users
find package/usr/share -type f -exec chmod 644 -- {} +
chmod 755 package/opt/Khoj
# Package the app
if [ -z "$DEBIAN_PACKAGE_VERSION" ]; then
DEBIAN_PACKAGE_VERSION=$(echo $GITHUB_REF_NAME | sed -E 's/v(.*)/\1/g')
fi
fpm -C package -s dir -t deb -n Khoj --version $DEBIAN_PACKAGE_VERSION -p dist/khoj_"$GITHUB_REF_NAME"_amd64.deb
- uses: actions/upload-artifact@v3
with:
name: khoj_${{github.ref_name}}_amd64.${{matrix.extension}}
path: dist/khoj_${{github.ref_name}}_amd64.${{matrix.extension}}
- name: Release
uses: softprops/action-gh-release@v1
if: startsWith(github.ref, 'refs/tags/')
with:
generate_release_notes: true
files: dist/khoj_${{github.ref_name}}_amd64.${{matrix.extension}}
src/interface/obsidian/styles.css


@@ -2,8 +2,6 @@ name: test
on:
pull_request:
branches:
- 'master'
paths:
- src/khoj/**
- tests/**
@@ -13,7 +11,7 @@ on:
- .github/workflows/test.yml
push:
branches:
- 'master'
- master
paths:
- src/khoj/**
- tests/**
@@ -26,31 +24,66 @@ jobs:
test:
name: Run Tests
runs-on: ubuntu-latest
container: ubuntu:jammy
strategy:
fail-fast: false
matrix:
python_version:
- '3.8'
- '3.9'
- '3.10'
- '3.11'
services:
postgres:
image: ankane/pgvector
env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
ports:
- 5432:5432
options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python_version }}
- name: Install Dependencies
- name: Install Git
run: |
sudo apt update && sudo apt install -y libegl1
apt update && apt install -y git
- name: ⏬️ Install Dependencies
env:
DEBIAN_FRONTEND: noninteractive
run: |
apt update && apt install -y libegl1 sqlite3 libsqlite3-dev libsqlite3-0 ffmpeg libsm6 libxext6
- name: ⬇️ Install Postgres
env:
DEBIAN_FRONTEND: noninteractive
run : |
apt install -y postgresql postgresql-client && apt install -y postgresql-server-dev-14
- name: ⬇️ Install pip
run: |
apt install -y python3-pip
python -m ensurepip --upgrade
python -m pip install --upgrade pip
- name: Install Application
run: pip install --upgrade .[dev]
- name: ⬇️ Install Application
run: sed -i 's/dynamic = \["version"\]/version = "0.0.0"/' pyproject.toml && pip install --upgrade .[dev]
- name: Validate Application
run: pre-commit run --hook-stage manual --all
- name: Test Application
- name: 🧪 Test Application
env:
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
run: pytest
timeout-minutes: 10


@@ -33,7 +33,7 @@ jobs:
with:
version: ${{ matrix.emacs_version }}
- uses: actions/checkout@v3
- name: Test Khoj.el
- name: 🧪 Test Khoj.el
run: |
# Run ERT tests on khoj.el
emacs -batch \
@@ -42,7 +42,10 @@ jobs:
(push '(\"melpa\" . \"https://melpa.org/packages/\") package-archives) \
(package-initialize) \
(unless package-archive-contents (package-refresh-contents)) \
(unless (package-installed-p 'transient) (package-install 'transient)))" \
(unless (package-installed-p 'transient) (package-install 'transient)) \
(unless (package-installed-p 'dash) (package-install 'dash)) \
(unless (package-installed-p 'org) (package-install 'org)) \
)" \
-l ert \
-l ./src/interface/emacs/khoj.el \
-l ./src/interface/emacs/tests/khoj-tests.el \

.gitignore

@@ -10,6 +10,9 @@ __pycache__
.emacs.desktop*
*.py[cod]
.vscode
.env
.venv/*
todesktop.json
# Build artifacts
/src/khoj/interface/web/images
@@ -18,7 +21,8 @@ __pycache__
khoj_assistant.egg-info
/config/khoj*.yml
.pytest_cache
khoj.log
*.log
static
# Obsidian plugin artifacts
# ---
@@ -27,7 +31,7 @@ node_modules
# Don't include the compiled obsidian main.js file in the repo.
# They should be uploaded to GitHub releases instead.
main.js
src/interface/obsidian/main.js
# Exclude sourcemaps
*.map


@@ -15,6 +15,13 @@ repos:
- id: check-toml
- id: check-yaml
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files"]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.0.0
hooks:


@@ -1,21 +1,28 @@
# syntax=docker/dockerfile:1
FROM python:3.10-slim-bullseye
LABEL org.opencontainers.image.source https://github.com/debanjum/khoj
FROM ubuntu:jammy
LABEL org.opencontainers.image.source https://github.com/khoj-ai/khoj
# Install System Dependencies
RUN apt-get update -y && \
apt-get -y install python3-pyqt5
RUN apt update -y && apt -y install python3-pip git swig
# Copy Application to Container
COPY . /app
WORKDIR /app
# Install Python Dependencies
RUN pip install --upgrade pip && pip install --upgrade ".[dev]"
# Install Application
COPY pyproject.toml .
COPY README.md .
ARG VERSION=0.0.0
RUN sed -i "s/dynamic = \\[\"version\"\\]/version = \"$VERSION\"/" pyproject.toml && \
pip install --no-cache-dir .
# Copy Source Code
COPY . .
# Set the PYTHONPATH environment variable in order for it to find the Django app.
ENV PYTHONPATH=/app/src:$PYTHONPATH
# Run the Application
# There are more arguments required for the application to run,
# but these should be passed in through the docker-compose.yml file.
ARG PORT
EXPOSE ${PORT}
ENTRYPOINT ["khoj"]
ENTRYPOINT ["python3", "src/khoj/main.py"]
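The `sed` call in the new Dockerfile (and the matching step in `test.yml`) rewrites the `dynamic = ["version"]` line in `pyproject.toml` to a fixed `version = "<VERSION>"` before `pip install`. Presumably this is because the version is otherwise resolved dynamically at build time, which may not be possible inside the image build. The Python sketch below reproduces the same substitution for illustration only; the function name `pin_version` is hypothetical and not part of Khoj.

```python
import re


def pin_version(pyproject_text: str, version: str = "0.0.0") -> str:
    """Replace `dynamic = ["version"]` with a fixed version, mirroring the Dockerfile's sed call.

    Hypothetical helper for illustration. re.sub replaces every match, whereas sed
    without the g flag replaces the first match per line; for a single such line
    the result is identical.
    """
    return re.sub(r'dynamic = \["version"\]', f'version = "{version}"', pyproject_text)


if __name__ == "__main__":
    sample = '[project]\nname = "khoj-assistant"\ndynamic = ["version"]\n'
    print(pin_version(sample, "1.4.0"))
```

As in the Dockerfile, the default of `0.0.0` is used when no explicit `VERSION` build argument is supplied.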


@@ -1,7 +0,0 @@
[Desktop Entry]
Type=Application
Name=Khoj
Comment=A natural language search engine for your personal notes, transactions and images.
Path=/opt
Exec=/opt/Khoj
Icon=Khoj

Khoj.spec

@@ -1,115 +0,0 @@
# -*- mode: python ; coding: utf-8 -*-
from os.path import join
from platform import system
from PyInstaller.utils.hooks import copy_metadata
import sysconfig
datas = [
('src/khoj/interface/web', 'src/khoj/interface/web'),
(f'{sysconfig.get_paths()["purelib"]}/transformers', 'transformers')
]
datas += copy_metadata('tqdm')
datas += copy_metadata('regex')
datas += copy_metadata('requests')
datas += copy_metadata('packaging')
datas += copy_metadata('filelock')
datas += copy_metadata('numpy')
datas += copy_metadata('tokenizers')
block_cipher = None
a = Analysis(
['src/khoj/main.py'],
pathex=[],
binaries=[],
datas=datas,
hiddenimports=['huggingface_hub.repository'],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False,
)
# Filter out unused and/or duplicate shared libs
torch_lib_paths = {
join('torch', 'lib', 'libtorch_cuda.so'),
join('torch', 'lib', 'libtorch_cpu.so'),
}
a.datas = [entry for entry in a.datas if not entry[0] in torch_lib_paths]
os_path_separator = '\\' if system() == 'Windows' else '/'
a.datas = [entry for entry in a.datas if not f'torch{os_path_separator}_C.cp' in entry[0]]
a.datas = [entry for entry in a.datas if not f'torch{os_path_separator}_dl.cp' in entry[0]]
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
if system() != 'Darwin':
# Add Splash screen to show on app launch
splash = Splash(
'src/khoj/interface/web/assets/icons/favicon-144x144.png',
binaries=a.binaries,
datas=a.datas,
text_pos=(10, 160),
text_size=12,
text_color='black',
minify_script=True,
always_on_top=True
)
exe = EXE(
pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
splash,
splash.binaries,
[],
name='Khoj',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
upx_exclude=[],
runtime_tmpdir=None,
console=False,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch='x86_64',
codesign_identity=None,
entitlements_file=None,
icon='src/khoj/interface/web/assets/icons/favicon-144x144.ico',
)
else:
exe = EXE(
pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='Khoj',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
upx_exclude=[],
runtime_tmpdir=None,
console=False,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch='x86_64',
codesign_identity=None,
entitlements_file=None,
icon='src/khoj/interface/web/assets/icons/favicon.icns',
)
app = BUNDLE(
exe,
name='Khoj.app',
icon='src/khoj/interface/web/assets/icons/favicon.icns',
bundle_identifier=None,
)

LICENSE

@@ -1,23 +1,21 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
@@ -26,44 +24,34 @@ them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
@@ -72,7 +60,7 @@ modification follow.
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
@@ -549,35 +537,45 @@ to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
@@ -619,3 +617,45 @@ Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.

README.md

@@ -1,459 +1,53 @@
# Khoj 🦅
[![test](https://github.com/debanjum/khoj/actions/workflows/test.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/test.yml)
[![dockerize](https://github.com/debanjum/khoj/actions/workflows/dockerize.yml/badge.svg)](https://github.com/debanjum/khoj/pkgs/container/khoj)
[![pypi](https://github.com/debanjum/khoj/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/khoj-assistant/)
<p align="center"><img src="src/khoj/interface/web/assets/icons/khoj-logo-sideways-500.png" width="230" alt="Khoj Logo"></p>
*A natural language search engine for your personal notes, transactions and images*
<div align="center">
**Supported Plugins**
[![test](https://github.com/khoj-ai/khoj/actions/workflows/test.yml/badge.svg)](https://github.com/khoj-ai/khoj/actions/workflows/test.yml)
[![dockerize](https://github.com/khoj-ai/khoj/actions/workflows/dockerize.yml/badge.svg)](https://github.com/khoj-ai/khoj/pkgs/container/khoj)
[![pypi](https://github.com/khoj-ai/khoj/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/khoj-assistant/)
[![Khoj on Obsidian](https://img.shields.io/badge/Obsidian-%23483699.svg?style=for-the-badge&logo=obsidian&logoColor=white)](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#readme)
[![Khoj on Emacs](https://img.shields.io/badge/Emacs-%237F5AB6.svg?&style=for-the-badge&logo=gnu-emacs&logoColor=white)](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#readme)
</div>
## Table of Contents
<div align="center">
<b>An AI personal assistant for your digital brain</b>
</div>
- [Features](#Features)
- [Demos](#Demos)
- [Khoj in Obsidian](#khoj-in-obsidian)
- [Khoj in Emacs, Browser](#khoj-in-emacs-browser)
- [Interfaces](#Interfaces)
- [Architecture](#Architecture)
- [Setup](#Setup)
- [Install](#1-Install)
- [Configure](#2-Configure)
- [Run](#3-Run)
- [Use](#Use)
- [Interfaces](#Interfaces-1)
- [Query Filters](#Query-filters)
- [Upgrade](#Upgrade)
- [Khoj Server](#upgrade-khoj-server)
- [Khoj.el](#upgrade-khoj-on-emacs)
- [Khoj Obsidian](#upgrade-khoj-on-obsidian)
- [Uninstall Khoj](#uninstall-khoj)
- [Troubleshoot](#Troubleshoot)
- [Advanced Usage](#advanced-usage)
- [Access Khoj on Mobile](#access-khoj-on-mobile)
- [Chat with Notes](#chat-with-notes)
- [Use OpenAI Models for Search](#use-openai-models-for-search)
- [Search across Different Languages](#search-across-different-languages)
- [Miscellaneous](#Miscellaneous)
- [Setup OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
- [Beta API](#beta-api)
- [Performance](#Performance)
- [Query Performance](#Query-performance)
- [Indexing Performance](#Indexing-performance)
- [Miscellaneous](#Miscellaneous-1)
- [Development](#Development)
- [Visualize Codebase](#visualize-codebase)
- [Setup](#Setup)
- [Using Pip](#Using-Pip)
- [Using Docker](#Using-Docker)
- [Using Conda](#Using-Conda)
- [Validate](#Validate)
- [Credits](#Credits)
<br />
## Features
<div align="center">
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search and indexing is done on your machine[\*](https://github.com/debanjum/khoj#beta-api)
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Search your Org-mode and Markdown notes, Beancount transactions and Photos
- **Multiple Interfaces**: Search from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
[📜 Read Docs](https://docs.khoj.dev)
<span>&nbsp;&nbsp;&nbsp;&nbsp;</span>
[🌍 Try Khoj Cloud](https://khoj.dev)
<span>&nbsp;&nbsp;&nbsp;&nbsp;</span>
[💬 Get Involved](https://discord.gg/BDgyabRM6e)
## Demos
### Khoj in Obsidian
https://user-images.githubusercontent.com/6413477/210486007-36ee3407-e6aa-4185-8a26-b0bfc0a4344f.mp4
</div>
<details><summary>Description</summary>
<div align="center">
- Install Khoj via `pip` and start Khoj backend in non-gui mode
- Install Khoj plugin via Community Plugins settings pane on Obsidian app
- Check the new Khoj plugin settings
- Let Khoj backend index the markdown files in the current Vault
- Open Khoj plugin on Obsidian via Search button on Left Pane
- Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs](https://marcus.se.net/obsidian-plugin-docs/)
- Jump to the [search result](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin)
</details>
***
### Khoj in Emacs, Browser
https://user-images.githubusercontent.com/6413477/184735169-92c78bf1-d827-4663-9087-a1ea194b8f4b.mp4
Khoj is an AI application to search and chat with your notes and documents.<br />
It is open-source, self-hostable and accessible on Desktop, Emacs, Obsidian, Web and WhatsApp.<br />
It works with PDF, Markdown, Org-mode and Notion files, and GitHub repositories.<br />
It can paint, search the internet and understand speech.<br />
<details><summary>Description</summary>
***
- Install Khoj via pip
- Start Khoj app
- Add this readme and [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs) as org-mode for Khoj to index
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#2-Install-Khojel)
</details>
</div>
<details><summary>Analysis</summary>
| 🔎 Search | 💬 Chat |
|:---------:|:-------:|
| Quickly retrieve relevant documents using natural language | Get answers and create content from your existing knowledge base |
| Does not need internet | Can be configured to work without internet |
| <img src="https://docs.khoj.dev/img/khoj_search_on_web.png" width="400px"> | <img src="https://docs.khoj.dev/img/khoj_chat_on_web.png" width="400px"> |
- The results do not have any words used in the query
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter
</details>
## Contributors
Cheers to our awesome contributors! 🎉
### Interfaces
<a href="https://github.com/khoj-ai/khoj/graphs/contributors">
<img src="https://contrib.rocks/image?repo=khoj-ai/khoj" />
</a>
![](https://github.com/debanjum/khoj/blob/master/docs/interfaces.png?)
## Architecture
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_architecture.png?)
## Setup
These are the general setup instructions for Khoj.
- Check the [Khoj.el Readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Setup) to setup Khoj with Emacs
- Check the [Khoj Obsidian Readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#Setup) to setup Khoj with Obsidian<br />
It's simpler, as it can skip the configure step below.
### 1. Install
```shell
pip install khoj-assistant
```
### 2. Start App
```shell
khoj
```
### 3. Configure
1. Enable content types and point to files to search in the First Run Screen that pops up on app start
2. Click `Configure` and wait. The app will download ML models and index the content for search
## Use
### Interfaces
- **Khoj via Obsidian**
- [Install](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) the Khoj Obsidian plugin
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **Khoj via Emacs**
- [Install](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation) [khoj.el](./src/interface/emacs/khoj.el)
- Run `M-x khoj <user-query>`
- **Khoj via Web**
- Open <http://localhost:8000/> via desktop interface or directly
- **Khoj via API**
- See the Khoj FastAPI [Swagger Docs](http://localhost:8000/docs), [ReDocs](http://localhost:8000/redocs)
### Query Filters
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
  - Entries that contain term_to_include: `+"term_to_include"`
  - Entries that do not contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from the specified date (range)
  - Entries from April 1st 1984: `dt:"1984-04-01"`
  - Entries after March 31st 1984: `dt>="1984-04-01"`
  - Entries before April 2nd 1984: `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
  - Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
  - `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
  - Adds all filters to the natural language query. It should return entries
    - from the file *1984.org*
    - containing dates from the year *1984*
    - excluding words *"big"* and *"brother"*
    - that best match the natural language query *"what is the meaning of life?"*
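To make the filter semantics above concrete, here is a toy Python sketch that strips word, date and file filters out of a query and applies them to in-memory entries. It is not Khoj's implementation: the `Entry` and `Filters` types, the regexes, case-insensitive word matching and treating every YYYY-MM-DD string in an entry as one of its dates are all illustrative assumptions. Only the filter syntax comes from the list above.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

WORD_FILTER = re.compile(r'([+-])"([^"]+)"')                   # +"word" or -"word"
DATE_FILTER = re.compile(r'dt(:|>=|<=)"(\d{4}-\d{2}-\d{2})"')  # dt:"YYYY-MM-DD", dt>=, dt<=
FILE_FILTER = re.compile(r'file:"([^"]+)"')                    # file:"name.org"
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_CHECKS = {
    ":": lambda d, bound: d == bound,
    ">=": lambda d, bound: d >= bound,
    "<=": lambda d, bound: d <= bound,
}


@dataclass
class Entry:
    file: str
    text: str


@dataclass
class Filters:
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)
    dates: List[Tuple[str, date]] = field(default_factory=list)
    files: List[str] = field(default_factory=list)


def parse_query(query: str) -> Tuple[str, Filters]:
    """Split a query into its natural language part and its structured filters."""
    words = WORD_FILTER.findall(query)
    filters = Filters(
        include=[w for sign, w in words if sign == "+"],
        exclude=[w for sign, w in words if sign == "-"],
        dates=[(op, date.fromisoformat(d)) for op, d in DATE_FILTER.findall(query)],
        files=FILE_FILTER.findall(query),
    )
    text = query
    for pattern in (DATE_FILTER, FILE_FILTER, WORD_FILTER):
        text = pattern.sub("", text)
    return " ".join(text.split()), filters


def entry_dates(text: str) -> List[date]:
    dates = []
    for candidate in ISO_DATE.findall(text):
        try:
            dates.append(date.fromisoformat(candidate))
        except ValueError:  # e.g. "2023-13-45" looks like a date but is not one
            pass
    return dates


def matches(entry: Entry, filters: Filters) -> bool:
    """True if the entry passes every filter (word matching is case-insensitive here)."""
    text = entry.text.lower()
    if any(w.lower() not in text for w in filters.include):
        return False
    if any(w.lower() in text for w in filters.exclude):
        return False
    if filters.files and entry.file not in filters.files:
        return False
    # One date in the entry must satisfy all date filters, so dt>= and dt<= form a range
    if filters.dates and not any(
        all(DATE_CHECKS[op](d, bound) for op, bound in filters.dates)
        for d in entry_dates(entry.text)
    ):
        return False
    return True


entries = [
    Entry("1984.org", "* 1984-06-06 Big Brother is watching"),
    Entry("1984.org", "* 1984-07-01 What is the meaning of life?"),
    Entry("notes.org", "* 1984-07-01 Unrelated thoughts"),
]
query = 'what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"'
text, filters = parse_query(query)
kept = [e for e in entries if matches(e, filters)]
```

Running the combined example keeps only the second entry; the remaining natural language text would then be ranked semantically.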
## Upgrade
### Upgrade Khoj Server
```shell
pip install --upgrade khoj-assistant
```
- Note: To upgrade to the latest pre-release version of the khoj server, run the command below
```shell
# Maps to the latest commit on the master branch
pip install --upgrade --pre khoj-assistant
```
### Upgrade Khoj on Emacs
- Use your Emacs Package Manager to Upgrade
- See [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade) for details
### Upgrade Khoj on Obsidian
- Upgrade via the Community plugins tab on the settings pane in the Obsidian app
- See the [khoj plugin readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) for details
## Uninstall Khoj
1. (Optional) Hit `Ctrl-C` in the terminal running the khoj server to stop it
2. Delete the khoj directory in your home folder (i.e `~/.khoj` on Linux, Mac or `C:\Users\<your-username>\.khoj` on Windows)
3. Uninstall the khoj server with `pip uninstall khoj-assistant`
4. (Optional) Uninstall khoj.el or the khoj obsidian plugin in the standard way on Emacs, Obsidian
## Troubleshoot
#### Install fails while building Tokenizer dependency
- **Details**: `pip install khoj-assistant` fails while building the `tokenizers` dependency. Complains about Rust.
- **Fix**: Install Rust to build the tokenizers package. For example on Mac run:
```shell
brew install rustup
rustup-init
source ~/.cargo/env
```
- **Refer**: [Issue with Fix](https://github.com/debanjum/khoj/issues/82#issuecomment-1241890946) for more details
#### Search starts giving wonky results
- **Fix**: Open [/api/update?force=true](http://localhost:8000/api/update?force=true)[^2] in browser to regenerate index from scratch
- **Note**: *This fixes search results that have degraded over time. It will not help if the results were always wonky*
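The same regeneration can be scripted. A minimal Python sketch using only the standard library, assuming the default server URL and that a plain `GET` to `/api/update` with `force=true` does what the browser link above does. The optional `t` (content type) parameter is an assumption; verify it in the Swagger docs.

```python
import urllib.parse
import urllib.request
from typing import Optional

KHOJ_URL = "http://localhost:8000"  # default Khoj server URL


def build_update_url(force: bool = True, content_type: Optional[str] = None,
                     base_url: str = KHOJ_URL) -> str:
    params = {"force": str(force).lower()}
    if content_type:
        params["t"] = content_type  # assumption: restricts regeneration to one content type
    return f"{base_url}/api/update?{urllib.parse.urlencode(params)}"


def regenerate_index(force: bool = True, content_type: Optional[str] = None) -> int:
    """Ask a running Khoj server to rebuild its index; returns the HTTP status code."""
    with urllib.request.urlopen(build_update_url(force, content_type)) as response:
        return response.status
```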
#### Khoj in Docker errors out with "Killed" in error message
- **Fix**: Increase RAM available to Docker Containers in Docker Settings
- **Refer**: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)
#### Khoj errors out complaining about Tensors mismatch or null
- **Mitigation**: Disable `image` search using the desktop GUI
## Advanced Usage
### Access Khoj on Mobile
1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, e.g. an old computer or possibly a Raspberry Pi
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](https://tailscale.com/) on your personal server and phone
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:8000` or `http://name-of-server:8000` if you've set up [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
5. Enjoy exploring your notes, transactions and images from your phone!
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_pwa_android.png?)
### Chat with Notes
#### Overview
- Provides a chat interface to inquire and engage with your notes
- Chat Types:
  - **Summarize**: Pulls the most relevant note from your notes and summarizes it
  - **Chat**: Also does general chat. It guesses whether to give a general response or to search your notes and summarize the result. <br />
    E.g. *"how was your day?"* will give a general response, but *"When did I go surfing?"* should give a response from your notes
- **Note**: *Your query and the top note from the search results will be sent to OpenAI for processing*
#### Use
1. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
2. Open [/chat?t=summarize](http://localhost:8000/chat?t=summarize)[^2]
3. Type your queries, see summarized response by Khoj from your notes
#### Demo
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_chat_web_interface.png?)
### Use OpenAI Models for Search
#### Setup
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
```diff
 asymmetric:
-  encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+  encoder: text-embedding-ada-002
+  encoder-type: src.khoj.utils.models.OpenAI
   cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
-  encoder-type: sentence_transformers.SentenceTransformer
-  model_directory: "~/.khoj/search/asymmetric/"
+  model-directory: null
```
2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index
### Search across Different Languages
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update `search-type > asymmetric > encoder` to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See the diff of `khoj.yml` below for illustration:
```diff
 asymmetric:
-  encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+  encoder: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
   cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
   model_directory: "~/.khoj/search/asymmetric/"
```
2. Regenerate your content index, for example by opening [\<khoj-url\>/api/update?force=true](http://localhost:8000/api/update?force=true)
## Miscellaneous
### Set your OpenAI API key in Khoj
If you want, Khoj can be configured to use OpenAI for search and chat.<br />
Add your OpenAI API key to Khoj using either of the two options below:
- Open the Khoj desktop GUI, add your [OpenAI API key](https://beta.openai.com/account/api-keys) and click *Configure*
  Ensure khoj is started without the `--no-gui` flag. Check your system tray to see if Khoj 🦅 is minimized there.
- Set the `openai-api-key` field under the `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
```diff
 processor:
   conversation:
-    openai-api-key: # "YOUR_OPENAI_API_KEY"
+    openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
     model: "text-davinci-003"
     conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
```
**Warning**: *This will enable khoj to send your query and note(s) to OpenAI for processing*
### Beta API
- The beta [chat](http://localhost:8000/api/beta/chat), [summarize](http://localhost:8000/api/beta/summarize) and [search](http://localhost:8000/api/beta/search) API endpoints use the [OpenAI API](https://openai.com/api/)
- They are disabled by default
- To use them:
  1. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
  2. Interact with them from the [Khoj Swagger docs](http://localhost:8000/docs)[^2]
## Performance
### Query performance
- Semantic search using the bi-encoder is fairly fast at \<50 ms
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed against accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
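The `top_k` trade-off comes from the two-stage retrieve-and-rerank design: the fast bi-encoder scores every entry, and the slower cross-encoder only rescores the `top_k` best candidates. A toy Python sketch of that shape, where `word_overlap` and `phrase_bonus` are hypothetical stand-ins for the real bi-encoder and cross-encoder models:

```python
import heapq
from typing import Callable, List


def retrieve_and_rerank(query: str, entries: List[str],
                        cheap_score: Callable[[str, str], float],
                        expensive_score: Callable[[str, str], float],
                        top_k: int = 15) -> List[str]:
    # Stage 1: score every entry with the fast model and keep the top_k candidates
    candidates = heapq.nlargest(top_k, entries, key=lambda e: cheap_score(query, e))
    # Stage 2: rerank only those candidates with the slow model.
    # This stage's cost grows linearly with top_k, which is the speed/accuracy knob.
    return sorted(candidates, key=lambda e: expensive_score(query, e), reverse=True)


def word_overlap(query: str, entry: str) -> float:
    """Hypothetical bi-encoder stand-in: fraction of query words found in the entry."""
    q = set(query.lower().split())
    return len(q & set(entry.lower().split())) / max(len(q), 1)


def phrase_bonus(query: str, entry: str) -> float:
    """Hypothetical cross-encoder stand-in: word overlap plus a bonus for the exact phrase."""
    return word_overlap(query, entry) + (1.0 if query.lower() in entry.lower() else 0.0)
```

A larger `top_k` gives the expensive model more chances to promote a relevant entry the cheap model ranked low, at the cost of more slow scoring calls.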
### Indexing performance
- Indexing time is more strongly impacted by the size of the source data than query time
- Indexing a 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Note: *It should only take this long on the first run*, as the index is incrementally updated afterwards
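One common way to make re-indexing incremental is to key each entry by a hash of its content and only embed entries whose hash is new. The Python sketch below illustrates that general technique; it is an assumption, not a description of Khoj's indexer, and `fake_embed` is a hypothetical stand-in for the embedding model.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def update_index(entries: List[str], index: Dict[str, Vector],
                 embed: Callable[[List[str]], List[Vector]]) -> Tuple[Dict[str, Vector], int]:
    """Return the updated index and the number of entries that had to be embedded."""
    current = {content_hash(e): e for e in entries}  # identical entries collapse to one key
    new = {h: e for h, e in current.items() if h not in index}
    vectors = embed(list(new.values())) if new else []
    fresh = dict(zip(new.keys(), vectors))
    # Reuse stored vectors for unchanged entries; edited or deleted entries drop out
    updated = {h: index[h] if h in index else fresh[h] for h in current}
    return updated, len(new)


def fake_embed(texts: List[str]) -> List[Vector]:
    """Hypothetical embedding model: a one-dimensional vector per text."""
    return [[float(len(t))] for t in texts]
```

Only the first run pays for embedding the whole corpus; later runs embed just the added or edited entries.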
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search and indexing on a GPU have not been tested yet
## Development
### Visualize Codebase
*[Interactive Visualization](https://mango-dune-07a8b7110.1.azurestaticapps.net/?repo=debanjum%2Fkhoj)*
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_codebase_visualization_0.2.1.png?)
### Setup
#### Using Pip
##### 1. Install
```shell
git clone https://github.com/debanjum/khoj && cd khoj
python3 -m venv .venv && source .venv/bin/activate
pip install -e .[dev]
```
##### 2. Run
1. Start Khoj
```shell
khoj -vv
```
2. Configure Khoj
- **Via GUI**: Add files, directories to index in the GUI window that pops up on starting Khoj, then click *Configure*
- **Manually**:
  - Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
  - Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
  - Set `input-directories` field in `image` `content-type` section
  - Delete `content-type` and `processor` sub-section(s) irrelevant for your use-case
  - Restart khoj

Note: After configuring, wait for khoj to load the ML models, generate embeddings and expose the API to query the notes, images, transactions etc. specified in the config YAML
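When editing `~/.khoj/khoj.yml` by hand, a likely mistake is a content type with neither `input-files` nor `input-filter` set (for `image`: neither `input-directories` nor `input-filter`), which the comments in `config/khoj_sample.yml` mark as required. A small Python check over the already-parsed config, assuming you load the YAML into a dict yourself (e.g. with PyYAML, not shown, to keep the sketch to the standard library). `missing_inputs` is a hypothetical helper, not part of Khoj:

```python
from typing import Dict, List

# At least one of these keys must be set per content type (from the sample config comments)
REQUIRED_ONE_OF = {"image": ("input-directories", "input-filter")}
DEFAULT_REQUIRED = ("input-files", "input-filter")


def missing_inputs(config: Dict) -> List[str]:
    """Return the content types that have no input source configured."""
    problems = []
    for name, section in (config.get("content-type") or {}).items():
        keys = REQUIRED_ONE_OF.get(name, DEFAULT_REQUIRED)
        # A YAML section with no children parses as None, so treat it as empty
        if not any((section or {}).get(key) for key in keys):
            problems.append(name)
    return problems


if __name__ == "__main__":
    example = {
        "content-type": {
            "org": {"input-files": None, "input-filter": ["~/notes/*.org"]},
            "markdown": {"input-files": None, "input-filter": None},
            "image": {"input-directories": ["~/Pictures/"]},
        }
    }
    print(missing_inputs(example))
```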
#### Using Docker
##### 1. Clone
```shell
git clone https://github.com/debanjum/khoj && cd khoj
```
##### 2. Configure
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes and beancount directories
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
##### 3. Run
```shell
docker-compose up -d
```
*Note: The first run will take time. Let it run; it's most likely not hung, just generating embeddings*
##### 4. Upgrade
```shell
docker-compose build --pull
```
#### Using Conda
##### 1. Install Dependencies
- [Install Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
##### 2. Install Khoj
```shell
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
python3 -m pip install pyqt6 # As conda does not support pyqt6 yet
```
##### 3. Configure
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type`, `processor` sub-sections irrelevant for your use-case
##### 4. Run
```shell
python3 -m src.khoj.main -vv
```
This loads the ML models, generates embeddings and exposes the API to query the notes, images, transactions etc. specified in the config YAML
##### 5. Upgrade
```shell
cd khoj
git pull origin master
conda deactivate
conda env update -f config/environment.yml
conda activate khoj
```
### Validate
#### Before Making Changes
1. Install Git Hooks for Validation
```shell
pre-commit install -t pre-push -t pre-commit
```
- This ensures standard code formatting fixes and other checks run automatically on every commit and push
- Note 1: If [pre-commit](https://pre-commit.com/#intro) didn't already get installed, [install it](https://pre-commit.com/#install) via `pip install pre-commit`
- Note 2: To run the pre-commit checks manually, use `pre-commit run --hook-stage manual --all-files` before creating a PR
#### Before Creating PR
1. Run Tests
```shell
pytest
```
2. Run MyPy to check types
```shell
mypy --config-file pyproject.toml
```
#### After Creating PR
- Automated [validation workflows](.github/workflows) run for every PR.
  Ensure any issues they report are fixed
- Test the python package created for a PR
  1. Download and extract the zipped `.whl` artifact generated from the pypi workflow run for the PR.
  2. Install (in your virtualenv) with `pip install /path/to/download*.whl`
  3. Start and use the application to see if it works fine
## Credits
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
[^2]: Default Khoj url @ http://localhost:8000
Made with [contrib.rocks](https://contrib.rocks).


@@ -1,22 +0,0 @@
name: khoj
channels:
- conda-forge
dependencies:
- python=3.8.*
- numpy=1.22.4
- pytorch=1.13.1
- torchvision=0.14.1
- transformers=4.21.0
- sentence-transformers=2.1.0
- fastapi=0.77.1
- uvicorn=0.17.6
- pyyaml=6.0
- pytest=7.1.2
- pillow=9.3.0
- openai=0.20.0
- pydantic=1.9.1
- jinja2=3.1.2
- aiofiles=0.8.0
- huggingface_hub=0.8.1
- dateparser=1.1.1
- schedule=1.1.0


@@ -1,116 +0,0 @@
name: khoj
channels:
- conda-forge
dependencies:
- aiofiles=0.8.0=pyhd8ed1ab_0
- asgiref=3.4.1=pyhd8ed1ab_0
- attrs=21.2.0=pyhd8ed1ab_0
- brotlipy=0.7.0=py39h5161555_1001
- ca-certificates=2022.6.15=h4653dfc_0
- certifi=2022.6.15=py39h2804cbe_0
- cffi=1.14.6=py39hda8b47f_0
- chardet=4.0.0=py39h2804cbe_1
- charset-normalizer=2.0.0=pyhd8ed1ab_0
- click=8.0.1=py39h2804cbe_0
- colorama=0.4.4=pyh9f0ad1d_0
- cryptography=3.4.7=py39h73257c9_0
- dataclasses=0.8=pyhc8e2a94_3
- dateparser=1.1.1=pyhd8ed1ab_0
- et_xmlfile=1.0.1=py_1001
- fastapi=0.68.2=pyhd8ed1ab_0
- filelock=3.0.12=pyh9f0ad1d_0
- freetype=2.10.4=h17b34a0_1
- future=0.18.2=py39h2804cbe_3
- h11=0.12.0=pyhd8ed1ab_0
- huggingface_hub=0.2.1=pyhd8ed1ab_0
- idna=3.1=pyhd3deb0d_0
- importlib-metadata=4.6.4=py39h2804cbe_0
- importlib_metadata=4.6.4=hd8ed1ab_0
- iniconfig=1.1.1=pyh9f0ad1d_0
- jbig=2.1=h3422bc3_2003
- jinja2=3.0.3=pyhd8ed1ab_0
- joblib=1.0.1=pyhd8ed1ab_0
- jpeg=9d=h27ca646_0
- lcms2=2.12=had6a04f_0
- lerc=2.2.1=h9f76cd9_0
- libblas=3.9.0=11_osxarm64_openblas
- libcblas=3.9.0=11_osxarm64_openblas
- libcxx=12.0.1=h168391b_0
- libdeflate=1.7=h27ca646_5
- libffi=3.3=h9f76cd9_2
- libgfortran=5.0.0.dev0=11_0_1_hf114ba7_23
- libgfortran5=11.0.1.dev0=hf114ba7_23
- liblapack=3.9.0=11_osxarm64_openblas
- libopenblas=0.3.17=openmp_h5dd58f0_1
- libpng=1.6.37=hf7e6567_2
- libprotobuf=3.16.0=hccf11d3_0
- libtiff=4.3.0=hc6122e1_1
- libwebp-base=1.2.1=h3422bc3_0
- llvm-openmp=12.0.1=hf3c4609_1
- lz4-c=1.9.3=hbdafb3b_1
- markupsafe=2.0.1=py39h5161555_1
- more-itertools=8.8.0=pyhd8ed1ab_0
- ncurses=6.2=h9aa5885_4
- ninja=1.10.2=h4d860bb_0
- nltk=3.6.2=pyhd8ed1ab_0
- numpy=1.21.4=py39h1f3b974_0
- olefile=0.46=pyh9f0ad1d_1
- openai=0.11.4=py39h2804cbe_0
- openjpeg=2.4.0=h062765e_1
- openpyxl=3.0.9=pyhd8ed1ab_0
- openssl=1.1.1q=ha287fd2_0
- packaging=21.0=pyhd8ed1ab_0
- pandas=1.3.4=py39h7f752ed_1
- pandas-stubs=1.2.0.38=py39h2804cbe_0
- pillow=8.3.2=py39ha74c66e_0
- pip=21.2.4=pyhd8ed1ab_0
- pluggy=0.13.1=py39h2804cbe_4
- py=1.10.0=pyhd3deb0d_0
- pycparser=2.20=pyh9f0ad1d_2
- pydantic=1.8.2=py39h5161555_2
- pyopenssl=20.0.1=pyhd8ed1ab_0
- pyparsing=2.4.7=pyh9f0ad1d_0
- pysocks=1.7.1=py39h2804cbe_3
- pytest=6.2.5=py39h2804cbe_1
- python=3.9.7=h54d631c_3_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-tzdata=2022.1=pyhd8ed1ab_0
- python_abi=3.9=2_cp39
- pytorch=1.9.0=cpu_py39he8fdc14_2
- pytorch-cpu=1.9.0=cpu_py39hd610c6a_2
- pytz=2021.3=pyhd8ed1ab_0
- pytz-deprecation-shim=0.1.0.post0=py39h2804cbe_2
- pyyaml=5.4.1=py39h5161555_1
- readline=8.1=hedafd6a_0
- regex=2021.8.21=py39h5161555_0
- requests=2.26.0=pyhd8ed1ab_0
- sacremoses=0.0.43=pyh9f0ad1d_0
- scikit-learn=0.24.2=py39hef7049f_1
- scipy=1.7.0=py39h5060c3b_0
- sentence-transformers=2.1.0=pyhd8ed1ab_0
- sentencepiece=0.1.95=py39h4d2d688_1
- setuptools=57.4.0=py39h2804cbe_0
- six=1.16.0=pyh6c4a22f_0
- sleef=3.5.1=h27ca646_1
- sqlite=3.36.0=h72a2b83_0
- starlette=0.14.2=pyhd8ed1ab_0
- threadpoolctl=2.2.0=pyh8a188c0_0
- tk=8.6.11=he1e0b03_0
- tokenizers=0.10.3=py39hab32027_1
- toml=0.10.2=pyhd8ed1ab_0
- torchvision=0.10.1=py39h0a40b5a_0_cpu
- tqdm=4.62.1=pyhd8ed1ab_0
- transformers=4.14.1=pyhd8ed1ab_0
- typing-extensions=3.10.0.0=hd8ed1ab_0
- typing_extensions=3.10.0.0=pyha770c72_0
- tzdata=2021a=he74cb21_1
- tzlocal=4.2=py39h2804cbe_1
- urllib3=1.26.6=pyhd8ed1ab_0
- uvicorn=0.16.0=py39h2804cbe_0
- wheel=0.37.0=pyhd8ed1ab_1
- xz=5.2.5=h642e427_1
- yaml=0.2.5=h642e427_0
- zipp=3.5.0=pyhd8ed1ab_0
- zlib=1.2.11=h31e879b_1009
- zstd=1.5.0=h861e0a7_0
prefix: /opt/homebrew/Caskroom/miniforge/base/envs/khoj


@@ -1,55 +0,0 @@
content-type:
  # The /data/folder/ prefix to the folders is here because this is
  # the directory to which the local files are copied in the docker-compose.
  # If changing, the docker-compose volumes should also be changed to match.
  org:
    input-files: null
    input-filter: ["/data/org/**/*.org"]
    compressed-jsonl: "/data/embeddings/notes.jsonl.gz"
    embeddings-file: "/data/embeddings/note_embeddings.pt"
    index_heading_entries: false
  markdown:
    input-files: null
    input-filter: ["/data/markdown/**/*.md"]
    compressed-jsonl: "/data/embeddings/markdown.jsonl.gz"
    embeddings-file: "/data/embeddings/markdown_embeddings.pt"
  ledger:
    input-files: null
    input-filter: ["/data/ledger/**/*.beancount"]
    compressed-jsonl: /data/embeddings/transactions.jsonl.gz
    embeddings-file: /data/embeddings/transaction_embeddings.pt
  image:
    input-directories: ["/data/images/"]
    embeddings-file: "/data/embeddings/image_embeddings.pt"
    batch-size: 50
    use-xmp-metadata: false
  music:
    input-files: ["/data/music/music.org"]
    input-filter: null
    compressed-jsonl: "/data/embeddings/songs.jsonl.gz"
    embeddings-file: "/data/embeddings/song_embeddings.pt"
search-type:
  symmetric:
    encoder: "sentence-transformers/all-MiniLM-L6-v2"
    cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    model_directory: "/data/models/symmetric"
  asymmetric:
    encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    model_directory: "/data/models/asymmetric"
  image:
    encoder: "sentence-transformers/clip-ViT-B-32"
    model_directory: "/data/models/image_encoder"
processor:
  #conversation:
  #  openai-api-key: null
  #  model: "text-davinci-003"
  #  conversation-logfile: "/data/embeddings/conversation_logs.json"


@@ -1,56 +0,0 @@
content-type:
  org:
    input-files: # ["/path/to/org-file.org"] REQUIRED IF input-filter IS NOT SET OR
    input-filter: # ["/path/to/org/*.org"] REQUIRED IF input-files IS NOT SET
    compressed-jsonl: "~/.khoj/content/org/org.jsonl.gz"
    embeddings-file: "~/.khoj/content/org/org_embeddings.pt"
    index_heading_entries: false # Set to true to index entries with empty body
  markdown:
    input-files: # ["/path/to/markdown-file.md"] REQUIRED IF input-filter IS NOT SET OR
    input-filter: # ["/path/to/markdown/*.md"] REQUIRED IF input-files IS NOT SET
    compressed-jsonl: "~/.khoj/content/markdown/markdown.jsonl.gz"
    embeddings-file: "~/.khoj/content/markdown/markdown_embeddings.pt"
  ledger:
    input-files: # ["/path/to/ledger-file.beancount"] REQUIRED IF input-filter is not set OR
    input-filter: # ["/path/to/ledger/*.beancount"] REQUIRED IF input-files is not set
    compressed-jsonl: "~/.khoj/content/ledger/ledger.jsonl.gz"
    embeddings-file: "~/.khoj/content/ledger/ledger_embeddings.pt"
  image:
    input-directories: # ["/path/to/images/"] REQUIRED IF input-filter IS NOT SET OR
    input-filter: # ["/path/to/images/*.jpg"] REQUIRED IF input-directories IS NOT SET
    embeddings-file: "~/.khoj/content/image/image_embeddings.pt"
    batch-size: 50
    use-xmp-metadata: false
  music:
    input-files: # ["/path/to/music-file.org"] REQUIRED IF input-filter IS NOT SET OR
    input-filter: # ["/path/to/music/*.org"] REQUIRED IF input-files IS NOT SET
    compressed-jsonl: "~/.khoj/content/music/music.jsonl.gz"
    embeddings-file: "~/.khoj/content/music/music_embeddings.pt"
search-type:
  symmetric:
    encoder: "sentence-transformers/all-MiniLM-L6-v2"
    cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    encoder-type: sentence_transformers.SentenceTransformer
    model_directory: "~/.khoj/search/symmetric/"
  asymmetric:
    encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    encoder-type: sentence_transformers.SentenceTransformer
    model_directory: "~/.khoj/search/asymmetric/"
  image:
    encoder: "sentence-transformers/clip-ViT-B-32"
    encoder-type: sentence_transformers.SentenceTransformer
    model_directory: "~/.khoj/search/image/"
processor:
  conversation:
    openai-api-key: # "YOUR_OPENAI_API_KEY"
    model: "text-davinci-003"
    conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"


@@ -1,29 +1,54 @@
version: "3.9"
services:
database:
image: ankane/pgvector
ports:
- "5432:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
volumes:
- khoj_db:/var/lib/postgresql/data/
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 30s
timeout: 10s
retries: 5
server:
image: ghcr.io/debanjum/khoj:latest
depends_on:
database:
condition: service_healthy
# Use the following line to use the latest version of khoj. Otherwise, it will build from source.
image: ghcr.io/khoj-ai/khoj:latest
# Uncomment the following line to build from source. This will take a few minutes. Comment the next two lines out if you want to use the official image.
# build:
# context: .
ports:
# If changing the local port (left hand side), no other changes required.
# If changing the remote port (right hand side),
# change the port in the args in the build section,
# as well as the port in the command section to match
- "8000:8000"
- "42110:42110"
working_dir: /app
volumes:
- .:/app
# These mounted volumes hold the raw data that should be indexed for search.
# The path in your local directory (left hand side)
# points to the files you want to index.
# The path of the mounted directory (right hand side),
# must match the path prefix in your config file.
- ./tests/data/org/:/data/org/
- ./tests/data/images/:/data/images/
- ./tests/data/ledger/:/data/ledger/
- ./tests/data/music/:/data/music/
- ./tests/data/markdown/:/data/markdown/
# Embeddings and models are populated after the first run
# You can set these volumes to point to empty directories on host
- ./tests/data/embeddings/:/data/embeddings/
- ./tests/data/models/:/data/models/
- khoj_config:/root/.khoj/
- khoj_models:/root/.cache/torch/sentence_transformers
# Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/
command: --no-gui --host="0.0.0.0" --port=8000 -c=config/khoj_docker.yml -vv
environment:
- POSTGRES_DB=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_HOST=database
- POSTGRES_PORT=5432
- KHOJ_DJANGO_SECRET_KEY=secret
- KHOJ_DEBUG=True
- KHOJ_ADMIN_EMAIL=username@example.com
- KHOJ_ADMIN_PASSWORD=password
command: --host="0.0.0.0" --port=42110 -vv --anonymous-mode
volumes:
khoj_config:
khoj_db:
khoj_models:

documentation/.gitignore

@@ -0,0 +1,20 @@
# Dependencies
/node_modules
# Production
/build
# Generated files
.docusaurus
.cache-loader
# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local
npm-debug.log*
yarn-debug.log*
yarn-error.log*

documentation/README.md

@@ -0,0 +1,41 @@
# Website
This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
### Installation
```
$ yarn
```
### Local Development
```
$ yarn start
```
This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
### Build
```
$ yarn build
```
This command generates static content into the `build` directory and can be served using any static contents hosting service.
### Deployment
Using SSH:
```
$ USE_SSH=true yarn deploy
```
Not using SSH:
```
$ GIT_USER=<Your GitHub username> yarn deploy
```
If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.

@@ -0,0 +1,3 @@
module.exports = {
presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};


@@ -0,0 +1,8 @@
{
"label": "Clients",
"position": 4,
"link": {
"type": "generated-index",
"description": "Different ways for indexing data with the Khoj backend"
}
}


@@ -0,0 +1,32 @@
---
sidebar_position: 1
---
# Desktop
> Query your Second Brain from your machine
Use the Desktop app to chat and search with Khoj.
You can also sync any relevant files with Khoj using the app.
Khoj will use these files to provide contextual responses when you search or chat.
## Features
- **Chat**
  - **Faster answers**: Find answers quickly, from your private notes or the public internet
  - **Assisted creativity**: Smoothly weave across retrieving answers and generating content
  - **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search**
  - **Natural**: Advanced natural language understanding using Transformer based ML Models
  - **Incremental**: Incremental search for a fast, search-as-you-type experience
## Setup
1. Install the [Khoj Desktop app](https://khoj.dev/downloads) for your OS
2. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
3. Set your Khoj API Key on the *Settings* page of the Khoj Desktop app
4. [Optional] Add any files, folders you'd like Khoj to be aware of on the *Settings* page and Click *Save*
## Interface
| Chat | Search |
|:----:|:------:|
| ![](/img/khoj_chat_on_desktop.png) | ![](/img/khoj_search_on_desktop.png) |


@@ -0,0 +1,137 @@
---
sidebar_position: 2
---
# Emacs
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="130" alt="Melpa Stable Badge" />
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge" />
<img src="https://github.com/khoj-ai/khoj/actions/workflows/build_khoj_el.yml/badge.svg" width="150" alt="Build Badge" />
<img src="https://github.com/khoj-ai/khoj/actions/workflows/test_khoj_el.yml/badge.svg" width="150" alt="Test Badge" />
<br />
<br />
> Query your Second Brain from Emacs
## Features
- **Chat**
  - **Faster answers**: Find answers quickly, from your private notes or the public internet
  - **Assisted creativity**: Smoothly weave across retrieving answers and generating content
  - **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search**
  - **Natural**: Advanced natural language understanding using Transformer based ML Models
  - **Incremental**: Incremental search for a fast, search-as-you-type experience
## Interface
| Search | Chat |
|:------:|:----:|
| ![khoj search on emacs](/img/khoj_search_on_emacs.png) | ![khoj chat on emacs](/img/khoj_chat_on_emacs.png) |
## Setup
1. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
2. Add one of the snippets below to your Emacs config file, usually at `~/.emacs.d/init.el`
#### **Direct Install**
*Khoj will index your org-agenda files, by default*
```elisp
;; Install Khoj.el first by running: M-x package-install RET khoj RET
;; Set your Khoj API key
(setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY")
```
#### **Minimal Install**
*Khoj will index your org-agenda files, by default*
```elisp
;; Install Khoj client from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj)
:config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"))
```
#### **Standard Install**
*Configures the specified org files, directories to be indexed by Khoj*
```elisp
;; Install Khoj client from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj)
:config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"
khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")))
```
#### **Straight.el**
*Configures the specified org files, directories to be indexed by Khoj*
```elisp
;; Install Khoj client using Straight.el
(use-package khoj
:after org
:straight (khoj :type git :host github :repo "khoj-ai/khoj" :files (:defaults "src/interface/emacs/khoj.el"))
:bind ("C-c s" . 'khoj)
:config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"
khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")))
```
## Use
### Search
See [Khoj Search](/features/search) for details
1. Hit `C-c s s` (or `M-x khoj RET s`) to open khoj search
2. Enter your query in natural language<br/>
E.g. *"What is the meaning of life?"*, *"My life goals for 2023"*
### Chat
See [Khoj Chat](/features/chat) for details
1. Hit `C-c s c` (or `M-x khoj RET c`) to open khoj chat
2. Ask questions in a natural, conversational style<br/>
E.g. *"When did I file my taxes last year?"*
### Find Similar Entries
This feature finds entries similar to the one you are currently on.
1. Move cursor to the org-mode entry, markdown section or text paragraph you want to find similar entries for
2. Hit `C-c s f` (or `M-x khoj RET f`) to find similar entries
### Advanced Usage
- Add [query filters](https://github.com/khoj-ai/khoj/#query-filters) during search to narrow down results further
e.g `What is the meaning of life? -"god" +"none" dt>"last week"`
- Use `C-c C-o 2` to open the current result at cursor in its source org file
- This calls `M-x org-open-at-point` on the current entry and opens the second link in the entry.
- The second link is the entry's [org-id](https://orgmode.org/manual/Handling-Links.html#FOOT28), if set, or the heading text.
The first link is the line number of the entry in the source file. This link is less robust to file changes.
- Note: If you have [speed keys](https://orgmode.org/manual/Speed-Keys.html) enabled, `o 2` will also work
### Khoj Menu
![](/img/khoj_emacs_menu.png)
Hit `C-c s` (or `M-x khoj`) to open the khoj menu above. Then:
- Hit `t` until your preferred content type is selected in the khoj menu
`Content Type` specifies the content to perform `Search`, `Update` or `Find Similar` actions on
- Hit `n` twice and then enter number of results you want to see
`Results Count` is used by the `Search` and `Find Similar` actions
- Hit `-f u` to `force` update the khoj content index
The `Force Update` switch is only used by the `Update` action
## Upgrade
Use your Emacs package manager to upgrade `khoj.el`
<!-- tabs:start -->
#### **With MELPA**
1. Run `M-x package-refresh-contents`
2. Run `M-x package-reinstall khoj`
#### **With Straight.el**
- Run `M-x straight-pull-package khoj`
<!-- tabs:end -->

@@ -0,0 +1,59 @@
---
sidebar_position: 3
---
# Obsidian
> Query your Second Brain from Obsidian
## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Incremental**: Incremental search for a fast, search-as-you-type experience
## Interface
| Search | Chat |
|:------:|:----:|
| ![](/img/khoj_search_on_obsidian.png) | ![](/img/khoj_chat_on_obsidian.png) |
## Setup
1. Open [Khoj](https://obsidian.md/plugins?id=khoj) from the *Community plugins* tab in Obsidian settings panel
2. Click *Install*, then *Enable* on the Khoj plugin page in Obsidian
3. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
4. Set your Khoj API Key in the Khoj plugin settings in Obsidian
See the official [Obsidian Plugin Docs](https://help.obsidian.md/Extending+Obsidian/Community+plugins) for more details on installing Obsidian plugins.
## Use
### Chat
Run *Khoj: Chat* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) and ask questions in a natural, conversational style.<br />
E.g. *"When did I file my taxes last year?"*
See [Khoj Chat](/features/chat) for more details
### Find Similar Notes
To see other notes similar to the current one, run *Khoj: Find Similar Notes* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
### Search
Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or run *Khoj: Search* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
See [Khoj Search](/features/search) for more details. Use [query filters](/miscellaneous/advanced#query-filters) to limit entries to search
[search_demo](https://user-images.githubusercontent.com/6413477/218801155-cd67e8b4-a770-404a-8179-d6b61caa0f93.mp4 ':include :type=mp4')
## Upgrade
1. Open *Community plugins* tab in Obsidian settings
2. Click the *Check for updates* button
3. Click the *Update* button next to Khoj, if available
## Troubleshooting
- Open the Khoj plugin settings pane to configure Khoj
- Toggle Enable/Disable Khoj, if setting changes have not applied
- Click *Update* button to force index to refresh, if results are failing or stale

@@ -0,0 +1,27 @@
---
sidebar_position: 4
---
# Web
> Query your Second Brain from your Web Browser
Without any desktop clients, you can start chatting with Khoj on the web. Bear in mind you do need one of the desktop clients in order to share and sync your data with Khoj.
## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Incremental**: Incremental search for a fast, search-as-you-type experience
## Setup
No setup required. The Khoj web app is the default interface to Khoj. You can access it from any web browser. Try it on [Khoj Cloud](https://app.khoj.dev)
## Interface
| Search | Chat |
|:------:|:----:|
| ![](/img/khoj_search_on_web.png) | ![](/img/khoj_chat_on_web.png) |

@@ -0,0 +1,28 @@
---
sidebar_position: 5
---
# WhatsApp
> Query your Second Brain from WhatsApp
Text [+1 (848) 800 4242](https://wa.me/18488004242) or scan [this QR code](https://khoj.dev/whatsapp) on your phone to chat with Khoj on WhatsApp.
Without any desktop clients, you can start chatting with Khoj on WhatsApp. Bear in mind you do need one of the desktop clients in order to share and sync your data with Khoj. The WhatsApp AI bot will work right away for answering generic queries and using Khoj in default mode.
In order to use Khoj on WhatsApp with your own data, you need to set up a Khoj Cloud account and connect your WhatsApp account to it. This is a one-time setup and you can do it from the [Khoj Cloud config page](https://app.khoj.dev/config).
If you hit usage limits for the WhatsApp bot, upgrade to [a paid plan](https://khoj.dev/pricing) on Khoj Cloud.
## Features
- **Slash Commands**: Use slash commands to quickly access Khoj features
- `/online`: Get responses from Khoj powered by online search.
- `/dream`: Generate an image in response to your prompt.
- `/notes`: Explicitly force Khoj to retrieve context from your notes. Note: You'll need to connect your WhatsApp account to a Khoj Cloud account for this to work.
We have more commands under development, including `/share` for uploading documents directly to your Khoj account from WhatsApp, and `/speak` to get a speech response from Khoj. Feel free to [raise an issue](https://github.com/khoj-ai/flint/issues) if you have any suggestions for new commands.
## Nerdy Details
You can find all of the code for the WhatsApp bot in the [flint repository](https://github.com/khoj-ai/flint). Like all of our code, it is open source and you can contribute to it.

@@ -0,0 +1,8 @@
{
"label": "Contributing",
"position": 2,
"link": {
"type": "generated-index",
"description": "Development Setup"
}
}

@@ -0,0 +1,181 @@
---
sidebar_position: 0
---
# Development
Welcome to the development docs of Khoj! Thanks for your interest in becoming a contributor ❤️. Open source contributors are a cornerstone of the Khoj community. We welcome all contributions, big or small.
To get started with contributing, check out the official GitHub docs on [contributing to an open-source project](https://docs.github.com/en/get-started/exploring-projects-on-github/contributing-to-a-project).
Join the [Discord](https://discord.gg/WaxF3SkFPU) server and click the ✅ for the question "Are you interested in becoming a contributor?" in the `#welcome-and-rules` channel. This will give you access to the `#contributors` channel where you can ask questions and get help from other contributors.
If you're looking for a place to get started, check out the list of [Github Issues](https://github.com/khoj-ai/khoj/issues) with the tag `good first issue` to find issues that are good for first-time contributors.
## Local Server Installation
### Using Pip
#### 1. Install
```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
```
```mdx-code-block
<Tabs>
<TabItem value="macos" label="MacOS">
```shell
# Get Khoj Code
git clone https://github.com/khoj-ai/khoj && cd khoj
# Create, Activate Virtual Environment
python3 -m venv .venv && source .venv/bin/activate
# Install Khoj for Development (quoted so zsh, the default MacOS shell, doesn't expand the brackets)
pip install -e '.[dev]'
```
</TabItem>
<TabItem value="win" label="Windows">
```shell
# Get Khoj Code
git clone https://github.com/khoj-ai/khoj && cd khoj
# Create, Activate Virtual Environment
python3 -m venv .venv && .venv\Scripts\activate
# Install Khoj for Development
pip install -e .[dev]
```
</TabItem>
<TabItem value="unix" label="Linux">
```shell
# Get Khoj Code
git clone https://github.com/khoj-ai/khoj && cd khoj
# Create, Activate Virtual Environment
python3 -m venv .venv && source .venv/bin/activate
# Install Khoj for Development
pip install -e .[dev]
```
</TabItem>
</Tabs>
```
#### 2. Run
1. Start Khoj
```bash
khoj -vv
```
2. Configure Khoj
- **Via the Desktop application**: Add files, directories to index using the settings page of your desktop application. Click "Save" to immediately trigger indexing.
Note: After configuring, wait for Khoj to load its ML model, generate embeddings and expose the API to query the notes, images, documents etc. you configured.
### Using Docker
Make sure you install the latest version of [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/).
#### 1. Clone
```shell
git clone https://github.com/khoj-ai/khoj && cd khoj
```
#### 2. Configure
1. Update [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to use relevant environment variables.
2. Comment out the `image` line and uncomment the `build` line in the `server` service, so Docker builds the server image from your local code
#### 3. Run
This will start the Khoj server and the database.
```shell
docker-compose up -d
```
#### 4. Upgrade
If you've made changes to the codebase, you'll need to rebuild the Docker image before running the container again.
```shell
# Rebuild the server image with your changes, then restart the containers
docker-compose build --no-cache
docker-compose up -d
```
## Update clients
In whichever clients you're using for testing, you'll need to update the server URL to point to your local server. By default, the local server URL should be `http://127.0.0.1:42110`.
## Validate
### Before Making Changes
1. Install Git Hooks for Validation
```shell
pre-commit install -t pre-push -t pre-commit
```
- This ensures standard code formatting fixes and other checks run automatically on every commit and push
- Note 1: If [pre-commit](https://pre-commit.com/#intro) didn't already get installed, [install it](https://pre-commit.com/#install) via `pip install pre-commit`
- Note 2: To run the pre-commit changes manually, use `pre-commit run --hook-stage manual --all` before creating PR
### Before Creating PR
:::tip[Note]
You should be in an active virtual environment for Khoj in order to run the unit tests and linter.
:::
1. Ensure that you have a [Github Issue](https://github.com/khoj-ai/khoj/issues) that can be linked to the PR. If not, create one. Make sure you've tagged one of the maintainers on the issue, so they are notified of the PR and can review it. It's best to discuss the code design on an existing issue or Discord thread before creating a PR. This helps get your PR merged faster.
2. Run unit tests.
```shell
pytest
```
3. Run the linter.
```shell
mypy
```
4. Think about how to add unit tests to verify the functionality you're adding in the PR. If you're not sure how to do this, ask for help in the Github issue or on Discord's `#contributors` channel.
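
If you run these checks often, a small shell helper can bundle them. This is a minimal sketch, not a script shipped in the Khoj repository: the `run_pr_checks` name is hypothetical, and it assumes `pre-commit`, `pytest` and `mypy` are installed in your active virtual environment. It checks the `VIRTUAL_ENV` variable that a Python venv's activate script exports, then runs the same commands as the steps above and stops at the first failure.

```shell
# Hypothetical helper: run the pre-PR checks in order, stopping at the first failure.
run_pr_checks() {
  # Python's venv activate script exports VIRTUAL_ENV; bail out if it isn't set
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "Activate the Khoj virtual environment first, e.g. source .venv/bin/activate" >&2
    return 1
  fi
  pre-commit run --hook-stage manual --all || return 1  # formatting and other hooks
  pytest || return 1                                    # unit tests
  mypy || return 1                                      # type checks
  echo "All checks passed"
}
```

Run `run_pr_checks` from the root of your Khoj checkout. Stopping early keeps the first failing check's output at the bottom of your terminal.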
### After Creating PR
1. Automated [validation workflows](https://github.com/khoj-ai/khoj/tree/master/.github/workflows) should run for every PR. Tag one of the maintainers in the PR to trigger it.
## Obsidian Plugin Development
### Plugin development setup
The core code for the Obsidian plugin is under `src/interface/obsidian`. The file `main.ts` is a good place to start.
1. In your CLI, go to the directory `src/interface/obsidian` in the Khoj repository.
2. Run `yarn install` to install the dependencies.
3. Run `yarn dev` to start the development server. This will continually rebuild the plugin as you make changes to the code.
- The build output is written to a file called `main.js` in the `obsidian` directory.
### Loading your development plugin in Obsidian
1. Make sure you have the Khoj plugin installed in Obsidian. [See the plugin page](https://publish.obsidian.md/hub/02+-+Community+Expansions/02.05+All+Community+Expansions/Plugins/khoj).
2. Open Obsidian and go to your settings (gear icon in the bottom left corner)
3. Click on 'Community Plugins' in the left panel
4. Next to the 'Installed Plugins' heading, click on the folder icon to open your vault's plugins folder.
5. Open the `khoj` folder in the file explorer that opens. You'll see a file called `main.js` in this folder. To test your changes, replace this file with the `main.js` file that was generated by the development server in the previous section.
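
The manual copy in the last step can be scripted. This is a minimal sketch, assuming your vault uses Obsidian's default `.obsidian` config folder and the plugin is installed in a `khoj` folder under `.obsidian/plugins`; the `install_khoj_dev_plugin` helper is hypothetical. It backs up the installed `main.js` first, so you can restore the released build after testing.

```shell
# Hypothetical helper: copy a development build of the Khoj Obsidian plugin into a vault.
# Assumes the vault uses the default ".obsidian" config folder and the plugin folder is "khoj".
install_khoj_dev_plugin() {
  local repo_dir="$1"   # your clone of the khoj repository
  local vault_dir="$2"  # the Obsidian vault you test in
  local build="$repo_dir/src/interface/obsidian/main.js"
  local target="$vault_dir/.obsidian/plugins/khoj"

  if [ ! -f "$build" ]; then
    echo "No build at $build. Run 'yarn dev' in src/interface/obsidian first." >&2
    return 1
  fi
  if [ ! -d "$target" ]; then
    echo "No Khoj plugin folder at $target. Install the Khoj plugin in this vault first." >&2
    return 1
  fi
  # Back up the installed build so you can restore it after testing
  if [ -f "$target/main.js" ]; then
    cp "$target/main.js" "$target/main.js.bak"
  fi
  cp "$build" "$target/main.js"
}
```

For example, `install_khoj_dev_plugin ~/code/khoj ~/Documents/MyVault`, then disable and re-enable the Khoj plugin in Community Plugins so Obsidian loads the new build. Move `main.js.bak` back to `main.js` to return to the released version.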
## Create Khoj Release (Only for Maintainers)
Follow the steps below to [release](https://github.com/debanjum/khoj/releases/) Khoj. This will create a stable release of Khoj on [Pypi](https://pypi.org/project/khoj-assistant/), [Melpa](https://stable.melpa.org/#/khoj) and [Obsidian](https://obsidian.md/plugins?id=khoj). It will also create desktop apps of Khoj and attach them to the latest release.
1. Create and tag release commit by running the bump_version script. The release commit sets version number in required metadata files.
```shell
./scripts/bump_version.sh -c "<release_version>"
```
2. Push the commit and then the tag to trigger the release workflow, which creates a release with auto-generated release notes.
```shell
git push origin master # push release commit to khoj repository
git push origin <release_version> # push release tag to khoj repository
```
3. [Optional] Update the Release Notes to highlight new features, fixes and updates
## Architecture
![](/img/khoj_architecture.png)
## Visualize Codebase
*[Interactive Visualization](https://mango-dune-07a8b7110.1.azurestaticapps.net/?repo=debanjum%2Fkhoj)*
![](/img/khoj_codebase_visualization_0.2.1.png)
## Visualize Khoj Obsidian Plugin Codebase
![](/img/khoj_obsidian_codebase_visualization_0.2.1.png)

@@ -0,0 +1,8 @@
{
"label": "Features",
"position": 3,
"link": {
"type": "generated-index",
"description": "Features supported by Khoj"
}
}

@@ -0,0 +1,34 @@
---
sidebar_position: 1
---
# Features
Khoj supports a variety of features, including search and chat with a wide range of data sources and interfaces.
#### [Search](/features/search)
- **Local**: Your personal data stays local. All search and indexing is done on your machine when you [self-host](/get-started/setup)
- **Incremental**: Incremental search for a fast, search-as-you-type experience
#### [Chat](/features/chat)
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
- **Works online or offline**: Chat using online or offline AI chat models
#### General
- **Cloud or Self-Host**: Use [cloud](https://app.khoj.dev/login) to use Khoj anytime from anywhere or [self-host](/get-started/setup) for privacy
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Index your Org-mode, Markdown, PDF, plaintext files, Github repos and Notion pages
- **Multiple Interfaces**: Interact from your Web Browser, Emacs, Obsidian, Desktop app or even WhatsApp
### Supported Interfaces
Khoj is available as a [Desktop app](/clients/desktop), [Emacs package](/clients/emacs), [Obsidian plugin](/clients/obsidian), [Web app](/clients/web) and a [WhatsApp AI](https://khoj.dev/whatsapp).
![](/img/khoj_clients.svg ':size=400px')
### Supported Data Sources
Khoj can understand your org-mode, markdown, PDF, plaintext files, [Github projects](/online-data-sources/github_integration) and [Notion pages](/online-data-sources/notion_integration).
![](/img/khoj_datasources.svg ':size=200px')

@@ -0,0 +1,64 @@
---
sidebar_position: 2
---
# Chat
You can configure Khoj to chat with you about anything. When relevant, it'll use any notes or documents you shared with it to respond.
### Overview
- Creates a personal assistant for you to inquire and engage with your notes
- You can choose to use Online or Offline Chat depending on your requirements
- Supports multi-turn conversations with the relevant notes for context
- Shows reference notes used to generate a response
### Setup (Self-Hosting)
#### Offline Chat
Offline chat stays completely private and works without internet using open-source models.
> **System Requirements**:
> - Minimum 8 GB RAM. **16 GB VRAM** recommended
> - Minimum **5 GB of Disk** available
> - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required
> - A Mac M1+ or [Vulkan supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times
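
To check the AVX requirement before enabling offline chat, you can inspect your CPU's feature flags from a terminal. This is a minimal sketch with hypothetical helper names (`cpu_flags`, `has_avx`): it reads the `flags` line of `/proc/cpuinfo` on Linux, or the `machdep.cpu.*` sysctl keys on Intel Macs, which report `AVX1.0` and `AVX2`. AVX is an x86 instruction set, so on ARM machines such as Apple Silicon Macs this check will always report no AVX.

```shell
# Hypothetical helpers: check whether this CPU advertises AVX or AVX2.
cpu_flags() {
  if [ -r /proc/cpuinfo ]; then
    # Linux: the first "flags" line lists the CPU features (x86 only; ARM uses "Features")
    grep -m1 '^flags' /proc/cpuinfo
  else
    # Intel Macs: AVX1.0 appears in machdep.cpu.features, AVX2 in machdep.cpu.leaf7_features
    sysctl -n machdep.cpu.features machdep.cpu.leaf7_features 2>/dev/null
  fi
}

has_avx() {
  # One lowercase word per line, then match avx, avx1.0 or avx2 exactly (not avx512*)
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' \t' '\n\n' | grep -xE 'avx(1\.0|2)?' >/dev/null
}

# Example: report the result for this machine
if has_avx "$(cpu_flags)"; then echo "AVX supported"; else echo "No AVX or AVX2 found"; fi
```

Keeping the parsing in `has_avx` separate from `cpu_flags` makes it easy to check a flag list copied from another machine.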
1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration.
2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We currently only support offline models that use the [Llama chat prompt](https://replicate.com/blog/how-to-prompt-llama#wrap-user-input-with-inst-inst-tags) format. We recommend using `mistral-7b-instruct-v0.1.Q4_0.gguf`.
:::tip[Note]
Offline chat is not supported for a multi-user scenario. The host machine will encounter segmentation faults if multiple users try to use offline chat at the same time.
:::
#### Online Chat
Online chat requires internet to use ChatGPT but is faster, higher quality and less compute intensive.
:::danger[Warning]
This will enable Khoj to send your chat queries and query relevant notes to OpenAI for processing.
:::
1. Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
2. Open your [Khoj Online Chat settings](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/). Add a new setting with your OpenAI API key, and click *Save*. Only one configuration will be used, so make sure that's the only one you have.
3. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the OpenAI chat model you want to use. Make sure to use `OpenAI` as its type.
### Use
1. Open Khoj Chat
- **On Web**: Open [/chat](https://app.khoj.dev/chat) in your web browser
- **On Obsidian**: Search for *Khoj: Chat* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Hit `C-c s c` (or run `M-x khoj RET c`)
2. Enter your queries to chat with Khoj. Use [slash commands](#commands) and [query filters](/miscellaneous/advanced#query-filters) to change what Khoj uses to respond
![](/img/khoj_chat_on_web.png ':size=400px')
#### Details
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
2. These notes, the last few messages and associated metadata are passed to the enabled chat model along with your query to generate a response
#### Commands
Slash commands allow you to change what Khoj uses to respond to your query
- **/notes**: Limit chat to only respond using your notes, not just Khoj's general world knowledge as reference
- **/general**: Limit chat to only respond using Khoj's general world knowledge, not using your notes as reference
- **/default**: Allow chat to respond using your notes or its general knowledge as reference. This is the default behavior when no slash command is used
- **/online**: Use online information and incorporate it in the prompt to the LLM to send you a response.
- **/image**: Generate an image in response to your query.
- **/help**: Use /help to get all available commands and general information about Khoj

@@ -0,0 +1,17 @@
---
sidebar_position: 3
---
# Search
Take advantage of super fast search to find relevant notes and documents from your Second Brain.
### Use
1. Open Khoj Search
- **On Web**: Open https://app.khoj.dev/ in your web browser
- **On Obsidian**: Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Hit `C-c s s` (or run `M-x khoj RET s`)
2. Query using natural language to find relevant entries from your knowledge base. Use [query filters](/miscellaneous/advanced#query-filters) to limit entries to search
### Demo
![](/img/khoj_search_on_web.png ':size=400px')

@@ -0,0 +1,8 @@
{
"label": "Get Started",
"position": 1,
"link": {
"type": "generated-index",
"description": "Learn how to get started with using Khoj"
}
}

@@ -0,0 +1,51 @@
---
sidebar_position: 2
---
# Demos
Check out a couple of demos and screenshots of Khoj in action.
### Screenshots
| Web | Obsidian | Emacs |
|:---:|:--------:|:-----:|
| ![](/img/khoj_search_on_web.png ':size=300px') | ![](/img/khoj_search_on_obsidian.png ':size=300px') | ![](/img/khoj_search_on_emacs.png ':size=300px') |
| ![](/img/khoj_chat_on_web.png ':size=300px') | ![](/img/khoj_chat_on_obsidian.png ':size=300px') | ![](/img/khoj_chat_on_emacs.png ':size=400px') |
### Videos
#### Khoj in Obsidian
[Link to Video](https://github-production-user-asset-6210df.s3.amazonaws.com/6413477/240061700-3e33d8ea-25bb-46c8-a3bf-c92f78d0f56b.mp4)
##### Installation
1. Install Khoj via `pip` and start Khoj backend in a terminal (Run `khoj`)
```bash
python -m pip install khoj-assistant
khoj
```
2. Install Khoj plugin via Community Plugins settings pane on Obsidian app
- Check the new Khoj plugin settings
- Let Khoj backend index the markdown, pdf, Github markdown files in the current Vault
- Open Khoj plugin on Obsidian via Search button on Left Pane
- Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs](https://marcus.se.net/obsidian-plugin-docs/)
- Jump to the [search result](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin)
#### Khoj in Emacs, Browser
[Link to Video](https://user-images.githubusercontent.com/6413477/184735169-92c78bf1-d827-4663-9087-a1ea194b8f4b.mp4)
##### Installation
- Install Khoj via pip
- Start Khoj app
- Add this readme and [khoj.el readme](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs) as org-mode for Khoj to index
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#2-Install-Khojel)
##### Analysis
- The results do not have any words used in the query
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter

@@ -0,0 +1,52 @@
---
sidebar_position: 0
slug: /
---
# Overview
<p align="center"><img src="/img/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"></img></p>
<div align="center">
<b>An AI copilot for your Second Brain</b>
</div>
<br />
<div align="center">
[📜 Explore Code](https://github.com/khoj-ai/khoj)
<span>&nbsp;&nbsp;&nbsp;&nbsp;</span>
[🌍 Try Khoj Cloud](https://khoj.dev)
<span>&nbsp;&nbsp;&nbsp;&nbsp;</span>
[💬 Get Involved](https://discord.gg/BDgyabRM6e)
</div>
## Introduction
Welcome to the Khoj Docs! This is the best place to get set up and explore Khoj's features.
- Khoj is an open source, personal AI
- You can [chat](/features/chat) with it about anything. It'll use files you shared with it to respond, when relevant
- Quickly [find](/features/search) relevant notes and documents using natural language
- It understands pdf, plaintext, markdown, org-mode files, [notion pages](/online-data-sources/notion_integration) and [github repositories](/online-data-sources/github_integration)
- Access it from your [Emacs](/clients/emacs), [Obsidian](/clients/obsidian), [Web browser](/clients/web) or the [Khoj Desktop app](/clients/desktop)
- Use [cloud](https://app.khoj.dev/login) to access your Khoj anytime from anywhere, [self-host](/get-started/setup) on consumer hardware for privacy
## Quickstart
- [Try Khoj Cloud](https://app.khoj.dev) to get started quickly
- [Read these instructions](/get-started/setup) to self-host a private instance of Khoj
## At a Glance
<img src="https://docs.khoj.dev/img/khoj_search_on_web.png" width="400px" />
<span>&nbsp;&nbsp;</span>
<img src="https://docs.khoj.dev/img/khoj_chat_on_web.png" width="400px" />
#### [Search](/features/search)
- **Natural**: Use natural language queries to quickly find relevant notes and documents.
- **Incremental**: Incremental search for a fast, search-as-you-type experience
#### [Chat](/features/chat)
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
- **Online or Offline**: Choose online or offline chat depending on your requirements

@@ -0,0 +1,281 @@
---
sidebar_position: 1
---
# Self-Host
Learn how to self-host Khoj on your own machine.
```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
```
## Setup
These are the general setup instructions for Khoj.
- Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine
- Check the [Khoj Emacs docs](/clients/emacs#setup) to set up Khoj with Emacs<br />
It's simpler as it can skip the server *install*, *run* and *configure* steps below.
- Check the [Khoj Obsidian docs](/clients/obsidian#setup) to set up Khoj with Obsidian<br />
It's simpler as it can skip the *configure* step below.
For Installation, you can either use Docker or install Khoj locally.
### Installation Option 1 (Docker)
#### Prerequisites
1. Install Docker Engine. See [official instructions](https://docs.docker.com/engine/install/).
2. Ensure you have Docker Compose. See [official instructions](https://docs.docker.com/compose/install/).
#### Setup
Use the sample docker-compose [in Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to run Khoj in Docker. Start by setting all the environment variables in it to values of your choosing. Your admin account will automatically be created based on the admin credentials in that file, so pay attention to those. To start the container, run the following command in the same directory as the docker-compose.yml file. This will automatically set up the database and run the Khoj server.
```shell
docker-compose up
```
Khoj should now be running at http://localhost:42110. You can see the web UI in your browser.
### Installation Option 2 (Local)
#### Prerequisites
##### Install Postgres (with PgVector)
Khoj uses the `pgvector` package to store embeddings of your index in a Postgres database. In order to use this, you need to have Postgres installed.
```mdx-code-block
<Tabs groupId="operating-systems">
<TabItem value="macos" label="MacOS">
Install [Postgres.app](https://postgresapp.com/). This comes pre-installed with `pgvector` and relevant dependencies.
</TabItem>
<TabItem value="win" label="Windows">
1. Use the [recommended installer](https://www.postgresql.org/download/windows/).
2. Follow instructions to [Install PgVector](https://github.com/pgvector/pgvector#windows) in case you need to manually install it. Windows support is experimental for pgvector currently, so we recommend using Docker.
</TabItem>
<TabItem value="unix" label="Linux">
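1. Follow the [official instructions](https://wiki.postgresql.org/wiki/Apt) to install Postgres.
2. Follow instructions to [Install PgVector](https://github.com/pgvector/pgvector#installation) in case you need to manually install it.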
</TabItem>
<TabItem value="source" label="From Source">
1. Follow instructions to [Install Postgres](https://www.postgresql.org/download/)
2. Follow instructions to [Install PgVector](https://github.com/pgvector/pgvector#installation) in case you need to manually install it.
</TabItem>
</Tabs>
```
##### Create the Khoj database
Make sure to update your environment variables to match your Postgres configuration if you're using a different name. The default values should work for most people. When prompted for a password, you can use the default password `postgres`, or configure it to your preference. Make sure to set the environment variable `POSTGRES_PASSWORD` to the same value as the password you set here.
```mdx-code-block
<Tabs groupId="operating-systems">
<TabItem value="macos" label="MacOS">
```shell
createdb khoj -U postgres --password
```
</TabItem>
<TabItem value="win" label="Windows">
```shell
createdb -U postgres khoj --password
```
</TabItem>
<TabItem value="unix" label="Linux">
```shell
sudo -u postgres createdb khoj --password
```
</TabItem>
</Tabs>
```
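
If you changed any of these defaults, set the matching environment variables in the terminal you start Khoj from. This is a sketch for MacOS and Linux shells, assuming Khoj reads the same `POSTGRES_*` variable names that the sample [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) sets. Only `POSTGRES_PASSWORD` is named on this page, so double-check the other names against that file. On Windows PowerShell, use the `$env:NAME="value"` form shown in the server start step.

```shell
# Postgres connection settings for the Khoj server (assumed variable names, common default values)
export POSTGRES_HOST=localhost     # host where Postgres runs
export POSTGRES_PORT=5432          # Postgres' default port
export POSTGRES_USER=postgres      # user that owns the khoj database
export POSTGRES_DB=khoj            # database created above
export POSTGRES_PASSWORD=postgres  # must match the postgres user's password
```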
#### Install package
##### Local Server Setup
- *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
Run the following command in your terminal to install the Khoj backend.
```mdx-code-block
<Tabs groupId="operating-systems">
<TabItem value="macos" label="MacOS">
```shell
python -m pip install khoj-assistant
```
</TabItem>
<TabItem value="win" label="Windows">
```shell
py -m pip install khoj-assistant
```
</TabItem>
<TabItem value="unix" label="Linux">
```shell
python -m pip install khoj-assistant
```
</TabItem>
</Tabs>
```
##### Local Server Start
Before getting started, configure the following environment variables in your terminal for the first run:
```mdx-code-block
<Tabs groupId="operating-systems">
<TabItem value="macos" label="MacOS">
```shell
export KHOJ_ADMIN_EMAIL=<your-email>
export KHOJ_ADMIN_PASSWORD=<your-password>
```
</TabItem>
<TabItem value="win" label="Windows">
If you're using PowerShell:
```powershell
$env:KHOJ_ADMIN_EMAIL="<your-email>"
$env:KHOJ_ADMIN_PASSWORD="<your-password>"
```
</TabItem>
<TabItem value="unix" label="Linux">
```shell
export KHOJ_ADMIN_EMAIL=<your-email>
export KHOJ_ADMIN_PASSWORD=<your-password>
```
</TabItem>
</Tabs>
```
Run the following command from your terminal to start the Khoj backend and open Khoj in your browser.
```shell
khoj --anonymous-mode
```
`--anonymous-mode` allows you to run the server without setting up Google credentials for login. This allows you to use any of the clients without a login wall. If you want to use Google login, you can skip this flag, but you will have to add your Google developer credentials.
On the first run, you will be prompted to input credentials for your admin account and do some basic configuration for your chat model settings. Once created, you can go to http://localhost:42110/server/admin and login with the credentials you just created.
Khoj should now be running at http://localhost:42110. You can see the web UI in your browser.
Note: To start Khoj automatically in the background, use [Task scheduler](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10) on Windows or [Cron](https://en.wikipedia.org/wiki/Cron) on Mac and Linux (e.g. with `@reboot khoj`)
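
On Mac and Linux, you can prepare the cron entry without editing your crontab by hand. This is a minimal sketch: the `khoj_crontab_preview` helper is hypothetical, and it reuses the `--anonymous-mode` flag from the start command above, so drop the flag if you set up Google login. It prints your current crontab plus the Khoj entry, using the absolute path of the `khoj` executable because cron runs jobs with a minimal `PATH`.

```shell
# Hypothetical helper: print your crontab plus an @reboot entry that starts Khoj.
khoj_crontab_preview() {
  local khoj_bin
  # cron runs jobs with a minimal PATH, so prefer the absolute path of the khoj executable
  khoj_bin="$(command -v khoj || echo khoj)"
  # Keep existing entries, minus any earlier Khoj @reboot line, so re-running doesn't duplicate it
  crontab -l 2>/dev/null | grep -v '^@reboot .*khoj' || true
  echo "@reboot $khoj_bin --anonymous-mode"
}
```

Review the output of `khoj_crontab_preview`, then install it with `khoj_crontab_preview | crontab -`. Cron does not load your shell profile, so environment variables exported in your terminal won't be set for the boot job. Start Khoj manually once first, since the admin credentials are only needed on the first run.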
### 2. Download the desktop client
You can use our desktop executables to select file paths and folders to index. You can simply select the folders or files, and they'll be automatically uploaded to the server. Once you specify a file or file path, you don't need to update the configuration again; it will grab any data diffs dynamically over time.
**To download the latest desktop client, go to https://download.khoj.dev** and the correct executable for your OS will automatically start downloading. Once downloaded, you can configure your folders for indexing using the settings tab. To set your chat configuration, you'll have to use the web interface for the Khoj server you setup in the previous step.
To use the desktop client, you need to go to your Khoj server's settings page (http://localhost:42110/config) and copy the API key. Then, paste it into the desktop client's settings page. Once you've done that, you can select files and folders to index.
### 3. Configure
1. Go to http://localhost:42110/server/admin and log in with your admin credentials.
   1. Go to [OpenAI settings](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/) in the server admin settings to add an OpenAI processor conversation config. This is where you set your API key. Alternatively, you can go to the [offline chat settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and simply create a new setting with `Enabled` set to `True`.
   2. Go to the ChatModelOptions if you want to add additional models for chat. For example, you can specify `gpt-4` if you're using OpenAI or `mistral-7b-instruct-v0.1.Q4_0.gguf` if you're using offline chat. Make sure to set the `type` field to `OpenAI` or `Offline`, respectively.
1. Select files and folders to index [using the desktop client](/get-started/setup#2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing.
   - Select Notion workspaces and GitHub repositories to index using the web interface.
:::tip[Note]
Using Safari on Mac? You might not be able to log in to the admin panel. Try using Chrome or Firefox instead.
:::
### 4. Install Client Plugins (Optional)
Khoj exposes a web interface to search, chat and configure by default.<br />
The optional steps below allow using Khoj from within an existing application like Obsidian or Emacs.
- **Khoj Obsidian**:<br />
[Install](/clients/obsidian#setup) the Khoj Obsidian plugin
- **Khoj Emacs**:<br />
[Install](/clients/emacs#setup) khoj.el
#### Setup host URL
To configure your host URL on your clients when self-hosting, use `http://127.0.0.1:42110`. This is the default value for the `KHOJ_HOST` environment variable. Note that `localhost` will not work.
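As a quick sanity check before configuring a client, a minimal sketch like the one below can confirm that the server answers at the host URL. It assumes the `KHOJ_HOST` environment variable described above, falling back to `http://127.0.0.1:42110`; `khoj_host` and `is_reachable` are hypothetical helper names, not part of Khoj.

```python
import os
import urllib.error
import urllib.request

DEFAULT_KHOJ_HOST = "http://127.0.0.1:42110"


def khoj_host() -> str:
    """Return the Khoj host URL from KHOJ_HOST, or the documented default."""
    return os.environ.get("KHOJ_HOST", DEFAULT_KHOJ_HOST).rstrip("/")


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server at `url` answers an HTTP GET, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 401), so it is up
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout
        return False


if __name__ == "__main__":
    host = khoj_host()
    print(f"{host} reachable: {is_reachable(host)}")
```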
### 5. Use Khoj 🚀
You can head to http://localhost:42110 to use the web interface. You can also use the desktop client to search and chat.
## Upgrade
### Upgrade Khoj Server
```mdx-code-block
<Tabs groupId="environment">
<TabItem value="localsetup" label="Local Setup">
```shell
pip install --upgrade khoj-assistant
```
*Note: To upgrade to the latest pre-release version of the Khoj server, run `pip install --upgrade --pre khoj-assistant`*
</TabItem>
<TabItem value="docker" label="Docker">
Run the command below from the same directory as your `docker-compose` file to fetch the latest build and upgrade your server.
```shell
docker-compose up --build
```
</TabItem>
<TabItem value="emacs" label="Emacs">
- Use your Emacs Package Manager to Upgrade
- See [khoj.el package setup](/clients/emacs#setup) for details
</TabItem>
<TabItem value="obsidian" label="Obsidian">
- Upgrade via the Community plugins tab on the settings pane in the Obsidian app
- See the [khoj plugin setup](/clients/obsidian#setup) for details
</TabItem>
</Tabs>
```
## Uninstall
### Uninstall Khoj Server
```mdx-code-block
<Tabs groupId="environment">
<TabItem value="localsetup" label="Local Setup">
```shell
# uninstall khoj server
pip uninstall khoj-assistant
# delete khoj postgres db
dropdb khoj -U postgres
```
</TabItem>
<TabItem value="docker" label="Docker">
From the same directory as your `docker-compose` file, run the command below to stop the server and delete its containers, networks and volumes.
```shell
docker-compose down --volumes
```
</TabItem>
<TabItem value="emacs" label="Emacs">
Uninstall the Khoj Emacs package or desktop client in the standard way from Emacs or your OS, respectively.
You can also `rm -rf ~/.khoj` to remove the Khoj data directory if you did a local install.
</TabItem>
<TabItem value="obsidian" label="Obsidian">
Uninstall the Khoj Obsidian plugin or desktop client in the standard way from Obsidian or your OS, respectively.
You can also `rm -rf ~/.khoj` to remove the Khoj data directory if you did a local install.
</TabItem>
</Tabs>
```
## Troubleshoot
#### Install fails while building Tokenizer dependency
- **Details**: `pip install khoj-assistant` fails while building the `tokenizers` dependency. Complains about Rust.
- **Fix**: Install Rust to build the tokenizers package. For example on Mac run:
```shell
brew install rustup
rustup-init
source ~/.cargo/env
```
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details
#### Search starts giving wonky results
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
- **Note**: *This is a fix for when you perceive the search results have degraded. Not if you think they've always given wonky results*
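If you prefer to trigger the rebuild from a script instead of the browser, a minimal sketch could issue the same GET request from Python. It assumes the server runs at the default `http://127.0.0.1:42110` and that the request needs no login (e.g. when the server runs with `--anonymous-mode`); `build_update_url` and `force_reindex` are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def build_update_url(host: str = "http://127.0.0.1:42110", force: bool = True) -> str:
    """Build the index update URL, e.g. http://127.0.0.1:42110/api/update?force=true."""
    query = urllib.parse.urlencode({"force": str(force).lower()})
    return f"{host.rstrip('/')}/api/update?{query}"


def force_reindex(host: str = "http://127.0.0.1:42110", timeout: float = 600.0) -> int:
    """Send the GET request and return the HTTP status code.

    Re-indexing a large corpus can take minutes, hence the generous timeout.
    """
    with urllib.request.urlopen(build_update_url(host), timeout=timeout) as response:
        return response.status
```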
#### Khoj in Docker errors out with \"Killed\" in error message
- **Fix**: Increase RAM available to Docker Containers in Docker Settings
- **Refer**: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)
#### Khoj errors out complaining about Tensors mismatch or null
- **Mitigation**: Disable `image` search using the desktop GUI

@@ -0,0 +1,8 @@
{
"label": "Miscellaneous",
"position": 6,
"link": {
"type": "generated-index",
"description": "Additional resources for learning about Khoj"
}
}

@@ -0,0 +1,32 @@
---
sidebar_position: 3
---
# Advanced Usage
### Search across Different Languages (Self-Hosting)
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update the search config in the server's admin settings page. Go to [the search config](http://localhost:42110/server/admin/database/searchmodelconfig/). Either create a new one, if none exists, or update the existing one. Set the bi_encoder to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` and the cross_encoder to `cross-encoder/ms-marco-MiniLM-L-6-v2`.
2. Regenerate your content index from all the relevant clients. This step is very important, as you'll need to re-encode all your content with the new model.
### Query Filters
Use structured query syntax to filter entries from your knowledge base used by search results or chat responses.
- **Word Filter**: Get entries that include/exclude a specified term
  - Entries that contain term_to_include: `+"term_to_include"`
  - Entries that do not contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from a specified date (range)
  - Entries from April 1st 1984: `dt:"1984-04-01"`
  - Entries after March 31st 1984: `dt>="1984-04-01"`
  - Entries before April 2nd 1984: `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
  - Entries from the incoming.org file: `file:"incoming.org"`
- Combined Example
  - `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
  - Adds all filters to the natural language query. It should return entries
    - from the file *1984.org*
    - containing dates from the year *1984*
    - excluding words *"big"* and *"brother"*
    - that best match the natural language query *"what is the meaning of life?"*
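To make the filter syntax concrete, here is a minimal sketch that splits a query into its filters and the remaining natural-language text. It is an illustration only, not Khoj's actual filter implementation; the regular expressions, `ParsedQuery` and `parse_query` are assumptions based on the examples in this section.

```python
import re
from dataclasses import dataclass, field

# Patterns mirroring the documented filter syntax (assumed, not Khoj's own)
WORD_FILTER = re.compile(r'([+-])"([^"]+)"')
DATE_FILTER = re.compile(r'dt(:|>=|<=)"(\d{4}-\d{2}-\d{2})"')
FILE_FILTER = re.compile(r'file:"([^"]+)"')


@dataclass
class ParsedQuery:
    text: str
    include_words: list = field(default_factory=list)
    exclude_words: list = field(default_factory=list)
    dates: list = field(default_factory=list)  # (operator, YYYY-MM-DD) pairs
    files: list = field(default_factory=list)


def parse_query(query: str) -> ParsedQuery:
    """Extract word, date and file filters and return them with the leftover query text."""
    files = FILE_FILTER.findall(query)
    dates = DATE_FILTER.findall(query)
    words = WORD_FILTER.findall(query)
    # Strip every filter from the query, leaving only the natural-language part
    text = query
    for pattern in (FILE_FILTER, DATE_FILTER, WORD_FILTER):
        text = pattern.sub("", text)
    return ParsedQuery(
        text=" ".join(text.split()),
        include_words=[word for sign, word in words if sign == "+"],
        exclude_words=[word for sign, word in words if sign == "-"],
        dates=dates,
        files=files,
    )
```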

@@ -0,0 +1,13 @@
---
sidebar_position: 4
---
# Credits
Many Open Source projects are used to power Khoj. Here are a few of them:
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
- [GPT4All](https://github.com/nomic-ai/gpt4all) to chat with local LLM

@@ -0,0 +1,25 @@
---
sidebar_position: 2
---
# Performance
Here are some top-level performance metrics for Khoj. These are rough estimates and will vary based on your hardware and data.
### Search performance
- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed against accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
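The latency split above follows from a two-stage retrieve-and-rerank design: a cheap first pass scores every entry, and only the best `top_k` candidates go through the slower second pass. The toy sketch below shows why a smaller `top_k` is faster: the expensive scorer runs `top_k` times instead of once per entry. Both scoring functions are hypothetical stand-ins for the bi-encoder and cross-encoder, not real models.

```python
import heapq
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_and_rerank(
    query: str,
    entries: List[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    top_k: int = 15,
) -> List[Tuple[str, float]]:
    """Shortlist entries with cheap_score, then reorder the top_k with expensive_score."""
    # Stage 1: score every entry cheaply and keep only the top_k candidates
    candidates = heapq.nlargest(top_k, entries, key=lambda e: cheap_score(query, e))
    # Stage 2: run the expensive scorer on the shortlist only
    reranked = [(e, expensive_score(query, e)) for e in candidates]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


def word_overlap(query: str, entry: str) -> float:
    """Toy stand-in for the bi-encoder: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(entry.lower().split())))


def overlap_ratio(query: str, entry: str) -> float:
    """Toy stand-in for the cross-encoder: shared words normalised by entry length."""
    words = entry.lower().split()
    return word_overlap(query, entry) / len(words) if words else 0.0
```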
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet

@@ -0,0 +1,22 @@
---
sidebar_position: 1
---
# Telemetry
We collect some high-level, anonymized metadata about usage of Khoj. This includes:
- Client (Web, Emacs, Obsidian)
- API usage (Search, Chat)
- Configured content types (Github, Org, etc)
- Request metadata (e.g., host, referrer)
We don't send any personal information or any information from/about your content. We only send the above metadata. This helps us prioritize feature development and understand how people are using Khoj. Don't just take our word for it -- you can see [the code here](https://github.com/khoj-ai/khoj/tree/master/src/telemetry).
## Disable Telemetry
You can opt out of telemetry at any time. To do so,
1. Open `~/.khoj/khoj.yml`
2. Set `should-log-telemetry` to `false`
3. Save the file and restart Khoj
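The same change can be scripted. A minimal sketch, assuming `should-log-telemetry` appears in `~/.khoj/khoj.yml` as a plain `key: true` line at any indentation level; it edits the text with a regular expression rather than a YAML parser so comments and layout are preserved. `disable_telemetry` is a hypothetical helper, and you still need to restart Khoj afterwards.

```python
import re
from pathlib import Path

# Matches e.g. "  should-log-telemetry: true", capturing the indentation and key
TELEMETRY_LINE = re.compile(
    r"^(\s*should-log-telemetry:\s*)true\b", re.MULTILINE | re.IGNORECASE
)


def disable_telemetry(config_path: Path = Path.home() / ".khoj" / "khoj.yml") -> bool:
    """Set should-log-telemetry to false. Return True if the file was changed."""
    text = config_path.read_text(encoding="utf-8")
    updated, count = TELEMETRY_LINE.subn(r"\1false", text)
    if count:
        config_path.write_text(updated, encoding="utf-8")
    return count > 0
```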
If you have any questions or concerns, please reach out to us on [Discord](https://discord.gg/BDgyabRM6e).

@@ -0,0 +1,8 @@
{
"label": "Online Data Sources",
"position": 5,
"link": {
"type": "generated-index",
"description": "Online data sources for indexing via Khoj"
}
}

@@ -0,0 +1,14 @@
# Set up the GitHub integration
The GitHub integration allows you to index as many repositories as you want. By default, it indexes Issues, Commits, and all Markdown/Org files in each repository. For large repositories this takes a fairly long time, but it works well for smaller projects.
# Configure your settings
1. Go to [https://app.khoj.dev/config](https://app.khoj.dev/config) and enter settings for the data sources you want to index. You'll have to specify the file paths.
## Use the GitHub plugin
1. Generate a [classic PAT (personal access token)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) from [GitHub](https://github.com/settings/tokens) with at least the `repo` and `admin:org` scopes.
2. Navigate to [https://app.khoj.dev/config/content-source/github](https://app.khoj.dev/config/content-source/github) to configure your GitHub settings. Enter your PAT, along with details for each repository you want to index.
3. Click `Save`. Go back to the settings page and click `Configure`.
4. Go to [https://app.khoj.dev/](https://app.khoj.dev/) and start searching!
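Before saving the PAT, you may want to check that it carries the scopes listed above. GitHub reports a classic token's scopes in the `X-OAuth-Scopes` response header of authenticated REST API calls; the sketch below reads it from `https://api.github.com/user`. The helper names are hypothetical, and the check only makes sense for classic tokens, since fine-grained tokens use permissions rather than scopes.

```python
import urllib.request

REQUIRED_SCOPES = {"repo", "admin:org"}


def parse_scopes(header_value: str) -> set:
    """Split an X-OAuth-Scopes header such as 'repo, admin:org' into a set."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def missing_scopes(granted: set, required: set = REQUIRED_SCOPES) -> set:
    """Return the required scopes that the token was not granted."""
    return set(required) - granted


def fetch_token_scopes(token: str, timeout: float = 10.0) -> set:
    """Call the GitHub REST API with the token and return its granted scopes."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_scopes(response.headers.get("X-OAuth-Scopes", ""))
```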

@@ -0,0 +1,14 @@
# Notion Integration
The Notion integration allows you to search/chat with your Notion workspaces. [Notion](https://notion.so/) is a platform people use for taking notes, especially for collaboration.
We haven't set up a fancy integration with OAuth yet, so this integration still requires some effort on your end to generate an API key.
1. Go to https://www.notion.so/my-integrations and create a new integration called Khoj to get an API key.
![setup_new_integration](https://github.com/khoj-ai/khoj/assets/65192171/b056e057-d4dc-47dc-aad3-57b59a22c68b)
2. Share all the workspaces that you want to integrate with the Khoj integration you just made in the previous step.
![enable_workspace](https://github.com/khoj-ai/khoj/assets/65192171/98290303-b5b8-4cb0-b32c-f68c6923a3d0)
3. Enter the API key you generated in the first step in your Khoj settings, by default at https://app.khoj.dev/config/content-source/notion. Click `Save`.
4. Click `Configure` in https://app.khoj.dev/config to index your Notion workspace(s).
That's it! You should be ready to start searching and chatting. Make sure you've configured your OpenAI API Key for chat.

@@ -0,0 +1,187 @@
// @ts-check
// `@type` JSDoc annotations allow editor autocompletion and type checking
// (when paired with `@ts-check`).
// There are various equivalent ways to declare your Docusaurus config.
// See: https://docusaurus.io/docs/api/docusaurus-config
import {themes as prismThemes} from 'prism-react-renderer';
/** @type {import('@docusaurus/types').Config} */
const config = {
title: 'Khoj AI',
tagline: 'An AI copilot for your Second Brain',
staticDirectories: ['assets'],
favicon: 'img/favicon-128x128.ico',
// Set the production url of your site here
url: 'https://docs.khoj.dev',
// Set the /<baseUrl>/ pathname under which your site is served
// For GitHub pages deployment, it is often '/<projectName>/'
baseUrl: '/',
// GitHub pages deployment config.
// If you aren't using GitHub pages, you don't need these.
organizationName: 'khoj-ai', // Usually your GitHub org/user name.
projectName: 'khoj', // Usually your repo name.
onBrokenLinks: 'throw',
onBrokenMarkdownLinks: 'warn',
// Even if you don't use internationalization, you can use this field to set
// useful metadata like html lang. For example, if your site is Chinese, you
// may want to replace "en" with "zh-Hans".
i18n: {
defaultLocale: 'en',
locales: ['en'],
},
presets: [
[
'classic',
/** @type {import('@docusaurus/preset-classic').Options} */
({
docs: {
sidebarPath: './sidebars.js',
routeBasePath: '/',
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl:
'https://github.com/khoj-ai/khoj/tree/master/documentation/',
},
blog: {
showReadingTime: true,
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl:
'https://github.com/khoj-ai/khoj/tree/master/documentation/blog/',
},
theme: {
customCss: './src/css/custom.css',
},
}),
],
],
themeConfig:
/** @type {import('@docusaurus/preset-classic').ThemeConfig} */
({
image: 'img/khoj-logo-sideways-500.png',
metadata: [
{name: 'keywords', content: 'khoj, khoj ai, chatgpt, open ai, open source, productivity'},
{name: 'og:title', content: 'Khoj Documentation'},
{name: 'og:type', content: 'website'},
{name: 'og:site_name', content: 'Khoj Documentation'},
{name: 'og:description', content: 'Quickly get started with using or self-hosting Khoj'},
{name: 'og:image', content: 'https://khoj-web-bucket.s3.amazonaws.com/link_preview_docs.png'},
{name: 'og:url', content: 'https://docs.khoj.dev'},
],
navbar: {
title: 'Khoj',
logo: {
alt: 'Khoj AI',
src: 'img/favicon-128x128.ico',
},
items: [
{
href: 'https://github.com/khoj-ai/khoj',
label: '📜 Code',
position: 'right',
},
{
href: 'https://app.khoj.dev/login',
label: '🌍 Cloud',
position: 'right',
},
{
href: 'https://discord.gg/BDgyabRM6e',
label: '💬 Discord',
position: 'right',
},
],
},
footer: {
style: 'dark',
links: [
{
title: 'Docs',
items: [
{
label: 'Get Started',
to: '/',
},
{
label: 'Features',
to: '/features/all_features',
},
{
label: 'Client Apps',
to: '/category/clients',
},
{
label: 'Self-Hosting',
to: '/get-started/setup',
},
{
label: 'Contributing',
to: '/contributing/development',
},
],
},
{
title: 'Community',
items: [
{
label: 'Discord',
href: 'https://discord.gg/BDgyabRM6e',
},
{
label: 'LinkedIn',
href: 'https://www.linkedin.com/company/khoj-ai/'
},
{
label: 'Twitter',
href: 'https://twitter.com/khoj_ai',
},
],
},
{
title: 'More',
items: [
// {
// label: 'Blog',
// to: '/blog',
// },
{
label: 'Cloud',
href: 'https://app.khoj.dev/login',
},
{
label: 'Code',
href: 'https://github.com/khoj-ai/khoj',
},
{
label: 'Website',
href: 'https://khoj.dev',
},
],
},
],
copyright: `Copyright © ${new Date().getFullYear()} Khoj, Inc.`,
},
prism: {
theme: prismThemes.github,
darkTheme: prismThemes.dracula,
},
algolia: {
appId: "NBR0FXJNGW",
apiKey: "8841b34192a28b2d06f04dd28d768017",
indexName: "khoj",
contextualSearch: false,
}
}),
};
export default config;

14629
documentation/package-lock.json generated Normal file

File diff suppressed because it is too large

@@ -0,0 +1,44 @@
{
"name": "documentation",
"version": "0.0.0",
"private": true,
"scripts": {
"docusaurus": "docusaurus",
"start": "docusaurus start",
"build": "docusaurus build",
"swizzle": "docusaurus swizzle",
"deploy": "docusaurus deploy",
"clear": "docusaurus clear",
"serve": "docusaurus serve",
"write-translations": "docusaurus write-translations",
"write-heading-ids": "docusaurus write-heading-ids"
},
"dependencies": {
"@docusaurus/core": "3.1.0",
"@docusaurus/preset-classic": "3.1.0",
"@mdx-js/react": "^3.0.0",
"clsx": "^2.0.0",
"prism-react-renderer": "^2.3.0",
"react": "^18.0.0",
"react-dom": "^18.0.0"
},
"devDependencies": {
"@docusaurus/module-type-aliases": "3.1.0",
"@docusaurus/types": "3.1.0"
},
"browserslist": {
"production": [
">0.5%",
"not dead",
"not op_mini all"
],
"development": [
"last 3 chrome version",
"last 3 firefox version",
"last 5 safari version"
]
},
"engines": {
"node": ">=18.0"
}
}

33
documentation/sidebars.js Normal file
@@ -0,0 +1,33 @@
/**
* Creating a sidebar enables you to:
- create an ordered group of docs
- render a sidebar for each doc of that group
- provide next/previous navigation
The sidebars can be generated from the filesystem, or explicitly defined here.
Create as many sidebars as you want.
*/
// @ts-check
/** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
const sidebars = {
// By default, Docusaurus generates a sidebar from the docs folder structure
tutorialSidebar: [{type: 'autogenerated', dirName: '.'}],
// But you can create a sidebar manually
/*
tutorialSidebar: [
'intro',
'hello',
{
type: 'category',
label: 'Tutorial',
items: ['tutorial-basics/create-a-document'],
},
],
*/
};
export default sidebars;

@@ -0,0 +1,11 @@
.features {
display: flex;
align-items: center;
padding: 2rem 0;
width: 100%;
}
.featureSvg {
height: 200px;
width: 200px;
}

@@ -0,0 +1,37 @@
/**
* Any CSS included here will be global. The classic template
* bundles Infima by default. Infima is a CSS framework designed to
* work well for content-centric websites.
*/
@import url('https://fonts.googleapis.com/css2?family=Source+Sans+3&display=swap');
/* You can override the default Infima variables here. */
:root {
--ifm-color-primary: #fcc50b;
--ifm-color-primary-dark: #fcc50b;
--ifm-color-primary-darker: #fcc50b;
--ifm-color-primary-darkest: #fcc50b;
--ifm-color-primary-light: #fcc50b;
--ifm-color-primary-lighter: #fcc50b;
--ifm-color-primary-lightest: #fcc50b;
--ifm-code-font-size: 95%;
--ifm-heading-font-family: 'Source Sans 3', sans-serif;
--docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.1);
}
/* For readability concerns, you should choose a lighter palette in dark mode. */
[data-theme='dark'] {
--ifm-color-primary: #fcc50b;
--ifm-color-primary-dark: #fcc50b;
--ifm-color-primary-darker: #fcc50b;
--ifm-color-primary-darkest: #fcc50b;
--ifm-color-primary-light: #fcc50b;
--ifm-color-primary-lighter: #fcc50b;
--ifm-color-primary-lightest: #fcc50b;
--docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3);
}
body {
font-family: 'Source Sans 3', sans-serif;
}

8344
documentation/yarn.lock Normal file

File diff suppressed because it is too large

10
gunicorn-config.py Normal file
@@ -0,0 +1,10 @@
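import multiprocessing
# Listen on all network interfaces on port 42110
bind = "0.0.0.0:42110"
# Number of worker processes handling requests
workers = 4
# Serve the ASGI app through Uvicorn's Gunicorn worker class
worker_class = "uvicorn.workers.UvicornWorker"
# Restart workers that stay silent for more than 120 seconds
timeout = 120
# Seconds to wait for the next request on a keep-alive connection
keepalive = 60
# Write access and error logs to files in the working directory
accesslog = "access.log"
errorlog = "error.log"
loglevel = "debug"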
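The config above pins `workers = 4`. If you would rather size the pool from the host's CPU count, Gunicorn's documentation suggests `(2 x num_cores) + 1` as a starting point; a minimal sketch of that heuristic follows. Each Gunicorn worker is a separate process, so memory use grows with the worker count; whether this heuristic suits Khoj's model-heavy workers on your hardware is an assumption to verify. `recommended_workers` is a hypothetical helper.

```python
import multiprocessing


def recommended_workers(cpu_count: int = 0, max_workers: int = 0) -> int:
    """Gunicorn's suggested starting point, (2 x cores) + 1, optionally capped."""
    cores = cpu_count or multiprocessing.cpu_count()
    workers = 2 * cores + 1
    # Cap the pool on memory-constrained hosts, since every worker is its own process
    return min(workers, max_workers) if max_workers else workers


# In gunicorn-config.py this could replace the fixed value, e.g.:
# workers = recommended_workers(max_workers=4)
```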

@@ -1,10 +1,10 @@
{
"id": "khoj",
"name": "Khoj",
"version": "0.2.6",
"minAppVersion": "0.15.0",
"description": "Natural, Incremental Search for your Second Brain 🦅",
"author": "Debanjum Singh Solanky",
"authorUrl": "https://github.com/debanjum",
"isDesktopOnly": false
"id": "khoj",
"name": "Khoj",
"version": "1.4.0",
"minAppVersion": "0.15.0",
"description": "An AI copilot for your Second Brain",
"author": "Khoj Inc.",
"authorUrl": "https://github.com/khoj-ai",
"isDesktopOnly": false
}

31
prod.Dockerfile Normal file
@@ -0,0 +1,31 @@
# Use Nvidia's CUDA 12.2.0 development image on Ubuntu 22.04 as the base image
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
LABEL org.opencontainers.image.source https://github.com/khoj-ai/khoj
# Install System Dependencies
RUN apt update -y && apt -y install python3-pip git libsqlite3-0 ffmpeg libsm6 libxext6
WORKDIR /app
# Install Application
COPY pyproject.toml .
COPY README.md .
ARG VERSION=0.0.0
RUN sed -i "s/dynamic = \\[\"version\"\\]/version = \"$VERSION\"/" pyproject.toml && \
TMPDIR=/home/cache/ pip install --cache-dir=/home/cache/ -e .
# Copy Source Code
COPY . .
RUN apt install vim -y
# Set the PYTHONPATH environment variable in order for it to find the Django app.
ENV PYTHONPATH=/app/src:$PYTHONPATH
# Run the Application
# There are more arguments required for the application to run,
# but these should be passed in through the docker-compose.yml file.
ARG PORT
EXPOSE ${PORT}
ENTRYPOINT [ "gunicorn", "-c", "gunicorn-config.py", "src.khoj.main:app" ]

@@ -4,10 +4,10 @@ build-backend = "hatchling.build"
[project]
name = "khoj-assistant"
description = "A natural language search engine for your personal notes, transactions and images"
description = "An AI copilot for your Second Brain"
readme = "README.md"
license = "GPL-3.0-or-later"
requires-python = ">=3.8, <3.11"
requires-python = ">=3.8"
authors = [
{ name = "Debanjum Singh Solanky, Saba Imran" },
]
@@ -19,45 +19,74 @@ keywords = [
"AI",
"org-mode",
"markdown",
"beancount",
"images",
"pdf",
]
classifiers = [
"Development Status :: 4 - Beta",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Topic :: Internet :: WWW/HTTP :: Indexing/Search",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Scientific/Engineering :: Human Machine Interfaces",
"Topic :: Text Processing :: Linguistic",
]
dependencies = [
"dateparser == 1.1.1",
"bs4 >= 0.0.1",
"dateparser >= 1.1.1",
"defusedxml == 0.7.1",
"fastapi == 0.77.1",
"jinja2 == 3.1.2",
"openai == 0.20.0",
"pillow == 9.3.0",
"pydantic == 1.9.1",
"pyqt6 == 6.3.1",
"fastapi >= 0.104.1",
"python-multipart >= 0.0.5",
"jinja2 == 3.1.3",
"openai >= 1.0.0",
"tiktoken >= 0.3.2",
"tenacity >= 8.2.2",
"pillow ~= 9.5.0",
"pydantic >= 2.0.0",
"pyyaml == 6.0",
"rich >= 13.3.1",
"schedule == 1.1.0",
"sentence-transformers == 2.2.2",
"torch == 1.13.1",
"transformers >= 4.28.0",
"torch == 2.0.1",
"uvicorn == 0.17.6",
"aiohttp ~= 3.9.0",
"langchain <= 0.2.0",
"requests >= 2.26.0",
"bs4 >= 0.0.1",
"anyio == 3.7.1",
"pymupdf >= 1.23.5",
"django == 4.2.7",
"authlib == 1.2.1",
"gpt4all >= 2.1.0; platform_system == 'Linux' and platform_machine == 'x86_64'",
"gpt4all >= 2.1.0; platform_system == 'Windows' or platform_system == 'Darwin'",
"itsdangerous == 2.1.2",
"httpx == 0.25.0",
"pgvector == 0.2.4",
"psycopg2-binary == 2.9.9",
"google-auth == 2.23.3",
"python-multipart == 0.0.6",
"gunicorn == 21.2.0",
"lxml == 4.9.3",
"tzdata == 2023.3",
"rapidocr-onnxruntime == 1.3.8",
"stripe == 7.3.0",
"openai-whisper >= 20231117",
"django-phonenumber-field == 7.3.0",
"phonenumbers == 8.13.27",
"twilio == 8.11"
]
dynamic = ["version"]
[project.urls]
Homepage = "https://github.com/debanjum/khoj#readme"
Issues = "https://github.com/debanjum/khoj/issues"
Discussions = "https://github.com/debanjum/khoj/discussions"
Releases = "https://github.com/debanjum/khoj/releases"
Homepage = "https://github.com/khoj-ai/khoj#readme"
Issues = "https://github.com/khoj-ai/khoj/issues"
Discussions = "https://github.com/khoj-ai/khoj/discussions"
Releases = "https://github.com/khoj-ai/khoj/releases"
[project.scripts]
khoj = "khoj.main:run"
@@ -65,12 +94,19 @@ khoj = "khoj.main:run"
[project.optional-dependencies]
test = [
"pytest >= 7.1.2",
"freezegun >= 1.2.0",
"factory-boy >= 3.2.1",
"trio >= 0.22.0",
"pytest-xdist",
"psutil >= 5.8.0",
]
dev = [
"khoj-assistant[test]",
"mypy >= 1.0.1",
"black >= 23.1.0",
"pre-commit >= 3.0.4",
"pytest-django == 4.5.2",
"pytest-asyncio == 0.21.1",
]
[tool.hatch.version]
@@ -95,3 +131,12 @@ warn_unused_ignores = false
[tool.black]
line-length = 120
[tool.isort]
profile = "black"
[tool.pytest.ini_options]
addopts = "--strict-markers"
markers = [
"chatquality: Evaluate chatbot capabilities and quality",
]

6
pytest.ini Normal file
@@ -0,0 +1,6 @@
[pytest]
DJANGO_SETTINGS_MODULE = khoj.app.settings
pythonpath = . src
testpaths = tests
markers =
chatquality: marks tests as chatquality (deselect with '-m "not chatquality"')

94
scripts/bump_version.sh Executable file
@@ -0,0 +1,94 @@
#!/bin/zsh
project_root=$PWD
while getopts 'nc:' opt;
do
case "${opt}" in
c)
# Get current project version
current_version=$OPTARG
# Bump Desktop app to current version
cd $project_root/src/interface/desktop
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$current_version\",/" package.json
rm *.bak
# Bump Obsidian plugin to current version
cd $project_root/src/interface/obsidian
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$current_version\",/" package.json
sed -E -i.bak "s/version\": \"(.*)\"/version\": \"$current_version\"/" manifest.json
cp $project_root/versions.json .
npm run version # append current version
rm *.bak
# Bump Emacs package to current version
cd ../emacs
sed -E -i.bak "s/^;; Version: (.*)/;; Version: $current_version/" khoj.el
git add khoj.el
rm *.bak
# Copy current obsidian versioned files to project root
cd $project_root
cp src/interface/obsidian/versions.json .
cp src/interface/obsidian/manifest.json .
# Run pre-commit validation to fix jsons
pre-commit run --hook-stage manual --all
# Commit changes and tag commit for release
git add \
$project_root/src/interface/desktop/package.json \
$project_root/src/interface/obsidian/package.json \
$project_root/src/interface/obsidian/manifest.json \
$project_root/src/interface/obsidian/versions.json \
$project_root/src/interface/emacs/khoj.el \
$project_root/manifest.json \
$project_root/versions.json
git commit -m "Release Khoj version $current_version"
git tag $current_version master
;;
n)
# Induce hatch to compute next version number
# remove .dev[commits-since-tag] version suffix from hatch computed version number
next_version=$(touch bump.txt && git add bump.txt && hatch version | sed 's/\.dev.*//g')
git rm --cached -- bump.txt && rm bump.txt
# Bump Desktop app to next version
cd $project_root/src/interface/desktop
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$next_version\",/" package.json
rm *.bak
# Bump Obsidian plugins to next version
cd $project_root/src/interface/obsidian
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$next_version\",/" package.json
sed -E -i.bak "s/version\": \"(.*)\"/version\": \"$next_version\"/" manifest.json
npm run version # updates versions.json
rm *.bak
# Bump Emacs package to next version
cd $project_root/src/interface/emacs
sed -E -i.bak "s/^;; Version: (.*)/;; Version: $next_version/" khoj.el
rm *.bak
# Run pre-commit validations to fix jsons
pre-commit run --hook-stage manual --all
# Commit changes
git add \
$project_root/src/interface/desktop/package.json \
$project_root/src/interface/obsidian/package.json \
$project_root/src/interface/obsidian/manifest.json \
$project_root/src/interface/obsidian/versions.json \
$project_root/src/interface/emacs/khoj.el
git commit -m "Bump Khoj to pre-release version $next_version"
;;
?)
echo -e "Invalid command option.\nUsage: $(basename $0) [-c <version>] [-n]"
exit 1
;;
esac
done
# Restore State
cd $project_root
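In the `-n` branch, the next version comes from `hatch version`, whose output for a commit after a release tag carries a `.dev` suffix that `sed 's/\.dev.*//g'` strips. The Python sketch below mirrors that substitution to show what it does; `strip_dev_suffix` is a hypothetical name and the sample version strings are illustrative, not real Khoj releases.

```python
import re


def strip_dev_suffix(version: str) -> str:
    r"""Drop a '.dev' suffix and everything after it, like sed 's/\.dev.*//g'."""
    return re.sub(r"\.dev.*", "", version)


if __name__ == "__main__":
    print(strip_dev_suffix("1.4.1.dev3+g1a2b3c4"))  # 1.4.1
```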

@@ -0,0 +1,29 @@
# Run it locally
## Prerequisites
Install the dependencies. This command installs both the runtime and dev dependencies.
```bash
yarn install
```
Run the application
```bash
yarn start
```
# Deploying the Electron App
## Prerequisites
Install the ToDesktop CLI. Full documentation can be found here: https://www.npmjs.com/package/@todesktop/cli
```bash
yarn global add @todesktop/cli
```
Configure the `todesktop.json` file. Fill in the `id` based on the application ID.
## Build
This will prompt you to log in. It triggers builds for all platforms.
```bash
todesktop build
```
If you get an error saying the command is not found, make sure that your `yarn` global bin directory is in your `PATH` environment variable. You can find the location of the global bin directory by running `yarn global bin`. Add this line to your `.bashrc` or `.zshrc` file: `export PATH="$PATH:$(yarn global bin)"`.

@@ -0,0 +1,88 @@
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0">
<title>Khoj - About</title>
<link rel="icon" type="image/png" sizes="128x128" href="./assets/icons/favicon-128x128.png">
<link rel="manifest" href="/static/khoj_chat.webmanifest">
<link rel="stylesheet" href="./assets/khoj.css">
</head>
<script type="text/javascript" src="./utils.js"></script>
<style>
html, body {
height: 100%;
width: 100%;
padding: 0px;
margin: 0px;
}
body {
display: grid;
grid-template-rows: auto;
background: var(--background-color);
color: var(--main-text-color);
text-align: center;
font-family: var(--font-family);
font-size: small;
font-weight: 300;
line-height: 1.5em;
}
header > *,
body > * {
padding: 0px;
margin: 0px;
}
header > * {
margin-top: 20px;
}
img {
width: 100px;
height: 100px;
margin-top: 32px;
}
p {
font-size: 14px;
}
#about-page-version {
margin: 0;
}
.button {
display: block;
width: 60%;
padding: 10px 16px;
margin: 10px auto;
background-color: var(--primary);
border: none;
border-radius: 8px;
cursor: pointer;
transition: background-color 0.3s;
}
.button:hover {
background-color: var(--primary-hover);
}
footer {
font-size: 10px;
color: slategray;
margin-top: 10px;
}
</style>
<body>
<header>
<img id="logo" src="./assets/icons/favicon-128x128.png" alt="Khoj Logo">
<p id="about-page-title"><b>Khoj for Desktop</b></p>
<p id="about-page-version"></p>
</header>
<div class="action">
<button class="button" onclick="window.open('https://khoj.dev/terms-of-service', '_blank')">Terms of Service</button>
<button class="button" onclick="window.open('https://khoj.dev/privacy-policy', '_blank')">Privacy Policy</button>
</div>
<footer>
© 2023 Khoj Inc. All rights reserved.
</footer>
</body>
</html>

@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?><!-- Uploaded to: SVG Repo, www.svgrepo.com, Generator: SVG Repo Mixer Tools -->
<svg width="800px" height="800px" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M15 12L12 12M12 12L9 12M12 12L12 9M12 12L12 15" stroke="#1C274C" stroke-width="1.5" stroke-linecap="round"/>
<path d="M7 3.33782C8.47087 2.48697 10.1786 2 12 2C17.5228 2 22 6.47715 22 12C22 17.5228 17.5228 22 12 22C6.47715 22 2 17.5228 2 12C2 10.1786 2.48697 8.47087 3.33782 7" stroke="#1C274C" stroke-width="1.5" stroke-linecap="round"/>
</svg>

@@ -0,0 +1 @@
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 122.88 122.88"><defs><style>.cls-1{fill:#00a912;}.cls-1,.cls-2{fill-rule:evenodd;}.cls-2{fill:#fff;}</style></defs><title>confirm</title><path class="cls-1" d="M61.44,0A61.44,61.44,0,1,1,0,61.44,61.44,61.44,0,0,1,61.44,0Z"/><path class="cls-2" d="M42.37,51.68,53.26,62,79,35.87c2.13-2.16,3.47-3.9,6.1-1.19l8.53,8.74c2.8,2.77,2.66,4.4,0,7L58.14,85.34c-5.58,5.46-4.61,5.79-10.26.19L28,65.77c-1.18-1.28-1.05-2.57.24-3.84l9.9-10.27c1.5-1.58,2.7-1.44,4.22,0Z"/></svg>

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Uploaded to: SVG Repo, www.svgrepo.com, Generator: SVG Repo Mixer Tools -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
viewBox="0 0 512 512" xml:space="preserve">
<path id="SVGCleanerId_0" style="fill:#FFC36E;" d="M183.295,123.586H55.05c-6.687,0-12.801-3.778-15.791-9.76l-12.776-25.55
l12.776-25.55c2.99-5.982,9.103-9.76,15.791-9.76h128.246c6.687,0,12.801,3.778,15.791,9.76l12.775,25.55l-12.776,25.55
C196.096,119.808,189.983,123.586,183.295,123.586z"/>
<g>
<path id="SVGCleanerId_0_1_" style="fill:#FFC36E;" d="M183.295,123.586H55.05c-6.687,0-12.801-3.778-15.791-9.76l-12.776-25.55
l12.776-25.55c2.99-5.982,9.103-9.76,15.791-9.76h128.246c6.687,0,12.801,3.778,15.791,9.76l12.775,25.55l-12.776,25.55
C196.096,119.808,189.983,123.586,183.295,123.586z"/>
</g>
<path style="fill:#EFF2FA;" d="M485.517,70.621H26.483c-4.875,0-8.828,3.953-8.828,8.828v44.138h476.69V79.448
C494.345,74.573,490.392,70.621,485.517,70.621z"/>
<rect x="17.655" y="105.931" style="fill:#E1E6F2;" width="476.69" height="17.655"/>
<path style="fill:#FFD782;" d="M494.345,88.276H217.318c-3.343,0-6.4,1.889-7.895,4.879l-10.336,20.671
c-2.99,5.982-9.105,9.76-15.791,9.76H55.05c-6.687,0-12.801-3.778-15.791-9.76L28.922,93.155c-1.495-2.99-4.552-4.879-7.895-4.879
h-3.372C7.904,88.276,0,96.18,0,105.931v335.448c0,9.751,7.904,17.655,17.655,17.655h476.69c9.751,0,17.655-7.904,17.655-17.655
V105.931C512,96.18,504.096,88.276,494.345,88.276z"/>
<path style="fill:#FFC36E;" d="M485.517,441.379H26.483c-4.875,0-8.828-3.953-8.828-8.828l0,0c0-4.875,3.953-8.828,8.828-8.828
h459.034c4.875,0,8.828,3.953,8.828,8.828l0,0C494.345,437.427,490.392,441.379,485.517,441.379z"/>
<path style="fill:#EFF2FA;" d="M326.621,220.69h132.414c4.875,0,8.828-3.953,8.828-8.828v-70.621c0-4.875-3.953-8.828-8.828-8.828
H326.621c-4.875,0-8.828,3.953-8.828,8.828v70.621C317.793,216.737,321.746,220.69,326.621,220.69z"/>
<path style="fill:#C7CFE2;" d="M441.379,167.724h-97.103c-4.875,0-8.828-3.953-8.828-8.828l0,0c0-4.875,3.953-8.828,8.828-8.828
h97.103c4.875,0,8.828,3.953,8.828,8.828l0,0C450.207,163.772,446.254,167.724,441.379,167.724z"/>
<path style="fill:#D7DEED;" d="M441.379,203.034h-97.103c-4.875,0-8.828-3.953-8.828-8.828l0,0c0-4.875,3.953-8.828,8.828-8.828
h97.103c4.875,0,8.828,3.953,8.828,8.828l0,0C450.207,199.082,446.254,203.034,441.379,203.034z"/>
</svg>

@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<svg width="800px" height="800px" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M22 8.29344C22 11.7692 19.1708 14.5869 15.6807 14.5869C15.0439 14.5869 13.5939 14.4405 12.8885 13.8551L12.0067 14.7333C11.4883 15.2496 11.6283 15.4016 11.8589 15.652C11.9551 15.7565 12.0672 15.8781 12.1537 16.0505C12.1537 16.0505 12.8885 17.075 12.1537 18.0995C11.7128 18.6849 10.4783 19.5045 9.06754 18.0995L8.77362 18.3922C8.77362 18.3922 9.65538 19.4167 8.92058 20.4412C8.4797 21.0267 7.30403 21.6121 6.27531 20.5876L5.2466 21.6121C4.54119 22.3146 3.67905 21.9048 3.33616 21.6121L2.45441 20.7339C1.63143 19.9143 2.1115 19.0264 2.45441 18.6849L10.0963 11.0743C10.0963 11.0743 9.3615 9.90338 9.3615 8.29344C9.3615 4.81767 12.1907 2 15.6807 2C19.1708 2 22 4.81767 22 8.29344ZM15.681 10.4889C16.8984 10.4889 17.8853 9.50601 17.8853 8.29353C17.8853 7.08105 16.8984 6.09814 15.681 6.09814C14.4635 6.09814 13.4766 7.08105 13.4766 8.29353C13.4766 9.50601 14.4635 10.4889 15.681 10.4889Z" fill="#1C274C"/>
</svg>

Some files were not shown because too many files have changed in this diff.