Compare commits

...

490 Commits

Author SHA1 Message Date
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum
dc495babb3 Add Telemetry to Understand Khoj Usage
### Objective: 
Use telemetry to better understand Khoj usage.
This will motivate and prioritize work for Khoj.

Specific questions:
- Number of active deployments of khoj server
- How regularly is khoj used (hourly, daily, weekly etc)?
- How much is which feature used (chat, search)?
- Which UI interface is used most (obsidian, emacs, web ui)?

### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky
55d72231b3 Generate docker image for telemetry server using Github workflow 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
134cce9d32 Batch upload telemetry data at regular interval instead of while querying 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
3ede919c66 Log usage of /search, /chat, /update API endpoints to telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
f2e89f6f46 Add khoj app helper methods to log app usage to a telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
9ca61d62ff Enable/disable logging telemetry by setting bool in khoj.yml config
We log usage telemetry by default, unless setting explicitly set in
khoj.yml
2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky
131b8407b5 Allow Khoj Chat to respond to general queries not in reference notes
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes available or
  2. when explicitly induced by prefixing the chat message with "@general"

- Previously Khoj Chat would a lot of times refuse to respond to
  general queries not answerable from reference notes or chat history

- Make chat quality tests more robust
  - Add more equivalent chat response options refusing to answer
  - Force haiku writing to not give any preable, just the haiku
2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky
cc75f986b2 Test text search index only updates on changes to text content 2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky
f9ccce430e Allow configuring OpenAI chat model for Khoj chat
- Simplifies switching between different OpenAI chat models. E.g GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
  defaults to using gpt-3.5-turbo, unless chat-model field under
  conversation processor updated in khoj.yml
2023-05-03 23:01:13 +08:00
Debanjum
f0253e2cbb Include Filename, Entry Heading in All Compiled Entries to Improve Search Context
Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context

- Set filename as top heading in compiled org, markdown entries
  - Note: *Khoj was already indexing filenames in compiled markdown entries but they weren't set as top level headings but rather appended as bare text*. The updated structure should provide more schematic context of relevance
- Set entry heading as heading for compiled org, md entries, even if split by max tokens
- Snip prepended heading to avoid crossing model max_token limits
- Entries with no md headings should not get heading prefix prepended
2023-05-03 22:59:30 +08:00
Debanjum Singh Solanky
6b535cc345 Snip prepended heading to avoid crossing model max_token limits
Otherwise if heading > max_tokens than the search models will just see
a heading (with repeated filename) for each compiled entry and not
actual content.

100 characters should be sufficient to include filename (not path) and
entry heading. If longer rather truncate to pass entry unique text to
model for search context
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
02aeee60aa Set filename as top heading of org entries for better search context
Previously filename was only being appended to markdown entries.

Test filename getting prepended to compiled entry as heading
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
94825a70b9 Set heading of md entries to improve search context for long entries
Otherwise if a markdown entry is longer than max_tokens, the split
entries (apart from first one) do not get their heading context set
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
5de04621b5 Set filename as top heading of md entries for better search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context

Test filename getting prepended as heading to compiled entry
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
0e3fb59e09 Entries with no md headings should not get heading prefix prepended
Files with no headings would previously get their entry be prefixed
with a markdown heading prefix (#)
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
45a991d75c Prepend entry heading to all compiled org snippets to improve search context
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.

This limits search context required to retrieve these continuation
entries
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
3386cc92b5 Fix khoj server config update in khoj.el by unquoting list to cl-push to
- cl-push expects a generatlized variable. Else throws (setf quote)
  undefined warning
- This results in the config call failing on calling khoj entrypoint
2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky
948a4274e4 Fix documentation strings and simplify not null checks 2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky
731ef5688f Use cl-pushnew to fix byte-compile errors with using add-to-list 2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky
f046523b33 Improve khoj.el messages to convey state of khoj server
- Remove waiting for server message as it hides the messages from the
  server
- Fix the nil message that were being rendered, by checking before
  showing messages from server
- Consistently prefix messages from khoj with khoj.el
2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky
76df393eb5 Only call khoj server configure API from khoj.el when config updated
Previously khoj.el was calling the server configure API even when
config was same as before.
This had broken the khoj search as you type experience from emacs

Also show more details to user about what in khoj is being configured
2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
ceae06ae9d Fix khoj.el compilation warnings around unused variables 2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
8269adf849 Refactor khoj-setup in khoj.el for readability. No functional change 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
865d12b6f2 Fix escaping quote in chat references to prevent it breaking out of html 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
26cb878327 Add Yarn lockfile for Khoj Obsidian 2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky
e3180d63e6 Sync Khoj Obsidian Tagline with Khoj tagline 2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky
62e6e09521 Release Khoj version 0.6.1 2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky
b079fb31bc Replace Windows path separators in indexName configured via Khoj Obsidian
Resolves #185, #199

- Issue
  IndexName created from Obsidian Absolute Vault path wasn't replacing
  windows path, drive separators with underscore. It was only
  replacing unix path separators

- Fix
  Also replace windows drive and path separators with _ while creating
  IndexName in Khoj Obsidian plugin
2023-04-17 16:55:33 +07:00
Debanjum Singh Solanky
d90df966a9 Make khoj logger use utf-8 encoding when writing to khoj log file
Resolve logger error issue mentioned in #199
2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky
dc3f399f91 Fix to get score associated with SearchResponse in result as string 2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky
d5000c63e1 Update Readmes to use python -m pip install khoj-assistant
Makes it easier to tell pip associated with which python is being
used. Easier to debug when users have different versions of python
installed (e.g 3.10 and 3.11)
2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky
453c84ab79 Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes 2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky
35aa06067f Release Khoj version 0.6.0
Upload styles.css via release workflow
2023-03-31 18:13:16 +07:00
Debanjum
8f4e5d3d83 Improve Styling of Khoj Search Modal on Obsidian and Indexing of Markdown
Merge pull request #198 from debanjum/improve-khoj-search-for-markdown-obsidian

### Overview
- Copied Khoj Search Modal styling from Jim Prince's PR #135 with minor improvements
- Implements improvements to the Khoj Search in Markdown/Obsidian suggested by folks. Specifically:
  - #133
  - #134
  - #142

### Changes
- 5673bd5 Keep original formatting in compiled text entry strings
- a2ab68a Include filename of markdown entries for search indexing
- 6712996 Create Note with Query as title from within Khoj Search Modal
- d3257cb Style the search result. Use Obsidian theme colors and font-size
- 4009148 For each result: snip it by lines, show filename, remove frontmatter
2023-03-30 14:15:23 +07:00
Debanjum Singh Solanky
5673bd5b96 Keep original formatting in compiled text entry strings
- Explicity split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
  No point in dropping the formatting unnecessarily,
  even if (say) the currrent search models don't account for it (yet)
2023-03-30 14:02:46 +07:00
Debanjum Singh Solanky
a2ab68a7a2 Include filename of markdown entries for search indexing
Append originating filename to compiled string of each entry for
better search quality by providing more context to model

Update markdown_to_jsonl tests to ensure filename being added

Resolves #142
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
67129964a7 Create Note with Query as title from within Khoj Search Modal
This follows expected behavior for obsidain search modals
E.g Ominsearch and default Obsidian search.

The note creation code is borrowed from Omnisearch.

Resolves #133
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
d3257cb24e Style the search result. Use Obsidian theme colors and font-size
Based on PR #135
2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky
40091489c0 For each result: snip it by lines, show filename, remove frontmatter
Based on PR #135
Resolves #134
2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky
240db7b4f0 Add screenshot of Khoj chat on Obsidian to Readme. Fix links 2023-03-30 02:49:05 +07:00
Debanjum Singh Solanky
234be96e53 Fix processor key used to configure chat model in khoj obsidian 2023-03-30 01:47:09 +07:00
Debanjum
53d421f9c6 Create Chat Modal for Obsidian Plugin
Merge pull request #196 from debanjum/create-chat-modal-for-obsidian

- Set your OpenAI API key in the Khoj Obsidian Settings
- Use Modal in Obsidian for Chat
- Style Chat Modal combining the Khoj Web interface and Obsidian theme style
2023-03-30 01:37:07 +07:00
Debanjum Singh Solanky
c8c0cfd10e Add Chat features, setup and usage to Khoj Obsidian plugin Readme 2023-03-30 00:32:24 +07:00
Debanjum Singh Solanky
7ecae224e7 Configure OpenAI API Key from the Khoj plugin setting in Obsidian 2023-03-29 23:54:08 +07:00
Debanjum Singh Solanky
3d616c8d65 Use Obsidian font sizes. Improve input field, reference indexing
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
2023-03-29 22:13:55 +07:00
Debanjum Singh Solanky
23bd737f6b Use chat input element to send message on Enter. No send button required 2023-03-29 22:13:30 +07:00
Debanjum Singh Solanky
81e98c3079 Scroll to bottom of modal on open and message send 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
59ff1ae27f Use obsidian theme colors for bg, text. Restrict css namespace via prefix 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
001ac7b5eb Style Obsidian Chat Modal like Khoj Chat Web Interface
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
  - Modify it to work under a Obsidian modal
  - So replace html, body styling from web interface to instead
    styling new "khoj-chat" class attached to contentEl of modal
2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
112f388ada Render references next to chat responses by khoj in chat modal 2023-03-28 18:11:03 +07:00
Debanjum Singh Solanky
1d3d949962 Render conversation logs on page load 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
cd46a17e5f Add Khoj Chat Modal, Command in Khoj Obsidian to Chat using API 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
c0972e09e6 Rename KhojModal to KhojSearchModal, a more specific name for it
In preparation to introduce Khoj chat in Obsidian
2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
64fff1d372 Release Khoj version 0.5.0 2023-03-28 03:35:59 +07:00
Debanjum Singh Solanky
7478d08803 Update main readme to mention chat features 2023-03-27 22:02:53 +07:00
Debanjum Singh Solanky
fc218508f9 Update khoj.el docs and Emacs Readme for chat, simplified setup 2023-03-27 22:02:47 +07:00
Debanjum
87090531da Install, Start and Configure Khoj Server from Emacs
Merge pull request #193 from debanjum/simplify-khoj-server-setup-on-emacs

## Major Changes
- ae535a0 Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs
- 82eb4bf Setup Khoj server on opening khoj.el
- 99d19dc Start Khoj server from Emacs using khoj.el
- c92d791 Install Khoj server from Emacs using khoj.el
  *This assumes you have python (<3.11) and pip installed in a system path*

### Sample Config
- Enable Khoj Chat by configuring you OpenAI API Key
- Specify Org Files, Directories to Index for Search (and Chat)
  By default, your org-agenda-files (include archive files)) are indexed
- Invoke khoj by calling `C-c s`

``` emacs-lisp
(use-package khoj
  :after org
  :straight (khoj
             :type git
             :host github
             :repo "debanjum/khoj"
             :files ("src/interface/emacs/khoj.el"))
  :bind ("C-c s" . 'khoj)
  :config (setq
           khoj-openai-api-key "<YOUR_OPENAI_API_KEY_FOR_KHOJ_CHAT>"
           khoj-org-directories '("~/docs/notes" "~/docs/journals")
           khoj-org-files '("~/docs/tasks.org" "~/docs/journal.org" "~/docs/archive.org")))
```
2023-03-27 18:49:43 +07:00
Debanjum Singh Solanky
83a7ccd729 Fix docstrings and method ordering in khoj.el 2023-03-27 18:33:09 +07:00
Debanjum Singh Solanky
5c2327ee4f Configure org directories to index from khoj.el
Converts paths to glob style regexes that will index all org files
recursively under the specified list of path

Should help setup for org-roam users from khoj.el
2023-03-27 18:30:53 +07:00
Debanjum Singh Solanky
6e8a40906d Allow disabling automatic server setup. Fix server start vs ready logic
- khoj-auto-setup controls whether to automatically check for and
  setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
  method. Allows calling khoj-setup during package load via init.el

- Fix: Do not attempt to configure or wait for server ready if
  user has said no to auto-setup request
- Fix logic to mark server started vs ready
  - Previously the started/running vs ready variables defs were getting
    intertwined
  - Server started indicates server bootup has been triggered
  - Server ready indicates server API ready to accept requests
2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky
526a927bce Fix org entry extraction test, variable prefixed with khoj in khoj.el
Discovered via failing build and test workflows on Github
2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky
7243059507 Track index update asynchronously via moon phase progressbar in khoj.el 2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky
8a9055f918 Restrict server messages show in echo area to main server files 2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky
ae535a06eb Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs 2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky
36b17d4ae0 Generalize the directory from config extraction elisp method 2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky
924424c754 Throw actionable exceptions when content types or chat not configured 2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky
359a2cacef Fix khoj--server-running to work with unconfigured or external server
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)

- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky
d7fb9a596e Auto configure server before loading khoj-menu
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done

As easier to know when server ready to configure
2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky
8a21aff438 Make khoj.el server start, stop, restart, setup methods interactive
No need to erase temporary buffers before working on them
2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky
cb40a96c85 Index configured org files from khoj.el
- Set `khoj-org-files-index' to list of files to index
- Defaults to indexing org-agenda-files
- Uses khoj server api to configure org files to index
2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky
50760acc37 Wait for Khoj server to get ready before opening khoj.el transient menu
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
  Until then khoj features wouldn't work anyway, so avoids confusion
2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky
82eb4bfd0d Setup Khoj server on opening khoj from with Emacs
- Create helper methods to check, stop, restart, setup khoj server
- (Ask to) setup khoj server on calling khoj main entrypoint function
2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky
99d19dcf43 Start Khoj server from Emacs using khoj.el 2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky
c92d79118a Install Khoj server from Emacs using khoj.el 2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky
e281a498b4 Style Khoj search org buffer via elisp instead of in-buffer settings 2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky
4f655d20ae Style Khoj chat directly via elisp instead of via in-buffer settings 2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky
f6ff7b1beb Render foonote reference links as superscript for Khoj Chat on Emacs 2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky
285a2b86d2 Use aiohttp version 3.8.4 as 4.x breaks docker image build 2023-03-26 05:33:02 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
Debanjum
0aebf624fc Improve Khoj Chat in Emacs, Server
Merge pull request #192 from debanjum/improvements-to-khoj-chat-in-emacs

### Khoj Chat on Emacs Improvements
- d78454d Load Khoj Chat buffer before asking for query to provide context
- 93e2aff Use org footnotes to add references, allows jump to def on click
- 5e9558d Stylize reference links as superscripts and show definition on hover
- bc71c19 Use `m` or `C-x m` in-buffer keybindings to send messages to Khoj

### Khoj Chat Server Improvements
- 27217a3 Time chat API sub-components for performance analysis
- 508b217 Update Chat API, Logs, Interfaces to store, use references as list
- d4b3866 Truncate message logs to below max supported prompt size by chat model
- cf28f10 Register separate timestamps for user query and response by Khoj Chat
2023-03-25 05:49:27 +07:00
Debanjum Singh Solanky
ff846f05c5 Clean-up khoj.el based on linting helpers and manual review 2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky
4725416fbd Use shortcut keybindings in buffer to ease sending messages to Khoj 2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky
508b2176b7 Update Chat API, Logs, Interfaces to store, use references as list
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
b08745b541 Keep chat messages at 1 empty line visible distance in khoj.el
- Clean redundant concat, format string
- Improve variable name to emojified sender
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
27217a330d Time chat API sub-components for performance analysis
Time and the search query extraction, search and response generation
components
2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky
5e9558d39d Stylize references shown as footnote links in chat messages
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
  - Add continuation suffix, ..., when reference definition truncated
2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky
cf28f104c7 Register separate timestamps for user query and response by Khoj Chat 2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky
93e2aff786 Add references as org footnotes instead of links 2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky
d78454d4ad Load Khoj Chat buffer before asking for query to provide context 2023-03-24 13:43:46 +07:00
Debanjum
4070d13a96 Create Khoj Chat Interface in Emacs
Merge pull request #191 from debanjum/create-chat-interface-on-emacs

- Render conversation history in a read-only org-mode buffer for Khoj Chat
- Add `chat` as a transient action in the Khoj transient menu
- Style chat messages as org-mode entries
  - Put received date in property drawer and keep it hidden/folded by default
  - Add Khoj chat response as child entry of the users associated question org entry
    This allows folding back-n-forth between user and Khoj for easier viewing
  - Render source notes snippets used as references for response as org-mode links
    Hovering mouse on link or opening links shows reference note snippets used
2023-03-22 16:32:40 -06:00
Debanjum Singh Solanky
863933daaa Resolve build issues found by melpazoid 2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky
e9ca04af0d Require dash, org to run ERT tests for khoj.el 2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky
06df394d6c Style chat messages as org-mode entries in Emacs
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
  - Improves color coding for readability
  - Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky
364e6c11af Render chat history from API in chat buffer on first run
- Generalize the render-chat-response method to handle rendering
  history or chat response from chat API reponse

- Trigger rendering of khoj chat history if Khoj chat buffer not
  created for this session yet
2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky
36b52fdd0a Properly escape reference links before rendering
- Use org-insert-link method to improve link rendering robustness
  Previous simple mechanism to crete org-links would result in links
  escaping out of formating. Use a user-facing org-mode method to
  remove/reduce probability of this

- Replace newlines with space to render reference notes as links
2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky
72f63a6ef7 Add basic chat interface for Khoj on Emacs
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
  - [sender-name]: *[message]*
    - /[receive-date]/
- Add references as org links with context visible on hover,
  but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
  Use `-map-indexed' method from dash
2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky
e4d67694e1 Add search to method, variable names meant for khoj search in khoj.el
In preparation to introduce Khoj chat in Emacs
2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky
98e5ea4940 Fix name of default encoder to replace in multi-lingual model setup docs 2023-03-21 20:38:17 -06:00
Debanjum Singh Solanky
2f6284872d Mention Khoj needs Python version 3.10 or lower in docs 2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky
a9b81975f2 Fix encoder model name to configure multilingual search in Readme
See comment in issue #98 for stale model name comment
2023-03-19 17:27:53 -06:00
Debanjum
b351cfb8a0 Add Search Actor to Improve Querying Notes for Khoj Chat
Merge pull request #189 from debanjum/add-search-actor-to-improve-notes-lookup-for-chat

### Introduce Search Actor
Search actor infers Search Queries from user's message
- Capabilities
  - Use previous messages to add context to current search queries[^1]
    This improves quality of responses in multi-turn conversations. 
  - Deconstruct users message into multiple search queries to lookup notes[^2]
  - Use relative date awareness to add date filters to search queries[^3]

- Chat Director now does the following:
  1. [*NEW*] Use Search Actor to generate search queries from user's message
  2. Retrieve relevant notes from Knowledge Base using the Search queries
  3. Pass retrieved relevant notes to Chat Actor to respond to user

### Add Chat Quality Tests 
- Test Search Actor capabilities
- Mark Chat Director Tests for Relative Date, Multiple Search Queries as Expected Pass

### Give More Search Results as Context to Chat Actor
- Loosen search results score threshold to work better for searches with date filters
- Pass more search results (up to 5 from 2) as context to Chat Actor to improve inference

[^1]: Multi-Turn Example
Q: "When did I go to Mars?"
Search: "When did I go to Mars?"
A: "You went to Mars in the future"
Q: "How was that experience?"
Search: "How my Mars experience?"
*This gives better context for the Chat actor to respond* 
[^2]: Deconstruct Example: 
Is Alpha older than Beta? => What is Alpha's age? & When was Beta born?

[^3]: Date Example: 
Convert user messages containing relative dates like last month, yesterday to date filters on specific dates like dt>="2023-03-01"
2023-03-18 18:02:12 -06:00
Debanjum Singh Solanky
601ff2541b Revert to using GPT to extract search queries from users message
- Reasons:
  - GPT can extract date aware search queries with date filters
    better than ChatGPT given the same prompt.
  - Need quality more than cost savings for now.
  - Need to figure ways to improve prompt for ChatGPT before using it
2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky
e28526bbc9 Extract search queries from users message using ChatGPT as Search Actor
- Reasons
  - ChatGPT should be better at following instructions than GPT
  - At 1/10th the cost, it's much cheaper than using older GPT models
2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky
939d7731da Fix-up Search Actor GPT's response for decoding it as valid JSON 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f63fd0995e Pass more search results as context to Chat Actor to improve inference 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
10836dedee Search should return user message if GPT response is not valid JSON
Previously would throw if GPT response is not valid JSON. Better to
return original message to use for search instead
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
08f5fb315f Add answers to context for Search Actor to generate relevant queries
Update Search Actor prompt with answers, more precise primer and
two more examples for context

Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f09bdd515b Expect Chat Director can extract relative dates using new Search Actor 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
36c7389b46 Test Search Actor generating search query from Chat History 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
2600cc9d4d Test Search Actor extracting relative dates & multiple questions 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
45cb510421 Loosen search results score thresold used by chat for more context 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
d871e04a81 Use past user messages, inferred questions as context to extract questions
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
  extract complete questions
- This should improve search results quality

- Example Expected Inferred Questions from User Message using History:
  1. "What is the name of Arun's daughter?"
    => "What is the name of Arun's daughter"
  2. "Where does she study?" =>
    => "Where does Arun's daughter study?" OR
    => "Where does Arun's daughter, Reena study?"
2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky
1a5d1130f4 Generate search queries from message to answer users chat questions
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
   E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
   Answer time range based questions
   Limit search to specified timeframe in question using date filter
   E.g "What national parks did I visit last year?" adds
   dt>="2022-01-01" dt<"2023-01-01" to Khoj search

Note: Temperature set to 0. Message to search queries should be deterministic
2023-03-18 16:28:51 -06:00
Debanjum Singh Solanky
d0f14d3f85 Test usage of = in date filter queries 2023-03-16 14:52:59 -06:00
Debanjum Singh Solanky
dfb277ee37 Set skipif at module level if OpenAI API key not set for chat tests
- Remove stale message_to_prompt test
  It is too broad, reduces maintainability.
  Remove as it doesn't really need its own test right now
- Setting skipif at module level for chat actor, director tests
  reduces code duplication as earlier was using decorator on each chat
  test
2023-03-16 12:23:52 -06:00
Debanjum
e75e13d788 Create Tests to Measure Chat Quality, Capabilities
Create Rubric to Test Chat Quality and Capabilities

### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities

### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
   - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
   - Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
   - Measure quality at 2 separate layers
     - **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
     - **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message.  This is what the `/api/chat` API exposes.
   - Mark desired but not currently available capabilities as expected to fail <br />
     This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky
4e15b4e411 Create test notes dataset for chat testing
Combine hand-written custom notes and PG essays with personal
content to bulk up notes count

Delete old documentation markdown as not a representative dataset for
application (which is more tuned for personal notes)
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
1b4d562700 Test Chat Director Capabilities: Answer from notes, chat history etc
- Chat directors are broad agents.
  - Chat directors orchestrate narrow actor agents to synthesize
    final response for the user
  - Agents are Prompts + ML Model

- Test Chat Director Capabilities
  1. [X] Answer from retrieved notes
  2. [X] Answer from chat history
  3. [X] Answer general questions
  4. [X] Carry out multi-turn conversation
  5. [X] Say don't know when answer not in provided context
  6. [X] Answers that require current date awareness
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  7. [X] Date-aware aggregation across multiple different notes
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  8. [X] Ask clarification questions if no unambiguous answer in provided context
  9. [X] Retrieve answer from chat history beyond lookback window
     This test is expected to fail as the chat director is not capable
     of searching chat history yet. But the test allows assessing chat quality
 10. [X] Retrieve context for answer using multiple independent
         searches on knowledge base
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
b6d63137f1 Setup Pytest fixture for conversation processor to test chat API
- Index markdown test data as knowledge base. As easier to get good
  markdown content (vs org)
- Setup markdown_content_config, processor_config and chat_client to
  test chat API
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
3f719c9e17 Rename Chat Model+Prompt tests to chat actor tests 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
7526a50dd4 Extract conversation processor utility funcs from gpt.py into utils.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
7c4d546039 Configure tests to mark chat quality tests & filter unhelpful warnings
- Mark chat quality tests, register custom mark for chat quality
- Filter unhelpful deprecation warnings from within dateparser library
- Error if tests use unregistered marks
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
c1128a1ad8 Test Chat Actor Capabilities; ability to answer from notes, chat logs etc
- Chat actors are narrow agents (prompt + ML model)
  Chat actors are different from the Chat director. who orchestrates
  the narrow actor agents to synthesize final response to the user

- Test Chat Actor Capabilities
  1. Answer from retrieved notes
  2. Answer from chat history
  3. Answer general questions
  4. Carry out multi-turn conversation
  5. Say don't know when answer not in provided context
  6. Answers that require current date awareness
  7. Date-aware aggregation across multiple different notes
  8. Ask clarification questions if no unambiguous answer in provided context
     This test is expected to fail as the chat is not capable of doing
     this consistently yet. But having the test allows assessing chat quality

- Use Openai API Key from OPENAI_API_KEY environment variable
- Gitignore .env file, python virtualenv directory
  Put OpenAI API Key in .env file to run chatbot tests via vscode
  The .env file is default location for importing env vars
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
9306cd901a Clean up chat tests to work with updated chat methods in gpt.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
24ddebf3ce Make converse prompt more precise. Fix default arg vals in gpt methods
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
  answering
- Make GPT be more reliable in looking at past conversations for
  forming response
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
8609e3129e Fix, improve displaying chat messages, sources by Khoj in web interface
Pretty pretty json in conversation logs
2023-03-14 11:24:47 -06:00
Debanjum
6c0e82b2d6 Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
  - Previously was asking GPT to summarize the notes
  - Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
  - Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
  - Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now

## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
  - Allows Khoj to say don't know if it can't find answer to query from notes
  - Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky
cccd225247 Deduplicate and simplify logic to render chat message with reference 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
b9caad458e Type score_threshold with union, not |, to support python <3.10 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
198d9af8cf Update Readme to reflect Khoj Chat out of Beta 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
a71f168273 Move the chat API out of beta. Save chat sessions at 15min intervals 2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky
bcc0bed9db Upgrade bump_version script to handle release and post-release commit
- Updates version in khoj.el and Obsidian manifest, package, versions
  json files under interface and project root
- Create and tag release commit with updated files
- Creates commit with post-release version upgrade in files
- Use flags to specify whether to create a release or post-release commit
2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
8bb8824d0c Bump khoj versions in obsidian, emacs files 2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
e16d0b6d7e Open references notes used for chat on mobile too (by clicking)
Requires clicking the reference as hover doesn't work on mobile
2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky
c3c7b8a951 Make Khoj chat a separate Progressive Web App (PWA) for easier access 2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky
3838f9d8e3 Remove explicitly asking GPT to say I don't know in prompt for now
GPT still mostly says I don't know when answer not in notes or chats

But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky
f7b8cdd02e Log prompts being passed to GPT for debugging 2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky
2739a492b4 Log message metadata along with Khoj message instead of user message
References should be attached to khoj chat messsage rather than the
users message in the chat interface
2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
87d1e1341d Show reference notes used as response context in chat interface 2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
280061e1fa Do not deduplicate search results used for chat context
- Chat uses compiled form of search results, not the raw entries to
  provide context for chat. The compiled snipped search results
  themselves are unique and using multiple of them for context from
  the same raw note is fine if they cross the score and rank thresholds

  This should improve the context provided for chat

- Also apply score_threshold, no deduplication to the answers API
2023-03-06 23:51:31 -06:00
Debanjum Singh Solanky
672f61529e Make getting deduped search results configurable via Search API 2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky
4fb628975c Fix jumping to note from Khoj Obsidian search modal result on Windows
- Issue
  The file path separator by khoj server and the Obsidian vault were
  different on Windows
- Fix
  Normalize file path to use forward slash(/) to find the matching
  note file in the Obsidian vault for jump to it

Resolves #177
2023-03-05 21:07:54 -06:00
Debanjum Singh Solanky
b6cdc5c7cb Do not expose answer API as a chat type in chat web interface or API
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat

For now it is only exposed via API. Later it will be expose in the
interfaces as well

Remove ability to select different chat types from the chat web
interface as there is only a single chat type

Stop appending answers to the conversation logs
2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky
7f994274bb Support multi-turn conversations in chat mode
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context

- Make GPT provide reason when it can't answer the question. Gives
  user context to tune their questions
2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
d73042426d Support filtering for results above threshold score in search API 2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
45f461d175 Keep search results passed to GPT as context in conversation logs
This will be useful to
1. Show source references used to arrive at answer
2. Carry out multi-turn conversations
2023-03-05 16:00:19 -06:00
Debanjum Singh Solanky
7cad1c9428 Only use past chat message, not session summaries as chat context
Passing only chat messages for current active, and summaries
for past session isn't currently as useful
2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky
ad1f1cf620 Improve and simplify Khoj Chat using ChatGPT
- Set context by either including last 2 chat messages from active
  session or past 2 conversation summaries from conversation logs

- Set personality in system message
- Place personality system message before last completed back & forth
  This may stop ChatGPT forgetting its personality as conversation progresses given:
  - The conditioning based on system role messages is light
  - If system message is too far back in conversation history, the
    model may forget its personality conditioning
  - If system message at end of conversation, the model can think its
    the start of a new conversation
  - Inserting the system message before last completed back & forth should
    prevent ChatGPT from assuming its the start of a new conversation
    while not losing personality conditioning from the system message

- Simplfy the Khoj Chat API to for now just answer from users notes
  instead of trying to infer other potential interaction types.
  - This is the default expected behavior from the feature anyway
  - Use the compiled text of the top 2 search results for context

- Benefits of using ChatGPT
  - Better model
  - 1/10th the price
  - No hand rolled prompt required to make GPT provide more chatty,
    assistant type responses
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
9d42b5d60d Use multiple compiled search results for more relevant context to GPT
Increase temperature to allow GPT to collect answer across multiple
notes
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
c3b624e351 Introduce improved answer API and prompt. Use by default in chat web interface
- Improve GPT prompt
  - Make GPT answer users query based on provided notes instead
    of summarizing the provided notes
  - Make GPT be truthful using prompt and reduced temperature
  - Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
  fit for default consumption yet
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
7184508784 Mention Python and Pip need to be installed in Main and Emacs Readme 2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky
211e460398 Output date filter from cache log at debug level. Remove unused imports
Other logs not directly useful to user have already been converted
to debug log levels in 1ae4016. Just forgot to convert this log line too
2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky
c823f46d89 Test error on missing fields in ContentConfig pulled from Khoj.yml
Resolves #9
2023-03-02 15:35:39 -06:00
Debanjum Singh Solanky
b6dbe4dd1d Do not try retrieve an unconfigured core content type in Config GUI
Previous behavior was resulting in a null reference error. As key for
the core content/search type was not present in current config

Fallback to using default config for unconfigured core content type
instead

See #165 for details
2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky
1ae40163a9 Show user friendly information logs by default for context
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
  take time
- Convert all other info logs to be only shown in verbose mode
2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky
fe03ba3dce Index intro text before headings in org files
- Text before headings was not being indexed due to buggy orgnode
  parsing logic
- Resolved indexing intro text from files with and without headings in
  them
- Ensure intro text node has heading set to all title lines collected
  from the file

Resolves #165
2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky
ed177db2be Emojify step names in workflows. Stop publishing to TestPyPi from PR 2023-03-01 10:56:39 -06:00
Debanjum Singh Solanky
7ad251b8ef Log and Continue on OSError while collating dates for date filters
Log to understand if error, date can be handled better
Mitigates #172
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
2bed4c3b50 Fix configuring search types & /config/types API when no plugin configured
- Test /config/types API when no plugin configured, only plugin configured
  and no content configured scenarios
- Do not throw null reference exception while configuring search types
  when no plugin configured
- Do not throw null reference exception on calling /config/types API
  when no plugin configured

Resolves bug introduced by #173
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
8914dbd073 Fix creating GUI panels for unconfigured search, processor types
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`

Bug:
- Unconfigured processor and search_types are instantiated as None in
  self.current_config
- While creating the desktop GUI, these null configs are attempted to
  be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors

Fix:
Use default config to create their GUI elements for unconfigured
search and processor types

Resolves #167
2023-03-01 01:20:58 -06:00
Debanjum
e77a5ffc83 Merge pull request #173 from debanjum/enable-creating-content-plugins
## Enable Creating Content Plugins

### Goal
Index, Search text content not supported by default in Khoj using plugins

### Code Changes
- fcbbe8c Configure content plugins to index using `khoj.yml`
- Index content plugins from standardized JSONL format for ingestion
  - 55a032e Add jsonl processor to index plugin content
  - ab0d3a0 Index configured plugins on app start and via update API endpoint
- Expose plugin content types for usage by interfaces
  - 47b58a2 Dynamically update available types on loading the Khoj server
  - Expose indexed types via API (9d38ead). Simplify getting enabled types in Web (f3f2438), Emacs (1e43f1a) interfaces
- Search plugin content from the Web and Emacs Interfaces
  - d91c7e2 Search plugin content via the search API
  - Render plugin content on Web (88344f9) and Emacs (c2814fc) interfaces
    - The Web, Emacs interfaces are general interfaces, they allow searching across all content types
    - The Obsidian interface is currently tuned for only markdown content
      It will be extended to render more content plugins later

### Testing
- fcbbe8c Add unit tests to test reading plugin config from khoj.yml
- 55a032e Add unit tests for the `JsonlToJsonl` processor
- 88a9ead Add unit tests to validate search, incremental update, force-update API works with plugin content types
- b09350c Add unit test to validate only configure search types returned by the new /api/config/types API endpoint
- Manually test the config read, indexing, search and update with local khoj
2023-02-28 22:23:25 -06:00
Debanjum Singh Solanky
b09350c052 Fix to return only enabled content types via the new config/types API
- Previously was return all core content types even if they had not been
  setup
- Add test to validate only configured content types are returned by
  the api/config/types API endpoint
2023-02-28 22:08:26 -06:00
Debanjum Singh Solanky
b177adf3a7 Return value of search_type in /config/type API endpoint
- Remove need for interfaces to downcase content types returned by API
  before using the type in search and other API endpoint
- Fix to check for search_type.name in plugin keys instead of value
2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky
ede6eb6879 Re-enable testing search and update API with image content type
It may have been disabled due to issues with image search earlier
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
88a9eadfba Use client pytest fixture to test API with plugin type configured 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
ab501a56c9 Create pytest fixture to configure app with plugin, search types 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
f944408e69 Update content_config pytest fixture to index plugin content 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
88344f9ed2 Improve rendering search results of plugin content types on web interface
Render only the entry from plugin search response instead of raw json
Use the results-ledger styling for results-plugin styling
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
c2814fce58 Improve rendering search results of plugin content types in khoj.el
Render only the entry from plugin search response instead of raw json
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
f3f24387ec Use new config/types API to set enabled content types on web interface 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
1e43f1a12e Use new config/types API to set enabled content types in khoj.el menu 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
9d38eadd42 Return enabled content types via api/config/types API endpoint
Simplifies dynamically populating enabled content types for interfaces
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
68bd5d9ebc Configure API routes after set up search types while configuring server
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
d91c7e2761 Search for plugin content via the search API 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
47b58a2a4d Configure, use dynamically instantiated SearchType enum on app start
The SearchType is now dynamically populated with core and configured
plugin types

Use the new dynamic SearchType enum from state.py across codebase
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
ab0d3a08e2 Index configured plugins on app start and via update API endpoint 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
55a032e8c4 Add processor to index entries from jsonl files for plugins
- Read, merge entries from input jsonl files and filters
- Mark new, modified entries for update
2023-02-24 02:54:12 -06:00
Debanjum Singh Solanky
fcbbe8c759 Read content plugin configs from Khoj config YAML
Configure external text content plugins via the Khoj YAML
Reuse existing TextContentConfig definition for external text content plugins
2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky
f57d7bf5ad Use pypi khoj to fix docker builds and dockerize github workflow
- Instead of building the package locally like before
  The issue started since moving to dynamic git based versioning with hatch-vcs
  This should reduce image size of docker builds too

- Also move to ubuntu image since pyqt6 builds available on it, so do
  not need to build it locally for image

- This s
2023-02-19 01:57:01 -06:00
Debanjum Singh Solanky
fada617faa Fix TOC links, Add how to auto start Khoj server to Readme
Rename tools directory to more standard scripts directory
2023-02-18 23:51:02 -06:00
Debanjum Singh Solanky
61b6ee2857 Use helper script to bump khoj pre-release versions 2023-02-17 20:31:51 -06:00
Debanjum Singh Solanky
47c2cc63e1 Automate uploading Obsidian artifacts to new releases 2023-02-17 19:57:44 -06:00
Debanjum Singh Solanky
a8940462c4 Automate khoj python package versioning using hatch-vcs and Git tags 2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky
053d6141f3 Ignore ts typing error, Fix SPDX license identifier in Obsidian plugin 2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky
47569da38e Fix usage of "\" in orgnode test string to resolve DeprecationWarning 2023-02-17 17:15:44 -06:00
Debanjum Singh Solanky
36be3c4b8f Fix or ignore MyPy issues in PyQt desktop GUI code
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
  different errors in different environments (venv, system and pre-commit)
2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky
fd0a2f55f8 Run mypy checks in test workflow and on push (via pre-commit)
- Run mypy on git push (not every commit) but for all files
  - Running it on pre-commit, doesn't make sense as mypy wants to look
    at all files, not just diff files
  - But this is too time consuming to run every commit, so run on push

- Update development section documentation on installing, manually
  running pre-commit for validation that includes running mypy checks
2023-02-17 16:08:56 -06:00
Debanjum Singh Solanky
5c0d340970 Update Development section in Readme. Add steps for code validation 2023-02-17 13:31:37 -06:00
Debanjum Singh Solanky
051f0e3fb5 Add, configure and run pre-commit locally and in test workflow 2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky
5e83baab21 Use Black to format Khoj server code and tests 2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky
6130fddf45 Install pytest as optional dev dependency of app in test workflow 2023-02-17 10:11:57 -06:00
Debanjum Singh Solanky
8b293edd7c Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues 2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky
7a9a811874 Fix authors, homepage URL in pyproject.toml and workflow triggers 2023-02-16 03:19:56 -06:00
Debanjum Singh Solanky
dcb86c2d3e Build khoj python package using hatchling, pyproject.toml
- Why
  - pyprojects.toml is the python standards compliant config format
    - allows collating python tooling configs into single standard file
  - hatch(-ling) is a new lightweight build system for python packages

- Detailed Changes
  - Replace setup.py, setuptools with pyproject.toml, hatchling for
    khoj python config and build
  - move pytest into optional development dependencies
  - add more links to khoj in the project urls section
  - add topic classifiers and keywords to find khoj package

  - Delete setup.py, MANIFEST.in as moved to pyproject.toml based setup
  - Update pypi workflow to set python package version in pyproject.toml
2023-02-16 02:37:32 -06:00
Debanjum Singh Solanky
c641eb4ad6 Improve rendering log and error stacktraces using the Rich package
- Use Rich to render uvicorn, fastAPI logs as well
  The previous CustomFormatter only worked on khoj logs
- Improve rendering stacktrace on errors using Rich
2023-02-15 16:19:32 -06:00
Debanjum Singh Solanky
a403def19e Fix workflow to publish Khoj python package to PyPi 2023-02-14 22:19:21 -06:00
Debanjum
eee57599ad Improve Dockerize, Publish to PyPi Workflows
- fb86dea Create tagged Docker image on new tag/release
- 01fd98b Improve workflow to publish khoj to pypi
2023-02-14 21:11:56 -06:00
Debanjum Singh Solanky
af6d65a909 Create tagged Docker image on new tag/release 2023-02-14 20:04:06 -06:00
Debanjum Singh Solanky
25e06f26c0 Improve workflow to publish khoj to pypi
- Use emoji's to improve visual indicator of action step
- Rename to pypi instead of the more ambiguous publish name
  Publish could mean publish docker image, publish to pypi, MELPA or
  Obsidian plugin
- Update workflow badge, link pypi badge to khoj pypi package page
- Use pypa official github action to upload package to (test) pypi
  instead of doing it manually using twine
- Upload python package artifact for easier access for testing.
  As uploading to testpypi doesn't work for PRs by others from forked repos
2023-02-14 20:03:35 -06:00
Debanjum
11873795a6 Use src layout to fix packaging khoj for pypi
### Issue
The khoj python package was using a common top level name[1], `src' instead of `khoj' due to incorrect usage of the src layout[2]

### Fix
Put content meant for python packaging from `src/' to `src/khoj/'
Update code, tests, configs and docs to reference new layout

The `khoj' python package should now get unpacked under `khoj' instead of `src' directory

### Details
- 25a749c Use the src/ layout to fix packaging Khoj for PyPi
- bc7477e Move Emacs, Obsidian plugin code out from under src/khoj directory
- f83cf4e Check wheel contents in workflow before publishing Khoj to PyPI

[1]: https://github.com/jwodder/check-wheel-contents#w005--wheel-contains-common-toplevel-name-in-library
[2]: https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/
2023-02-14 16:26:07 -06:00
Debanjum Singh Solanky
e76c285bdc No need to prune plugins as not included in pypi package.
Mention Obsidian as supported Interfaces in Readme
2023-02-14 16:15:40 -06:00
Debanjum Singh Solanky
bc7477ea3e Move Emacs, Obsidian plugin code out from under src/khoj directory
- What
  - The Emacs and Obsidian interfaces stay in their original
    directories under src/
  - src/khoj now only contains code meant for pypi packaging

- Benefits
  - This avoids having to update khoj MELPA, Obsidian plugin config as
    the Emacs, Obsidian code is under their original directories
  - It separates the code in src/khoj meant for python packaging from
    code for external interfaces like Emacs and Obsidian
2023-02-14 15:44:22 -06:00
Debanjum Singh Solanky
f83cf4ebc6 Check wheel contents in workflow before publishing it to PyPI 2023-02-14 15:20:44 -06:00
Debanjum Singh Solanky
25a749ca1d Use the src/ layout to fix packaging Khoj for PyPi
- Why
  The khoj pypi packages should be installed in `khoj' directory.
  Previously it was being installed into `src' directory, which is a
  generic top level directory name that is discouraged from being used

- Changes
 - move src/* to src/khoj/*
 - update `setup.py' to `find_packages' in `src' instead of project root
 - rename imports to form `from khoj.*' in complete project
 - update `constants.web_directory' path to use `khoj' directory
 - rename root logger to `khoj' in `main.py'
 - fix image_search tests to use the newly rename `khoj' logger
 - update config, docs, workflows to reference new path `src/khoj'
2023-02-14 15:19:06 -06:00
Debanjum Singh Solanky
cc31cd070d Enable the publish workflow for PRs created in the main repo
The publish workflow was previously disabled for PRs in commit
d1945c5ba8
2023-02-14 13:51:31 -06:00
Debanjum
84322b2a45 Demo using Search in Khoj Obsidian Plugin 2023-02-14 08:43:50 -08:00
Debanjum Singh Solanky
a4dcb20622 Add setting to toggle auto configuring of khoj backend from Obsidian
- By default the obsidian plugin automatically configures the khoj
  backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
  manually by toggling the auto-configure setting off in the khoj
  plugin settings

Resolves #156
2023-02-13 20:15:28 -06:00
Debanjum Singh Solanky
24aa696ef5 Indicate indexing active on Update button in Obsidian plugin settings
Use moon rotating through phases to indicate notes indexing in progress

Resolves #129
2023-02-13 19:28:19 -06:00
Debanjum Singh Solanky
11517ba8eb Encode jsonl data as utf8 for gzip write for consistent read/write encoding
Should help with issue #89
2023-02-12 17:33:23 -06:00
Debanjum Singh Solanky
c156b3e087 Remove sub-dependencies from setup.py. Upgrade sentence-transformer
- setup.py best practise recommends only specifying core dependencies,
  not dependencies of core dependencies in it

- Latest sentence-transformer (version 2.2.2) correctly installs its
  huggingface_hub dependency. Else application fails to start
2023-02-12 10:42:05 -06:00
Debanjum Singh Solanky
3ec41c4d64 Wrap lines for org, markdown results in khoj search results buffer 2023-02-12 07:33:50 -06:00
Debanjum Singh Solanky
d1945c5ba8 Do not run publish workflow for PRs as forks do not have auth token 2023-02-12 07:31:24 -06:00
Debanjum Singh Solanky
9a013ec48f Add more details to setup Khoj backend in Obsidian plugin readme 2023-02-12 07:31:13 -06:00
Debanjum
24c553877c Merge pull request #152 from axelson/fix-obsidian-doc-link
Fix link to Obsidian plugins doc in Khoj Obsidian Readme
2023-02-10 22:20:06 -06:00
Jason Axelson
6d5930363a Fix obsidian plugins doc link
Also make it more obvious where the link is going, initially I thought
the link was to another official khoj documentation site.
2023-02-10 07:11:21 -10:00
Debanjum Singh Solanky
215235efd2 Bump khoj pre-release version 2023-02-08 20:24:36 -03:00
Debanjum Singh Solanky
55e4fa9719 Fix indentation in workflow yaml for testing khoj backend 2023-02-07 02:59:46 -03:00
Debanjum Singh Solanky
2445664d40 Deprioritize searching for Music content over other text content 2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky
2e052913b6 Search in first configured content type when no search type set
Instead of searching through all configured content types but only
returning results of the last configured content type
2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky
a26ab31d20 Allow chat with markdown notes if no org-mode content configured 2023-02-07 02:41:31 -03:00
Debanjum
99a03da3f7 Read Markdown file as utf8 instead of the default encoding used by OS
### Background
  1. Obsidian stores markdown notes as `utf8`[1]
  2. By default, the python `open` command uses the OS locale encoding[2]

### Issue
  Based on above background, if the OS locale encoding isn't `utf8` it causes the `UnicodeDecodeError: <locale_encoding> codec can't decode byte` error

### Fix
  - Read markdown files as `utf8`
    The Obsidian plugin is the main use-case for markdown files in khoj currently and that stores md files as `utf8`.
    Do not assume utf8 for other content types like org-mode, beancount for now.
  - Fail if error in reading file as utf8, instead of ignoring errors.
    Would rather have user realize that their files are not going to get indexed correctly.

[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
2023-02-07 01:46:42 -03:00
Debanjum Singh Solanky
d3e82b918f Make Khoj require python version below 3.11 until PyTorch works with it
Closes #128
2023-02-06 23:11:51 -03:00
Debanjum Singh Solanky
c11f7b47e4 Update workflow to run backend tests for all supported python versions 2023-02-06 21:05:34 -03:00
Debanjum Singh Solanky
11a18cc452 Update khoj docker config to index sub directories for text content
- Khoj supports indexing subdirectories but the khoj docker config
  wasn't updated to support the same
- This should also allow khoj docker users to index multiple separate
  directory trees by mounting them into separate sub folders within
  /data/<content-type>/.
  For e.g /data/org/dir1, /data/org/dir2 etc in khoj_docker.yml
2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky
fbb7747dcc Read Markdown file as utf8 instead of the default encoding used by OS
- Background
  1. Obsidian stores markdown notes as utf8[1]
  2. By default, the python `open' command uses the OS locale encoding[2]

  This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error

- Fix
  - Read markdown files as utf8
    The Obsidian plugin is the main use-case for markdown files in
    khoj currently and that stores md files as utf8.
    Do not assume utf8 for other content types like org-mode, beancount for now.
  - Fail if error in reading file as utf8, instead of ignoring errors.
    Would rather have user realize that their files are not going to
    get indexed correctly.

[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky
66dca6cf33 Add Docs to Search across Languages, Uninstall Khoj to Readme
Add details and fixes to Obsidian, Main readme
based on feedback, confusion from the Obsidian plugin announcement
2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky
cba9a6a703 Use List, Tuple, Set from typing to support Python 3.8 for khoj
Before Python 3.9, you can't directly use list, tuple, set etc for
type hinting

Resolves #130
2023-02-06 01:23:52 -03:00
Debanjum
14f28e3a03 Mention Emacs, Obsidian plugins at top of main Readme
Add badges for supported plugins at top of main readme.
Link badges to plugin docs for easy navigation for plugin users from main readme/project root
2023-01-28 18:01:20 -08:00
Debanjum Singh Solanky
f26cee604d Update Khoj Plugin Install Instructions. Rename main Readme to README
Khoj plugin page from within Obsidian isn't recognized. Seems like it
needs an uppercase readme file only. So it doesn't show the Khoj
readme from within Obsidian itself.
2023-01-27 20:01:31 -03:00
Debanjum Singh Solanky
2e13e15625 Ensure markdown entries in khoj.el results separated by empty line
- Update khoj.el test to reflect updated rendering logic
- Move ledger render function before image rendered to group functions
  with similar logic closer
2023-01-26 19:13:02 -03:00
Debanjum Singh Solanky
85ae46f429 Use thread_last to make results rendering funcs more readable in khoj.el 2023-01-26 18:59:44 -03:00
Debanjum
a8ab9448da Resolve Khoj Obsidian Plugin feedback
### Details
- b415f87 Split find and jump to notes code in `onChooseSuggestion' method
- 37063f6 Truncate query to 8k chars for find similar notes from Obsidian plugin
- 4456cf5 No need to use `then' or `finally' in `async' functions after an `await'
- 4070be6 Pass app object from plugin instance to child objects and functions
- c203c6a Use Sentence case for Find similar mote Obsidian command name
2023-01-26 18:54:33 -03:00
Debanjum Singh Solanky
b415f87093 Split code in onChooseSuggestion method to make it more readable
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
37063f6a38 Truncate query to 8k chars for find similar notes from obsidian plugin
Truncate current file data passed to khoj backend API via query string
below default query size supported by popular servers
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
4456cf5c8f No need to use then or finally in async functions after an await 2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
4070be637c Pass app object from plugin instance to child objects and functions
Do not reference global app object from child objects and funcs
directly.

It is only available for debugging purposes and access to it maybe
dropped in the future.
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
c203c6a3fd Use Sentence case for Find Similar Note command name in Khoj Obsidian 2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
e18124ef6f Add badge for tests and update project subtitle in khoj.el Readme 2023-01-23 20:52:03 -03:00
Debanjum
477ef28e08 Create and Automate Tests for Khoj.el on Emacs
- Use ERT to test `khoj.el'
- Test extracting and rendering of Org, Markdown and Ledger entries from Khoj API response
- Automate `khoj.el' testing using Github workflow
- Fix, Simplify and Test the get text around point code for the "Find Similar" feature
2023-01-23 20:40:18 -03:00
Debanjum Singh Solanky
f9fb58aec3 Automate khoj.el testing using Github workflow
Install transient.el dependency as it is not available by default
before Emacs 28.1
2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky
86e808abfb Test get-current-text helpers for Find Similar feature in khoj.el 2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky
be6acda212 Create khoj.el tests. Test rendering results of each content types 2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky
0d0bf3b5aa Simplify get-current-text functions for Find Similar in khoj.el
Use existing functions like `string-trim', `thing-at-point' and
remove unneeded code from the two functions
2023-01-23 19:15:52 -03:00
Debanjum Singh Solanky
07e9e4ecc3 Get current paragraph text when point at start of paragraph in khoj.el
Previously if cursor was at start of current paragraph, it would get
text for the current and next paragraph, instead of just the current one
2023-01-23 18:05:54 -03:00
Debanjum Singh Solanky
a0b03c8bb1 Get current entry text when point at heading for Find Similar in khoj.el
Previously if cursor was at heading of current entry, it would find entries
similar to the previous outline heading, instead of the current one
2023-01-23 10:01:25 -03:00
Debanjum Singh Solanky
013c7c10a4 Bump khoj pre-release version 2023-01-22 18:45:56 -03:00
Debanjum Singh Solanky
ad3c9b5f44 Bump khoj version to 0.2.5 in preparation for release 2023-01-22 18:18:21 -03:00
Debanjum Singh Solanky
9ed056c7e7 Use consistent indentation in Khoj Emacs Readme 2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky
0980c6e87f Update Emacs Usage section in Readme. Add find-similar, menu usage 2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky
6908b6eed3 Truncate image queries below max tokens length supported by ML model
This would previously return the infamous tensor size mismatch error
Verify this error is not raised since adding the query truncation logic
2023-01-21 14:11:00 -03:00
Debanjum Singh Solanky
3d9ed91e42 Search by image at path only if query of form "file:/path/to/image"
Previously no query syntax helpers, like the "file:" prefix, were used
before checking if query contains file path.

This made query to image search brittle to misinterpretation and
pointless checking

Add test to verify search by image at file works as expected
2023-01-21 14:06:56 -03:00
Debanjum
655ef11653 Find Similar Notes, Transactions, Images from Khoj in Emacs
### Overview
Find items of specified type similar to current text item at point

### Capabilities
- Support querying with text surrounding point in any text buffer
- Find similar items of specified content type indexed on Khoj

### Details
- Query using text in current section if in a `outline-mode` buffer (i.e markdown heading, org-mode entry text)
- Query using text in current paragraph if in non `outline-mode` buffer
- Search for items of `content-type` set in khoj transient menu
- Update last used khoj content-type and results from the
  *find-similar* and *update* functions for later reuse

### Related
- Recently added [Find Similar Notes in Khoj Obsidian](https://github.com/debanjum/khoj/pull/122) as well
2023-01-20 22:44:28 -03:00
Debanjum Singh Solanky
b7aa22a059 Change order of arg passed to query-api-and-render-results by importance 2023-01-20 22:13:24 -03:00
Debanjum Singh Solanky
936a88fa7e Find items of specified type similar to current text item at point
- Support querying with text surrounding point in any text buffer
  Previously could only find items similar to org entry at point

- Find similar items of specified content type indexed on khoj
  Previously only looked for similar org entries indexed on khoj

  Now uses the content-type configured in khoj transient menu to find
  items of the specified content type

- Details
  - Generalize the get-current-org-entry-text func to get text for any
    outline section
  - Replace leading whitespaces from query text as well
  - Create method to get current paragraph text from non-outline mode
    buffers
  - Update transient, find-similar funcs to pass, use content-type
    configured in khoj transient menu
  - Generalize query title creation logic to remove markdown headings
    prefix (#) apart from org heading prefix (*) as well
  - Update last used khoj content-type and results from the
    find-similar and update funcs for later reuse
  - Jump to top of results buffer after results rendered
2023-01-20 22:12:54 -03:00
Debanjum Singh Solanky
17aaadea1f Find notes similar to current org entry at point 2023-01-20 05:14:54 -03:00
Debanjum Singh Solanky
44bbc0a417 Add section separators to khoj.el for easier code traversal 2023-01-19 23:36:54 -03:00
Debanjum
7516435a0b Automate khoj.el build and quality checks
- 9f0bd0a Build `khoj.el' and Run `package-lint', `checkdoc' and other melpa package quality checks
- 48ad3c5 Use default content types if fail to call backend on `khoj.el` load
2023-01-19 20:21:55 -03:00
Debanjum Singh Solanky
48ad3c535e Use default content types if fail to call backend on khoj.el load
Do not want khoj.el to fail on init/load if khoj backend not running
2023-01-19 20:13:49 -03:00
Debanjum Singh Solanky
9f0bd0a361 Add Github workflow for khoj.el build and quality checks
Add khoj.el build badge to khoj.el Readme
2023-01-19 20:13:19 -03:00
Debanjum
b58dd82141 Use Transient Menu to Improve Khoj.el Interface
- 5f446b1 Convert `khoj' entry point method to transient.el menu for richer configuration
- 9d64a00 Allow updating khoj content index from within `khoj.el'
2023-01-19 03:11:23 -03:00
Debanjum Singh Solanky
0dd1cba272 Rename configuration sections in khoj.el transient menu 2023-01-19 03:03:08 -03:00
Debanjum Singh Solanky
5d0f369186 Add ability to quit khoj transient with standard q keybinding 2023-01-19 02:47:07 -03:00
Debanjum Singh Solanky
87c7cf4272 Use single khoj func as entrypoint. Group khoj.el code into sections
- Give more relevant, specific name to khoj suffix commands
- Remove `khoj-simple'. Have single `khoj' function for entrypoint
2023-01-19 02:38:19 -03:00
Debanjum Singh Solanky
9d64a009fd Allow updating khoj content index from within khoj.el
- Split transient config menu by type
2023-01-18 23:07:59 -03:00
Debanjum Singh Solanky
a8d0c7d905 Rename search type to more apt content type in khoj.el 2023-01-18 22:13:49 -03:00
Debanjum Singh Solanky
00daea16df Allow setting default-search-type to image. Make docstrings compact 2023-01-18 22:01:17 -03:00
Debanjum Singh Solanky
216b17cfd0 Dynamically populate content type choices when khoj transient invoked 2023-01-18 22:00:56 -03:00
Debanjum Singh Solanky
5f446b1440 Convert main khoj.el entrypoint into transient menu for richer configuration 2023-01-18 21:50:07 -03:00
Debanjum Singh Solanky
5c07dcd219 Fix, update Obsidian Readme. Add Find Similar Notes to Implementation section 2023-01-18 00:22:26 -03:00
Debanjum
b7fc344be1 Search for Similar Notes from Obsidian Plugin
Enable searching for notes similar to the current note being viewed

## Main Changes
- 39a18e2 Extend search modal to search for similar notes
  - Hide input field on init, Trigger search on opening modal when in similar notes mode
  - Set input to contents of current markdown file and get notes similar to it
  - Re-rank, by default, when searching for similar notes
  - Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
2023-01-18 00:10:10 -03:00
Debanjum Singh Solanky
6119d0a69e Add usage of "Find Similar Notes" command to the Khoj Obsidian Readme 2023-01-18 00:03:13 -03:00
Debanjum Singh Solanky
657e455785 Remove unused `onunload' method in main.ts of khoj obsidian plugin 2023-01-17 23:46:38 -03:00
Debanjum Singh Solanky
0bed410712 Limit Find Similar Note command to be triggered from Editor
Fixup indentation and comments
2023-01-17 19:34:48 -03:00
Debanjum Singh Solanky
39a18e2080 Add ability to search for similar notes in Khoj Obsidian
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
2023-01-17 19:07:18 -03:00
Debanjum Singh Solanky
ffaef92476 Encode query string before passing as query param to search API 2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky
d5a7cc5b0f Compact code to map results from search API into SearchResult objects
Make code compact for readability
Remove unneeded temporary variables and return statements
2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky
8ab7a26bde Update Khoj on Obsidian screenshots in Main and Plugin Readme
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
  - information keybindings, rerank keybinding at bottom of modal
  - fixed top level headings in search results
  - search results snipped if greater than N words
2023-01-17 13:58:50 -03:00
Debanjum Singh Solanky
7b4f78776c Fix extracting Markdown Entries with Top Level Headings
- Previously top level headings would have get stripped of the
  space between heading text and the prefix # symbols. That is,
  `# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results

- Add unit tests to text_to_jsonl processors to prevent regression
2023-01-17 13:06:28 -03:00
Debanjum Singh Solanky
1a296518c5 Limit total words for each Search Result rendered in search modal
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
2023-01-17 13:06:14 -03:00
Debanjum Singh Solanky
e7b89f7fd0 Return compiled entry in additional details of /api/search response
This can be used to highlight portion of raw entry to highlight and
for passing to summarizer to stay with max_tokens limit supported by
GPT models
2023-01-16 22:56:06 -03:00
Debanjum Singh Solanky
7071d081e9 Increase max_tokens returned by GPT summarizer. Remove default params 2023-01-16 22:55:36 -03:00
Debanjum Singh Solanky
3d9cdadbbb Add codebase visualization of Khoj Obsidian to Khoj Obsidian Readme 2023-01-15 14:09:21 -03:00
Debanjum Singh Solanky
d02ba325aa Handle empty chat history returned by API to chat.html on web interface 2023-01-15 13:51:16 -03:00
Debanjum Singh Solanky
721bbbe15c Update Readme. Add Chat with Notes Section to Advanced Usage
- Add Setup OpenAI API key in Khoj Section to Miscellaneous
  Refer all mentions of setting up your OpenAI API key to that section
- Add Demo Screenshot of Chat with Notes
- Put existing Miscellaneous Section under Beta API sub heading
- Fix to make Access Khoj on Mobile a Subsection of Advanceed Usage

- Trigger refresh of github image cache by adding ? at end of image paths
2023-01-14 00:39:15 -03:00
Debanjum Singh Solanky
42f8230b37 Update Troubleshooting Section in Main Readme
- Convert Troubleshooting Issues into Headings instead of Bullets
  Allows them to be linked to more easily. E.g when pointing folks to
  it in github issues etc

- Add index corruption issue and fix to the Troubleshooting section
2023-01-13 23:03:15 -03:00
Debanjum
3f2ea039a7 Add Chat page to the Khoj Web Interface
### Overview
- Provide a chat interface to engage with and inquire your notes
- Simplify interacting with the beta `chat` and `summarize` APIs

### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes

**Note**:
 - **You will need to add an API key from OpenAI to your khoj.yml**
 - **Your query and top note from search result will be sent to OpenAI for processing**

## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
2023-01-13 23:02:19 -03:00
Debanjum Singh Solanky
16d4560ff8 Comment css styling of chat page for later reference 2023-01-13 22:40:01 -03:00
Debanjum Singh Solanky
cfef346d03 Do not update query field to ever chat message
It doesn't work as well with chat, unlike for search page
Use more appropriate thinking face emoji for you instead of surprise face
2023-01-13 22:24:26 -03:00
Debanjum Singh Solanky
177756be7e Fetch chat history from backend and render it on chat page load 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
330febaa1a Update conversation logs from /beta/summary API endpoint too 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
cb6f0b53c9 Make user_message_metadata arg to message_to_log in gpt.py optional
- Use a default user_message_metadata if arg not set
- Update conversation to use `by' as `you' and `khoj'
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
cc2456e411 Update /beta/chat API to return chat history if no query param passed 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
d8ee0f0e9a Use scheduler to save chat history to disk every 5 minutes
- The previous mechanism to trigger saving on shutdown event did not work
- Use scheduler to persist chat sessions to disk at a 5 minute interval
  - This improve time granularity, fixed interval of saving chat logs
  - It may lose ~5 minutes of chat history until mechanism to also
    write on shutdown found/resolved
- Create conversation directory if it doesn't exist before attempting write
- Reset chat_session after writing it to disk
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
5294693e97 Style message as speech bubbles on chat page of web interface
- Wrap messages into speech bubbles
  - Color messages by khoj blue, sender grey
  - Add those standard protrusions to the speech bubbles for fun

- Align bubbles left or right based on sender
  - messages by khoj are left aligned, message by self are right aligned

- Put message metadata like sender and time under speech bubble
  - use data-* attribute and ::after css pseudo-selector for this

- Update renderMessage func to accept time param, remove unused type_ param
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
7723d656dc Do not force GPT to summarize note using past tense
Not all notes are in the past. Notes can be about stuff in the future.
Casting them to past tense gives the impression that they've already
happened / been done.
2023-01-13 13:10:35 -03:00
Debanjum Singh Solanky
2842e3a035 Automatically scroll to bottom of chat body on new messages 2023-01-13 13:09:51 -03:00
Debanjum Singh Solanky
34014635d0 Improve colors, fix contrast for accessability on web interface
- Changes
  - Use blue color for khoj heading font
    - This fixes the title color issue

  - Update background to lighter shade
    - This fixes the body text color issue

  - Update colors for todo, done, miscellaneous todo state, tag color
    - This does not fix the color contrast issue but seems like an acceptable solution
    - Using white text rather than black text on blue background
      better even though the black text on blue background passes the
      WCAG acceptable contrast score
    - For details see blog post:
      https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/

  - Add border to tags to give them tag pills look and differntiate
    from todo states

  - Buttons and inputs
    - Change background color of input fields like type dropdown,
      update button and results count counter, to match background
      color of page
    - Add shadow on hover over button, dropdowns

Resolves #111
2023-01-12 21:59:50 -03:00
Debanjum Singh Solanky
d170747ec2 Add khoj web interface & chat styling to new chat page on khoj web
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
  Do not make the whole page scroll, just the chat logs body div
2023-01-12 21:58:46 -03:00
Debanjum Singh Solanky
de6c146290 Implement functional, unstyled chat page for khoj web interface
Expose it at /chat URL
2023-01-12 21:53:25 -03:00
Debanjum Singh Solanky
f0213d0a82 Fix links to install khoj.el readme from main readme 2023-01-12 02:25:00 -03:00
Debanjum Singh Solanky
e6793816f9 Upgrade Khoj.el Readme. Add TOC, Screenshot, Features Sections
- Update Query filter details
2023-01-12 02:14:02 -03:00
Debanjum Singh Solanky
2fe21f3a78 Update Advanced Usage section in main Readme
- Update Khoj PWA image to show Khoj open as PWA on Android
- Add section to show configuring Khoj to use OpenAI models for search
2023-01-12 01:49:12 -03:00
Debanjum Singh Solanky
26f791e9ad Update Obsidian Plugin Readme. Add Khoj icon to Khoj Modal Placeholder text
- Fold Query Filter, Demo Description
- Add Limitations to Readme
- Add *Update* index bullet to Troubleshooting Options
2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky
3e63af5c94 Constrain grid rows to fix layout of Khoj web interface on Chrome 2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky
a31002bf38 Revert obsidian plugin manifest, versions at project root to 0.2.1 2023-01-11 20:54:12 -03:00
Debanjum Singh Solanky
50c797962c Jump to Search Result from Khoj Modal even on Obsidian Android
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj

Allow jumping to search result from khoj plugin modal on Android too
2023-01-11 19:44:11 -03:00
Debanjum Singh Solanky
51ea6d9c9b Do not force index update when configure backend on plugin load
- Backend can handle incremental updates
- Avoid khoj usability delay by avoiding recomputed everytime vault opened
2023-01-11 17:17:08 -03:00
Debanjum Singh Solanky
3fe5ce2721 Merge branch 'master' of github.com:debanjum/khoj 2023-01-11 17:02:30 -03:00
Debanjum
e28af68cbd Fix, Improve Configuring Khoj from Obsidian Plugin
### Details
- 1c813a6 Convert *Results Count* setting to `Slider` from `Text` in plugin settings pane
- 4e1abd1 Disable `Update` button in plugin settings while indexing vault
- 513c86c Set index file paths relative to current or default path on Khoj backend
- 4407e23 Only index current vault on Khoj. Remove `ObsidianVaultPath` setting from plugin
- 86a1e43 Return HTTP Exception on */api/update* API call failure
- 5af2b68 Update plugin notifications for errors. Remove notification for success
2023-01-11 17:01:33 -03:00
Debanjum Singh Solanky
123b077c68 Use apt update before apt install in test workflow on Github 2023-01-11 16:51:16 -03:00
Debanjum Singh Solanky
5996d47d7c Trigger input event to Get, Render Reranked results from Khoj backend
Previous mechanism of manually triggering getSuggestions,
renderSuggestions flow was corrupting traversing and opening
reranked search results in KhojModal

Emulate event that would anyway trigger the get & render of results in
modal. This lets obsidian core handle the flow without digging too
deep into obsidian cores handling of the flow. Lowers the chance of
breakage
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
1c813a6884 Convert results count setting to slider in plugin settings pane 2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
4e1abd1b72 Disable update button while indexing vault in plugin settings 2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
513c86c6a1 Set index file paths relative to current or default path on khoj backend
We need the index file paths to make sense on the khoj backend server

Having path of index on backend relative to current vault directory
on frontend ignores the fact that the frontend maybe on a different
machine than the khoj backend server

Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
4407e23c19 Only index current vault on Khoj. Remove plugin setting to configure it
- Overview
  Limits using Khoj with a single vault at a time. This is
  automatically configured to the most recently opened vault.

  Once directory filters are supported on backend, the plugin will be
  updated to index multiple vault but search only current vault from
  current vaults khoj obsidian plugin

- Code Details
 - Remove setting to configure Vault directory from Khoj Obsidian plugin
 - Automatically configure Khoj to index only current Vault.
 - Overwrites any previous vaults that were intended to be indexed by
   Khoj backend
 - Force update of index after configuring vault

- Why
  It's not helpful for now and can lead to more problems, confusion.
  Once directory filters
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
86a1e43605 Return HTTP Exception on /api/update API call failure
- Previously the backend was just throwing backend error.
  The frontend calling the /update API wasn't getting notified
- Now the frontend can react appropriately and make the issue
  visible to the user
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
5af2b68e2b Update plugin notifications for errors and success
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
  of showing notification
  Notices bubbles cluttered the UI while typing updates to settings
- Show notification once index updated via settings pane button click
  There was no notification on index updated, which usually takes time
  on the backend
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
853192932a setCTA on Khoj Obsidian plugin button. Minor cleanup of space, tabs 2023-01-10 23:36:02 -03:00
Debanjum
531d423715 Enhance Search Modal, Error State Handling in Khoj Obsidian Plugin
### Search Modal Enhancements
  - b52cd85 Allow Reranking results using Keybinding from Khoj Search Modal
  - 580f4ac Add hints to Modal for available Keybindings
  - da49ea2 Add placeholder text to modal in Khoj Obsidian plugin
  
### Handle Failure to Connect to Khoj Backend
Load plugin but warn on failure to connect to Khoj backend

- f046a95 Track connectedToBackend as a setting. Use it across obsidian plugin to:
  - Disable command if not connected to backend
  - Trigger warning notice on clicking Khoj ribbon if not connected to backend
  - Show warning at top of Khoj Obsidian plugin settings pane
- 768e874 Load obsidian plugin even if fail to connect to backend but show warning
  - Allows user to see reason for failure to try resolve it
  - Allows user to update Khoj URL settings to point to URL of Khoj server
  
### Miscellaneous
- 7991ab7 Add button in Obsidian plugin settings to force re-indexing your vault
  - Useful if index gets corrupted
2023-01-10 23:20:32 -03:00
Debanjum Singh Solanky
da49ea272c Add placeholder text to modal in Khoj Obsidian plugin 2023-01-10 22:50:11 -03:00
Debanjum Singh Solanky
580f4aca23 Add hints to Modal for available Keybindings 2023-01-10 22:03:47 -03:00
Debanjum Singh Solanky
b52cd85c76 Allow Reranking Results using Keybinding from Khoj Search Modal 2023-01-10 21:59:38 -03:00
Debanjum Singh Solanky
7991ab7a86 Add button in Obsidian plugin settings to force re-indexing your vault 2023-01-10 19:49:12 -03:00
Debanjum Singh Solanky
f046a95f3d Track connectedToBackend as a setting. Use it across obsidian plugin
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button

- Call saveData after configureKhojBackend to ensure
  connnectedToBackend setting saved after being (potentially) updated
  in configureKhojBackend function
2023-01-10 17:28:47 -03:00
Debanjum Singh Solanky
768e874185 Load obsidian plugin even if fail to connect to backend but show warning
- Previously the plugin would not load if cannot connect to Khoj backend
  - Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
2023-01-10 17:20:02 -03:00
Debanjum Singh Solanky
aa22d83172 Create and use a context manager to time code
Use the timer context manager in all places where code was being timed

- Benefits
  - Deduplicate timing code scattered across codebase.
  - Provides single place to manage perf timing code
  - Use consistent timing log patterns
2023-01-09 19:48:16 -03:00
Debanjum Singh Solanky
93f39dbd43 Add typing to text_search. Reformat code to set existing_embedding 2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
db7483329c Only import type hint packages for type checking. Avoids circular imports
Use annotations from the __future__ package to avoid having to quote
type hints. This import will not be required after Python 3.11
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
e5254a8e56 Create BaseEncoder class. Make OpenAI encoder its child. Use for typing
- Set type of all bi_encoders to BaseEncoder

- Make load_model return type Union of CrossEncoder and BaseEncoder
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
cf7400759b Remove unused render_results method from text and image search
It's a relic from when khoj was being used as a python module
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
afcfc3cd62 Split text_search.query logic into separate methods for modularity
The query method had become too big.

Extract out filter, score, sort and deduplicate logic used by
text_search.query into separate methods.

This should improve readabilty of code.
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
8dc6ee8b6c Pass `model' arg to extract_search_type method from beta search API
Issue caught by mypy
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
8498903641 Fix, add typing to Filter and TextSearchModel classes
- Changes
  - Fix method signatures of BaseFilter subclasses.
    Else typing information isn't translating to them
  - Explicitly pass `entries: list[Entry]' as arg to `load' method
  - Fix type of `raw_entries' arg to `apply' method
    to list[Entry] from list[str]
  - Rename `raw_entries' arg to `apply' method to `entries'
  - Fix `raw_query' arg used in `apply' method of subclasses to `query'
  - Set type of entries, corpus_embeddings in TextSearchModel

- Verification
  Ran `mypy --config-file .mypy.ini src' to verify typing
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
d40076fcd6 Deduplicate test code, make teardown more robust using pytest fixtures 2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
eace7c6215 Use torch.tensor as torch.Tensor cannot create tensor on MPS device
- `torch.Tensor' is apparently a legacy tensor constructor
- Using that to create tensor on MPS devices throws error:
  RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed
- `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine
2023-01-09 19:47:19 -03:00
Debanjum Singh Solanky
9def3f8c6f Add exception handling to beta APIs, in case OpenAI API call fails 2023-01-09 01:27:06 -03:00
Debanjum Singh Solanky
7b164de021 Add beta API to summarize top search result using an OpenAI model
This is unlike the more general chat API that combines summarization
of top search result and conversing with the OpenAI model

This should give faster summary results. As no intent categorization
API call required
2023-01-09 01:25:59 -03:00
Debanjum Singh Solanky
d36da46f7b Truncate prompt to not exceed OpenAI prompt limit
Truncate prompt containing the top retrieved entry to 500 words to
avoid triggering the max_token limit error
2023-01-09 00:51:46 -03:00
Debanjum Singh Solanky
237123d18c Fix tests for the conversation processor
- Use latest davinci model for tests
- Wrap prompt in triple quotes to improve legibilty
- `understand' method returns dictionary instead of string. Fix its test
- Fix prompt for new model to pass `chat_with_history' test
2023-01-09 00:22:26 -03:00
Debanjum Singh Solanky
918af5e6f8 Make OpenAI conversation model configurable via khoj.yml
- Default to using `text-davinci-003' if conversation model not
  explicitly configured by user. Stop using the older `davinci' and
  `davinci-instruct' models

- Use `model' instead of `engine' as parameter.
  Usage of `engine' parameter in OpenAI API is deprecated
2023-01-09 00:17:51 -03:00
Debanjum Singh Solanky
7e05389776 Quote all values passed to input-filter fields in sample yaml files 2023-01-08 22:40:18 -03:00
Debanjum Singh Solanky
0440f3fd57 Add encoder-type field to the search-type sections in khoj_sample.yml 2023-01-08 22:07:13 -03:00
Debanjum Singh Solanky
8b8e202ab3 Set input-filter to list in khoj_docker.yml and khoj_sample.yml
`input-filter' was converted to a list a while back but the sample
khoj configs were not updated to reflect this. This change fixes that
2023-01-08 21:08:00 -03:00
Debanjum Singh Solanky
74e779f8d0 Fix /beta/chat API to use Entry class instead of old dictionary pattern
Search returns response of type SearchResponse instead of a dict now
2023-01-08 15:28:26 -03:00
Debanjum Singh Solanky
f2436039a0 Improve readability of GPT prompt strings in conversation processor 2023-01-08 15:27:41 -03:00
Debanjum
1c091e509b Make Encoder Type Configurable. Allow using OpenAI Model for Search
- 2fe37a0 Make type of encoder to use for embeddings configurable via `khoj.yml'
  - Previously `encoder_type' was set in the setup code of search_type
    - All *encoders* were of type `SentenceTransformer'
    - All *cross_encoders* were of type `CrossEncoder'
  - Now the `encoder_type' can be configured via the new `encoder_type' field 
    in `TextSearchConfig' under `search_type` in `khoj.yml'
  - All the specified `encoder-type' class needs is an `encode' method
    that takes entries and returns embedding vectors
  
- 826f9dc Drop long words from compiled entries to be within max token limit of models
  Long words (>500 characters) provide less useful context to models.
   
  Dropping very long words allow models to create better embeddings by
  passing more of the useful context from the entry to the model

- c0ae8ee Allow using OpenAI models for search in Khoj
  To use OpenAI models for search in Khoj, in `~/.khoj/khoj.yml'
  1. Set `encoder' to name of an OpenAI model. E.g *text-embedding-ada-002*
  2. Set `encoder-type' to *src.utils.models.OpenAI*
  3. Set `model-directory` to *null*, as this is an online model and
     cannot be stored on the file system
2023-01-08 11:10:25 -03:00
Debanjum Singh Solanky
6119005838 Improve comments, exceptions, typing and init of OpenAI model code 2023-01-08 00:36:18 -03:00
Debanjum Singh Solanky
c0ae8eee99 Allow using OpenAI models for search in Khoj
- Init processor before search to instantiate `openai_api_key'
  from `khoj.yml'. The key is used to configure search with openai models

- To use OpenAI models for search in Khoj
  - Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002
  - Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI'
  - Set `model-directory' to `null', as online model cannot be stored on disk
2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky
826f9dc054 Drop long words from compiled entries to be within max token limit of models
Long words (>500 characters) provide less useful context to models.

Dropping very long words allow models to create better embeddings by
passing more of the useful context from the entry to the model
2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky
6a30a13326 Only create model directory if the optional field is set in SearchConfig 2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky
2fe37a090f Make type of encoder to use for embeddings configurable via khoj.yml
- Previously `model_type' was set in the setup of each `search_type'
  - All encoders were of type `SentenceTransformer'
  - All cross_encoders were of type `CrossEncoder'

- Now `encoder-type' can be configured via the new `encoder_type' field
  in `TextSearchConfig' under `search-type` in `khoj.yml`.

- All the specified `encoder-type' class needs is an `encode' method
  that takes entries and returns embedding vectors
2023-01-07 23:09:12 -03:00
Debanjum Singh Solanky
fa92adcf0d Add Visualization of Codebase to Readme under Development Section
Source from Github vNext Repo Visualizer at
https://githubnext.com/projects/repo-visualization/
2023-01-05 20:11:56 -03:00
Debanjum Singh Solanky
8c7ffd7aee Add Readme doc to fix failure to build tokenizer dependency 2023-01-05 20:11:56 -03:00
Debanjum Singh Solanky
d55d7d53dc Fix GPU usage by Khoj on Macs to speed up search and indexing
- Ensure all tensors are on MPS device before doing operations across them

- Background
  - GPU is used by default for Khoj on MacOS now
    - Needed PyTorch > 1.13.0 on Macs to use GPU, which we do now
  - MPS should speed up search and indexing on MacOS
2023-01-05 15:39:09 -03:00
Debanjum Singh Solanky
7380518f24 Upgrade PyTorch, Pillow version to resolve Dependabot Security Advisories
This also enables GPU usage by Khoj on MacOS as MPS support is now in
PyTorch mainline
2023-01-05 15:39:09 -03:00
Debanjum
abd035e2fa Merge PR #112 to fix quote usage in khoj.el docstring from suliveevil/master
Fix usage warning for unescaped single quote in `khoj.el' docstring. 
Converts usage of '<text>' into `<text>' to use the correct quote forms in generated docs
2023-01-05 13:24:11 -03:00
Debanjum Singh Solanky
1dc1472c55 In publish workflow, make twine upload verbose to troubleshoot 2023-01-05 12:56:46 -03:00
Debanjum Singh Solanky
e792523849 Bump version in metadata packages for khoj, khoj.el and obsidian plugin 2023-01-05 12:50:27 -03:00
suliveevil
b2812b409f fix docstring usage warning
 Warning (comp): khoj.el:119:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
 Warning (comp): khoj.el:120:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
 Warning (comp): khoj.el:121:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
 Warning (comp): khoj.el:168:2: Warning: docstring has wrong usage of unescaped single quotes (use \= or different quoting)
2023-01-05 16:47:38 +08:00
Debanjum Singh Solanky
3d1199540c Update the publish workflow to also run on any tag push 2023-01-04 20:47:23 -03:00
Debanjum Singh Solanky
4842daca5f Run releases workflow on pushing any tag. 'v' prefix not required
Obsidian for some reason cannot pick up plugin assets from releases
made with prefixed tags
2023-01-04 20:27:56 -03:00
Debanjum Singh Solanky
47015ee6cc Fold Demo video descriptions, analysis by default in main Readme 2023-01-04 20:13:43 -03:00
Debanjum Singh Solanky
da17ff6ac8 Add Upgrade instructions for Khoj.el Readme. Fix version of khoj.el 2023-01-04 20:06:39 -03:00
Debanjum
65917eb5c9 Create Obsidian plugin for Khoj
### Plugin Features
  - Search Obsidian notes using Khoj
    *Provide Natural language search on your (markdown) notes in Obsidian Vault*

  - Show search results as rendered Markdown
    *Improve legibility of the results*

  - Jump to selected note from search result in Khoj search modal
    *Simplify seeing result within its original note context*

  - Automatically configure khoj to index markdown files in current vault
    *Reduce khoj setup steps for plugin users by using reasonable defaults*

    - Code updates the markdown config in `khoj.yml` and triggers index update
    - It can be configured by user in khoj plugin settings, if required

  - Add Demo and detailed Readme for the Obsidian plugin
    *Ease setup and usage. Give context about capabilities*

### Miscellaneous
  - (Try) Keep a mono repo until the Khoj project is mature enough
    to reduce maintainance burden

### Commits Details
  - 0e39e0f Add details about the Khoj Obsidian plugin to the main Readme
  - cd8b918 Add `manifest.json`, `versions.json` of Obsidian plugin to project root
  - 66ccd0c Create Obsidian plugin for Khoj
2023-01-04 20:02:42 -03:00
Debanjum Singh Solanky
3dd69f7505 Add Upgrade instructions for Obsidian, Emacs to main Readme 2023-01-04 19:50:26 -03:00
Debanjum Singh Solanky
0e39e0ff71 Add details about the Khoj Obsidian plugin to the main Readme
- Add Khoj in Obsidian Demo

- Update Interfaces Screenshot to include Obsidian Plugin Screenshot

- Update .gitignore to ignore obsidian plugin ignorelist
  Section the .gitignore for better readability

- Update the Setup, Usage instructions to include information about
  the Obsidian plugin
2023-01-04 18:42:53 -03:00
Debanjum Singh Solanky
cd8b918a55 Add manifest.json, versions.json of Obsidian plugin to project root
- Obsidian provides limited support for plugins in larger repositories.
  Currently, it does not have a way to specify the directory of a plugin
  So it expects the plugins `manifest.json' and `versions.json' to be at
  project root

- While this unnecessarily litters the codebase. It is the (current)
  required tradeoff for keeping the core plugins in a mono repo
2023-01-04 18:28:16 -03:00
Debanjum Singh Solanky
66ccd0c970 Create Obsidian plugin for Khoj
- Features
  - Search using Khoj from within the Obsidian app
    Allow Natural language search on your (markdown) notes in Obsidian Vault

  - Show search results as rendered (instead of raw) Markdown
    Improve legibility of the results

  - Jump to selected note from search result in Khoj search modal
    Simplify seeing result within its original note context

  - Automatically configure khoj to index markdown files in current vault
    Reduce khoj setup steps for plugin users by using reasonable defaults

    - Code updates the markdown config in khoj.yml and triggers index update
    - It can be configured by user in khoj plugin settings, if required

  - Add Demo and detailed Readme for the Obsidian plugin
    Ease setup and usage. Give context about capabilities

- Miscellaneous
  - Trying keep a mono repo until the Khoj project is mature enough
    to reduce maintainance burden
2023-01-04 18:28:16 -03:00
Debanjum Singh Solanky
e5ef7789fc Add screenshot of Khoj as PWA on Android Homescreen to Readme 2023-01-04 15:47:08 -03:00
Debanjum Singh Solanky
feddb6ce62 Add start_url to khoj webmanifest to show Khoj as PWA on Chrome 2023-01-04 13:37:56 -03:00
Debanjum Singh Solanky
5ca60a2df7 Add How to Access Khoj on Mobile instructions to Readme 2023-01-04 13:37:40 -03:00
Debanjum Singh Solanky
3dee1aed9e Create /config/data/default API endpoint to serve default khoj config
This can ease configuring khoj from the different interfaces

- Don't need to know all the (default) config used by khoj.
- Just get default config by calling the above API endpoint.
- Then modify desired portions and call POST /api/config/data to
  configure khoj.
2023-01-03 21:52:34 -03:00
Debanjum Singh Solanky
ce945f7a90 Configure processors too on calling /update API
- Previously only search was being reconfigured
- But Processors are configured on app start too
- Match that behavior on calling /update API
2023-01-03 21:51:02 -03:00
Debanjum Singh Solanky
9d31988f42 Allow starting khoj in non-GUI mode without config file instantiated
- Start khoj server (in non-GUI mode) without needing config file
  already instantiated.
  - But throw warning to configure khoj to use it
- This allows plugins to configure the app via the /config/data APIs
- To be used by the Khoj obsidian plugin to configure markdown content
  in khoj
2023-01-03 21:36:59 -03:00
Debanjum Singh Solanky
52664dd96c Allow recursive glob pattern (**) to add files to search index
- Simplify configuring files to index For Obsidian/Org-Roam type
  systems with lots of small files in khoj.yml using `input-filter'
2023-01-03 01:32:58 -03:00
Debanjum Singh Solanky
152e5f1661 Return the file of each search result in response
- Useful for enabling jump to note functionality in interfaces
- It will be used in the Khoj plugin for Obsidian
2023-01-03 01:25:34 -03:00
Debanjum
fe1398401d Automatically update search index hourly
- c535953 Update index automatically in non GUI mode too
- 701d92e Lock the index before updating it via API or Scheduler
- 3b0783a Automate updating embeddings, search index on a hourly schedule

Resolves #106
2023-01-02 00:37:59 +00:00
Debanjum Singh Solanky
c535953915 Update index automatically in non GUI mode too
- Poll scheduler every minute using threading.Timer
  - Use 60 seconds polling interval to avoid fork bombing
- Schedule next via the same poll scheduler
- Allow clean program interrupt by running scheduler in daemon mode
2023-01-01 21:03:19 -03:00
Debanjum Singh Solanky
701d92e17b Lock the index before updating it via API or Scheduler
- There are 3 paths to updating/setting the index (stored in state.model)
  - App start
  - API
  - Scheduler

- Put all updates to the index behind a lock. As multiple updates path
that could (potentially) run at the same time (via API or Scheduler)
2023-01-01 17:09:36 -03:00
Debanjum Singh Solanky
3b0783aab9 Automate updating embeddings, search index on a hourly schedule
- Use the schedule pypi package
- Use QTimer to poll schedule.run_pending() regularly for jobs to run
2023-01-01 17:09:36 -03:00
Debanjum Singh Solanky
a58c243bc0 Document using Word, Date and File Query Filter in Readme 2022-12-26 16:12:49 -03:00
Debanjum
06c25682c9 Split text entries by max tokens supported by ML models
### Background
There is a limit to the maximum input tokens (words) that an ML model can encode into an embedding vector.
For the models used for text search in khoj, a max token size of 256 words is appropriate [1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1#:~:text=model%20was%20just%20trained%20on%20input%20text%20up%20to%20250%20word%20pieces),[2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#:~:text=input%20text%20longer%20than%20256%20word%20pieces%20is%20truncated)

### Issue
Until now entries exceeding max token size would silently get truncated during embedding generation.
So the truncated portion of the entries would be ignored when matching queries with entries
This would degrade the quality of the results

### Fix
- e057c8e Add method to split entries by specified max tokens limit
- Split entries by max tokens while converting [Org](https://github.com/debanjum/khoj/commit/c79919b), [Markdown](https://github.com/debanjum/khoj/commit/f209e30) and [Beancount](https://github.com/debanjum/khoj/commit/17fa123) entries to JSONL
- b283650 Deduplicate results for user query by raw text before returning results

### Results
- The quality of the search results should improve
- Relevant, long entries should show up in results more often
2022-12-26 18:23:43 +00:00
Debanjum Singh Solanky
17fa123b4e Split entries by max tokens while converting Beancount entries To JSONL 2022-12-26 15:14:32 -03:00
Debanjum Singh Solanky
f209e30a3b Split entries by max tokens while converting Markdown entries To JSONL 2022-12-26 13:14:15 -03:00
Debanjum Singh Solanky
24676f95d8 Fix comments, use minimal test case, regenerate test index, merge debug logs
- Remove property drawer from test entry for max_words splitting test
  - Property drawer is not required for the test
  - Keep minimal test case to reduce chance for confusion
2022-12-25 22:33:04 -03:00
Debanjum Singh Solanky
b283650991 Deduplicate results for user query by raw text before returning results
- Required because entries are now split by the max_word count supported
  by the ML models
- This would now result in potentially duplicate hits, entries being
  returned to user
- Do deduplication after ranking to get the top ranked deduplicated
  results
2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky
53cd2e5605 Regenerate initial model in asymmetric reload test to reduce flakyness
- Fix logger message when converting org node to entries
- Remove unused import from conftest
2022-12-25 21:36:15 -03:00
Debanjum Singh Solanky
c79919bd68 Split entries by max tokens while converting Org entries To JSONL
- Test usage the entry splitting by max tokens in text search
2022-12-25 21:36:00 -03:00
Debanjum Singh Solanky
08dc5e3324 Update instructions in khoj.el to install it from MELPA stable
- The instructions suggest installing khoj-assistant via pip install.
  This installs the latest tagged/release version of khoj
- To match that version user should install khoj.el from MELPA stable
  instead of MELPA
2022-12-23 19:08:38 -03:00
Debanjum Singh Solanky
e057c8e208 Add method to split entries by specified max tokens limit
- Issue
   ML Models truncate entries exceeding some max token limit.
   This lowers the quality of search results

- Fix
  Split entries by max tokens before indexing.
  This should improve searching for content in longer entries.

- Miscellaneous
  - Test method to split entries by max tokens
2022-12-23 16:24:04 -03:00
Debanjum Singh Solanky
d3e175370f Update readme to install khoj.el from MELPA stable unless using pre-release khoj
Update readme to ask user to install khoj.el from MELPA when a
pre-release version of the main khoj app is installed. Else install
khoj.el from MELPA Stable
2022-12-20 23:29:22 -03:00
Debanjum Singh Solanky
cd463c5085 Update Khoj.el Install Instructions on Emacs 2022-12-20 11:06:33 -03:00
Debanjum Singh Solanky
23ca5a2d43 Improve (un-)quoting of funcs used in `khoj--get-enabled-content-types'
- Based on melpa package feedback for khoj.el
- Verified these changes don't affect behavior of the function
2022-12-19 18:02:23 -03:00
Debanjum Singh Solanky
5db3a67df5 Fix Khoj Emacs package URL in khoj.el 2022-12-14 22:49:19 -03:00
Debanjum Singh Solanky
abad6d5f44 Declare external khoj.el funcs. Remove undefined func warnings on install 2022-12-14 22:36:04 -03:00
Debanjum Singh Solanky
2e5ac5bf22 Bump Khoj to version 0.2.1. It's the current version in development
Each push to master creates a development release on pypi
2022-12-04 09:09:34 -03:00
Debanjum Singh Solanky
c52383b11c Delete stale, unused installation helper script 2022-12-03 13:36:47 -03:00
Debanjum Singh Solanky
676de2372e Update instructions to install Khoj using Conda
- Requires instally PyQT6 using pip as conda doesn't have a package
  for pyqt6 yet
2022-12-03 13:32:50 -03:00
Debanjum Singh Solanky
1990d09032 Bump khoj version in setup.py, khoj.el to 0.2.0 2022-12-02 14:58:54 -03:00
Debanjum Singh Solanky
a9cfd8b800 Extract hash func for incremental text indexing into separate method 2022-10-26 13:56:58 +05:30
Debanjum Singh Solanky
0de2ff9c97 Add __init__.py to routers directory to register it as a package 2022-10-25 20:40:40 +05:30
Debanjum Singh Solanky
55d2fea9be Move Custom Formatter class for logger to util.helper module from main.py 2022-10-20 00:32:24 +05:30
Debanjum
5ed17ccbd7 Modularize, Improve API. Formalize Intermediate Text Content Format. Add Type Checking
- **Improve API Endpoints**
  - ee65a4f Merge /reload, /regenerate into single /update API endpoint
  - 9975497 Type the /search API response to better document the response schema
  - 0521ea1 Put image score breakdown under `additional` field in search response
- **Formalize Intermediary Format to Index Text Content**
  - 7e9298f Use new Text `Entry` class to track text entries in Intermediate Format
  - 02d9440 Use Base `TextToJsonl` class to standardize `<text>_to_jsonl` processors
- **Modularize API router code**
  - e42a38e Split router code into `web_client`, `api`, `api_beta` routers. Version Khoj API
  - d292bdc Remove API versioning. Premature given current state of the codebase
- **Miscellaneous**
  - c467df8 Setup `mypy` for static type checking
  - 2c54813 Remove unused imports, `embeddings` variable from text search tests
2022-10-19 11:23:04 +00:00
Debanjum Singh Solanky
1c40f97114 Merge branch 'master' of github.com:debanjum/khoj into modularize-api-and-increase-typing
- Conflicts:
  - src/interface/emacs/khoj.el
    Use our update to `config-url', use their `url-request-method'
2022-10-19 16:46:53 +05:30
Debanjum Singh Solanky
e1b5a87920 Rename Frontend Router to Web Client. Fix logger usage in routers
- Use logger in api_beta router instead of print statements
- Remove unused logger in web client router
2022-10-19 16:36:48 +05:30
Debanjum
4abd51cb04 Merge pull request #99 from telotortium/method
Explicitly set `url-request-method' to GET in khoj.el
2022-10-19 10:31:37 +00:00
Debanjum
74f32eedb8 Remove Exiftool Dependency. Ignore legacy model warning on app start
- bf1ae038cb Get XMP metadata from image using `Pillow`. Remove `ExifTool` dependency
  - Pillow library is already used in Khoj and it can extract XMP Metadata from Images
  - Reduce unmaintained dependencies by using Pillow instead of Exiftool
  - Pillow is much better maintained than my fork of the Exiftool python package
- c16ae9e344 Ignore *"Legacy way to download model"* warning for upstream dependency
2022-10-08 14:57:58 +00:00
Debanjum Singh Solanky
c467df8fa3 Setup `mypy' for static type checking 2022-10-08 17:33:13 +03:00
Debanjum Singh Solanky
d292bdcc11 Do not version API. Premature given current state of the codebase
- Reason
  - All clients that currently consume the API are part of Khoj
  - Any breaking API changes will be fixed in clients immediately
  - So decoupling client from API is not required
  - This removes the burden of maintaining muliple versions of the API
2022-10-08 16:32:46 +03:00
Debanjum Singh Solanky
2c548133f3 Remove unused imports, `embeddings' variable from text search tests 2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky
7e9298f315 Use new Text Entry class to track text entries in Intermediate Format
- Context
  - The app maintains all text content in a standard, intermediate format
  - The intermediate format was loaded, passed around as a dictionary
    for easier, faster updates to the intermediate format schema initially
  - The intermediate format is reasonably stable now, given it's usage
    by all 3 text content types currently implemented

- Changes
  - Concretize text entries into `Entries' class instead of using dictionaries
    - Code is updated to load, pass around entries as `Entries' objects
      instead of as dictionaries
    - `text_search' and `text_to_jsonl' methods are annotated with
       type hints for the new `Entries' type
    - Code and Tests referencing entries are updated to use class style
      access patterns instead of the previous dictionary access patterns

  - Move `mark_entries_for_update' method into `TextToJsonl' base class
    - This is a more natural location for the method as it is only
      (to be) used by `text_to_jsonl' classes
    - Avoid circular reference issues on importing `Entries' class
2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky
99754970ab Type the /search API response to better document the response schema
- Both Text, Image Search were already giving list of entry, score
- This change just concretizes this change and exposes this in the API
  documentation (i.e OpenAPI, Swagger, Redocs)
2022-10-08 12:06:05 +03:00
Debanjum Singh Solanky
0521ea10d6 Put image score breakdown under `additional' field in search response
- Update web, emacs interfaces to consume the scores from new schema
2022-10-08 12:06:01 +03:00
Debanjum Singh Solanky
e42a38e825 Version Khoj API, Update frontends, tests and docs to reflect it
- Split router.py into v1.0, beta and frontend (no-prefix) api modules
  under new router package. Version tag in main.py via prefix
- Update frontends to use the versioned api endpoints
- Update tests to work with versioned api endpoints
- Update docs to mentioned, reference only versioned api endpoints
2022-09-28 20:08:38 +03:00
Robert Irelan
d25e1d8e86 fix: explicitly set url-request-method
In my installation, it appears that `url-request-method` is sometimes set
globally to POST.  Need to explicitly set it to ensure that GET is always
used as intended.
2022-09-19 15:46:46 -04:00
Debanjum Singh Solanky
ee65a4f2c7 Merge /reload, /regenerate into single /update API endpoint
- Pass force=true to /update API to force regenerating index from
scratch
- Otherwise calls to the /update API endpoint will result in an
incremental update to index
2022-09-16 00:53:19 +03:00
Debanjum Singh Solanky
02d944030f Use Base TextToJsonl class to standardize <text>_to_jsonl processors
- Start standardizing implementation of the `text_to_jsonl' processors
  - `text_to_jsonl; scripts already had a shared structure
  - This change starts to codify that implicit structure

- Benefits
  - Ease adding more `text_to_jsonl; processors
  - Allow merging shared functionality
  - Help with type hinting

- Drawbacks
  - Lower agility to change. But this was already an implicit issue as
    the text_to_jsonl processors got more deeply wired into the app
2022-09-16 00:53:11 +03:00
Debanjum Singh Solanky
c16ae9e344 Ignore "Legacy way to download model" warning for upstream dependency 2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky
3169e3b78e Use ellipsis instead of pass in base filter abstract methods for aesthetic 2022-09-16 00:48:45 +03:00
Debanjum Singh Solanky
bf1ae038cb Get XMP metadata from image using Pillow. Remove ExifTool dependency
- Pillow already supports reading XMP metadata from Images
- Removes need to maintain my fork of unmaintained PyExiftool
  - This also removes dependency on system Exiftool package for
    XMP metadata extraction
- Add test to verify XMP metadata extracted from test images
- Remove references to Exiftool from Documentation
2022-09-16 00:48:45 +03:00
Saba
a53094ec92 Add workflow dispatch support in build.yml
- To support dispatch, set the image label based on the branch name
- Master build should still be tagged with latest to get benefit of the standard production Docker label
2022-09-15 20:28:41 +03:00
Debanjum Singh Solanky
8f57a62675 Remove unused imports. Fix typing and indentation
- Typing issues discovered using `mypy'. Fixed manually
- Unused imports discovered and fixed using `autoflake'
- Fix indentation in `org_to_jsonl' manually
2022-09-14 04:56:52 +03:00
Debanjum Singh Solanky
be57c711fd Revert OrgNode.hasTag func to method instead of property as accepts argument 2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky
0109c7bd91 Disable ability to call <text>_to_jsonl, <type>_search packages directly
- This code is de-synced with expected args by above scripts
- Better to remove unused capabilitity that needlessly increases
  maintainance burden
2022-09-14 04:56:48 +03:00
Debanjum Singh Solanky
1680a617da Reflect updates to query and results count in URL
- Simplify tracking khoj query history, saving/sharing links
- Do not execute search, when query only contains whitespaces
  - Prevents error when try process results of empty query
2022-09-13 23:39:24 +03:00
Debanjum Singh Solanky
34314e859a Call /reload instead of /regenerate API to update index from web interface
- As `/reload` updates index incrementally, it's relatively quick
- This makes exposing `/reload` endpoint a better default to expose
  via the web interface than `the /regenerate' endpoint
2022-09-12 23:39:10 +03:00
Debanjum Singh Solanky
13b5d5082f Create input field to set results count on the web interface
Resolves #96
2022-09-12 23:24:46 +03:00
Debanjum Singh Solanky
0ce0c00090 Bump khoj version to 0.1.10 2022-09-12 23:03:22 +03:00
Debanjum Singh Solanky
1bfe9c4ef2 Handle filter only queries. Short-circuit and return filtered results
- For queries with only filters in them short-circuit and return
  filtered results. No need to run semantic search, re-ranking.
- Add client test for filter only query and quote query in client tests
2022-09-12 17:13:05 +03:00
Debanjum Singh Solanky
afc84de234 Make word filter regex explicit. Allow hyphen in word filters
Helps with #88
2022-09-12 17:05:29 +03:00
Debanjum
3d86d763c5 Support Multiple Input Filters to Configure Content to Index
- 536f03a Process text content files in sorted order for stable indexing
- a701ad0 Support multiple input-filters to configure content to index via `khoj.yml`

Resolves #84
2022-09-12 08:19:52 +00:00
Debanjum Singh Solanky
536f03af8f Process text content files in sorted order for stable indexing
- Image search already uses a sorted list of images to process
- Prevents index of entries to desync when entries, embeddings
  generated by a separate server/app instance
2022-09-12 11:09:40 +03:00
Debanjum Singh Solanky
a701ad08b9 Support multiple input-filters to configure content to index via khoj.yml
- Update existings code, tests to process input-filters as list
  instead of str
- Test `text_to_jsonl' get files methods to work with combination of
  `input-files' and `input-filters'

Resolves #84
2022-09-12 11:08:59 +03:00
Debanjum Singh Solanky
940c8fac8c Use app LRU, not functools LRU decorator, to cache search results in router
- Provides more control to invalidate cache on update to entries, embeddings
- Allows logging when results are being returned from cache etc
- FastAPI, Swagger API docs look better as the `search' controller not
  wrapped in generically named function when using functools LRU decorator
2022-09-12 09:38:48 +03:00
Debanjum Singh Solanky
c6fa09d8fc Fix querying with include word filter from web interface
- Not encoding the `query' string before querying the backend API with
  it was causing the "+" prefix for include word filter to be lost
2022-09-12 09:27:02 +03:00
Debanjum Singh Solanky
1502fbc9e9 Add index_heading_entries flag to default and sample khoj configs 2022-09-11 17:33:37 +03:00
Debanjum Singh Solanky
7216cdff58 Add Date, Word filter for Org-Music content 2022-09-11 17:29:34 +03:00
Debanjum
182fbbd8df Allow Indexing Heading Entries. Improve Org, TextToJsonl Parser
### Summary
- Set `index_heading_entries` field in `~/.khoj/khoj.yml` to `true` to index entries with empty body

### Main Changes
#### Make Indexing Org-Mode Entries with Empty Body Configurable
- 253c9ea Set `index_heading_entries` field in `khoj.yml` to index entries with no body

### Fix, Improve OrgNode, TextToJsonl Parser
- 9d369ae Fix `OrgNode` render of entries with property drawers and empty body
- 1d3b3d5 Convert field get/set methods in `OrgNode` class to `@property`
- db37e38 Create `OrgNode` `hasBody` method. Use it in `org_to_jsonl` checks
- b4878d7 Extract entries from scratch when regenerate requested
- 52e3dd9 Pass the whole `TextContentConfig` as argument to `text_to_jsonl` methods
- e951ba3 Raise exception when org file not found

Resolves #87
2022-09-11 13:46:11 +00:00
Debanjum Singh Solanky
9d369ae4df Fix OrgNode render of entries with property drawers and empty body
- Issue
  - Indent regex was previously catching escape sequences like newlines
  - This was resulting in entries with only escape sequences in body to
    be prepended to property drawers etc during rendering
- Fix
  - Update indent regex to only look for spaces in each line
  - Only render body when body contains non-escape characters
  - Create test to prevent this regression from silently resurfacing
2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky
253c9eae9a Set index_heading_entries field in config to index entries with no body
- Previously heading entries were not indexed to maintain search quality
- But given that there are use-cases for indexing entries with no body
- Add a configurable `index_heading_entries' field to index heading entries
- This `TextContentConfig' field is currently only used for OrgMode content
2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky
1d3b3d5f39 Convert field get/set methods in OrgNode class to @property
- Use more descriptive variable names in OrgNode parser and class
- Convert OrgNode fields to private/protected, use property methods to
  get/set them
2022-09-11 14:59:28 +03:00
Debanjum Singh Solanky
db37e38df7 Create OrgNode hasBody method. Use it in org_to_jsonl checks 2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky
b4878d76ea Extract entries from scratch when regenerate requested
- Do not rely on previously extracted entries to find new entries in
regenerate scenario
2022-09-11 12:50:03 +03:00
Debanjum Singh Solanky
52e3dd9835 Pass the whole TextContentConfig as argument to text_to_jsonl methods
- Let the specific text_to_jsonl method decide which of the
  TextContentConfig fields it needs to convert <text> type to jsonl
- This simplifies extending TextContentConfig for a specific type without
  modifying all text_to_jsonl methods
- It keeps the number of args being passed to the `text_to_jsonl'
  methods in check
2022-09-11 12:49:56 +03:00
Debanjum Singh Solanky
e951ba37ad Raise exception when org file not found
- No need to catch the IOError in OrgNode
2022-09-11 01:09:24 +03:00
Debanjum
c415af32d5 Support Incremental Update of Entries, Embeddings for OrgMode, Markdown, Beancount Content
### Major Changes
  - 030fab9 Support incremental update of **Markdown** entries, embeddings
  - 91aac83 Support incremental update of **Beancount** transactions, embeddings
  - 2f7a6af Support incremental update of **Org-Mode** entries, embeddings
    - Encode embeddings for updated or new entries
    - Reuse embeddings encoded for existing entries earlier
    - Merge the existing and new entries and embeddings to get the updated entries, embeddings
  - 91d11cc Only hash compiled entry to identify new/updated entries to update
  - b9a6e80 Make OrgNode tags stable sorted to find new entries for incremental updates

### Minor Changes
  - c17a0fd Do not store word filters index to file. Not necessary for now
  - 4eb84c7 Log performance metrics for jsonl conversion
  - 2e1bbe0 Fix striping empty escape sequences from strings

### Why
  - Encoding embeddings is the slowest step to index content
  - Previously we regenerated embeddings for all entries, even if they existed in previous runs
  - Reusing previously generated embeddings should significantly speed up index updates,
    given most user generated content can be expected to be unchanged across time

Resolves #36
2022-09-10 21:38:05 +00:00
Debanjum Singh Solanky
9b2845de06 Add basic tests for beancount to jsonl conversion 2022-09-11 00:16:02 +03:00
Debanjum Singh Solanky
d3267554ae Add basic tests for markdown to jsonl conversion 2022-09-11 00:15:27 +03:00
Debanjum Singh Solanky
2e1bbe0cac Fix striping empty escape sequences from strings
- Fix log message on jsonl write
2022-09-10 23:57:05 +03:00
Debanjum Singh Solanky
a7cf6c8458 Use dictionary instead of list to track entry to file maps 2022-09-10 23:08:30 +03:00
Debanjum Singh Solanky
3e1323971b Stack function calls in jsonl converters to avoid unneeded variables 2022-09-10 22:56:06 +03:00
Debanjum Singh Solanky
4eb84c7f51 Log performance metrics for beancount, markdown to jsonl conversion 2022-09-10 22:47:54 +03:00
Debanjum Singh Solanky
ebd5039bd1 Merge branch 'master' into support-incremental-updates-of-embeddings 2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky
ed8d432fdd Clean-up generated file after image search test run
- Clean-up unused imports in test files
2022-09-10 21:43:31 +03:00
Debanjum Singh Solanky
030fab9bb2 Support incremental update of Markdown entries, embeddings 2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky
91aac83c6a Support incremental update of Beancount transactions, embeddings 2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky
cfaf7aa6f4 Update Indexing Performance Section in Readme 2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky
b01b4d7daa Extract logic to mark entries for embeddings update into helper function
- This could be re-used by other text_to_jsonl converters like
  markdown, beancount
2022-09-10 21:43:08 +03:00
Debanjum Singh Solanky
f97308bef2 Fix log message on writing JSONL data to file 2022-09-10 21:40:08 +03:00
Debanjum Singh Solanky
899bfc5c3e Test incremental update triggered on calling text_search.setup
- Previously updates to index required explicitly setting `regenerate=True`
- Now incremental update check made everytime on `text_search.setup` now
- Test if index automatically updates when call `text_search.setup`
  with new content even with `regenerate=False`
2022-09-10 21:02:27 +03:00
Debanjum Singh Solanky
c17a0fd05b Do not store word filters index to file. Not necessary for now
- It's more of a hassle to not let word filter go stale on entry
  updates
- Generating index on 120K lines of notes takes 1s. Loading from file
  takes 0.2s. For less content load time difference will be even smaller
- Let go of startup time improvement for simplicity for now
2022-09-10 21:01:54 +03:00
Debanjum Singh Solanky
91d11ccb49 Only hash compiled entry to identify new/updated entries to update
- Comparing compiled entries is the appropriately narrow target to
  identify entries that need to encode their embedding vectors. Given we
  pass the compiled form of the entry to the model for encoding

- Hashing the whole entry along with it's raw form was resulting in a
  bunch of entries being marked for updated as LINE: <entry_line_no>
  is a string added to each entries raw format.

- This results in an update to a single entry resulting in all entries
  below it in the file being marked for update (as all their line
  numbers have changed)

- Log performance metrics for steps to convert org entries to jsonl
2022-09-10 21:01:44 +03:00
Debanjum Singh Solanky
b9a6e80629 Make OrgNode tags stable sorted to find new entries for incremental updates
- Having Tags as sets was returning them in a different order
  everytime
- This resulted in spuriously identifying existing entries as new
  because their tags ordering changed
- Converting tags to list fixes the issue and identifies updated new
  entries for incremental update correctly
2022-09-10 20:59:52 +03:00
Debanjum Singh Solanky
2f7a6af56a Support incremental update of org-mode entries and embeddings
- What
  - Hash the entries and compare to find new/updated entries
  - Reuse embeddings encoded for existing entries
  - Only encode embeddings for updated or new entries
  - Merge the existing and new entries and embeddings to get the updated
    entries, embeddings

- Why
  - Given most note text entries are expected to be unchanged
    across time. Reusing their earlier encoded embeddings should
    significantly speed up embeddings updates
  - Previously we were regenerating embeddings for all entries,
    even if they had existed in previous runs
2022-09-10 20:58:33 +03:00
Debanjum Singh Solanky
ec675d27d3 Suppress non-actionable HuggingFace FutureWarning shown on app start 2022-09-10 16:43:14 +03:00
Debanjum Singh Solanky
1ac6a71ff0 Add --version flag to show installed version of khoj 2022-09-10 16:40:19 +03:00
Debanjum
372dcd2dbc Handle Empty Org Files or Org Files with No Headings
### Main Changes
- bf01a4f Use filename or "#+TITLE" as heading for 0th level content in org files
- d6bd7bf Fix initializing `OrgNode` `level` to string to parse org files with no headings
- d835467 Throw exception if no valid entries found in specified content files

### Miscellaneous Improvements
- 7df39e5 Reuse search models across `pytest` sessions. Merge unused pytest fixtures
- 2dc0588 Do not normalize absolute filenames for entry links in `OrgNode`
- e00bb53 Init word filter dictionary with default value as set to simplify code

Resolves #83
2022-09-10 12:42:07 +00:00
Debanjum Singh Solanky
976397bd82 Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings 2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky
2b58218b56 Reuse search models across sessions. Merge unused pytest fixtures
- Remove unused model_dir pytest fixture. It was only being used by
  the content_config fixture, not by any tests
- Reuse existing search models downloaded to khoj directory.
  Downloading search models for each pytest sessions seems excessive and
  slows down tests quite a bit
2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
11917c6ddd Do not normalize absolute filenames for creating links in OrgNode 2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
07b98d35f1 Use filename or #+TITLE as heading for 0th level content in org files
- Set LINE, SOURCE link properties in property drawer correctly for
  content which falls under no heading
- See Issue #83 for more details
2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
d6bd7bf3e1 Fix initializing OrgNode level to string to parse org files
- Parsed `level` argument passed to OrgNode during init is expected to
  be a string, not an integer
- This was resulting in app failure only when parsing org files with
  no headings, like in issue #83, as level is set to string of `*`s
  the moment a heading is found in the current file
2022-09-10 14:21:08 +03:00
Debanjum Singh Solanky
d835467f2c Throw exception if no valid entries found in specified content files
- Previously we were failing if no valid entries while computing
  embeddings. This was obscuring the actual issue of no valid entries
  found in the specified content files
- Throwing an exception early with clear message when no entries found
  should make clarify the issue to be fixed
- See issue #83 for details
2022-09-10 14:20:10 +03:00
Debanjum Singh Solanky
e00bb53336 Init word filter dictionary with default value as set to simplify code 2022-09-10 12:19:09 +03:00
Debanjum Singh Solanky
4d776d9c7a Bump khoj version to 0.1.9 2022-09-09 07:50:15 +03:00
Debanjum
b58b7d7483 Create App Directory, Fix Initialization GUI on First Run
- 588f598 Pass empty list of `input_files` to `FileBrowser` on first run
- 3ddffdf Create config directory before setting up logging to file under it

Resolves #78
Resolves #79
Resolves #80
2022-09-09 04:40:22 +00:00
Debanjum Singh Solanky
588f598949 Pass empty list of `input_files' to FileBrowser on first run
- Default config has `input_files' set to None
- This was being passed to `FileBrowser' on Initialization
- But `FileBrowser' expects `content_files' of list type, not None
- This resulted in an unexpected NoneType failure
2022-09-09 07:26:40 +03:00
Debanjum Singh Solanky
3ddffdfba4 Create config directory before setting up logging to file under it
- The logging to file code expects the config directory to already be setup
- But parent directory of config file was being set up later in code
- This resulted in app start failing with ~/.khoj dir does not exist error
2022-09-09 07:21:42 +03:00
Debanjum
79894efc7a Resolve GUI Issues in Docker Build
- 17354aa Install `pyqt` system package in Docker image to get qt dependencies
- 5d3aeba Do not start GUI when Khoj started from Docker
- 26ff66f (Re-)Enable image search via Docker image as image search issues fixed

Resolves #76
2022-09-08 07:55:06 +00:00
Debanjum Singh Solanky
26ff66f38b (Re-)Enable image search via Docker image as image search issues fixed 2022-09-08 10:42:34 +03:00
Debanjum Singh Solanky
17354aaffd Install pyqt system package in Docker image to get qt dependencies
Otherwise app start fails with pyqt package import related errors.
See #76 for bug
2022-09-08 10:39:11 +03:00
Debanjum Singh Solanky
5d3aeba22f Use --no-gui flag on starting Khoj from docker-compose
As the GUI wouldn't work when run from a docker container
2022-09-08 10:37:39 +03:00
Debanjum Singh Solanky
e4d40e4d4d Update setup.py version, Readme. Remove faulty release badge for now 2022-09-07 14:51:03 +03:00
182 changed files with 12062 additions and 4405 deletions

View File

@@ -6,4 +6,5 @@ docs/
tests/
build/
dist/
scripts/
*.egg-info/

39
.github/workflows/build_khoj_el.yml vendored Normal file
View File

@@ -0,0 +1,39 @@
# melpa quality checks like checkdoc, byte-compile, package-lint for khoj.el
# using melpazoid: https://github.com/riscy/melpazoid
name: build khoj.el
on:
push:
branches:
- 'master'
paths:
- src/interface/emacs/*.el
- .github/workflows/build_khoj_el.yml
pull_request:
branches:
- 'master'
paths:
- src/interface/emacs/*.el
- .github/workflows/build_khoj_el.yml
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
uses: actions/setup-python@v1
with: { python-version: 3.9 }
- name: ⏬️ Install Dependencies
run: |
python -m pip install --upgrade pip
sudo apt-get install emacs && emacs --version
git clone https://github.com/riscy/melpazoid.git ~/melpazoid
pip install ~/melpazoid
- name: 🌡️ Validate Khoj.el
env:
# Khoj recipe from https://github.com/melpa/melpa/pull/8321/files
RECIPE: (khoj :fetcher github :repo "debanjum/khoj" :files ("src/interface/emacs/*.el"))
EXIST_OK: true
LOCAL_REPO: ${{ github.workspace }}
run: echo $GITHUB_REF && make -C ~/melpazoid

View File

@@ -1,16 +1,22 @@
name: build
name: dockerize
on:
push:
tags:
- "*"
branches:
- master
paths:
- src/**
- src/khoj/**
- config/**
- setup.py
- pyproject.toml
- Dockerfile
- docker-compose.yml
- .github/workflows/build.yml
- .github/workflows/dockerize.yml
workflow_dispatch:
env:
DOCKER_IMAGE_TAG: ${{ github.ref == 'refs/heads/master' && 'latest' || github.ref_name }}
jobs:
build:
@@ -18,24 +24,24 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
uses: docker/setup-buildx-action@v2
- name: Login to GitHub Container Registry
uses: docker/login-action@v1
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.PAT }}
- name: Build and Push Docker Image
- name: 📦 Build and Push Docker Image
uses: docker/build-push-action@v2
with:
context: .
file: Dockerfile
push: true
tags: ghcr.io/${{ github.repository }}:latest
tags: ghcr.io/${{ github.repository }}:${{ env.DOCKER_IMAGE_TAG }}
build-args: |
PORT=8000
PORT=8000

View File

@@ -0,0 +1,45 @@
name: dockerize telemetry server
on:
push:
branches:
- master
paths:
- src/telemetry/**
- .github/workflows/dockerize_telemetry_server.yml
pull_request:
branches:
- master
paths:
- src/telemetry/**
- .github/workflows/dockerize_telemetry_server.yml
workflow_dispatch:
env:
DOCKER_IMAGE_TAG: ${{ github.ref == 'refs/heads/master' && 'latest' || github.event.pull_request.number }}
jobs:
build:
name: Build Docker Image, Push to Container Registry
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to GitHub Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.PAT }}
- name: 📦 Build and Push Docker Image
uses: docker/build-push-action@v2
with:
context: src/telemetry
file: src/telemetry/Dockerfile
push: true
tags: ghcr.io/${{ github.repository }}-telemetry:${{ env.DOCKER_IMAGE_TAG }}

View File

@@ -1,95 +0,0 @@
name: publish
on:
push:
tags:
- v*
branches:
- 'master'
paths:
- src/**
- setup.py
- .github/workflows/publish.yml
pull_request:
branches:
- 'master'
paths:
- src/**
- setup.py
- .github/workflows/publish.yml
jobs:
publish:
name: Publish App to PyPI
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install build twine
- name: Install Application
run: |
pip install --upgrade .
- name: Publish Release to PyPI
if: startsWith(github.ref, 'refs/tags')
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_KEY }}
run: |
# Setup Environment for Reproducible Builds
export PYTHONHASHSEED=42
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
# Build and Upload PyPi Package
rm -rf dist
python -m build
twine check dist/*
twine upload dist/*
- name: Publish Master to PyPI
if: github.ref == 'refs/heads/master'
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_KEY }}
run: |
# Set Pre-Release Version
sed -E -i "s/version=(.*)',/version=\1a$(date +%s)',/g" setup.py
# Setup Environment for Reproducible Builds
export PYTHONHASHSEED=42
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
# Build and Upload PyPi Package
rm -rf dist
python -m build
twine check dist/*
twine upload dist/*
- name: Publish PR to Test PyPI
if: github.event_name == 'pull_request'
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.TEST_PYPI_API_KEY }}
PULL_REQUEST_NUMBER: ${{ github.event.number }}
run: |
# Set Development Release Version
sed -E -i "s/version=(.*)',/version=\1.dev$PULL_REQUEST_NUMBER$(date +%s)',/g" setup.py
# Setup Environment for Reproducible Builds
export PYTHONHASHSEED=42
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
# Build and Upload PyPi Package
rm -rf dist
python -m build
twine check dist/*
twine upload -r testpypi dist/*

64
.github/workflows/pypi.yml vendored Normal file
View File

@@ -0,0 +1,64 @@
name: pypi
on:
push:
tags:
- "*"
branches:
- 'master'
paths:
- src/khoj/**
- pyproject.toml
- .github/workflows/pypi.yml
pull_request:
branches:
- 'master'
paths:
- src/khoj/**
- pyproject.toml
- .github/workflows/pypi.yml
jobs:
publish:
name: Publish Python Package to PyPI
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: ⬇️ Install Application
run: python -m pip install --upgrade pip && pip install --upgrade .
- name: ⚙️ Build Python Package
run: |
# Setup Environment for Reproducible Builds
export PYTHONHASHSEED=42
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
rm -rf dist
# Build PyPi Package
pipx run build
- name: 🌡️ Validate Python Package
run: |
# Validate PyPi Package
pipx run check-wheel-contents dist/*.whl
pipx run twine check dist/*
- name: ⏫ Upload Python Package Artifacts
uses: actions/upload-artifact@v3
with:
name: khoj-assistant
path: dist/*.whl
- name: 📦 Publish Python Package to PyPI
if: startsWith(github.ref, 'refs/tags') || github.ref == 'refs/heads/master'
uses: pypa/gh-action-pypi-publish@v1.6.4
with:
password: ${{ secrets.PYPI_API_KEY }}

View File

@@ -1,18 +1,70 @@
name: release
on:
push:
tags:
- "*"
workflow_dispatch:
inputs:
version:
description: 'Version Number'
required: true
type: string
push:
tags:
- v*
jobs:
publish:
publish_obsidian_plugin:
name: 💎 Publish Obsidian Plugin
runs-on: ubuntu-latest
defaults:
run:
shell: bash
working-directory: src/interface/obsidian
steps:
- uses: actions/checkout@v3
- name: Install Node
uses: actions/setup-node@v3
with:
node-version: "lts/*"
- name: ⚙️ Build Obsidian Plugin
run: |
yarn
yarn run build --if-present
- name: ⏫ Upload Obsidian Plugin main.js
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: main.js
path: src/interface/obsidian/main.js
- name: ⏫ Upload Obsidian Plugin manifest.json
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: manifest.json
path: src/interface/obsidian/manifest.json
- name: ⏫ Upload Obsidian Plugin styles.css
uses: actions/upload-artifact@v3
with:
if-no-files-found: error
name: styles.css
path: src/interface/obsidian/styles.css
- name: 🌈 Create Release
uses: softprops/action-gh-release@v1
if: startsWith(github.ref, 'refs/tags/')
with:
generate_release_notes: true
files: |
src/interface/obsidian/main.js
src/interface/obsidian/manifest.json
src/interface/obsidian/styles.css
publish_desktop_apps:
name: 🖥️ Publish Desktop Apps
strategy:
matrix:
include:
@@ -31,7 +83,7 @@ jobs:
with:
python-version: '3.9'
- name: Install Dependencies
- name: ⏬️ Install Dependencies
shell: bash
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
@@ -40,11 +92,11 @@ jobs:
python -m pip install --upgrade pip
pip install pyinstaller
- name: Install Khoj App
- name: ⬇️ Install Khoj App
run: |
pip install --upgrade .
- name: Package Khoj App
- name: 📦 Package Khoj App
shell: bash
run: |
# Setup Environment for Reproducible Builds
@@ -56,7 +108,7 @@ jobs:
mv dist/Khoj.exe dist/khoj_"$GITHUB_REF_NAME"_amd64.exe
fi
- name: Create Mac App DMG
- name: 💻 Create Mac App DMG
if: matrix.os == 'macos-latest'
run: |
# Install Mac DMG Creator
@@ -66,7 +118,7 @@ jobs:
# Create disk image with the app
create-dmg \
--volname "Khoj" \
--volicon "src/interface/web/assets/icons/favicon.icns" \
--volicon "src/khoj/interface/web/assets/icons/favicon.icns" \
--window-pos 200 120 \
--window-size 600 300 \
--icon-size 100 \
@@ -80,7 +132,7 @@ jobs:
if: matrix.os == 'ubuntu-latest'
with:
ruby-version: '3.0'
- name: Create Debian Package
- name: 🐧 Create Debian Package
if: matrix.os == 'ubuntu-latest'
shell: bash
env:
@@ -92,7 +144,7 @@ jobs:
# Copy app files into expected output directory structure
mkdir -p package/opt package/usr/share/applications package/usr/share/icons/hicolor/128x128/apps
cp -r dist/Khoj package/opt/Khoj
cp src/interface/web/assets/icons/favicon-128x128.png package/usr/share/icons/hicolor/128x128/apps/Khoj.png
cp src/khoj/interface/web/assets/icons/favicon-128x128.png package/usr/share/icons/hicolor/128x128/apps/Khoj.png
cp Khoj.desktop package/usr/share/applications
# Fix permissions to be usable by non-root users
@@ -110,8 +162,9 @@ jobs:
name: khoj_${{github.ref_name}}_amd64.${{matrix.extension}}
path: dist/khoj_${{github.ref_name}}_amd64.${{matrix.extension}}
- name: Release
- name: 🌈 Release
uses: softprops/action-gh-release@v1
if: startsWith(github.ref, 'refs/tags/')
with:
generate_release_notes: true
files: dist/khoj_${{github.ref_name}}_amd64.${{matrix.extension}}

View File

@@ -5,43 +5,52 @@ on:
branches:
- 'master'
paths:
- src/**
- src/khoj/**
- tests/**
- config/**
- setup.py
- pyproject.toml
- .pre-commit-config.yml
- .github/workflows/test.yml
push:
branches:
- 'master'
paths:
- src/**
- src/khoj/**
- tests/**
- config/**
- setup.py
- pyproject.toml
- .pre-commit-config.yml
- .github/workflows/test.yml
jobs:
test:
name: Run Tests
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python_version:
- '3.8'
- '3.9'
- '3.10'
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
python-version: ${{ matrix.python_version }}
- name: Install Dependencies
- name: ⏬️ Install Dependencies
run: |
sudo apt install libegl1 -y
sudo apt update && sudo apt install -y libegl1
python -m pip install --upgrade pip
pip install pytest
- name: Install Application
run: |
pip install --upgrade .
- name: ⬇️ Install Application
run: pip install --upgrade .[dev]
- name: Test Application
run: |
pytest
- name: 🌡️ Validate Application
run: pre-commit run --hook-stage manual --all
- name: 🧪 Test Application
run: pytest

52
.github/workflows/test_khoj_el.yml vendored Normal file
View File

@@ -0,0 +1,52 @@
name: test khoj.el
on:
push:
branches:
- 'master'
paths:
- src/interface/emacs/*.el
- src/interface/emacs/tests/*.el
- .github/workflows/test_khoj_el.yml
pull_request:
branches:
- 'master'
paths:
- src/interface/emacs/*.el
- src/interface/emacs/tests/*.el
- .github/workflows/test_khoj_el.yml
jobs:
test:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
emacs_version:
- 27.1
- 27.2
- 28.1
- 28.2
- snapshot
steps:
- uses: purcell/setup-emacs@master
with:
version: ${{ matrix.emacs_version }}
- uses: actions/checkout@v3
- name: 🧪 Test Khoj.el
run: |
# Run ERT tests on khoj.el
emacs -batch \
--eval "(progn \
(require 'package) \
(push '(\"melpa\" . \"https://melpa.org/packages/\") package-archives) \
(package-initialize) \
(unless package-archive-contents (package-refresh-contents)) \
(unless (package-installed-p 'transient) (package-install 'transient)) \
(unless (package-installed-p 'dash) (package-install 'dash)) \
(unless (package-installed-p 'org) (package-install 'org)) \
)" \
-l ert \
-l ./src/interface/emacs/khoj.el \
-l ./src/interface/emacs/tests/khoj-tests.el \
-f ert-run-tests-batch-and-exit

36
.gitignore vendored
View File

@@ -1,16 +1,38 @@
# Khoj artifacts
*.gz
*.pt
tests/data/models
tests/data/embeddings
# External app artifacts
__pycache__
.DS_Store
.emacs.desktop*
*.py[cod]
tests/data/models
tests/data/embeddings
src/.data
/src/interface/web/images
.vscode
*.gz
*.pt
.env
.venv/*
# Build artifacts
/src/khoj/interface/web/images
/build/
/dist/
/khoj_assistant.egg-info/
khoj_assistant.egg-info
/config/khoj*.yml
.pytest_cache
khoj.log
# Obsidian plugin artifacts
# ---
# npm
node_modules
# Don't include the compiled obsidian main.js file in the repo.
# They should be uploaded to GitHub releases instead.
main.js
# Exclude sourcemaps
*.map
# obsidian
data.json

25
.pre-commit-config.yaml Normal file
View File

@@ -0,0 +1,25 @@
repos:
- repo: https://github.com/psf/black
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
# Exclude elisp files to not clear page breaks
exclude: \.el$
- id: check-json
- id: check-toml
- id: check-yaml
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.0.0
hooks:
- id: mypy
stages: [push, manual]
pass_filenames: false
args:
- --config-file=pyproject.toml

View File

@@ -1,18 +1,14 @@
# syntax=docker/dockerfile:1
FROM python:3.10-slim-bullseye
FROM ubuntu:kinetic
LABEL org.opencontainers.image.source https://github.com/debanjum/khoj
# Install System Dependencies
RUN apt-get update -y && \
apt-get -y install libimage-exiftool-perl
# Copy Application to Container
COPY . /app
WORKDIR /app
RUN apt update -y && \
apt -y install python3-pip python3-pyqt6
# Install Python Dependencies
RUN pip install --upgrade pip && \
pip install --upgrade .
pip install --upgrade --pre khoj-assistant
# Run the Application
# There are more arguments required for the application to run,

View File

@@ -4,4 +4,4 @@ Name=Khoj
Comment=A natural language search engine for your personal notes, transactions and images.
Path=/opt
Exec=/opt/Khoj
Icon=Khoj
Icon=Khoj

View File

@@ -5,7 +5,7 @@ from PyInstaller.utils.hooks import copy_metadata
import sysconfig
datas = [
('src/interface/web', 'src/interface/web'),
('src/khoj/interface/web', 'src/khoj/interface/web'),
(f'{sysconfig.get_paths()["purelib"]}/transformers', 'transformers')
]
datas += copy_metadata('tqdm')
@@ -19,7 +19,7 @@ datas += copy_metadata('tokenizers')
block_cipher = None
a = Analysis(
['src/main.py'],
['src/khoj/main.py'],
pathex=[],
binaries=[],
datas=datas,
@@ -50,7 +50,7 @@ pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
if system() != 'Darwin':
# Add Splash screen to show on app launch
splash = Splash(
'src/interface/web/assets/icons/favicon-144x144.png',
'src/khoj/interface/web/assets/icons/favicon-144x144.png',
binaries=a.binaries,
datas=a.datas,
text_pos=(10, 160),
@@ -82,7 +82,7 @@ if system() != 'Darwin':
target_arch='x86_64',
codesign_identity=None,
entitlements_file=None,
icon='src/interface/web/assets/icons/favicon-144x144.ico',
icon='src/khoj/interface/web/assets/icons/favicon-144x144.ico',
)
else:
exe = EXE(
@@ -105,11 +105,11 @@ else:
target_arch='x86_64',
codesign_identity=None,
entitlements_file=None,
icon='src/interface/web/assets/icons/favicon.icns',
icon='src/khoj/interface/web/assets/icons/favicon.icns',
)
app = BUNDLE(
exe,
name='Khoj.app',
icon='src/interface/web/assets/icons/favicon.icns',
icon='src/khoj/interface/web/assets/icons/favicon.icns',
bundle_identifier=None,
)

View File

@@ -619,4 +619,3 @@ Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS

View File

@@ -1,5 +0,0 @@
include Readme.md
graft src/interface/*
prune src/interface/web/images*
prune docs*
global-exclude .DS_Store *.py[cod]

493
README.md Normal file
View File

@@ -0,0 +1,493 @@
# Khoj 🦅
[![test](https://github.com/debanjum/khoj/actions/workflows/test.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/test.yml)
[![dockerize](https://github.com/debanjum/khoj/actions/workflows/dockerize.yml/badge.svg)](https://github.com/debanjum/khoj/pkgs/container/khoj)
[![pypi](https://github.com/debanjum/khoj/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/khoj-assistant/)
*A search assistant for your second brain*
**Supported Plugins**
[![Khoj on Obsidian](https://img.shields.io/badge/Obsidian-%23483699.svg?style=for-the-badge&logo=obsidian&logoColor=white)](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#readme)
[![Khoj on Emacs](https://img.shields.io/badge/Emacs-%237F5AB6.svg?&style=for-the-badge&logo=gnu-emacs&logoColor=white)](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#readme)
## Table of Contents
- [Features](#Features)
- [Demos](#Demos)
- [Khoj in Obsidian](#khoj-in-obsidian)
- [Khoj in Emacs, Browser](#khoj-in-emacs-browser)
- [Interfaces](#Interfaces)
- [Architecture](#Architecture)
- [Setup](#Setup)
- [Install](#1-Install)
- [Run](#2-Run)
- [Configure](#3-Configure)
- [Install Plugins](#4-install-interface-plugins)
- [Use](#Use)
- [Khoj Search](#Khoj-search)
- [Khoj Chat](#Khoj-chat)
- [Upgrade](#Upgrade)
- [Khoj Server](#upgrade-khoj-server)
- [Khoj.el](#upgrade-khoj-on-emacs)
- [Khoj Obsidian](#upgrade-khoj-on-obsidian)
- [Uninstall](#uninstall)
- [Troubleshoot](#Troubleshoot)
- [Advanced Usage](#advanced-usage)
- [Access Khoj on Mobile](#access-khoj-on-mobile)
- [Use OpenAI Models for Search](#use-openai-models-for-search)
- [Search across Different Languages](#search-across-different-languages)
- [Miscellaneous](#Miscellaneous)
- [Setup OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
- [GPT API](#gpt-api)
- [Performance](#Performance)
- [Query Performance](#Query-performance)
- [Indexing Performance](#Indexing-performance)
- [Miscellaneous](#Miscellaneous-1)
- [Development](#Development)
- [Visualize Codebase](#visualize-codebase)
- [Setup](#Setup)
- [Using Pip](#Using-Pip)
- [Using Docker](#Using-Docker)
- [Using Conda](#Using-Conda)
- [Validate](#Validate)
- [Credits](#Credits)
## Features
- **Search**
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
- **General**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Index your Org-mode and Markdown notes, Beancount transactions and Photos
- **Multiple Interfaces**: Interact from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
## Demos
### Khoj in Obsidian
https://user-images.githubusercontent.com/6413477/210486007-36ee3407-e6aa-4185-8a26-b0bfc0a4344f.mp4
<details><summary>Description</summary>
- Install Khoj via `pip` and start Khoj backend in non-gui mode
- Install Khoj plugin via Community Plugins settings pane on Obsidian app
- Check the new Khoj plugin settings
- Let Khoj backend index the markdown files in the current Vault
- Open Khoj plugin on Obsidian via Search button on Left Pane
- Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs](https://marcus.se.net/obsidian-plugin-docs/)
- Jump to the [search result](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin)
</details>
### Khoj in Emacs, Browser
https://user-images.githubusercontent.com/6413477/184735169-92c78bf1-d827-4663-9087-a1ea194b8f4b.mp4
<details><summary>Description</summary>
- Install Khoj via pip
- Start Khoj app
- Add this readme and [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs) as org-mode for Khoj to index
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#2-Install-Khojel)
</details>
<details><summary>Analysis</summary>
- The results do not have any words used in the query
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter
</details>
### Interfaces
![](https://github.com/debanjum/khoj/blob/master/docs/interfaces.png?)
## Architecture
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_architecture.png?)
## Setup
These are the general setup instructions for Khoj.
- Make sure [python](https://realpython.com/installing-python/) (version 3.10 or lower) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine
- Check the [Khoj.el Readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Setup) to setup Khoj with Emacs<br />
Its simpler as it can skip the server *install*, *run* and *configure* step below.
- Check the [Khoj Obsidian Readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#Setup) to setup Khoj with Obsidian<br />
Its simpler as it can skip the *configure* step below.
### 1. Install
- On Linux/MacOS
```shell
python -m pip install khoj-assistant
```
- On Windows
```shell
py -m pip install khoj-assistant
```
### 2. Run
```shell
khoj
```
Note: To start Khoj automatically in the background use [Task scheduler](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10) on Windows or [Cron](https://en.wikipedia.org/wiki/Cron) on Mac, Linux (e.g with `@reboot khoj`)
### 3. Configure
1. Enable content types and point to files to search in the First Run Screen that pops up on app start
2. Click `Configure` and wait. The app will download ML models and index the content for search
### 4. Install Interface Plugins
Khoj exposes a web interface by default.<br />
The optional steps below allow using Khoj from within an existing application like Obsidian or Emacs.
- **Khoj Obsidian**:<br />
[Install](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) the Khoj Obsidian plugin
- **Khoj Emacs**:<br />
[Install](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#2-Install-Khojel) khoj.el
## Use
### Khoj Search
- **Khoj via Obsidian**
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **Khoj via Emacs**
- Run `M-x khoj <user-query>`
- **Khoj via Web**
- Open <http://localhost:8000/> via desktop interface or directly
- **Khoj via API**
- See the Khoj FastAPI [Swagger Docs](http://localhost:8000/docs), [ReDocs](http://localhost:8000/redocs)
<details><summary>Query Filters</summary>
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
- Entries that contain term_to_include: `+"term_to_include"`
- Entries that contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
- Entries from April 1st 1984: `dt:"1984-04-01"`
- Entries after March 31st 1984: `dt>="1984-04-01"`
- Entries before April 2nd 1984 : `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
- Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
- `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
- Adds all filters to the natural language query. It should return entries
- from the file *1984.org*
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
</details>
### Khoj Chat
#### Overview
- Creates a personal assistant for you to inquire and engage with your notes
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](#khoj-search)
- Supports multi-turn conversations with the relevant notes for context
- Shows reference notes used to generate a response
- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
#### Setup
- [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
#### Use
1. Open [/chat](http://localhost:8000/chat)[^2]
2. Type your queries and see response by Khoj from your notes
#### Demo
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_chat_web_interface.png?)
### Details
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
## Upgrade
### Upgrade Khoj Server
```shell
pip install --upgrade khoj-assistant
```
*Note: To upgrade to the latest pre-release version of the khoj server run below command*
```shell
# Maps to the latest commit on the master branch
pip install --upgrade --pre khoj-assistant
```
### Upgrade Khoj on Emacs
- Use your Emacs Package Manager to Upgrade
- See [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade) for details
### Upgrade Khoj on Obsidian
- Upgrade via the Community plugins tab on the settings pane in the Obsidian app
- See the [khoj plugin readme](https://github.com/debanjum/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin) for details
## Uninstall
1. (Optional) Hit `Ctrl-C` in the terminal running the khoj server to stop it
2. Delete the khoj directory in your home folder (i.e `~/.khoj` on Linux, Mac or `C:\Users\<your-username>\.khoj` on Windows)
3. Uninstall the khoj server with `pip uninstall khoj-assistant`
4. (Optional) Uninstall khoj.el or the khoj obsidian plugin in the standard way on Emacs, Obsidian
## Troubleshoot
#### Install fails while building Tokenizer dependency
- **Details**: `pip install khoj-assistant` fails while building the `tokenizers` dependency. Complains about Rust.
- **Fix**: Install Rust to build the tokenizers package. For example on Mac run:
```shell
brew install rustup
rustup-init
source ~/.cargo/env
```
- **Refer**: [Issue with Fix](https://github.com/debanjum/khoj/issues/82#issuecomment-1241890946) for more details
#### Search starts giving wonky results
- **Fix**: Open [/api/update?force=true](http://localhost:8000/api/update?force=true)[^2] in browser to regenerate index from scratch
- **Note**: *This is a fix for when you percieve the search results have degraded. Not if you think they've always given wonky results*
#### Khoj in Docker errors out with \"Killed\" in error message
- **Fix**: Increase RAM available to Docker Containers in Docker Settings
- **Refer**: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)
#### Khoj errors out complaining about Tensors mismatch or null
- **Mitigation**: Disable `image` search using the desktop GUI
## Advanced Usage
### Access Khoj on Mobile
1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](tailscale.com/) on your personal server and phone
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:8000` or `http://name-of-server:8000` if you've setup [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
5. Enjoy exploring your notes, transactions and images from your phone!
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_pwa_android.png?)
### Use OpenAI Models for Search
#### Setup
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: text-embedding-ada-002
+ encoder-type: src.khoj.utils.models.OpenAI
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
- encoder-type: sentence_transformers.SentenceTransformer
- model_directory: "~/.khoj/search/asymmetric/"
+ model-directory: null
```
2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index
### Search across Different Languages
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
model_directory: "~/.khoj/search/asymmetric/"
```
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:8000/api/update?t=force)
## Miscellaneous
### Set your OpenAI API key in Khoj
If you want, Khoj can be configured to use OpenAI for search and chat.<br />
Add your OpenAI API to Khoj by using either of the two options below:
- Open the Khoj desktop GUI, add your [OpenAI API key](https://beta.openai.com/account/api-keys) and click *Configure*
Ensure khoj is started **without** the `--no-gui` flag. Check your system tray to see if Khoj 🦅 is minimized there.
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
```diff
processor:
conversation:
- openai-api-key: # "YOUR_OPENAI_API_KEY"
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
model: "text-davinci-003"
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
```
**Warning**: *This will enable Khoj to send your query and note(s) to OpenAI for processing*
### GPT API
- The [chat](http://localhost:8000/api/chat), [answer](http://localhost:8000/api/beta/answer) and [search](http://localhost:8000/api/beta/search) API endpoints use [OpenAI API](https://openai.com/api/)
- They are disabled by default
- To use them:
1. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
2. Interact with them from the [Khoj Swagger docs](http://locahost:8000/docs)[^2]
## Performance
### Query performance
- Semantic search using the bi-encoder is fairly fast at \<50 ms
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
## Development
### Visualize Codebase
*[Interactive Visualization](https://mango-dune-07a8b7110.1.azurestaticapps.net/?repo=debanjum%2Fkhoj)*
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_codebase_visualization_0.2.1.png?)
### Setup
#### Using Pip
##### 1. Install
```shell
# Get Khoj Code
git clone https://github.com/debanjum/khoj && cd khoj
# Create, Activate Virtual Environment
python3 -m venv .venv && source .venv/bin/activate
# Install Khoj for Development
pip install -e .[dev]
```
##### 2. Run
1. Start Khoj
```shell
khoj -vv
```
2. Configure Khoj
- **Via GUI**: Add files, directories to index in the GUI window that pops up on starting Khoj, then Click Configure
- **Manually**:
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type` and `processor` sub-section(s) irrelevant for your use-case
- Restart khoj
Note: Wait after configuration for khoj to Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
#### Using Docker
##### 1. Clone
```shell
git clone https://github.com/debanjum/khoj && cd khoj
```
##### 2. Configure
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes and beancount directories
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
##### 3. Run
```shell
docker-compose up -d
```
*Note: The first run will take time. Let it run, it\'s mostly not hung, just generating embeddings*
##### 4. Upgrade
```shell
docker-compose build --pull
```
#### Using Conda
##### 1. Install Dependencies
- [Install Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
##### 2. Install Khoj
```shell
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
python3 -m pip install pyqt6 # As conda does not support pyqt6 yet
```
##### 3. Configure
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type`, `processor` sub-sections irrelevant for your use-case
##### 4. Run
```shell
python3 -m src.khoj.main -vv
```
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
##### 5. Upgrade
```shell
cd khoj
git pull origin master
conda deactivate khoj
conda env update -f config/environment.yml
conda activate khoj
```
### Validate
#### Before Make Changes
1. Install Git Hooks for Validation
```shell
pre-commit install -t pre-push -t pre-commit
```
- This ensures standard code formatting fixes and other checks run automatically on every commit and push
- Note 1: If [pre-commit](https://pre-commit.com/#intro) didn't already get installed, [install it](https://pre-commit.com/#install) via `pip install pre-commit`
- Note 2: To run the pre-commit changes manually, use `pre-commit run --hook-stage manual --all` before creating PR
#### Before Creating PR
1. Run Tests
```shell
pytest
```
2. Run MyPy to check types
```shell
mypy --config-file pyproject.toml
```
#### After Creating PR
- Automated [validation workflows](.github/workflows) run for every PR.
Ensure any issues seen by them our fixed
- Test the python packge created for a PR
1. Download and extract the zipped `.whl` artifact generated from the pypi workflow run for the PR.
2. Install (in your virtualenv) with `pip install /path/to/download*.whl>`
3. Start and use the application to see if it works fine
## Credits
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
[^2]: Default Khoj url @ http://localhost:8000

257
Readme.md
View File

@@ -1,257 +0,0 @@
# Khoj 🦅
[![build](https://github.com/debanjum/khoj/actions/workflows/build.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/build.yml)
[![test](https://github.com/debanjum/khoj/actions/workflows/test.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/test.yml)
[![publish](https://github.com/debanjum/khoj/actions/workflows/publish.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/publish.yml)
[![release](https://github.com/debanjum/khoj/actions/workflows/release.yml/badge.svg)](https://github.com/debanjum/khoj/actions/workflows/release.yml)
*A natural language search engine for your personal notes, transactions and images*
## Table of Contents
- [Features](#Features)
- [Demo](#Demo)
- [Description](#Description)
- [Analysis](#Analysis)
- [Interfaces](#Interfaces)
- [Architecture](#Architecture)
- [Setup](#Setup)
- [Install](#1-Install)
- [Configure](#2-Configure)
- [Run](#3-Run)
- [Use](#Use)
- [Upgrade](#Upgrade)
- [Troubleshoot](#Troubleshoot)
- [Miscellaneous](#Miscellaneous)
- [Performance](#Performance)
- [Query Performance](#Query-performance)
- [Indexing Performance](#Indexing-performance)
- [Miscellaneous](#Miscellaneous-1)
- [Development](#Development)
- [Setup](#Setup)
- [Using Pip](#Using-Pip)
- [Using Docker](#Using-Docker)
- [Using Conda](#Test)
- [Test](#Test)
- [Credits](#Credits)
## Features
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search, indexing is done on your machine[\*](https://github.com/debanjum/khoj#miscellaneous)
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Search your Org-mode and Markdown notes, Beancount transactions and Photos
- **Multiple Interfaces**: Search using a [Web Browser](./src/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or the [API](http://localhost:8000/docs)
## Demo
https://user-images.githubusercontent.com/6413477/184735169-92c78bf1-d827-4663-9087-a1ea194b8f4b.mp4
### Description
- Install Khoj via pip
- Start Khoj app
- Add this readme and [khoj.el readme](https://github.com/debanjum/khoj/tree/master/src/interface/emacs) as org-mode for Khoj to index
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation)
### Analysis
- The results do not have any words used in the query
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter
### Interfaces
![](https://github.com/debanjum/khoj/blob/master/docs/interfaces.png)
## Architecture
![](https://github.com/debanjum/khoj/blob/master/docs/khoj_architecture.png)
## Setup
### 1. Install
```shell
pip install khoj-assistant
```
### 2. Start App
```shell
khoj
```
### 3. Configure
1. Enable content types and point to files to search in the First Run Screen that pops up on app start
2. Click configure and wait. The app will load ML model, generates embeddings and expose the search API
## Use
- **Khoj via Web**
- Open <http://localhost:8000/> via desktop interface or directly
- **Khoj via Emacs**
- [Install](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation) [khoj.el](./src/interface/emacs/khoj.el)
- Run `M-x khoj <user-query>`
- **Khoj via API**
- See the Khoj FastAPI [Swagger Docs](http://localhost:8000/docs), [ReDocs](http://localhost:8000/redocs)
## Upgrade
```shell
pip install --upgrade khoj-assistant
```
## Troubleshoot
- Symptom: Errors out complaining about Tensors mismatch, null etc
- Mitigation: Disable `image` search on the desktop GUI
- Symptom: Errors out with \"Killed\" in error message in Docker
- Fix: Increase RAM available to Docker Containers in Docker Settings
- Refer: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)
## Miscellaneous
- The beta [chat](http://localhost:8000/beta/chat) and [search](http://localhost:8000/beta/search) API endpoints use [OpenAI API](https://openai.com/api/)
- It is disabled by default
- To use it add your `openai-api-key` via the app configure screen
- Warning: *If you use the above beta APIs, your query and top result(s) will be sent to OpenAI for processing*
## Performance
### Query performance
- Semantic search using the bi-encoder is fairly fast at \<50 ms
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes 6 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
- Once <https://github.com/debanjum/khoj/issues/36> is implemented, it should only take this long on first run
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
## Development
### Setup
#### Using Pip
##### 1. Install
```shell
git clone https://github.com/debanjum/khoj && cd khoj
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
```
##### 2. Configure
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type` and `processor` sub-section(s) irrelevant for your use-case
##### 3. Run
```shell
khoj -vv
```
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
##### 4. Upgrade
```shell
# To Upgrade To Latest Stable Release
# Maps to the latest tagged version of khoj on master branch
pip install --upgrade khoj-assistant
# To Upgrade To Latest Pre-Release
# Maps to the latest commit on the master branch
pip install --upgrade --pre khoj-assistant
# To Upgrade To Specific Development Release.
# Useful to test, review a PR.
# Note: khoj-assistant is published to test PyPi on creating a PR
pip install -i https://test.pypi.org/simple/ khoj-assistant==0.1.5.dev57166025766
```
#### Using Docker
##### 1. Clone
```shell
git clone https://github.com/debanjum/khoj && cd khoj
```
##### 2. Configure
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes and beancount directories
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
##### 3. Run
```shell
docker-compose up -d
```
*Note: The first run will take time. Let it run, it\'s mostly not hung, just generating embeddings*
##### 4. Upgrade
```shell
docker-compose build --pull
```
#### Using Conda
##### 1. Install Dependencies
- [Install Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) \[Required\]
- Install Exiftool \[Optional\]
``` shell
sudo apt -y install libimage-exiftool-perl
```
##### 2. Install Khoj
```shell
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
```
##### 3. Configure
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type`, `processor` sub-sections irrelevant for your use-case
##### 4. Run
```shell
python3 -m src.main -vv
```
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
##### 5. Upgrade
```shell
cd khoj
git pull origin master
conda deactivate khoj
conda env update -f config/environment.yml
conda activate khoj
```
### Test
```shell
pytest
```
## Credits
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
- Sven Marnach for [PyExifTool](https://github.com/smarnach/pyexiftool/blob/master/exiftool.py)

View File

@@ -4,18 +4,19 @@ channels:
dependencies:
- python=3.8.*
- numpy=1.22.4
- pytorch=1.11.0
- transformers=4.19.4
- pytorch=1.13.1
- torchvision=0.14.1
- transformers=4.21.0
- sentence-transformers=2.1.0
- fastapi=0.77.1
- uvicorn=0.17.6
- pyyaml=6.0
- pytest=7.1.2
- pillow=8.4.0
- torchvision=0.12.0
- pillow=9.3.0
- openai=0.20.0
- pydantic=1.9.1
- jinja2=3.1.2
- aiofiles=0.8.0
- huggingface_hub=0.8.1
- dateparser=1.1.1
- dateparser=1.1.1
- schedule=1.1.0

View File

@@ -4,27 +4,28 @@ content-type:
# If changing, the docker-compose volumes should also be changed to match.
org:
input-files: null
input-filter: "/data/org/*.org"
input-filter: ["/data/org/**/*.org"]
compressed-jsonl: "/data/embeddings/notes.jsonl.gz"
embeddings-file: "/data/embeddings/note_embeddings.pt"
index_heading_entries: false
markdown:
input-files: null
input-filter: "/data/markdown/*.md"
input-filter: ["/data/markdown/**/*.md"]
compressed-jsonl: "/data/embeddings/markdown.jsonl.gz"
embeddings-file: "/data/embeddings/markdown_embeddings.pt"
ledger:
input-files: null
input-filter: /data/ledger/*.beancount
input-filter: ["/data/ledger/**/*.beancount"]
compressed-jsonl: /data/embeddings/transactions.jsonl.gz
embeddings-file: /data/embeddings/transaction_embeddings.pt
# image:
# input-directories: ["/data/images/"]
# embeddings-file: "/data/embeddings/image_embeddings.pt"
# batch-size: 50
# use-xmp-metadata: true
image:
input-directories: ["/data/images/"]
embeddings-file: "/data/embeddings/image_embeddings.pt"
batch-size: 50
use-xmp-metadata: false
music:
input-files: ["/data/music/music.org"]
@@ -50,4 +51,5 @@ search-type:
processor:
#conversation:
# openai-api-key: null
# conversation-logfile: "/data/embeddings/conversation_logs.json"
# model: "text-davinci-003"
# conversation-logfile: "/data/embeddings/conversation_logs.json"

View File

@@ -1,32 +1,33 @@
content-type:
org:
input-files: # ["/path/to/org-file.org"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # /path/to/org/*.org REQUIRED IF input-files IS NOT SET
input-filter: # ["/path/to/org/*.org"] REQUIRED IF input-files IS NOT SET
compressed-jsonl: "~/.khoj/content/org/org.jsonl.gz"
embeddings-file: "~/.khoj/content/org/org_embeddings.pt"
index_heading_entries: false # Set to true to index entries with empty body
markdown:
input-files: # ["/path/to/markdown-file.md"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # "/path/to/markdown/*.md" REQUIRED IF input-files IS NOT SET
input-filter: # ["/path/to/markdown/*.md"] REQUIRED IF input-files IS NOT SET
compressed-jsonl: "~/.khoj/content/markdown/markdown.jsonl.gz"
embeddings-file: "~/.khoj/content/markdown/markdown_embeddings.pt"
ledger:
input-files: # ["/path/to/ledger-file.beancount"] REQUIRED IF input-filter is not set OR
input-filter: # /path/to/ledger/*.beancount REQUIRED IF input-files is not set
input-filter: # ["/path/to/ledger/*.beancount"] REQUIRED IF input-files is not set
compressed-jsonl: "~/.khoj/content/ledger/ledger.jsonl.gz"
embeddings-file: "~/.khoj/content/ledger/ledger_embeddings.pt"
image:
input-directories: # ["/path/to/images/"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # /path/to/images/*.jpg REQUIRED IF input-directories IS NOT SET
input-directories: # ["/path/to/images/"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # ["/path/to/images/*.jpg"] REQUIRED IF input-directories IS NOT SET
embeddings-file: "~/.khoj/content/image/image_embeddings.pt"
batch-size: 50
use-xmp-metadata: false
music:
input-files: # ["/path/to/music-file.org"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # /path/to/music/*.org REQUIRED IF input-files IS NOT SET
input-files: # ["/path/to/music-file.org"] REQUIRED IF input-filter IS NOT SET OR
input-filter: # ["/path/to/music/*.org"] REQUIRED IF input-files IS NOT SET
compressed-jsonl: "~/.khoj/content/music/music.jsonl.gz"
embeddings-file: "~/.khoj/content/music/music_embeddings.pt"
@@ -34,18 +35,22 @@ search-type:
symmetric:
encoder: "sentence-transformers/all-MiniLM-L6-v2"
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
encoder-type: sentence_transformers.SentenceTransformer
model_directory: "~/.khoj/search/symmetric/"
asymmetric:
encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
encoder-type: sentence_transformers.SentenceTransformer
model_directory: "~/.khoj/search/asymmetric/"
image:
encoder: "sentence-transformers/clip-ViT-B-32"
encoder-type: sentence_transformers.SentenceTransformer
model_directory: "~/.khoj/search/image/"
processor:
conversation:
openai-api-key: # "YOUR_OPENAI_API_KEY"
model: "text-davinci-003"
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"

View File

@@ -4,14 +4,14 @@ services:
image: ghcr.io/debanjum/khoj:latest
ports:
# If changing the local port (left hand side), no other changes required.
# If changing the remote port (right hand side),
# change the port in the args in the build section,
# If changing the remote port (right hand side),
# change the port in the args in the build section,
# as well as the port in the command section to match
- "8000:8000"
working_dir: /app
volumes:
- .:/app
# These mounted volumes hold the raw data that should be indexed for search.
# These mounted volumes hold the raw data that should be indexed for search.
# The path in your local directory (left hand side)
# points to the files you want to index.
# The path of the mounted directory (right hand side),
@@ -26,4 +26,4 @@ services:
- ./tests/data/embeddings/:/data/embeddings/
- ./tests/data/models/:/data/models/
# Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/
command: --host="0.0.0.0" --port=8000 -c=config/khoj_docker.yml -vv
command: --no-gui --host="0.0.0.0" --port=8000 -c=config/khoj_docker.yml -vv

Binary file not shown.

Before

Width:  |  Height:  |  Size: 606 KiB

After

Width:  |  Height:  |  Size: 979 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 302 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 126 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 544 KiB

BIN
docs/khoj_emacs_menu.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 49 KiB

BIN
docs/khoj_on_emacs.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.1 MiB

BIN
docs/khoj_pwa_android.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

10
manifest.json Normal file
View File

@@ -0,0 +1,10 @@
{
"id": "khoj",
"name": "Khoj",
"version": "0.6.2",
"minAppVersion": "0.15.0",
"description": "A Search Assistant for your Second Brain 🦅",
"author": "Debanjum Singh Solanky",
"authorUrl": "https://github.com/debanjum",
"isDesktopOnly": false
}

107
pyproject.toml Normal file
View File

@@ -0,0 +1,107 @@
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
[project]
name = "khoj-assistant"
description = "A natural language search engine for your personal notes, transactions and images"
readme = "README.md"
license = "GPL-3.0-or-later"
requires-python = ">=3.8, <3.11"
authors = [
{ name = "Debanjum Singh Solanky, Saba Imran" },
]
keywords = [
"search",
"semantic-search",
"productivity",
"NLP",
"AI",
"org-mode",
"markdown",
"beancount",
"images",
]
classifiers = [
"Development Status :: 4 - Beta",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Topic :: Internet :: WWW/HTTP :: Indexing/Search",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Scientific/Engineering :: Human Machine Interfaces",
"Topic :: Text Processing :: Linguistic",
]
dependencies = [
"dateparser == 1.1.1",
"defusedxml == 0.7.1",
"fastapi == 0.77.1",
"jinja2 == 3.1.2",
"openai >= 0.27.0",
"tiktoken >= 0.3.0",
"tenacity >= 8.2.2",
"pillow == 9.3.0",
"pydantic == 1.9.1",
"pyqt6 == 6.3.1",
"pyyaml == 6.0",
"rich >= 13.3.1",
"schedule == 1.1.0",
"sentence-transformers == 2.2.2",
"torch == 1.13.1",
"uvicorn == 0.17.6",
"aiohttp == 3.8.4",
]
dynamic = ["version"]
[project.urls]
Homepage = "https://github.com/debanjum/khoj#readme"
Issues = "https://github.com/debanjum/khoj/issues"
Discussions = "https://github.com/debanjum/khoj/discussions"
Releases = "https://github.com/debanjum/khoj/releases"
[project.scripts]
khoj = "khoj.main:run"
[project.optional-dependencies]
test = [
"pytest >= 7.1.2",
]
dev = [
"khoj-assistant[test]",
"mypy >= 1.0.1",
"black >= 23.1.0",
"pre-commit >= 3.0.4",
"freezegun >= 1.2.0",
]
[tool.hatch.version]
source = "vcs"
raw-options.local_scheme = "no-local-version" # PEP440 compliant version for PyPi
[tool.hatch.build.targets.sdist]
include = ["src/khoj"]
[tool.hatch.build.targets.wheel]
packages = ["src/khoj"]
[tool.mypy]
files = "src/khoj"
pretty = true
strict_optional = false
install_types = true
ignore_missing_imports = true
non_interactive = true
show_error_codes = true
warn_unused_ignores = false
[tool.black]
line-length = 120
[tool.pytest.ini_options]
addopts = "--strict-markers"
markers = [
"chatquality: Evaluate chatbot capabilities and quality",
]

82
scripts/bump_version.sh Executable file
View File

@@ -0,0 +1,82 @@
#!/bin/zsh
project_root=$PWD
while getopts 'nc:' opt;
do
case "${opt}" in
c)
# Get current project version
current_version=$OPTARG
# Bump Obsidian plugin to current version
cd $project_root/src/interface/obsidian
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$current_version\",/" package.json
sed -E -i.bak "s/version\": \"(.*)\"/version\": \"$current_version\"/" manifest.json
cp $project_root/versions.json .
npm run version # append current version
rm *.bak
# Bump Emacs package to current version
cd ../emacs
sed -E -i.bak "s/^;; Version: (.*)/;; Version: $current_version/" khoj.el
git add khoj.el
rm *.bak
# Copy current obsidian versioned files to project root
cd $project_root
cp src/interface/obsidian/versions.json .
cp src/interface/obsidian/manifest.json .
# Run pre-commit validation to fix jsons
pre-commit run --hook-stage manual --all
# Commit changes and tag commit for release
git add \
$project_root/src/interface/obsidian/package.json \
$project_root/src/interface/obsidian/manifest.json \
$project_root/src/interface/obsidian/versions.json \
$project_root/src/interface/emacs/khoj.el \
$project_root/manifest.json \
$project_root/versions.json
git commit -m "Release Khoj version $current_version"
git tag $current_version master
;;
n)
# Induce hatch to compute next version number
# remove .dev[commits-since-tag] version suffix from hatch computed version number
next_version=$(touch bump.txt && git add bump.txt && hatch version | sed 's/\.dev.*//g')
git rm --cached -- bump.txt && rm bump.txt
# Bump Obsidian plugins to next version
cd $project_root/src/interface/obsidian
sed -E -i.bak "s/version\": \"(.*)\",/version\": \"$next_version\",/" package.json
sed -E -i.bak "s/version\": \"(.*)\"/version\": \"$next_version\"/" manifest.json
npm run version # updates versions.json
rm *.bak
# Bump Emacs package to next version
cd $project_root/src/interface/emacs
sed -E -i.bak "s/^;; Version: (.*)/;; Version: $next_version/" khoj.el
rm *.bak
# Run pre-commit validations to fix jsons
pre-commit run --hook-stage manual --all
# Commit changes
git add \
$project_root/src/interface/obsidian/package.json \
$project_root/src/interface/obsidian/manifest.json \
$project_root/src/interface/obsidian/versions.json \
$project_root/src/interface/emacs/khoj.el
git commit -m "Bump Khoj to pre-release version $next_version"
;;
?)
echo -e "Invalid command option.\nUsage: $(basename $0) [-c] [-n]"
exit 1
;;
esac
done
# Restore State
cd $project_root

View File

@@ -1,55 +0,0 @@
#!/usr/bin/env python
from setuptools import find_packages, setup
from pathlib import Path
this_directory = Path(__file__).parent
setup(
name='khoj-assistant',
version='0.1.7',
description="A natural language search engine for your personal notes, transactions and images",
long_description=(this_directory / "Readme.md").read_text(encoding="utf-8"),
long_description_content_type="text/markdown",
author='Debanjum Singh Solanky, Saba Imran',
author_email='debanjum+pypi@gmail.com, narmiabas@gmail.com',
url='https://github.com/debanjum/khoj',
license="GPLv3",
keywords="search semantic-search productivity NLP org-mode markdown beancount images",
python_requires=">=3.8, <4",
packages=find_packages(
where=".",
exclude=["tests*"],
include=["src*"]
),
install_requires=[
"numpy == 1.22.4",
"torch == 1.12.1",
"torchvision == 0.13.1",
"transformers == 4.21.0",
"sentence-transformers == 2.1.0",
"openai == 0.20.0",
"huggingface_hub == 0.8.1",
"pydantic == 1.9.1",
"fastapi == 0.77.1",
"uvicorn == 0.17.6",
"jinja2 == 3.1.2",
"pyyaml == 6.0",
"pytest == 7.1.2",
"pillow == 9.2.0",
"aiofiles == 0.8.0",
"dateparser == 1.1.1",
"pyqt6 == 6.3.1",
],
include_package_data=True,
entry_points={"console_scripts": ["khoj = src.main:run"]},
classifiers=[
"Development Status :: 4 - Beta",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
]
)

View File

@@ -1,132 +0,0 @@
# System Packages
import sys
import logging
# External Packages
import json
# Internal Packages
from src.processor.ledger.beancount_to_jsonl import beancount_to_jsonl
from src.processor.markdown.markdown_to_jsonl import markdown_to_jsonl
from src.processor.org_mode.org_to_jsonl import org_to_jsonl
from src.search_type import image_search, text_search
from src.utils.config import SearchType, SearchModels, ProcessorConfigModel, ConversationProcessorConfigModel
from src.utils import state
from src.utils.helpers import resolve_absolute_path
from src.utils.rawconfig import FullConfig, ProcessorConfig
from src.search_filter.date_filter import DateFilter
from src.search_filter.word_filter import WordFilter
from src.search_filter.file_filter import FileFilter
logger = logging.getLogger(__name__)
def configure_server(args, required=False):
if args.config is None:
if required:
print('Exiting as Khoj is not configured. Configure the application to use it.')
sys.exit(1)
else:
return
else:
state.config = args.config
# Initialize the search model from Config
state.model = configure_search(state.model, state.config, args.regenerate)
# Initialize Processor from Config
state.processor_config = configure_processor(args.config.processor)
def configure_search(model: SearchModels, config: FullConfig, regenerate: bool, t: SearchType = None):
# Initialize Org Notes Search
if (t == SearchType.Org or t == None) and config.content_type.org:
# Extract Entries, Generate Notes Embeddings
model.orgmode_search = text_search.setup(
org_to_jsonl,
config.content_type.org,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[
DateFilter(),
WordFilter(config.content_type.org.compressed_jsonl.parent, SearchType.Org),
FileFilter(),
])
# Initialize Org Music Search
if (t == SearchType.Music or t == None) and config.content_type.music:
# Extract Entries, Generate Music Embeddings
model.music_search = text_search.setup(
org_to_jsonl,
config.content_type.music,
search_config=config.search_type.asymmetric,
regenerate=regenerate)
# Initialize Markdown Search
if (t == SearchType.Markdown or t == None) and config.content_type.markdown:
# Extract Entries, Generate Markdown Embeddings
model.markdown_search = text_search.setup(
markdown_to_jsonl,
config.content_type.markdown,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[
DateFilter(),
WordFilter(config.content_type.markdown.compressed_jsonl.parent, SearchType.Markdown),
FileFilter(),
])
# Initialize Ledger Search
if (t == SearchType.Ledger or t == None) and config.content_type.ledger:
# Extract Entries, Generate Ledger Embeddings
model.ledger_search = text_search.setup(
beancount_to_jsonl,
config.content_type.ledger,
search_config=config.search_type.symmetric,
regenerate=regenerate,
filters=[
DateFilter(),
WordFilter(config.content_type.ledger.compressed_jsonl.parent, SearchType.Ledger),
FileFilter(),
])
# Initialize Image Search
if (t == SearchType.Image or t == None) and config.content_type.image:
# Extract Entries, Generate Image Embeddings
model.image_search = image_search.setup(
config.content_type.image,
search_config=config.search_type.image,
regenerate=regenerate)
return model
def configure_processor(processor_config: ProcessorConfig):
if not processor_config:
return
processor = ProcessorConfigModel()
# Initialize Conversation Processor
if processor_config.conversation:
processor.conversation = configure_conversation_processor(processor_config.conversation)
return processor
def configure_conversation_processor(conversation_processor_config):
conversation_processor = ConversationProcessorConfigModel(conversation_processor_config)
conversation_logfile = resolve_absolute_path(conversation_processor.conversation_logfile)
if conversation_logfile.is_file():
# Load Metadata Logs from Conversation Logfile
with conversation_logfile.open('r') as f:
conversation_processor.meta_log = json.load(f)
logger.info('Conversation logs loaded from disk.')
else:
# Initialize Conversation Logs
conversation_processor.meta_log = {}
conversation_processor.chat_session = ""
return conversation_processor

View File

@@ -1,54 +1,165 @@
* Emacs Khoj
/An Emacs interface for [[https://github.com/debanjum/khoj][Khoj]]/
* Khoj Emacs 🦅
[[https://stable.melpa.org/#/khoj][file:https://stable.melpa.org/packages/khoj-badge.svg]] [[https://melpa.org/#/khoj][file:https://melpa.org/packages/khoj-badge.svg]] [[https://github.com/debanjum/khoj/actions/workflows/build_khoj_el.yml][https://github.com/debanjum/khoj/actions/workflows/build_khoj_el.yml/badge.svg?]] [[https://github.com/debanjum/khoj/actions/workflows/test_khoj_el.yml][https://github.com/debanjum/khoj/actions/workflows/test_khoj_el.yml/badge.svg?]]
** Requirements
- Install and Run [[https://github.com/debanjum/khoj][Khoj]]
/A search assistant for your second brain/
** Installation
- Direct Install
- Put ~khoj.el~ in your Emacs load path. For e.g ~/.emacs.d/lisp
** Table of Contents
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#features][Features]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Interface][Interface]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Setup][Setup]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Direct-Install][Direct Install]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Minimal-Install][Minimal Install]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Standard-Install][Standard Install]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#With-Straight.el][With Straight.el]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Use][Use]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Search][Search]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Chat][Chat]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Find-similar-entries][Find Similar Entries]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Advanced-usage][Advanced Usage]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Khoj-menu][Khoj Menu]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade][Upgrade]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade-Khoj-Backend][Upgrade Backend]]
- [[https://github.com/debanjum/khoj/tree/master/src/interface/emacs#Upgrade-Khojel][Upgrade Khoj.el]]
- Load via ~use-package~ in your ~/.emacs.d/init.el or .emacs file by adding below snippet
#+begin_src elisp
;; Khoj Package
(use-package khoj
:load-path "~/.emacs.d/lisp/khoj.el"
:bind ("C-c s" . 'khoj))
#+end_src
** Features
- *Search*
- *Natural*: Advanced natural language understanding using Transformer based ML Models
- *Local*: Your personal data stays local. All search, indexing is done on your machine*
- *Incremental*: Incremental search for a fast, search-as-you-type experience
- *Chat*
- *Faster answers*: Find answers faster than search
- *Iterative discovery*: Iteratively explore and (re-)discover your notes
- *Assisted creativity*: Smoothly weave across answer retrieval and content generation
- With [[https://github.com/raxod502/straight.el][straight.el]]
- Add below snippet to your ~/.emacs.d/init.el or .emacs config file and execute it.
#+begin_src elisp
;; Khoj Package for Semantic Search
(use-package khoj
:after org
:straight (khoj :type git :host github :repo "debanjum/khoj" :files (:defaults "src/interface/emacs/khoj.el"))
:bind ("C-c s" . 'khoj))
#+end_src
** Interface
*** Search UI
[[/docs/khoj_on_emacs.png]]
- With [[https://github.com/quelpa/quelpa#installation][Quelpa]]
- Ensure [[https://github.com/quelpa/quelpa#installation][Quelpa]], [[https://github.com/quelpa/quelpa-use-package#installation][quelpa-use-package]] are installed
- Add below snippet to your ~/.emacs.d/init.el or .emacs config file and execute it.
#+begin_src elisp
;; Khoj Package
(use-package khoj
:after org
:quelpa (khoj :fetcher url :url "https://raw.githubusercontent.com/debanjum/khoj/master/src/interface/emacs/khoj.el")
:bind ("C-c s" . 'khoj))
#+end_src
*** Chat UI
[[/docs/khoj_chat_on_emacs_0.5.0.png]]
** Usage
1. Open Query Interface on Client
** Setup
- /Make sure [[https://realpython.com/installing-python/][python]] (version 3.10 or lower) and [[https://pip.pypa.io/en/stable/installation/][pip]] are installed on your machine/
- In Emacs: Call ~khoj~ using keybinding ~C-c s~ or ~M-x khoj~
- On Web: Open http://localhost:8000/
- /khoj.el attempts to automatically install, start and configure the khoj server./
If this fails, follow [[https://github.com/debanjum/khoj/tree/master/#Setup][these instructions]] to manually setup the khoj server.
2. Query Incrementally in Natural Language
*** Direct Install
#+begin_src elisp
M-x package-install khoj
#+end_src
e.g "What is the meaning of life?" "What are my life goals?"
*** Minimal Install
Add below snippet to your Emacs config file.
Indexes your org-agenda files, by default.
3. Apply filters to narrow down results further
#+begin_src elisp
;; Install Khoj Package from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj)
#+end_src
Include/Exclude specific words or date range from results by updating query with below query format
- Note: Install ~khoj.el~ from MELPA (instead of MELPA Stable) if you installed the pre-release version of khoj
- That is, use ~:pin melpa~ to install khoj.el in above snippet if khoj server was installed with ~--pre~ flag, i.e ~pip install --pre khoj-assistant~
- Else use ~:pin melpa-stable~ to install khoj.el in above snippet if khoj was installed with ~pip install khoj-assistant~
- This ensures both khoj.el and khoj app are from the same version (git tagged or latest)
e.g `What is the meaning of life? -god +none dt:"last week"`
*** Standard Install
Add below snippet to your Emacs config file.
Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Chat
#+begin_src elisp
;; Install Khoj Package from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj)
:config (setq khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")
khoj-openai-api-key "YOUR_OPENAI_API_KEY")) ; required to enable chat
#+end_src
*** With [[https://github.com/raxod502/straight.el][Straight.el]]
Add below snippet to your Emacs config file.
Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Chat
#+begin_src elisp
;; Install Khoj Package using Straight.el
(use-package khoj
:after org
:straight (khoj :type git :host github :repo "debanjum/khoj" :files (:defaults "src/interface/emacs/khoj.el"))
:bind ("C-c s" . 'khoj)
:config (setq khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")
khoj-openai-api-key "YOUR_OPENAI_API_KEY" ; required to enable chat)
#+end_src
** Use
*** Search
1. Hit ~C-c s s~ (or ~M-x khoj RET s~) to open khoj search
2. Enter your query in natural language
e.g "What is the meaning of life?", "My life goals for 2023"
*** Chat
1. Hit ~C-c s c~ (or ~M-x khoj RET c~) to open khoj chat
2. Ask questions in a natural, conversational style
E.g "When did I file my taxes last year?"
See [[https://github.com/debanjum/khoj/tree/master/#Khoj-Chat][Khoj Chat]] for more details
*** Find Similar Entries
This feature finds entries similar to the one you are currently on.
1. Move cursor to the org-mode entry, markdown section or text paragraph you want to find similar entries for
2. Hit ~C-c s f~ (or ~M-x khoj RET f~) to find similar entries
*** Advanced Usage
- Add [[https://github.com/debanjum/khoj/#query-filters][query filters]] during search to narrow down results further
e.g `What is the meaning of life? -"god" +"none" dt>"last week"`
- Use ~C-c C-o 2~ to open the current result at cursor in its source org file
- This calls ~M-x org-open-at-point~ on the current entry and opens the second link in the entry.
- The second link is the entries [[https://orgmode.org/manual/Handling-Links.html#FOOT28][org-id]], if set, or the heading text.
The first link is the line number of the entry in the source file. This link is less robust to file changes.
- Note: If you have [[https://orgmode.org/manual/Speed-Keys.html][speed keys]] enabled, ~o 2~ will also work
*** Khoj Menu
[[/docs/khoj_emacs_menu.png]]
Hit ~C-c s~ (or ~M-x khoj~) to open the khoj menu above. Then:
- Hit ~t~ until you preferred content type is selected in the khoj menu
~Content Type~ specifies the content to perform ~Search~, ~Update~ or ~Find Similar~ actions on
- Hit ~n~ twice and then enter number of results you want to see
~Results Count~ is used by the ~Search~ and ~Find Similar~ actions
- Hit ~-f u~ to ~force~ update the khoj content index
The ~Force Update~ switch is only used by the ~Update~ action
** Upgrade
*** Upgrade Khoj Backend
#+begin_src shell
pip install --upgrade khoj-assistant
#+end_src
*** Upgrade Khoj.el
Use your Emacs package manager to upgrade ~khoj.el~
- For ~khoj.el~ from MELPA
- Method 1
- Run ~M-x package-list-packages~ to list all packages
- Press ~U~ on ~khoj~ to mark it for upgrade
- Press ~x~ to execute the marked actions
- Method 2
- Run ~M-x package-refresh-content~
- Run ~M-x package-reinstall khoj~
- For ~khoj.el~ from Straight
- Run ~M-x straight-pull-package khoj~

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,252 @@
;;; khoj-tests.el --- Test suite for khoj.el -*- lexical-binding: t -*-
;; Copyright (C) 2023 Debanjum Singh Solanky
;; Author: Debanjum Singh Solanky <debanjum@gmail.com>
;; Version: 0.0.0
;; Package-Requires: ((emacs "27.1") (transient "0.3.0") (dash "2.19.1") (org "9.0.0"))
;; URL: https://github.com/debanjum/khoj/tree/master/src/interface/emacs
;;; License:
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License
;; as published by the Free Software Foundation; either version 3
;; of the License, or (at your option) any later version.
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.
;;; Commentary:
;; This file contains the test suite for khoj.el.
;;; Code:
(require 'dash)
(require 'ert)
(require 'khoj)
(require 'org)
;; ----------------------------------------------------
;; Test Extract and Render Entries of each Content Type
;; ----------------------------------------------------
(ert-deftest khoj-tests--extract-entries-as-markdown ()
"Test `json-response', `query' from API formatted as markdown."
(let ((user-query "Become God")
(json-response-from-khoj-backend
(json-read-from-string
"[\
{\
\"entry\": \"## Upgrade\\n\\n Penance to Immortality\",\
\"score\": \"0.376\",\
\"additional\": {\
\"file\": \"/home/ravan/upgrade.md\",\
\"compiled\": \"## Upgrade Penance to Immortality\"\
}\
},\
{\
\"entry\": \"## Act\\n\\n Rule everything\",\
\"score\": \"0.153\",\
\"additional\": {\
\"file\": \"/home/ravan/act.md\",\
\"compiled\": \"## Act Rule everything\"\
}\
}]\
")))
(should
(equal
(khoj--extract-entries-as-markdown json-response-from-khoj-backend user-query)
"\
# Become God\n\
## Upgrade\n\
\n\
Penance to Immortality\n\n\
## Act\n\
\n\
Rule everything\n\n"))))
(ert-deftest khoj-tests--extract-entries-as-org ()
"Test `json-response', `query' from API formatted as org."
(let ((user-query "Become God")
(json-response-from-khoj-backend
(json-read-from-string
"[\
{\
\"entry\": \"** Upgrade\\n\\n Penance to Immortality\\n\",\
\"score\": \"0.42\",\
\"additional\": {\
\"file\": \"/home/ravan/upgrade.md\",\
\"compiled\": \"** Upgrade Penance to Immortality\"\
}\
},\
{\
\"entry\": \"** Act\\n\\n Rule everything\\n\",\
\"score\": \"0.42\",\
\"additional\": {\
\"file\": \"/home/ravan/act.md\",\
\"compiled\": \"** Act Rule everything\"\
}\
}]\
")))
(should
(equal
(khoj--extract-entries-as-org json-response-from-khoj-backend user-query)
"\
* Become God\n\
** Upgrade\n\
\n\
Penance to Immortality\n\
** Act\n\
\n\
Rule everything\n\
\n"))))
(ert-deftest khoj-tests--extract-entries-as-ledger ()
"Test `json-response', `query' from API formatted as beancount ledger."
(let ((user-query "Become God")
(json-response-from-khoj-backend
(json-read-from-string
"[\
{\
\"entry\": \"4242-04-01 * \\\"Penance Center\\\" \\\"Book Stay for 10,000 Years\\\"\\n Expenses:Health:Mental 15 GOLD\\n Assets:Commodities:Gold\",\
\"score\": \"0.42\",\
\"additional\": {\
\"file\": \"/home/ravan/ledger.beancount\",\
\"compiled\": \"4242-04-01 * \\\"Penance Center\\\" \\\"Book Stay for 10,000 Years\\\" Expenses:Health:Mental 15 GOLD Assets:Commodities:Gold\"\
}\
},\
{\
\"entry\": \"14242-04-01 * \\\"Brahma\\\" \\\"Boon for Invincibility from Higher Beings\\\"\\n Income:Health -1,00,00,000 LIFE\\n Assets:Commodities:Life\",\
\"score\": \"0.42\",\
\"additional\": {\
\"file\": \"/home/ravan/ledger.beancount\",\
\"compiled\": \"4242-04-01 * \\\"Brahma\\\" \\\"Boon for Invincibility from Higher Beings\\\" Income:Health -1,00,00,000 LIFE Assets:Commodities:Life\"\
}\
}]\
")))
(should
(equal
(khoj--extract-entries-as-ledger json-response-from-khoj-backend user-query)
";; Become God\n\
\n\
4242-04-01 * \"Penance Center\" \"Book Stay for 10,000 Years\"\n\
Expenses:Health:Mental 15 GOLD\n\
Assets:Commodities:Gold\n\
\n\
14242-04-01 * \"Brahma\" \"Boon for Invincibility from Higher Beings\"\n\
Income:Health -1,00,00,000 LIFE\n\
Assets:Commodities:Life\n\
\n\
\n\
"))))
;; -------------------------------------
;; Test Helpers for Find Similar Feature
;; -------------------------------------
(ert-deftest khoj-tests--get-current-outline-entry-text ()
"Test get current outline-mode entry text'."
(with-temp-buffer
(insert "\
* Become God\n\
** Upgrade\n\
\n\
Penance to Immortality\n\
** Act\n\
\n\
Rule everything\\n")
(goto-char (point-min))
;; Test getting current entry text from cursor at start of outline heading
(outline-next-visible-heading 1)
(should
(equal
(khoj--get-current-outline-entry-text)
"\
** Upgrade\n\
\n\
Penance to Immortality"))
;; Test getting current entry text from cursor within outline entry
(forward-line)
(should
(equal
(khoj--get-current-outline-entry-text)
"\
** Upgrade\n\
\n\
Penance to Immortality"))
))
(ert-deftest khoj-tests--get-current-paragraph-text ()
"Test get current paragraph text'."
(with-temp-buffer
(insert "\
* Become God\n\
** Upgrade\n\
\n\
Penance to Immortality\n\
** Act\n\
\n\
Rule everything\n")
;; Test getting current paragraph text from cursor at start of buffer
(goto-char (point-min))
(should
(equal
(khoj--get-current-paragraph-text)
"* Become God\n\
** Upgrade"))
;; Test getting current paragraph text from cursor within paragraph
(goto-char (point-min))
(forward-line 1)
(should
(equal
(khoj--get-current-paragraph-text)
"* Become God\n\
** Upgrade"))
;; Test getting current paragraph text from cursor at paragraph end
(goto-char (point-min))
(forward-line 2)
(should
(equal
(khoj--get-current-paragraph-text)
"* Become God\n\
** Upgrade"))
;; Test getting current paragraph text from cursor at start of middle paragraph
(goto-char (point-min))
(forward-line 3)
(should
(equal
(khoj--get-current-paragraph-text)
"Penance to Immortality\n\
** Act"))
;; Test getting current paragraph text from cursor at end of buffer
(goto-char (point-max))
(should
(equal
(khoj--get-current-paragraph-text)
"Rule everything"))
))
(provide 'khoj-tests)
;;; khoj-tests.el ends here

View File

@@ -0,0 +1,10 @@
# top-most EditorConfig file
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = tab
indent_size = 4
tab_width = 4

View File

@@ -0,0 +1,2 @@
npm node_modules
build

View File

@@ -0,0 +1,23 @@
{
"root": true,
"parser": "@typescript-eslint/parser",
"env": { "node": true },
"plugins": [
"@typescript-eslint"
],
"extends": [
"eslint:recommended",
"plugin:@typescript-eslint/eslint-recommended",
"plugin:@typescript-eslint/recommended"
],
"parserOptions": {
"sourceType": "module"
},
"rules": {
"no-unused-vars": "off",
"@typescript-eslint/no-unused-vars": ["error", { "args": "none" }],
"@typescript-eslint/ban-ts-comment": "off",
"no-prototype-builtins": "off",
"@typescript-eslint/no-empty-function": "off"
}
}

26
src/interface/obsidian/.gitignore vendored Normal file
View File

@@ -0,0 +1,26 @@
# vscode
.vscode
# Intellij
*.iml
.idea
# npm
node_modules
# Don't include the compiled main.js file in the repo.
# They should be uploaded to GitHub releases instead.
main.js
# Exclude sourcemaps
*.map
# obsidian
data.json
# Exclude macOS Finder (System Explorer) View States
.DS_Store
# Exclude system files
.emacs.desktop
.emacs.desktop*

View File

@@ -0,0 +1 @@
tag-version-prefix=""

View File

@@ -0,0 +1,621 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS

View File

@@ -0,0 +1,158 @@
# Khoj Obsidian 🦅
> Natural language search for your Obsidian notes using [Khoj](https://github.com/debanjum/khoj)
## Table of Contents
- [Features](#Features)
- [Demo](#Demo)
- [Search Demo](#Search-Demo)
- [Interfaces](#Interfaces)
- [Search Modal](#Search-Modal)
- [Chat Modal](#Chat-Modal)
- [Setup](#Setup)
- [Setup Backend](#1-Setup-Backend)
- [Setup Plugin](#2-Setup-Plugin)
- [Use](#Use)
- [Search](#search)
- [Chat](#chat)
- [Find Similar Notes](#find-similar-notes)
- [Upgrade](#Upgrade)
- [Upgrade Backend](#1-Upgrade-Backend)
- [Upgrade Plugin](#2-Upgrade-Plugin)
- [Troubleshoot](#Troubleshoot)
- [Visualize Codebase](#Visualize-Codebase)
- [Implementation](#Implementation)
## Features
- **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster and with less effort than search
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
## Demo
### Search Demo
https://user-images.githubusercontent.com/6413477/210486007-36ee3407-e6aa-4185-8a26-b0bfc0a4344f.mp4
<details><summary>Description</summary>
1. Install Khoj via `pip` and start Khoj backend in non-gui mode
2. Install Khoj plugin via Community Plugins settings pane on Obsidian app
3. Check the new Khoj plugin settings
4. Wait for Khoj backend to index markdown files in the current Vault
5. Open Khoj plugin on Obsidian via Search button on Left Pane
6. Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs](https://marcus.se.net/obsidian-plugin-docs/)
7. Jump to the [search result](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin)
</details>
## Interfaces
### Search Modal
![](https://github.com/debanjum/khoj/blob/master/src/interface/obsidian/docs/khoj_on_obsidian_0.2.5.png?)
### Chat Modal
![](https://github.com/debanjum/khoj/blob/master/src/interface/obsidian/docs/khoj_chat_on_obsidian_0.6.0.png?)
## Setup
- *Make sure [python](https://realpython.com/installing-python/) (version 3.10 or lower) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
- *Ensure you follow the ordering of the setup steps. Install the plugin after starting the khoj backend. This allows the plugin to configure the khoj backend*
### 1. Setup Backend
Open terminal/cmd and run below command to install and start the khoj backend
- On Linux/MacOS
```shell
python -m pip install khoj-assistant && khoj --no-gui
```
- On Windows
```shell
py -m pip install khoj-assistant && khoj --no-gui
```
### 2. Setup Plugin
1. Open [Khoj](https://obsidian.md/plugins?id=khoj) from the *Community plugins* tab in Obsidian settings panel
2. Click *Install*, then *Enable* on the Khoj plugin page in Obsidian
3. [Optional] To enable Khoj Chat, set your [OpenAI API key](https://platform.openai.com/account/api-keys) in the Khoj plugin settings
See [official Obsidian plugin docs](https://help.obsidian.md/Extending+Obsidian/Community+plugins) for details
## Use
### Chat
Run *Khoj: Chat* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) and ask questions in a natural, conversational style.<br />
E.g "When did I file my taxes last year?"
Notes:
- *Using Khoj Chat will result in query relevant notes being shared with OpenAI for ChatGPT to respond.*
- *To use Khoj Chat, ensure you've set your [OpenAI API key](https://platform.openai.com/account/api-keys) in the Khoj plugin settings.*
See [Khoj Chat](https://github.com/debanjum/khoj/tree/master/#Khoj-Chat) for more details
![](https://github.com/debanjum/khoj/blob/master/src/interface/obsidian/docs/khoj_chat_on_obsidian_0.6.0.png?)
### Search
Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or run *Khoj: Search* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
*Note: Ensure the khoj server is running in the background before searching. Execute `khoj --no-gui` in your terminal if it is not already running*
https://user-images.githubusercontent.com/6413477/218801155-cd67e8b4-a770-404a-8179-d6b61caa0f93.mp4
<details><summary>Query Filters</summary>
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
- Entries that contain term_to_include: `+"term_to_include"`
- Entries that contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
- Entries from April 1st 1984: `dt:"1984-04-01"`
- Entries after March 31st 1984: `dt>="1984-04-01"`
- Entries before April 2nd 1984 : `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
- Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
- `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
- Adds all filters to the natural language query. It should return entries
- from the file *1984.org*
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
</details>
### Find Similar Notes
To see other notes similar to the current one, run *Khoj: Find Similar Notes* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
## Upgrade
### 1. Upgrade Backend
```shell
pip install --upgrade khoj-assistant
```
### 2. Upgrade Plugin
1. Open *Community plugins* tab in Obsidian settings
2. Click the *Check for updates* button
3. Click the *Update* button next to Khoj, if available
## Troubleshooting
- Open the Khoj plugin settings pane, to configure Khoj
- Toggle Enable/Disable Khoj, if setting changes have not applied
- Click *Update* button to force index to refresh, if results are failing or stale
## Current Limitations
- The plugin loads the index of only one vault at a time.<br/>
So notes across multiple vaults **cannot** be searched at the same time
## Visualize Codebase
<img src="https://github.com/debanjum/khoj/blob/master/src/interface/obsidian/docs/khoj_obsidian_codebase_visualization_0.2.1.png" width="700" />
## Implementation
The plugin implements the following functionality to search your notes with Khoj:
- [X] Open the Khoj search modal via left ribbon icon or the *Khoj: Search* command
- [X] Render results as Markdown preview to improve readability
- [X] Configure Khoj via the plugin setting tab on the settings page
- Set Obsidian Vault to Index with Khoj. Defaults to all markdown files in current Vault
- Set URL of Khoj backend
- Set Number of Search Results to show in Search Modal
- [X] Allow reranking of result to improve search quality
- [X] Allow Finding notes similar to current note being viewed

Binary file not shown.

After

Width:  |  Height:  |  Size: 277 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 333 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 378 KiB

View File

@@ -0,0 +1,42 @@
import esbuild from "esbuild";
import process from "process";
import builtins from 'builtin-modules'
const banner =
`/*
THIS IS A GENERATED/BUNDLED FILE BY ESBUILD
if you want to view the source, please visit the github repository of this plugin
*/
`;
const prod = (process.argv[2] === 'production');
esbuild.build({
banner: {
js: banner,
},
entryPoints: ['src/main.ts'],
bundle: true,
external: [
'obsidian',
'electron',
'@codemirror/autocomplete',
'@codemirror/collab',
'@codemirror/commands',
'@codemirror/language',
'@codemirror/lint',
'@codemirror/search',
'@codemirror/state',
'@codemirror/view',
'@lezer/common',
'@lezer/highlight',
'@lezer/lr',
...builtins],
format: 'cjs',
watch: !prod,
target: 'es2018',
logLevel: "info",
sourcemap: prod ? false : 'inline',
treeShaking: true,
outfile: 'main.js',
}).catch(() => process.exit(1));

View File

@@ -0,0 +1,10 @@
{
"id": "khoj",
"name": "Khoj",
"version": "0.6.2",
"minAppVersion": "0.15.0",
"description": "A Search Assistant for your Second Brain 🦅",
"author": "Debanjum Singh Solanky",
"authorUrl": "https://github.com/debanjum",
"isDesktopOnly": false
}

View File

@@ -0,0 +1,24 @@
{
"name": "Khoj",
"version": "0.6.2",
"description": "Natural, Incremental Search for your Second Brain 🦅",
"main": "src/main.js",
"scripts": {
"dev": "node esbuild.config.mjs",
"build": "tsc -noEmit -skipLibCheck && node esbuild.config.mjs production",
"version": "node version-bump.mjs && git add manifest.json versions.json"
},
"keywords": ["search"],
"author": "Debanjum Singh Solanky",
"license": "GPL-3.0-or-later",
"devDependencies": {
"@types/node": "^16.11.6",
"@typescript-eslint/eslint-plugin": "5.29.0",
"@typescript-eslint/parser": "5.29.0",
"builtin-modules": "3.3.0",
"esbuild": "0.14.47",
"obsidian": "latest",
"tslib": "2.4.0",
"typescript": "4.7.4"
}
}

View File

@@ -0,0 +1,130 @@
import { App, Modal, request, Setting } from 'obsidian';
import { KhojSetting } from 'src/settings';
export class KhojChatModal extends Modal {
result: string;
setting: KhojSetting;
constructor(app: App, setting: KhojSetting) {
super(app);
this.setting = setting;
// Register Modal Keybindings to send user message
this.scope.register([], 'Enter', async () => {
// Get text in chat input elmenet
let input_el = <HTMLInputElement>this.contentEl.getElementsByClassName("khoj-chat-input")[0];
// Clear text after extracting message to send
let user_message = input_el.value;
input_el.value = "";
// Get and render chat response to user message
await this.getChatResponse(user_message);
});
}
async onOpen() {
let { contentEl } = this;
contentEl.addClass("khoj-chat");
// Add title to the Khoj Chat modal
contentEl.createEl("h1", ({ attr: { id: "khoj-chat-title" }, text: "Khoj Chat" }));
// Create area for chat logs
contentEl.createDiv({ attr: { id: "khoj-chat-body", class: "khoj-chat-body" } });
// Get conversation history from Khoj backend
let chatUrl = `${this.setting.khojUrl}/api/chat?`;
let response = await request(chatUrl);
let chatLogs = JSON.parse(response).response;
chatLogs.forEach((chatLog: any) => {
this.renderMessageWithReferences(chatLog.message, chatLog.by, chatLog.context, new Date(chatLog.created));
});
// Add chat input field
contentEl.createEl("input",
{
attr: {
type: "text",
id: "khoj-chat-input",
autofocus: "autofocus",
placeholder: "Chat with Khoj 🦅 [Hit Enter to send message]",
class: "khoj-chat-input option"
}
})
.addEventListener('change', (event) => { this.result = (<HTMLInputElement>event.target).value });
// Scroll to bottom of modal, till the send message input box
this.modalEl.scrollTop = this.modalEl.scrollHeight;
}
generateReference(messageEl: any, reference: string, index: number) {
// Generate HTML for Chat Reference
// `<sup><abbr title="${escaped_ref}" tabindex="0">${index}</abbr></sup>`;
let escaped_ref = reference.replace(/"/g, "&quot;")
return messageEl.createEl("sup").createEl("abbr", {
attr: {
title: escaped_ref,
tabindex: "0",
},
text: `[${index}] `,
});
}
renderMessageWithReferences(message: string, sender: string, context?: [string], dt?: Date) {
let messageEl = this.renderMessage(message, sender, dt);
if (context && !!messageEl) {
context.map((reference, index) => this.generateReference(messageEl, reference, index + 1));
}
}
renderMessage(message: string, sender: string, dt?: Date): Element | null {
let message_time = this.formatDate(dt ?? new Date());
let emojified_sender = sender == "khoj" ? "🦅 Khoj" : "🤔 You";
// Append message to conversation history HTML element.
// The chat logs should display above the message input box to follow standard UI semantics
let chat_body_el = this.contentEl.getElementsByClassName("khoj-chat-body")[0];
let chat_message_el = chat_body_el.createDiv({
attr: {
"data-meta": `${emojified_sender} at ${message_time}`,
class: `khoj-chat-message ${sender}`
},
}).createDiv({
attr: {
class: `khoj-chat-message-text ${sender}`
},
text: `${message}`
})
// Scroll to bottom after inserting chat messages
this.modalEl.scrollTop = this.modalEl.scrollHeight;
return chat_message_el
}
formatDate(date: Date): string {
// Format date in HH:MM, DD MMM YYYY format
let time_string = date.toLocaleTimeString('en-IN', { hour: '2-digit', minute: '2-digit', hour12: false });
let date_string = date.toLocaleString('en-IN', { year: 'numeric', month: 'short', day: '2-digit' }).replace(/-/g, ' ');
return `${time_string}, ${date_string}`;
}
async getChatResponse(query: string | undefined | null): Promise<void> {
// Exit if query is empty
if (!query || query === "") return;
// Render user query as chat message
this.renderMessage(query, "you");
// Get chat response from Khoj backend
let encodedQuery = encodeURIComponent(query);
let chatUrl = `${this.setting.khojUrl}/api/chat?q=${encodedQuery}`;
let response = await request(chatUrl);
let data = JSON.parse(response);
// Render Khoj response as chat message
this.renderMessage(data.response, "khoj");
}
}

View File

@@ -0,0 +1,75 @@
import { Notice, Plugin } from 'obsidian';
import { KhojSetting, KhojSettingTab, DEFAULT_SETTINGS } from 'src/settings'
import { KhojSearchModal } from 'src/search_modal'
import { KhojChatModal } from 'src/chat_modal'
import { configureKhojBackend } from './utils';
export default class Khoj extends Plugin {
settings: KhojSetting;
async onload() {
await this.loadSettings();
// Add search command. It can be triggered from anywhere
this.addCommand({
id: 'search',
name: 'Search',
checkCallback: (checking) => {
if (!checking && this.settings.connectedToBackend)
new KhojSearchModal(this.app, this.settings).open();
return this.settings.connectedToBackend;
}
});
// Add similar notes command. It can only be triggered from the editor
this.addCommand({
id: 'similar',
name: 'Find similar notes',
editorCheckCallback: (checking) => {
if (!checking && this.settings.connectedToBackend)
new KhojSearchModal(this.app, this.settings, true).open();
return this.settings.connectedToBackend;
}
});
// Add chat command. It can be triggered from anywhere
this.addCommand({
id: 'chat',
name: 'Chat',
checkCallback: (checking) => {
if (!checking && this.settings.connectedToBackend && !!this.settings.openaiApiKey)
new KhojChatModal(this.app, this.settings).open();
return !!this.settings.openaiApiKey;
}
});
// Create an icon in the left ribbon.
this.addRibbonIcon('search', 'Khoj', (_: MouseEvent) => {
// Called when the user clicks the icon.
this.settings.connectedToBackend
? new KhojSearchModal(this.app, this.settings).open()
: new Notice(`Ensure Khoj backend is running and Khoj URL is pointing to it in the plugin settings`);
});
// Add a settings tab so the user can configure khoj
this.addSettingTab(new KhojSettingTab(this.app, this));
}
async loadSettings() {
// Load khoj obsidian plugin settings
this.settings = Object.assign({}, DEFAULT_SETTINGS, await this.loadData());
if (this.settings.autoConfigure) {
// Load, configure khoj server settings
await configureKhojBackend(this.app.vault, this.settings);
}
}
async saveSettings() {
if (this.settings.autoConfigure) {
await configureKhojBackend(this.app.vault, this.settings, false);
}
this.saveData(this.settings);
}
}

View File

@@ -0,0 +1,147 @@
import { App, SuggestModal, request, MarkdownRenderer, Instruction, Platform } from 'obsidian';
import { KhojSetting } from 'src/settings';
import { createNoteAndCloseModal } from 'src/utils';
export interface SearchResult {
entry: string;
file: string;
}
export class KhojSearchModal extends SuggestModal<SearchResult> {
setting: KhojSetting;
rerank: boolean = false;
find_similar_notes: boolean;
query: string = "";
app: App;
constructor(app: App, setting: KhojSetting, find_similar_notes: boolean = false) {
super(app);
this.app = app;
this.setting = setting;
this.find_similar_notes = find_similar_notes;
// Hide input element in Similar Notes mode
this.inputEl.hidden = this.find_similar_notes;
// Register Modal Keybindings to Rerank Results
this.scope.register(['Mod'], 'Enter', async () => {
// Re-rank when explicitly triggered by user
this.rerank = true
// Trigger input event to get and render (reranked) results from khoj backend
this.inputEl.dispatchEvent(new Event('input'));
// Rerank disabled by default to satisfy latency requirements for incremental search
this.rerank = false
});
// Register Modal Keybindings to Create New Note with Query as Title
this.scope.register(['Shift'], 'Enter', async () => {
if (this.query != "") createNoteAndCloseModal(this.query, this);
});
this.scope.register(['Ctrl', 'Shift'], 'Enter', async () => {
if (this.query != "") createNoteAndCloseModal(this.query, this, { newLeaf: true });
});
// Add Hints to Modal for available Keybindings
const modalInstructions: Instruction[] = [
{
command: '↑↓',
purpose: 'to navigate',
},
{
command: '↵',
purpose: 'to open',
},
{
command: Platform.isMacOS ? 'cmd ↵' : 'ctrl ↵',
purpose: 'to rerank',
},
{
command: 'esc',
purpose: 'to dismiss',
},
]
this.setInstructions(modalInstructions);
// Set Placeholder Text for Modal
this.setPlaceholder('Search with Khoj 🦅...');
}
async onOpen() {
if (this.find_similar_notes) {
// If markdown file is currently active
let file = this.app.workspace.getActiveFile();
if (file && file.extension === 'md') {
// Enable rerank of search results
this.rerank = true
// Set input element to contents of active markdown file
// truncate to first 8,000 characters to avoid hitting query size limits
this.inputEl.value = await this.app.vault.read(file).then(file_str => file_str.slice(0, 8000));
// Trigger search to get and render similar notes from khoj backend
this.inputEl.dispatchEvent(new Event('input'));
this.rerank = false
}
else {
this.resultContainerEl.setText('Cannot find similar notes for non-markdown files');
}
}
}
async getSuggestions(query: string): Promise<SearchResult[]> {
// Query Khoj backend for search results
let encodedQuery = encodeURIComponent(query);
let searchUrl = `${this.setting.khojUrl}/api/search?q=${encodedQuery}&n=${this.setting.resultsCount}&r=${this.rerank}&t=markdown`;
let response = await request(searchUrl);
let data = JSON.parse(response);
let results = data
.filter((result: any) => !this.find_similar_notes || !result.additional.file.endsWith(this.app.workspace.getActiveFile()?.path))
.map((result: any) => { return { entry: result.entry, file: result.additional.file } as SearchResult; });
this.query = query;
return results;
}
async renderSuggestion(result: SearchResult, el: HTMLElement) {
// Max number of lines to render
let lines_to_render = 8;
// Extract filename of result
let os_path_separator = result.file.includes('\\') ? '\\' : '/';
let filename = result.file.split(os_path_separator).pop();
// Remove YAML frontmatter when rendering string
result.entry = result.entry.replace(/---[\n\r][\s\S]*---[\n\r]/, '');
// Truncate search results to lines_to_render
let entry_snipped_indicator = result.entry.split('\n').length > lines_to_render ? ' **...**' : '';
let snipped_entry = result.entry.split('\n').slice(0, lines_to_render).join('\n');
// Show filename of each search result for context
el.createEl("div",{ cls: 'khoj-result-file' }).setText(filename ?? "");
let result_el = el.createEl("div", { cls: 'khoj-result-entry' })
// @ts-ignore
MarkdownRenderer.renderMarkdown(snipped_entry + entry_snipped_indicator, result_el, null, null);
}
async onChooseSuggestion(result: SearchResult, _: MouseEvent | KeyboardEvent) {
// Get all markdown files in vault
const mdFiles = this.app.vault.getMarkdownFiles();
// Find the vault file matching file of chosen search result
let file_match = mdFiles
// Sort by descending length of path
// This finds longest path match when multiple files have same name
.sort((a, b) => b.path.length - a.path.length)
// The first match is the best file match across OS
// e.g Khoj server on Linux, Obsidian vault on Android
.find(file => result.file.replace(/\\/g, "/").endsWith(file.path))
// Open vault file at heading of chosen search result
if (file_match) {
let resultHeading = result.entry.split('\n', 1)[0];
let linkToEntry = `${file_match.path}${resultHeading}`
this.app.workspace.openLinkText(linkToEntry, '');
console.log(`Link: ${linkToEntry}, File: ${file_match.path}, Heading: ${resultHeading}`);
}
}
}

View File

@@ -0,0 +1,127 @@
import { App, Notice, PluginSettingTab, request, Setting } from 'obsidian';
import Khoj from 'src/main';
export interface KhojSetting {
openaiApiKey: string;
resultsCount: number;
khojUrl: string;
connectedToBackend: boolean;
autoConfigure: boolean;
}
export const DEFAULT_SETTINGS: KhojSetting = {
resultsCount: 6,
khojUrl: 'http://localhost:8000',
connectedToBackend: false,
autoConfigure: true,
openaiApiKey: '',
}
export class KhojSettingTab extends PluginSettingTab {
plugin: Khoj;
constructor(app: App, plugin: Khoj) {
super(app, plugin);
this.plugin = plugin;
}
display(): void {
const { containerEl } = this;
containerEl.empty();
// Add notice whether able to connect to khoj backend or not
containerEl.createEl('small', { text: this.getBackendStatusMessage() });
// Add khoj settings configurable from the plugin settings tab
new Setting(containerEl)
.setName('Khoj URL')
.setDesc('The URL of the Khoj backend')
.addText(text => text
.setValue(`${this.plugin.settings.khojUrl}`)
.onChange(async (value) => {
this.plugin.settings.khojUrl = value.trim();
await this.plugin.saveSettings();
containerEl.firstElementChild?.setText(this.getBackendStatusMessage());
}));
new Setting(containerEl)
.setName('OpenAI API Key')
.setDesc('Your OpenAI API Key for Khoj Chat')
.addText(text => text
.setValue(`${this.plugin.settings.openaiApiKey}`)
.onChange(async (value) => {
this.plugin.settings.openaiApiKey = value.trim();
await this.plugin.saveSettings();
}));
new Setting(containerEl)
.setName('Results Count')
.setDesc('The number of search results to show')
.addSlider(slider => slider
.setLimits(1, 10, 1)
.setValue(this.plugin.settings.resultsCount)
.setDynamicTooltip()
.onChange(async (value) => {
this.plugin.settings.resultsCount = value;
await this.plugin.saveSettings();
}));
new Setting(containerEl)
.setName('Auto Configure')
.setDesc('Automatically configure the Khoj backend')
.addToggle(toggle => toggle
.setValue(this.plugin.settings.autoConfigure)
.onChange(async (value) => {
this.plugin.settings.autoConfigure = value;
await this.plugin.saveSettings();
}));
let indexVaultSetting = new Setting(containerEl);
indexVaultSetting
.setName('Index Vault')
.setDesc('Manually force Khoj to re-index your Obsidian Vault')
.addButton(button => button
.setButtonText('Update')
.setCta()
.onClick(async () => {
// Disable button while updating index
button.setButtonText('Updating 🌑');
button.removeCta();
indexVaultSetting = indexVaultSetting.setDisabled(true);
// Show indicator for indexing in progress
const progress_indicator = window.setInterval(() => {
if (button.buttonEl.innerText === 'Updating 🌑') {
button.setButtonText('Updating 🌘');
} else if (button.buttonEl.innerText === 'Updating 🌘') {
button.setButtonText('Updating 🌗');
} else if (button.buttonEl.innerText === 'Updating 🌗') {
button.setButtonText('Updating 🌖');
} else if (button.buttonEl.innerText === 'Updating 🌖') {
button.setButtonText('Updating 🌕');
} else if (button.buttonEl.innerText === 'Updating 🌕') {
button.setButtonText('Updating 🌔');
} else if (button.buttonEl.innerText === 'Updating 🌔') {
button.setButtonText('Updating 🌓');
} else if (button.buttonEl.innerText === 'Updating 🌓') {
button.setButtonText('Updating 🌒');
} else if (button.buttonEl.innerText === 'Updating 🌒') {
button.setButtonText('Updating 🌑');
}
}, 300);
this.plugin.registerInterval(progress_indicator);
await request(`${this.plugin.settings.khojUrl}/api/update?t=markdown&force=true`);
new Notice('✅ Updated Khoj index.');
// Reset button once index is updated
window.clearInterval(progress_indicator);
button.setButtonText('Update');
button.setCta();
indexVaultSetting = indexVaultSetting.setDisabled(false);
})
);
}
getBackendStatusMessage() {
return !this.plugin.settings.connectedToBackend
? '❗Disconnected from Khoj backend. Ensure Khoj backend is running and Khoj URL is correctly set below.'
: '✅ Connected to Khoj backend.';
}
}

View File

@@ -0,0 +1,175 @@
import { FileSystemAdapter, Notice, RequestUrlParam, request, Vault, Modal } from 'obsidian';
import { KhojSetting } from 'src/settings'
export function getVaultAbsolutePath(vault: Vault): string {
let adaptor = vault.adapter;
if (adaptor instanceof FileSystemAdapter) {
return adaptor.getBasePath();
}
return '';
}
export async function configureKhojBackend(vault: Vault, setting: KhojSetting, notify: boolean = true) {
let vaultPath = getVaultAbsolutePath(vault);
let mdInVault = `${vaultPath}/**/*.md`;
let khojConfigUrl = `${setting.khojUrl}/api/config/data`;
// Check if khoj backend is configured, note if cannot connect to backend
let khoj_already_configured = await request(khojConfigUrl)
.then(response => {
setting.connectedToBackend = true;
return response !== "null"
})
.catch(error => {
setting.connectedToBackend = false;
if (notify)
new Notice(`Ensure Khoj backend is running and Khoj URL is pointing to it in the plugin settings.\n\n${error}`);
})
// Short-circuit configuring khoj if unable to connect to khoj backend
if (!setting.connectedToBackend) return;
// Set index name from the path of the current vault
let indexName = vaultPath.replace(/\//g, '_').replace(/\\/g, '_').replace(/ /g, '_').replace(/:/g, '_');
// Get default config fields from khoj backend
let defaultConfig = await request(`${khojConfigUrl}/default`).then(response => JSON.parse(response));
let khojDefaultIndexDirectory = getIndexDirectoryFromBackendConfig(defaultConfig["content-type"]["markdown"]["embeddings-file"]);
let khojDefaultChatDirectory = getIndexDirectoryFromBackendConfig(defaultConfig["processor"]["conversation"]["conversation-logfile"]);
let khojDefaultChatModelName = defaultConfig["processor"]["conversation"]["model"];
// Get current config if khoj backend configured, else get default config from khoj backend
await request(khoj_already_configured ? khojConfigUrl : `${khojConfigUrl}/default`)
.then(response => JSON.parse(response))
.then(data => {
// If khoj backend not configured yet
if (!khoj_already_configured) {
// Create khoj content-type config with only markdown configured
data["content-type"] = {
"markdown": {
"input-filter": [mdInVault],
"input-files": null,
"embeddings-file": `${khojDefaultIndexDirectory}/${indexName}.pt`,
"compressed-jsonl": `${khojDefaultIndexDirectory}/${indexName}.jsonl.gz`,
}
}
}
// Else if khoj config has no markdown content config
else if (!data["content-type"]["markdown"]) {
// Add markdown config to khoj content-type config
// Set markdown config to index markdown files in configured obsidian vault
data["content-type"]["markdown"] = {
"input-filter": [mdInVault],
"input-files": null,
"embeddings-file": `${khojDefaultIndexDirectory}/${indexName}.pt`,
"compressed-jsonl": `${khojDefaultIndexDirectory}/${indexName}.jsonl.gz`,
}
}
// Else if khoj is not configured to index markdown files in configured obsidian vault
else if (data["content-type"]["markdown"]["input-filter"].length != 1 ||
data["content-type"]["markdown"]["input-filter"][0] !== mdInVault) {
// Update markdown config in khoj content-type config
// Set markdown config to only index markdown files in configured obsidian vault
let khojIndexDirectory = getIndexDirectoryFromBackendConfig(data["content-type"]["markdown"]["embeddings-file"]);
data["content-type"]["markdown"] = {
"input-filter": [mdInVault],
"input-files": null,
"embeddings-file": `${khojIndexDirectory}/${indexName}.pt`,
"compressed-jsonl": `${khojIndexDirectory}/${indexName}.jsonl.gz`,
}
}
// If OpenAI API key not set in Khoj plugin settings
if (!setting.openaiApiKey) {
// Disable khoj processors, as not required
delete data["processor"];
}
// Else if khoj backend not configured yet
else if (!khoj_already_configured || !data["processor"]) {
data["processor"] = {
"conversation": {
"conversation-logfile": `${khojDefaultChatDirectory}/conversation.json`,
"model": khojDefaultChatModelName,
"openai-api-key": setting.openaiApiKey,
}
}
}
// Else if khoj config has no conversation processor config
else if (!data["processor"]["conversation"]) {
data["processor"]["conversation"] = {
"conversation-logfile": `${khojDefaultChatDirectory}/conversation.json`,
"model": khojDefaultChatModelName,
"openai-api-key": setting.openaiApiKey,
}
}
// Else if khoj is not configured with OpenAI API key from khoj plugin settings
else if (data["processor"]["conversation"]["openai-api-key"] !== setting.openaiApiKey) {
data["processor"]["conversation"] = {
"conversation-logfile": data["processor"]["conversation"]["conversation-logfile"],
"model": data["processor"]["conversation"]["model"],
"openai-api-key": setting.openaiApiKey,
}
}
// Save updated config and refresh index on khoj backend
updateKhojBackend(setting.khojUrl, data);
if (!khoj_already_configured)
console.log(`Khoj: Created khoj backend config:\n${JSON.stringify(data)}`)
else
console.log(`Khoj: Updated khoj backend config:\n${JSON.stringify(data)}`)
})
.catch(error => {
if (notify)
new Notice(`Failed to configure Khoj backend. Contact developer on Github.\n\nError: ${error}`);
})
}
export async function updateKhojBackend(khojUrl: string, khojConfig: Object) {
// POST khojConfig to khojConfigUrl
let requestContent: RequestUrlParam = {
url: `${khojUrl}/api/config/data`,
body: JSON.stringify(khojConfig),
method: 'POST',
contentType: 'application/json',
};
// Save khojConfig on khoj backend at khojConfigUrl
await request(requestContent)
// Refresh khoj search index after updating config
.then(_ => request(`${khojUrl}/api/update?t=markdown`));
}
function getIndexDirectoryFromBackendConfig(filepath: string) {
return filepath.split("/").slice(0, -1).join("/");
}
export async function createNote(name: string, newLeaf = false): Promise<void> {
try {
let pathPrefix: string
// @ts-ignore
switch (app.vault.getConfig('newFileLocation')) {
case 'current':
pathPrefix = (app.workspace.getActiveFile()?.parent.path ?? '') + '/'
break
case 'folder':
pathPrefix = this.app.vault.getConfig('newFileFolderPath') + '/'
break
default: // 'root'
pathPrefix = ''
break
}
await app.workspace.openLinkText(`${pathPrefix}${name}.md`, '', newLeaf)
} catch (e) {
console.error('Khoj: Could not create note.\n' + (e as any).message);
throw e
}
}
export async function createNoteAndCloseModal(query: string, modal: Modal, opt?: { newLeaf: boolean }): Promise<void> {
try {
await createNote(query, opt?.newLeaf);
}
catch (e) {
new Notice((e as Error).message)
return
}
modal.close();
}

View File

@@ -0,0 +1,176 @@
/*
This CSS file will be included with your plugin, and
available in the app when your plugin is enabled.
If your plugin does not need CSS, delete this file.
*/
:root {
--khoj-chat-blue: #017eff;
--khoj-chat-dark-grey: #475569;
}
.khoj-chat {
display: grid;
background: var(--background-primary);
color: var(--text-normal);
text-align: center;
font-family: roboto, karma, segoe ui, sans-serif;
font-size: var(--font-ui-large);
font-weight: 300;
line-height: 1.5em;
}
.khoj-chat > * {
padding: 10px;
margin: 10px;
}
#khoj-chat-title {
font-weight: 200;
color: var(--khoj-chat-blue);
}
#khoj-chat-body {
font-size: var(--font-ui-medium);
margin: 0px;
line-height: 20px;
overflow-y: scroll; /* Make chat body scroll to see history */
}
/* add chat metatdata to bottom of bubble */
.khoj-chat-message::after {
content: attr(data-meta);
display: block;
font-size: var(--font-ui-smaller);
color: var(--text-muted);
margin: -12px 7px 0 -5px;
}
/* move message by khoj to left */
.khoj-chat-message.khoj {
margin-left: auto;
text-align: left;
}
/* move message by you to right */
.khoj-chat-message.you {
margin-right: auto;
text-align: right;
}
/* basic style chat message text */
.khoj-chat-message-text {
margin: 10px;
border-radius: 10px;
padding: 10px;
position: relative;
display: inline-block;
max-width: 80%;
text-align: left;
}
/* color chat bubble by khoj blue */
.khoj-chat-message-text.khoj {
color: var(--text-on-accent);
background: var(--khoj-chat-blue);
margin-left: auto;
white-space: pre-line;
}
/* add left protrusion to khoj chat bubble */
.khoj-chat-message-text.khoj:after {
content: '';
position: absolute;
bottom: -2px;
left: -7px;
border: 10px solid transparent;
border-top-color: var(--khoj-chat-blue);
border-bottom: 0;
transform: rotate(-60deg);
}
/* color chat bubble by you dark grey */
.khoj-chat-message-text.you {
color: var(--text-on-accent);
background: var(--khoj-chat-dark-grey);
margin-right: auto;
}
/* add right protrusion to you chat bubble */
.khoj-chat-message-text.you:after {
content: '';
position: absolute;
top: 91%;
right: -2px;
border: 10px solid transparent;
border-left-color: var(--khoj-chat-dark-grey);
border-right: 0;
margin-top: -10px;
transform: rotate(-60deg)
}
#khoj-chat-footer {
padding: 0;
display: grid;
grid-template-columns: minmax(70px, 100%);
grid-column-gap: 10px;
grid-row-gap: 10px;
}
#khoj-chat-footer > * {
padding: 15px;
background: #f9fafc
}
#khoj-chat-input.option:hover {
box-shadow: 0 0 11px var(--background-modifier-box-shadow);
}
#khoj-chat-input {
font-size: var(--font-ui-medium);
padding: 25px 20px;
}
@media (pointer: coarse), (hover: none) {
#khoj-chat-body.abbr[title] {
position: relative;
padding-left: 4px; /* space references out to ease tapping */
}
#khoj-chat-body.abbr[title]:focus:after {
content: attr(title);
/* position tooltip */
position: absolute;
left: 16px; /* open tooltip to right of ref link, instead of on top of it */
width: auto;
z-index: 1; /* show tooltip above chat messages */
/* style tooltip */
background-color: var(--background-secondary);
color: var(--text-muted);
border-radius: 2px;
box-shadow: 1px 1px 4px 0 var(--background-modifier-box-shadow);
font-size: var(--font-ui-small);
padding: 2px 4px;
}
}
.khoj-result-file {
font-weight: 600;
}
.khoj-result-entry {
color: var(--text-muted);
margin-left: 2em;
padding-left: 0.5em;
line-height: normal;
margin-top: 0.2em;
margin-bottom: 0.2em;
border-left-style: solid;
border-left-color: var(--color-accent-2);
white-space: normal;
}
.khoj-result-entry > * {
font-size: var(--font-ui-medium);
}
.khoj-result-entry > p {
margin-top: 0.2em;
margin-bottom: 0.2em;
}
.khoj-result-entry p br {
display: none;
}

View File

@@ -0,0 +1,24 @@
{
"compilerOptions": {
"baseUrl": ".",
"inlineSourceMap": true,
"inlineSources": true,
"module": "ESNext",
"target": "ES6",
"allowJs": true,
"noImplicitAny": true,
"moduleResolution": "node",
"importHelpers": true,
"isolatedModules": true,
"strictNullChecks": true,
"lib": [
"DOM",
"ES5",
"ES6",
"ES7"
]
},
"include": [
"**/*.ts"
]
}

View File

@@ -0,0 +1,14 @@
import { readFileSync, writeFileSync } from "fs";
const targetVersion = process.env.npm_package_version;
// read minAppVersion from manifest.json and bump version to target version
let manifest = JSON.parse(readFileSync("manifest.json", "utf8"));
const { minAppVersion } = manifest;
manifest.version = targetVersion;
writeFileSync("manifest.json", JSON.stringify(manifest, null, "\t"));
// update versions.json with target version and minAppVersion from manifest.json
let versions = JSON.parse(readFileSync("versions.json", "utf8"));
versions[targetVersion] = minAppVersion;
writeFileSync("versions.json", JSON.stringify(versions, null, "\t"));

View File

@@ -0,0 +1,11 @@
{
"0.2.1": "0.15.0",
"0.2.5": "0.15.0",
"0.2.6": "0.15.0",
"0.3.0": "0.15.0",
"0.4.0": "0.15.0",
"0.5.0": "0.15.0",
"0.6.0": "0.15.0",
"0.6.1": "0.15.0",
"0.6.2": "0.15.0"
}

View File

@@ -0,0 +1,519 @@
# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1
"@nodelib/fs.scandir@2.1.5":
version "2.1.5"
resolved "https://registry.npmjs.org/@nodelib/fs.scandir/-/fs.scandir-2.1.5.tgz"
integrity sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g==
dependencies:
"@nodelib/fs.stat" "2.0.5"
run-parallel "^1.1.9"
"@nodelib/fs.stat@2.0.5", "@nodelib/fs.stat@^2.0.2":
version "2.0.5"
resolved "https://registry.npmjs.org/@nodelib/fs.stat/-/fs.stat-2.0.5.tgz"
integrity sha512-RkhPPp2zrqDAQA/2jNhnztcPAlv64XdhIp7a7454A5ovI7Bukxgt7MX7udwAu3zg1DcpPU0rz3VV1SeaqvY4+A==
"@nodelib/fs.walk@^1.2.3":
version "1.2.8"
resolved "https://registry.npmjs.org/@nodelib/fs.walk/-/fs.walk-1.2.8.tgz"
integrity sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg==
dependencies:
"@nodelib/fs.scandir" "2.1.5"
fastq "^1.6.0"
"@types/codemirror@0.0.108":
version "0.0.108"
resolved "https://registry.npmjs.org/@types/codemirror/-/codemirror-0.0.108.tgz"
integrity sha512-3FGFcus0P7C2UOGCNUVENqObEb4SFk+S8Dnxq7K6aIsLVs/vDtlangl3PEO0ykaKXyK56swVF6Nho7VsA44uhw==
dependencies:
"@types/tern" "*"
"@types/estree@*":
version "1.0.0"
resolved "https://registry.npmjs.org/@types/estree/-/estree-1.0.0.tgz"
integrity sha512-WulqXMDUTYAXCjZnk6JtIHPigp55cVtDgDrO2gHRwhyJto21+1zbVCtOYB2L1F9w4qCQ0rOGWBnBe0FNTiEJIQ==
"@types/json-schema@^7.0.9":
version "7.0.11"
resolved "https://registry.npmjs.org/@types/json-schema/-/json-schema-7.0.11.tgz"
integrity sha512-wOuvG1SN4Us4rez+tylwwwCV1psiNVOkJeM3AUWUNWg/jDQY2+HE/444y5gc+jBmRqASOm2Oeh5c1axHobwRKQ==
"@types/node@^16.11.6":
version "16.18.12"
resolved "https://registry.npmjs.org/@types/node/-/node-16.18.12.tgz"
integrity sha512-vzLe5NaNMjIE3mcddFVGlAXN1LEWueUsMsOJWaT6wWMJGyljHAWHznqfnKUQWGzu7TLPrGvWdNAsvQYW+C0xtw==
"@types/tern@*":
version "0.23.4"
resolved "https://registry.npmjs.org/@types/tern/-/tern-0.23.4.tgz"
integrity sha512-JAUw1iXGO1qaWwEOzxTKJZ/5JxVeON9kvGZ/osgZaJImBnyjyn0cjovPsf6FNLmyGY8Vw9DoXZCMlfMkMwHRWg==
dependencies:
"@types/estree" "*"
"@typescript-eslint/eslint-plugin@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/eslint-plugin/-/eslint-plugin-5.29.0.tgz"
integrity sha512-kgTsISt9pM53yRFQmLZ4npj99yGl3x3Pl7z4eA66OuTzAGC4bQB5H5fuLwPnqTKU3yyrrg4MIhjF17UYnL4c0w==
dependencies:
"@typescript-eslint/scope-manager" "5.29.0"
"@typescript-eslint/type-utils" "5.29.0"
"@typescript-eslint/utils" "5.29.0"
debug "^4.3.4"
functional-red-black-tree "^1.0.1"
ignore "^5.2.0"
regexpp "^3.2.0"
semver "^7.3.7"
tsutils "^3.21.0"
"@typescript-eslint/parser@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/parser/-/parser-5.29.0.tgz"
integrity sha512-ruKWTv+x0OOxbzIw9nW5oWlUopvP/IQDjB5ZqmTglLIoDTctLlAJpAQFpNPJP/ZI7hTT9sARBosEfaKbcFuECw==
dependencies:
"@typescript-eslint/scope-manager" "5.29.0"
"@typescript-eslint/types" "5.29.0"
"@typescript-eslint/typescript-estree" "5.29.0"
debug "^4.3.4"
"@typescript-eslint/scope-manager@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/scope-manager/-/scope-manager-5.29.0.tgz"
integrity sha512-etbXUT0FygFi2ihcxDZjz21LtC+Eps9V2xVx09zFoN44RRHPrkMflidGMI+2dUs821zR1tDS6Oc9IXxIjOUZwA==
dependencies:
"@typescript-eslint/types" "5.29.0"
"@typescript-eslint/visitor-keys" "5.29.0"
"@typescript-eslint/type-utils@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/type-utils/-/type-utils-5.29.0.tgz"
integrity sha512-JK6bAaaiJozbox3K220VRfCzLa9n0ib/J+FHIwnaV3Enw/TO267qe0pM1b1QrrEuy6xun374XEAsRlA86JJnyg==
dependencies:
"@typescript-eslint/utils" "5.29.0"
debug "^4.3.4"
tsutils "^3.21.0"
"@typescript-eslint/types@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/types/-/types-5.29.0.tgz"
integrity sha512-X99VbqvAXOMdVyfFmksMy3u8p8yoRGITgU1joBJPzeYa0rhdf5ok9S56/itRoUSh99fiDoMtarSIJXo7H/SnOg==
"@typescript-eslint/typescript-estree@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/typescript-estree/-/typescript-estree-5.29.0.tgz"
integrity sha512-mQvSUJ/JjGBdvo+1LwC+GY2XmSYjK1nAaVw2emp/E61wEVYEyibRHCqm1I1vEKbXCpUKuW4G7u9ZCaZhJbLoNQ==
dependencies:
"@typescript-eslint/types" "5.29.0"
"@typescript-eslint/visitor-keys" "5.29.0"
debug "^4.3.4"
globby "^11.1.0"
is-glob "^4.0.3"
semver "^7.3.7"
tsutils "^3.21.0"
"@typescript-eslint/utils@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/utils/-/utils-5.29.0.tgz"
integrity sha512-3Eos6uP1nyLOBayc/VUdKZikV90HahXE5Dx9L5YlSd/7ylQPXhLk1BYb29SDgnBnTp+jmSZUU0QxUiyHgW4p7A==
dependencies:
"@types/json-schema" "^7.0.9"
"@typescript-eslint/scope-manager" "5.29.0"
"@typescript-eslint/types" "5.29.0"
"@typescript-eslint/typescript-estree" "5.29.0"
eslint-scope "^5.1.1"
eslint-utils "^3.0.0"
"@typescript-eslint/visitor-keys@5.29.0":
version "5.29.0"
resolved "https://registry.npmjs.org/@typescript-eslint/visitor-keys/-/visitor-keys-5.29.0.tgz"
integrity sha512-Hpb/mCWsjILvikMQoZIE3voc9wtQcS0A9FUw3h8bhr9UxBdtI/tw1ZDZUOXHXLOVMedKCH5NxyzATwnU78bWCQ==
dependencies:
"@typescript-eslint/types" "5.29.0"
eslint-visitor-keys "^3.3.0"
array-union@^2.1.0:
version "2.1.0"
resolved "https://registry.npmjs.org/array-union/-/array-union-2.1.0.tgz"
integrity sha512-HGyxoOTYUyCM6stUe6EJgnd4EoewAI7zMdfqO+kGjnlZmBDz/cR5pf8r/cR4Wq60sL/p0IkcjUEEPwS3GFrIyw==
braces@^3.0.2:
version "3.0.2"
resolved "https://registry.npmjs.org/braces/-/braces-3.0.2.tgz"
integrity sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==
dependencies:
fill-range "^7.0.1"
builtin-modules@3.3.0:
version "3.3.0"
resolved "https://registry.npmjs.org/builtin-modules/-/builtin-modules-3.3.0.tgz"
integrity sha512-zhaCDicdLuWN5UbN5IMnFqNMhNfo919sH85y2/ea+5Yg9TsTkeZxpL+JLbp6cgYFS4sRLp3YV4S6yDuqVWHYOw==
debug@^4.3.4:
version "4.3.4"
resolved "https://registry.npmjs.org/debug/-/debug-4.3.4.tgz"
integrity sha512-PRWFHuSU3eDtQJPvnNY7Jcket1j0t5OuOsFzPPzsekD52Zl8qUfFIPEiswXqIvHWGVHOgX+7G/vCNNhehwxfkQ==
dependencies:
ms "2.1.2"
dir-glob@^3.0.1:
version "3.0.1"
resolved "https://registry.npmjs.org/dir-glob/-/dir-glob-3.0.1.tgz"
integrity sha512-WkrWp9GR4KXfKGYzOLmTuGVi1UWFfws377n9cc55/tb6DuqyF6pcQ5AbiHEshaDpY9v6oaSr2XCDidGmMwdzIA==
dependencies:
path-type "^4.0.0"
esbuild-android-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-android-64/-/esbuild-android-64-0.14.47.tgz#ef95b42c67bcf4268c869153fa3ad1466c4cea6b"
integrity sha512-R13Bd9+tqLVFndncMHssZrPWe6/0Kpv2/dt4aA69soX4PRxlzsVpCvoJeFE8sOEoeVEiBkI0myjlkDodXlHa0g==
esbuild-android-arm64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-android-arm64/-/esbuild-android-arm64-0.14.47.tgz#4ebd7ce9fb250b4695faa3ee46fd3b0754ecd9e6"
integrity sha512-OkwOjj7ts4lBp/TL6hdd8HftIzOy/pdtbrNA4+0oVWgGG64HrdVzAF5gxtJufAPOsEjkyh1oIYvKAUinKKQRSQ==
esbuild-darwin-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-darwin-64/-/esbuild-darwin-64-0.14.47.tgz#e0da6c244f497192f951807f003f6a423ed23188"
integrity sha512-R6oaW0y5/u6Eccti/TS6c/2c1xYTb1izwK3gajJwi4vIfNs1s8B1dQzI1UiC9T61YovOQVuePDcfqHLT3mUZJA==
esbuild-darwin-arm64@0.14.47:
version "0.14.47"
resolved "https://registry.npmjs.org/esbuild-darwin-arm64/-/esbuild-darwin-arm64-0.14.47.tgz"
integrity sha512-seCmearlQyvdvM/noz1L9+qblC5vcBrhUaOoLEDDoLInF/VQ9IkobGiLlyTPYP5dW1YD4LXhtBgOyevoIHGGnw==
esbuild-freebsd-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-freebsd-64/-/esbuild-freebsd-64-0.14.47.tgz#8da6a14c095b29c01fc8087a16cb7906debc2d67"
integrity sha512-ZH8K2Q8/Ux5kXXvQMDsJcxvkIwut69KVrYQhza/ptkW50DC089bCVrJZZ3sKzIoOx+YPTrmsZvqeZERjyYrlvQ==
esbuild-freebsd-arm64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-freebsd-arm64/-/esbuild-freebsd-arm64-0.14.47.tgz#ad31f9c92817ff8f33fd253af7ab5122dc1b83f6"
integrity sha512-ZJMQAJQsIOhn3XTm7MPQfCzEu5b9STNC+s90zMWe2afy9EwnHV7Ov7ohEMv2lyWlc2pjqLW8QJnz2r0KZmeAEQ==
esbuild-linux-32@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-32/-/esbuild-linux-32-0.14.47.tgz#de085e4db2e692ea30c71208ccc23fdcf5196c58"
integrity sha512-FxZOCKoEDPRYvq300lsWCTv1kcHgiiZfNrPtEhFAiqD7QZaXrad8LxyJ8fXGcWzIFzRiYZVtB3ttvITBvAFhKw==
esbuild-linux-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-64/-/esbuild-linux-64-0.14.47.tgz#2a9321bbccb01f01b04cebfcfccbabeba3658ba1"
integrity sha512-nFNOk9vWVfvWYF9YNYksZptgQAdstnDCMtR6m42l5Wfugbzu11VpMCY9XrD4yFxvPo9zmzcoUL/88y0lfJZJJw==
esbuild-linux-arm64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-arm64/-/esbuild-linux-arm64-0.14.47.tgz#b9da7b6fc4b0ca7a13363a0c5b7bb927e4bc535a"
integrity sha512-ywfme6HVrhWcevzmsufjd4iT3PxTfCX9HOdxA7Hd+/ZM23Y9nXeb+vG6AyA6jgq/JovkcqRHcL9XwRNpWG6XRw==
esbuild-linux-arm@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-arm/-/esbuild-linux-arm-0.14.47.tgz#56fec2a09b9561c337059d4af53625142aded853"
integrity sha512-ZGE1Bqg/gPRXrBpgpvH81tQHpiaGxa8c9Rx/XOylkIl2ypLuOcawXEAo8ls+5DFCcRGt/o3sV+PzpAFZobOsmA==
esbuild-linux-mips64le@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-mips64le/-/esbuild-linux-mips64le-0.14.47.tgz#9db21561f8f22ed79ef2aedb7bbef082b46cf823"
integrity sha512-mg3D8YndZ1LvUiEdDYR3OsmeyAew4MA/dvaEJxvyygahWmpv1SlEEnhEZlhPokjsUMfRagzsEF/d/2XF+kTQGg==
esbuild-linux-ppc64le@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-ppc64le/-/esbuild-linux-ppc64le-0.14.47.tgz#dc3a3da321222b11e96e50efafec9d2de408198b"
integrity sha512-WER+f3+szmnZiWoK6AsrTKGoJoErG2LlauSmk73LEZFQ/iWC+KhhDsOkn1xBUpzXWsxN9THmQFltLoaFEH8F8w==
esbuild-linux-riscv64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-riscv64/-/esbuild-linux-riscv64-0.14.47.tgz#9bd6dcd3dca6c0357084ecd06e1d2d4bf105335f"
integrity sha512-1fI6bP3A3rvI9BsaaXbMoaOjLE3lVkJtLxsgLHqlBhLlBVY7UqffWBvkrX/9zfPhhVMd9ZRFiaqXnB1T7BsL2g==
esbuild-linux-s390x@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-linux-s390x/-/esbuild-linux-s390x-0.14.47.tgz#a458af939b52f2cd32fc561410d441a51f69d41f"
integrity sha512-eZrWzy0xFAhki1CWRGnhsHVz7IlSKX6yT2tj2Eg8lhAwlRE5E96Hsb0M1mPSE1dHGpt1QVwwVivXIAacF/G6mw==
esbuild-netbsd-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-netbsd-64/-/esbuild-netbsd-64-0.14.47.tgz#6388e785d7e7e4420cb01348d7483ab511b16aa8"
integrity sha512-Qjdjr+KQQVH5Q2Q1r6HBYswFTToPpss3gqCiSw2Fpq/ua8+eXSQyAMG+UvULPqXceOwpnPo4smyZyHdlkcPppQ==
esbuild-openbsd-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-openbsd-64/-/esbuild-openbsd-64-0.14.47.tgz#309af806db561aa886c445344d1aacab850dbdc5"
integrity sha512-QpgN8ofL7B9z8g5zZqJE+eFvD1LehRlxr25PBkjyyasakm4599iroUpaj96rdqRlO2ShuyqwJdr+oNqWwTUmQw==
esbuild-sunos-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-sunos-64/-/esbuild-sunos-64-0.14.47.tgz#3f19612dcdb89ba6c65283a7ff6e16f8afbf8aaa"
integrity sha512-uOeSgLUwukLioAJOiGYm3kNl+1wJjgJA8R671GYgcPgCx7QR73zfvYqXFFcIO93/nBdIbt5hd8RItqbbf3HtAQ==
esbuild-windows-32@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-windows-32/-/esbuild-windows-32-0.14.47.tgz#a92d279c8458d5dc319abcfeb30aa49e8f2e6f7f"
integrity sha512-H0fWsLTp2WBfKLBgwYT4OTfFly4Im/8B5f3ojDv1Kx//kiubVY0IQunP2Koc/fr/0wI7hj3IiBDbSrmKlrNgLQ==
esbuild-windows-64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-windows-64/-/esbuild-windows-64-0.14.47.tgz#2564c3fcf0c23d701edb71af8c52d3be4cec5f8a"
integrity sha512-/Pk5jIEH34T68r8PweKRi77W49KwanZ8X6lr3vDAtOlH5EumPE4pBHqkCUdELanvsT14yMXLQ/C/8XPi1pAtkQ==
esbuild-windows-arm64@0.14.47:
version "0.14.47"
resolved "https://registry.yarnpkg.com/esbuild-windows-arm64/-/esbuild-windows-arm64-0.14.47.tgz#86d9db1a22d83360f726ac5fba41c2f625db6878"
integrity sha512-HFSW2lnp62fl86/qPQlqw6asIwCnEsEoNIL1h2uVMgakddf+vUuMcCbtUY1i8sst7KkgHrVKCJQB33YhhOweCQ==
esbuild@0.14.47:
version "0.14.47"
resolved "https://registry.npmjs.org/esbuild/-/esbuild-0.14.47.tgz"
integrity sha512-wI4ZiIfFxpkuxB8ju4MHrGwGLyp1+awEHAHVpx6w7a+1pmYIq8T9FGEVVwFo0iFierDoMj++Xq69GXWYn2EiwA==
optionalDependencies:
esbuild-android-64 "0.14.47"
esbuild-android-arm64 "0.14.47"
esbuild-darwin-64 "0.14.47"
esbuild-darwin-arm64 "0.14.47"
esbuild-freebsd-64 "0.14.47"
esbuild-freebsd-arm64 "0.14.47"
esbuild-linux-32 "0.14.47"
esbuild-linux-64 "0.14.47"
esbuild-linux-arm "0.14.47"
esbuild-linux-arm64 "0.14.47"
esbuild-linux-mips64le "0.14.47"
esbuild-linux-ppc64le "0.14.47"
esbuild-linux-riscv64 "0.14.47"
esbuild-linux-s390x "0.14.47"
esbuild-netbsd-64 "0.14.47"
esbuild-openbsd-64 "0.14.47"
esbuild-sunos-64 "0.14.47"
esbuild-windows-32 "0.14.47"
esbuild-windows-64 "0.14.47"
esbuild-windows-arm64 "0.14.47"
eslint-scope@^5.1.1:
version "5.1.1"
resolved "https://registry.npmjs.org/eslint-scope/-/eslint-scope-5.1.1.tgz"
integrity sha512-2NxwbF/hZ0KpepYN0cNbo+FN6XoK7GaHlQhgx/hIZl6Va0bF45RQOOwhLIy8lQDbuCiadSLCBnH2CFYquit5bw==
dependencies:
esrecurse "^4.3.0"
estraverse "^4.1.1"
eslint-utils@^3.0.0:
version "3.0.0"
resolved "https://registry.npmjs.org/eslint-utils/-/eslint-utils-3.0.0.tgz"
integrity sha512-uuQC43IGctw68pJA1RgbQS8/NP7rch6Cwd4j3ZBtgo4/8Flj4eGE7ZYSZRN3iq5pVUv6GPdW5Z1RFleo84uLDA==
dependencies:
eslint-visitor-keys "^2.0.0"
eslint-visitor-keys@^2.0.0:
version "2.1.0"
resolved "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-2.1.0.tgz"
integrity sha512-0rSmRBzXgDzIsD6mGdJgevzgezI534Cer5L/vyMX0kHzT/jiB43jRhd9YUlMGYLQy2zprNmoT8qasCGtY+QaKw==
eslint-visitor-keys@^3.3.0:
version "3.3.0"
resolved "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-3.3.0.tgz"
integrity sha512-mQ+suqKJVyeuwGYHAdjMFqjCyfl8+Ldnxuyp3ldiMBFKkvytrXUZWaiPCEav8qDHKty44bD+qV1IP4T+w+xXRA==
esrecurse@^4.3.0:
version "4.3.0"
resolved "https://registry.npmjs.org/esrecurse/-/esrecurse-4.3.0.tgz"
integrity sha512-KmfKL3b6G+RXvP8N1vr3Tq1kL/oCFgn2NYXEtqP8/L3pKapUA4G8cFVaoF3SU323CD4XypR/ffioHmkti6/Tag==
dependencies:
estraverse "^5.2.0"
estraverse@^4.1.1:
version "4.3.0"
resolved "https://registry.npmjs.org/estraverse/-/estraverse-4.3.0.tgz"
integrity sha512-39nnKffWz8xN1BU/2c79n9nB9HDzo0niYUqx6xyqUnyoAnQyyWpOTdZEeiCch8BBu515t4wp9ZmgVfVhn9EBpw==
estraverse@^5.2.0:
version "5.3.0"
resolved "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz"
integrity sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==
fast-glob@^3.2.9:
version "3.2.12"
resolved "https://registry.npmjs.org/fast-glob/-/fast-glob-3.2.12.tgz"
integrity sha512-DVj4CQIYYow0BlaelwK1pHl5n5cRSJfM60UA0zK891sVInoPri2Ekj7+e1CT3/3qxXenpI+nBBmQAcJPJgaj4w==
dependencies:
"@nodelib/fs.stat" "^2.0.2"
"@nodelib/fs.walk" "^1.2.3"
glob-parent "^5.1.2"
merge2 "^1.3.0"
micromatch "^4.0.4"
fastq@^1.6.0:
version "1.15.0"
resolved "https://registry.npmjs.org/fastq/-/fastq-1.15.0.tgz"
integrity sha512-wBrocU2LCXXa+lWBt8RoIRD89Fi8OdABODa/kEnyeyjS5aZO5/GNvI5sEINADqP/h8M29UHTHUb53sUu5Ihqdw==
dependencies:
reusify "^1.0.4"
fill-range@^7.0.1:
version "7.0.1"
resolved "https://registry.npmjs.org/fill-range/-/fill-range-7.0.1.tgz"
integrity sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==
dependencies:
to-regex-range "^5.0.1"
functional-red-black-tree@^1.0.1:
version "1.0.1"
resolved "https://registry.npmjs.org/functional-red-black-tree/-/functional-red-black-tree-1.0.1.tgz"
integrity sha512-dsKNQNdj6xA3T+QlADDA7mOSlX0qiMINjn0cgr+eGHGsbSHzTabcIogz2+p/iqP1Xs6EP/sS2SbqH+brGTbq0g==
glob-parent@^5.1.2:
version "5.1.2"
resolved "https://registry.npmjs.org/glob-parent/-/glob-parent-5.1.2.tgz"
integrity sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow==
dependencies:
is-glob "^4.0.1"
globby@^11.1.0:
version "11.1.0"
resolved "https://registry.npmjs.org/globby/-/globby-11.1.0.tgz"
integrity sha512-jhIXaOzy1sb8IyocaruWSn1TjmnBVs8Ayhcy83rmxNJ8q2uWKCAj3CnJY+KpGSXCueAPc0i05kVvVKtP1t9S3g==
dependencies:
array-union "^2.1.0"
dir-glob "^3.0.1"
fast-glob "^3.2.9"
ignore "^5.2.0"
merge2 "^1.4.1"
slash "^3.0.0"
ignore@^5.2.0:
version "5.2.4"
resolved "https://registry.npmjs.org/ignore/-/ignore-5.2.4.tgz"
integrity sha512-MAb38BcSbH0eHNBxn7ql2NH/kX33OkB3lZ1BNdh7ENeRChHTYsTvWrMubiIAMNS2llXEEgZ1MUOBtXChP3kaFQ==
is-extglob@^2.1.1:
version "2.1.1"
resolved "https://registry.npmjs.org/is-extglob/-/is-extglob-2.1.1.tgz"
integrity sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==
is-glob@^4.0.1, is-glob@^4.0.3:
version "4.0.3"
resolved "https://registry.npmjs.org/is-glob/-/is-glob-4.0.3.tgz"
integrity sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==
dependencies:
is-extglob "^2.1.1"
is-number@^7.0.0:
version "7.0.0"
resolved "https://registry.npmjs.org/is-number/-/is-number-7.0.0.tgz"
integrity sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng==
lru-cache@^6.0.0:
version "6.0.0"
resolved "https://registry.npmjs.org/lru-cache/-/lru-cache-6.0.0.tgz"
integrity sha512-Jo6dJ04CmSjuznwJSS3pUeWmd/H0ffTlkXXgwZi+eq1UCmqQwCh+eLsYOYCwY991i2Fah4h1BEMCx4qThGbsiA==
dependencies:
yallist "^4.0.0"
merge2@^1.3.0, merge2@^1.4.1:
version "1.4.1"
resolved "https://registry.npmjs.org/merge2/-/merge2-1.4.1.tgz"
integrity sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==
micromatch@^4.0.4:
version "4.0.5"
resolved "https://registry.npmjs.org/micromatch/-/micromatch-4.0.5.tgz"
integrity sha512-DMy+ERcEW2q8Z2Po+WNXuw3c5YaUSFjAO5GsJqfEl7UjvtIuFKO6ZrKvcItdy98dwFI2N1tg3zNIdKaQT+aNdA==
dependencies:
braces "^3.0.2"
picomatch "^2.3.1"
moment@2.29.4:
version "2.29.4"
resolved "https://registry.npmjs.org/moment/-/moment-2.29.4.tgz"
integrity sha512-5LC9SOxjSc2HF6vO2CyuTDNivEdoz2IvyJJGj6X8DJ0eFyfszE0QiEd+iXmBvUP3WHxSjFH/vIsA0EN00cgr8w==
ms@2.1.2:
version "2.1.2"
resolved "https://registry.npmjs.org/ms/-/ms-2.1.2.tgz"
integrity sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w==
obsidian@latest:
version "1.1.1"
resolved "https://registry.npmjs.org/obsidian/-/obsidian-1.1.1.tgz"
integrity sha512-GcxhsHNkPEkwHEjeyitfYNBcQuYGeAHFs1pEpZIv0CnzSfui8p8bPLm2YKLgcg20B764770B1sYGtxCvk9ptxg==
dependencies:
"@types/codemirror" "0.0.108"
moment "2.29.4"
path-type@^4.0.0:
version "4.0.0"
resolved "https://registry.npmjs.org/path-type/-/path-type-4.0.0.tgz"
integrity sha512-gDKb8aZMDeD/tZWs9P6+q0J9Mwkdl6xMV8TjnGP3qJVJ06bdMgkbBlLU8IdfOsIsFz2BW1rNVT3XuNEl8zPAvw==
picomatch@^2.3.1:
version "2.3.1"
resolved "https://registry.npmjs.org/picomatch/-/picomatch-2.3.1.tgz"
integrity sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==
queue-microtask@^1.2.2:
version "1.2.3"
resolved "https://registry.npmjs.org/queue-microtask/-/queue-microtask-1.2.3.tgz"
integrity sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A==
regexpp@^3.2.0:
version "3.2.0"
resolved "https://registry.npmjs.org/regexpp/-/regexpp-3.2.0.tgz"
integrity sha512-pq2bWo9mVD43nbts2wGv17XLiNLya+GklZ8kaDLV2Z08gDCsGpnKn9BFMepvWuHCbyVvY7J5o5+BVvoQbmlJLg==
reusify@^1.0.4:
version "1.0.4"
resolved "https://registry.npmjs.org/reusify/-/reusify-1.0.4.tgz"
integrity sha512-U9nH88a3fc/ekCF1l0/UP1IosiuIjyTh7hBvXVMHYgVcfGvt897Xguj2UOLDeI5BG2m7/uwyaLVT6fbtCwTyzw==
run-parallel@^1.1.9:
version "1.2.0"
resolved "https://registry.npmjs.org/run-parallel/-/run-parallel-1.2.0.tgz"
integrity sha512-5l4VyZR86LZ/lDxZTR6jqL8AFE2S0IFLMP26AbjsLVADxHdhB/c0GUsH+y39UfCi3dzz8OlQuPmnaJOMoDHQBA==
dependencies:
queue-microtask "^1.2.2"
semver@^7.3.7:
version "7.3.8"
resolved "https://registry.npmjs.org/semver/-/semver-7.3.8.tgz"
integrity sha512-NB1ctGL5rlHrPJtFDVIVzTyQylMLu9N9VICA6HSFJo8MCGVTMW6gfpicwKmmK/dAjTOrqu5l63JJOpDSrAis3A==
dependencies:
lru-cache "^6.0.0"
slash@^3.0.0:
version "3.0.0"
resolved "https://registry.npmjs.org/slash/-/slash-3.0.0.tgz"
integrity sha512-g9Q1haeby36OSStwb4ntCGGGaKsaVSjQ68fBxoQcutl5fS1vuY18H3wSt3jFyFtrkx+Kz0V1G85A4MyAdDMi2Q==
to-regex-range@^5.0.1:
version "5.0.1"
resolved "https://registry.npmjs.org/to-regex-range/-/to-regex-range-5.0.1.tgz"
integrity sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==
dependencies:
is-number "^7.0.0"
tslib@2.4.0:
version "2.4.0"
resolved "https://registry.npmjs.org/tslib/-/tslib-2.4.0.tgz"
integrity sha512-d6xOpEDfsi2CZVlPQzGeux8XMwLT9hssAsaPYExaQMuYskwb+x1x7J371tWlbBdWHroy99KnVB6qIkUbs5X3UQ==
tslib@^1.8.1:
version "1.14.1"
resolved "https://registry.npmjs.org/tslib/-/tslib-1.14.1.tgz"
integrity sha512-Xni35NKzjgMrwevysHTCArtLDpPvye8zV/0E4EyYn43P7/7qvQwPh9BGkHewbMulVntbigmcT7rdX3BNo9wRJg==
tsutils@^3.21.0:
version "3.21.0"
resolved "https://registry.npmjs.org/tsutils/-/tsutils-3.21.0.tgz"
integrity sha512-mHKK3iUXL+3UF6xL5k0PEhKRUBKPBCv/+RkEOpjRWxxx27KKRBmmA60A9pgOUvMi8GKhRMPEmjBRPzs2W7O1OA==
dependencies:
tslib "^1.8.1"
typescript@4.7.4:
version "4.7.4"
resolved "https://registry.yarnpkg.com/typescript/-/typescript-4.7.4.tgz#1a88596d1cf47d59507a1bcdfb5b9dfe4d488235"
integrity sha512-C0WQT0gezHuw6AdY1M2jxUO83Rjf0HP7Sk1DtXj6j1EwkQNZrHAg2XPWlq62oqEhYvONq5pkC2Y9oPljWToLmQ==
yallist@^4.0.0:
version "4.0.0"
resolved "https://registry.npmjs.org/yallist/-/yallist-4.0.0.tgz"
integrity sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A==

242
src/khoj/configure.py Normal file
View File

@@ -0,0 +1,242 @@
# Standard Packages
import sys
import logging
import json
from enum import Enum
import requests
# External Packages
import schedule
from fastapi.staticfiles import StaticFiles
# Internal Packages
from khoj.processor.conversation.gpt import summarize
from khoj.processor.ledger.beancount_to_jsonl import BeancountToJsonl
from khoj.processor.jsonl.jsonl_to_jsonl import JsonlToJsonl
from khoj.processor.markdown.markdown_to_jsonl import MarkdownToJsonl
from khoj.processor.org_mode.org_to_jsonl import OrgToJsonl
from khoj.search_type import image_search, text_search
from khoj.utils import constants, state
from khoj.utils.config import SearchType, SearchModels, ProcessorConfigModel, ConversationProcessorConfigModel
from khoj.utils.helpers import LRU, resolve_absolute_path, merge_dicts
from khoj.utils.rawconfig import FullConfig, ProcessorConfig
from khoj.search_filter.date_filter import DateFilter
from khoj.search_filter.word_filter import WordFilter
from khoj.search_filter.file_filter import FileFilter
logger = logging.getLogger(__name__)
def configure_server(args, required=False):
if args.config is None:
if required:
logger.error(f"Exiting as Khoj is not configured.\nConfigure it via GUI or by editing {state.config_file}.")
sys.exit(1)
else:
logger.warn(
f"Khoj is not configured.\nConfigure it via khoj GUI, plugins or by editing {state.config_file}."
)
return
else:
state.config = args.config
# Initialize Processor from Config
state.processor_config = configure_processor(args.config.processor)
# Initialize the search type and model from Config
state.search_index_lock.acquire()
state.SearchType = configure_search_types(state.config)
state.model = configure_search(state.model, state.config, args.regenerate)
state.search_index_lock.release()
def configure_routes(app):
# Import APIs here to setup search types before while configuring server
from khoj.routers.api import api
from khoj.routers.api_beta import api_beta
from khoj.routers.web_client import web_client
app.mount("/static", StaticFiles(directory=constants.web_directory), name="static")
app.include_router(api, prefix="/api")
app.include_router(api_beta, prefix="/api/beta")
app.include_router(web_client)
@schedule.repeat(schedule.every(61).minutes)
def update_search_index():
state.search_index_lock.acquire()
state.model = configure_search(state.model, state.config, regenerate=False)
state.search_index_lock.release()
logger.info("📬 Search index updated via Scheduler")
def configure_search_types(config: FullConfig):
# Extract core search types
core_search_types = {e.name: e.value for e in SearchType}
# Extract configured plugin search types
plugin_search_types = {}
if config.content_type.plugins:
plugin_search_types = {plugin_type: plugin_type for plugin_type in config.content_type.plugins.keys()}
# Dynamically generate search type enum by merging core search types with configured plugin search types
return Enum("SearchType", merge_dicts(core_search_types, plugin_search_types))
def configure_search(model: SearchModels, config: FullConfig, regenerate: bool, t: state.SearchType = None):
# Initialize Org Notes Search
if (t == state.SearchType.Org or t == None) and config.content_type.org:
logger.info("🦄 Setting up search for orgmode notes")
# Extract Entries, Generate Notes Embeddings
model.orgmode_search = text_search.setup(
OrgToJsonl,
config.content_type.org,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[DateFilter(), WordFilter(), FileFilter()],
)
# Initialize Org Music Search
if (t == state.SearchType.Music or t == None) and config.content_type.music:
logger.info("🎺 Setting up search for org-music")
# Extract Entries, Generate Music Embeddings
model.music_search = text_search.setup(
OrgToJsonl,
config.content_type.music,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[DateFilter(), WordFilter()],
)
# Initialize Markdown Search
if (t == state.SearchType.Markdown or t == None) and config.content_type.markdown:
logger.info("💎 Setting up search for markdown notes")
# Extract Entries, Generate Markdown Embeddings
model.markdown_search = text_search.setup(
MarkdownToJsonl,
config.content_type.markdown,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[DateFilter(), WordFilter(), FileFilter()],
)
# Initialize Ledger Search
if (t == state.SearchType.Ledger or t == None) and config.content_type.ledger:
logger.info("💸 Setting up search for ledger")
# Extract Entries, Generate Ledger Embeddings
model.ledger_search = text_search.setup(
BeancountToJsonl,
config.content_type.ledger,
search_config=config.search_type.symmetric,
regenerate=regenerate,
filters=[DateFilter(), WordFilter(), FileFilter()],
)
# Initialize Image Search
if (t == state.SearchType.Image or t == None) and config.content_type.image:
logger.info("🌄 Setting up search for images")
# Extract Entries, Generate Image Embeddings
model.image_search = image_search.setup(
config.content_type.image, search_config=config.search_type.image, regenerate=regenerate
)
# Initialize External Plugin Search
if (t == None or t in state.SearchType) and config.content_type.plugins:
logger.info("🔌 Setting up search for plugins")
model.plugin_search = {}
for plugin_type, plugin_config in config.content_type.plugins.items():
model.plugin_search[plugin_type] = text_search.setup(
JsonlToJsonl,
plugin_config,
search_config=config.search_type.asymmetric,
regenerate=regenerate,
filters=[DateFilter(), WordFilter(), FileFilter()],
)
# Invalidate Query Cache
state.query_cache = LRU()
return model
def configure_processor(processor_config: ProcessorConfig):
if not processor_config:
return
processor = ProcessorConfigModel()
# Initialize Conversation Processor
if processor_config.conversation:
logger.info("💬 Setting up conversation processor")
processor.conversation = configure_conversation_processor(processor_config.conversation)
return processor
def configure_conversation_processor(conversation_processor_config):
conversation_processor = ConversationProcessorConfigModel(conversation_processor_config)
conversation_logfile = resolve_absolute_path(conversation_processor.conversation_logfile)
if conversation_logfile.is_file():
# Load Metadata Logs from Conversation Logfile
with conversation_logfile.open("r") as f:
conversation_processor.meta_log = json.load(f)
logger.debug(f"Loaded conversation logs from {conversation_logfile}")
else:
# Initialize Conversation Logs
conversation_processor.meta_log = {}
conversation_processor.chat_session = ""
return conversation_processor
@schedule.repeat(schedule.every(17).minutes)
def save_chat_session():
# No need to create empty log file
if not (
state.processor_config
and state.processor_config.conversation
and state.processor_config.conversation.meta_log
and state.processor_config.conversation.chat_session
):
return
# Summarize Conversation Logs for this Session
chat_session = state.processor_config.conversation.chat_session
openai_api_key = state.processor_config.conversation.openai_api_key
conversation_log = state.processor_config.conversation.meta_log
model = state.processor_config.conversation.model
session = {
"summary": summarize(chat_session, summary_type="chat", model=model, api_key=openai_api_key),
"session-start": conversation_log.get("session", [{"session-end": 0}])[-1]["session-end"],
"session-end": len(conversation_log["chat"]),
}
if "session" in conversation_log:
conversation_log["session"].append(session)
else:
conversation_log["session"] = [session]
# Save Conversation Metadata Logs to Disk
conversation_logfile = resolve_absolute_path(state.processor_config.conversation.conversation_logfile)
conversation_logfile.parent.mkdir(parents=True, exist_ok=True) # create conversation directory if doesn't exist
with open(conversation_logfile, "w+", encoding="utf-8") as logfile:
json.dump(conversation_log, logfile, indent=2)
state.processor_config.conversation.chat_session = None
logger.info("📩 Saved current chat session to conversation logs")
@schedule.repeat(schedule.every(59).minutes)
def upload_telemetry():
if not state.config.app.should_log_telemetry or not state.telemetry:
message = "📡 No telemetry to upload" if not state.telemetry else "📡 Telemetry logging disabled"
logger.debug(message)
return
try:
logger.debug(f"📡 Upload usage telemetry to {constants.telemetry_server}:\n{state.telemetry}")
requests.post(constants.telemetry_server, json=state.telemetry)
except Exception as e:
logger.error(f"📡 Error uploading telemetry: {e}")
else:
state.telemetry = []

View File

@@ -3,12 +3,12 @@ from PyQt6 import QtWidgets
from PyQt6.QtCore import QDir
# Internal Packages
from src.utils.config import SearchType
from src.utils.helpers import is_none_or_empty
from khoj.utils.config import SearchType
from khoj.utils.helpers import is_none_or_empty
class FileBrowser(QtWidgets.QWidget):
def __init__(self, title, search_type: SearchType=None, default_files:list=[]):
def __init__(self, title, search_type: SearchType = None, default_files: list = []):
QtWidgets.QWidget.__init__(self)
layout = QtWidgets.QHBoxLayout()
self.setLayout(layout)
@@ -22,51 +22,54 @@ class FileBrowser(QtWidgets.QWidget):
self.label.setFixedWidth(95)
self.label.setWordWrap(True)
layout.addWidget(self.label)
self.lineEdit = QtWidgets.QPlainTextEdit(self)
self.lineEdit.setFixedWidth(330)
self.setFiles(default_files)
self.lineEdit.setFixedHeight(min(7+20*len(self.lineEdit.toPlainText().split('\n')),90))
self.lineEdit.textChanged.connect(self.updateFieldHeight)
self.lineEdit.setFixedHeight(min(7 + 20 * len(self.lineEdit.toPlainText().split("\n")), 90))
self.lineEdit.textChanged.connect(self.updateFieldHeight) # type: ignore[attr-defined]
layout.addWidget(self.lineEdit)
self.button = QtWidgets.QPushButton('Add')
self.button.clicked.connect(self.storeFilesSelectedInFileDialog)
self.button = QtWidgets.QPushButton("Add")
self.button.clicked.connect(self.storeFilesSelectedInFileDialog) # type: ignore[attr-defined]
layout.addWidget(self.button)
layout.addStretch()
def getFileFilter(self, search_type):
if search_type == SearchType.Org:
return 'Org-Mode Files (*.org)'
return "Org-Mode Files (*.org)"
elif search_type == SearchType.Ledger:
return 'Beancount Files (*.bean *.beancount)'
return "Beancount Files (*.bean *.beancount)"
elif search_type == SearchType.Markdown:
return 'Markdown Files (*.md *.markdown)'
return "Markdown Files (*.md *.markdown)"
elif search_type == SearchType.Music:
return 'Org-Music Files (*.org)'
return "Org-Music Files (*.org)"
elif search_type == SearchType.Image:
return 'Images (*.jp[e]g)'
return "Images (*.jp[e]g)"
def storeFilesSelectedInFileDialog(self):
filepaths = self.getPaths()
if self.search_type == SearchType.Image:
filepaths.append(QtWidgets.QFileDialog.getExistingDirectory(self, caption='Choose Folder',
directory=self.dirpath))
filepaths.append(
QtWidgets.QFileDialog.getExistingDirectory(self, caption="Choose Folder", directory=self.dirpath)
)
else:
filepaths.extend(QtWidgets.QFileDialog.getOpenFileNames(self, caption='Choose Files',
directory=self.dirpath,
filter=self.filter_name)[0])
filepaths.extend(
QtWidgets.QFileDialog.getOpenFileNames(
self, caption="Choose Files", directory=self.dirpath, filter=self.filter_name
)[0]
)
self.setFiles(filepaths)
def setFiles(self, paths:list):
def setFiles(self, paths: list):
self.filepaths = [path for path in paths if not is_none_or_empty(path)]
self.lineEdit.setPlainText("\n".join(self.filepaths))
def getPaths(self) -> list:
if self.lineEdit.toPlainText() == '':
if self.lineEdit.toPlainText() == "":
return []
else:
return self.lineEdit.toPlainText().split('\n')
return self.lineEdit.toPlainText().split("\n")
def updateFieldHeight(self):
self.lineEdit.setFixedHeight(min(7+20*len(self.lineEdit.toPlainText().split('\n')),90))
self.lineEdit.setFixedHeight(min(7 + 20 * len(self.lineEdit.toPlainText().split("\n")), 90))

View File

@@ -2,11 +2,11 @@
from PyQt6 import QtWidgets
# Internal Packages
from src.utils.config import ProcessorType
from khoj.utils.config import ProcessorType
class LabelledTextField(QtWidgets.QWidget):
def __init__(self, title, processor_type: ProcessorType=None, default_value: str=None):
def __init__(self, title, processor_type: ProcessorType = None, default_value: str = None):
QtWidgets.QWidget.__init__(self)
layout = QtWidgets.QHBoxLayout()
self.setLayout(layout)

View File

@@ -9,13 +9,13 @@ from PyQt6 import QtGui, QtWidgets
from PyQt6.QtCore import Qt, QThread, QObject, pyqtSignal
# Internal Packages
from src.configure import configure_server
from src.interface.desktop.file_browser import FileBrowser
from src.interface.desktop.labelled_text_field import LabelledTextField
from src.utils import constants, state, yaml as yaml_utils
from src.utils.cli import cli
from src.utils.config import SearchType, ProcessorType
from src.utils.helpers import merge_dicts, resolve_absolute_path
from khoj.configure import configure_server
from khoj.interface.desktop.file_browser import FileBrowser
from khoj.interface.desktop.labelled_text_field import LabelledTextField
from khoj.utils import constants, state, yaml as yaml_utils
from khoj.utils.cli import cli
from khoj.utils.config import SearchType, ProcessorType
from khoj.utils.helpers import merge_dicts, resolve_absolute_path
class MainWindow(QtWidgets.QMainWindow):
@@ -31,9 +31,9 @@ class MainWindow(QtWidgets.QMainWindow):
self.config_file = config_file
# Set regenerate flag to regenerate embeddings everytime user clicks configure
if state.cli_args:
state.cli_args += ['--regenerate']
state.cli_args += ["--regenerate"]
else:
state.cli_args = ['--regenerate']
state.cli_args = ["--regenerate"]
# Load config from existing config, if exists, else load from default config
if resolve_absolute_path(self.config_file).exists():
@@ -49,22 +49,27 @@ class MainWindow(QtWidgets.QMainWindow):
self.setFixedWidth(600)
# Set Window Icon
icon_path = constants.web_directory / 'assets/icons/favicon-144x144.png'
self.setWindowIcon(QtGui.QIcon(f'{icon_path.absolute()}'))
icon_path = constants.web_directory / "assets/icons/favicon-144x144.png"
self.setWindowIcon(QtGui.QIcon(f"{icon_path.absolute()}"))
# Initialize Configure Window Layout
self.layout = QtWidgets.QVBoxLayout()
self.wlayout = QtWidgets.QVBoxLayout()
# Add Settings Panels for each Search Type to Configure Window Layout
self.search_settings_panels = []
for search_type in SearchType:
current_content_config = self.current_config['content-type'].get(search_type, {})
current_content_config = self.current_config["content-type"].get(
search_type, None
) or self.get_default_config(search_type=search_type)
self.search_settings_panels += [self.add_settings_panel(current_content_config, search_type)]
# Add Conversation Processor Panel to Configure Screen
self.processor_settings_panels = []
conversation_type = ProcessorType.Conversation
current_conversation_config = self.current_config['processor'].get(conversation_type, {})
if self.current_config["processor"] and conversation_type in self.current_config["processor"]:
current_conversation_config = self.current_config["processor"][conversation_type]
else:
current_conversation_config = self.get_default_config(processor_type=conversation_type)
self.processor_settings_panels += [self.add_processor_panel(current_conversation_config, conversation_type)]
# Add Action Buttons Panel
@@ -73,7 +78,7 @@ class MainWindow(QtWidgets.QMainWindow):
# Set the central widget of the Window. Widget will expand
# to take up all the space in the window by default.
self.config_window = QtWidgets.QWidget()
self.config_window.setLayout(self.layout)
self.config_window.setLayout(self.wlayout)
self.setCentralWidget(self.config_window)
self.position_window()
@@ -81,35 +86,35 @@ class MainWindow(QtWidgets.QMainWindow):
"Add Settings Panel for specified Search Type. Toggle Editable Search Types"
# Get current files from config for given search type
if search_type == SearchType.Image:
current_content_files = current_content_config.get('input-directories', [])
file_input_text = f'{search_type.name} Folders'
current_content_files = current_content_config.get("input-directories", [])
file_input_text = f"{search_type.name} Folders"
else:
current_content_files = current_content_config.get('input-files', [])
file_input_text = f'{search_type.name} Files'
current_content_files = current_content_config.get("input-files", [])
file_input_text = f"{search_type.name} Files"
# Create widgets to display settings for given search type
search_type_settings = QtWidgets.QWidget()
search_type_layout = QtWidgets.QVBoxLayout(search_type_settings)
enable_search_type = SearchCheckBox(f"Search {search_type.name}", search_type)
# Add file browser to set input files for given search type
input_files = FileBrowser(file_input_text, search_type, current_content_files)
input_files = FileBrowser(file_input_text, search_type, current_content_files or [])
# Set enabled/disabled based on checkbox state
enable_search_type.setChecked(current_content_files is not None and len(current_content_files) > 0)
input_files.setEnabled(enable_search_type.isChecked())
enable_search_type.stateChanged.connect(lambda _: input_files.setEnabled(enable_search_type.isChecked()))
enable_search_type.stateChanged.connect(lambda _: input_files.setEnabled(enable_search_type.isChecked())) # type: ignore[attr-defined]
# Add setting widgets for given search type to panel
search_type_layout.addWidget(enable_search_type)
search_type_layout.addWidget(input_files)
self.layout.addWidget(search_type_settings)
self.wlayout.addWidget(search_type_settings)
return search_type_settings
def add_processor_panel(self, current_conversation_config: dict, processor_type: ProcessorType):
"Add Conversation Processor Panel"
# Get current settings from config for given processor type
current_openai_api_key = current_conversation_config.get('openai-api-key', None)
current_openai_api_key = current_conversation_config.get("openai-api-key", None)
# Create widgets to display settings for given processor type
processor_type_settings = QtWidgets.QWidget()
@@ -121,12 +126,12 @@ class MainWindow(QtWidgets.QMainWindow):
# Set enabled/disabled based on checkbox state
enable_conversation.setChecked(current_openai_api_key is not None)
input_field.setEnabled(enable_conversation.isChecked())
enable_conversation.stateChanged.connect(lambda _: input_field.setEnabled(enable_conversation.isChecked()))
enable_conversation.stateChanged.connect(lambda _: input_field.setEnabled(enable_conversation.isChecked())) # type: ignore[attr-defined]
# Add setting widgets for given processor type to panel
processor_type_layout.addWidget(enable_conversation)
processor_type_layout.addWidget(input_field)
self.layout.addWidget(processor_type_settings)
self.wlayout.addWidget(processor_type_settings)
return processor_type_settings
@@ -137,20 +142,22 @@ class MainWindow(QtWidgets.QMainWindow):
action_bar_layout = QtWidgets.QHBoxLayout(action_bar)
self.configure_button = QtWidgets.QPushButton("Configure", clicked=self.configure_app)
self.search_button = QtWidgets.QPushButton("Search", clicked=lambda: webbrowser.open(f'http://{state.host}:{state.port}/'))
self.search_button = QtWidgets.QPushButton(
"Search", clicked=lambda: webbrowser.open(f"http://{state.host}:{state.port}/")
)
self.search_button.setEnabled(not self.first_run)
action_bar_layout.addWidget(self.configure_button)
action_bar_layout.addWidget(self.search_button)
self.layout.addWidget(action_bar)
self.wlayout.addWidget(action_bar)
def get_default_config(self, search_type:SearchType=None, processor_type:ProcessorType=None):
def get_default_config(self, search_type: SearchType = None, processor_type: ProcessorType = None):
"Get default config"
config = constants.default_config
if search_type:
return config['content-type'][search_type]
return config["content-type"][search_type] # type: ignore
elif processor_type:
return config['processor'][processor_type]
return config["processor"][processor_type] # type: ignore
else:
return config
@@ -158,10 +165,12 @@ class MainWindow(QtWidgets.QMainWindow):
"Add Error Message to Configure Screen"
# Remove any existing error messages
for message_prefix in ErrorType:
for i in reversed(range(self.layout.count())):
current_widget = self.layout.itemAt(i).widget()
if isinstance(current_widget, QtWidgets.QLabel) and current_widget.text().startswith(message_prefix.value):
self.layout.removeWidget(current_widget)
for i in reversed(range(self.wlayout.count())):
current_widget = self.wlayout.itemAt(i).widget()
if isinstance(current_widget, QtWidgets.QLabel) and current_widget.text().startswith(
message_prefix.value
):
self.wlayout.removeWidget(current_widget)
current_widget.deleteLater()
# Add new error message
@@ -170,7 +179,7 @@ class MainWindow(QtWidgets.QMainWindow):
error_message.setWordWrap(True)
error_message.setText(message)
error_message.setStyleSheet("color: red")
self.layout.addWidget(error_message)
self.wlayout.addWidget(error_message)
def update_search_settings(self):
"Update config with search settings from UI"
@@ -180,18 +189,24 @@ class MainWindow(QtWidgets.QMainWindow):
continue
if isinstance(child, SearchCheckBox):
# Search Type Disabled
if not child.isChecked() and child.search_type in self.new_config['content-type']:
del self.new_config['content-type'][child.search_type]
if not child.isChecked() and child.search_type in self.new_config["content-type"]:
del self.new_config["content-type"][child.search_type]
# Search Type (re)-Enabled
if child.isChecked():
current_search_config = self.current_config['content-type'].get(child.search_type, {})
default_search_config = self.get_default_config(search_type = child.search_type)
self.new_config['content-type'][child.search_type.value] = merge_dicts(current_search_config, default_search_config)
elif isinstance(child, FileBrowser) and child.search_type in self.new_config['content-type']:
current_search_config = self.current_config["content-type"].get(child.search_type, {})
default_search_config = self.get_default_config(search_type=child.search_type)
self.new_config["content-type"][child.search_type.value] = merge_dicts(
current_search_config, default_search_config
)
elif isinstance(child, FileBrowser) and child.search_type in self.new_config["content-type"]:
if child.search_type.value == SearchType.Image:
self.new_config['content-type'][child.search_type.value]['input-directories'] = child.getPaths() if child.getPaths() != [] else None
self.new_config["content-type"][child.search_type.value]["input-directories"] = (
child.getPaths() if child.getPaths() != [] else None
)
else:
self.new_config['content-type'][child.search_type.value]['input-files'] = child.getPaths() if child.getPaths() != [] else None
self.new_config["content-type"][child.search_type.value]["input-files"] = (
child.getPaths() if child.getPaths() != [] else None
)
def update_processor_settings(self):
"Update config with conversation settings from UI"
@@ -201,16 +216,20 @@ class MainWindow(QtWidgets.QMainWindow):
continue
if isinstance(child, ProcessorCheckBox):
# Processor Type Disabled
if not child.isChecked() and child.processor_type in self.new_config['processor']:
del self.new_config['processor'][child.processor_type]
if not child.isChecked() and child.processor_type in self.new_config["processor"]:
del self.new_config["processor"][child.processor_type]
# Processor Type (re)-Enabled
if child.isChecked():
current_processor_config = self.current_config['processor'].get(child.processor_type, {})
default_processor_config = self.get_default_config(processor_type = child.processor_type)
self.new_config['processor'][child.processor_type.value] = merge_dicts(current_processor_config, default_processor_config)
elif isinstance(child, LabelledTextField) and child.processor_type in self.new_config['processor']:
current_processor_config = self.current_config["processor"].get(child.processor_type, {})
default_processor_config = self.get_default_config(processor_type=child.processor_type)
self.new_config["processor"][child.processor_type.value] = merge_dicts(
current_processor_config, default_processor_config
)
elif isinstance(child, LabelledTextField) and child.processor_type in self.new_config["processor"]:
if child.processor_type == ProcessorType.Conversation:
self.new_config['processor'][child.processor_type.value]['openai-api-key'] = child.input_field.toPlainText() if child.input_field.toPlainText() != '' else None
self.new_config["processor"][child.processor_type.value]["openai-api-key"] = (
child.input_field.toPlainText() if child.input_field.toPlainText() != "" else None
)
def save_settings_to_file(self) -> bool:
"Save validated settings to file"
@@ -278,7 +297,7 @@ class MainWindow(QtWidgets.QMainWindow):
self.show()
self.setWindowState(Qt.WindowState.WindowActive)
self.activateWindow() # For Bringing to Top on Windows
self.raise_() # For Bringing to Top from Minimized State on OSX
self.raise_() # For Bringing to Top from Minimized State on OSX
class SettingsLoader(QObject):
@@ -312,6 +331,7 @@ class ProcessorCheckBox(QtWidgets.QCheckBox):
self.processor_type = processor_type
super(ProcessorCheckBox, self).__init__(text, parent=parent)
class ErrorType(Enum):
"Error Types"
ConfigLoadingError = "Config Loading Error"

View File

@@ -5,10 +5,11 @@ import webbrowser
from PyQt6 import QtGui, QtWidgets
# Internal Packages
from src.utils import constants, state
from khoj.utils import constants, state
from khoj.interface.desktop.main_window import MainWindow
def create_system_tray(gui: QtWidgets.QApplication, main_window: QtWidgets.QMainWindow):
def create_system_tray(gui: QtWidgets.QApplication, main_window: MainWindow):
"""Create System Tray with Menu. Menu contain options to
1. Open Search Page on the Web Interface
2. Open App Configuration Screen
@@ -16,23 +17,23 @@ def create_system_tray(gui: QtWidgets.QApplication, main_window: QtWidgets.QMain
"""
# Create the system tray with icon
icon_path = constants.web_directory / 'assets/icons/favicon-144x144.png'
icon = QtGui.QIcon(f'{icon_path.absolute()}')
icon_path = constants.web_directory / "assets/icons/favicon-144x144.png"
icon = QtGui.QIcon(f"{icon_path.absolute()}")
tray = QtWidgets.QSystemTrayIcon(icon)
tray.setVisible(True)
# Create the menu and menu actions
menu = QtWidgets.QMenu()
menu_actions = [
('Search', lambda: webbrowser.open(f'http://{state.host}:{state.port}/')),
('Configure', main_window.show_on_top),
('Quit', gui.quit),
("Search", lambda: webbrowser.open(f"http://{state.host}:{state.port}/")),
("Configure", main_window.show_on_top),
("Quit", gui.quit),
]
# Add the menu actions to the menu
for action_text, action_function in menu_actions:
menu_action = QtGui.QAction(action_text, menu)
menu_action.triggered.connect(action_function)
menu_action.triggered.connect(action_function) # type: ignore[attr-defined]
menu.addAction(menu_action)
# Add the menu to the system tray

View File

@@ -26,4 +26,4 @@ span.config-element-value {
button {
cursor: pointer;
}
}

View File

@@ -10,7 +10,7 @@ var emptyValueDefault = "🖊️";
/**
* Fetch the existing config file.
*/
fetch("/config/data")
fetch("/api/config/data")
.then(response => response.json())
.then(data => {
rawConfig = data;
@@ -26,7 +26,7 @@ fetch("/config/data")
configForm.addEventListener("submit", (event) => {
event.preventDefault();
console.log(rawConfig);
fetch("/config/data", {
fetch("/api/config/data", {
method: "POST",
credentials: "same-origin",
headers: {
@@ -46,7 +46,7 @@ regenerateButton.addEventListener("click", (event) => {
event.preventDefault();
regenerateButton.style.cursor = "progress";
regenerateButton.disabled = true;
fetch("/regenerate")
fetch("/api/update?force=true")
.then(response => response.json())
.then(data => {
regenerateButton.style.cursor = "pointer";
@@ -56,10 +56,10 @@ regenerateButton.addEventListener("click", (event) => {
})
/**
* Adds config elements to the DOM representing the sub-components
* Adds config elements to the DOM representing the sub-components
* of one of the fields in the raw config file.
* @param {the parent element} element
* @param {the data to be rendered for this element and its children} data
* @param {the parent element} element
* @param {the data to be rendered for this element and its children} data
*/
function processChildren(element, data) {
for (let key in data) {
@@ -78,11 +78,11 @@ function processChildren(element, data) {
}
/**
* Takes an element, and replaces it with an editable
* Takes an element, and replaces it with an editable
* element with the same data in place.
* @param {the original element to be replaced} original
* @param {the source data to be rendered for the new element} data
* @param {the key for this input in the source data} key
* @param {the original element to be replaced} original
* @param {the source data to be rendered for the new element} data
* @param {the key for this input in the source data} key
*/
function makeElementEditable(original, data, key) {
original.addEventListener("click", () => {
@@ -98,8 +98,8 @@ function makeElementEditable(original, data, key) {
/**
* Creates a node corresponding to the value of a config element.
* @param {the source data} data
* @param {the key corresponding to this node's data} key
* @param {the source data} data
* @param {the key corresponding to this node's data} key
* @returns A new element which corresponds to the value in some field.
*/
function createValueNode(data, key) {
@@ -111,11 +111,11 @@ function createValueNode(data, key) {
}
/**
* Replaces an existing input element with an element with the same data, which is not an input.
* Replaces an existing input element with an element with the same data, which is not an input.
* If the input data for this element was changed, update the corresponding data in the raw config.
* @param {the original element to be replaced} original
* @param {the source data} data
* @param {the key corresponding to this node's data} key
* @param {the original element to be replaced} original
* @param {the source data} data
* @param {the key corresponding to this node's data} key
*/
function fixInputOnFocusOut(original, data, key) {
original.addEventListener("blur", () => {

View File

Before

Width:  |  Height:  |  Size: 26 KiB

After

Width:  |  Height:  |  Size: 26 KiB

View File

Before

Width:  |  Height:  |  Size: 159 KiB

After

Width:  |  Height:  |  Size: 159 KiB

View File

Before

Width:  |  Height:  |  Size: 29 KiB

After

Width:  |  Height:  |  Size: 29 KiB

View File

@@ -1,6 +1,6 @@
/*! markdown-it 13.0.1 https://github.com/markdown-it/markdown-it @license MIT */
(function(global, factory) {
typeof exports === "object" && typeof module !== "undefined" ? module.exports = factory() : typeof define === "function" && define.amd ? define(factory) : (global = typeof globalThis !== "undefined" ? globalThis : global || self,
typeof exports === "object" && typeof module !== "undefined" ? module.exports = factory() : typeof define === "function" && define.amd ? define(factory) : (global = typeof globalThis !== "undefined" ? globalThis : global || self,
global.markdownit = factory());
})(this, (function() {
"use strict";
@@ -2164,7 +2164,7 @@
var encodeCache = {};
// Create a lookup array where anything but characters in `chars` string
// and alphanumeric chars is percent-encoded.
function getEncodeCache(exclude) {
var i, ch, cache = encodeCache[exclude];
if (cache) {
@@ -2187,11 +2187,11 @@
}
// Encode unsafe characters with percent-encoding, skipping already
// encoded sequences.
// - string - string to encode
// - exclude - list of characters to ignore (in addition to a-zA-Z0-9)
// - keepEscaped - don't encode '%' in a correct escape sequence (default: true)
function encode$2(string, exclude, keepEscaped) {
var i, l, code, nextCode, cache, result = "";
if (typeof exclude !== "string") {
@@ -2253,7 +2253,7 @@
return cache;
}
// Decode percent-encoded string.
function decode$2(string, exclude) {
var cache;
if (typeof exclude !== "string") {
@@ -2340,26 +2340,26 @@
return result;
};
// Copyright Joyent, Inc. and other Node contributors.
// Changes from joyent/node:
// 1. No leading slash in paths,
// e.g. in `url.parse('http://foo?bar')` pathname is ``, not `/`
// 2. Backslashes are not replaced with slashes,
// so `http:\\example.org\` is treated like a relative path
// 3. Trailing colon is treated like a part of the path,
// i.e. in `http://example.org:foo` pathname is `:foo`
// 4. Nothing is URL-encoded in the resulting object,
// (in joyent/node some chars in auth and paths are encoded)
// 5. `url.parse()` does not have `parseQueryString` argument
// 6. Removed extraneous result properties: `host`, `path`, `query`, etc.,
// which can be constructed using other parts of the url.
function Url() {
this.protocol = null;
this.slashes = null;
@@ -2373,28 +2373,28 @@
// Reference: RFC 3986, RFC 1808, RFC 2396
// define these here so at least they only have to be
// compiled once on the first module load.
var protocolPattern = /^([a-z0-9.+-]+:)/i, portPattern = /:[0-9]*$/,
var protocolPattern = /^([a-z0-9.+-]+:)/i, portPattern = /:[0-9]*$/,
// Special case for a simple path URL
simplePathPattern = /^(\/\/?(?!\/)[^\?\s]*)(\?[^\s]*)?$/,
simplePathPattern = /^(\/\/?(?!\/)[^\?\s]*)(\?[^\s]*)?$/,
// RFC 2396: characters reserved for delimiting URLs.
// We actually just auto-escape these.
delims = [ "<", ">", '"', "`", " ", "\r", "\n", "\t" ],
delims = [ "<", ">", '"', "`", " ", "\r", "\n", "\t" ],
// RFC 2396: characters not allowed for various reasons.
unwise = [ "{", "}", "|", "\\", "^", "`" ].concat(delims),
unwise = [ "{", "}", "|", "\\", "^", "`" ].concat(delims),
// Allowed by RFCs, but cause of XSS attacks. Always escape these.
autoEscape = [ "'" ].concat(unwise),
autoEscape = [ "'" ].concat(unwise),
// Characters that are never ever allowed in a hostname.
// Note that any invalid chars are also handled, but these
// are the ones that are *expected* to be seen, so we fast-path
// them.
nonHostChars = [ "%", "/", "?", ";", "#" ].concat(autoEscape), hostEndingChars = [ "/", "?", "#" ], hostnameMaxLen = 255, hostnamePartPattern = /^[+a-z0-9A-Z_-]{0,63}$/, hostnamePartStart = /^([+a-z0-9A-Z_-]{0,63})(.*)$/,
nonHostChars = [ "%", "/", "?", ";", "#" ].concat(autoEscape), hostEndingChars = [ "/", "?", "#" ], hostnameMaxLen = 255, hostnamePartPattern = /^[+a-z0-9A-Z_-]{0,63}$/, hostnamePartStart = /^([+a-z0-9A-Z_-]{0,63})(.*)$/,
// protocols that can allow "unsafe" and "unwise" chars.
/* eslint-disable no-script-url */
// protocols that never have a hostname.
hostlessProtocol = {
javascript: true,
"javascript:": true
},
},
// protocols that always contain a // bit.
slashedProtocol = {
http: true,
@@ -2632,7 +2632,7 @@
return _hasOwnProperty.call(object, key);
}
// Merge objects
function assign(obj /*from1, from2, from3, ...*/) {
var sources = Array.prototype.slice.call(arguments, 1);
sources.forEach((function(source) {
@@ -2798,12 +2798,12 @@
return regex$4.test(ch);
}
// Markdown ASCII punctuation characters.
// !, ", #, $, %, &, ', (, ), *, +, ,, -, ., /, :, ;, <, =, >, ?, @, [, \, ], ^, _, `, {, |, }, or ~
// http://spec.commonmark.org/0.15/#ascii-punctuation-character
// Don't confuse with unicode punctuation !!! It lacks some chars in ascii range.
function isMdAsciiPunct(ch) {
switch (ch) {
case 33 /* ! */ :
@@ -2845,58 +2845,58 @@
}
}
// Hepler to unify [reference labels].
function normalizeReference(str) {
// Trim and collapse whitespace
str = str.trim().replace(/\s+/g, " ");
// In node v10 'ẞ'.toLowerCase() === 'Ṿ', which is presumed to be a bug
// fixed in v12 (couldn't find any details).
// So treat this one as a special case
// (remove this when node v10 is no longer supported).
if ("\u1e9e".toLowerCase() === "\u1e7e") {
str = str.replace(/\u1e9e/g, "\xdf");
}
// .toLowerCase().toUpperCase() should get rid of all differences
// between letter variants.
// Simple .toLowerCase() doesn't normalize 125 code points correctly,
// and .toUpperCase doesn't normalize 6 of them (list of exceptions:
// İ, ϴ, ẞ, Ω, , Å - those are already uppercased, but have differently
// uppercased versions).
// Here's an example showing how it happens. Lets take greek letter omega:
// uppercase U+0398 (Θ), U+03f4 (ϴ) and lowercase U+03b8 (θ), U+03d1 (ϑ)
// Unicode entries:
// 0398;GREEK CAPITAL LETTER THETA;Lu;0;L;;;;;N;;;;03B8;
// 03B8;GREEK SMALL LETTER THETA;Ll;0;L;;;;;N;;;0398;;0398
// 03D1;GREEK THETA SYMBOL;Ll;0;L;<compat> 03B8;;;;N;GREEK SMALL LETTER SCRIPT THETA;;0398;;0398
// 03F4;GREEK CAPITAL THETA SYMBOL;Lu;0;L;<compat> 0398;;;;N;;;;03B8;
// Case-insensitive comparison should treat all of them as equivalent.
// But .toLowerCase() doesn't change ϑ (it's already lowercase),
// and .toUpperCase() doesn't change ϴ (already uppercase).
// Applying first lower then upper case normalizes any character:
// '\u0398\u03f4\u03b8\u03d1'.toLowerCase().toUpperCase() === '\u0398\u0398\u0398\u0398'
// Note: this is equivalent to unicode case folding; unicode normalization
// is a different step that is not required here.
// Final result should be uppercased, because it's later stored in an object
// (this avoid a conflict with Object.prototype members,
// most notably, `__proto__`)
return str.toLowerCase().toUpperCase();
}
////////////////////////////////////////////////////////////////////////////////
// Re-export libraries commonly used in both markdown-it and its plugins,
// so plugins won't have to depend on them explicitly, which reduces their
// bundled size (e.g. a browser build).
exports.lib = {};
exports.lib.mdurl = mdurl;
exports.lib.ucmicro = uc_micro;
@@ -3129,7 +3129,7 @@
var token = tokens[idx];
// "alt" attr MUST be set, even if empty. Because it's mandatory and
// should be placed on proper position for tests.
// Replace content with actual value
token.attrs[token.attrIndex("alt")][1] = slf.renderInlineAsText(token.children, options, env);
return slf.renderToken(tokens, idx, options);
@@ -3215,11 +3215,11 @@
}
// Insert a newline between hidden paragraph and subsequent opening
// block-level tag.
// For example, here we should insert a newline before blockquote:
// - a
// >
if (token.block && token.nesting !== -1 && idx && tokens[idx - 1].hidden) {
result += "\n";
}
@@ -3343,16 +3343,16 @@
// }
this.__rules__ = [];
// Cached rule chains.
// First level - chain name, '' for default.
// Second level - diginal anchor for fast filtering by charcodes.
this.__cache__ = null;
}
////////////////////////////////////////////////////////////////////////////////
// Helper methods, should not be used directly
// Find rule index by name
Ruler.prototype.__find__ = function(name) {
for (var i = 0; i < this.__rules__.length; i++) {
if (this.__rules__[i].name === name) {
@@ -3362,7 +3362,7 @@
return -1;
};
// Build rules lookup cache
Ruler.prototype.__compile__ = function() {
var self = this;
var chains = [ "" ];
@@ -3726,7 +3726,7 @@
// Linkifier might send raw hostnames like "example.com", where url
// starts with domain name. So we prepend http:// in those cases,
// and remove it afterwards.
if (!links[ln].schema) {
urlText = state.md.normalizeLinkText("http://" + urlText).replace(/^http:\/\//, "");
} else if (links[ln].schema === "mailto:" && !/^mailto:/i.test(urlText)) {
@@ -3874,7 +3874,7 @@
isSingle = t[0] === "'";
// Find previous character,
// default to space if it's the beginning of the line
lastChar = 32;
if (t.index - 1 >= 0) {
lastChar = text.charCodeAt(t.index - 1);
@@ -3890,7 +3890,7 @@
}
// Find next character,
// default to space if it's the end of the line
nextChar = 32;
if (pos < max) {
nextChar = text.charCodeAt(pos);
@@ -4193,7 +4193,7 @@
// re-export Token class to use in core rules
StateCore.prototype.Token = token;
var state_core = StateCore;
var _rules$2 = [ [ "normalize", normalize ], [ "block", block ], [ "inline", inline ], [ "linkify", linkify$1 ], [ "replacements", replacements ], [ "smartquotes", smartquotes ],
var _rules$2 = [ [ "normalize", normalize ], [ "block", block ], [ "inline", inline ], [ "linkify", linkify$1 ], [ "replacements", replacements ], [ "smartquotes", smartquotes ],
// `text_join` finds `text_special` tokens (for escape sequences)
// and joins them with the rest of the text
[ "text_join", text_join ] ];
@@ -4590,12 +4590,12 @@
oldParentType = state.parentType;
state.parentType = "blockquote";
// Search the end of the block
// Block ends with either:
// 1. an empty line outside:
// ```
// > test
// ```
// 2. an empty line inside:
// ```
@@ -4712,7 +4712,7 @@
oldTShift.push(state.tShift[nextLine]);
oldSCount.push(state.sCount[nextLine]);
// A negative indentation means that this is a paragraph continuation
state.sCount[nextLine] = -1;
}
oldIndent = state.blkIndent;
@@ -4905,9 +4905,9 @@
}
token.map = listLines = [ startLine, 0 ];
token.markup = String.fromCharCode(markerCharCode);
// Iterate list items
nextLine = startLine;
prevEmptyEnd = false;
terminatorRules = state.md.block.ruler.getRules("list");
@@ -4957,7 +4957,7 @@
// - example list
// ^ listIndent position will be here
// ^ blkIndent position will be here
oldListIndent = state.listIndent;
state.listIndent = state.blkIndent;
state.blkIndent = indent;
@@ -4995,9 +4995,9 @@
if (nextLine >= endLine) {
break;
}
// Try to check if list is terminated or continued.
if (state.sCount[nextLine] < state.blkIndent) {
break;
}
@@ -5245,7 +5245,7 @@
var HTML_OPEN_CLOSE_TAG_RE = html_re.HTML_OPEN_CLOSE_TAG_RE;
// An array of opening and corresponding closing sequences for html tags,
// last argument defines whether it can terminate a paragraph or not
var HTML_SEQUENCES = [ [ /^<(script|pre|style|textarea)(?=(\s|>|$))/i, /<\/(script|pre|style|textarea)>/i, true ], [ /^<!--/, /-->/, true ], [ /^<\?/, /\?>/, true ], [ /^<![A-Z]/, />/, true ], [ /^<!\[CDATA\[/, /\]\]>/, true ], [ new RegExp("^</?(" + html_blocks.join("|") + ")(?=(\\s|/?>|$))", "i"), /^$/, true ], [ new RegExp(HTML_OPEN_CLOSE_TAG_RE.source + "\\s*$"), /^$/, false ] ];
var html_block = function html_block(state, startLine, endLine, silent) {
var i, nextLine, token, lineText, pos = state.bMarks[startLine] + state.tShift[startLine], max = state.eMarks[startLine];
@@ -5357,9 +5357,9 @@
if (state.sCount[nextLine] - state.blkIndent > 3) {
continue;
}
// Check for underline in setext header
if (state.sCount[nextLine] >= state.blkIndent) {
pos = state.bMarks[nextLine] + state.tShift[nextLine];
max = state.eMarks[nextLine];
@@ -5456,9 +5456,9 @@
// link to parser instance
this.md = md;
this.env = env;
// Internal state vartiables
this.tokens = tokens;
this.bMarks = [];
// line begin offsets for fast jumps
@@ -5470,14 +5470,14 @@
// indents for each line (tabs expanded)
// An amount of virtual spaces (tabs expanded) between beginning
// of each line (bMarks) and real beginning of that line.
// It exists only as a hack because blockquotes override bMarks
// losing information in the process.
// It's used only when expanding tabs, you can think about it as
// an initial tab length, e.g. bsCount=21 applied to string `\t123`
// means first tab should be expanded to 4-21%4 === 3 spaces.
this.bsCount = [];
// block parser variables
this.blkIndent = 0;
@@ -5543,7 +5543,7 @@
// don't count last fake line
}
// Push new token to "stream".
StateBlock.prototype.push = function(type, tag, nesting) {
var token$1 = new token(type, tag, nesting);
token$1.block = true;
@@ -5655,7 +5655,7 @@
// re-export Token class to use in block rules
StateBlock.prototype.Token = token;
var state_block = StateBlock;
var _rules$1 = [
var _rules$1 = [
// First 2 params - rule name & source. Secondary array - list of rules,
// which can be terminated by this one.
[ "table", table, [ "paragraph", "reference" ] ], [ "code", code ], [ "fence", fence, [ "paragraph", "reference", "blockquote", "list" ] ], [ "blockquote", blockquote, [ "paragraph", "reference", "blockquote", "list" ] ], [ "hr", hr, [ "paragraph", "reference", "blockquote", "list" ] ], [ "list", list, [ "paragraph", "reference", "blockquote" ] ], [ "reference", reference ], [ "html_block", html_block, [ "paragraph", "reference", "blockquote" ] ], [ "heading", heading, [ "paragraph", "reference", "blockquote" ] ], [ "lheading", lheading ], [ "paragraph", paragraph ] ];
@@ -5675,7 +5675,7 @@
}
}
// Generate tokens for input range
ParserBlock.prototype.tokenize = function(state, startLine, endLine) {
var ok, i, rules = this.ruler.getRules(""), len = rules.length, line = startLine, hasEmptyLines = false, maxNesting = state.md.options.maxNesting;
while (line < endLine) {
@@ -5696,7 +5696,7 @@
}
// Try all possible rules.
// On success, rule should:
// - update `state.line`
// - update `state.tokens`
// - return true
@@ -5961,7 +5961,7 @@
};
// ~~strike through~~
// Insert each marker as a separate text token, and add it to delimiter list
var tokenize$1 = function strikethrough(state, silent) {
var i, scanned, token, len, ch, start = state.pos, marker = state.src.charCodeAt(start);
if (silent) {
@@ -6027,9 +6027,9 @@
// If a marker sequence has an odd number of characters, it's splitted
// like this: `~~~~~` -> `~` + `~~` + `~~`, leaving one marker at the
// start of the sequence.
// So, we have to move all those markers after subsequent s_close tags.
while (loneMarkers.length) {
i = loneMarkers.pop();
j = i + 1;
@@ -6045,7 +6045,7 @@
}
}
// Walk through delimiter list and replace text tokens with tags
var postProcess_1$1 = function strikethrough(state) {
var curr, tokens_meta = state.tokens_meta, max = state.tokens_meta.length;
postProcess$1(state, state.delimiters);
@@ -6061,7 +6061,7 @@
};
// Process *this* and _that_
// Insert each marker as a separate text token, and add it to delimiter list
var tokenize = function emphasis(state, silent) {
var i, scanned, token, start = state.pos, marker = state.src.charCodeAt(start);
if (silent) {
@@ -6107,12 +6107,12 @@
endDelim = delimiters[startDelim.end];
// If the previous delimiter has the same marker and is adjacent to this one,
// merge those into one strong delimiter.
// `<em><em>whatever</em></em>` -> `<strong>whatever</strong>`
isStrong = i > 0 && delimiters[i - 1].end === startDelim.end + 1 &&
isStrong = i > 0 && delimiters[i - 1].end === startDelim.end + 1 &&
// check that first two markers match and adjacent
delimiters[i - 1].marker === startDelim.marker && delimiters[i - 1].token === startDelim.token - 1 &&
delimiters[i - 1].marker === startDelim.marker && delimiters[i - 1].token === startDelim.token - 1 &&
// check that last two markers are adjacent (we can safely assume they match)
delimiters[startDelim.end + 1].token === endDelim.token + 1;
ch = String.fromCharCode(startDelim.marker);
@@ -6136,7 +6136,7 @@
}
}
// Walk through delimiter list and replace text tokens with tags
var postProcess_1 = function emphasis(state) {
var curr, tokens_meta = state.tokens_meta, max = state.tokens_meta.length;
postProcess(state, state.delimiters);
@@ -6251,10 +6251,10 @@
href = ref.href;
title = ref.title;
}
// We found the end of the link, and know for a fact it's a valid link;
// so all that's left to do is to call tokenizer.
if (!silent) {
state.pos = labelStart;
state.posMax = labelEnd;
@@ -6375,10 +6375,10 @@
href = ref.href;
title = ref.title;
}
// We found the end of the link, and know for a fact it's a valid link;
// so all that's left to do is to call tokenizer.
if (!silent) {
content = state.src.slice(labelStart, labelEnd);
state.md.inline.parse(content, state.md, state.env, tokens = []);
@@ -6547,7 +6547,7 @@
// markers belong to same delimiter run if:
// - they have adjacent tokens
// - AND markers are the same
if (delimiters[headerIdx].marker !== closer.marker || lastTokenIdx !== closer.token - 1) {
headerIdx = closerIdx;
}
@@ -6555,7 +6555,7 @@
// Length is only used for emphasis-specific "rule of 3",
// if it's not defined (in strikethrough or 3rd party plugins),
// we can default it to 0 to disable those checks.
closer.length = closer.length || 0;
if (!closer.close) continue;
// Previously calculated lower bounds (previous fails)
@@ -6574,12 +6574,12 @@
if (opener.open && opener.end < 0) {
isOddMatch = false;
// from spec:
// If one of the delimiters can both open and close emphasis, then the
// sum of the lengths of the delimiter runs containing the opening and
// closing delimiters must not be a multiple of 3 unless both lengths
// are multiples of 3.
if (opener.close || closer.open) {
if ((opener.length + closer.length) % 3 === 0) {
if (opener.length % 3 !== 0 || closer.length % 3 !== 0) {
@@ -6678,7 +6678,7 @@
this.linkLevel = 0;
}
// Flush pending text
StateInline.prototype.pushPending = function() {
var token$1 = new token("text", "", 0);
token$1.content = this.pending;
@@ -6689,7 +6689,7 @@
};
// Push new token to "stream".
// If pending text exists - flush it as text token
StateInline.prototype.push = function(type, tag, nesting) {
if (this.pending) {
this.pushPending();
@@ -6718,10 +6718,10 @@
};
// Scan a sequence of emphasis-like markers, and determine whether
// it can start an emphasis sequence or end an emphasis sequence.
// - start - position to scan from (it should point at a valid marker);
// - canSplitWord - determine if these markers can be found inside a word
StateInline.prototype.scanDelims = function(start, canSplitWord) {
var pos = start, lastChar, nextChar, count, can_open, can_close, isLastWhiteSpace, isLastPunctChar, isNextWhiteSpace, isNextPunctChar, left_flanking = true, right_flanking = true, max = this.posMax, marker = this.src.charCodeAt(start);
// treat beginning of the line as a whitespace
@@ -6771,10 +6771,10 @@
var _rules = [ [ "text", text ], [ "linkify", linkify ], [ "newline", newline ], [ "escape", _escape ], [ "backticks", backticks ], [ "strikethrough", strikethrough.tokenize ], [ "emphasis", emphasis.tokenize ], [ "link", link ], [ "image", image ], [ "autolink", autolink ], [ "html_inline", html_inline ], [ "entity", entity ] ];
// `rule2` ruleset was created specifically for emphasis/strikethrough
// post-processing and may be changed in the future.
// Don't use this for anything except pairs (plugins working with `balance_pairs`).
var _rules2 = [ [ "balance_pairs", balance_pairs ], [ "strikethrough", strikethrough.postProcess ], [ "emphasis", emphasis.postProcess ],
var _rules2 = [ [ "balance_pairs", balance_pairs ], [ "strikethrough", strikethrough.postProcess ], [ "emphasis", emphasis.postProcess ],
// rules for pairs separate '**' into its own text tokens, which may be left unused,
// rule below merges unused segments back with the rest of the text
[ "fragments_join", fragments_join ] ];
@@ -6802,7 +6802,7 @@
}
// Skip single token by running all rules in validation mode;
// returns `true` if any rule reported success
ParserInline.prototype.skipToken = function(state) {
var ok, i, pos = state.pos, rules = this.ruler.getRules(""), len = rules.length, maxNesting = state.md.options.maxNesting, cache = state.cache;
if (typeof cache[pos] !== "undefined") {
@@ -6837,7 +6837,7 @@
cache[pos] = state.pos;
};
// Generate tokens for input range
ParserInline.prototype.tokenize = function(state) {
var ok, i, rules = this.ruler.getRules(""), len = rules.length, end = state.posMax, maxNesting = state.md.options.maxNesting;
while (state.pos < end) {
@@ -6928,11 +6928,11 @@
re.src_xn = "xn--[a-z0-9\\-]{1,59}";
// More to read about domain names
// http://serverfault.com/questions/638260/
re.src_domain_root =
re.src_domain_root =
// Allow letters & digits (http://test1)
"(?:" + re.src_xn + "|" + re.src_pseudo_letter + "{1,63}" + ")";
re.src_domain = "(?:" + re.src_xn + "|" + "(?:" + re.src_pseudo_letter + ")" + "|" + "(?:" + re.src_pseudo_letter + "(?:-|" + re.src_pseudo_letter + "){0,61}" + re.src_pseudo_letter + ")" + ")";
re.src_host = "(?:" +
re.src_host = "(?:" +
// Don't need IP check, because digits are already allowed in normal domain names
// src_ip4 +
// '|' +
@@ -6949,11 +6949,11 @@
// Rude test fuzzy links by host, for quick deny
re.tpl_host_fuzzy_test = "localhost|www\\.|\\.\\d{1,3}\\.|(?:\\.(?:%TLDS%)(?:" + re.src_ZPCc + "|>|$))";
re.tpl_email_fuzzy = "(^|" + text_separators + '|"|\\(|' + re.src_ZCc + ")" + "(" + re.src_email_name + "@" + re.tpl_host_fuzzy_strict + ")";
re.tpl_link_fuzzy =
re.tpl_link_fuzzy =
// Fuzzy link can't be prepended with .:/\- and non punctuation.
// but can start with > (markdown blockquote)
"(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|" + re.src_ZPCc + "))" + "((?![$+<=>^`|\uff5c])" + re.tpl_host_port_fuzzy_strict + re.src_path + ")";
re.tpl_link_no_ip_fuzzy =
re.tpl_link_no_ip_fuzzy =
// Fuzzy link can't be prepended with .:/\- and non punctuation.
// but can start with > (markdown blockquote)
"(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|" + re.src_ZPCc + "))" + "((?![$+<=>^`|\uff5c])" + re.tpl_host_port_no_ip_fuzzy_strict + re.src_path + ")";
@@ -6962,7 +6962,7 @@
////////////////////////////////////////////////////////////////////////////////
// Helpers
// Merge objects
function assign(obj /*from1, from2, from3, ...*/) {
var sources = Array.prototype.slice.call(arguments, 1);
sources.forEach((function(source) {
@@ -7025,7 +7025,7 @@
var tail = text.slice(pos);
if (!self.re.no_http) {
// compile lazily, because "host"-containing variables can change on tlds update.
self.re.no_http = new RegExp("^" + self.re.src_auth +
self.re.no_http = new RegExp("^" + self.re.src_auth +
// Don't allow single-level domains, because of false positives like '//test'
// with code comments
"(?:localhost|(?:(?:" + self.re.src_domain + ")\\.)+" + self.re.src_domain_root + ")" + self.re.src_port + self.re.src_host_terminator + self.re.src_path, "i");
@@ -7082,7 +7082,7 @@
};
}
// Schemas compiler. Build regexps.
function compile(self) {
// Load & clone RE patterns.
var re$1 = self.re = re(self.__opts__);
@@ -7101,9 +7101,9 @@
re$1.link_fuzzy = RegExp(untpl(re$1.tpl_link_fuzzy), "i");
re$1.link_no_ip_fuzzy = RegExp(untpl(re$1.tpl_link_no_ip_fuzzy), "i");
re$1.host_fuzzy_test = RegExp(untpl(re$1.tpl_host_fuzzy_test), "i");
// Compile each schema
var aliases = [];
self.__compiled__ = {};
// Reset compiled data
@@ -7144,9 +7144,9 @@
}
schemaError(name, val);
}));
// Compile postponed aliases
aliases.forEach((function(alias) {
if (!self.__compiled__[self.__schemas__[alias]]) {
// Silently fail on missed schemas to avoid errons on disable.
@@ -7156,16 +7156,16 @@
self.__compiled__[alias].validate = self.__compiled__[self.__schemas__[alias]].validate;
self.__compiled__[alias].normalize = self.__compiled__[self.__schemas__[alias]].normalize;
}));
// Fake record for guessed links
self.__compiled__[""] = {
validate: null,
normalize: createNormalizer()
};
// Build schema condition
var slist = Object.keys(self.__compiled__).filter((function(name) {
// Filter disabled & fake schemas
return name.length > 0 && self.__compiled__[name];
@@ -7175,9 +7175,9 @@
self.re.schema_search = RegExp("(^|(?!_)(?:[><\uff5c]|" + re$1.src_ZPCc + "))(" + slist + ")", "ig");
self.re.schema_at_start = RegExp("^" + self.re.schema_search.source, "i");
self.re.pretest = RegExp("(" + self.re.schema_test.source + ")|(" + self.re.host_fuzzy_test.source + ")|@", "i");
// Cleanup
resetScanCache(self);
}
/**
@@ -7673,7 +7673,7 @@
* @returns {String} The resulting string of Unicode symbols.
*/ function decode(input) {
// Don't use UCS-2
var output = [], inputLength = input.length, out, i = 0, n = initialN, bias = initialBias, basic, j, index, oldi, w, k, digit, t,
var output = [], inputLength = input.length, out, i = 0, n = initialN, bias = initialBias, basic, j, index, oldi, w, k, digit, t,
/** Cached calculation results */
baseMinusT;
// Handle the basic code points: let `basic` be the number of input code
@@ -7738,9 +7738,9 @@
* @param {String} input The string of Unicode symbols.
* @returns {String} The resulting Punycode string of ASCII-only symbols.
*/ function encode(input) {
var n, delta, handledCPCount, basicLength, bias, j, m, q, k, t, currentValue, output = [],
var n, delta, handledCPCount, basicLength, bias, j, m, q, k, t, currentValue, output = [],
/** `inputLength` will hold the number of code points in `input`. */
inputLength,
inputLength,
/** Cached calculation results */
handledCPCountPlusOne, baseMinusT, qMinusT;
// Convert the input in UCS-2 to Unicode
@@ -7993,13 +7993,13 @@
commonmark: commonmark
};
////////////////////////////////////////////////////////////////////////////////
// This validator can prohibit more than really needed to prevent XSS. It's a
// tradeoff to keep code simple and to be secure by default.
// If you need different setup - override validator method as you wish. Or
// replace it with dummy function and use external sanitizer.
var BAD_PROTO_RE = /^(vbscript|javascript|file|data):/;
var GOOD_DATA_RE = /^data:image\/(gif|png|jpeg|webp);/;
function validateLink(url) {

View File

@@ -0,0 +1,283 @@
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0 maximum-scale=1.0">
<title>Khoj</title>
<link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 144 144%22><text y=%22.86em%22 font-size=%22144%22>🦅</text></svg>">
<link rel="icon" type="image/png" sizes="144x144" href="/static/assets/icons/favicon-144x144.png">
<link rel="manifest" href="/static/khoj_chat.webmanifest">
</head>
<script>
function formatDate(date) {
// Format date in HH:MM, DD MMM YYYY format
let time_string = date.toLocaleTimeString('en-IN', { hour: '2-digit', minute: '2-digit', hour12: false });
let date_string = date.toLocaleString('en-IN', { year: 'numeric', month: 'short', day: '2-digit'}).replaceAll('-', ' ');
return `${time_string}, ${date_string}`;
}
function generateReference(reference, index) {
// Escape reference for HTML rendering
let escaped_ref = reference.replaceAll('"', '&quot;');
// Generate HTML for Chat Reference
return `<sup><abbr title="${escaped_ref}" tabindex="0">${index}</abbr></sup>`;
}
function renderMessage(message, by, dt=null) {
let message_time = formatDate(dt ?? new Date());
let by_name = by == "khoj" ? "🦅 Khoj" : "🤔 You";
// Generate HTML for Chat Message and Append to Chat Body
document.getElementById("chat-body").innerHTML += `
<div data-meta="${by_name} at ${message_time}" class="chat-message ${by}">
<div class="chat-message-text ${by}">${message}</div>
</div>
`;
// Scroll to bottom of chat-body element
document.getElementById("chat-body").scrollTop = document.getElementById("chat-body").scrollHeight;
}
function renderMessageWithReference(message, by, context=null, dt=null) {
let references = '';
if (context) {
references = context
.map((reference, index) => generateReference(reference, index))
.join("<sup>,</sup>");
}
renderMessage(message+references, by, dt);
}
function chat() {
// Extract required fields for search from form
let query = document.getElementById("chat-input").value.trim();
console.log(`Query: ${query}`);
// Short circuit on empty query
if (query.length === 0)
return;
// Add message by user to chat body
renderMessage(query, "you");
document.getElementById("chat-input").value = "";
// Generate backend API URL to execute query
let url = `/api/chat?q=${encodeURIComponent(query)}`;
// Call specified Khoj API
fetch(url)
.then(response => response.json())
.then(data => {
// Render message by Khoj to chat body
console.log(data.response);
renderMessageWithReference(data.response, "khoj", data.context);
});
}
function incrementalChat(event) {
// Send chat message on 'Enter'
if (event.key === 'Enter') {
chat();
}
}
window.onload = function () {
fetch('/api/chat')
.then(response => response.json())
.then(data => data.response)
.then(chat_logs => {
// Render conversation history, if any
chat_logs.forEach(chat_log => {
renderMessageWithReference(chat_log.message, chat_log.by, chat_log.context, new Date(chat_log.created));
});
});
// Set welcome message on load
renderMessage("Hey, what's up?", "khoj");
// Fill query field with value passed in URL query parameters, if any.
var query_via_url = new URLSearchParams(window.location.search).get("q");
if (query_via_url) {
document.getElementById("chat-input").value = query_via_url;
chat();
}
}
</script>
<body>
<!-- Chat Header -->
<h1>Khoj</h1>
<!-- Chat Body -->
<div id="chat-body"></div>
<!-- Chat Footer -->
<div id="chat-footer">
<input type="text" id="chat-input" class="option" onkeyup=incrementalChat(event) autofocus="autofocus" placeholder="What is the meaning of life?">
</div>
</body>
<style>
html, body {
height: 100%;
width: 100%;
padding: 0px;
margin: 0px;
}
body {
display: grid;
background: #f8fafc;
color: #475569;
text-align: center;
font-family: roboto, karma, segoe ui, sans-serif;
font-size: 20px;
font-weight: 300;
line-height: 1.5em;
}
body > * {
padding: 10px;
margin: 10px;
}
h1 {
font-weight: 200;
color: #017eff;
}
#chat-body {
font-size: medium;
margin: 0px;
line-height: 20px;
overflow-y: scroll; /* Make chat body scroll to see history */
}
/* add chat metatdata to bottom of bubble */
.chat-message::after {
content: attr(data-meta);
display: block;
font-size: x-small;
color: #475569;
margin: -12px 7px 0 -5px;
}
/* move message by khoj to left */
.chat-message.khoj {
margin-left: auto;
text-align: left;
}
/* move message by you to right */
.chat-message.you {
margin-right: auto;
text-align: right;
}
/* basic style chat message text */
.chat-message-text {
margin: 10px;
border-radius: 10px;
padding: 10px;
position: relative;
display: inline-block;
max-width: 80%;
text-align: left;
}
/* color chat bubble by khoj blue */
.chat-message-text.khoj {
color: #f8fafc;
background: #017eff;
margin-left: auto;
white-space: pre-line;
}
/* add left protrusion to khoj chat bubble */
.chat-message-text.khoj:after {
content: '';
position: absolute;
bottom: -2px;
left: -7px;
border: 10px solid transparent;
border-top-color: #017eff;
border-bottom: 0;
transform: rotate(-60deg);
}
/* color chat bubble by you dark grey */
.chat-message-text.you {
color: #f8fafc;
background: #475569;
margin-right: auto;
}
/* add right protrusion to you chat bubble */
.chat-message-text.you:after {
content: '';
position: absolute;
top: 91%;
right: -2px;
border: 10px solid transparent;
border-left-color: #475569;
border-right: 0;
margin-top: -10px;
transform: rotate(-60deg)
}
#chat-footer {
padding: 0;
display: grid;
grid-template-columns: minmax(70px, 100%);
grid-column-gap: 10px;
grid-row-gap: 10px;
}
#chat-footer > * {
padding: 15px;
border-radius: 5px;
border: 1px solid #475569;
background: #f9fafc
}
.option:hover {
box-shadow: 0 0 11px #aaa;
}
#chat-input {
font-size: medium;
}
@media (pointer: coarse), (hover: none) {
abbr[title] {
position: relative;
padding-left: 4px; /* space references out to ease tapping */
}
abbr[title]:focus:after {
content: attr(title);
/* position tooltip */
position: absolute;
left: 16px; /* open tooltip to right of ref link, instead of on top of it */
width: auto;
z-index: 1; /* show tooltip above chat messages */
/* style tooltip */
background-color: #aaa;
color: #f8fafc;
border-radius: 2px;
box-shadow: 1px 1px 4px 0 rgba(0, 0, 0, 0.4);
font-size: 14px;
padding: 2px 4px;
}
}
@media only screen and (max-width: 600px) {
body {
grid-template-columns: 1fr;
grid-template-rows: auto minmax(80px, 100%) auto;
}
body > * {
grid-column: 1;
}
#chat-footer {
padding: 0;
margin: 4px;
grid-template-columns: auto;
}
}
@media only screen and (min-width: 600px) {
body {
grid-template-columns: auto min(70vw, 100%) auto;
grid-template-rows: auto minmax(80px, 100%) auto;
}
body > * {
grid-column: 2;
}
}
</style>
</html>

View File

@@ -16,7 +16,7 @@
return `
<a href="${item.entry}" class="image-link">
<img id=${item.score} src="${item.entry}?${Math.random()}"
title="Effective Score: ${item.score}, Meta: ${item.metadata_score}, Image: ${item.image_score}"
title="Effective Score: ${item.score}, Meta: ${item.additional.metadata_score}, Image: ${item.additional.image_score}"
class="image">
</a>`
}
@@ -56,17 +56,33 @@
} else if (type === "ledger") {
return render_ledger(query, data);
} else {
return `<pre id="json">${JSON.stringify(data, null, 2)}</pre>`;
return `<div id="results-plugin">`
+ data.map((item) => `<p>${item.entry}</p>`).join("\n")
+ `</div>`;
}
}
function search(rerank=false) {
query = document.getElementById("query").value;
// Extract required fields for search from form
query = document.getElementById("query").value.trim();
type = document.getElementById("type").value;
console.log(query, type);
results_count = document.getElementById("results-count").value || 6;
console.log(`Query: ${query}, Type: ${type}`);
// Short circuit on empty query
if (query.length === 0)
return;
// If set query field in url query param on rerank
if (rerank)
setQueryFieldInUrl(query);
// Generate Backend API URL to execute Search
url = type === "image"
? `/search?q=${query}&t=${type}&n=6`
: `/search?q=${query}&t=${type}&n=6&r=${rerank}`;
? `/api/search?q=${encodeURIComponent(query)}&t=${type}&n=${results_count}`
: `/api/search?q=${encodeURIComponent(query)}&t=${type}&n=${results_count}&r=${rerank}`;
// Execute Search and Render Results
fetch(url)
.then(response => response.json())
.then(data => {
@@ -78,9 +94,9 @@
});
}
function regenerate() {
function updateIndex() {
type = document.getElementById("type").value;
fetch(`/regenerate?t=${type}`)
fetch(`/api/update?t=${type}`)
.then(response => response.json())
.then(data => {
console.log(data);
@@ -89,7 +105,7 @@
});
}
function incremental_search(event) {
function incrementalSearch(event) {
type = document.getElementById("type").value;
// Search with reranking on 'Enter'
if (event.key === 'Enter') {
@@ -102,35 +118,52 @@
}
function populate_type_dropdown() {
// Populate type dropdown field with enabled search types only
var possible_search_types = ["org", "markdown", "ledger", "music", "image"];
fetch("/config/data")
// Populate type dropdown field with enabled content types only
fetch("/api/config/types")
.then(response => response.json())
.then(data => {
.then(enabled_types => {
document.getElementById("type").innerHTML =
possible_search_types
.filter(type => data["content-type"].hasOwnProperty(type) && data["content-type"][type])
enabled_types
.map(type => `<option value="${type}">${type.slice(0,1).toUpperCase() + type.slice(1)}</option>`)
.join('');
return enabled_types;
})
.then(() => {
// Set type field to search type passed in URL query parameter, if valid
.then(enabled_types => {
// Set type field to content type passed in URL query parameter, if valid
var type_via_url = new URLSearchParams(window.location.search).get("t");
if (type_via_url && possible_search_types.includes(type_via_url))
if (type_via_url && enabled_types.includes(type_via_url))
document.getElementById("type").value = type_via_url;
});
}
function setTypeInQueryParam(type) {
function setTypeFieldInUrl(type) {
var url = new URL(window.location.href);
url.searchParams.set("t", type.value);
window.history.pushState({}, "", url.href);
}
function setCountFieldInUrl(results_count) {
var url = new URL(window.location.href);
url.searchParams.set("n", results_count.value);
window.history.pushState({}, "", url.href);
}
function setQueryFieldInUrl(query) {
var url = new URL(window.location.href);
url.searchParams.set("q", query);
window.history.pushState({}, "", url.href);
}
window.onload = function () {
// Dynamically populate type dropdown based on enabled search types and type passed as URL query parameter
// Dynamically populate type dropdown based on enabled content types and type passed as URL query parameter
populate_type_dropdown();
// Set results count field with value passed in URL query parameters, if any.
var results_count = new URLSearchParams(window.location.search).get("n");
if (results_count)
document.getElementById("results-count").value = results_count;
// Fill query field with value passed in URL query parameters, if any.
var query_via_url = new URLSearchParams(window.location.search).get("q");
if (query_via_url)
@@ -142,15 +175,18 @@
<h1>Khoj</h1>
<!--Add Text Box To Enter Query, Trigger Incremental Search OnChange -->
<input type="text" id="query" onkeyup=incremental_search(event) autofocus="autofocus" placeholder="What is the meaning of life?">
<input type="text" id="query" class="option" onkeyup=incrementalSearch(event) autofocus="autofocus" placeholder="What is the meaning of life?">
<div id="options">
<!--Add Dropdown to Select Query Type -->
<select id="type" onchange="setTypeInQueryParam(this)"></select>
<select id="type" class="option" onchange="setTypeFieldInUrl(this)"></select>
<!--Add Button To Regenerate -->
<button id="regenerate" onclick="regenerate()">Regenerate</button>
</div>
<button id="update" class="option" onclick="updateIndex()">Update</button>
<!--Add Results Count Input To Set Results Count -->
<input type="number" id="results-count" min="1" max="100" value="6" placeholder="results count" onchange="setCountFieldInUrl(this)">
</div>
<!-- Section to Render Results -->
<div id="results"></div>
@@ -161,6 +197,7 @@
body {
display: grid;
grid-template-columns: 1fr;
grid-template-rows: 1fr 1fr 1fr minmax(80px, 100%);
}
body > * {
grid-column: 1;
@@ -170,6 +207,7 @@
body {
display: grid;
grid-template-columns: 1fr min(70vw, 100%) 1fr;
grid-template-rows: 1fr 1fr 1fr minmax(80px, 100%);
padding-top: 60vw;
}
body > * {
@@ -180,8 +218,8 @@
body {
padding: 0px;
margin: 0px;
background: #eee;
color: #888;
background: #f8fafc;
color: #475569;
text-align: center;
font-family: roboto, karma, segoe ui, sans-serif;
font-size: 20px;
@@ -194,24 +232,28 @@
}
h1 {
font-weight: 200;
color: #888;
color: #017eff;
}
#options {
padding: 0;
display: grid;
grid-template-columns: 1fr 1fr;
grid-template-columns: 1fr 1fr minmax(70px, 0.5fr);
}
#options > * {
padding: 15px;
border-radius: 5px;
border: 1px solid #ccc;
border: 1px solid #475569;
background: #f9fafc
}
.option:hover {
box-shadow: 0 0 11px #aaa;
}
#options > select {
margin-right: 5px;
margin-right: 10px;
}
#options > button {
margin-left: 5px;
margin-right: 10px;
}
#query {
@@ -232,14 +274,15 @@
.image {
width: 20vw;
border-radius: 10px;
border: 1px solid #ccc;
border: 1px solid #475569;
}
#json {
white-space: pre-wrap;
}
#results-plugin,
#results-ledger {
white-space: pre-line;
text-align: left;
white-space: pre-line;
}
#results-markdown {
text-align: left;
@@ -260,16 +303,16 @@
padding: 3.5px 3.5px 0;
margin-right: 5px;
border-radius: 5px;
background-color: #ed6f00;
background-color: #eab308;
font-size: medium;
}
span.music-task-status.todo,
span.task-status.todo {
background-color: #048ba8
background-color: #3b82f6
}
span.music-task-status.done,
span.task-status.done {
background-color: #06a77d;
background-color: #22c55e;
}
span.music-task-tag,
span.task-tag {
@@ -277,7 +320,8 @@
padding: 3.5px 3.5px 0;
margin-right: 5px;
border-radius: 5px;
background-color: #bbb;
border: 1px solid #475569;
background-color: #ef4444;
font-size: small;
}
</style>

View File

@@ -11,5 +11,6 @@
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"display": "standalone"
"display": "standalone",
"start_url": "/"
}

View File

@@ -0,0 +1,16 @@
{
"name": "Khoj Chat",
"short_name": "Khoj Chat",
"description": "A personal assistant for your notes",
"icons": [
{
"src": "/static/assets/icons/favicon-144x144.png",
"sizes": "144x144",
"type": "image/png"
}
],
"theme_color": "#ffffff",
"background_color": "#ffffff",
"display": "standalone",
"start_url": "/chat"
}

View File

@@ -3,89 +3,72 @@ import os
import signal
import sys
import logging
import threading
import warnings
from platform import system
# Ignore non-actionable warnings
warnings.filterwarnings("ignore", message=r"snapshot_download.py has been made private", category=FutureWarning)
warnings.filterwarnings("ignore", message=r"legacy way to download files from the HF hub,", category=FutureWarning)
# External Packages
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from PyQt6 import QtWidgets
from PyQt6.QtCore import QThread, QTimer
from rich.logging import RichHandler
import schedule
# Internal Packages
from src.configure import configure_server
from src.router import router
from src.utils import constants, state
from src.utils.cli import cli
from src.interface.desktop.main_window import MainWindow
from src.interface.desktop.system_tray import create_system_tray
from khoj.configure import configure_routes, configure_server
from khoj.utils import state
from khoj.utils.cli import cli
from khoj.interface.desktop.main_window import MainWindow
from khoj.interface.desktop.system_tray import create_system_tray
# Initialize the Application Server
app = FastAPI()
app.mount("/static", StaticFiles(directory=constants.web_directory), name="static")
app.include_router(router)
logger = logging.getLogger('src')
# Setup Logger
rich_handler = RichHandler(rich_tracebacks=True)
rich_handler.setFormatter(fmt=logging.Formatter(fmt="%(message)s", datefmt="[%X]"))
logging.basicConfig(handlers=[rich_handler])
class CustomFormatter(logging.Formatter):
blue = "\x1b[1;34m"
green = "\x1b[1;32m"
grey = "\x1b[38;20m"
yellow = "\x1b[33;20m"
red = "\x1b[31;20m"
bold_red = "\x1b[31;1m"
reset = "\x1b[0m"
format = "%(levelname)s: %(asctime)s: %(name)s | %(message)s"
FORMATS = {
logging.DEBUG: blue + format + reset,
logging.INFO: green + format + reset,
logging.WARNING: yellow + format + reset,
logging.ERROR: red + format + reset,
logging.CRITICAL: bold_red + format + reset
}
def format(self, record):
log_fmt = self.FORMATS.get(record.levelno)
formatter = logging.Formatter(log_fmt)
return formatter.format(record)
logger = logging.getLogger("khoj")
def run():
# Turn Tokenizers Parallelism Off. App does not support it.
os.environ["TOKENIZERS_PARALLELISM"] = 'false'
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Load config from CLI
state.cli_args = sys.argv[1:]
args = cli(state.cli_args)
set_state(args)
# Setup Logger
# Create app directory, if it doesn't exist
state.config_file.parent.mkdir(parents=True, exist_ok=True)
# Set Logging Level
if args.verbose == 0:
logger.setLevel(logging.WARN)
elif args.verbose == 1:
logger.setLevel(logging.INFO)
elif args.verbose >= 2:
elif args.verbose >= 1:
logger.setLevel(logging.DEBUG)
# Set Log Format
ch = logging.StreamHandler()
ch.setFormatter(CustomFormatter())
logger.addHandler(ch)
# Set Log File
fh = logging.FileHandler(state.config_file.parent / 'khoj.log')
fh = logging.FileHandler(state.config_file.parent / "khoj.log", encoding="utf-8")
fh.setLevel(logging.DEBUG)
logger.addHandler(fh)
logger.info("Starting Khoj...")
logger.info("🌘 Starting Khoj")
if args.no_gui:
# Setup task scheduler
poll_task_scheduler()
# Start Server
configure_server(args, required=True)
configure_server(args, required=False)
configure_routes(app)
start_server(app, host=args.host, port=args.port, socket=args.socket)
else:
# Setup GUI
@@ -96,34 +79,36 @@ def run():
# On Linux (Gnome) the System tray is not supported.
# Since only the Main Window is available
# Quitting it should quit the application
if system() in ['Windows', 'Darwin']:
if system() in ["Windows", "Darwin"]:
gui.setQuitOnLastWindowClosed(False)
tray = create_system_tray(gui, main_window)
tray.show()
# Setup Server
configure_server(args, required=False)
configure_routes(app)
server = ServerThread(app, args.host, args.port, args.socket)
# Show Main Window on First Run Experience or if on Linux
if args.config is None or system() not in ['Windows', 'Darwin']:
if args.config is None or system() not in ["Windows", "Darwin"]:
main_window.show()
# Setup Signal Handlers
signal.signal(signal.SIGINT, sigint_handler)
# Invoke python Interpreter every 500ms to handle signals
# Invoke Python interpreter every 500ms to handle signals, run scheduled tasks
timer = QTimer()
timer.start(500)
timer.timeout.connect(lambda: None)
timer.timeout.connect(schedule.run_pending)
# Start Application
server.start()
gui.aboutToQuit.connect(server.terminate)
# Close Splash Screen if still open
if system() != 'Darwin':
if system() != "Darwin":
try:
import pyi_splash
# Update the text on the splash screen
pyi_splash.update_text("Khoj setup complete")
# Close Splash Screen
@@ -135,7 +120,6 @@ def run():
def sigint_handler(*args):
print("\nShutting down Khoj...")
QtWidgets.QApplication.quit()
@@ -148,10 +132,19 @@ def set_state(args):
def start_server(app, host=None, port=None, socket=None):
logger.info("🌖 Khoj is ready to use")
if socket:
uvicorn.run(app, proxy_headers=True, uds=socket)
uvicorn.run(app, proxy_headers=True, uds=socket, log_level="debug", use_colors=True, log_config=None)
else:
uvicorn.run(app, host=host, port=port)
uvicorn.run(app, host=host, port=port, log_level="debug", use_colors=True, log_config=None)
logger.info("🌒 Stopping Khoj")
def poll_task_scheduler():
timer_thread = threading.Timer(60.0, poll_task_scheduler)
timer_thread.daemon = True
timer_thread.start()
schedule.run_pending()
class ServerThread(QThread):
@@ -169,5 +162,5 @@ class ServerThread(QThread):
start_server(self.app, self.host, self.port, self.socket)
if __name__ == '__main__':
if __name__ == "__main__":
run()

View File

@@ -0,0 +1,289 @@
# Standard Packages
import json
import logging
from datetime import datetime
# Internal Packages
from khoj.utils.constants import empty_escape_sequences
from khoj.processor.conversation.utils import (
chat_completion_with_backoff,
completion_with_backoff,
message_to_prompt,
generate_chatml_messages_with_context,
)
logger = logging.getLogger(__name__)
def answer(text, user_query, model, api_key=None, temperature=0.5, max_tokens=500):
"""
Answer user query using provided text as reference with OpenAI's GPT
"""
# Setup Prompt based on Summary Type
prompt = f"""
You are a friendly, helpful personal assistant.
Using the users notes below, answer their following question. If the answer is not contained within the notes, say "I don't know."
Notes:
{text}
Question: {user_query}
Answer (in second person):"""
# Get Response from GPT
logger.debug(f"Prompt for GPT: {prompt}")
response = completion_with_backoff(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
stop='"""',
api_key=api_key,
)
# Extract, Clean Message from GPT's Response
story = response["choices"][0]["text"]
return str(story).replace("\n\n", "")
def summarize(text, summary_type, model, user_query=None, api_key=None, temperature=0.5, max_tokens=200):
"""
Summarize user input using OpenAI's GPT
"""
# Setup Prompt based on Summary Type
if summary_type == "chat":
prompt = f"""
You are an AI. Summarize the conversation below from your perspective:
{text}
Summarize the conversation from the AI's first-person perspective:"""
elif summary_type == "notes":
prompt = f"""
Summarize the below notes about {user_query}:
{text}
Summarize the notes in second person perspective:"""
# Get Response from GPT
logger.debug(f"Prompt for GPT: {prompt}")
response = completion_with_backoff(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
frequency_penalty=0.2,
stop='"""',
api_key=api_key,
)
# Extract, Clean Message from GPT's Response
story = response["choices"][0]["text"]
return str(story).replace("\n\n", "")
def extract_questions(text, model="text-davinci-003", conversation_log={}, api_key=None, temperature=0, max_tokens=100):
"""
Infer search queries to retrieve relevant notes to answer user query
"""
# Extract Past User Message and Inferred Questions from Conversation Log
chat_history = "".join(
[
f'Q: {chat["intent"]["query"]}\n\n{chat["intent"].get("inferred-queries") or list([chat["intent"]["query"]])}\n\n{chat["message"]}\n\n'
for chat in conversation_log.get("chat", [])[-4:]
if chat["by"] == "khoj"
]
)
# Get dates relative to today for prompt creation
today = datetime.today()
current_new_year = today.replace(month=1, day=1)
last_new_year = current_new_year.replace(year=today.year - 1)
prompt = f"""
You are Khoj, an extremely smart and helpful search assistant with the ability to retrieve information from the users notes.
- The user will provide their questions and answers to you for context.
- Add as much context from the previous questions and answers as required into your search queries.
- Break messages into multiple search queries when required to retrieve the relevant information.
- Add date filters to your search queries from questions and answers when required to retrieve the relevant information.
What searches, if any, will you need to perform to answer the users question?
Provide search queries as a JSON list of strings
Current Date: {today.strftime("%A, %Y-%m-%d")}
Q: How was my trip to Cambodia?
["How was my trip to Cambodia?"]
A: The trip was amazing. I went to the Angkor Wat temple and it was beautiful.
Q: Who did i visit that temple with?
["Who did I visit the Angkor Wat Temple in Cambodia with?"]
A: You visited the Angkor Wat Temple in Cambodia with Pablo, Namita and Xi.
Q: What national parks did I go to last year?
["National park I visited in {last_new_year.strftime("%Y")} dt>=\\"{last_new_year.strftime("%Y-%m-%d")}\\" dt<\\"{current_new_year.strftime("%Y-%m-%d")}\\""]
A: You visited the Grand Canyon and Yellowstone National Park in {last_new_year.strftime("%Y")}.
Q: How are you feeling today?
[]
A: I'm feeling a little bored. Helping you will hopefully make me feel better!
Q: How many tennis balls fit in the back of a 2002 Honda Civic?
["What is the size of a tennis ball?", "What is the trunk size of a 2002 Honda Civic?"]
A: 1085 tennis balls will fit in the trunk of a Honda Civic
Q: Is Bob older than Tom?
["When was Bob born?", "What is Tom's age?"]
A: Yes, Bob is older than Tom. As Bob was born on 1984-01-01 and Tom is 30 years old.
Q: What is their age difference?
["What is Bob's age?", "What is Tom's age?"]
A: Bob is {current_new_year.year - 1984 - 30} years older than Tom. As Bob is {current_new_year.year - 1984} years old and Tom is 30 years old.
{chat_history}
Q: {text}
"""
# Get Response from GPT
response = completion_with_backoff(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
stop=["A: ", "\n"],
api_key=api_key,
)
# Extract, Clean Message from GPT's Response
response_text = response["choices"][0]["text"]
try:
questions = json.loads(
# Clean response to increase likelihood of valid JSON. E.g replace ' with " to enclose strings
response_text.strip(empty_escape_sequences)
.replace("['", '["')
.replace("']", '"]')
.replace("', '", '", "')
)
except json.decoder.JSONDecodeError:
logger.warn(f"GPT returned invalid JSON. Falling back to using user message as search query.\n{response_text}")
questions = [text]
logger.debug(f"Extracted Questions by GPT: {questions}")
return questions
def extract_search_type(text, model, api_key=None, temperature=0.5, max_tokens=100, verbose=0):
"""
Extract search type from user query using OpenAI's GPT
"""
# Initialize Variables
understand_primer = """
Objective: Extract search type from user query and return information as JSON
Allowed search types are listed below:
- search-type=["notes","ledger","image","music"]
Some examples are given below for reference:
Q:What fiction book was I reading last week about AI starship?
A:{ "search-type": "notes" }
Q:Play some calm classical music?
A:{ "search-type": "music" }
Q:How much did I spend at Subway for dinner last time?
A:{ "search-type": "ledger" }
Q:What was that popular Sri lankan song that Alex had mentioned?
A:{ "search-type": "music" }
Q:Can you recommend a movie to watch from my notes?
A:{ "search-type": "notes" }
Q: When did I buy Groceries last?
A:{ "search-type": "ledger" }
Q:When did I go surfing last?
A:{ "search-type": "notes" }"""
# Setup Prompt with Understand Primer
prompt = message_to_prompt(text, understand_primer, start_sequence="\nA:", restart_sequence="\nQ:")
if verbose > 1:
print(f"Message -> Prompt: {text} -> {prompt}")
# Get Response from GPT
logger.debug(f"Prompt for GPT: {prompt}")
response = completion_with_backoff(
prompt=prompt,
model=model,
temperature=temperature,
max_tokens=max_tokens,
frequency_penalty=0.2,
stop=["\n"],
api_key=api_key,
)
# Extract, Clean Message from GPT's Response
story = str(response["choices"][0]["text"])
return json.loads(story.strip(empty_escape_sequences))
def converse(references, user_query, conversation_log={}, model="gpt-3.5-turbo", api_key=None, temperature=0.2):
"""
Converse with user using OpenAI's ChatGPT
"""
# Initialize Variables
compiled_references = "\n\n".join({f"# {item}" for item in references})
personality_primer = "You are Khoj, a friendly, smart and helpful personal assistant."
conversation_primers = {
"general": f"""
Using your general knowledge and our past conversations as context, answer the following question.
Current Date: {datetime.now().strftime("%Y-%m-%d")}
Question: {user_query}
""".strip(),
"notes": f"""
Using the notes and our past conversations as context, answer the following question.
Current Date: {datetime.now().strftime("%Y-%m-%d")}
Notes:
{compiled_references}
Question: {user_query}
""".strip(),
}
# Get Conversation Primer appropriate to Conversation Type
conversation_type = "general" if user_query.startswith("@general") or compiled_references.strip() == "" else "notes"
logger.debug(f"Conversation Type: {conversation_type}")
conversation_primer = conversation_primers.get(conversation_type)
# Setup Prompt with Primer or Conversation History
messages = generate_chatml_messages_with_context(
conversation_primer,
personality_primer,
conversation_log,
model,
)
# Get Response from GPT
logger.debug(f"Conversation Context for GPT: {messages}")
response = chat_completion_with_backoff(
messages=messages,
model=model,
temperature=temperature,
api_key=api_key,
)
# Extract, Clean Message from GPT's Response
story = str(response["choices"][0]["message"]["content"])
return story.strip(empty_escape_sequences)

View File

@@ -0,0 +1,132 @@
# Standard Packages
import os
import logging
from datetime import datetime
# External Packages
import openai
import tiktoken
from tenacity import (
before_sleep_log,
retry,
retry_if_exception_type,
stop_after_attempt,
wait_exponential,
wait_random_exponential,
)
# Internal Packages
from khoj.utils.helpers import merge_dicts
logger = logging.getLogger(__name__)
max_prompt_size = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}
@retry(
retry=(
retry_if_exception_type(openai.error.Timeout)
| retry_if_exception_type(openai.error.APIError)
| retry_if_exception_type(openai.error.APIConnectionError)
| retry_if_exception_type(openai.error.RateLimitError)
| retry_if_exception_type(openai.error.ServiceUnavailableError)
),
wait=wait_random_exponential(min=1, max=30),
stop=stop_after_attempt(6),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
def completion_with_backoff(**kwargs):
openai.api_key = kwargs["api_key"] if kwargs.get("api_key") else os.getenv("OPENAI_API_KEY")
return openai.Completion.create(**kwargs, request_timeout=60)
@retry(
retry=(
retry_if_exception_type(openai.error.Timeout)
| retry_if_exception_type(openai.error.APIError)
| retry_if_exception_type(openai.error.APIConnectionError)
| retry_if_exception_type(openai.error.RateLimitError)
| retry_if_exception_type(openai.error.ServiceUnavailableError)
),
wait=wait_exponential(multiplier=1, min=4, max=10),
stop=stop_after_attempt(6),
before_sleep=before_sleep_log(logger, logging.DEBUG),
reraise=True,
)
def chat_completion_with_backoff(**kwargs):
openai.api_key = kwargs["api_key"] if kwargs.get("api_key") else os.getenv("OPENAI_API_KEY")
return openai.ChatCompletion.create(**kwargs, request_timeout=60)
def generate_chatml_messages_with_context(
user_message, system_message, conversation_log={}, model_name="gpt-3.5-turbo", lookback_turns=2
):
"""Generate messages for ChatGPT with context from previous conversation"""
# Extract Chat History for Context
chat_logs = [f'{chat["message"]}\n\nNotes:\n{chat.get("context","")}' for chat in conversation_log.get("chat", [])]
rest_backnforths = []
# Extract in reverse chronological order
for user_msg, assistant_msg in zip(chat_logs[-2::-2], chat_logs[::-2]):
if len(rest_backnforths) >= 2 * lookback_turns:
break
rest_backnforths += reciprocal_conversation_to_chatml([user_msg, assistant_msg])[::-1]
# Format user and system messages to chatml format
system_chatml_message = [message_to_chatml(system_message, "system")]
user_chatml_message = [message_to_chatml(user_message, "user")]
messages = user_chatml_message + rest_backnforths[:2] + system_chatml_message + rest_backnforths[2:]
# Truncate oldest messages from conversation history until under max supported prompt size by model
encoder = tiktoken.encoding_for_model(model_name)
tokens = sum([len(encoder.encode(value)) for message in messages for value in message.values()])
while tokens > max_prompt_size[model_name]:
messages.pop()
tokens = sum([len(encoder.encode(value)) for message in messages for value in message.values()])
# Return message in chronological order
return messages[::-1]
def reciprocal_conversation_to_chatml(message_pair):
"""Convert a single back and forth between user and assistant to chatml format"""
return [message_to_chatml(message, role) for message, role in zip(message_pair, ["user", "assistant"])]
def message_to_chatml(message, role="assistant"):
"""Create chatml message from message and role"""
return {"role": role, "content": message}
def message_to_prompt(
user_message, conversation_history="", gpt_message=None, start_sequence="\nAI:", restart_sequence="\nHuman:"
):
"""Create prompt for GPT from messages and conversation history"""
gpt_message = f" {gpt_message}" if gpt_message else ""
return f"{conversation_history}{restart_sequence} {user_message}{start_sequence}{gpt_message}"
def message_to_log(user_message, gpt_message, user_message_metadata={}, khoj_message_metadata={}, conversation_log=[]):
"""Create json logs from messages, metadata for conversation log"""
default_khoj_message_metadata = {
"intent": {"type": "remember", "memory-type": "notes", "query": user_message},
"trigger-emotion": "calm",
}
khoj_response_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# Create json log from Human's message
human_log = merge_dicts({"message": user_message, "by": "you"}, user_message_metadata)
# Create json log from GPT's response
khoj_log = merge_dicts(khoj_message_metadata, default_khoj_message_metadata)
khoj_log = merge_dicts({"message": gpt_message, "by": "khoj", "created": khoj_response_time}, khoj_log)
conversation_log.extend([human_log, khoj_log])
return conversation_log
def extract_summaries(metadata):
"""Extract summaries from metadata"""
return "".join([f'\n{session["summary"]}' for session in metadata])

View File

@@ -0,0 +1,100 @@
# Standard Packages
import glob
import logging
from pathlib import Path
from typing import List
# Internal Packages
from khoj.processor.text_to_jsonl import TextToJsonl
from khoj.utils.helpers import get_absolute_path, timer
from khoj.utils.jsonl import load_jsonl, dump_jsonl, compress_jsonl_data
from khoj.utils.rawconfig import Entry
logger = logging.getLogger(__name__)
class JsonlToJsonl(TextToJsonl):
# Define Functions
def process(self, previous_entries=None):
# Extract required fields from config
input_jsonl_files, input_jsonl_filter, output_file = (
self.config.input_files,
self.config.input_filter,
self.config.compressed_jsonl,
)
# Get Jsonl Input Files to Process
all_input_jsonl_files = JsonlToJsonl.get_jsonl_files(input_jsonl_files, input_jsonl_filter)
# Extract Entries from specified jsonl files
with timer("Parse entries from jsonl files", logger):
input_jsons = JsonlToJsonl.extract_jsonl_entries(all_input_jsonl_files)
current_entries = list(map(Entry.from_dict, input_jsons))
# Split entries by max tokens supported by model
with timer("Split entries by max token size supported by model", logger):
current_entries = self.split_entries_by_max_tokens(current_entries, max_tokens=256)
# Identify, mark and merge any new entries with previous entries
with timer("Identify new or updated entries", logger):
if not previous_entries:
entries_with_ids = list(enumerate(current_entries))
else:
entries_with_ids = self.mark_entries_for_update(
current_entries,
previous_entries,
key="compiled",
logger=logger,
)
with timer("Write entries to JSONL file", logger):
# Process Each Entry from All Notes Files
entries = list(map(lambda entry: entry[1], entries_with_ids))
jsonl_data = JsonlToJsonl.convert_entries_to_jsonl(entries)
# Compress JSONL formatted Data
if output_file.suffix == ".gz":
compress_jsonl_data(jsonl_data, output_file)
elif output_file.suffix == ".jsonl":
dump_jsonl(jsonl_data, output_file)
return entries_with_ids
@staticmethod
def get_jsonl_files(jsonl_files=None, jsonl_file_filters=None):
"Get all jsonl files to process"
absolute_jsonl_files, filtered_jsonl_files = set(), set()
if jsonl_files:
absolute_jsonl_files = {get_absolute_path(jsonl_file) for jsonl_file in jsonl_files}
if jsonl_file_filters:
filtered_jsonl_files = {
filtered_file
for jsonl_file_filter in jsonl_file_filters
for filtered_file in glob.glob(get_absolute_path(jsonl_file_filter), recursive=True)
}
all_jsonl_files = sorted(absolute_jsonl_files | filtered_jsonl_files)
files_with_non_jsonl_extensions = {
jsonl_file for jsonl_file in all_jsonl_files if not jsonl_file.endswith(".jsonl")
}
if any(files_with_non_jsonl_extensions):
print(f"[Warning] There maybe non jsonl files in the input set: {files_with_non_jsonl_extensions}")
logger.debug(f"Processing files: {all_jsonl_files}")
return all_jsonl_files
@staticmethod
def extract_jsonl_entries(jsonl_files):
"Extract entries from specified jsonl files"
entries = []
for jsonl_file in jsonl_files:
entries.extend(load_jsonl(Path(jsonl_file)))
return entries
@staticmethod
def convert_entries_to_jsonl(entries: List[Entry]):
"Convert each entry to JSON and collate as JSONL"
return "".join([f"{entry.to_json()}\n" for entry in entries])

View File

@@ -0,0 +1,133 @@
# Standard Packages
import glob
import re
import logging
from typing import List
# Internal Packages
from khoj.processor.text_to_jsonl import TextToJsonl
from khoj.utils.helpers import get_absolute_path, is_none_or_empty, timer
from khoj.utils.constants import empty_escape_sequences
from khoj.utils.jsonl import dump_jsonl, compress_jsonl_data
from khoj.utils.rawconfig import Entry
logger = logging.getLogger(__name__)
class BeancountToJsonl(TextToJsonl):
# Define Functions
def process(self, previous_entries=None):
# Extract required fields from config
beancount_files, beancount_file_filter, output_file = (
self.config.input_files,
self.config.input_filter,
self.config.compressed_jsonl,
)
# Input Validation
if is_none_or_empty(beancount_files) and is_none_or_empty(beancount_file_filter):
print("At least one of beancount-files or beancount-file-filter is required to be specified")
exit(1)
# Get Beancount Files to Process
beancount_files = BeancountToJsonl.get_beancount_files(beancount_files, beancount_file_filter)
# Extract Entries from specified Beancount files
with timer("Parse transactions from Beancount files into dictionaries", logger):
current_entries = BeancountToJsonl.convert_transactions_to_maps(
*BeancountToJsonl.extract_beancount_transactions(beancount_files)
)
# Split entries by max tokens supported by model
with timer("Split entries by max token size supported by model", logger):
current_entries = self.split_entries_by_max_tokens(current_entries, max_tokens=256)
# Identify, mark and merge any new entries with previous entries
with timer("Identify new or updated transaction", logger):
if not previous_entries:
entries_with_ids = list(enumerate(current_entries))
else:
entries_with_ids = self.mark_entries_for_update(
current_entries, previous_entries, key="compiled", logger=logger
)
with timer("Write transactions to JSONL file", logger):
# Process Each Entry from All Notes Files
entries = list(map(lambda entry: entry[1], entries_with_ids))
jsonl_data = BeancountToJsonl.convert_transaction_maps_to_jsonl(entries)
# Compress JSONL formatted Data
if output_file.suffix == ".gz":
compress_jsonl_data(jsonl_data, output_file)
elif output_file.suffix == ".jsonl":
dump_jsonl(jsonl_data, output_file)
return entries_with_ids
@staticmethod
def get_beancount_files(beancount_files=None, beancount_file_filters=None):
"Get Beancount files to process"
absolute_beancount_files, filtered_beancount_files = set(), set()
if beancount_files:
absolute_beancount_files = {get_absolute_path(beancount_file) for beancount_file in beancount_files}
if beancount_file_filters:
filtered_beancount_files = {
filtered_file
for beancount_file_filter in beancount_file_filters
for filtered_file in glob.glob(get_absolute_path(beancount_file_filter), recursive=True)
}
all_beancount_files = sorted(absolute_beancount_files | filtered_beancount_files)
files_with_non_beancount_extensions = {
beancount_file
for beancount_file in all_beancount_files
if not beancount_file.endswith(".bean") and not beancount_file.endswith(".beancount")
}
if any(files_with_non_beancount_extensions):
print(f"[Warning] There maybe non beancount files in the input set: {files_with_non_beancount_extensions}")
logger.debug(f"Processing files: {all_beancount_files}")
return all_beancount_files
@staticmethod
def extract_beancount_transactions(beancount_files):
"Extract entries from specified Beancount files"
# Initialize Regex for extracting Beancount Entries
transaction_regex = r"^\n?\d{4}-\d{2}-\d{2} [\*|\!] "
empty_newline = f"^[\n\r\t\ ]*$"
entries = []
transaction_to_file_map = []
for beancount_file in beancount_files:
with open(beancount_file) as f:
ledger_content = f.read()
transactions_per_file = [
entry.strip(empty_escape_sequences)
for entry in re.split(empty_newline, ledger_content, flags=re.MULTILINE)
if re.match(transaction_regex, entry)
]
transaction_to_file_map += zip(transactions_per_file, [beancount_file] * len(transactions_per_file))
entries.extend(transactions_per_file)
return entries, dict(transaction_to_file_map)
@staticmethod
def convert_transactions_to_maps(parsed_entries: List[str], transaction_to_file_map) -> List[Entry]:
"Convert each parsed Beancount transaction into a Entry"
entries = []
for parsed_entry in parsed_entries:
entries.append(
Entry(compiled=parsed_entry, raw=parsed_entry, file=f"{transaction_to_file_map[parsed_entry]}")
)
logger.debug(f"Converted {len(parsed_entries)} transactions to dictionaries")
return entries
@staticmethod
def convert_transaction_maps_to_jsonl(entries: List[Entry]) -> str:
"Convert each Beancount transaction entry to JSON and collate as JSONL"
return "".join([f"{entry.to_json()}\n" for entry in entries])

View File

@@ -0,0 +1,152 @@
# Standard Packages
import glob
import logging
import re
from pathlib import Path
from typing import List
# Internal Packages
from khoj.processor.text_to_jsonl import TextToJsonl
from khoj.utils.helpers import get_absolute_path, is_none_or_empty, timer
from khoj.utils.constants import empty_escape_sequences
from khoj.utils.jsonl import dump_jsonl, compress_jsonl_data
from khoj.utils.rawconfig import Entry
logger = logging.getLogger(__name__)
class MarkdownToJsonl(TextToJsonl):
# Define Functions
def process(self, previous_entries=None):
# Extract required fields from config
markdown_files, markdown_file_filter, output_file = (
self.config.input_files,
self.config.input_filter,
self.config.compressed_jsonl,
)
# Input Validation
if is_none_or_empty(markdown_files) and is_none_or_empty(markdown_file_filter):
print("At least one of markdown-files or markdown-file-filter is required to be specified")
exit(1)
# Get Markdown Files to Process
markdown_files = MarkdownToJsonl.get_markdown_files(markdown_files, markdown_file_filter)
# Extract Entries from specified Markdown files
with timer("Parse entries from Markdown files into dictionaries", logger):
current_entries = MarkdownToJsonl.convert_markdown_entries_to_maps(
*MarkdownToJsonl.extract_markdown_entries(markdown_files)
)
# Split entries by max tokens supported by model
with timer("Split entries by max token size supported by model", logger):
current_entries = self.split_entries_by_max_tokens(current_entries, max_tokens=256)
# Identify, mark and merge any new entries with previous entries
with timer("Identify new or updated entries", logger):
if not previous_entries:
entries_with_ids = list(enumerate(current_entries))
else:
entries_with_ids = self.mark_entries_for_update(
current_entries, previous_entries, key="compiled", logger=logger
)
with timer("Write markdown entries to JSONL file", logger):
# Process Each Entry from All Notes Files
entries = list(map(lambda entry: entry[1], entries_with_ids))
jsonl_data = MarkdownToJsonl.convert_markdown_maps_to_jsonl(entries)
# Compress JSONL formatted Data
if output_file.suffix == ".gz":
compress_jsonl_data(jsonl_data, output_file)
elif output_file.suffix == ".jsonl":
dump_jsonl(jsonl_data, output_file)
return entries_with_ids
@staticmethod
def get_markdown_files(markdown_files=None, markdown_file_filters=None):
"Get Markdown files to process"
absolute_markdown_files, filtered_markdown_files = set(), set()
if markdown_files:
absolute_markdown_files = {get_absolute_path(markdown_file) for markdown_file in markdown_files}
if markdown_file_filters:
filtered_markdown_files = {
filtered_file
for markdown_file_filter in markdown_file_filters
for filtered_file in glob.glob(get_absolute_path(markdown_file_filter), recursive=True)
}
all_markdown_files = sorted(absolute_markdown_files | filtered_markdown_files)
files_with_non_markdown_extensions = {
md_file
for md_file in all_markdown_files
if not md_file.endswith(".md") and not md_file.endswith(".markdown")
}
if any(files_with_non_markdown_extensions):
logger.warn(
f"[Warning] There maybe non markdown-mode files in the input set: {files_with_non_markdown_extensions}"
)
logger.debug(f"Processing files: {all_markdown_files}")
return all_markdown_files
@staticmethod
def extract_markdown_entries(markdown_files):
"Extract entries by heading from specified Markdown files"
# Regex to extract Markdown Entries by Heading
markdown_heading_regex = r"^#"
entries = []
entry_to_file_map = []
for markdown_file in markdown_files:
with open(markdown_file, "r", encoding="utf8") as f:
markdown_content = f.read()
markdown_entries_per_file = []
any_headings = re.search(markdown_heading_regex, markdown_content, flags=re.MULTILINE)
for entry in re.split(markdown_heading_regex, markdown_content, flags=re.MULTILINE):
# Add heading level as the regex split removed it from entries with headings
prefix = "#" if entry.startswith("#") else "# " if any_headings else ""
stripped_entry = entry.strip(empty_escape_sequences)
if stripped_entry != "":
markdown_entries_per_file.append(f"{prefix}{stripped_entry}")
entry_to_file_map += zip(markdown_entries_per_file, [markdown_file] * len(markdown_entries_per_file))
entries.extend(markdown_entries_per_file)
return entries, dict(entry_to_file_map)
@staticmethod
def convert_markdown_entries_to_maps(parsed_entries: List[str], entry_to_file_map) -> List[Entry]:
"Convert each Markdown entries into a dictionary"
entries = []
for parsed_entry in parsed_entries:
entry_filename = Path(entry_to_file_map[parsed_entry])
heading = parsed_entry.splitlines()[0] if re.search("^#+\s", parsed_entry) else ""
# Append base filename to compiled entry for context to model
# Increment heading level for heading entries and make filename as its top level heading
prefix = f"# {entry_filename.stem}\n#" if heading else f"# {entry_filename.stem}\n"
compiled_entry = f"{prefix}{parsed_entry}"
entries.append(
Entry(
compiled=compiled_entry,
raw=parsed_entry,
heading=f"{prefix}{heading}",
file=f"{entry_filename}",
)
)
logger.debug(f"Converted {len(parsed_entries)} markdown entries to dictionaries")
return entries
@staticmethod
def convert_markdown_maps_to_jsonl(entries: List[Entry]):
"Convert each Markdown entry to JSON and collate as JSONL"
return "".join([f"{entry.to_json()}\n" for entry in entries])

View File

@@ -0,0 +1,160 @@
# Standard Packages
import glob
import logging
from pathlib import Path
from typing import Iterable, List
# Internal Packages
from khoj.processor.org_mode import orgnode
from khoj.processor.text_to_jsonl import TextToJsonl
from khoj.utils.helpers import get_absolute_path, is_none_or_empty, timer
from khoj.utils.jsonl import dump_jsonl, compress_jsonl_data
from khoj.utils.rawconfig import Entry
from khoj.utils import state
logger = logging.getLogger(__name__)
class OrgToJsonl(TextToJsonl):
# Define Functions
def process(self, previous_entries: List[Entry] = None):
# Extract required fields from config
org_files, org_file_filter, output_file = (
self.config.input_files,
self.config.input_filter,
self.config.compressed_jsonl,
)
index_heading_entries = self.config.index_heading_entries
# Input Validation
if is_none_or_empty(org_files) and is_none_or_empty(org_file_filter):
print("At least one of org-files or org-file-filter is required to be specified")
exit(1)
# Get Org Files to Process
with timer("Get org files to process", logger):
org_files = OrgToJsonl.get_org_files(org_files, org_file_filter)
# Extract Entries from specified Org files
with timer("Parse entries from org files into OrgNode objects", logger):
entry_nodes, file_to_entries = self.extract_org_entries(org_files)
with timer("Convert OrgNodes into list of entries", logger):
current_entries = self.convert_org_nodes_to_entries(entry_nodes, file_to_entries, index_heading_entries)
with timer("Split entries by max token size supported by model", logger):
current_entries = self.split_entries_by_max_tokens(current_entries, max_tokens=256)
# Identify, mark and merge any new entries with previous entries
if not previous_entries:
entries_with_ids = list(enumerate(current_entries))
else:
entries_with_ids = self.mark_entries_for_update(
current_entries, previous_entries, key="compiled", logger=logger
)
# Process Each Entry from All Notes Files
with timer("Write org entries to JSONL file", logger):
entries = map(lambda entry: entry[1], entries_with_ids)
jsonl_data = self.convert_org_entries_to_jsonl(entries)
# Compress JSONL formatted Data
if output_file.suffix == ".gz":
compress_jsonl_data(jsonl_data, output_file)
elif output_file.suffix == ".jsonl":
dump_jsonl(jsonl_data, output_file)
return entries_with_ids
@staticmethod
def get_org_files(org_files=None, org_file_filters=None):
"Get Org files to process"
absolute_org_files, filtered_org_files = set(), set()
if org_files:
absolute_org_files = {get_absolute_path(org_file) for org_file in org_files}
if org_file_filters:
filtered_org_files = {
filtered_file
for org_file_filter in org_file_filters
for filtered_file in glob.glob(get_absolute_path(org_file_filter), recursive=True)
}
all_org_files = sorted(absolute_org_files | filtered_org_files)
files_with_non_org_extensions = {org_file for org_file in all_org_files if not org_file.endswith(".org")}
if any(files_with_non_org_extensions):
logger.warn(f"There maybe non org-mode files in the input set: {files_with_non_org_extensions}")
logger.debug(f"Processing files: {all_org_files}")
return all_org_files
@staticmethod
def extract_org_entries(org_files):
"Extract entries from specified Org files"
entries = []
entry_to_file_map = []
for org_file in org_files:
org_file_entries = orgnode.makelist(str(org_file))
entry_to_file_map += zip(org_file_entries, [org_file] * len(org_file_entries))
entries.extend(org_file_entries)
return entries, dict(entry_to_file_map)
@staticmethod
def convert_org_nodes_to_entries(
parsed_entries: List[orgnode.Orgnode], entry_to_file_map, index_heading_entries=False
) -> List[Entry]:
"Convert Org-Mode nodes into list of Entry objects"
entries: List[Entry] = []
for parsed_entry in parsed_entries:
if not parsed_entry.hasBody and not index_heading_entries:
# Ignore title notes i.e notes with just headings and empty body
continue
# Prepend filename as top heading to entry
filename = Path(entry_to_file_map[parsed_entry]).stem
heading = f"* {filename}\n** {parsed_entry.heading}." if parsed_entry.heading else f"* {filename}."
compiled = heading
if state.verbose > 2:
logger.debug(f"Title: {parsed_entry.heading}")
if parsed_entry.tags:
tags_str = " ".join(parsed_entry.tags)
compiled += f"\t {tags_str}."
if state.verbose > 2:
logger.debug(f"Tags: {tags_str}")
if parsed_entry.closed:
compiled += f'\n Closed on {parsed_entry.closed.strftime("%Y-%m-%d")}.'
if state.verbose > 2:
logger.debug(f'Closed: {parsed_entry.closed.strftime("%Y-%m-%d")}')
if parsed_entry.scheduled:
compiled += f'\n Scheduled for {parsed_entry.scheduled.strftime("%Y-%m-%d")}.'
if state.verbose > 2:
logger.debug(f'Scheduled: {parsed_entry.scheduled.strftime("%Y-%m-%d")}')
if parsed_entry.hasBody:
compiled += f"\n {parsed_entry.body}"
if state.verbose > 2:
logger.debug(f"Body: {parsed_entry.body}")
if compiled:
entries.append(
Entry(
compiled=compiled,
raw=f"{parsed_entry}",
heading=f"{heading}",
file=f"{entry_to_file_map[parsed_entry]}",
)
)
return entries
@staticmethod
def convert_org_entries_to_jsonl(entries: Iterable[Entry]) -> str:
"Convert each Org-Mode entry to JSON and collate as JSONL"
return "".join([f"{entry_dict.to_json()}\n" for entry_dict in entries])

View File

@@ -0,0 +1,492 @@
# Copyright (c) 2010 Charles Cave
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use, copy,
# modify, merge, publish, distribute, sublicense, and/or sell copies
# of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# Program written by Charles Cave (charlesweb@optusnet.com.au)
# February - March 2009
# Version 2 - June 2009
# Added support for all tags, TODO priority and checking existence of a tag
# More information at
# http://members.optusnet.com.au/~charles57/GTD
"""
The Orgnode module consists of the Orgnode class for representing a
headline and associated text from an org-mode file, and routines for
constructing data structures of these classes.
"""
import re
import datetime
from pathlib import Path
from os.path import relpath
from typing import List
indent_regex = re.compile(r"^ *")
def normalize_filename(filename):
"Normalize and escape filename for rendering"
if not Path(filename).is_absolute():
# Normalize relative filename to be relative to current directory
normalized_filename = f"~/{relpath(filename, start=Path.home())}"
else:
normalized_filename = filename
escaped_filename = f"{normalized_filename}".replace("[", "\[").replace("]", "\]")
return escaped_filename
def makelist(filename):
"""
Read an org-mode file and return a list of Orgnode objects
created from this file.
"""
ctr = 0
f = open(filename, "r")
todos = {
"TODO": "",
"WAITING": "",
"ACTIVE": "",
"DONE": "",
"CANCELLED": "",
"FAILED": "",
} # populated from #+SEQ_TODO line
level = ""
heading = ""
bodytext = ""
introtext = ""
tags = list() # set of all tags in headline
closed_date = ""
sched_date = ""
deadline_date = ""
logbook = list()
nodelist: List[Orgnode] = list()
property_map = dict()
in_properties_drawer = False
in_logbook_drawer = False
file_title = f"{filename}"
for line in f:
ctr += 1
heading_search = re.search(r"^(\*+)\s(.*?)\s*$", line)
if heading_search: # we are processing a heading line
if heading: # if we have are on second heading, append first heading to headings list
thisNode = Orgnode(level, heading, bodytext, tags)
if closed_date:
thisNode.closed = closed_date
closed_date = ""
if sched_date:
thisNode.scheduled = sched_date
sched_date = ""
if deadline_date:
thisNode.deadline = deadline_date
deadline_date = ""
if logbook:
thisNode.logbook = logbook
logbook = list()
thisNode.properties = property_map
nodelist.append(thisNode)
property_map = {"LINE": f"file:{normalize_filename(filename)}::{ctr}"}
level = heading_search.group(1)
heading = heading_search.group(2)
bodytext = ""
tags = list() # set of all tags in headline
tag_search = re.search(r"(.*?)\s*:([a-zA-Z0-9].*?):$", heading)
if tag_search:
heading = tag_search.group(1)
parsedtags = tag_search.group(2)
if parsedtags:
for parsedtag in parsedtags.split(":"):
if parsedtag != "":
tags.append(parsedtag)
else: # we are processing a non-heading line
if line[:10] == "#+SEQ_TODO":
kwlist = re.findall(r"([A-Z]+)\(", line)
for kw in kwlist:
todos[kw] = ""
# Set file title to TITLE property, if it exists
title_search = re.search(r"^#\+TITLE:\s*(.*)$", line)
if title_search and title_search.group(1).strip() != "":
title_text = title_search.group(1).strip()
if file_title == f"{filename}":
file_title = title_text
else:
file_title += f" {title_text}"
continue
# Ignore Properties Drawer Start, End Lines
if re.search(":PROPERTIES:", line):
in_properties_drawer = True
continue
if in_properties_drawer and re.search(":END:", line):
in_properties_drawer = False
continue
# Ignore Logbook Drawer Start, End Lines
if re.search(":LOGBOOK:", line):
in_logbook_drawer = True
continue
if in_logbook_drawer and re.search(":END:", line):
in_logbook_drawer = False
continue
# Extract Clocking Lines
clocked_re = re.search(
r"CLOCK:\s*\[([0-9]{4}-[0-9]{2}-[0-9]{2} [a-zA-Z]{3} [0-9]{2}:[0-9]{2})\]--\[([0-9]{4}-[0-9]{2}-[0-9]{2} [a-zA-Z]{3} [0-9]{2}:[0-9]{2})\]",
line,
)
if clocked_re:
# convert clock in, clock out strings to datetime objects
clocked_in = datetime.datetime.strptime(clocked_re.group(1), "%Y-%m-%d %a %H:%M")
clocked_out = datetime.datetime.strptime(clocked_re.group(2), "%Y-%m-%d %a %H:%M")
# add clocked time to the entries logbook list
logbook += [(clocked_in, clocked_out)]
line = ""
property_search = re.search(r"^\s*:([a-zA-Z0-9]+):\s*(.*?)\s*$", line)
if property_search:
# Set ID property to an id based org-mode link to the entry
if property_search.group(1) == "ID":
property_map["ID"] = f"id:{property_search.group(2)}"
else:
property_map[property_search.group(1)] = property_search.group(2)
continue
cd_re = re.search(r"CLOSED:\s*\[([0-9]{4})-([0-9]{2})-([0-9]{2})", line)
if cd_re:
closed_date = datetime.date(int(cd_re.group(1)), int(cd_re.group(2)), int(cd_re.group(3)))
sd_re = re.search(r"SCHEDULED:\s*<([0-9]+)\-([0-9]+)\-([0-9]+)", line)
if sd_re:
sched_date = datetime.date(int(sd_re.group(1)), int(sd_re.group(2)), int(sd_re.group(3)))
dd_re = re.search(r"DEADLINE:\s*<(\d+)\-(\d+)\-(\d+)", line)
if dd_re:
deadline_date = datetime.date(int(dd_re.group(1)), int(dd_re.group(2)), int(dd_re.group(3)))
# Ignore property drawer, scheduled, closed, deadline, logbook entries and # lines from body
if (
not in_properties_drawer
and not cd_re
and not sd_re
and not dd_re
and not clocked_re
and line[:1] != "#"
):
# if we are in a heading
if heading:
# add the line to the bodytext
bodytext += line
# else we are in the pre heading portion of the file
elif line.strip():
# so add the line to the introtext
introtext += line
# write out intro node before headings
# this is done at the end to allow collating all title lines
if introtext:
thisNode = Orgnode(level, file_title, introtext, tags)
nodelist = [thisNode] + nodelist
# write out last heading node
if heading:
thisNode = Orgnode(level, heading, bodytext, tags)
thisNode.properties = property_map
if sched_date:
thisNode.scheduled = sched_date
if deadline_date:
thisNode.deadline = deadline_date
if closed_date:
thisNode.closed = closed_date
if logbook:
thisNode.logbook = logbook
nodelist.append(thisNode)
# using the list of TODO keywords found in the file
# process the headings searching for TODO keywords
for n in nodelist:
todo_search = re.search(r"([A-Z]+)\s(.*?)$", n.heading)
if todo_search:
if todo_search.group(1) in todos:
n.heading = todo_search.group(2)
n.todo = todo_search.group(1)
# extract, set priority from heading, update heading if necessary
priority_search = re.search(r"^\[\#(A|B|C)\] (.*?)$", n.heading)
if priority_search:
n.priority = priority_search.group(1)
n.heading = priority_search.group(2)
# Set SOURCE property to a file+heading based org-mode link to the entry
if n.level == 0:
n.properties["LINE"] = f"file:{normalize_filename(filename)}::0"
n.properties["SOURCE"] = f"[[file:{normalize_filename(filename)}]]"
else:
escaped_heading = n.heading.replace("[", "\\[").replace("]", "\\]")
n.properties["SOURCE"] = f"[[file:{normalize_filename(filename)}::*{escaped_heading}]]"
return nodelist
######################
class Orgnode(object):
"""
Orgnode class represents a headline, tags and text associated
with the headline.
"""
def __init__(self, level, headline, body, tags):
"""
Create an Orgnode object given the parameters of level (as the
raw asterisks), headline text (including the TODO tag), and
first tag. The makelist routine postprocesses the list to
identify TODO tags and updates headline and todo fields.
"""
self._level = len(level)
self._heading = headline
self._body = body
self._tags = tags # All tags in the headline
self._todo = ""
self._priority = "" # empty of A, B or C
self._scheduled = "" # Scheduled date
self._deadline = "" # Deadline date
self._closed = "" # Closed date
self._properties = dict()
self._logbook = list() # List of clock-in, clock-out tuples representing logbook entries
# Look for priority in headline and transfer to prty field
@property
def heading(self):
"""
Return the Heading text of the node without the TODO tag
"""
return self._heading
@heading.setter
def heading(self, newhdng):
"""
Change the heading to the supplied string
"""
self._heading = newhdng
@property
def body(self):
"""
Returns all lines of text of the body of this node except the
Property Drawer
"""
return self._body
@property
def hasBody(self):
"""
Returns True if node has non empty body, else False
"""
return self._body and re.sub(r"\n|\t|\r| ", "", self._body) != ""
@property
def level(self):
"""
Returns an integer corresponding to the level of the node.
Top level (one asterisk) has a level of 1.
"""
return self._level
@property
def priority(self):
"""
Returns the priority of this headline: 'A', 'B', 'C' or empty
string if priority has not been set.
"""
return self._priority
@priority.setter
def priority(self, new_priority):
"""
Change the value of the priority of this headline.
Values values are '', 'A', 'B', 'C'
"""
self._priority = new_priority
@property
def tags(self):
"""
Returns the list of all tags
For example, :HOME:COMPUTER: would return ['HOME', 'COMPUTER']
"""
return self._tags
@tags.setter
def tags(self, newtags):
"""
Store all the tags found in the headline.
"""
self._tags = newtags
def hasTag(self, tag):
"""
Returns True if the supplied tag is present in this headline
For example, hasTag('COMPUTER') on headling containing
:HOME:COMPUTER: would return True.
"""
return tag in self._tags
@property
def todo(self):
"""
Return the value of the TODO tag
"""
return self._todo
@todo.setter
def todo(self, new_todo):
"""
Set the value of the TODO tag to the supplied string
"""
self._todo = new_todo
@property
def properties(self):
"""
Return the dictionary of properties
"""
return self._properties
@properties.setter
def properties(self, new_properties):
"""
Sets all properties using the supplied dictionary of
name/value pairs
"""
self._properties = new_properties
def Property(self, property_key):
"""
Returns the value of the requested property or null if the
property does not exist.
"""
return self._properties.get(property_key, "")
@property
def scheduled(self):
"""
Return the scheduled date
"""
return self._scheduled
@scheduled.setter
def scheduled(self, new_scheduled):
"""
Set the scheduled date to the scheduled date
"""
self._scheduled = new_scheduled
@property
def deadline(self):
"""
Return the deadline date
"""
return self._deadline
@deadline.setter
def deadline(self, new_deadline):
"""
Set the deadline (due) date to the new deadline date
"""
self._deadline = new_deadline
@property
def closed(self):
"""
Return the closed date
"""
return self._closed
@closed.setter
def closed(self, new_closed):
"""
Set the closed date to the new closed date
"""
self._closed = new_closed
@property
def logbook(self):
"""
Return the logbook with all clocked-in, clocked-out date object pairs or empty list if nonexistent
"""
return self._logbook
@logbook.setter
def logbook(self, new_logbook):
"""
Set the logbook with list of clocked-in, clocked-out tuples for the entry
"""
self._logbook = new_logbook
def __repr__(self):
"""
Print the level, heading text and tag of a node and the body
text as used to construct the node.
"""
# Output heading line
n = ""
for _ in range(0, self._level):
n = n + "*"
n = n + " "
if self._todo:
n = n + self._todo + " "
if self._priority:
n = n + "[#" + self._priority + "] "
n = n + self._heading
n = "%-60s " % n # hack - tags will start in column 62
closecolon = ""
for t in self._tags:
n = n + ":" + t
closecolon = ":"
n = n + closecolon
n = n + "\n"
# Get body indentation from first line of body
indent = indent_regex.match(self._body).group()
# Output Closed Date, Scheduled Date, Deadline Date
if self._closed or self._scheduled or self._deadline:
n = n + indent
if self._closed:
n = n + f'CLOSED: [{self._closed.strftime("%Y-%m-%d %a")}] '
if self._scheduled:
n = n + f'SCHEDULED: <{self._scheduled.strftime("%Y-%m-%d %a")}> '
if self._deadline:
n = n + f'DEADLINE: <{self._deadline.strftime("%Y-%m-%d %a")}> '
if self._closed or self._scheduled or self._deadline:
n = n + "\n"
# Ouput Property Drawer
n = n + indent + ":PROPERTIES:\n"
for key, value in self._properties.items():
n = n + indent + f":{key}: {value}\n"
n = n + indent + ":END:\n"
# Output Body
if self.hasBody:
n = n + self._body
return n

View File

@@ -0,0 +1,91 @@
# Standard Packages
from abc import ABC, abstractmethod
import hashlib
import logging
from typing import Callable, List, Tuple
from khoj.utils.helpers import timer
# Internal Packages
from khoj.utils.rawconfig import Entry, TextContentConfig
logger = logging.getLogger(__name__)
class TextToJsonl(ABC):
def __init__(self, config: TextContentConfig):
self.config = config
@abstractmethod
def process(self, previous_entries: List[Entry] = None) -> List[Tuple[int, Entry]]:
...
@staticmethod
def hash_func(key: str) -> Callable:
return lambda entry: hashlib.md5(bytes(getattr(entry, key), encoding="utf-8")).hexdigest()
@staticmethod
def split_entries_by_max_tokens(
entries: List[Entry], max_tokens: int = 256, max_word_length: int = 500
) -> List[Entry]:
"Split entries if compiled entry length exceeds the max tokens supported by the ML model."
chunked_entries: List[Entry] = []
for entry in entries:
# Split entry into words
compiled_entry_words = [word for word in entry.compiled.split(" ") if word != ""]
# Drop long words instead of having entry truncated to maintain quality of entry processed by models
compiled_entry_words = [word for word in compiled_entry_words if len(word) <= max_word_length]
# Split entry into chunks of max tokens
for chunk_index in range(0, len(compiled_entry_words), max_tokens):
compiled_entry_words_chunk = compiled_entry_words[chunk_index : chunk_index + max_tokens]
compiled_entry_chunk = " ".join(compiled_entry_words_chunk)
# Prepend heading to all other chunks, the first chunk already has heading from original entry
if chunk_index > 0:
# Snip heading to avoid crossing max_tokens limit
# Keep last 100 characters of heading as entry heading more important than filename
snipped_heading = entry.heading[-100:]
compiled_entry_chunk = f"{snipped_heading}.\n{compiled_entry_chunk}"
chunked_entries.append(
Entry(
compiled=compiled_entry_chunk,
raw=entry.raw,
heading=entry.heading,
file=entry.file,
)
)
return chunked_entries
def mark_entries_for_update(
self, current_entries: List[Entry], previous_entries: List[Entry], key="compiled", logger=None
) -> List[Tuple[int, Entry]]:
# Hash all current and previous entries to identify new entries
with timer("Hash previous, current entries", logger):
current_entry_hashes = list(map(TextToJsonl.hash_func(key), current_entries))
previous_entry_hashes = list(map(TextToJsonl.hash_func(key), previous_entries))
with timer("Identify, Mark, Combine new, existing entries", logger):
hash_to_current_entries = dict(zip(current_entry_hashes, current_entries))
hash_to_previous_entries = dict(zip(previous_entry_hashes, previous_entries))
# All entries that did not exist in the previous set are to be added
new_entry_hashes = set(current_entry_hashes) - set(previous_entry_hashes)
# All entries that exist in both current and previous sets are kept
existing_entry_hashes = set(current_entry_hashes) & set(previous_entry_hashes)
# Mark new entries with -1 id to flag for later embeddings generation
new_entries = [(-1, hash_to_current_entries[entry_hash]) for entry_hash in new_entry_hashes]
# Set id of existing entries to their previous ids to reuse their existing encoded embeddings
existing_entries = [
(previous_entry_hashes.index(entry_hash), hash_to_previous_entries[entry_hash])
for entry_hash in existing_entry_hashes
]
existing_entries_sorted = sorted(existing_entries, key=lambda e: e[0])
entries_with_ids = existing_entries_sorted + new_entries
return entries_with_ids

263
src/khoj/routers/api.py Normal file
View File

@@ -0,0 +1,263 @@
# Standard Packages
import math
import yaml
import logging
from datetime import datetime
from typing import List, Optional, Union
# External Packages
from fastapi import APIRouter
from fastapi import HTTPException
# Internal Packages
from khoj.configure import configure_processor, configure_search
from khoj.processor.conversation.gpt import converse, extract_questions
from khoj.processor.conversation.utils import message_to_log, message_to_prompt
from khoj.search_type import image_search, text_search
from khoj.utils.helpers import log_telemetry, timer
from khoj.utils.rawconfig import FullConfig, SearchResponse
from khoj.utils.state import SearchType
from khoj.utils import state, constants
# Initialize Router
api = APIRouter()
logger = logging.getLogger(__name__)
# Create Routes
@api.get("/config/data/default")
def get_default_config_data():
return constants.default_config
@api.get("/config/types", response_model=List[str])
def get_config_types():
"""Get configured content types"""
if state.config is None or state.config.content_type is None:
raise HTTPException(
status_code=500,
detail="Content types not configured. Configure at least one content type on server and restart it.",
)
configured_content_types = state.config.content_type.dict(exclude_none=True)
return [
search_type.value
for search_type in SearchType
if search_type.value in configured_content_types
or ("plugins" in configured_content_types and search_type.name in configured_content_types["plugins"])
]
@api.get("/config/data", response_model=FullConfig)
def get_config_data():
return state.config
@api.post("/config/data")
async def set_config_data(updated_config: FullConfig):
state.config = updated_config
with open(state.config_file, "w") as outfile:
yaml.dump(yaml.safe_load(state.config.json(by_alias=True)), outfile)
outfile.close()
return state.config
@api.get("/search", response_model=List[SearchResponse])
def search(
q: str,
n: Optional[int] = 5,
t: Optional[SearchType] = None,
r: Optional[bool] = False,
score_threshold: Optional[Union[float, None]] = None,
dedupe: Optional[bool] = True,
):
results: List[SearchResponse] = []
if q is None or q == "":
logger.warn(f"No query param (q) passed in API call to initiate search")
return results
# initialize variables
user_query = q.strip()
results_count = n
score_threshold = score_threshold if score_threshold is not None else -math.inf
# return cached results, if available
query_cache_key = f"{user_query}-{n}-{t}-{r}-{score_threshold}-{dedupe}"
if query_cache_key in state.query_cache:
logger.debug(f"Return response from query cache")
return state.query_cache[query_cache_key]
if (t == SearchType.Org or t == None) and state.model.orgmode_search:
# query org-mode notes
with timer("Query took", logger):
hits, entries = text_search.query(
user_query, state.model.orgmode_search, rank_results=r, score_threshold=score_threshold, dedupe=dedupe
)
# collate and return results
with timer("Collating results took", logger):
results = text_search.collate_results(hits, entries, results_count)
elif (t == SearchType.Markdown or t == None) and state.model.markdown_search:
# query markdown files
with timer("Query took", logger):
hits, entries = text_search.query(
user_query, state.model.markdown_search, rank_results=r, score_threshold=score_threshold, dedupe=dedupe
)
# collate and return results
with timer("Collating results took", logger):
results = text_search.collate_results(hits, entries, results_count)
elif (t == SearchType.Ledger or t == None) and state.model.ledger_search:
# query transactions
with timer("Query took", logger):
hits, entries = text_search.query(
user_query, state.model.ledger_search, rank_results=r, score_threshold=score_threshold, dedupe=dedupe
)
# collate and return results
with timer("Collating results took", logger):
results = text_search.collate_results(hits, entries, results_count)
elif (t == SearchType.Music or t == None) and state.model.music_search:
# query music library
with timer("Query took", logger):
hits, entries = text_search.query(
user_query, state.model.music_search, rank_results=r, score_threshold=score_threshold, dedupe=dedupe
)
# collate and return results
with timer("Collating results took", logger):
results = text_search.collate_results(hits, entries, results_count)
elif (t == SearchType.Image or t == None) and state.model.image_search:
# query images
with timer("Query took", logger):
hits = image_search.query(
user_query, results_count, state.model.image_search, score_threshold=score_threshold
)
output_directory = constants.web_directory / "images"
# collate and return results
with timer("Collating results took", logger):
results = image_search.collate_results(
hits,
image_names=state.model.image_search.image_names,
output_directory=output_directory,
image_files_url="/static/images",
count=results_count,
)
elif (t in SearchType or t == None) and state.model.plugin_search:
# query specified plugin type
with timer("Query took", logger):
hits, entries = text_search.query(
user_query,
# Get plugin search model for specified search type, or the first one if none specified
state.model.plugin_search.get(t.value) or next(iter(state.model.plugin_search.values())),
rank_results=r,
score_threshold=score_threshold,
dedupe=dedupe,
)
# collate and return results
with timer("Collating results took", logger):
results = text_search.collate_results(hits, entries, results_count)
# Cache results
state.query_cache[query_cache_key] = results
# Only log telemetry if query is new and not a continuation of previous query
if state.previous_query is None or state.previous_query not in user_query:
state.telemetry += [log_telemetry(telemetry_type="api", api="search", app_config=state.config.app)]
state.previous_query = user_query
return results
@api.get("/update")
def update(t: Optional[SearchType] = None, force: Optional[bool] = False):
try:
state.search_index_lock.acquire()
state.model = configure_search(state.model, state.config, regenerate=force, t=t)
state.search_index_lock.release()
except ValueError as e:
logger.error(e)
raise HTTPException(status_code=500, detail=str(e))
else:
logger.info("📬 Search index updated via API")
try:
state.processor_config = configure_processor(state.config.processor)
except ValueError as e:
logger.error(e)
raise HTTPException(status_code=500, detail=str(e))
else:
logger.info("📬 Processor reconfigured via API")
state.telemetry += [log_telemetry(telemetry_type="api", api="update", app_config=state.config.app)]
return {"status": "ok", "message": "khoj reloaded"}
@api.get("/chat")
def chat(q: Optional[str] = None):
if (
state.processor_config is None
or state.processor_config.conversation is None
or state.processor_config.conversation.openai_api_key is None
):
raise HTTPException(
status_code=500, detail="Chat processor not configured. Configure OpenAI API key on server and restart it."
)
# Initialize Variables
api_key = state.processor_config.conversation.openai_api_key
model = state.processor_config.conversation.model
chat_model = state.processor_config.conversation.chat_model
user_message_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# Load Conversation History
chat_session = state.processor_config.conversation.chat_session
meta_log = state.processor_config.conversation.meta_log
# If user query is empty, return chat history
if not q:
if meta_log.get("chat"):
return {"status": "ok", "response": meta_log["chat"]}
else:
return {"status": "ok", "response": []}
# Infer search queries from user message
with timer("Extracting search queries took", logger):
inferred_queries = extract_questions(q, model=model, api_key=api_key, conversation_log=meta_log)
# Collate search results as context for GPT
with timer("Searching knowledge base took", logger):
result_list = []
for query in inferred_queries:
result_list.extend(search(query, n=5, r=True, score_threshold=-5.0, dedupe=False))
compiled_references = [item.additional["compiled"] for item in result_list]
try:
with timer("Generating chat response took", logger):
gpt_response = converse(compiled_references, q, meta_log, model=chat_model, api_key=api_key)
status = "ok"
except Exception as e:
gpt_response = str(e)
status = "error"
# Update Conversation History
state.processor_config.conversation.chat_session = message_to_prompt(q, chat_session, gpt_message=gpt_response)
state.processor_config.conversation.meta_log["chat"] = message_to_log(
q,
gpt_response,
user_message_metadata={"created": user_message_time},
khoj_message_metadata={"context": compiled_references, "intent": {"inferred-queries": inferred_queries}},
conversation_log=meta_log.get("chat", []),
)
state.telemetry += [log_telemetry(telemetry_type="api", api="chat", app_config=state.config.app)]
return {"status": status, "response": gpt_response, "context": compiled_references}

View File

@@ -0,0 +1,64 @@
# Standard Packages
import logging
from typing import Optional
# External Packages
from fastapi import APIRouter
# Internal Packages
from khoj.routers.api import search
from khoj.processor.conversation.gpt import (
answer,
extract_search_type,
)
from khoj.utils.state import SearchType
from khoj.utils.helpers import get_from_dict
from khoj.utils import state
# Initialize Router
api_beta = APIRouter()
logger = logging.getLogger(__name__)
# Create Routes
@api_beta.get("/search")
def search_beta(q: str, n: Optional[int] = 1):
# Initialize Variables
model = state.processor_config.conversation.model
api_key = state.processor_config.conversation.openai_api_key
# Extract Search Type using GPT
try:
metadata = extract_search_type(q, model=model, api_key=api_key, verbose=state.verbose)
search_type = get_from_dict(metadata, "search-type")
except Exception as e:
return {"status": "error", "result": [str(e)], "type": None}
# Search
search_results = search(q, n=n, t=SearchType(search_type))
# Return response
return {"status": "ok", "result": search_results, "type": search_type}
@api_beta.get("/answer")
def answer_beta(q: str):
# Initialize Variables
model = state.processor_config.conversation.model
api_key = state.processor_config.conversation.openai_api_key
# Collate context for GPT
result_list = search(q, n=2, r=True, score_threshold=0, dedupe=False)
collated_result = "\n\n".join([f"# {item.additional['compiled']}" for item in result_list])
logger.debug(f"Reference Context:\n{collated_result}")
# Make GPT respond to user query using provided context
try:
gpt_response = answer(collated_result, user_query=q, model=model, api_key=api_key)
status = "ok"
except Exception as e:
gpt_response = str(e)
status = "error"
return {"status": status, "response": gpt_response}

View File

@@ -0,0 +1,29 @@
# External Packages
from fastapi import APIRouter
from fastapi import Request
from fastapi.responses import HTMLResponse, FileResponse
from fastapi.templating import Jinja2Templates
# Internal Packages
from khoj.utils import constants
# Initialize Router
web_client = APIRouter()
templates = Jinja2Templates(directory=constants.web_directory)
# Create Routes
@web_client.get("/", response_class=FileResponse)
def index():
return FileResponse(constants.web_directory / "index.html")
@web_client.get("/config", response_class=HTMLResponse)
def config_page(request: Request):
return templates.TemplateResponse("config.html", context={"request": request})
@web_client.get("/chat", response_class=FileResponse)
def chat_page():
return FileResponse(constants.web_directory / "chat.html")

Some files were not shown because too many files have changed in this diff Show More