---
title: "Bring Your Local Model"
description: "Run AI models locally with Ollama or LM Studio for free, private, offline use"
---
BrowserOS works great with local models for Chat Mode. Run models completely offline — your data never leaves your machine.
## Context Length
<Warning>
**Ollama defaults to 4,096 tokens of context — this is too low for BrowserOS.** Below 15K tokens, the context overflows and the agent gets stuck in a loop constantly trying to recover. Only Chat Mode will work at low context lengths. Set at least **15,000–20,000 tokens** for local models to function properly.
</Warning>
Set context length when starting Ollama:
```bash
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
```
<Info>
Increasing context length uses more VRAM. Run `ollama ps` to check your current allocation. See the [Ollama context length docs](https://docs.ollama.com/context-length) for more details.
</Info>
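If you want to confirm the setting took effect, one quick check (using the model pulled in the setup below as an example, and assuming it's already downloaded) is to send a throwaway request and then inspect what's loaded:
```bash
# start the server with a larger context window (runs in the background)
OLLAMA_CONTEXT_LENGTH=20000 ollama serve &
sleep 2

# load the model with a throwaway request, then see what's resident in memory
ollama run qwen/qwen3-4b "hello" > /dev/null
ollama ps
```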
---
## Setup
<Tabs>
<Tab title="Ollama" icon="terminal">
The easiest way to run models locally.
<Steps>
<Step title="Install Ollama">
Download from [ollama.com](https://ollama.com) and install it.
</Step>
<Step title="Pull a model">
```bash
ollama pull qwen/qwen3-4b
```
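You can confirm the download completed by listing installed models:
```bash
ollama list
```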
</Step>
<Step title="Start Ollama with higher context">
```bash
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
```
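To check that the server is up before moving on, you can hit Ollama's OpenAI-compatible endpoint on its default port (11434):
```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```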
</Step>
<Step title="Configure in BrowserOS">
1. Go to `chrome://browseros/settings`
2. Click **USE** on the Ollama card
3. Set **Model ID** to `qwen/qwen3-4b`
4. Set **Context Window** to `20000`
5. Click **Save**
![Ollama in BrowserOS](/images/byollm--ollama-config.png)
</Step>
</Steps>
</Tab>
<Tab title="LM Studio" icon="desktop">
A friendly GUI if you'd rather not use the terminal.
<Steps>
<Step title="Install LM Studio">
Download from [lmstudio.ai](https://lmstudio.ai) and install it.
</Step>
<Step title="Load a model">
Open LM Studio → **Developer** tab → load a model. This starts a local OpenAI-compatible server at `http://localhost:1234/v1/`.
![LM Studio](/images/setting-up-lm-studio/lmstudio-step1.png)
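Before moving on, you can confirm the server is reachable by listing the models it exposes (assuming the default port shown above):
```bash
curl http://localhost:1234/v1/models
```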
</Step>
<Step title="Configure in BrowserOS">
1. Go to `chrome://browseros/settings`
2. Click **USE** on the **OpenAI Compatible** card
3. Set **Base URL** to `http://localhost:1234/v1/`
4. Set **Model ID** to the model you loaded
5. Set **Context Window** to at least `20000`
6. Click **Save**
![LM Studio in BrowserOS](/images/byollm--lmstudio-config.png)
</Step>
</Steps>
</Tab>
</Tabs>
---
## Recommended Models
Pick a model based on your available RAM/VRAM. Smaller models are faster but less capable.
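If you're unsure how much memory your machine has, a quick terminal check (macOS and Linux shown):
```bash
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable memory summary
free -h
```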
### Lightweight (under 5 GB)
Good for machines with 8 GB RAM. Fast responses, suitable for simple chat tasks.
| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `qwen/qwen3-4b` | Qwen | 4B | 4bit | 2.28 GB |
| `mistralai/ministral-3-3b` | Mistral | 3B | Q4_K_M | 2.99 GB |
| `deepseek-r1-distill-qwen-7b` | lmstudio-community | 7B | Q4_K_M | 4.68 GB |
| `deepseek-r1-distill-llama-8b` | lmstudio-community | 8B | Q4_K_M | 4.92 GB |
### Mid-range (10–15 GB)
Needs 16+ GB RAM. Better reasoning, handles longer conversations well.
| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `openai/gpt-oss-20b` | OpenAI | 20B | MXFP4 | 12.11 GB |
| `mistralai/magistral-small` | Mistral | 23.6B | 4bit | 13.28 GB |
| `mistralai/devstral-small-2-2512` | Mistral | 24B | 4bit | 14.12 GB |
### Heavy (60+ GB)
For workstations with 64+ GB RAM. Closest to cloud model quality.
| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `openai/gpt-oss-120b` | OpenAI | 120B | MXFP4 | 63.39 GB |
<Tip>
Start with `qwen/qwen3-4b` if you're unsure — it's small, fast, and surprisingly capable for its size.
</Tip>