---
title: "Bring Your Local Model"
description: "Run AI models locally with Ollama or LM Studio for free, private, offline use"
---

BrowserOS works well with local models in Chat Mode. Models run completely offline, so your data never leaves your machine.

## Context Length

**Ollama defaults to 4,096 tokens of context, which is too low for BrowserOS.** Below 15K tokens, the context overflows and the agent gets stuck in a loop, constantly trying to recover. Only Chat Mode will work at low context lengths. Set at least **15,000–20,000 tokens** for local models to function properly.

Set the context length when starting Ollama:

```bash
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
```

Increasing context length uses more VRAM. Run `ollama ps` to check your current allocation.

See the [Ollama context length docs](https://docs.ollama.com/context-length) for more details.

---

## Setup

### Ollama

The easiest way to run models locally.

Download from [ollama.com](https://ollama.com) and install it.

Pull a model:

```bash
ollama pull qwen/qwen3-4b
```

Start the server with a larger context window:

```bash
OLLAMA_CONTEXT_LENGTH=20000 ollama serve
```

Then configure BrowserOS:

1. Go to `chrome://browseros/settings`
2. Click **USE** on the Ollama card
3. Set **Model ID** to `qwen/qwen3-4b`
4. Set **Context Window** to `20000`
5. Click **Save**

![Ollama in BrowserOS](/images/byollm--ollama-config.png)

### LM Studio

Nice GUI if you don't want to use the terminal.

Download from [lmstudio.ai](https://lmstudio.ai) and install it.

Open LM Studio → **Developer** tab → load a model. It runs a server at `http://localhost:1234/v1/`.

![LM Studio](/images/setting-up-lm-studio/lmstudio-step1.png)

Then configure BrowserOS:

1. Go to `chrome://browseros/settings`
2. Click **USE** on the **OpenAI Compatible** card
3. Set **Base URL** to `http://localhost:1234/v1/`
4. Set **Model ID** to the model you loaded
5. Set **Context Window** to at least `20000`
6. Click **Save**

![LM Studio in BrowserOS](/images/byollm--lmstudio-config.png)

---

## Recommended Models

Pick a model based on your available RAM/VRAM.
Smaller models are faster but less capable.

### Lightweight (under 5 GB)

Good for machines with 8 GB RAM. Fast responses, suitable for simple chat tasks.

| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `qwen/qwen3-4b` | Qwen | 4B | 4bit | 2.28 GB |
| `mistralai/ministral-3-3b` | Mistral | 3B | Q4_K_M | 2.99 GB |
| `deepseek-r1-distill-qwen-7b` | lmstudio-community | 7B | Q4_K_M | 4.68 GB |
| `deepseek-r1-distill-llama-8b` | lmstudio-community | 8B | Q4_K_M | 4.92 GB |

### Mid-range (10–15 GB)

Needs 16+ GB RAM. Better reasoning, handles longer conversations well.

| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `openai/gpt-oss-20b` | OpenAI | 20B | MXFP4 | 12.11 GB |
| `mistralai/magistral-small` | Mistral | 23.6B | 4bit | 13.28 GB |
| `mistralai/devstral-small-2-2512` | Mistral | 24B | 4bit | 14.12 GB |

### Heavy (60+ GB)

For workstations with 64+ GB RAM. Closest to cloud model quality.

| Model | Publisher | Params | Quant | Size |
|-------|-----------|--------|-------|------|
| `openai/gpt-oss-120b` | OpenAI | 120B | MXFP4 | 63.39 GB |

Start with `qwen/qwen3-4b` if you're unsure: it's small, fast, and surprisingly capable for its size.
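
The tiers above can be turned into a quick rule of thumb. The sketch below is a hypothetical helper (`suggest_model` is not part of BrowserOS, Ollama, or LM Studio) that maps total RAM in whole gigabytes to one model from each tier. The 16 GB and 64 GB cutoffs follow the tier descriptions and are assumptions about where to draw the line, not measured limits.

```bash
#!/usr/bin/env bash

# Hypothetical helper: suggest a starting model for a given amount of RAM.
# Argument: total RAM in whole gigabytes (for example 8, 16, 64).
# Cutoffs mirror the tier descriptions above and are illustrative only.
suggest_model() {
  local ram_gb="$1"

  # Reject empty or non-numeric input before comparing.
  case "$ram_gb" in
    ''|*[!0-9]*)
      echo "usage: suggest_model <ram_in_whole_gb>" >&2
      return 1
      ;;
  esac

  if [ "$ram_gb" -ge 64 ]; then
    echo "openai/gpt-oss-120b"   # Heavy tier
  elif [ "$ram_gb" -ge 16 ]; then
    echo "openai/gpt-oss-20b"    # Mid-range tier
  else
    echo "qwen/qwen3-4b"         # Lightweight tier
  fi
}

suggest_model 16   # prints: openai/gpt-oss-20b
```

Treat the result as a starting point. A larger context window also uses more memory, so a model that fits on paper may still need a smaller tier in practice.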