Ollama (Local Models)
Run AI models entirely on your computer with Ollama: no API keys, no internet connection required, complete privacy.
What is Ollama?
Ollama is a tool for running large language models locally. It’s perfect for:
- Privacy — Data never leaves your computer
- Offline use — No internet required
- Free — No API costs
- Customization — Use any compatible model
Setup
1. Install Ollama
macOS / Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download from ollama.ai
2. Start Ollama
ollama serve
This starts the Ollama server on http://localhost:11434.
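Before going further you can confirm the server is reachable. A minimal sketch using Python's standard library against Ollama's /api/version endpoint (the endpoint and default port are Ollama's; the helper names are ours):

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default address

def ollama_url(path, base=OLLAMA_BASE):
    """Join an API path onto the Ollama base URL."""
    return base.rstrip("/") + path

def server_version(base=OLLAMA_BASE, timeout=2):
    """Return the running Ollama version, or None if unreachable."""
    try:
        with urllib.request.urlopen(ollama_url("/api/version", base), timeout=timeout) as r:
            return json.load(r).get("version")
    except OSError:
        return None

if __name__ == "__main__":
    v = server_version()
    print(f"Ollama {v} is running" if v else "Ollama is not reachable")
```

If this prints "not reachable", start the server with `ollama serve` and try again.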
3. Pull a Model
# Recommended general model
ollama pull llama3.2
# Other popular options
ollama pull mistral
ollama pull codellama
ollama pull phi3
4. Configure in BraceKit
- Open Settings → AI Provider
- Click Ollama in the provider grid
- The Base URL should be http://localhost:11434 (default)
- An API key is not required for localhost
- Select your model from the dropdown
Settings are saved automatically as you type.
Note: BraceKit automatically fetches available models from Ollama.
Available Models
Popular models available through Ollama:
| Model | Size | Best For |
|---|---|---|
| llama3.2 | 3B | General use |
| llama3.2:1b | 1B | Fast, lightweight |
| mistral | 7B | Balanced |
| codellama | 7B | Code generation |
| phi3 | 3.8B | Efficient |
| deepseek-coder | 6.7B | Code |
| llava | 7B | Vision (images) |
Browse all models at ollama.ai/library.
Features
Think Mode
Some models support extended thinking:
- Click the brain icon (🧠) in the toolbar
- Send your message
- The model shows its reasoning
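Over the HTTP API, extended thinking is requested per call. A sketch of such a request payload, assuming a recent Ollama with a thinking-capable model such as deepseek-r1 (the `think` flag follows Ollama's chat API but is version-dependent):

```python
import json

def build_chat_request(model, prompt, think=True):
    """Build a non-streaming /api/chat payload that asks a
    thinking-capable model to expose its reasoning."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,   # version-dependent flag; older servers ignore it
        "stream": False,
    }

payload = build_chat_request("deepseek-r1", "Why is the sky blue?")
print(json.dumps(payload, indent=2))
```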
Vision (LLaVA)
The llava model can analyze images:
- Pull the model: ollama pull llava
- Select llava in BraceKit
- Attach an image to your message
- Ask questions about it
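Under the hood, images travel as base64 strings in the request body. A minimal sketch of a /api/generate payload with an attached image (the `images` field is part of Ollama's API; the placeholder bytes stand in for a real image file):

```python
import base64
import json

def build_vision_request(model, prompt, image_bytes):
    """Build a /api/generate payload with an attached image.
    Ollama expects images as base64-encoded strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Placeholder bytes; in practice read a real image, e.g.:
#   image_bytes = open("photo.png", "rb").read()
payload = build_vision_request("llava", "What is in this image?", b"\x89PNG...")
print(json.dumps(payload)[:80])
```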
Auto Model Fetch
BraceKit automatically fetches your Ollama models:
- Pull a new model: ollama pull model-name
- Open the model selector in BraceKit
- The new model appears automatically
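The fetch itself is a plain HTTP call to Ollama's /api/tags endpoint, which returns the installed models. A sketch of such a lookup (the parsing helper is ours; exactly how BraceKit does this internally is an assumption):

```python
import json
import urllib.request

def model_names(tags_payload):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_installed(base="http://localhost:11434", timeout=2):
    """Return installed model names, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(base + "/api/tags", timeout=timeout) as r:
            return model_names(json.load(r))
    except OSError:
        return []

if __name__ == "__main__":
    print(list_installed())  # e.g. ['llama3.2:latest', 'mistral:latest']
```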
Model Parameters
Ollama-Specific Settings
| Parameter | Effect |
|---|---|
| num_ctx | Context window size |
| num_predict | Max tokens to generate |
| temperature | Randomness (0-2) |
| top_p | Nucleus sampling |
| top_k | Limits sampling to the k most likely tokens |
| repeat_penalty | Penalizes repeated tokens |
| seed | Reproducibility |
Configuring in BraceKit
- Open Settings → AI Provider
- Expand the Advanced section
- Set temperature and other parameters
Changes apply to Ollama requests automatically.
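When calling Ollama's API directly, these parameters travel in an `options` object on the request. A sketch of a /api/generate payload (the parameter names are Ollama's; the specific values are illustrative):

```python
import json

def build_generate_request(model, prompt, **options):
    """Build a /api/generate payload; keyword args become Ollama options."""
    return {
        "model": model,
        "prompt": prompt,
        "options": options,  # e.g. num_ctx, temperature, top_p, seed
        "stream": False,
    }

payload = build_generate_request(
    "llama3.2",
    "Explain recursion in one sentence.",
    num_ctx=4096,        # context window size
    temperature=0.7,     # randomness (0-2)
    repeat_penalty=1.1,  # discourage repetition
    seed=42,             # reproducible sampling
)
print(json.dumps(payload, indent=2))
```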
Hardware Requirements
Minimum Requirements
| Model Size | RAM | Storage |
|---|---|---|
| 1B-3B | 8GB | 5GB |
| 7B | 16GB | 10GB |
| 13B | 32GB | 20GB |
| 70B | 64GB+ | 50GB+ |
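These figures follow a rough rule of thumb: weight memory is about parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime. A back-of-the-envelope helper (the 20% overhead factor is our assumption, not an Ollama figure):

```python
def approx_weight_gb(params_billions, bits_per_weight=4):
    """Approximate weight memory in GB for a quantized model."""
    return params_billions * bits_per_weight / 8

def approx_ram_gb(params_billions, bits_per_weight=4, overhead=1.2):
    """Weights plus a rough 20% allowance for KV cache and runtime."""
    return approx_weight_gb(params_billions, bits_per_weight) * overhead

# A 7B model at 4-bit quantization: 7 * 4 / 8 = 3.5 GB of weights,
# so roughly 4.2 GB of RAM before long-context overhead.
print(approx_weight_gb(7), approx_ram_gb(7))
```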
GPU Acceleration
Ollama automatically uses GPU when available:
- NVIDIA: CUDA support (fastest)
- AMD: ROCm support
- Apple Silicon: Metal support (M1/M2/M3)
Running Multiple Models
You can run multiple models and switch between them:
# Pull multiple models
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
# List installed models
ollama list
All installed models appear in BraceKit’s model selector.
Troubleshooting
“Connection refused”
- Ensure Ollama is running: ollama serve
- Check the Base URL in settings: http://localhost:11434
- Verify no firewall is blocking localhost
“Model not found”
- Pull the model first: ollama pull model-name
- Check the model name's spelling
- Run ollama list to see installed models
Slow responses
- Larger models are slower
- A GPU significantly improves speed
- Try a smaller model, e.g. llama3.2:1b
Out of memory
- Use a smaller quantization
- Try a smaller model
- Close other applications
Models not appearing in BraceKit
- Ensure Ollama is running
- Click the refresh button in the model selector
- Check the console for errors
Custom Models
From Hugging Face
# Pull from Hugging Face
ollama pull hf.co/username/model-name
Create Custom Model
Create a Modelfile:
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM You are a helpful coding assistant.
Build and run:
ollama create my-model -f Modelfile
ollama run my-model
Related
- Custom Provider — For other local servers
- Configuration — All settings
- Ollama Documentation — Official docs