Ollama (Local)

Run AI models entirely on your computer with Ollama. No API keys, no internet required, complete privacy.

What is Ollama?

Ollama is a tool for running large language models locally. It’s perfect for:

  • Privacy — Data never leaves your computer
  • Offline use — No internet required
  • Free — No API costs
  • Customization — Use any compatible model

Setup

1. Install Ollama

macOS / Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from ollama.ai

2. Start Ollama

ollama serve

This starts the Ollama server on http://localhost:11434.
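Before moving on, you can confirm the server is actually listening on the default port. This is a quick sketch using Ollama's version endpoint; the guard keeps the check from erroring out when nothing is running:

```shell
# Probe the default Ollama port; /api/version answers with a small JSON
# object (e.g. {"version":"0.5.7"}) when the server is up.
if curl -s --max-time 2 http://localhost:11434/api/version > /dev/null; then
  echo "Ollama is running"
else
  echo "No Ollama server on localhost:11434"
fi
```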

3. Pull a Model

# Recommended general model
ollama pull llama3.2

# Other popular options
ollama pull mistral
ollama pull codellama
ollama pull phi3

4. Configure in BraceKit

  1. Open Settings → AI Provider
  2. Click Ollama in the provider grid
  3. The Base URL should be http://localhost:11434 (default)
  4. API key is not required for localhost
  5. Select your model from the dropdown

Settings are saved automatically as you type.

Note: BraceKit automatically fetches available models from Ollama.

Available Models

Popular models available through Ollama:

Model            Size    Best For
llama3.2         3B      General use
llama3.2:1b      1B      Fast, lightweight
mistral          7B      Balanced
codellama        7B      Code generation
phi3             3.8B    Efficient
deepseek-coder   6.7B    Code
llava            7B      Vision (images)

Browse all models at ollama.ai/library.

Features

Think Mode

Some models support extended thinking:

  1. Click the brain icon (🧠) in the toolbar
  2. Send your message
  3. The model shows its reasoning
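Under the hood, recent Ollama releases expose this as a think flag on the generate API; whether it works depends on your Ollama version and on using a reasoning-capable model (deepseek-r1 is used here as an assumed example, pulled separately with ollama pull deepseek-r1). A request sketch:

```shell
# Build a generate request with thinking enabled. "think" requires a
# reasoning-capable model and a recent Ollama release (an assumption
# about your setup).
cat > /tmp/think_req.json <<'EOF'
{
  "model": "deepseek-r1",
  "prompt": "What is 17 * 24?",
  "stream": false,
  "think": true
}
EOF
# With the server running:
# curl -s http://localhost:11434/api/generate -d @/tmp/think_req.json
```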

Vision (LLaVA)

The llava model can analyze images:

  1. Pull the model: ollama pull llava
  2. Select llava in BraceKit
  3. Attach an image to your message
  4. Ask questions about it
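The same capability is reachable over the REST API: /api/generate accepts base64-encoded image data in an images array. A sketch with a placeholder image string (substitute real encoded bytes before sending):

```shell
# Build a llava request; <BASE64-IMAGE> is a placeholder for the
# base64-encoded bytes of your image file.
cat > /tmp/llava_req.json <<'EOF'
{
  "model": "llava",
  "prompt": "What is in this picture?",
  "stream": false,
  "images": ["<BASE64-IMAGE>"]
}
EOF
# Encode a real image, substitute it for the placeholder, then POST:
# base64 < photo.png | tr -d '\n'
# curl -s http://localhost:11434/api/generate -d @/tmp/llava_req.json
```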

Auto Model Fetch

BraceKit automatically fetches your Ollama models:

  1. Pull a new model: ollama pull model-name
  2. Open the model selector in BraceKit
  3. The new model appears automatically
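BraceKit's model list comes from Ollama's tags endpoint, and you can query it directly to see the same data. The fallback echo is only there so the command degrades gracefully when no server is listening:

```shell
# /api/tags lists every installed model as JSON; the fallback keeps
# the command from failing when the server is unreachable.
curl -s --max-time 2 http://localhost:11434/api/tags \
  || echo '{"models":[]}'
```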

Model Parameters

Ollama-Specific Settings

Parameter        Effect
num_ctx          Context window size (in tokens)
num_predict      Maximum number of tokens to generate
temperature      Sampling randomness (0-2)
top_p            Nucleus sampling threshold
top_k            Limits sampling to the k most likely tokens
repeat_penalty   Penalizes repeated tokens
seed             Fixed seed for reproducible output

Configuring in BraceKit

  1. Open Settings → AI Provider
  2. Expand the Advanced section
  3. Set temperature and other parameters

Changes apply to Ollama requests automatically.
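If you want to see what these parameters do outside BraceKit, they map onto the options object of Ollama's generate API. A request sketch using the names from the table above:

```shell
# Generation request with explicit sampling options; the option names
# match the parameter table above.
cat > /tmp/gen_req.json <<'EOF'
{
  "model": "llama3.2",
  "prompt": "Write a haiku about autumn.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_ctx": 4096,
    "num_predict": 256,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "seed": 42
  }
}
EOF
# With the server running:
# curl -s http://localhost:11434/api/generate -d @/tmp/gen_req.json
```

Setting a seed alongside a fixed temperature makes responses repeatable, which is handy when comparing parameter changes.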

Hardware Requirements

Minimum Requirements

Model Size   RAM     Storage
1B-3B        8GB     5GB
7B           16GB    10GB
13B          32GB    20GB
70B          64GB+   50GB+

GPU Acceleration

Ollama automatically uses GPU when available:

  • NVIDIA: CUDA support (fastest)
  • AMD: ROCm support
  • Apple Silicon: Metal support (M1/M2/M3)
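To check whether a loaded model is actually running on the GPU, the ollama ps command reports the processor in use (the guard below just avoids an error when ollama is not on your PATH):

```shell
# "ollama ps" lists currently loaded models; its PROCESSOR column
# reads "100% GPU" when acceleration is active.
if command -v ollama > /dev/null; then
  ollama ps
else
  echo "ollama is not installed"
fi
```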

Running Multiple Models

You can run multiple models and switch between them:

# Pull multiple models
ollama pull llama3.2
ollama pull mistral
ollama pull codellama

# List installed models
ollama list

All installed models appear in BraceKit’s model selector.

Troubleshooting

“Connection refused”

  • Ensure Ollama is running: ollama serve
  • Check the Base URL in settings: http://localhost:11434
  • Verify no firewall is blocking localhost

“Model not found”

  • Pull the model first: ollama pull model-name
  • Check model name spelling
  • Run ollama list to see installed models

Slow responses

  • Larger models are slower
  • GPU significantly improves speed
  • Try a smaller model: llama3.2:1b

Out of memory

  • Use a smaller quantization
  • Try a smaller model
  • Close other applications

Models not appearing in BraceKit

  • Ensure Ollama is running
  • Click the refresh button in the model selector
  • Check the console for errors

Custom Models

From Hugging Face

# Pull from Hugging Face
ollama pull hf.co/username/model-name

Create Custom Model

Create a Modelfile:

FROM llama3.2
PARAMETER temperature 0.7
SYSTEM """You are a helpful coding assistant."""

Build and run:

ollama create my-model -f Modelfile
ollama run my-model