Ollama (Local)

Run AI models entirely on your computer with Ollama. No API keys, no internet required, complete privacy.

What is Ollama?

Ollama is a tool for running large language models locally. It’s perfect for:

  • Privacy — Data never leaves your computer
  • Offline use — No internet required
  • Free — No API costs
  • Customization — Use any compatible model

Setup

1. Install Ollama

macOS / Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from ollama.ai

2. Start Ollama

ollama serve

This starts the Ollama server on http://localhost:11434.
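Before moving on, you can confirm the server is actually listening on the default port. This is a quick sketch using Ollama's version endpoint; the guard keeps the check from erroring out when nothing is running:

```shell
# Probe the default Ollama port; /api/version answers with a small JSON
# object (e.g. {"version":"0.5.7"}) when the server is up.
if curl -s --max-time 2 http://localhost:11434/api/version > /dev/null; then
  echo "Ollama is running"
else
  echo "No Ollama server on localhost:11434"
fi
```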

3. Pull a Model

# Recommended general model
ollama pull llama3.2

# Other popular options
ollama pull mistral
ollama pull codellama
ollama pull phi3

4. Configure in BraceKit

  1. Open Settings → AI Provider
  2. Click Ollama in the provider grid
  3. The Base URL should be http://localhost:11434 (default)
  4. API key is not required for localhost
  5. Select your model from the dropdown

Settings are saved automatically as you type.

Note: BraceKit automatically fetches available models from Ollama.

Available Models

Popular models available through Ollama:

Model            Size    Best For
llama3.2         3B      General use
llama3.2:1b      1B      Fast, lightweight
mistral          7B      Balanced
codellama        7B      Code generation
phi3             3.8B    Efficient
deepseek-coder   6.7B    Code
llava            7B      Vision (images)

Browse all models at ollama.ai/library.

Features

Think Mode

Some models support extended thinking:

  1. Click the brain icon (🧠) in the toolbar
  2. Send your message
  3. The model shows its reasoning
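Under the hood, recent Ollama releases expose this as a think flag on the generate API; whether it works depends on your Ollama version and on using a reasoning-capable model (deepseek-r1 is used here as an assumed example, pulled separately with ollama pull deepseek-r1). A request sketch:

```shell
# Build a generate request with thinking enabled. "think" requires a
# reasoning-capable model and a recent Ollama release (an assumption
# about your setup).
cat > /tmp/think_req.json <<'EOF'
{
  "model": "deepseek-r1",
  "prompt": "What is 17 * 24?",
  "stream": false,
  "think": true
}
EOF
# With the server running:
# curl -s http://localhost:11434/api/generate -d @/tmp/think_req.json
```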

Vision (LLaVA)

The llava model can analyze images:

  1. Pull the model: ollama pull llava
  2. Select llava in BraceKit
  3. Attach an image to your message
  4. Ask questions about it
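The same capability is reachable over the REST API: /api/generate accepts base64-encoded image data in an images array. A sketch with a placeholder image string (substitute real encoded bytes before sending):

```shell
# Build a llava request; <BASE64-IMAGE> is a placeholder for the
# base64-encoded bytes of your image file.
cat > /tmp/llava_req.json <<'EOF'
{
  "model": "llava",
  "prompt": "What is in this picture?",
  "stream": false,
  "images": ["<BASE64-IMAGE>"]
}
EOF
# Encode a real image, substitute it for the placeholder, then POST:
# base64 < photo.png | tr -d '\n'
# curl -s http://localhost:11434/api/generate -d @/tmp/llava_req.json
```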

Auto Model Fetch

BraceKit automatically fetches your Ollama models:

  1. Pull a new model: ollama pull model-name
  2. Open the model selector in BraceKit
  3. The new model appears automatically
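BraceKit's model list comes from Ollama's tags endpoint, and you can query it directly to see the same data. The fallback echo is only there so the command degrades gracefully when no server is listening:

```shell
# /api/tags lists every installed model as JSON; the fallback keeps
# the command from failing when the server is unreachable.
curl -s --max-time 2 http://localhost:11434/api/tags \
  || echo '{"models":[]}'
```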

Model Parameters

Ollama-Specific Settings

Parameter        Effect
num_ctx          Context window size (in tokens)
num_predict      Maximum number of tokens to generate
temperature      Sampling randomness (0-2)
top_p            Nucleus sampling threshold
top_k            Limits sampling to the k most likely tokens
repeat_penalty   Penalizes repeated tokens
seed             Fixed seed for reproducible output

Configuring in BraceKit

  1. Open Settings → AI Provider
  2. Expand the Advanced section
  3. Set temperature and other parameters

Changes apply to Ollama requests automatically.
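If you want to see what these parameters do outside BraceKit, they map onto the options object of Ollama's generate API. A request sketch using the names from the table above:

```shell
# Generation request with explicit sampling options; the option names
# match the parameter table above.
cat > /tmp/gen_req.json <<'EOF'
{
  "model": "llama3.2",
  "prompt": "Write a haiku about autumn.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_ctx": 4096,
    "num_predict": 256,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "seed": 42
  }
}
EOF
# With the server running:
# curl -s http://localhost:11434/api/generate -d @/tmp/gen_req.json
```

Setting a seed alongside a fixed temperature makes responses repeatable, which is handy when comparing parameter changes.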

Hardware Requirements

Minimum Requirements

Model Size   RAM     Storage
1B-3B        8GB     5GB
7B           16GB    10GB
13B          32GB    20GB
70B          64GB+   50GB+

GPU Acceleration

Ollama automatically uses GPU when available:

  • NVIDIA: CUDA support (fastest)
  • AMD: ROCm support
  • Apple Silicon: Metal support (M1/M2/M3)
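To check whether a loaded model is actually running on the GPU, the ollama ps command reports the processor in use (the guard below just avoids an error when ollama is not on your PATH):

```shell
# "ollama ps" lists currently loaded models; its PROCESSOR column
# reads "100% GPU" when acceleration is active.
if command -v ollama > /dev/null; then
  ollama ps
else
  echo "ollama is not installed"
fi
```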

Running Multiple Models

You can run multiple models and switch between them:

# Pull multiple models
ollama pull llama3.2
ollama pull mistral
ollama pull codellama

# List installed models
ollama list

All installed models appear in BraceKit’s model selector.

Troubleshooting

“Connection refused”

  • Ensure Ollama is running: ollama serve
  • Check the Base URL in settings: http://localhost:11434
  • Verify no firewall is blocking localhost

“Model not found”

  • Pull the model first: ollama pull model-name
  • Check model name spelling
  • Run ollama list to see installed models

Slow responses

  • Larger models are slower
  • GPU significantly improves speed
  • Try a smaller model: llama3.2:1b

Out of memory

  • Use a smaller quantization
  • Try a smaller model
  • Close other applications

Models not appearing in BraceKit

  • Ensure Ollama is running
  • Click the refresh button in the model selector
  • Check the console for errors

Custom Models

From Hugging Face

# Pull from Hugging Face
ollama pull hf.co/username/model-name

Create Custom Model

Create a Modelfile:

FROM llama3.2
PARAMETER temperature 0.7
SYSTEM """You are a helpful coding assistant."""

Build and run:

ollama create my-model -f Modelfile
ollama run my-model