
How To Set Up Ollama


Install Ollama, run LLMs locally, compare model performance, and integrate local models into your apps using Warp.

Running AI models locally just got easier — and faster — with Ollama.

In this guide, we’ll walk through how to use Warp to install, profile, and integrate Ollama into your local setup.


Before running large language models (LLMs) locally, confirm your hardware can handle them.

Example setups:

  • Mac: 64GB unified memory — good for larger models but with lower throughput.
  • Windows (NVIDIA RTX 5090): 32GB VRAM — excellent performance, but limited by VRAM capacity.

🧠 Rule of thumb: You’ll need roughly 1GB of VRAM per billion parameters.
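That rule of thumb is easy to turn into a quick sizing check. Here's a minimal sketch (the per-billion factor is an approximation; quantized models often need less):

```python
def approx_vram_gb(billions_of_params: float, gb_per_b: float = 1.0) -> float:
    """Estimate VRAM needed, using the ~1 GB per billion parameters rule."""
    return billions_of_params * gb_per_b

# Ballpark the two models mentioned in this guide:
for name, size_b in [("gpt-oss (20B)", 20), ("mistral (8B)", 8)]:
    print(f"{name}: ~{approx_vram_gb(size_b):.0f} GB VRAM")
```

Compare the estimate against your GPU's VRAM (or unified memory on a Mac) before pulling a model.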


Run a model locally:

ollama run gpt-oss

For example:

  • Try GPT-OSS 20B (requires ≥16GB VRAM, supports tool calling).
  • Then try Mistral 8B for a faster, smaller alternative.

Compare their performance and quality side-by-side.
Use Warp to easily monitor GPU usage and model response time.
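For the response-time comparison, Ollama's native API reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds) with each response; a small helper turns those into a throughput number you can compare across models:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval_count / eval_duration (nanoseconds) to tokens/sec."""
    return eval_count / (eval_duration_ns / 1e9)

# e.g. a response reporting 120 tokens generated over 2.4 seconds:
print(tokens_per_second(120, 2_400_000_000))  # → 50.0
```

Run the same prompt against both models and compare the resulting tokens/sec alongside output quality.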


Here’s a quick glossary for choosing the right local model:

  • Thinking: the model “thinks” before answering; better for complex reasoning.
  • Tools: the model can use external utilities (e.g., web search).
  • Vision: the model can process and respond to images.
  • Embedding: converts text to numeric form for search or RAG pipelines.
  • Quantization: reduces memory use by lowering precision (e.g., 4-bit).
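As a sketch of the embedding case, here's the request body you'd POST to Ollama's `/api/embed` endpoint (the model name `nomic-embed-text` is just one common choice; use any embedding-capable model you've pulled):

```python
import json

# Build an embedding request for Ollama's /api/embed endpoint.
payload = {
    "model": "nomic-embed-text",   # assumed example; any embedding model works
    "input": "Warp is a modern terminal.",
}
body = json.dumps(payload)
# With the server running, POST `body` to http://localhost:11434/api/embed;
# the response includes an "embeddings" list of float vectors.
```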

Most apps use OpenAI-compatible APIs, so integration is simple.

  1. Open your app’s code in Warp.
  2. Locate the OpenAI client initialization.
  3. Replace the base URL with Ollama’s local endpoint (http://localhost:11434/v1).
  4. Update your API key and model name.

Warp helps you quickly locate, edit, and test the integration directly from the terminal.
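The steps above can be sketched with nothing but the standard library. Ollama's OpenAI-compatible endpoint defaults to http://localhost:11434/v1 and accepts any placeholder API key (the bearer token below is ignored by the server but required by the wire format); the model name is whatever you've pulled:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

payload = {
    "model": "gpt-oss",  # any model pulled via `ollama run` / `ollama pull`
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer ollama",  # placeholder key; Ollama ignores it
    },
)
# With the server running (`ollama serve`), send the request:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

If your app uses the official OpenAI client instead, the same two changes apply: pass `base_url="http://localhost:11434/v1"` and any non-empty API key when constructing the client.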


Pull a model and modify it.

Then save it as a custom model with new settings, such as a temperature or a system prompt.

Use Warp to generate a model file automatically.

This adds a structured system prompt for that task — ready to use instantly.
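A minimal Modelfile sketch of such a custom model (the base model, temperature, and system prompt here are illustrative):

```
# Modelfile -- build with: ollama create code-reviewer -f Modelfile
FROM gpt-oss
PARAMETER temperature 0.3
SYSTEM """You are a concise code reviewer. Point out bugs first."""
```

After `ollama create code-reviewer -f Modelfile`, run it like any other model with `ollama run code-reviewer`.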