← Back to glossary
+Suggest a term
Tool·Infrastructure·Added 1 month ago

Ollama

Also known as: ollama.ai, local llm runner

A lightweight open-source framework for running large language models on your own hardware. You pull a model like Llama or Mistral with one command, and it runs locally with no API keys, no cloud calls, and no data leaving your machine.

Ollama makes running open-source models locally as simple as running a Docker container (a self-contained software package). One command pulls the model and starts a local server that speaks the same API format as OpenAI, so your existing code often works without changes. Models run entirely on your GPU or CPU, with no external network traffic.

The use cases are clearer than they sound: internal tools where you cannot send data to a cloud API due to privacy or compliance constraints, local development where cloud API costs or latency add friction, experimentation with model behavior without incurring API costs, and custom fine-tuned models you want to host privately. Ollama supports Llama, Mistral, Qwen, Gemma, and dozens of other open-weight models.

By 2026, Ollama had become the default local model runtime for builders. Most BYOK agent tools (Cline, Aider, OpenCode, smolagents) support Ollama as a model backend out of the box. Teams use it in two common patterns: fully local for privacy-critical workloads, or as a development/test environment against cheaper local models before incurring costs on frontier model APIs for final runs.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
Related terms
Open-source modelInferenceBYOKModel routerFine-tuning