Ollama
Also known as: ollama.ai, local llm runner
Ollama makes running open-source models locally as simple as running a Docker container (a self-contained software package). One command pulls the model and starts a local server that speaks the same API format as OpenAI, so your existing code often works without changes. Models run entirely on your GPU or CPU, with no external network traffic.
The use cases are clearer than they sound: internal tools where you cannot send data to a cloud API due to privacy or compliance constraints, local development where cloud API costs or latency add friction, experimentation with model behavior without incurring API costs, and custom fine-tuned models you want to host privately. Ollama supports Llama, Mistral, Qwen, Gemma, and dozens of other open-weight models.
By 2026, Ollama had become the default local model runtime for builders. Most BYOK agent tools (Cline, Aider, OpenCode, smolagents) support Ollama as a model backend out of the box. Teams use it in two common patterns: fully local for privacy-critical workloads, or as a development/test environment against cheaper local models before incurring costs on frontier model APIs for final runs.