Llama
Also known as: Meta Llama, Llama models
Llama (Large Language Model Meta AI) launched in February 2023 as a research model and became unexpectedly influential when its weights leaked publicly, catalyzing a wave of open-source AI development. Meta then embraced openness deliberately, releasing Llama 2 and Llama 3 under licenses that permit most commercial use. By mid-2026, Llama 4 anchored the most widely deployed open-weight model ecosystem globally, with 25+ launch partners including AWS, NVIDIA, Databricks, and Snowflake, and support in Ollama, vLLM, Hugging Face Transformers, and other major inference frameworks.
Llama 4, released in April 2025, brought two significant architectural shifts: mixture-of-experts (MoE) and native multimodality. Llama 4 Scout (17B active / 109B total parameters, 16 experts) offers a 10-million-token context window, the largest of any openly available model. Llama 4 Maverick (17B active / 400B total parameters, 128 experts) targets stronger reasoning at larger scale. In an MoE model, only the active parameters are used to process each token, while the full parameter count must still be held in memory. Both models process text and images natively. Llama 4 Behemoth, a teacher model with 288B active parameters, was still in training as of April 2026. Meta positioned the open-weight models against GPT-4o-era benchmarks; current closed frontier models (GPT-5.x, Claude Opus 4.x) are meaningfully ahead.
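The active/total split above can be made concrete with a little arithmetic. The following is a minimal sketch, not an official sizing tool: the parameter counts come from the figures quoted above, while the `MoEModel` class, the `BYTES_PER_PARAM` table, and the choice of precisions are illustrative assumptions. It estimates weight storage only and ignores KV cache, activations, and quantization overhead.

```python
from dataclasses import dataclass

# Assumed bytes per parameter at common weight precisions (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass(frozen=True)
class MoEModel:
    """Hypothetical helper for illustration; not part of any Llama tooling."""
    name: str
    active_params_b: float  # billions of parameters used per token
    total_params_b: float   # billions of parameters held in memory

    def active_fraction(self) -> float:
        """Share of the weights that participate in each token's forward pass."""
        return self.active_params_b / self.total_params_b

    def weight_memory_gb(self, precision: str) -> float:
        """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
        return self.total_params_b * BYTES_PER_PARAM[precision]


# Parameter counts as quoted in the text above.
SCOUT = MoEModel("Llama 4 Scout", 17, 109)
MAVERICK = MoEModel("Llama 4 Maverick", 17, 400)

if __name__ == "__main__":
    for model in (SCOUT, MAVERICK):
        print(f"{model.name}: {model.active_fraction():.1%} of weights active per token")
        for precision in BYTES_PER_PARAM:
            print(f"  {precision}: ~{model.weight_memory_gb(precision):.0f} GB of weights")
```

Under these assumptions, per-token compute tracks the 17B active parameters for both models, but memory requirements track the totals, which is why Maverick needs far more hardware than Scout despite the identical active count.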
For builders, Llama's appeal is the combination of open weights, zero per-token API cost when self-hosted, and a massive ecosystem of community fine-tunes. A quantized Scout running on an RTX 4090 costs roughly $46 per month in electricity, compared with potentially thousands of dollars in API fees at the same token volumes. The tradeoff is hardware complexity and a capability gap relative to the latest frontier models. Teams with data privacy requirements, high-volume workloads, or a need for custom fine-tuning often start with Llama as their foundation.
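The electricity figure can be sanity-checked with a back-of-the-envelope calculation. This sketch assumes the GPU draws its rated 450 W around the clock for a 30-day month at a hypothetical rate of $0.14 per kWh; none of these inputs come from the text above, and real inference loads often draw less. The API price used for the break-even estimate is likewise a placeholder, not a quote from any provider.

```python
def monthly_electricity_cost(watts: float, hours_per_day: float,
                             days: int, usd_per_kwh: float) -> float:
    """Energy cost in USD: kilowatts x hours x price per kWh."""
    kwh = (watts / 1000.0) * hours_per_day * days
    return kwh * usd_per_kwh


def api_cost(tokens: float, usd_per_million_tokens: float) -> float:
    """Cost in USD of processing `tokens` at a flat per-million-token price."""
    return (tokens / 1_000_000) * usd_per_million_tokens


def break_even_tokens(fixed_monthly_usd: float,
                      usd_per_million_tokens: float) -> float:
    """Monthly token volume at which API fees equal the fixed monthly cost."""
    return fixed_monthly_usd / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Assumed inputs: 450 W continuous draw, 30 days, $0.14 per kWh.
    power_cost = monthly_electricity_cost(450, 24, 30, 0.14)
    print(f"Electricity: ${power_cost:.2f} per month")

    # Hypothetical API price of $0.50 per million tokens.
    tokens = break_even_tokens(power_cost, 0.50)
    print(f"Break-even volume: {tokens / 1e6:.1f}M tokens per month")
```

With these inputs the electricity cost comes to about $45 per month, in line with the rough figure above. The comparison covers running costs only; hardware purchase, cooling, and operations time would need to be added for a full cost picture.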