Concept·AI Models & Capabilities·Added 1 month ago

Nemotron 3 Ultra

Also known as: NVIDIA Nemotron Ultra, Nemotron Ultra 550B

NVIDIA's 550-billion-parameter open-weights model, released June 2026. The most capable US-built open model on intelligence benchmarks, designed specifically for long-running agentic workflows. Runs at 300+ tokens per second, making it unusually fast for its size.

Nemotron 3 Ultra is a Mixture-of-Experts (MoE) model: it has 550 billion total parameters but only activates about 55 billion of them per token. That sparsity is why it can serve over 300 tokens per second at inference time, a speed most models this large cannot match. NVIDIA released the weights on June 4, 2026, alongside training data and recipes, under a permissive Linux Foundation license that allows commercial use.

On the Artificial Analysis Intelligence Index, it scores 48 — the highest of any US open-weights model ever released, and well ahead of the previous US leader. The honest caveat: China's open-weights frontier (Kimi K2.6 at 54, DeepSeek V4 Pro) still leads globally. Nemotron 3 Ultra closes but does not erase that gap.

For builders, the relevant angle is the agentic design intent. NVIDIA built this model specifically for multi-step agent pipelines: planning, tool calling, delegating to subagents, reading observations, recovering from errors across hundreds of turns. It's available via Hugging Face, OpenRouter, and NVIDIA NIM (NVIDIA's hosted inference service), and is self-hostable on vLLM, SGLang, and TRT-LLM — but self-hosting requires datacenter GPU infrastructure.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.

Related terms