
Mixtral

Also known as: Mixtral 8x7B, Mixtral 8x22B, MoE model (Mistral)

Mistral AI's sparse mixture-of-experts model series, released starting December 2023. Mixtral 8x7B was one of the first prominent open-weight MoE models, with 46.7B total parameters but only 12.9B active per token, enabling GPT-3.5-competitive performance at much lower inference cost.

Mixtral is Mistral AI's name for its family of sparse mixture-of-experts (MoE) models. For each token, a routing network in every MoE layer selects a small subset of 'expert' parameter groups to activate (two of eight in Mixtral 8x7B) rather than running all of them. Because only a fraction of the parameters is used for any given token, the model can hold more total parameters, and thus more knowledge capacity, than a dense model with the same per-token compute cost.
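The routing step described above can be illustrated with a small sketch. This is a toy illustration, not Mistral's implementation: the function names (`route_token`, `moe_layer`), the toy experts, and the example logits are all hypothetical, and the choice of two experts out of eight with softmax weights over the selected experts is an assumption based on the published description of Mixtral 8x7B.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=2):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    # Normalize over the selected experts only, so their weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, router_logits, experts, top_k=2):
    """Weighted sum of the outputs of the selected experts; the rest never run."""
    out = [0.0] * len(x)
    for idx, w in route_token(router_logits, top_k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy setup: 8 "experts", each scaling its input by a different factor.
experts = [lambda x, s=s: [s * xi for xi in x] for s in range(1, 9)]
# Hypothetical router scores for one token (one score per expert).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

selected = route_token(logits)
output = moe_layer([1.0, 2.0], logits, experts)
print(selected)
print(output)
```

Only the two selected experts are evaluated for this token; the other six contribute no compute, which is where the inference savings come from.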

Mixtral 8x7B, released in December 2023, has 46.7B total parameters but uses only 12.9B per token at inference time. Mistral reported that it matched or outperformed GPT-3.5 and Llama 2 70B on most benchmarks, with per-token compute comparable to a 13B dense model, although all 46.7B parameters must still be loaded in memory. Mixtral 8x22B followed in April 2024 with 141B total parameters, of which about 39B are active per token. Both models were released under the Apache 2.0 license, making them freely downloadable and commercially usable. The Mixtral releases helped popularize MoE as a practical architecture for open-weight models, an approach also used in later releases from Meta (Llama 4) and Mistral itself (Mistral Large 3).
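The gap between the name "8x7B" and the 46.7B total can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes, as a simplification, that the model consists of parameters shared by every token (attention, embeddings) plus 8 equally sized experts, of which 2 run per token; the variable names are illustrative, and the derived figures are rough estimates rather than official numbers.

```python
# Figures quoted in the text, in billions of parameters.
TOTAL_B = 46.7
ACTIVE_B = 12.9

# Assumptions for this estimate: 8 equally sized experts, 2 active per token.
NUM_EXPERTS = 8
ACTIVE_EXPERTS = 2

# Two equations, two unknowns:
#   total  = shared + NUM_EXPERTS    * per_expert
#   active = shared + ACTIVE_EXPERTS * per_expert
per_expert_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - ACTIVE_EXPERTS)
shared_b = ACTIVE_B - ACTIVE_EXPERTS * per_expert_b
active_fraction = ACTIVE_B / TOTAL_B

print(f"per-expert parameters: ~{per_expert_b:.1f}B")
print(f"shared parameters:     ~{shared_b:.1f}B")
print(f"active fraction:       ~{active_fraction:.0%}")
```

Under these assumptions each expert accounts for roughly 5.6B parameters and about 1.6B are shared, which is why the total is well below 8 x 7B = 56B: only the expert blocks are replicated eight times, not the whole model.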

Mixtral as a product line has since been superseded in Mistral's lineup by the Mistral 3 generation (Large 3, Small 4, Medium 3.5), but Mixtral models remain in use in production systems that were built on them. They are available on Hugging Face and Amazon Bedrock, and through Ollama for local deployment.

