Concept·AI Models & Capabilities·Added 1 month ago

MoE

Also known as: mixture of experts, mixture-of-experts, sparse model, MoE architecture

Mixture of Experts: an architecture in which a large model contains many smaller specialized sub-networks (the 'experts'), and each input is routed to only a few of them. The result is a big, capable model whose cost per inference is lower than its total size suggests.

A standard (dense) LLM uses its entire set of parameters for every token it processes. MoE models instead route each token (a chunk of text, often part of a word) through a small subset of the available 'expert' networks, chosen by a gating mechanism. The total parameter count looks huge on paper, but only a fraction is used for any given token, making the model more efficient to run than its size implies.
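The routing step can be sketched in a few lines of Python. This is a toy illustration, not how any particular model implements it: the names `route_token`, `make_expert`, and `gate_weights` are hypothetical, the "experts" are trivial scaling functions rather than neural networks, and the top-k selection with softmax weighting is one common gating scheme assumed here for the example.

```python
import math
import random


def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda vec: [scale * x for x in vec]


def route_token(token, gate_weights, experts, k=2):
    """Run one token vector through its top-k experts and mix the outputs."""
    # Gating: one score per expert (dot product of the token with that expert's gate row).
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([scores[i] for i in top])
    output = [0.0] * len(token)
    for w, i in zip(weights, top):
        expert_out = experts[i](token)
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, top


random.seed(0)
dim, n_experts = 4, 8
experts = [make_expert(i + 1) for i in range(n_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, chosen = route_token(token, gate_weights, experts, k=2)
print("experts used:", chosen, "out of", n_experts)
```

Only two of the eight experts run for this token, which is the source of the compute savings. In a real model the gate is learned during training and routing happens inside each MoE layer.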

Mixtral by Mistral was one of the first open models to popularize MoE at scale, and GPT-4 was widely reported to use a MoE architecture. The design matters for builders for two reasons: inference (generating answers) is cheaper per query relative to the total model size, and specialized knowledge can be distributed across different experts, potentially improving quality on diverse tasks.

You'll see MoE mentioned when people compare the 'active parameters' of a model to its 'total parameters.' A model with 400B total parameters and 40B active parameters per token is a MoE model. It's a useful concept for reading model specs and understanding why two models with the same total parameter count can have very different speed and cost profiles.
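The relationship between the two figures can be worked through with simple arithmetic. The sketch below assumes a simplified model in which some parameters are shared by every token and the rest are split evenly across experts. The function `moe_param_counts` and the specific split (16B shared, 64 experts of 6B each, 4 experts per token) are made up purely to reproduce the 400B and 40B figures above and do not describe any real model.

```python
def moe_param_counts(shared, per_expert, n_experts, k):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared      parameters every token uses (e.g. attention, embeddings)
    per_expert  parameters in a single expert
    n_experts   number of experts stored in the model
    k           number of experts each token is routed to
    """
    total = shared + n_experts * per_expert  # everything that must be stored
    active = shared + k * per_expert         # what one token actually uses
    return total, active


# Hypothetical split, in billions of parameters.
total, active = moe_param_counts(shared=16, per_expert=6, n_experts=64, k=4)
print(f"total: {total}B, active: {active}B, fraction active: {active / total:.0%}")
```

Under these assumptions each token touches a tenth of the weights, even though all of them still have to be stored.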

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
