
Diffusion model

Also known as: latent diffusion model, LDM, denoising diffusion model

A type of AI model that generates images by learning to reverse a noising process: it starts with pure random noise and gradually removes it until a coherent image emerges. The dominant approach behind most image generators from 2022 through 2024, now competing with autoregressive and flow-based methods.

Diffusion models are trained on a denoising task: given an image with a known amount of noise added, predict the clean image (or, equivalently, the noise that was added). By seeing many images at many different noise levels, the model learns what realistic images look like. At generation time, you start with pure noise and apply the denoiser repeatedly, removing a little noise at each step, with a text prompt guiding every step.
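The noising and denoising loop can be sketched in a few lines of Python. This is a toy illustration, not how any production model is implemented: the "images" are single numbers drawn from a Gaussian, the linear noise schedule and its constants (T, betas) are illustrative assumptions, and predict_noise is a hypothetical stand-in for the trained network that uses the closed-form answer available for Gaussian data. The sampling update follows the standard DDPM formulation.

```python
import math
import random

T = 200  # number of noise levels (an illustrative choice)
# Linear noise schedule: betas[t] is the variance of the noise added at step t.
betas = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []  # fraction of the original signal variance left after step t
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

MU, SIGMA = 3.0, 0.5  # toy "dataset": scalars drawn from N(MU, SIGMA^2)

def add_noise(x0, t, rng):
    """Forward process: jump straight to noise level t. Returns (x_t, eps)."""
    eps = rng.gauss(0.0, 1.0)
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps, eps

def predict_noise(x_t, t):
    """Stand-in for the trained network: E[eps | x_t], exact for Gaussian data."""
    ab = alpha_bars[t]
    var_t = ab * SIGMA ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * MU) / var_t

def sample(rng):
    """Reverse process: start from pure noise and denoise one level at a time."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = (x - scale * eps_hat) / math.sqrt(1.0 - betas[t])
        if t > 0:  # re-inject a little noise on every step except the last
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)

# One training example: a real network would be fit to minimize this squared error.
x_t, eps = add_noise(MU, T // 2, rng)
loss = (predict_noise(x_t, T // 2) - eps) ** 2

# Generation: every sample starts as pure noise.
samples = [sample(rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"sample mean ~ {mean:.2f}, sample std ~ {std:.2f}")
```

If the sketch is behaving as intended, the printed mean and standard deviation should come out close to MU and SIGMA, even though every sample began as pure noise. A real image model replaces predict_noise with a neural network that also takes the text embedding as input.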

The 'latent' in latent diffusion model (the approach used by Stable Diffusion) means the noising and denoising happen in a compressed representation of images rather than at full pixel resolution, which makes training and generation much cheaper. The system has two parts: an autoencoder, whose encoder compresses images into a latent space and whose decoder expands latents back into pixels, and the diffusion model itself, which works in that latent space guided by text embeddings.
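To get a feel for how much smaller the latent space is, here is a small arithmetic sketch. It assumes the configuration used by Stable Diffusion v1 (8x spatial downsampling and 4 latent channels); other latent diffusion models use different values, and latent_shape is a hypothetical helper written for this example.

```python
import math

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent tensor for an image of the given size.

    The defaults are assumptions matching Stable Diffusion v1.
    """
    return (latent_channels, height // downsample, width // downsample)

pixel_shape = (3, 512, 512)       # RGB image at 512x512
z_shape = latent_shape(512, 512)  # what the diffusion model actually denoises
ratio = math.prod(pixel_shape) / math.prod(z_shape)

print(f"pixels: {math.prod(pixel_shape)} values, latents: {math.prod(z_shape)} values")
print(f"the diffusion model works on {ratio:.0f}x fewer values")
```

Under these assumptions a 512x512 image becomes a 4x64x64 latent, so each denoising step handles 48 times fewer values than it would in pixel space.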

Diffusion models dominated image generation from 2022 through 2024 and still underpin many of the most widely used tools, including Stable Diffusion, many Flux variants, and reportedly Midjourney. The approach has since expanded into video (Sora, Veo, and Kling all use diffusion-based approaches), audio, and even protein structure prediction. Competing and closely related approaches, especially autoregressive methods and rectified flow transformers, gained ground in 2025, but diffusion remains the most widely deployed foundation for image generation.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
