SLM
Also known as: small language model, small model, edge model, on-device model
LLMs like GPT-5 or Claude Opus are enormous, requiring expensive cloud GPUs (specialized processors) to run. SLMs are the other end of the spectrum: models with a few billion parameters rather than hundreds of billions. Examples include Microsoft's Phi series, Meta's smaller Llama variants, and Google's Gemma. They can run on a laptop, in a browser, or even on a phone, without a cloud API call.
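A rough way to see why parameter count decides where a model can run: the memory needed just to hold the weights is roughly the parameter count times the bytes per parameter. The sketch below is back-of-envelope arithmetic only. The 3B and 300B sizes are illustrative round numbers, not the specs of any named model, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative sizes only (assumptions, not real model specs).
SIZES = [("3B small model", 3e9), ("300B large model", 300e9)]

for label, params in SIZES:
    for bits in (16, 4):  # 16-bit weights vs. 4-bit quantized weights
        gb = weight_memory_gb(params, bits)
        print(f"{label} at {bits}-bit: ~{gb:.1f} GB")
```

Under these assumptions the small model's 4-bit weights come to about 1.5 GB, within reach of a laptop or recent phone, while the large model needs hundreds of GB at 16-bit and so has to be spread across several data-center GPUs.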
The case for SLMs in builder projects is often economic and practical. If your use case is narrow enough, like classifying support tickets or extracting structured data from invoices, a well-tuned SLM can perform comparably to a frontier model at a fraction of the cost and latency. They also suit offline or privacy-sensitive workloads, because the model runs locally and the data does not have to leave the device.
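To make "a fraction of the cost" concrete for your own workload, a simple per-token cost estimate is usually enough to start. The sketch below is an assumption-laden illustration: every number in it (request volume, tokens per request, both prices) is a made-up placeholder, not a real price for any provider or model. Substitute your own figures.

```python
def monthly_cost_usd(requests_per_month: int,
                     tokens_per_request: int,
                     usd_per_million_tokens: float) -> float:
    """Estimate monthly inference spend from volume and a per-token price."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


# All values below are hypothetical placeholders, not real prices.
REQUESTS_PER_MONTH = 2_000_000
TOKENS_PER_REQUEST = 500
FRONTIER_USD_PER_M_TOKENS = 10.0
SLM_USD_PER_M_TOKENS = 0.5

frontier_cost = monthly_cost_usd(
    REQUESTS_PER_MONTH, TOKENS_PER_REQUEST, FRONTIER_USD_PER_M_TOKENS)
slm_cost = monthly_cost_usd(
    REQUESTS_PER_MONTH, TOKENS_PER_REQUEST, SLM_USD_PER_M_TOKENS)

print(f"Frontier model: ${frontier_cost:,.0f}/month")
print(f"SLM:            ${slm_cost:,.0f}/month")
```

The gap scales linearly with volume, which is why the comparison matters more as traffic grows. A self-hosted SLM would swap the per-token price for hardware and operations cost, which this sketch does not model.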
The catch: SLMs have real limits on complex reasoning, long context, and unfamiliar tasks. The builder question isn't 'is it as smart as GPT-5?' but 'is it smart enough for this specific job?' The answer is often yes, especially after fine-tuning on domain data. Keep SLM options in mind for your stack as inference cost becomes a real constraint at scale.
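One way to answer the "smart enough for this job?" question is to score the candidate model on a small labeled sample of your own data and compare the result to a bar you set in advance. The sketch below is a minimal illustration, not a full evaluation method. The `keyword_stub` function is a hypothetical stand-in for a real SLM call, and the tickets, labels, and 0.9 threshold are invented for the example.

```python
from typing import Callable, Sequence, Tuple


def accuracy(classify: Callable[[str], str],
             examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (text, expected_label) pairs the classifier gets right."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in examples if classify(text) == label)
    return correct / len(examples)


def keyword_stub(text: str) -> str:
    """Hypothetical placeholder: replace with a call to the SLM under test."""
    lowered = text.lower()
    if "refund" in lowered or "charge" in lowered:
        return "billing"
    if "password" in lowered or "login" in lowered:
        return "account"
    return "other"


# Invented sample data; use real tickets from your own domain.
LABELED_TICKETS = [
    ("I was charged twice this month", "billing"),
    ("Cannot reset my password", "account"),
    ("Please refund my last order", "billing"),
    ("The app crashes on startup", "other"),
    ("Login page keeps looping", "account"),
]

REQUIRED_ACCURACY = 0.9  # hypothetical bar; set it from your own requirements

score = accuracy(keyword_stub, LABELED_TICKETS)
good_enough = score >= REQUIRED_ACCURACY
print(f"accuracy={score:.2f} good_enough={good_enough}")
```

Running the same harness against a frontier model gives a like-for-like comparison on the task that actually matters, rather than on general benchmarks. Five examples is far too few for a real decision; the sample here is kept tiny only for readability.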