Guardrail
Also known as: guardrails, AI guardrail, safety guardrail, output guardrail, input guardrail
Guardrails are the product safety layer around AI models. A model has its own safety behavior built in through training (for example, Anthropic's Constitutional AI or OpenAI's RLHF alignment work), but that is rarely sufficient on its own for production use. Guardrails are the additional checks you build around the model: input filters that catch harmful or off-topic requests before they reach it, output validators that verify responses meet your format and content requirements, and policy layers that restrict what the model is allowed to say in your specific context.
Common guardrail implementations include topic filters (a customer service bot that only discusses the company's products), format validators (the output must be valid JSON), PII (personally identifiable information) detectors that strip or flag sensitive data, and toxicity filters that block harmful content. Tools such as NVIDIA's NeMo Guardrails and the built-in features of AI gateways make this layer easier to implement.
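As a rough illustration of how an input filter, a PII detector, and a format validator can wrap a model call, here is a minimal Python sketch. Everything in it is hypothetical and chosen for the example: the keyword list, the email-only PII pattern, the required "answer" key, and the `model` callable (any function that takes a prompt string and returns a string). A production system would typically use a classifier or a dedicated library rather than keyword and regex matching.

```python
import json
import re
from typing import Callable, Optional, Sequence

# Hypothetical policy: keywords that count as on-topic for a support bot.
ALLOWED_TOPICS = ("order", "refund", "shipping", "warranty")

# Deliberately simple pattern; real PII detection covers far more than emails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

REFUSAL = "Sorry, I can only help with orders, refunds, shipping, and warranties."
FALLBACK = "Sorry, something went wrong. Please try again."


def check_input(user_message: str) -> bool:
    """Topic filter: pass only messages that mention an allowed topic."""
    text = user_message.lower()
    return any(topic in text for topic in ALLOWED_TOPICS)


def redact_pii(text: str) -> str:
    """PII detector (emails only): replace each match with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def validate_output(raw: str, required_keys: Sequence[str] = ("answer",)) -> Optional[dict]:
    """Format validator: return the parsed JSON object, or None if it is invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(key not in data for key in required_keys):
        return None
    return data


def guarded_call(user_message: str, model: Callable[[str], str]) -> dict:
    """Run the input guardrail, call the model, then run the output guardrails."""
    if not check_input(user_message):
        return {"answer": REFUSAL}  # the request never reaches the model
    raw = model(redact_pii(user_message))
    data = validate_output(raw)
    if data is None:
        return {"answer": FALLBACK}  # malformed model output is not passed on
    data["answer"] = redact_pii(str(data["answer"]))
    return data
```

The point of the structure is that the model call sits between two independent checks, so an off-topic request is refused without spending a model call and a malformed response is replaced with a fallback instead of reaching the user.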
For agentic systems, guardrails become more critical because agents take actions rather than only producing text. A guardrail on an agent might prevent it from writing to a database without human approval, from making API calls above a cost threshold, or from accessing files outside a specified directory. The 12-factor agents principles and the excessive agency guidance in the OWASP Top 10 for LLM Applications both emphasize guardrails as a core part of responsible agent design.
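The three agent guardrails mentioned above could be sketched as checks that run before each tool call. This is an illustrative sketch only: `ActionPolicy`, `GuardrailViolation`, and the method names are hypothetical, the cost threshold is modeled as a cumulative budget (a per-call limit would be an equally valid reading), and human approval is reduced to a boolean flag that the surrounding application would have to set.

```python
from dataclasses import dataclass
from pathlib import Path


class GuardrailViolation(Exception):
    """Raised when an agent action is blocked by policy."""


@dataclass
class ActionPolicy:
    sandbox_dir: Path      # the only directory tree the agent may touch
    max_cost_usd: float    # cumulative API budget for this agent run
    spent_usd: float = 0.0

    def check_file_access(self, path: str) -> Path:
        """Resolve the path and reject anything outside the sandbox directory."""
        root = self.sandbox_dir.resolve()
        resolved = (root / path).resolve()  # collapses ".." and follows symlinks
        if not resolved.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
            raise GuardrailViolation(f"file access outside sandbox: {resolved}")
        return resolved

    def check_cost(self, estimated_cost_usd: float) -> None:
        """Reject a call that would push total spend past the budget."""
        if self.spent_usd + estimated_cost_usd > self.max_cost_usd:
            raise GuardrailViolation("API call would exceed the cost threshold")
        self.spent_usd += estimated_cost_usd

    def check_db_write(self, approved_by_human: bool) -> None:
        """Reject database writes that a human has not approved."""
        if not approved_by_human:
            raise GuardrailViolation("database write requires human approval")
```

An agent loop would call the relevant check before executing each tool and treat `GuardrailViolation` as a signal to stop or escalate to a person, for example `policy.check_file_access("../etc/passwd")` raising instead of returning a path.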