AI Gateway
Also known as: LLM gateway, model gateway, AI proxy
As teams use multiple AI providers (OpenAI for some tasks, Anthropic for others, local models for sensitive data), managing API keys, rate limits, and observability across all of them becomes its own engineering problem. An AI gateway sits between your application and those providers: the application sends every request to the gateway, and the gateway handles routing, caching, and logging.
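The request path described above can be sketched in a few lines. This is a minimal in-process illustration, not how any particular product is implemented: the `Gateway` class, the task labels, and the two stand-in provider functions are all hypothetical, and a real gateway would run as a separate network service that holds the API keys.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signature: takes a prompt, returns completion text.
Provider = Callable[[str], str]


@dataclass
class Gateway:
    providers: Dict[str, Provider]   # provider name -> callable
    routes: Dict[str, str]           # task label -> provider name
    cache: Dict[Tuple[str, str], str] = field(default_factory=dict)
    log: List[dict] = field(default_factory=list)

    def complete(self, task: str, prompt: str) -> str:
        name = self.routes[task]             # routing
        key = (name, prompt)
        start = time.perf_counter()
        cached = key in self.cache           # exact-match caching
        if not cached:
            self.cache[key] = self.providers[name](prompt)
        self.log.append({                    # logging
            "task": task,
            "provider": name,
            "cached": cached,
            "latency_s": time.perf_counter() - start,
        })
        return self.cache[key]


# Stand-in providers (assumptions); real ones would call a vendor API.
def fake_hosted(prompt: str) -> str:
    return f"hosted:{prompt}"


def fake_local(prompt: str) -> str:
    return f"local:{prompt}"


gateway = Gateway(
    providers={"hosted": fake_hosted, "local": fake_local},
    routes={"general": "hosted", "sensitive": "local"},
)
print(gateway.complete("sensitive", "summarize this record"))
```

The application only ever calls `gateway.complete`; which backend answers, whether the answer came from cache, and how long it took are all recorded in one place.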
Popular options include Cloudflare AI Gateway, Portkey, and self-hosted solutions. They typically expose a single OpenAI-compatible API regardless of which backend model actually serves the request, so swapping models requires little or no change to application code. Some gateways also support semantic caching (returning a stored response when a new prompt is identical or close in meaning to an earlier one), which can reduce costs considerably in high-traffic applications where similar prompts recur.
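To make semantic caching concrete, here is a toy sketch. It assumes a bag-of-words vector and cosine similarity as a stand-in for a real embedding model, and the `SemanticCache` class, the `embed` helper, and the 0.8 threshold are illustrative choices rather than anything a specific gateway documents.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed cutoff; tune per workload
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar stored prompt, if close enough."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("What's the capital of France?"))  # near-identical wording: hit
print(cache.get("How do I reset my password?"))    # unrelated prompt: miss
```

The threshold is the main design choice: set too low, users receive answers to questions they did not ask; set too high, the cache behaves like an exact-match cache.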
For builders operating at even moderate scale, an AI gateway provides visibility that is otherwise hard to get: which prompts are expensive, which requests are failing, where latency is coming from, and how costs are distributed across providers. It also makes provider fallback easier: if OpenAI is down or rate-limited, the gateway can route requests to Anthropic automatically.
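Provider fallback reduces to trying backends in priority order. The sketch below assumes a hypothetical `ProviderError` raised on outages or rate limits and two stand-in provider functions; a real gateway would map HTTP status codes and timeouts onto a similar error.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical error signalling an outage or a rate limit."""


def complete_with_fallback(
    prompt: str, providers: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # keep the reason for observability
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers (assumptions) that simulate a rate-limited primary.
def primary_down(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def backup(prompt: str) -> str:
    return "ok: " + prompt


name, text = complete_with_fallback(
    "hello", [("primary", primary_down), ("backup", backup)]
)
print(name, text)
```

Collecting the per-provider error messages rather than discarding them is what feeds the failure visibility described above.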