← Back to glossary
+Suggest a term
Concept·Infrastructure·Added 1 month ago

Private deployment

Also known as: on-premise AI, private cloud AI, air-gapped AI, self-hosted AI, VPC deployment

Running AI models inside your own infrastructure — a private cloud, VPC (Virtual Private Cloud), or on-premise servers — rather than sending requests to a third-party API. Data never leaves your environment, which matters for compliance, privacy, and regulated industries.

Most AI products start by calling a hosted API: you send a prompt to OpenAI or Anthropic, get a response back, and build on top of that. But in regulated industries — healthcare, finance, legal, government — data residency requirements or security policies can prevent sensitive information from being sent to third-party servers. Private deployment addresses this by running the model where you control the compute and the data.

Private deployment options range across a spectrum. At one end, you run open-weight models (models whose weights are publicly released) on your own GPU servers inside your infrastructure. In the middle, hyperscalers offer services like Azure OpenAI, where the model runs in a Microsoft-managed environment that is contractually isolated from other customers' data. At the other end, some frontier model providers (like Anthropic and OpenAI) offer enterprise agreements with private deployment options.

The infrastructure complexity of private deployment is real: you take on responsibility for model serving, autoscaling, version management, and security that a hosted API provider handles for you. Tools like vLLM, Ollama, and Llama Stack are designed to make self-hosted model serving more accessible. For builders, the decision is almost always driven by customer requirements rather than technical preference.

This definition is AI-generated and refreshed weekly. It may contain inaccuracies. Use your own judgment, especially for production decisions.
Related terms
Local modelBYOKSandboxingOpen weightsInference