RunPod
Also known as: RunPod.io, Run Pod
RunPod gives builders access to GPUs — from consumer-grade RTX cards to data-center H100s — without the overhead of enterprise cloud contracts or the complexity of the hyperscalers' pricing models. You can rent individual GPUs by the hour for experimentation, run serverless endpoints that scale to zero when idle, or deploy persistent pods for continuous workloads.
For builders self-hosting open-weight models, RunPod is a common starting point. You bring a container image (a packaged application environment), point it at RunPod's hardware, and get an API endpoint back. Many model serving frameworks like vLLM publish ready-to-use container images that deploy on RunPod with minimal configuration. The serverless option handles autoscaling automatically, which keeps costs down for workloads with unpredictable traffic.
RunPod differentiates on price relative to hyperscalers and on the breadth of GPU SKUs available (over 30 hardware options as of 2025). It also has strong community support in the open-source AI space, with tutorials and templates specifically built around deploying LLMs. Tradeoffs include fewer managed services beyond compute — you are closer to the metal compared to something like Modal, which abstracts more of the infrastructure concerns.