LLMOps engineer
Also known as: LLM operations engineer, AI ops engineer, GenAI ops engineer
MLOps (machine learning operations) was the field that emerged to manage the infrastructure and lifecycle of traditional ML models in production. As large language models became the dominant AI primitive, LLMOps emerged as its counterpart. The concerns overlap but the specifics differ: LLMs have token-based costs that scale with usage, context windows that fill up and need management, model versions that change frequently at the provider level, and latency profiles that vary dramatically with prompt length and model size.
An LLMOps engineer monitors model performance across live traffic, detects when output quality degrades or costs spike unexpectedly, manages prompt versioning so teams can roll back changes, and builds the infrastructure for prompt caching, model routing (sending different requests to different models based on cost or capability), and observability (logging and tracing what happens inside agent pipelines).
At startups, this work tends to fall to AI engineers or platform engineers. Dedicated LLMOps roles are more common at companies running LLMs at significant scale where cost and reliability are business-critical. The tooling ecosystem, including platforms for tracing, evaluation, and cost tracking, is maturing rapidly and lowering the operational complexity for smaller teams.