Computer use
Also known as: computer use API, GUI agent, UI agent, screen agent
Most AI tools work by calling clean, well-defined APIs (interfaces that let software talk to other software). Computer use takes a different approach: the model sees a screenshot or rendered view of a screen, decides where to click or what to type, and actually interacts with the interface as a human would. This means it can work with any app, website, or legacy system, not just the ones that have an API.
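The observe, decide, act cycle described above can be sketched as a small loop. This is a minimal illustration, not Anthropic's API or any real library: `Action`, `run_agent`, `FakeForm`, and `scripted_policy` are all hypothetical names, the "screenshot" is a plain string, and a scripted function stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action format: 'click', 'type', or 'done'."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Loop: observe the screen, let the policy pick an action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):  # hard cap so a confused policy cannot loop forever
        screen = take_screenshot()
        action = choose_action(screen, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


class FakeForm:
    """Stand-in for a real UI: one text field and one submit button at (100, 200)."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; a string keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 200):
            self.submitted = True


def scripted_policy(screen: str, history: List[Action]) -> Action:
    """Placeholder for the model: fill the field, click submit, then stop."""
    if "field=''" in screen:
        return Action("type", text="hello")
    if "submitted=False" in screen:
        return Action("click", x=100, y=200)
    return Action("done")


form = FakeForm()
history = run_agent(form.screenshot, scripted_policy, form.execute)
```

In a real deployment the policy would be a model call that receives an actual screenshot, and `execute` would drive the mouse and keyboard; the loop structure is the part this sketch is meant to show.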
Anthropic introduced a computer use API for Claude in October 2024 as a public beta, and the capability has since become a benchmark category: OSWorld, for example, measures how well models complete tasks inside a desktop operating system. Claude Sonnet 4.6 led OSWorld benchmarks as of early 2026. The use cases are broad: automated QA testing, filling out forms in systems with no API, browser-based data entry, and anything else that requires interacting with a visual interface.
The catch is reliability. Computer use agents are slower than API-based tool calls, because every step means capturing the screen and running the model on it, and they can fail when an interface changes or renders differently. Security is also a concern: an agent with screen and input control has broad access to a system. Most production deployments today therefore scope computer use to sandboxed environments or to specific, well-defined UI workflows rather than granting open-ended access.
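One way to scope an agent to a well-defined workflow is to validate every proposed action before it is executed. The sketch below is an assumption about how such a guard might look, not a description of any particular product: `Region`, `ALLOWED_KINDS`, and `check_action` are hypothetical names, and the rules (an allowlist of action kinds, a permitted screen region, a text length limit) are illustrative choices.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Rectangle of the screen the agent may click in (hypothetical)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


# Only these action kinds are permitted; anything else is rejected outright.
ALLOWED_KINDS = {"click", "type"}


def check_action(
    kind: str,
    x: int,
    y: int,
    text: str,
    region: Region,
    max_text_len: int = 200,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed action; nothing is executed here."""
    if kind not in ALLOWED_KINDS:
        return False, f"action kind {kind!r} is not allowed"
    if kind == "click" and not region.contains(x, y):
        return False, "click outside the permitted region"
    if kind == "type" and len(text) > max_text_len:
        return False, "typed text exceeds the length limit"
    return True, "ok"


form_area = Region(left=0, top=0, right=100, bottom=100)
inside = check_action("click", 10, 10, "", form_area)
outside = check_action("click", 150, 10, "", form_area)
shortcut = check_action("key_combo", 0, 0, "", form_area)
```

A guard like this complements a sandbox rather than replacing it: it limits what the agent may attempt, while the sandbox limits the damage if a permitted action goes wrong.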