Mondoo

What AI Agents Are and How to Classify What's on Your Estate

AI agents are not chatbots with extra features. They perceive, reason, and act on real systems with real credentials. Before you can secure them, you need to know what you are dealing with: who controls the runtime, where it executes, and what it can reach.

Christoph Hartmann

This post is part of the series Securing AI Agents: A Practical Guide for Security and Engineering Leaders. Each post ends with actions for security teams and engineering teams.

Before you can secure AI agents, you need a shared language for what they are and how to classify them. This post covers the agentic loop that every agent runs, then introduces the four dimensions you need to classify every agent on your estate: what it can reach (agency), how independently it acts (autonomy), who controls the runtime (deployment category), and where it executes (execution environment).

What makes an agent an agent

Every AI agent, regardless of vendor or framework, runs the same loop: perceive, reason, act.

The agentic loop: User Request flows into the Perceive-Reason-Act cycle. Perceive assembles context from input sources (user messages, files, web pages, API responses, MCP server data). Reason sends context to an inference provider where data leaves your perimeter. Act executes tool calls against external systems (filesystem, shell, APIs, databases, git, Slack). Results feed back into Perceive for the next iteration.

Figure: The agentic loop. Every agent runs the same cycle: perceive context, reason about what to do, act on real systems. The security-relevant boundaries are where data enters (any input source is a potential injection vector), where reasoning happens (context is sent to an inference provider outside your perimeter), and where actions execute (tool calls hit production systems with real consequences). Results feed back and the loop repeats.

Perceive. The agent ingests context: user prompts, files, web pages, database results, service responses. Everything it reads becomes part of its decision space, and that is where the attack surface begins.

Reason. The model processes accumulated context and decides what to do next. Prompt injection and semantic manipulation can corrupt decisions before any tool is invoked. The context sent to the inference provider may contain PII, credentials, or proprietary code, and that data is now outside your perimeter regardless of whether the agent acts on the result. Man-in-the-middle attacks on the inference API can alter what the model receives or returns. Reasoning is both a decision point and a data exposure point.

Act. The agent executes its reasoning against real systems: writes to the filesystem, runs shell commands, calls APIs, commits code, sends messages. A chatbot that hallucinates wastes your time. An agent that hallucinates and then acts can delete your production database in nine seconds. It happened in April 2026.

After acting, the agent perceives the result, reasons about whether it succeeded, and decides what to do next. A single user request can trigger dozens of iterations.
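The loop can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: `fake_reason` stands in for the call to an inference provider, and the `ACTIONS` registry and its `read_file` entry are hypothetical names invented for this example.

```python
from typing import Callable

# Hypothetical action registry. A real agent would wrap filesystem,
# shell, or API access here; this one only returns canned text.
ACTIONS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"(contents of {path})",
}


def fake_reason(context: list[str]) -> tuple[str, str, str]:
    """Stand-in for the inference provider (an assumption of this sketch).

    Returns ("act", action_name, argument) or ("finish", "", final_text).
    """
    if any(entry.startswith("result:") for entry in context):
        return ("finish", "", "Summary written from the file contents.")
    return ("act", "read_file", "README.md")


def run_agent(user_request: str, max_iterations: int = 10) -> tuple[str, int]:
    """Run the loop and return (final answer, iterations used)."""
    context = [f"user: {user_request}"]             # Perceive: assemble context
    for iteration in range(1, max_iterations + 1):
        kind, name, payload = fake_reason(context)  # Reason: decide the next step
        if kind == "finish":
            return payload, iteration
        result = ACTIONS[name](payload)             # Act: execute against a system
        context.append(f"result: {result}")         # Perceive: result feeds back
    raise RuntimeError("iteration limit reached without a final answer")
```

The `max_iterations` cap is a design choice for the sketch: because one request can fan out into many iterations, an explicit bound keeps a confused loop from running indefinitely.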

Tool use turns language into action

A language model on its own can only produce text. Tool use turns it into an agent. The model receives a list of available tools with descriptions and parameter schemas. When the model decides a tool is needed, it emits a structured tool call instead of plain text. The calling system executes the tool and feeds the result back into the model's context for the next reasoning step.

The user asks the agent to check which pods are failing in the staging cluster:

JSON
// The model emits a tool call instead of a text response
{
  "type": "tool_use",
  "name": "run_shell_command",
  "input": {
    "command": "kubectl get pods -n staging --field-selector=status.phase=Failed"
  }
}

The system executes the command and returns the result to the model:

JSON
// The tool result feeds back into the model's context
{
  "type": "tool_result",
  "content": "NAME READY STATUS RESTARTS AGE\npayment-svc-7b4d6f8-x2k9p 0/1 Failed 3 47m\nauth-worker-5c8a3e1-m7q2r 0/1 Failed 1 12m"
}

The model reads this result, reasons that two pods are failing, and decides what to do next. It might read the failing pod logs, or respond with a summary. Each perceive-reason-act iteration is one turn of the agentic loop.
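On the calling side, the exchange above amounts to a small dispatcher. The sketch below is hedged: `HANDLERS`, `fake_shell`, and the `is_error` field are assumptions made for illustration, and the shell runner returns canned output rather than executing anything.

```python
def fake_shell(command: str) -> str:
    """Stand-in for a real shell runner; returns canned output (assumption)."""
    return "NAME READY STATUS RESTARTS AGE\npayment-svc-7b4d6f8-x2k9p 0/1 Failed 3 47m"


# Hypothetical registry mapping tool names to handlers.
HANDLERS = {
    "run_shell_command": lambda args: fake_shell(args["command"]),
}


def execute_tool_call(call: dict) -> dict:
    """Turn a tool_use message into a tool_result message."""
    handler = HANDLERS.get(call.get("name"))
    if call.get("type") != "tool_use" or handler is None:
        # Unknown tools are refused rather than guessed at.
        return {
            "type": "tool_result",
            "is_error": True,
            "content": f"unknown tool: {call.get('name')}",
        }
    return {"type": "tool_result", "content": handler(call["input"])}
```

The registry is the point where the system, not the model, decides which tools exist. Anything the model names that is not registered comes back as an error result instead of being executed.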

This mechanism works the same way across major providers. Anthropic's "Building effective agents" (December 2024) defines the distinction between workflows with predetermined code paths and agents with model-driven decision loops. OpenAI's "A practical guide to building agents" (April 2025) describes equivalent patterns. Google's Agent Development Kit (April 2025) provides a framework for multi-agent applications with tool use at the core.

The Model Context Protocol (MCP), originally created at Anthropic in November 2024 and donated to the Linux Foundation in December 2025, standardizes how agents discover and connect to external tools. An MCP server exposes tools with descriptions, parameter schemas, and return types. The agent reads these descriptions and decides when to use each tool. This is how a single coding agent can connect to a database, a cloud provider, a ticketing system, and a documentation service through a uniform protocol.
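To make the tool description concrete, here is a minimal sketch of a listing shaped like an MCP tool definition (`name`, `description`, `inputSchema`). The `query_tickets` tool and the `missing_arguments` helper are hypothetical, invented for this example. No MCP client library is used.

```python
# Shape follows MCP tool definitions (name, description, inputSchema).
# The tool itself is hypothetical.
TOOL_LISTING = [
    {
        "name": "query_tickets",
        "description": "Search the ticketing system by status.",
        "inputSchema": {
            "type": "object",
            "properties": {"status": {"type": "string"}},
            "required": ["status"],
        },
    },
]


def missing_arguments(tool_name: str, arguments: dict) -> list[str]:
    """Return required parameters that a proposed call does not supply."""
    for tool in TOOL_LISTING:
        if tool["name"] == tool_name:
            required = tool["inputSchema"].get("required", [])
            return [name for name in required if name not in arguments]
    raise KeyError(f"tool not advertised: {tool_name}")
```

The description text is what the model reads when it decides whether to call the tool, which is why tool descriptions belong in any security review of an MCP server.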

Now that you understand the mechanism, the security question is: how do you assess the risk of a specific agent? Two dimensions matter. The first is what it can reach. The second is how independently it acts.

Agency defines what the agent can reach

The tool set defines the agent's agency: the scope of actions it can perform on real systems. Agency is not binary. It ranges from reading a single file to modifying infrastructure across cloud accounts. The same model with different tool sets is a fundamentally different security proposition. Most teams evaluate agents by model and vendor, not by the agency they grant.

Read-only. The agent can read files, documentation, or data but cannot modify anything. Code suggestion tools that only read the current file operate here. The blast radius of a bad action is minimal because nothing it does can change state, although whatever it reads can still leave your perimeter during reasoning.

Project-scoped. The agent can read and write within a bounded directory or project. It can edit files and run project-level commands but cannot reach outside its scope. A coding tool restricted to a single repository with no network access operates here.

Full system. The agent can access the entire filesystem, run arbitrary shell commands, and use whatever credentials the host user has. A coding agent running as the developer's user session with no sandboxing operates here. If the developer has SSH keys, cloud credentials, or API tokens on disk, the agent can reach them.

System plus external services. The agent reaches beyond the local machine to cloud APIs, databases, messaging platforms, and CI/CD pipelines. Production agents, CI/CD pipeline agents, and autonomous daemons with service account credentials operate here. A single bad decision can propagate across systems.
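Because agency follows from the tool set, you can derive it mechanically. The sketch below assumes each tool is tagged with the broadest level it grants. The tool names and the `TOOL_AGENCY` table are hypothetical.

```python
from enum import IntEnum


class Agency(IntEnum):
    READ_ONLY = 1
    PROJECT_SCOPED = 2
    FULL_SYSTEM = 3
    SYSTEM_PLUS_EXTERNAL = 4


# Hypothetical tool names, each tagged with the broadest reach it grants.
TOOL_AGENCY = {
    "read_current_file": Agency.READ_ONLY,
    "edit_project_file": Agency.PROJECT_SCOPED,
    "run_shell_command": Agency.FULL_SYSTEM,
    "call_cloud_api": Agency.SYSTEM_PLUS_EXTERNAL,
}


def agency_of(tool_names: list[str]) -> Agency:
    """An agent's agency is set by its most powerful tool.

    Unknown tools are treated as the broadest level, a conservative default.
    An empty tool set is treated as the lowest level.
    """
    if not tool_names:
        return Agency.READ_ONLY
    return max(TOOL_AGENCY.get(name, Agency.SYSTEM_PLUS_EXTERNAL) for name in tool_names)
```

Adding a single shell tool to an otherwise read-only agent moves it two levels up, which is the sense in which the same model with a different tool set is a different security proposition.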

Agency tells you the blast radius. Autonomy tells you how likely an unchecked decision is to reach that blast radius.

Autonomy determines who is in control when the tool fires

The tool call mechanism is identical whether a human approves every action or nobody is watching. What happens between the model deciding to call a tool and the tool actually executing is the autonomy spectrum. It determines whether anyone has a chance to catch a bad decision before it executes.

Multiple independent frameworks have converged on this spectrum. Google DeepMind's "Levels of AGI" (Morris et al., 2024) defines six autonomy levels from "No AI" through "AI as an Agent," where the highest level means the human is fully disengaged. Feng, McDonald, and Zhang at the University of Washington published "Levels of Autonomy for AI Agents" (June 2025), defining five levels by the user's role: operator, collaborator, consultant, approver, and observer. Kasirzadeh and Gabriel proposed a four-axis characterization (April 2025) that treats autonomy as one dimension alongside efficacy, goal complexity, and generality.

The more autonomy you give the agent, the more you depend on it making correct decisions, because the human checkpoint either comes later or not at all. The frameworks above use between four and six levels with academic naming. This series uses the same progression in plainer terms:

Suggestion. The model proposes; the human decides. GitHub Copilot autocomplete falls here. The model generates a code completion, the developer reads it, and presses Tab or ignores it. No tool fires without explicit human action.

Assisted. The model proposes a set of changes, and the human reviews before applying. Cursor's apply mode operates here. The model might suggest edits across five files, but nothing changes on disk until the developer confirms.

Agentic. The model executes multi-step tasks with intermittent human checkpoints. Claude Code in its default mode falls here. The agent reads files, writes code, runs tests, and iterates autonomously within its allowed scope, pausing for approval only when it needs to run commands outside that scope. The human is in the loop, but the agent is driving. Between checkpoints, the agent may execute dozens of tool calls without review.

Fully autonomous. The model runs without human intervention, often on a schedule or in response to events. Always-on daemons like OpenClaw, autonomous coding agents like Devin, and production agents on LangChain or Google ADK operate here. Intervention happens after the fact, if it happens at all.

Each level compresses the time between a bad decision and an irreversible action. At the suggestion level, every action passes through human judgment. At the fully autonomous level, the agent decides and executes before anyone knows it happened.
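The four levels differ in where the approval gate sits. A minimal sketch of that gate follows, with the level names taken from this post and the `within_allowed_scope` flag as an assumed input that some policy layer would supply:

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGESTION = 1
    ASSISTED = 2
    AGENTIC = 3
    FULLY_AUTONOMOUS = 4


def needs_human_approval(level: Autonomy, within_allowed_scope: bool) -> bool:
    """Decide whether a pending tool call must wait for a human."""
    if level <= Autonomy.ASSISTED:
        # Suggestion and assisted: a human acts on or confirms every change.
        return True
    if level == Autonomy.AGENTIC:
        # Agentic: pause only for actions outside the allowed scope.
        return not within_allowed_scope
    # Fully autonomous: review happens after the fact, if at all.
    return False
```

The middle branch is the one that deserves scrutiny: at the agentic level, the definition of "allowed scope" is the only thing standing between the model's decision and execution.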

Risk lives at the intersection

Neither agency nor autonomy alone determines risk. A fully autonomous agent with read-only access to public documentation is low risk. A suggestion-level tool with write access to production is also low risk, because a human reviews every action. Risk emerges where broad agency meets high autonomy.

OWASP ranks Excessive Agency as LLM06 in its 2025 Top 10 for LLM Applications, citing three contributing factors: excessive functionality, excessive autonomy, and excessive permissions. All three map directly to the agency and autonomy dimensions above.

The AWS Agentic AI Security Scoping Matrix (Brown and Saner, November 2025) combines both dimensions into four scopes, each mapping to specific security controls:

  • Scope 1 (No Agency). Read-only access, human-initiated. The agent cannot modify state. Suggestion-level tools fall here.
  • Scope 2 (Prescribed Agency). Bounded modifications with human approval for every change. Assisted tools with project-scoped access fall here.
  • Scope 3 (Supervised Agency). Autonomous execution across the full system with intermittent human checkpoints. Agentic coding tools fall here.
  • Scope 4 (Full Agency). Self-initiated, continuous operation reaching external services. Autonomous coding agents and production daemons fall here.

The grid below places common developer tools at the intersection of their agency and autonomy levels. Risk increases from the bottom-left to the top-right. The AWS scopes run along the diagonal:

Autonomy / Agency | Read-only                      | Project-scoped         | Full system           | System + external
Fully autonomous  |                                |                        | OpenClaw              | Devin · Scope 4
Agentic           |                                | Cursor agent           | Claude Code · Scope 3 |
Assisted          |                                | Cursor apply · Scope 2 |                       |
Suggestion        | Copilot autocomplete · Scope 1 |                        |                       |

The dangerous quadrants are in the top-right: an agentic coding tool with production credentials, a fully autonomous daemon with filesystem and network access, an autonomous coding agent with deployment keys and access to external services.

The PocketOS incident illustrates this precisely. The autonomy level was "agentic with human approval," which sounds safe until you realize the agent found an overprivileged token in the repository and used it to delete a production database during its approved execution window. The agency was the problem, not the autonomy level. The right question is always both dimensions together: what can this agent reach, and who decides when it acts?
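One simple way to encode "risk emerges where broad agency meets high autonomy" is to take the lower of the two ordinal positions. This is an assumption made for illustration, not the definition used by the AWS matrix or OWASP. The `combined_exposure` name is hypothetical.

```python
AGENCY = ["read-only", "project-scoped", "full system", "system + external"]
AUTONOMY = ["suggestion", "assisted", "agentic", "fully autonomous"]


def grid_position(agency: str, autonomy: str) -> tuple[int, int]:
    """Return zero-based (column, row) indexes on the agency/autonomy grid."""
    return AGENCY.index(agency), AUTONOMY.index(autonomy)


def combined_exposure(agency: str, autonomy: str) -> int:
    """Score from 1 (lowest) to 4 (highest).

    Taking the minimum means exposure is only high when BOTH dimensions
    are high, so a narrow tool set or a tight approval gate caps the score.
    """
    column, row = grid_position(agency, autonomy)
    return min(column, row) + 1
```

A score like this is a triage aid, not a verdict. The incident above sits at the agentic level with external reach and scores 3, not 4, yet still ended in a deleted database, so the individual dimensions need review even when the combined number looks moderate.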

From risk level to inventory: who controls what

Agency and autonomy tell you how risky an agent is. To actually govern the agents on your estate, you need two more dimensions: who controls the agent's runtime, and where it executes. These determine which agency and autonomy controls you can enforce. For an agent you build and deploy in your own cloud, you can restrict the tool set and require human approval gates. A vendor SaaS agent or an employee-installed tool may not give you either option.

Deployment category (who controls the runtime)

Organization-built agents. Your team builds, operates, and is accountable for them. You control the architecture, tool set, and deployment. The risk is that developers may not build with agent-specific threats in mind.

Vendor-run SaaS agents. Embedded in products you already buy: Salesforce Agentforce, Zendesk AI Agents, ServiceNow Otto. The vendor controls runtime, model, and tool access. You are accountable for the data you expose.

Managed desktop agents. M365 Copilot, Google Workspace Gemini, ChatGPT desktop, Claude Cowork. The vendor provides the model, but the agent executes locally on your users' machines and in their cloud accounts. It reads their email, documents, calendar, and chat history. You manage the deployment and the data exposure.

Employee-run tools. The category that grows fastest and gets governed last. ChatGPT, Claude Code, Cursor, Windsurf, browser extensions, OpenClaw. Employees adopt these on their own, sometimes with personal accounts, sometimes pasting proprietary code into chat windows. Visibility is close to zero in most organizations.

Execution environment (where the agent runs)

Cloud-deployed agents. Python or JavaScript backends using LangChain, CrewAI, Google Agent Development Kit (ADK), Vercel AI SDK, or AutoGen, deployed on platforms like AWS Bedrock Agents, Azure AI Agent Service, or Google Gemini Enterprise Agent Platform. These run in your cloud against production APIs, databases, and customer data with the same privilege as any other server-side workload.

Coding agents on workstations. Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Windsurf. These run on developer laptops with access to the local filesystem, shell, git, and whatever credentials the developer has stored locally. They operate with the full privilege of the developer's user session.

Browser agents. AI-native browsers like Perplexity Comet, Opera Neon, Dia, and agentic features built into Chrome and Edge, plus AI browser extensions like HARPA AI and Sider. These can read every page the user visits, access session tokens and cookies, and interact with web applications. The agent's access scope is the union of every service the user is logged into.

Desktop productivity agents. M365 Copilot, Google Workspace Gemini, ChatGPT desktop, Claude Cowork. These bridge local files and cloud services, reading documents, emails, and calendar entries. The execution environment spans the user's desktop and their cloud productivity suite.

CI/CD pipeline agents. GitHub Actions AI steps, GitHub Copilot Autofix, and similar integrations. These run inside your build and deployment pipeline with access to code repositories, deployment credentials, and production environments. A compromised pipeline agent can push malicious code to production without human review.

Autonomous daemons. OpenClaw, Claude Managed Agents, OpenAI Codex automations, and always-on monitoring agents. These run continuously or on a schedule, maintain persistent memory across sessions, and execute tasks without human involvement. The execution environment persists, which means mistakes compound.

Why both dimensions matter

A cloud-deployed agent your organization builds is a different control surface than a cloud-deployed agent a vendor runs, even though the execution environment is similar. Your org-built agent runs on credentials you issued, against APIs you control. The vendor-run agent processes your data in an environment you cannot inspect. Same execution environment, different ability to enforce agency and autonomy constraints.

Similarly, a coding agent an employee installs on their laptop falls into the same execution environment as one your IT team deploys. But you have no visibility into the employee-installed version, no ability to enforce configuration, and no audit trail. The next post in this series covers how these gaps translate into real-world attacks.

Putting the framework to work

Every agent on your estate can be classified along four dimensions: agency (what it can reach), autonomy (who decides when it acts), deployment category (who controls the runtime), and execution environment (where it runs). Agency and autonomy determine the risk level. Deployment category and execution environment determine which controls are available to manage that risk.
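As a starting point for an inventory, each agent can be stored as a record with one field per dimension. The sketch below is illustrative only: the `AgentRecord` name, the field values, and the `support-triage-bot` entry are assumptions, and the Claude Code classification reflects one possible deployment, not a fixed property of the tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    name: str
    agency: str                 # what it can reach
    autonomy: str               # who decides when it acts
    deployment_category: str    # who controls the runtime
    execution_environment: str  # where it runs


# Illustrative entries; classify your own agents from observed configuration.
INVENTORY = [
    AgentRecord("Claude Code", "full system", "agentic", "employee-run", "workstation"),
    AgentRecord("support-triage-bot", "system + external", "fully autonomous", "organization-built", "cloud"),
]


def count_by(records: list[AgentRecord], dimension: str) -> Counter:
    """Count agents per value of one dimension, e.g. 'deployment_category'."""
    return Counter(getattr(record, dimension) for record in records)
```

Grouping by `deployment_category` shows where you can enforce controls. Grouping by `agency` or `autonomy` shows where the risk concentrates.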

The rest of this series works through each combination: how to secure the runtime, enforce least privilege, lock down the tool set, monitor what the agent does, and respond when something goes wrong. The following actions are a starting point for both teams.

For the security team

The classification framework above gives you a shared vocabulary. These actions turn it into an operational intake process so every new agent gets evaluated before it reaches production data.

  • Build a seven-question intake questionnaire. For any new agent request, ask: (1) Who built it? (2) Who operates the runtime? (3) Where does it execute? (4) What tools and credentials does it access? (5) What is the autonomy level? (6) What data does it process? (7) What is the blast radius of a worst-case action? The answers place the agent on all four dimensions and determine the risk tier.

  • Define four risk tiers with named examples. Low: suggestion-level tools with no sensitive data access (Copilot autocomplete in a sandbox). Medium: assisted tools with access to internal systems (Cursor with scoped credentials). High: agentic tools with production or customer data access (Claude Code with production SSH keys). Critical: fully autonomous agents with broad access and no approval gate (always-on daemons with API keys and messaging access).

  • Use the AWS Scoping Matrix as the intake framework for all new agent requests. Place each agent in the correct scope (1 through 4) based on its autonomy and agency, then evaluate by deployment category, execution environment, and tool access. Two tools from the same vendor can fall into entirely different scope levels depending on how they are deployed.

  • Present the scoping grid to your board and audit committee. Nontechnical stakeholders understand "who controls it" and "where it runs" more readily than they understand model architectures or prompt injection. Print the grid with named examples in each cell. Update it quarterly.

  • Audit your estate now. Most organizations do not have a complete inventory of the AI agents their employees use. Shadow adoption in the employee-run category is the norm. You cannot govern what you cannot see.
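The seven-question intake from the first action can be enforced with a simple completeness check. This is a sketch under the assumption that answers arrive as free text keyed by question. The function names are hypothetical.

```python
INTAKE_QUESTIONS = [
    "Who built it?",
    "Who operates the runtime?",
    "Where does it execute?",
    "What tools and credentials does it access?",
    "What is the autonomy level?",
    "What data does it process?",
    "What is the blast radius of a worst-case action?",
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the intake questions that are missing or blank."""
    return [q for q in INTAKE_QUESTIONS if not answers.get(q, "").strip()]


def ready_for_review(answers: dict[str, str]) -> bool:
    """A request moves to risk tiering only when all seven are answered."""
    return not unanswered(answers)
```

Rejecting incomplete requests at intake keeps partially understood agents out of the tiering step, where a missing answer would otherwise be silently treated as low risk.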

For the engineering team

Engineering teams are the first to adopt new AI tools and the first to connect them to production systems. These actions help you evaluate tools before adoption and communicate risk in terms that lead to appropriate controls rather than blanket rejections.

  • Before evaluating any new tool, identify its deployment category and execution environment. If you cannot place the tool on all four dimensions, you do not understand it well enough to adopt it. An org-built agent in a sandboxed environment follows a different approval process than an employee-run tool that touches production credentials.

  • Communicate risk in concrete terms. Do not say "this tool is risky." Say "this tool runs on developer workstations with full-system agency at an agentic autonomy level, which places it in the high-risk tier and requires credential isolation before approval." Specific language produces specific controls.

  • Match approval requirements to risk tier. Low-tier: lightweight review and acceptable-use acknowledgment. Medium-tier: security architecture review and credential scoping plan. High-tier: add monitoring, audit logging, and blast-radius containment. Critical-tier: explicit CISO sign-off and a documented incident response procedure.

  • Before any vendor proof of concept, ask six questions. (1) Where does our data go during execution? (2) Does the vendor retain prompts or context after the session? (3) What tools and APIs does the agent call, and with whose credentials? (4) Can we scope its access to a subset of our data? (5) What audit logs can we export? (6) What happens to our data if we terminate the contract? Vendors that cannot answer these clearly are not ready for your environment.
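The tier-to-approval mapping from the third action can be written as a lookup table. One assumption is labeled in the code: from the medium tier upward, each tier inherits the requirements of the tiers below it. The text's "add" wording suggests this for the high tier. Applying it to the critical tier is a guess.

```python
TIER_ORDER = ["low", "medium", "high", "critical"]

REQUIREMENTS = {
    "low": ["lightweight review", "acceptable-use acknowledgment"],
    "medium": ["security architecture review", "credential scoping plan"],
    "high": ["monitoring", "audit logging", "blast-radius containment"],
    "critical": ["explicit CISO sign-off", "documented incident response procedure"],
}


def approval_requirements(tier: str) -> list[str]:
    """Return the approval steps for a risk tier."""
    if tier == "low":
        return list(REQUIREMENTS["low"])
    # Assumption: medium and above are cumulative, so "high" means
    # medium's requirements plus high's, and so on.
    start = TIER_ORDER.index("medium")
    end = TIER_ORDER.index(tier)  # raises ValueError for an unknown tier
    steps: list[str] = []
    for name in TIER_ORDER[start:end + 1]:
        steps.extend(REQUIREMENTS[name])
    return steps
```

If your organization treats the tiers as independent checklists instead, replace the loop with a direct lookup.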

What comes next

Classification is the foundation. Once you can name what an agent is, where it sits on the agency and autonomy grid, who controls it, and where it runs, you can make informed decisions about what controls it needs. Without that shared vocabulary, every conversation about AI agent risk devolves into "is this tool safe?" with no framework for answering.

The next post covers the AI agent threat landscape: the categories of attack that target each stage of the agentic loop, and real-world incidents that show what happens when controls fail.


About the Author

Christoph Hartmann

Co-Founder & CTO

Christoph Hartmann, co-founder and CTO at Mondoo, wants to make the world more secure. He's long been a leader in security engineering and DevOps, creating widely adopted solutions like Dev-Sec.io and Chef InSpec. For fun, he builds everything from custom operating systems to autonomous robots.
