MondooMondoo
AI Agent Security
Skills
Log inGet Assessment

AI Agent Skill Check is a free AI agent skill security scanner by Mondoo. We scan skills across ClawHub, Skills.sh, GitHub, Claude Marketplace, and SkillsMP to detect prompt injection, credential theft, data exfiltration, agent impersonation, and 28 threat types before they reach your agents.

Mondoo

  • Vulnerability Management
  • Technology
  • Services

Solutions

  • Financial Services
  • Manufacturing
  • Healthcare

Resources

  • Blog
  • Skill Check
  • Documentation
  • GitHub

Company

  • About
  • Careers
  • Partners
  • Contact

Legal

  • Privacy
  • Terms
  • Imprint
MondooMondoo© 2026 Mondoo, Inc.
/Security Checks

Security Checks

Every skill is analyzed for 6 threat classes using static pattern matching, ML classification, and LLM-based semantic analysis.

Detection is structured around the AI Agent Traps taxonomy (Franklin, Tomašev, Jacobs, Leibo & Osindero, 2026, Google DeepMind) which defines 6 threat classes targeting AI agents: Content Injection (Perception), Semantic Manipulation (Reasoning), Cognitive State (Memory), Behavioural Control (Action), Systemic (Multi-Agent), and Human-in-the-Loop (Human Overseer).

Threat Classes

Content InjectionPerception

Attacks that manipulate what the agent perceives by injecting hidden or disguised instructions into its input stream.

Semantic ManipulationReasoning

Attacks that exploit the agent's reasoning process through framing, persona manipulation, or authoritative language.

Cognitive StateMemory & Learning

Attacks that corrupt the agent's memory, knowledge bases, or learned policies to influence future behavior.

Behavioural ControlAction

Attacks that cause the agent to take harmful actions — executing commands, exfiltrating data, or spawning attacker-controlled sub-agents.

SystemicMulti-Agent Dynamics

Attacks targeting multi-agent systems — cascading failures, compositional fragment attacks across agent boundaries.

Human-in-the-LoopHuman Overseer

Attacks that exploit the human approver — approval fatigue, social engineering the human via agent output.

Prompt Injection

  • DAN (Do Anything Now) jailbreak patterns
  • Role/instruction override attempts ("ignore previous instructions")
  • System prompt extraction attempts
  • LLM control tokens (<|im_start|>, [INST], etc.)
  • Base64-encoded hidden instructions
  • Zero-width character steganography
  • HTML comment injection with instruction keywords
  • Instruction boundary manipulation (### system, etc.)
  • DeBERTa v3 ML classifier for subtle injection patterns
  • Indirect prompt injection hidden in processed data

Obfuscation

  • Long base64-encoded blobs (potential hidden payloads)
  • Hex-encoded payloads (\x sequences)
  • Character-by-character string concatenation

Syntactic Masking

  • CSS-based hidden text (display:none, visibility:hidden, opacity:0)
  • Markdown links with suspiciously long anchor text hiding instructions
  • HTML metadata attributes with hidden payloads (aria-label, data-*, alt, title)
  • Unicode directional override characters (RTL/LTR reordering)

Safety Filter

  • Gemini's built-in JAILBREAK classifier triggers on dangerous content
  • Content blocked by safety filter is itself a strong signal

Oversight Evasion

  • Red-team/educational framing ("for educational purposes only")
  • Role-play evasion ("pretend you are an unfiltered AI")
  • Fictional simulation framing ("in this hypothetical scenario")
  • Persona manipulation and authoritative language

Social Engineering

  • Impersonation, trust manipulation, deceptive framing

Description Mismatch

  • Compares claimed purpose against actual behavior
  • Flags skills that do something different than advertised

Memory Poisoning

  • Memory manipulation instructions ("write to your memory", "remember this forever")
  • Contextual learning manipulation ("from now on, always...")
  • Instructions to corrupt agent memory or knowledge bases

Supply Chain

  • Dependency confusion, typosquatting references

Command Execution

  • Shell execution functions (exec, system, popen, subprocess)
  • Reverse shell patterns (bash -i, /dev/tcp, netcat)
  • Remote code download and execution (curl | bash)
  • Dynamic code evaluation (eval, new Function)
  • Scheduled task/cron job creation

Data Exfiltration

  • HTTP URLs with embedded environment variables
  • Known exfiltration endpoints (burpcollaborator, requestbin, webhook.site)
  • DNS-based data exfiltration (nslookup/dig to attacker domains)
  • Pipe to curl/wget for data exfiltration
  • Markdown image tags with embedded data parameters

Credential Theft

  • Access to sensitive environment variables (API keys, tokens, passwords)
  • Known credential file paths (~/.aws/credentials, ~/.ssh/id_rsa, .env)
  • Keylogger patterns

Sub-Agent Spawning

  • Instructions to spawn new agents or delegate to sub-agents
  • System prompt injection for spawned child agents
  • Spawning agents with attacker-controlled prompts

Semantic analysis covers this class through LLM-based detection of multi-agent attack patterns.

Approval Fatigue

  • Patterns designed to exploit human approval fatigue
  • Social engineering the human via agent output