SAFE-MCP Is the New Standard. Here's How to Map Your Agent Security Coverage.
SAFE-MCP Is the New Standard. Here's How to Map Your Agent Security Coverage.
Published: April 7, 2026 · 10 min read · Tags: MCP Security, Threat Intelligence, Standards, Coverage Scoring
The MCP Dev Summit wrapped last week in New York. Ninety-five sessions. Anthropic, Microsoft, Datadog, Hugging Face. A dedicated security track with talks on mix-up attacks, multi-issuer token confusion, and OAuth consent abuse. If MCP was still experimental six months ago, it isn't anymore.
But the biggest thing that happened wasn't at the summit itself. It was the announcement that arrived alongside it.
SAFE-MCP — the Security Analysis Framework for Evaluation of MCP — has been formally adopted by the Linux Foundation and the OpenID Foundation. Contributors include Meta, eBay, Okta, Red Hat, Intel, and American Express. It's now the first open standard specifically designed to map adversary tactics and techniques for MCP environments. Think MITRE ATT&CK, but built for agent-tool interactions from the ground up.
This matters for every team running agents in production. Here's why — and what to do about it.
What SAFE-MCP Is (and Why It Changes the Question)
SAFE-MCP adapts the proven MITRE ATT&CK methodology specifically for MCP. The structure is straightforward:
- 14 tactic categories — adversary objectives (what they're trying to accomplish)
- 80+ techniques — how they accomplish each objective, each with a SAFE-T identifier
- Mitigation and detection guidance — what to do about each technique
The framework documents specific techniques including tool poisoning (SAFE-T1001), prompt manipulation (SAFE-T1102), OAuth consent abuse (SAFE-T1007), and agent CLI weaponization (SAFE-T1111). Each one is grounded in documented real-world attacks.
Before SAFE-MCP, "agent security" was a shapeless category. Teams asked "are we secure?" and got answers that were essentially vibes — a list of patches applied, a scanner that hadn't found anything recently, a policy document that nobody had tested. The question was unmeasurable.
SAFE-MCP makes it measurable. The question is no longer "do you have agent security?" It's "which of the 14 tactic categories are you monitoring? Which techniques in each category do your controls detect? Where are your gaps?"
That's a coverage question. And coverage is a number you can track, improve, and show to a board.
The 14 Tactic Categories: What They Cover and What's Broken in the Wild
SAFE-MCP's 14 categories span the full adversary lifecycle for MCP attacks. Here's what each covers and which real-world incidents map to it — because these aren't theoretical categories. Every one has been exploited in 2026.
Tool Poisoning and Description Mutation — Attackers manipulate MCP tool descriptions so the agent misuses legitimate tools. The dynamic description mutation variant (our vector AC-02-01-008) exploits a TOCTOU gap: tool descriptions pass the initial approval check, then mutate to inject malicious behavior at execution time. Documented in the wild this quarter.
Prompt Injection and Manipulation (SAFE-T1102) — The foundational attack class. Malicious content in the agent's context redirects its behavior. The Anthropic mcp-server-git vulnerabilities (CVE-2025-68143/44/45) enabled prompt injection through repository content, leading to RCE. 43% of the "30 CVEs in 60 days" catalog traces to this category.
OAuth Consent Abuse (SAFE-T1007) — Attackers exploit the OAuth authorization flow to gain permissions the agent should never have granted. The OpenClaw WebSocket authorization bypass (CVE-2026-22172, CVSS 9.9) is the clearest example: a client could self-declare administrative scopes during the WebSocket handshake, and the server honored them without verification. Nine OpenClaw CVEs in four days last month.
Supply Chain and Registry Attacks (SAFE-T1001 related) — Malicious packages published to MCP registries masquerade as legitimate tools. The fake @postmark/mcp npm package silently exfiltrated API keys and environment variables from every developer who installed it. The LiteLLM v1.82.7/8 compromise exfiltrated SSH keys and AWS credentials. 824+ malicious MCP packages have been discovered — 8% of the OpenClaw registry.
Privilege Escalation — Agents operating with more permissions than the current task requires. CVE-2026-26118 in Azure MCP exposed managed identity tokens via SSRF to the metadata service endpoint. Four CrewAI CVEs this year chained prompt injection into RCE, SSRF, and file read.
Session Hijacking and Handshake Exploitation — Attackers intercept or manipulate the connection negotiation between agents and MCP servers. The OpenClaw local WebSocket gateway vulnerability allowed malicious websites to hijack developer AI agents with no user interaction, simply by exploiting implicit localhost trust.
RAG and Memory Poisoning — Corrupting the knowledge base an agent draws from. Malicious documents inserted into vector stores rank artificially high in retrieval, injecting false facts, backdoors, or goal-redirecting content into agent reasoning.
Multi-Agent Orchestration Attacks — Attacks that propagate across agent networks through inter-agent communication. When agents delegate to sub-agents, a compromise at one node can cascade. Multi-agent consensus poisoning (our AC-07-03-006) fabricates a majority among agent nodes to override legitimate decisions.
Cognitive Architecture Exploitation — Attacks targeting the reasoning process itself. Denial-of-Wallet via recursive reasoning is the canonical example: the attacker induces token amplification that reached 142.4x in documented cases — generating an $82K API bill in 48 hours from a single stolen key.
Temporal and Stateful Attacks — Attacks that exploit session history, delayed triggers, or accumulated state. An agent that trusts its conversation history as ground truth can be manipulated by injecting false history, or by planting time-bomb instructions that activate at specific conditions.
Credential Scope Expansion — Agents that carry broad API credentials rather than task-scoped tokens. Token scope expansion attacks exploit overly-permissive credentials to access resources far beyond the task's requirements. This is the structural failure behind most "the agent did something unexpected" incidents.
Anti-Forensics and Behavioral Camouflage — Attacks designed to evade detection and destroy evidence. Selective log deletion, timestamp manipulation, slow-and-steady exfiltration that stays under rate-limiting thresholds, and behavioral mimicry that makes attacks look like normal activity.
Infrastructure and Runtime Exploitation (SAFE-T1111 related) — SSRF, container escape, debug interface abuse. CVE-2026-23744 in MCPJam Inspector gave unauthenticated access to a development endpoint that enabled RCE. The FastMCP SSRF (CVE-2026-32871) let attackers traverse directories via unsanitized path parameters.
Output Manipulation and Weaponization — Attacks that corrupt what the agent produces. Data exfiltration encoded in normal-looking output, citation forgery, hallucination exploitation, weaponized code generation that embeds privilege escalation in generated scripts.
How to Map Your Own Coverage (With a Worked Example)
The value of SAFE-MCP isn't reading the framework — it's mapping your defenses to it. Here's a practical process.
Step 1: List your current security controls. Be honest. Typical MCP stacks have some combination of: static scanning at install time (mcp-scan, Snyk), a logging layer (whatever your cloud provider gives you), access policies (probably coarser than you think), and maybe a gateway-level filter. Most teams don't have runtime behavioral monitoring. That gap matters.
Step 2: For each SAFE-MCP tactic category, identify which controls detect it. This is where it gets uncomfortable. Static scanning catches supply chain attacks on known-malicious packages — but only the ones that have been flagged before. It doesn't catch dynamic description mutation (TOCTOU attacks pass the scan, then mutate). It doesn't catch OAuth consent abuse at runtime. It doesn't catch behavioral camouflage. A rough mapping for a typical stack:
| SAFE-MCP Tactic | Static Scanning | Logging | Gateway Filter | Runtime Monitoring | |---|---|---|---|---| | Tool Poisoning (known) | ✓ | — | Partial | ✓ | | Prompt Injection | — | — | Partial | ✓ | | OAuth Consent Abuse | — | ✓ (after) | — | ✓ | | Supply Chain (known) | ✓ | — | — | ✓ | | Supply Chain (novel) | — | — | — | ✓ | | Privilege Escalation | — | ✓ (after) | — | ✓ | | Session Hijacking | — | ✓ (after) | — | ✓ | | RAG/Memory Poisoning | — | — | — | ✓ | | Multi-Agent Attacks | — | — | — | ✓ | | Cognitive Exploitation | — | — | — | ✓ | | Temporal/Stateful | — | — | — | ✓ | | Credential Scope | — | ✓ (after) | Partial | ✓ | | Anti-Forensics | — | — | — | ✓ | | Infrastructure/SSRF | — | ✓ (after) | — | ✓ | | Output Manipulation | — | — | — | ✓ |
The pattern is consistent: static scanning covers known-bad supply chain signatures. Everything else happens at runtime, and logging only tells you after the breach that something happened. Runtime monitoring is the only layer that detects in-flight.
Step 3: Identify your uncovered categories. Any row without a ✓ in any column is a blind spot. Any row where logging is your only control means you're detecting breaches, not preventing them.
Step 4: Prioritize by architecture. If you're running 10+ MCP servers from the public registry, supply chain and tool poisoning attacks are your highest risk. If you're using agents with persistent memory (RAG pipelines, long-running sessions), memory poisoning and temporal attacks are more likely. If your agents delegate to sub-agents, orchestration attacks are your exposure.
How Navil Maps to SAFE-MCP
We've mapped the Navil Threat Catalog's 11 attack classes, 33 detection categories, and 219 base vectors to SAFE-MCP's 14 tactic categories. The full mapping is published in the navil-threat-catalog repository under mappings/safe-mcp.md.
The structure:
- AC-01 (Multi-Modal Smuggling) → SAFE-MCP Prompt Injection/Manipulation, Output Manipulation
- AC-02 (Handshake Hijacking) → SAFE-MCP Tool Poisoning, Session Hijacking, OAuth Consent Abuse
- AC-03 (RAG/Memory Poisoning) → SAFE-MCP RAG/Memory Poisoning, Temporal/Stateful Attacks
- AC-04 (Supply Chain/Discovery) → SAFE-MCP Supply Chain and Registry Attacks (comprehensive)
- AC-05 (Privilege Escalation) → SAFE-MCP Privilege Escalation, Credential Scope Expansion
- AC-06 (Anti-Forensics) → SAFE-MCP Anti-Forensics and Behavioral Camouflage (comprehensive)
- AC-07 (Agent Collusion) → SAFE-MCP Multi-Agent Orchestration Attacks
- AC-08 (Cognitive Architecture) → SAFE-MCP Cognitive Architecture Exploitation
- AC-09 (Temporal & Stateful) → SAFE-MCP Temporal and Stateful Attacks
- AC-10 (Output Manipulation) → SAFE-MCP Output Manipulation and Weaponization
- AC-11 (Infrastructure & Runtime) → SAFE-MCP Infrastructure and Runtime Exploitation
Every one of SAFE-MCP's 14 tactic categories maps to at least one Navil attack class. Most map to multiple. The catalog covers 219 specific vectors across these classes, with detection hints for each — the specificity that SAFE-MCP's framework-level guidance can't provide on its own.
SAFE-MCP names the threat. Navil's catalog specifies the vectors within it, the detection signatures, and the CVE references that prove it's been exploited.
The Intelligence Layer SAFE-MCP Doesn't Provide
SAFE-MCP is a taxonomy — a shared language for describing threats. It's invaluable for that. What it doesn't provide is threat intelligence that improves as the attack surface evolves.
The attack surface isn't static. The 30 CVEs from January–February 2026 weren't in any framework a year ago. The nine OpenClaw CVEs from last month weren't in SAFE-MCP's initial release. Novel attacks don't wait for frameworks to catch up.
Navil's threat intelligence engine is built to handle this gap. Novel attack patterns discovered across the network are auto-generated into new detection variants and distributed to all proxies autonomously. The coverage isn't fixed at publication time — it compounds. Every week the catalog covers more of the attack surface, without a manual update cycle.
The combination: SAFE-MCP gives you the structural map. Navil gives you coverage that improves against the territory as it actually changes.
Your Next Step
SAFE-MCP is open source at github.com/safe-agentic-framework/safe-mcp. Read the 14 tactic categories. Do the mapping exercise against your own controls.
Then run navil test --pool mega to see your coverage score mapped to SAFE-MCP categories. You'll get a specific number: which categories you're blocking, which you're partially covering, and where you're blind.
"Are we secure?" is the wrong question. "What's our coverage score against SAFE-MCP, and how is it trending?" is the question that has an answer.
Navil Threat Catalog is published at github.com/navilai/navil-threat-catalog under CC BY-SA 4.0. The SAFE-MCP alignment mapping is in mappings/safe-mcp.md.
Get your coverage score
See how well your AI agents are protected against known threats.