Privacy Leaks Don’t Come Out the Front Door
Math Machine: Internal-Channel Exposure Machine
Release: Open (no DOI)
License: CC BY 4.0
On February 12, 2026, an arXiv preprint introduced AgentLeak, a “full-stack” benchmark for privacy leakage in multi-agent LLM systems. The paper argues that output-only audits miss major leakage pathways because sensitive data can travel through inter-agent messages, shared memory, and tool arguments. It reports 1,000 scenarios across multiple domains and a 32-class taxonomy, and states that testing several frontier models across 4,979 traces shows internal channels leaking at substantially higher rates than the final user-visible output.
This matters because most privacy reviews implicitly treat the user-facing response as the system's boundary. But multi-agent systems have a backstage: agents message each other, pass intermediate artifacts, and call tools with arguments that may carry sensitive content. If that internal traffic isn't monitored, you can ship something that "looks clean" to the user while leaking internally in ways your controls never see.
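To make the backstage concrete, here is a minimal sketch of a channel-aware audit. The trace schema, channel names, and the toy SSN detector are all assumptions for illustration, not the paper's actual format: each event is tagged with the channel it traveled on, and the scan reports which channels a detector fires on.

```python
import re

# Hypothetical trace format (assumption, not AgentLeak's schema): a list of
# events, each tagged with the channel the data traveled on.
trace = [
    {"channel": "inter_agent", "text": "Planner->Researcher: patient SSN 123-45-6789, fetch history"},
    {"channel": "tool_args",   "text": 'search(query="SSN 123-45-6789 medical records")'},
    {"channel": "memory",      "text": "note: user prefers email contact"},
    {"channel": "output",      "text": "I found the requested medical history summary."},
]

# Toy detector for one PII pattern; a real audit needs much broader coverage.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def leaks_by_channel(events, detector=SSN):
    """Return the set of channels on which the detector fired."""
    return {e["channel"] for e in events if detector.search(e["text"])}

leaky = leaks_by_channel(trace)
# The user-visible output is clean, yet two internal channels carry the SSN:
# an output-only audit of this trace would report zero leakage.
assert "output" not in leaky
assert leaky == {"inter_agent", "tool_args"}
```

The point of the sketch is the last two lines: the same trace passes an output-only audit and fails a whole-system one.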
What’s new here (as a Math Machine technology demo) is the contract shift from “did the model leak in its answer?” to “where can sensitive data travel inside the system?” The paper even reports a counterintuitive pattern: multi-agent configurations can reduce leakage in the output channel while increasing total system exposure, because internal channels become the dominant leak surface. That is a classic split-reality failure: the visible interface improves while the actual exposure grows.
Where this flips (regimes): single-agent versus multi-agent coordination; which channels are audited (output versus inter-agent messages versus tool arguments versus memory); how detection is implemented (the benchmark’s pipeline and taxonomy choices); and model/version changes that can move leakage from one channel to another without changing the “overall feel” of safety. In these regimes, “safer outputs” can coexist with higher system-level leakage.
Closure target: this becomes settled when privacy evaluations for agentic systems routinely publish per-channel leakage (not just output leakage), demonstrate that internal channels are actively protected and monitored, and show that improvements persist across domains and model versions—so “privacy-safe” means the whole system’s data paths are governed, not just the last message shown to the user.
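The per-channel report that the closure target asks for is cheap to compute once audits are channel-tagged. A minimal sketch, assuming a hypothetical record-per-trace format (one set of leaky channels per audited trace; the numbers below are illustrative, not the paper's results):

```python
from collections import Counter

# Hypothetical audit results (assumption: one record per trace, listing the
# channels on which a leak detector fired). Numbers are illustrative only.
audited = [
    {"inter_agent"},                # leaked internally, output clean
    {"inter_agent", "tool_args"},   # leaked on two internal channels
    set(),                          # fully clean trace
    {"output"},                     # the only user-visible leak
    {"memory", "inter_agent"},      # internal leaks again
]

CHANNELS = ("output", "inter_agent", "tool_args", "memory")

def per_channel_rates(results, channels=CHANNELS):
    """Fraction of traces that leaked on each channel."""
    hits = Counter(c for r in results for c in r)
    return {c: hits[c] / len(results) for c in channels}

rates = per_channel_rates(audited)
# An output-only audit would report a 20% leak rate; the inter-agent
# channel actually leaks in 60% of traces.
assert rates["output"] == 0.2
assert rates["inter_agent"] == 0.6
```

Publishing a table like `rates` per domain and model version, rather than a single output-leakage number, is what would make "privacy-safe" a claim about the whole system's data paths.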
