The research methodology
With customer consent, we analyzed anonymized prompt telemetry from 180 enterprises over a 90-day period. We classified each prompt for data sensitivity using our enterprise classifier, then mapped findings to business unit and tool type. The results were consistent across industry verticals and company sizes.
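The aggregation step described above can be sketched in a few lines. Everything here is a hypothetical stand-in: the record schema, the labels, and the sample data are illustrative, since the actual enterprise classifier and telemetry format are not public.

```python
from collections import Counter

# Hypothetical prompt records; the real telemetry schema is proprietary.
prompts = [
    {"business_unit": "finance", "tool": "unsanctioned", "label": "confidential"},
    {"business_unit": "finance", "tool": "sanctioned", "label": "public"},
    {"business_unit": "legal", "tool": "unsanctioned", "label": "sensitive"},
    {"business_unit": "engineering", "tool": "unsanctioned", "label": "public"},
]

SENSITIVE = {"sensitive", "confidential"}

def sensitive_rate_by(prompts, key):
    """Fraction of prompts labeled sensitive/confidential, grouped by `key`."""
    totals, hits = Counter(), Counter()
    for p in prompts:
        totals[p[key]] += 1
        if p["label"] in SENSITIVE:
            hits[p[key]] += 1
    return {group: hits[group] / n for group, n in totals.items()}

rates = sensitive_rate_by(prompts, "business_unit")
# On this toy sample: {'finance': 0.5, 'legal': 1.0, 'engineering': 0.0}
```

The same grouping function works for `"tool"` or any other field, which is all the mapping to business unit and tool type requires once each prompt carries a classification label.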
What the data shows
38% of prompts submitted to unsanctioned AI tools contain data classified as sensitive or confidential. The most common categories: internal business strategy and financials (41% of sensitive prompts), customer PII (29%), employee data (18%), and source code or technical IP (12%). These are not edge cases; sensitive prompts are routine behavior across the organizations we studied.
The highest-risk employee populations
Finance and accounting teams have the highest rate of sensitive data in prompts — 61% of their AI interactions involve confidential financial data. Legal is second at 54%, with a high proportion involving privileged matter content. Engineering teams are third, primarily due to source code. HR follows closely with employee PII.
Why employees do this
It is not malicious — it is efficient. AI tools make complex tasks faster, and employees reach for whatever tool is fastest. When we surveyed employees who had submitted sensitive data, 78% said they did not realize it was against policy, and 94% said they would comply if there were a clear sanctioned alternative that was as capable.
The three-week fix
Week 1: deploy browser-level DLP that intercepts sensitive data before it leaves the endpoint. Week 2: publish a sanctioned AI tool list with a fast-track exception process. Week 3: run a 30-minute awareness session with the three highest-risk teams. This combination reduces sensitive-data prompt volume by 70% based on our customer data.
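The Week 1 interception step comes down to scanning prompt text for sensitive patterns before it leaves the endpoint. A minimal sketch follows; the detector names and regexes are simplified illustrations only, since production DLP engines combine validated patterns (e.g. Luhn checks for card numbers) with ML classifiers.

```python
import re

# Illustrative detectors only, not production-grade patterns.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{20,}\b"),
}

def scan_prompt(text):
    """Return the names of all detectors that match the prompt text."""
    return [name for name, pattern in DETECTORS.items() if pattern.search(text)]

def allow_submission(text):
    """Block the prompt if any sensitive pattern is found."""
    return not scan_prompt(text)

print(scan_prompt("Customer SSN is 123-45-6789"))   # ['ssn']
print(allow_submission("Summarize this meeting"))   # True
```

In a real deployment this check runs in a browser extension or endpoint agent, and a blocked prompt should redirect the employee to the sanctioned tool list rather than silently fail, which reinforces the Week 2 and Week 3 steps.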