> if you are a company that handles confidential medical information (any health care organization, many insurers, every employee benefits management organization, &c), you may be required to have controls in place to ensure that nobody uses your Internet connection to exfiltrate people's PII through Google Mail.
Yes, but what are those controls? You check every packet to see if it contains any information from one of your databases?
What if the person sending the data just applies a simple obfuscation technique to the data, or just tunnels through some other encryption scheme?
You don't check for every record in your database; you create a regex (or multiple regexes) that matches the patterns you don't want leaked. This is how I've seen Data Loss Prevention done in the Sophos UTM.
Yes, if even the simplest obfuscation technique is employed, this system falls flat on its face. (Shh don't tell the regulators)
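To make the point above concrete, here is a minimal sketch of regex-based matching and of how a trivial encoding slips past it. The pattern names and the MRN format are assumptions made up for illustration; they are not taken from Sophos or any other product.

```python
import base64
import re

# Hypothetical rule set: real DLP products ship their own vendor-specific patterns.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN-\d{8}\b"),  # invented medical record number format
}

def scan(payload: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the payload."""
    return [name for name, rx in PATTERNS.items() if rx.search(payload)]

plain = "Patient SSN: 123-45-6789"
# The "simplest obfuscation technique": base64-encode before sending.
encoded = base64.b64encode(plain.encode()).decode()

print(scan(plain))    # the SSN pattern fires on the plain text
print(scan(encoded))  # nothing fires on the encoded copy
```

The same payload passes untouched once it is encoded, which is exactly the weakness described above.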
Sophos is a low-end solution. Higher-end solutions (e.g. Vontu) do in fact let you detect individual records, or groups of records that no regex can describe, using fingerprinting.
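As a rough illustration of what record fingerprinting could look like (this is a guess at the general idea, not Vontu's actual algorithm): hash each normalized field of each record ahead of time, then flag outgoing text that contains several fields from the same record. The function names, the sample records, and the `min_fields` threshold are all hypothetical.

```python
import hashlib
import re

def _digest(value: str) -> str:
    """Hash a value after lowercasing and stripping punctuation."""
    normalized = re.sub(r"[^a-z0-9]", "", value.lower())
    return hashlib.sha256(normalized.encode()).hexdigest()

def build_index(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each field digest to the ids of the records containing that field."""
    index: dict[str, set[str]] = {}
    for rec_id, fields in records.items():
        for field in fields:
            index.setdefault(_digest(field), set()).add(rec_id)
    return index

def leaked_records(text: str, index: dict[str, set[str]], min_fields: int = 2) -> list[str]:
    """Return ids of records with at least min_fields distinct fields in the text."""
    digests = {_digest(token) for token in re.findall(r"[\w-]+", text)}
    hits: dict[str, int] = {}
    for d in digests:
        for rec_id in index.get(d, ()):
            hits[rec_id] = hits.get(rec_id, 0) + 1
    return sorted(rec_id for rec_id, n in hits.items() if n >= min_fields)

# Made-up sample data; only the digests need to live on the inspecting device.
records = {
    "rec-1": ["Okafor", "1980-02-29", "PX-448213"],
    "rec-2": ["Lindqvist", "1975-11-03", "PX-190027"],
}
index = build_index(records)
print(leaked_records("fyi: okafor, policy px448213, call me", index))
print(leaked_records("Lindqvist called about lunch", index))
```

Requiring more than one field per record is one way to keep a common surname from triggering on its own. This sketch only handles single-token fields, and like the regex approach it does nothing against encoded or encrypted payloads.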
Search for things like “data loss appliance”. As an example, when BlueCoat isn't helping repressive regimes spy on their citizens, they're helping businesses watch every outgoing packet:
“Blue Coat DLP allows you to easily create policies that analyze the data source, content, destination and more. … accurate data “fingerprinting” capabilities, in addition to powerful keyword, pattern, and regular expression support, so you can create precision policies to effectively secure your data while minimizing false positives.”
Sure, every HN reader might have questions about this, but I'd bet a LOT of C-level executives are receptive to it.
Oh, sorry, I meant questions like the ones raised about how someone might try to smuggle data past such filters or some of the security aspects of having a single point with access to everything.
I certainly agree that if you have a requirement to watch outbound data like this, having a system to selectively capture it is much better than simply attempting to record everything.
Simple answer: Yes, they basically check every packet, or at least as many as they can. No, DLP isn't perfect, and it doesn't always work. This should not be a shocker.
Notes:
1) Modern DLP solutions have some pretty sophisticated obfuscation detection tech. Like almost all of these kinds of technologies, they're looking for the 80% case, not the 99% case.
2) Tunneling out through encrypted tunnels is subject to traffic analysis techniques. It's not as uncommon as one might suspect to detect out-of-band exfiltration of many different types this way.
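One very crude signal that traffic analysis might use is byte entropy: encrypted or compressed payloads look close to random, while most plain-text protocols do not. The sketch below assumes a threshold of 7.5 bits per byte, which is an arbitrary illustrative value, and it is only a heuristic. Legitimate TLS and compressed downloads score just as high, so on its own this cannot tell an exfiltration tunnel from ordinary traffic.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def looks_encrypted(payload: bytes, threshold: float = 7.5) -> bool:
    """Heuristic only: flag payloads whose byte distribution is near-uniform."""
    return shannon_entropy(payload) >= threshold

# Stand-ins for captured payloads (made-up data, not real traffic).
http_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
uniform = bytes(range(256)) * 16  # perfectly uniform bytes, like ideal ciphertext

print(looks_encrypted(http_like))
print(looks_encrypted(uniform))
```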
Please point out any systems which have believable claims for doing this. In my experience most 'DLP' systems do no such thing; they are just like the bit of string which stops you stealing pens at the bank: basically theatre.
Automatic analysis to statistically detect hidden channels is a research topic; it can be used to put bounds on the exfil rate, but not to reliably detect it.