> if you are a company that handles confidential medical information (any health...

click170 · on March 24, 2015

You don't check for every record in your database; you create a regex (or several) that matches the patterns you don't want leaked. This is how I've seen Data Loss Prevention done in the Sophos UTM.

Yes, if even the simplest obfuscation technique is employed, this system falls flat on its face. (Shh, don't tell the regulators.)
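To make the point concrete, here is a minimal sketch of pattern-based detection and how trivially it is bypassed. The pattern (a US SSN shape), the function name `leaks_ssn`, and the sample payload are all made up for illustration; this is not how Sophos UTM or any particular product is implemented.

```python
import base64
import re

# Hypothetical policy: block anything shaped like a US SSN (NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def leaks_ssn(payload: str) -> bool:
    """Return True if the outbound payload contains an SSN-shaped string."""
    return SSN_PATTERN.search(payload) is not None


# Made-up sample payload, not real data.
plain = "Patient record: SSN 078-05-1120, diagnosis pending"

# The simplest obfuscation: base64-encode the payload before sending it.
encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")

print(leaks_ssn(plain))    # True: the regex catches the plain text
print(leaks_ssn(encoded))  # False: base64 output contains no hyphens
```

The encoded payload can never match, because the base64 alphabet has no `-` character, yet the receiver recovers the original with a single `b64decode` call.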

kjs3 · on March 24, 2015

Sophos is a low-end solution. Higher-end solutions (e.g. Vontu) do in fact let you detect individual records, or groups of records that no regex can match, using fingerprinting.
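As a rough illustration of what "fingerprinting" can mean here, the sketch below hashes overlapping word runs (shingles) from a protected record and measures how many of them show up in outbound text. This is one generic technique and an assumption on my part, not a description of Vontu's actual method; the names `shingles` and `overlap` and the sample records are hypothetical.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive words (a 'shingle') in the text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - k + 1, 0))
    }


def overlap(protected: set[str], outbound: str, k: int = 5) -> float:
    """Fraction of the outbound text's shingles found in the protected index."""
    out = shingles(outbound, k)
    if not out:
        return 0.0
    return len(out & protected) / len(out)


# Made-up protected record; only its hashes need to be stored in the index.
record = "Jane Example was admitted on the fourth floor with a fractured left wrist"
index = shingles(record)

leak = ("FYI: jane example was admitted on the fourth floor "
        "with a fractured left wrist, see attached")
benign = "The quarterly report is attached and the meeting has moved to Thursday afternoon"

print(overlap(index, leak) > 0.5)     # most of the leak's shingles are in the index
print(overlap(index, benign) == 0.0)  # unrelated text shares no shingles
```

Unlike a regex, this matches specific records rather than a general shape, which is also why it limits incidental matches. It still fails against encoding or encryption of the payload.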

acdha · on March 24, 2015

Search for things like “data loss appliance”. As an example, when BlueCoat isn't helping repressive regimes spy on their citizens, they're helping businesses watch every outgoing packet:

https://www.bluecoat.com/products/dlp

“Blue Coat DLP allows you to easily create policies that analyze the data source, content, destination and more.

…

accurate data “fingerprinting” capabilities, in addition to powerful keyword, pattern, and regular expression support, so you can create precision policies to effectively secure your data while minimizing false positives.”

Sure, every HN reader might have questions about this, but I'd bet a LOT of C-level executives are receptive to it.

kjs3 · on March 24, 2015

What questions do you have? Limiting the scope of discovery to fingerprinted data is a pretty good way to limit "incidental" data filtering/discovery.

acdha · on March 24, 2015

Oh, sorry, I meant questions like the ones raised about how someone might try to smuggle data past such filters or some of the security aspects of having a single point with access to everything.

I certainly agree that if you have a requirement to watch outbound data like this, having a system to selectively capture it is much better than simply attempting to record everything.

kjs3 · on March 24, 2015

Simple answer: Yes, they basically check every packet, or at least as many as they can. No, DLP isn't perfect, and it doesn't always work. This should not be a shocker.

Notes:

1) Modern DLP solutions have some pretty sophisticated obfuscation detection tech. Like almost all of these kinds of technologies, they're looking for the 80% case, not the 99% case.

2) Tunneling data out through encrypted tunnels is subject to traffic analysis techniques. It's not as uncommon as one might suspect to detect out-of-band exfiltration of many different types this way.
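A very simplified sketch of the traffic-analysis idea in note 2: without decrypting anything, compare a host's outbound volume against its own baseline and flag large deviations. The threshold, the function name `is_anomalous`, and the numbers are invented for illustration; real products presumably use far richer features than a single z-score.

```python
from statistics import mean, stdev


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag outbound volume more than `threshold` standard deviations above the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed > mu
    return (observed - mu) / sigma > threshold


# Hypothetical daily outbound volume (MB) for one host over an encrypted tunnel.
baseline_mb = [12.0, 15.0, 11.0, 14.0, 13.0, 12.5, 14.5]

print(is_anomalous(baseline_mb, 13.5))   # False: an ordinary day
print(is_anomalous(baseline_mb, 250.0))  # True: a bulk transfer stands out
```

A check like this only catches the noisy case. A slow trickle that stays inside the normal range would not trip it, which fits the "80% case, not the 99% case" caveat in note 1.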

angry_octet · on March 24, 2015

Please point out any systems that make believable claims of doing this. In my experience most 'DLP' systems do no such thing; they are just like the bit of string that stops you stealing pens at the bank: basically theatre.

Automatic analysis to statistically detect hidden channels is a research topic; it can be used to put bounds on the exfil rate, but not to reliably detect exfiltration.

jsprogrammer · on March 24, 2015

I guess my hangup was on the claim that such a scheme could 'ensure' prevention of exfiltration, which, frankly, seemed laughably impossible.