Researchers Expose Cunning Online Tracking Service That Can’t Be Dodged (wired.com)
191 points by bmcmanus on July 29, 2011 | 146 comments


KISSmetrics has a post explaining how the tracking works. http://www.kissmetrics.com/how-it-works They claim that simply using AdBlock is enough to defeat the tracking. They also claim "KISSmetrics has never, and will never, share anonymous customer activity of what people did on customer A’s site with customer B."


One important detail for AdBlock: you HAVE to be using a Tracking/Privacy filter subscription. Please, if you're using ABP, add one of these as a subscription: http://www.fanboy.co.nz/fanboy-tracking.txt (Fanboy's Tracking List) or https://easylist-downloads.adblockplus.org/easyprivacy.txt (EasyPrivacy). None of the default filter subscriptions block KISSmetrics, but either of these will.


If, as the article implies, the same user gets the same identifier across all KISSmetrics customers, they ARE (implicitly) sharing customer activity between customers.

As the article states, customers A and B could simply put together the information they each have about identity XYZ, without KISSmetrics having to be involved any further than providing and maintaining the unique identity for each customer separately.

Even if it was not the intent of KISSmetrics for this to be possible (which I've a hard time believing), the chosen implementation makes it possible.
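
To make the point concrete, here is a minimal sketch of how two customers could combine their records once they notice the identifier is shared. All record fields, site names, and the helper name are hypothetical illustrations, not anything KISSmetrics is known to expose.

```python
# Hypothetical records held separately by two customers of the same tracker.
site_a_events = [
    {"visitor_id": "XYZ", "page": "/pricing"},
    {"visitor_id": "ABC", "page": "/signup"},
]
site_b_events = [
    {"visitor_id": "XYZ", "email": "user@example.com"},
    {"visitor_id": "QRS", "email": "other@example.com"},
]


def merge_profiles(events_a, events_b):
    """Combine what each site knows, keyed by the shared visitor id."""
    profiles = {}
    for source, events in (("site_a", events_a), ("site_b", events_b)):
        for event in events:
            fields = {k: v for k, v in event.items() if k != "visitor_id"}
            by_source = profiles.setdefault(event["visitor_id"], {})
            by_source.setdefault(source, []).append(fields)
    # Only ids seen by BOTH sites reveal cross-site activity.
    return {vid: p for vid, p in profiles.items() if len(p) == 2}


shared = merge_profiles(site_a_events, site_b_events)
print(shared)
```

Nothing here requires the tracking provider to cooperate; the shared key alone is enough for the join.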


Well, the Wired article directly contradicts what they say they are doing:

"These services are using practically every known method to circumvent user attempts to protect their privacy (Cookies, Flash Cookies, HTML5, CSS, Cache Cookies/Etags…)"

They may not share information about specific users, but that doesn't mean they don't use it to sell information in some aggregate form.


Ghostery blocks many tracking scripts.

http://www.ghostery.com/download


" They also claim "KISSmetrics has never, and will never, share anonymous customer activity of what people did on customer A’s site with customer B.""

But if they change their mind, there is no way I can stop them, right?


And there is nothing stopping other companies from using the same techniques to sell individual browsing data.


>They claim that simply using AdBlock is enough to defeat the tracking

I'm highly suspicious of that claim. The only site I have whitelisted is reddit, and I found the i.kissmetrics.com cookie in with the rest. That's not to say reddit isn't using them, but I'd be surprised given their very cautious approach to advertising.


Unfortunately, this does not help with mobile devices which do not have the same capability.

Well, I guess you can always maintain a hosts file but that's the only way I can think of.


There's Firewall iP for jailbroken iPhones. Also one can install other browsers than Safari from the Appstore and, for example, iCab Mobile has the ability to use filters (and comes with some).


Mobile devices != iPhone+Android. I find this trend pretty disturbing.


I didn't even mention Android. I just gave an example of what's possible on one type of mobile device.


Huzzah for AdBlock. As much as I'd like to see the legislature and courts support privacy, this is fundamentally a technological problem.


I disagree. This is a Whack-A-Mole game, and that means a societal solution is required.

Compare with spam. Technological solutions have reduced the problem, but to virtually eliminate it requires global law enforcement.


Any evidence that law enforcement could virtually eliminate spam?


Severe dents have been put in spam production when specific individual senders were shut down.


Temporary things. Same happens with illegal drugs when key people get killed. Does not mean the war on drugs needs more law enforcement to win...


I usually use Firefox with it set to forget everything on exit, along with the Noscript plugin. Does anyone know if this tracking service would work on a FF user running Noscript?

By the way, using Noscript has made me aware of something that I didn't previously know: many sites call Javascript from lots of other domains. I've seen websites with as many as 18 other domains listed on the Noscript pull down menu. And I have seen an increasing number of XSS alerts as well.


Amen to JS from third-party domains.

I see this as biting us in the butt sometime. Maybe not today, maybe not tomorrow, but soon, and for the rest of your life.

What's more annoying is playing the "NoScript allow roulette" game of trying to figure out which domains/scripts you have to allow for some site feature to work.


This is why I don't bother with cookie monitoring at all, and why I plug my ears and say "la la la" and pretend that everything will be alright. I really don't want to spend the time figuring out how to make my bank work.

I suppose when the government gets in the game, either through direct tracking or just making laws requiring tracking companies to keep particular data for particular lengths of time, then it will be a civil liberties issue and I'll care more. But it will probably also be illegal by then to circumvent tracking.

But that's just crazy. That would be like the government demanding that ISPs keep credit card information on their customers.


For banking and stuff with way too much Javascript I use a second browser, and I do a complete wipe of all private information after every use.


chromium --temp-profile --no-first-run is the ticket for me.


All the social sharing buttons use 3rd-party JS. You can see the "embed" code for Google's new +1 button here: http://googlewebmastercentral.blogspot.com/2011/07/1-button-... This lets the sites update APIs without breaking every page on the Internet :)


And disabling all those crappy social sharing buttons makes the pages so much better. From faster loading to less clutter there are a bunch of benefits. Sometimes I'm flabbergasted how crappy sites are when I sit down to a machine without a javascript blocker.


Plus, many sites use Google's Libraries API to load things like jQuery! Used for good, 3rd-party JS is a helpful thing.

http://code.google.com/apis/libraries/devguide.html


You might want to remove all caching. Though deleting cache on exit will disrupt tracking if you exit often.

With the ETag mechanism, the server returns a different ETag to each user for the same piece of content, so the browser later sends a conditional request (If-None-Match) containing that ETag. The ETag is stored with the content in the browser cache.


Jeez guys, not all tracking is evil. You know all that awesome content that exists on the web? Well, the people that make and distribute that content need information to make your experience better. Let's say you start a new site. Let's use 8tracks for example: they provide a two-tiered service, one free and one premium. The free service exists to drive you to a paid account, but you still derive value from it nonetheless. In exchange for that free value, you give them stats that they use with their advertisers, who in turn give them cash they can then use to make your experience better. It's a give-and-take system. Thankfully, money isn't the only currency on the web; a little bit of info and some advertising goes a long way. I am willing to trade value for value; it's fair that way.


>not all tracking is evil.

Tracking isn't evil. Tracking people who specifically do not want to be tracked is evil.


That is true. They knew that some users block all third-party cookies and they still wanted to track them, hence the use of ETags.


What if I specifically don't want you on my website if you won't let me track you? You're using technology to circumvent me (adblock), why can't I use technology to circumvent your wishes (evercookie et al.)?


You can specifically disallow such people from viewing your website, without being evil.


You mean, people who take unfair^ advantage of freemium services?

  ^ according to jscheel's assessment of fair trade


"You know all that awesome content that exists on the web? Well the people that make and distribute that content need information to make your experience better."

That's funny, most of the web content I use most heavily was 'made' and 'distributed' without any input from the marketers / advertisers / trackers whom you are defending. Not all tracking is evil, but the evil tracking offsets the good stuff by a wide margin. Not all gun owners are evil, but if you're running a company that uses that defense to justify business practices that require an invisible gun owner to be in my living room, I'm not going to care about how much my living room experience has improved.


There was a TV executive a few years ago who stated that getting up to go to the bathroom during a commercial is theft.

It's the commercials that make non-premium TV possible in its current free-ish structure. It seems that broadcasters should run their business with the assumption that a certain number of people are going to go to the bathroom and miss some commercials. And web companies should run their businesses and still be able to provide their free-ish services if some people decide to opt out of tracking.

What if everyone went to the bathroom all the time and missed all the commercials? That doesn't seem to happen.


> Well the people that make and distribute that content need information to make your experience better.

If somebody wants my feedback, they can ask for it, and if I have the time and like/value their service, I will gladly comply. Simply 'taking' my feedback doesn't sit well with me.


That is very cool. The important thing about the system is that it is a matter of personal choice, right? I can choose not to participate, right?

Because they ask my permission, don't they? I mean, I can choose between not seeing their content or being tracked, yes?


Absolutely. You can easily choose not to participate.

If, on the other hand, you choose to deny access to your data while continuing to use the Web services in question, you would be at fault (using grandparent's definition of fair trade).


Great!

How can I choose to deny access to my data?

Heh heh. The silly article made it sound as if that wasn't an option. Trust the journalists to stuff things up.

So how do I do it?

Edit: I don't use Firefox, so AdBlock is out of the question, and I have no idea what websites do this. Is there a list available somewhere? How can I discover whether or not a website does this without visiting it in the first place?


Either don't use websites that use this (what the grandparent was suggesting, I believe), or block KISSmetrics in your hosts file. Apparently it's also blocked by Adblock Plus if you're subscribed to the right lists (which are linked to elsewhere in the comments on this story).
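
For the hosts-file route, a minimal sketch of generating the entries is below. The hostname i.kissmetrics.com is the cookie domain mentioned elsewhere in this thread and www.kissmetrics.com comes from the linked page; the list is illustrative and almost certainly incomplete, and the helper name is made up for this example.

```python
def hosts_entries(hostnames, sink="0.0.0.0"):
    """Format one hosts-file line per hostname, pointing it at a non-routable address."""
    return [f"{sink} {name}" for name in sorted(set(hostnames))]


# Illustrative hostnames only; not a complete inventory of the service's hosts.
lines = hosts_entries(["i.kissmetrics.com", "www.kissmetrics.com"])
print("\n".join(lines))
```

Hosts files have no wildcard support, so every subdomain has to be listed explicitly, which is one reason a maintained filter list tends to be less work.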


Why single out Kiss Metrics? One example: I visited Fox News last month and found they set up an HTML5 database called, in a rather unsubtle choice, "evercookie". I can't confirm that this is still the case, though, since the ability to view HTML5 databases in Preferences now seems to be missing from all the browsers I have (which seems odd, too!).


I imagine it is named as such because they are using evercookie[1], off the shelf.

[1] http://samy.pl/evercookie/


In Chrome you can bring up the Developer Tools (View menu on the mac, I think it's in the wrench menu on other platforms). You can see databases, cookies, etc. in the Resources tab.


Thanks, I see that in the Inspector now.

I recall menus in Safari, Mobile Safari, Firefox and Chrome which listed all the databases stored, along with the name. It was in Preferences near the cookie and password management.

It looks like the 'databases' menu is no longer in Mobile Safari preferences, and now Safari 5.1 will tell you what a website is storing in general terms, but no longer details the individual databases in preferences.


"Evercookie" is more than just standard browser cookies: http://en.wikipedia.org/wiki/Evercookie


I'm not sure these researchers understand how private-browsing functions. The session in a private-browsing window is only private from the non-private sessions and only private from future private-sessions when all private sessions -- private-browsing windows -- are destroyed.

http://imgur.com/a/LjjYf

Here I have a non-private session, where I have requested i.js (a second time), invoking an If-None-Match check with my non-private ETag of i.js. Opening a private session, my request to i.js does not invoke my non-private session's ETag and subsequent If-None-Match: i.js is fetched as if my session has no memory of the URI.

In the second shot, I had closed the private session opened in the first test, and I then opened a new private session without closing my previous non-private session. Again, my private session requests a new i.js, with no knowledge of either the non-private session's version or that of the first, now closed, private session.

The onus is on browsers to restrict inner-private-session storage from leaking between tabs, but it could be quite messy.


The main exceptions to this are Flash cookies. These are shared between all browsers for a given user, since they're stored by the Flash plugin itself and independent of individual browsers' profile storage.


"Starting with Flash Player 10.1, Flash Player actively supports the browser's private browsing mode, managing data in local storage so that it is consistent with private browsing. So when a private browsing session ends, Flash Player will automatically clear any corresponding data in local storage."

Source: http://www.adobe.com/devnet/flashplayer/articles/privacy_mod...

Local storage here refers to "Flash cookies".


Does it work that way in other browsers too?


FF5 and IE9 function similarly. Non-private and private sessions will not cooperate on the same cache, cookies, ETags, etc. Closing a private session will destroy all local cache, cookies, ETags, etc., and none of it is reinstated when starting future private sessions.


I tried to do my best figuring out what this cunning new method is, but the article seems to have no information. Is it just that it's using my browser's ETags cache?

Also, what's with referring to ETags as a "theoretical technique never before seen in the wild"? It's pretty friggin standard.


The trick is the server generates a unique Etag for each visitor.

Then the visitor's browser sends the Etag back to the server (in an "If-None-Match" header), and thus it acts as a quasi-cookie.
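
A rough sketch of that exchange, written as a plain function rather than a real web server. The function name and the choice of a random UUID as the identifier are assumptions for illustration; the article does not say how the actual service generates its values.

```python
import uuid


def handle_request(request_headers):
    """Return (status, response_headers, visitor_id) for a request to a tracking resource."""
    etag = request_headers.get("If-None-Match")
    if etag:
        # Returning visitor: the browser echoed back the cached ETag,
        # so it identifies them much like a cookie would.
        return 304, {"ETag": etag}, etag.strip('"')
    # New visitor (or cleared cache): mint a fresh identifier and hand it out as the ETag.
    visitor_id = uuid.uuid4().hex
    headers = {
        "ETag": f'"{visitor_id}"',
        # "no-cache" lets the browser store the response but forces revalidation each time.
        "Cache-Control": "private, no-cache",
    }
    return 200, headers, visitor_id


# First visit: nothing cached, so the server assigns an id.
status1, headers1, vid1 = handle_request({})
# Later visit: the browser revalidates its cached copy with If-None-Match.
status2, headers2, vid2 = handle_request({"If-None-Match": headers1["ETag"]})
# After the cache is cleared, the ETag is gone and the visitor looks new.
status3, headers3, vid3 = handle_request({})
print(status1, status2, vid1 == vid2, vid1 == vid3)
```

In this sketch, clearing the cache does break the ETag link on its own; the persistence described in the article comes from combining it with other storage.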


That's the picture I get too, but I don't see how clearing the cache doesn't, you know, clear the cache. It would seem to imply that if the Etag is still around, it's not really cleared - maybe the data is gone, but the knowledge that the data existed isn't. And it persists through privacy-mode.

Which means I/we am/are either misunderstanding something, or the people who designed privacy and cache-clearing tools had a massive blind-spot.


There are many other techniques employed. As soon as one of the techniques works, it re-populates the others. (Cache clearing doesn't affect Flash cookies [LSO's]).
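
The re-population step can be sketched as follows. The store names and the helper are hypothetical stand-ins for the browser-side storage locations; this only illustrates the logic, not any particular vendor's script.

```python
def respawn(stores):
    """Copy the first surviving identifier into every store that lost it."""
    surviving = next((value for value in stores.values() if value is not None), None)
    if surviving is None:
        # Every location was cleared, so the visitor genuinely looks new.
        return dict(stores)
    return {name: surviving for name in stores}


# The user cleared cookies and cache, but the Flash LSO survived.
stores = {
    "http_cookie": None,
    "etag_cache": None,
    "flash_lso": "XYZ",
    "html5_storage": None,
}
restored = respawn(stores)
print(restored)
```

This is why missing a single location is enough to undo the cleanup of all the others.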


> Cache clearing doesn't affect Flash cookies [LSO's]

This is coming soon to a Chrome near you: http://blog.chromium.org/2011/04/providing-transparency-and-...

Presumably other browsers will follow.



Analytics is here to stay. Unless this practice is regulated (which in turn can end up being heavy handed and far reaching and in turn could discourage innovation) analytics will remain a big piece of what IT will focus on, mainly in getting a 360 degree view of their customers.

Instead of regulating every time we see a practice that we may not agree with, how about we treat it like when the "iPhone location" fiasco broke. Do not criminalize the possession of customer data or even tracking; criminalize distribution or malicious use of it. If Company A wants to know where I came from, so that they can share their ad dollars effectively, I am ok with it. But do ensure that they don't share it with other companies in that network (whether Kissmetrics or someone else) for any reason. My online identity remains my own; it does not need to be dissected for further analysis by doubleclick, kissmetrics et al.


I disagree completely.

I bought this computer. I pay for my internet connection. And someone like KISSMetrics wants to spy on me using MY stuff?

To profit from MY computer tracking me against my express commands? Incognito mode, cookies turned off and they're tricking my computer into tracking me?

These are people who have lost all perspective of what's right and wrong.

Analytics is a solved problem, there's no innovation here, there's cookies and a way of opting out of it. If regulation is what's needed to stop scum like Kissmetrics from violating my privacy, then regulation's what's needed.


> I bought this computer. I pay for my internet connection. And someone like KISSMetrics wants to spy on me using MY stuff?

You may pay for your computer and internet connection but not for the (vast majority of) sites you visit. This popular sense of entitlement is problematic when "your" stuff lives on 3rd-party servers running 3rd-party software that you're not paying for.


This is disingenuous in the extreme.

Where on these sites does it warn you that all your browsing will be recorded without your permission? So they can sell your personal data?

I'm all for having advertising on google mail but this is totally different and any attempt to defend this position is treading on extremely thin ice.

This has nothing to do with entitlement and everything to do with immoral business practices. This is worse than being one of those 'we'll wipe off your debt' companies. It's a modern day scam that legislators have not caught up with, pure and simple.

Kissmetrics are utter scum.


There is nothing more evil in modern business than marketers. Between real life experience and MBA classes I have come to despise most everything involved in modern marketing, especially in the technology space.


Give me a break.

This isn't evil marketers hiding in their underground lair. This is web developers, designers, and product managers gaining insight about their users. They don't package this information and sell it wholesale to advertisers. They use it to make the product better.

You are taking this -way- too seriously. The ability to have perfect information on how users interact with your product is one of the earth-shattering advantages we have as makers in the digital world. It means we can make something people want, better and faster than ever before.

I'm not saying that all use of tracking is good. All power can be used for good and for evil. But that doesn't mean that power is implicitly evil. Save your rage for when you discover someone actually being evil.


Bullshit. If it was used to improve a product, they wouldn't use a tracking method that tracks me between sites. They can track what I do on their site, but if I'm going from goat-sucking-maggot.com to hulu.com, then Hulu has no business knowing that and knowing that doesn't improve their damn product.

This is entirely just an attempt to get competitive analysis about their competitors at the expense of user privacy.


According to the article, Hulu didn't know that the ID was the same between sites, and (imho) probably didn't care. The fact that it's the same was revealed by Wired, and users of KissMetrics appear to have not known. It sounds like KissMetrics didn't do much more than could be gained from a referrer in their linking between sites, though clearly more could be done if they wanted to be bad. (Yes, I understand the bad in placing an "unkillable" cookie even on folks who didn't want to be tracked; I'm referring to the specific concern you raised).

There is no evidence in the article that Kissmetrics stitched these together in any way other than what is available in standard referrers. If you hand typed the info in, referrer is blank, and I don't think Kissmetrics imputed the referrer from their data. I don't use the tool, however; can actual customers let HN know if it actually does what Zed thinks it does? Because that would clearly be stepping some bounds if it did this even on "do not track" folks.

And isn't actually doing this different from being able to do it but not doing it (well, if it actually isn't; see the previous paragraph)? If they are doing it, then let's yell. But if it's just a possibility, then it's like yelling about Google seeing all my searches. The answer: Yes, they do. I can choose not to use Google, or I can benefit from their tech at the cost of sharing some info.

Instead of believing that no one has any right to collect any data on my usage in a world where we leave digital tracks all over the place, let's instead work to minimize risk and maximize value for users. There is always data leakage, and that data can actually help folks if treated with respect and ethics.

And yes, actually, Hulu could use that data to improve their product. But if you don't want to be tracked, it's none of their business and they'll have to find another way.


I'm not defending KM, and I'm not disagreeing that "Hulu has no business," but ...

If Hulu sees that noticeable numbers of visitors to sites like goat-sucking-maggot.com, or that site and a combination of other sites, tend to watch movies of a certain genre or other attribute, they can offer more movies like that, and suggest them for visitors to g-s-m.com. Which does improve their product, especially for visitors to g-s-m.com.


It's not competitor analysis, necessarily. I think rather they are trying to find out where users came to their site from so they can more accurately attribute the site that directed them, and thus throw more money at sites that do a good job of referring you to them, as opposed to just using "last click attribution". It's about them finding out how to best use their advertising budget.
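
As a toy illustration of the difference, the sketch below compares last-click attribution with an even split across every site in the visitor's path. The even split ("linear" attribution) is my own stand-in for "something more accurate"; nothing in the article says which model these sites actually use, and the site names are made up.

```python
from collections import Counter


def last_click(path):
    """Give all the credit to the final referring site."""
    return Counter({path[-1]: 1.0})


def linear(path):
    """Split the credit evenly across every site the visitor passed through."""
    share = 1.0 / len(path)
    credit = Counter()
    for site in path:
        credit[site] += share
    return credit


# Hypothetical path a visitor took before converting.
path = ["blog.example", "review.example", "search.example"]
print(last_click(path))
print(linear(path))
```

Anything beyond last-click needs the whole path, which is exactly what a cross-site identifier provides and a single referrer header does not.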


Since when have users been entitled to privacy? It's just more obvious now when user data is collected, and things move much faster so it's on a more tangible scale. Small town gossip was used in much the same way to gain competitive edges at the expense of customer privacy, it just wasn't as easy to see/prove how the data moved from one spot to another back in the days of pre-electronic commerce.


Users can demand privacy. It's not a given who has rights to what, privacy or data.


> They don't package this information and sell it wholesale to advertisers.

Yet. Genie, meet bottle.

Thinking this information will never be sold is short sighted and naive.

When it becomes advantageous for them to do so, they'll sell it in a heartbeat and then sell it again. Somewhere in there the FBI and the NSA will start making "requests" for them to share with the government.

What begins as clever social commentary in movies like Minority Report tends to wind up as a sad fact of life years later.


Let's be clear: this ability isn't used to have perfect information on how users interact with your product.

It's used to allow publishers to make more money from their consumers through advertising.


I think the reason you're being downvoted is because you didn't provide any proof.


Sites like Hulu and Spotify predominately use web analytics to gauge their audience in order to sell advertising, not to make their sites better. That may be a by-product, but that's not driving their analytics usage.

Additionally, cross-site tracking isn't used for feature/usage tracking. It's used for highly targeted advertising.

My point was the "shattering advantages" are really advantages for advertisers, not web designers.

This technique wasn't developed with web designers in mind, so let's be clear about that.


... And my point was, you didn't cite any evidence to back up your claims. Twice.

EDIT: But apparently no one cares, and will listen to whoever talks like an authority.


Spotify has perhaps the least targeted ads I've ever seen/heard on any media source. We listen to indie rock and classics all day only to hear the latest Jason Derulo clip several times a day as an advertisement. That's clearly not targeted in any way. There is 0% chance we will do anything but hate the song as a result of this ad. None of the ads are in any way relevant or appear to be targeted by anything more accurate than "18-35 demographic, serve them ads for shitpop" without discretion.


Spotify doesn't sell enough ads to make targeting feasible. They play you all their inventory. Until you buy a subscription because it is too annoying...


Marketers do nothing to "make the product better".

They pump exorbitant amounts of money into advertising, thereby increasing the final price paid by customers without increasing product quality. Furthermore, they use these advertisements to inflate their customers' perceived value of the products, once again, without actually increasing product quality.

Marketers maximize profits, not quality.


If a product is crappy, no one will see the advertisements. If a product is great, people will use it, and see them.

Google is nothing but an advertising company, yet 99+% of us only ever see them as a search, email, blogging, and cloud docs company. A huge number of us use their products, because they're good products. As their products improve, their cash accumulates.

I've never been in a Google meeting, but I suppose Google's marketers have some input into product development; at Google, that input is probably based on data.


Value does not exist except as what the customer perceives value to be.


> Save your rage for when you discover someone actually being evil.

So it's okay for them to be evil as long as we don't discover that they are? More seriously, once they have the info, how can we be sure we nail them? The best way is still to ensure that they don't get the information in the first place.


Sure there is nothing wrong with that, but these are sneaky underhanded tactics. Users should be able to opt-out if they don't want to be tracked.


Want to opt out? Don't visit sites using tracking methods with which you disagree. This is a fairly trivial way of opting out of such tracking, and would help incentivize those sites to change their tracking policies and technologies. If you really care that much, vote with your actions. It won't change anything, much like voting in United States elections, but you can at least back up your stance while knowing that most people do not care one bit nor wish to think about what happens behind the scenes of each website, TV program, and news story they consume in the course of being entertained. You're free to choose, at least until you decide not to be.


I really miss not being able to down-vote you.


Except that virtually every single one of the biggest problems we face is a marketing problem, rather than an issue where we don't have the science or technology:

global warming, education reform, prison reform, the national debt, healthcare, literacy, food production, biodiversity, etc.

Marketing is probably the single most important career there is right now, and if there's any hope of humanity making it through the next thousand years then it'll almost certainly be due to improvements in our ability to market things rather than new technology.


You're making the leap from "every big problem we face is a marketing problem" to "and the solution is to track users in minute and excruciating detail".

I can accept that there are ideas which are critical to the survival of the human race and/or modern civilization, which require mass education, and utterly reject your conclusion.


But here's the problem: on the whole, no one is paying marketers to solve these problems. Marketers make their money by shilling for products that are the opposite of the issues you listed above.

For example, a newly minted MBA in Marketing needs to pay back school loans. Who has the money to pay them? Who has the job that will bring the most prestige and future earnings? The local farm selling grass-fed beef and organic products or P&G selling highly processed foods that are destroying the environment?

The marketer is going to take the P&G job and then spend their time and brainpower convincing people to buy even more processed crap.


Marketer is not a well-defined class. People who market may not identify as such; many don't even know they market. Marketing is just a word, a neat little box; it contains exactly and only what you want. While you have a lot of bile for marketers behind closed big corporate doors (and I agree they deserve much of this derision), the fact of the matter is, you always have a choice: stop buying their shit. It may take a lot of work to cut through all these layers of bullshit, sure, but ask yourself what it is about marketing which really makes you mad? Marketing, ads, or that you can't yet see through/around/past it enough to stop falling for exactly the traps marketers set?


HAVE A DOWN-VOTE! --A big, fat, and ugly DOWN-Vote!

At this moment, you're probably mad and probably wondering what you did wrong as well as wondering if I'm just being an asshole. You're also looking at the score of your post and wondering why it is what it is; "Did he down-vote me and someone else up-voted me? Or is he just trying to make a point?"

I was just making a point, so no, I did not actually down-vote you. The thing is, you and many others have fallen "for exactly the traps marketers set." The trap, even if entirely fake, is the reason for your initial anger, confusion and pondering.

This is an odd and harsh way of both supporting your point and refuting it. You are only human, so you'll get angry if you fall for a trap. Nonetheless, you are correct; often the answer is to stop interacting with or buying from those who would try to trap you, but unfortunately, you do not always have the choice until after the fact.


For those examples are you saying that science or research is clear? I think it isn't for any of them.


> Except for that virtually every single one of the biggest problems we face is a marketing problem, rather than an issue where we don't have the science or technology.

Interesting point, and I agree.

> if there's any hope of humanity making it through the next thousand years then it'll almost certainly be due to improvements in our ability to market things rather than new technology.

If by our ability you mean progressives/the left, then I agree. Surely marketing per se is at best a neutral force.


You know, when individuals access a company's computer using technically valid means (e.g. a username and password or by logging in from multiple locations), then it's criminal charges, international arrest warrants, and jail time. [1] [2]

But when companies do it to people, oh, it's just a clever programming trick, and it's not a problem because you could install additional software to prevent it from happening [3].

The law is showing up pretty clear that simply because you can access a computer system, does not mean that you may, and indeed that doing so without the user's permission is a crime. Causing a computer to store data on a user and then serve that data back to another computer seems dodgy without permission. Doing it when the user has taken reasonable steps to prevent it from happening? Class action time!

[1] http://www.techdirt.com/articles/20110722/02351315202/how-ci...

[2] http://www.geek.com/articles/geek-pick/aaron-swartz-spent-mo...

[3] http://www.kissmetrics.com/how-it-works


Can it be dodged by emptying the browser cache as well as blocking iframes, which I assume are what cause such content to be stored in the browser?

Edit: seems so: snip ... the persistent tracking can only be avoided by erasing the browser cache between visits.


> That Can't Be Dodged

Very interesting article, but the proclamation that you can't avoid it seems to go a bit too far. When my browser exits it both deletes cookies and clears the cache, which looks like it's enough to break the tracks.


I do that too, but I don't think it's enough.

I use FlashBlock, which I think is enough, because they're apparently using flash cookies to recreate regular HTTP cookies (or something like that).

Flash is a huge POS in so many ways.


There's a whole bunch of places to stash unique identifiers. And you only need to overlook one of them, because they will repopulate all of them the next time you hit a KM-using site.

HTTP cookies, flash cookies, ETags, HTML5 databases ... it just goes on and on.
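
A minimal sketch of why overlooking one store is enough, assuming a tracker that checks every store on each visit and rewrites all of them with whatever ID survived. The store names and the `respawn` function are made up for illustration; this is not KISSmetrics' code.

```python
# Hypothetical model of "respawning"; store names are illustrative only.
STORES = ("http_cookie", "flash_cookie", "etag_cache", "html5_storage")

def respawn(client_state, new_id):
    """Pick the first surviving ID (or new_id) and write it back to every store."""
    surviving = [client_state[s] for s in STORES if client_state.get(s)]
    visitor_id = surviving[0] if surviving else new_id
    for store in STORES:
        client_state[store] = visitor_id
    return visitor_id

state = {}
first = respawn(state, "id-123")     # first visit: every store gets the new ID
state["http_cookie"] = None          # user clears cookies...
state["flash_cookie"] = None         # ...and Flash storage...
state["html5_storage"] = None        # ...but forgets the browser cache
second = respawn(state, "id-456")    # the old ID is copied back everywhere
```

After the second call every store holds the original ID again, which is the repopulation described above.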


Flashblock isn't enough. The Firefox implementation lets the flash load, then hides it immediately.


I've worked with the KM folks. Great people, genuinely kind, and they want to make a great product. I think it's disgusting to single out a startup like this, especially right as they are gaining traction with some big-name clients.

There is value in what they are doing, and there's absolutely nothing wrong with it. They are tracking user behavior completely anonymously.


Road to hell is paved with good intentions, etc etc.

I'm not going to make any moral judgments about their team since I don't know any of them. Even if they are on the up and up and mean the best, IMO the technology they are employing is illegitimate at best - it violates user expectations when using the internet, and IMO makes the industry more dangerous for the rest of us by giving legislators and luddites more ammunition. I am absolutely against supercookies.

I'm not sure why it's relevant that they're tracking users anonymously - the user gets to decide at the end of the day who they share their information with. To make this decision for them is presumptuous, to force them to comply despite their implied non-consent is the height of arrogance.


There absolutely is something wrong with what they are doing. They are deliberately going out of their way to stop me from enforcing my personal privacy requirements on my browser running on my computer over my network.

If I installed an application on your computer that sent me the names of all the applications you ran, when you ran them, and for how long, that reinstalled itself when you attempted to remove it, and that used every technique it could to gather information about you, you would want me arrested as a hacker.

I think what they do is disgusting, and I hope to hell that stories like this kill the traction they have gained with some big-name clients.


I'll leave discussion about the validity of KM's activity aside. What is it about the "I know X, genuinely great people and really want to do good stuff" formula (with no substantive arguments) that seems to make it to the top of these HN debates? (I'm sure the KM people love puppy dogs, etc).


Indeed, it's annoying. I'm sure most spammers, phishers etc love their mom and are cool guys when you meet them over coffee or drinks. It says nothing about their business ethics.


If a user requests not to be tracked they should not be tracked. Even when the information is harmless, as I am sure it most likely is in this case, it sets a bad example and will make it worse for the industry.


> They are tracking user behavior completely anonymously

Just because you, a human, can't look at the millions of data points and go "oh look, there's george tomlinson of 28 esperay avenue doing something we don't like" doesn't mean that it cannot be done or will not be done, or indeed is not being done already.

Some of us don't want to walk around with yellow badges, thank you. Do you imagine the fact that the badges are only visible to those with the resources and motive to discover them, and not the average joe, is more, or less, of a motivation for privacy?


Of course there's value in what they're doing — to advertisers. As a user, I am not 100% comfortable with any tracking service, supercookie-based or otherwise.

[Edit: Made the wording clearer]


KISSmetrics specializes in funnel analytics, not helping advertisers. I'm surprised by the number of people making that assumption.

I have a SaaS app and I use KISSmetrics to learn what sorts of things engage visitors and customers the most. It's helped me make critical decisions that benefit both me and my customers (by improving multi-step processes).


Yea, they are _now_. How quickly that can change.


If there is nothing wrong with what they are doing, why do they force it on me?

Actually, given what you said in that post, why do you force it on me?

Don't bother denying it -- you wouldn't know if they are getting traction if you weren't working with them.


"This is yet another example of the continued arms-race that consumers are engaged in when trying to protect their privacy online..."

I don't think arms race is a good analogy here. Arms race is a good analogy for virus-makers and antivirus software, since their goals are exact opposites.

The goal of analytics sites like KISSmetrics is to measure and understand the behavior of their customers as a group, not as specific individuals. The goal of people who wish to remain untracked is to avoid having personally identifiable information about them stored without their consent. These goals are not opposites and don't necessarily result in an arms race.


This has been known about for years, and was a concern on various mailing lists years ago. The solution at the time was said to be that browser vendors will build in tools for cache control in the same way they have for cookie controls.

The first sites to exploit this were, as always, porn sites. They used Etags in referral tracking to avoid webmaster fraud. (the webmaster would have to include a script from the affiliate co which would set an Etag).

You know what is more interesting? The Last-Modified header. The HTTP spec says that you are supposed to put a date in there, but it also says not to bother parsing the date if you are a client since date parsing is such a pain in the ass. So clients just copy the date string, store it, and then replay it in subsequent requests.

you can put whatever the hell you want in a last-modified field and all browsers will just store it and then replay it later in subsequent requests to the same resource. for eg.

initial request:

  GET /_modified_test HTTP/1.1
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
  Accept-Encoding: gzip,deflate,sdch
  Accept-Language: en-US,en;q=0.8
  Cache-Control: max-age=0
  Connection: keep-alive
  Host: localhost:8888
  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.830.0 Safari/535.1

initial server response from my dev server (note Last-Modified header used):

  HTTP/1.0 200 OK
  Server: Dev/1.0
  Date: Sat, 30 Jul 2011 11:48:25 GMT
  content-type: text/html; charset=utf8
  Last-Modified: random_token_i_set
  Cache-Control: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Content-Length: 1634

subsequent browser request to the same resource:

  GET /_modified_test HTTP/1.1
  Host: localhost:8888
  Connection: keep-alive
  Cache-Control: max-age=0
  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.830.0 Safari/535.1
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Encoding: gzip,deflate,sdch
  Accept-Language: en-US,en;q=0.8
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
  If-Modified-Since: random_token_i_set

with new webapps now being single-page with either hashchange or pushstate support, it means almost all requests are made on the backend to the same resource, so you can track the user across all pages on the entire site and across other sites.
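
a rough sketch of the server side of that exchange, assuming the browser replays the stored Last-Modified string verbatim as If-Modified-Since (as in the capture above). `handle_request` is a hypothetical name, and headers are modelled as plain dicts rather than a real HTTP server.

```python
import secrets

def handle_request(request_headers):
    """Return (visitor_token, response_headers) for one request to the tracked resource."""
    # A returning browser echoes back whatever string we put in Last-Modified.
    token = request_headers.get("If-Modified-Since")
    if token is None:
        token = secrets.token_hex(8)  # new visitor: mint a random token
    response_headers = {
        "Last-Modified": token,       # not a date at all, just an identifier
        "Cache-Control": "no-cache",  # force revalidation on every request
    }
    return token, response_headers

# Simulated browser: store the header, replay it on the next request.
token1, response = handle_request({})
token2, _ = handle_request({"If-Modified-Since": response["Last-Modified"]})
```

here `token1` and `token2` are the same string, so the second request is tied to the first without any cookie.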

concerning, but a known problem. even with these headers patched there is still a lot of information that can be used to fingerprint clients (ie. having everything switched off is still a fingerprint that makes you unique). I don't think chrome, safari, IE or Firefox will ever implement these advanced features, it will be up to somebody else to release a browser that is more privacy aware or to maintain a plugin that is.

I wrote a plugin that does this, but a lot of information still leaks through (it is in my github but I haven't released/announced it in any way). I am contemplating just forking webkit and doing a whole separate 'privacy aware' browser but haven't found the time. in short, the browser makers know about this, and have known about it for years - there is just no real interest in providing user tools to fully anonymize users.

Edit: if anybody is interested in the plugin it is here: https://github.com/nikcub/Parley

it blocks all third party requests and provides other features. it works, just needs a bit of a clean up and release.


Interesting. I looked at the RFC and it says it has to be in the format "Sun, 06 Nov 1994 08:49:37 GMT" (RFC2616/RFC1123).

But even if browser makers checked the "If-Modified-Since" value against that format, it would still be doable to give each visitor a slightly different date and track them that way.

Combining the date stamps on 2 or 3 jpg or css files present on every page on the site should give you enough entropy for even the highest traffic sites and make it very hard to detect.
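
A sketch of that idea, assuming each file carries 16 bits of the visitor ID as a seconds offset from an arbitrary base date, so every value is a well-formed HTTP date. The base date, the chunk size, and the `encode`/`decode` names are my own choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

BASE = datetime(2011, 7, 1, tzinfo=timezone.utc)  # arbitrary base date (assumption)
BITS_PER_FILE = 16  # 2**16 seconds is roughly 18 hours of plausible timestamps

def encode(visitor_id, n_files=3):
    """Split visitor_id into 16-bit chunks, one valid Last-Modified date per file."""
    dates = []
    for i in range(n_files):
        chunk = (visitor_id >> (i * BITS_PER_FILE)) & (2**BITS_PER_FILE - 1)
        dates.append(format_datetime(BASE + timedelta(seconds=chunk), usegmt=True))
    return dates

def decode(dates):
    """Rebuild the visitor ID from the If-Modified-Since dates the browser replays."""
    visitor_id = 0
    for i, text in enumerate(dates):
        chunk = int((parsedate_to_datetime(text) - BASE).total_seconds())
        visitor_id |= chunk << (i * BITS_PER_FILE)
    return visitor_id

dates = encode(123456789012)
recovered = decode(dates)
```

Three files give 48 bits under these assumptions, and each header on its own looks like an ordinary timestamp.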


Looks to be using the individual etags associated with each cached object. Pastebin: http://pastebin.com/FhUYuRsb


"I would be having lawyers talk to you if we were doing anything malicious." -- this seems like the type of defense that a good lawyer would tell you never to use.


great comments.

we're planning to follow up with a post that has the technical details of the Etag stuff (sorry about 'light on detail', it was a press piece after all).

you're right in that it's been a known method that has been written about before (samy had it in evercookie, which we cite in the paper, and a few others have blogged about it). what seemed new (at least to me) was actually encountering it 'in the wild' on a top50 site like hulu. if this type of thing has been written about before, definitely let me know so we can cite it.

fwiw, yes noscript would block the javascript that kissmetrics uses to respawn using html5/etags, however there's still the swf that regenerates using flash cookies. also josh highlights ways that you could do this with javascript disabled using CSS (kissmetrics actually also uses hidden values in CSS as well if you look at the src)

either way, blocking javascript/flash would render hulu, and other 'rich media' services like it, largely useless unfortunately.

RE: foxnews/polldaddy. actually they were naming their database 'evercookie' some time ago although they seem to have changed that (now it's just called pd_poll__). you can see the script they use here, which uses html5 and swf databases: http://pastebin.com/0ieZ2i22 (prettyfied from http://static.polldaddy.com/p/4424060.js )

it's likely that polldaddy/foxnews are using these techniques to ensure that a given computer only gets to vote 'once'. however, i think there are probably much better ways to do this.

hope that helps. i'll link a blogpost down here somewhere (which means that i actually have to start blogging finally ;)


Title is misleading. I routinely 'dodge' this - all it takes is disabling caching. If you understand how caching works, it's trivial to conclude that it's possible to use etags for tracking. It's the same with the CSS-based browser history attack - if your browser is storing data, and it's possible for a server to tell you're storing it, it can be used to track you.


The issue is clearly not that they're tracking. The issue is that they're going to extremely devious lengths to prevent you from removing their ability to track you using standard tools.

I've quite a few /etc/hosts entries, blocking third party cookies, clearing cookies & cache on close, no flash cookies, and so on, but I always expect there'll be something they can find still.


The article focuses mostly on legal measures (e.g. lawsuits, regulation), but my guess is that those would only deter the largest companies. What I'm more worried about is why 'incognito' modes in current browsers don't appear to stymie this tracking, and how likely it is that that can be fixed.


I, too, would be rather surprised that incognito/private-browsing would share cached data (and thus ETags as sent on If-None-Match requests) with normal browsing.

Looking at the researchers' paper...

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1898390

...it's not clear that's what they're claiming. One quote is that "Even in private browsing mode, ETags can track the user during a browser session."

That suggests they may be concerned about cross-site tracking within a single private session, and the possible expectation that 'private browsing' prevents tracking from site to site. (I've never had that expectation; only that a private session is not connected to distinct prior private and non-private sessions.)


I just tested incognito, it seems to defeat the etags mechanism of tracking.



I think you're agreeing with catch23; that 'it' is referring to 'incognito' not 'KISSmetrics technique'. Incognito mode does defeat the ETag tracking (at least across distinct sessions).


I'm not so sure, based on the comment the counterclaim was in reply to. Nonetheless, hopefully these tests better suit the depiction of what we can come to expect from these tracking techniques in mixed session types.


From what i gather it's basically just an evercookie. Block kissmetrics with a host file, firewall, Ghostery (Not the chrome version, though), RequestPolicy, etc or defeat evercookie through usual means and you'll be fine.


I generally browse with Javascript, cookies, and plug-ins off (except for a few whitelisted sites). From what I understand of the technology (it loads some javascripts initially), I think that would dodge it.


Sorry, but no. You'll get tracked through ETags at least; if you have a late-model browser you may also be tracked through an HTML5 DB or history object.


What's it like living in 1994?


The counter might be, what's it like living in 1984?


is there any indication on where the data is stored?


Yes, but not easily seen unless you have a tool like FireBug to see.

It works by setting a unique cache tag (the etag, as in the screenshot) for each user of a resource such as an HTML, JPG, or GIF file. The browser sends that tag back on later requests, so what the user views on each site can be tied together. It is, in effect, a cookie.

I think it's quite brilliant as an alternative to cookies but unfortunately I can't use it as a form of cookies as they are not a HTTP standard.

More: https://secure.wikimedia.org/wikipedia/en/wiki/HTTP_ETag


Etags are an http standard.


Haven't looked at KISSmetrics, but I assume it's some manifestation of evercookie: http://samy.pl/evercookie/


Yep the "never before seen in the wild" ETag approach was implemented September 2010 in an easy-to-use library. Very clever, but it seems like Samy Kamkar deserves the credit here, not the brilliant researchers who found the library's approaches being used somewhere.


I put together a detailed follow-up on the KISSmetrics/Hulu respawning mechanisms outlining exactly how they work (although this is probably pretty basic for most of the audience here).

Details here: http://ashkansoltani.org/docs/respawn_redux.html

Feel free to send comments/suggestions.

Also nikcub - very enlightening about the Last-Modified header! It reinforces my point that the solution to all this might not be technical but require policy guidance as to best practices, etc.


> So if a user came to Hulu.com from an ad on Facebook, and then later, using a different browser on the same computer, visited Hulu.com from Google, and then at some point signed up for the premium service, KISSmetrics would be able to tell Hulu all about that user’s path to purchase (without knowing who that person was).

It seems their method relies on using cached javascript files to identify a user. How then are they able to track the same user using a different browser? Is it by IP address?


What happens when you login using both browsers? Now, Hulu.com can attribute both Km UID's to your account. Magic.


> How then are they able to track the same user using a different browser?

Flash cookies. Presumably Silverlight has an equivalent.

(And I even heard once that Windows Media Player shares cookies with IE regardless of the browser that it is embedded in.)


They can use browser fingerprinting, for instance. http://panopticlick.eff.org/


Peerblock can be set to block port 80 for all lists or leave it open for all of them. I want to be able to enable some blocklists for 80 but not others. So I can block ads and stuff like this at the stack instead of the browser, but leave the other lists affecting only other ports, for torrents etc. I don't think it makes peerblock too complex to have some lists that block everything and some that block everything but 80.


What's the cunning part? I skimmed the article and it seemed to have everything other than the technique.


It's talking about two separate issues which initially confused me, one of which is inappropriate data sharing.

The cunning part technically is their repurposing of "etags". These aren't that widely known about but it's a mechanism by which you can ask a webserver "I've already downloaded this file before, has it changed?". Typically the etag will be a revision number, or a hash of the file. The header to create one looks like this:

ETag: "686897696a7c876b7e"

And then in future requests your browser will include the header:

If-None-Match: "686897696a7c876b7e"

In the request. If the file hasn't changed since you last downloaded it, you get a 304 Not Modified. Given that you can store absolutely arbitrary data in the ETag, it's easy to see how this can be used to track users (and the same applies to the Last-Modified header, which is treated exactly like an ETag by your browser despite containing a date).
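
Here's a minimal sketch of the server side, assuming the tracker simply uses a random per-visitor string as the ETag instead of a revision number or hash. `serve` is a hypothetical function and headers are plain dicts; it only illustrates the header round trip.

```python
import secrets

def serve(request_headers, body=b"tracked.js contents"):
    """Return (status, headers, body, visitor_id) for a hypothetical tracked resource."""
    etag = request_headers.get("If-None-Match")
    if etag:
        # Returning visitor: the replayed ETag is the identifier, and answering
        # 304 keeps the same value in the browser cache for next time.
        return 304, {"ETag": etag}, b"", etag.strip('"')
    visitor_id = secrets.token_hex(9)  # new visitor: mint a random ID
    return 200, {"ETag": f'"{visitor_id}"'}, body, visitor_id

# Simulated browser: first request, then a revalidation that replays the ETag.
status1, headers1, _, id1 = serve({})
status2, _, body2, id2 = serve({"If-None-Match": headers1["ETag"]})
```

The second call sees the same ID as the first even though no cookie was involved, which is why clearing the cache is the only thing that resets it.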


Ironically, they would never be caught if only they assigned different blobs to the same user on different properties, like KS_cookie XOR hash(property_name).
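
A quick sketch of that suggestion, assuming the visitor ID is a byte string of at most 32 bytes and using SHA-256 as the hash (my choice, not something stated here). `per_property_id` and the property names are hypothetical.

```python
import hashlib

def per_property_id(visitor_id: bytes, property_name: str) -> bytes:
    """XOR the visitor ID with a hash of the property name, as suggested above."""
    # Truncate the 32-byte digest to the ID length so the XOR lines up byte for byte.
    mask = hashlib.sha256(property_name.encode("utf-8")).digest()[:len(visitor_id)]
    return bytes(a ^ b for a, b in zip(visitor_id, mask))

ks_cookie = bytes(range(16))                          # stand-in for the real ID
id_on_a = per_property_id(ks_cookie, "site-a.example")
id_on_b = per_property_id(ks_cookie, "site-b.example")
original = per_property_id(id_on_a, "site-a.example")  # XOR again to undo the mask
```

The two sites see different blobs, while the tracker can still recover the shared ID. Anyone who knows the scheme can undo an unkeyed hash mask the same way, so a keyed hash would be the more careful variant.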


What's next... tracking users using browser exploits?


Wow, an effing moral panic here. I thought KissMetrics was a darling startup?

Anyways, assuming they could offer their service tracking only on a customer's site, they should be serving from a subdomain, no?


Doesn't this achieve the exact same purpose as logging a combination of the user's IP address + user-agent + maybe some other stuff? Don't need no complicated, cunning technology to do this...


Exactly. On http://panopticlick.eff.org/ you can see how 'unique' your browser configuration, ip address, language settings, etc are. For most people, this creates a great many bits of information that can be used to track you even without cookies or any client-side storage.
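
As a rough illustration of how those bits add up: an attribute value shared by one in N browsers contributes log2(N) bits, and the bits sum if the attributes are independent. The one-in-N figures below are made up for the example, not taken from Panopticlick, and independence is an assumption that rarely holds exactly.

```python
import math

def bits(one_in_n):
    """Identifying information, in bits, of a value seen in one of every N browsers."""
    return math.log2(one_in_n)

# Made-up rarity figures, purely for illustration.
attributes = {"user_agent": 1500, "timezone": 20, "screen_size": 50}

# Assuming the attributes are independent, their bits simply add.
total_bits = sum(bits(n) for n in attributes.values())

# Number of browsers this combination could tell apart.
distinguishable = 2 ** total_bits
```

With these invented numbers the three attributes alone already separate more than a million browsers.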


Except this can do it cross browser and when I move from hotspot to hotspot, too.


"Then, if that user eventually signs up during a later visit, KISSmetrics will associate their previously anonymous profile with their email address or user name. Which means that site admins can look at both how a user is currently using their site, and how they used it months or years before they actually created an account"


Looks like it's using a variant of a technique I demonstrated a while back: http://joshduck.com/blog/2010/01/29/abusing-the-cache-tracki...


Sorry, but what a bunch of crap... Privacy people are so annoying... If you're concerned about this kind of tracking, stop using online porn. Otherwise, as you were.



