I love the idea of data portability and data dividends. I love it dearly. It gets at what are by far the most critical principles here - data sovereignty and data value.
So it is with a heavy heart that I similarly always find myself deeply concerned by the security implications of such a world. As soon as you create data import and export functions, we will collectively realize that most people both have absolutely no idea how to handle it and are incredibly resistant to education on the subject. A lot of well-meaning ordinary people are going to get hurt when they get tricked into mishandling their data. There's already a sizable black market for personal data, and increasing transparency and access is going to grow that by both providing more ways to access data and more ways to use it.
I don't know a good way to reconcile these two. The tension hurts my heart.
We should work on making user privacy more private, not more portable.
If it does not exist, it does not need to be ported.
A piece of data which has been put somewhere is there forever. It cannot be moved, only copied.
And we should disincentivize data collection by making it worthless or costly, so only the strict minimum to make something work is collected.
edit: and having a central place where companies can request personal data is to be avoided: it should be hard to know where to find someone's data.
Instituting data dividends would risk encouraging people to share personal data because it could benefit them financially, and we probably want to avoid that as well.
> "We should work on making user privacy more private, not more portable." I don't understand this; can you explain?
I focused on getting data out of the FAANG Data companies. There would equally be regulations on getting it in. If you want to outlaw face recognition, you would make it illegal to add any data about facial characteristics. Or political affiliations, or porn history, etc. etc.
Have you mistaken me for someone with a 2,300 page legislative proposal all ready to go?
Anyhow, in the spirit of HN working this out together (should we form a bipartisan Working Group for this?):
Any data which, taken together, improperly identifies an individual may not be stored at all. I can't think of any reason why anyone outside of law enforcement needs this location information.
>> "We should work on making user privacy more private, not more portable."
> I don't understand this; can you explain?
handrous expressed my point better than I did an hour ago [1], but I'll try to give an answer: I think we should focus on limiting data production / collection altogether rather than try to address data portability (re-using the term used in the first sentence of my parent comment).
I agree that we could do both (limiting data production and moving data elsewhere / regulating its usage). But I'm not quite sure the problem (privacy) is solved just by moving the data out of the hands of its direct users, even as a first step. The data is still "out there" and it's a liability. Your "Amazon Data" can be compromised or receive government requests, and pieces of data handed to a FAANG company may as well stay at that FAANG company forever as far as your guarantees of privacy are concerned. I see pieces of data as "tainting" those who access them [3]: as soon as someone accesses them, you can no longer rely on them forgetting those pieces of data, or on them not having them.
I can see that splitting Amazon into two parts, "Amazon Data User" and "Amazon Data Provider", and forcing the former to pay the latter may disincentivize "Amazon Data User" from using your data too much, but it incentivizes "Amazon Data Provider" to sell it, so I'm not quite sure where it leads. I also can't see "Amazon Data Provider" working as an autonomous entity, so I'm not sure splitting quite makes sense.
To be honest, I fail to understand this solution, to be convinced that it may work. (I'm not dismissing your idea, I'm curious and want to understand more!)
edit: I'm all for some kind of HIPAA-like regulation as discussed in [2], however. Do you have an idea of how it would compare to the GDPR?
> "I can see that splitting Amazon in two parts "Amazon Data User" and "Amazon Data Provider" and forcing the former to pay the latter may disincentivize "Amazon Data User" to use your data too much, but it incentivizes "Amazon Data Provider" to sell it so I'm not quite sure where it leads. I also can't see "Amazon Data Provider" as working as an autonomous entity, so I'm not sure splitting quite makes sense."
Right now you have credit reporting companies (which are hardly a model of right-thinking behavior, btw), but don't they show that it's at least financially possible to split the data away from the data users (banks, lenders)?
So I don't think the money objection holds up. A bank right now might like to run ML on every credit card holder in the U.S., but that would either be impossible (Equifax just won't give it to them), or ruinously expensive. So Amazon Data User just won't be able to do all the analysis they do now, or at least they'll be more parsimonious about it.
Now, for the "taint" argument: rules like in legal discovery would have to apply. Amazon Data User has to swear that they don't have the data anymore, and we would rely on whistleblowers, subpoenas, and criminal penalties to enforce it. The fact that Jeff Bezos would go to jail ought to be enough incentive for Jeff to make sure it's gone.
At the risk of sounding like a dimwit, because this is the edge of my mental abilities here - is the concept, then, that data _itself_ becomes a regulated asset? Such that no company, should they not desire to participate in your REIT-like scheme, would be able to collect "data" without it being akin to holding a regulated asset?
I think that's it exactly. I suspect HIPAA is a model for this, although not a perfect one. It's like holding health data on large numbers of people -- you need to be registered & bonded, and you have to control access to it. Why is data on your friends & political beliefs any less sensitive than your health?
I hope no one thinks I have a whole manual of how this would operate. That would be the result of a large group saying "This sounds interesting. Let's try to flesh it out."
On the other hand, as I said: breaking up Ma Bell took 14 years. Just saying "break up Big Tech" doesn't answer all the questions, either.
What do you believe to be a "legitimate" use for this data, then, once it is held by such a registered and bonded company? Most of the time when somebody needs to authorize data use under HIPAA, people just sign and say it's fine - why wouldn't that happen here? People become used to some site saying "this site needs access to your data"? How is there any difference, except one more layer of indirection? Or perhaps I am misunderstanding your statements.