> However, RTBF wasn’t really proposed with machine learning in mind. In 2014, policymakers wouldn’t have predicted that deep learning would be a giant hodgepodge of data & compute
Eh? Weren't deep learning and big data already things in 2014? Pretty sure everyone understood ML models would have a tough time and they still wanted RTBF.
I'm pretty sure that the policymakers did NOT understand ML models in 2014 - and still do NOT understand them today.
I also don't think that they care. They don't care that ML is a hodgepodge of data & compute, and they don't care how hard it is to remove data from a model.
They didn't care about the ease or difficulty of removing data from more traditional types of knowledge storage either - like search indexes, database backups and whatnot.
RTBF was not proposed with any specific technology in mind. What they had in mind was to give individuals a tool to keep their private information private. Like, if you have a private, unlisted phone number, and that number somehow ends up on the call list of some pollster firm, you can force that firm to delete your number so that they can't call you anymore.
The idea is that if your private phone number (or similar data) ends up being shared or sold without your consent, you can try to undo the damage.
In practice it might still be easier to get a new number than to have your leaked one erased... but not all private data is exchangeable like that.
GDPR and RTBF were formulated around the fears of data collection by the Stasi and other such organizations. They were not formulated to ease the burdens of future entrepreneurs, but to mitigate the damage those entrepreneurs might cause. Europeans were concerned about real harms that living people had experienced, not about enabling AGI or targeted advertising or digital personal assistants.
We have posts here at least weekly from people cut off from their services, and their work along with them, because of bad inference, bad data, and the inability to get metadata corrected, driven purely by BigCo routine automation and indifference to individual harm. Imagine the scale such damage will take when that automation and indifference are built around repositories from which data cannot be deleted and cannot be corrected.
I don't know if people anticipated contemporary parroting behavior over huge datasets. Modern, well-funded models can recall an obscure person's home address buried deep in the training set. I guess the techniques described might be presented to the European audience in an attempt to maintain access to their data and/or the market for sales. I hope they fail.
Agreed. The media and advertising industry was most definitely leveraging cookie-level data to build attribution and targeting models. As soon as the EU established that this data was “personal data”, since it could, theoretically, be tied back to individual citizens, there were questions about the models. Namely: would they have to be rebuilt after every RTBF request? Needless to say, no one in the industry really wanted to address the question, as the wrong answer would essentially shut down a very profitable practice.
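To make that question concrete, here is a minimal sketch of what the naive answer looks like: drop every row tied to the erased cookie IDs and retrain from scratch, since the fitted weights were estimated on the deleted rows. This assumes cookie-level events in a pandas DataFrame and a scikit-learn classifier; the column names are hypothetical, not anything from the thread.

```python
# Illustrative sketch of "rebuild after every RTBF request" (hypothetical schema).
import pandas as pd
from sklearn.linear_model import LogisticRegression

def retrain_after_erasure(events: pd.DataFrame, erased_cookie_ids: set[str]) -> LogisticRegression:
    """Drop all events belonging to erased cookie IDs, then retrain from scratch.

    Simply deleting the raw rows is not enough: the previously fitted model
    was trained on them, so honouring the request naively means a full rebuild.
    """
    remaining = events[~events["cookie_id"].isin(erased_cookie_ids)]
    X = remaining[["ad_impressions", "site_visits", "days_since_last_click"]]
    y = remaining["converted"]
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model
```

At ad-industry data volumes, doing that for every erasure request is exactly the cost nobody wanted to put a number on.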
More likely: the wrong answer would've shut them out of a profitable market rather than shut down the practice. The EU is not the world. Anthropic, for example, doesn't seem to mind blocking the EU.
1) At the time, the European data laws implied that they protected EU citizens no matter where they were. Nobody wanted to be the first to test that in court.
2) The organizations and agencies performing this type of data modeling were often doing so on behalf of large multinational corporations with absurd advertising spends, so they were dealing with Other People’s Data. The responsibility for scrubbing it clean of EU citizens’ data was unclear.
What this meant was that an EU tourist who traveled to the US and got served a targeted ad could make an RTBF request to the advertiser (think Coca-Cola, Nestlé or Unilever).
RTBF was introduced to solve a specific issue, no?
Politicians and their lobbyist friends had no way to remove materials linking them to their misdeeds from showing up as the first Google Search result for their names. Hence RTBF.
Now there’s a similar issue with AI. Models are progressing towards being factual, useful and reliable.
Of course, it’s not a regulation issue. The technology was introduced to users before it was ready. The very nature of training without opt-in consent, and the lack of any mechanism for being forgotten, are issues that should have been addressed before trying to make a keyboard with a special Copilot button.