For which part? That photos take most of the space? Or that they're virtually un-de-dupable?
I'm fairly sure about the former (having seen some private numbers that I'm not allowed to share) -- although you can think through it yourself. What other kind of data is as easily and commonly produced as photos/videos and takes up so much space?
The latter is definitely true since it's a hard research problem that I've spent some time thinking about myself. There are approaches to lossy de-duplication of photos that can achieve significant savings, but the quality loss is too great to be useful at the moment.
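To make "lossy" concrete, here is a rough sketch of how near-duplicate photos could be detected at all, using a simple average hash. This is only an illustration, not the approach I was referring to: it assumes the photos have already been decoded and downscaled to tiny grayscale grids, and the names `average_hash` and `hamming` and the pixel values are made up. It also only finds candidates; deciding what to store in place of them is where the quality loss comes in.

```python
def average_hash(pixels):
    """Bit list: 1 where a pixel is brighter than the image mean.

    `pixels` is a 2D list of grayscale values (0-255), assumed to be
    an already-downscaled thumbnail of the photo.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 2x2 thumbnails: a and b are the "same" shot with
# slightly different pixel values, c is a different scene.
photo_a = [[10, 200], [220, 30]]
photo_b = [[12, 198], [225, 28]]
photo_c = [[200, 10], [30, 220]]

dist_ab = hamming(average_hash(photo_a), average_hash(photo_b))
dist_ac = hamming(average_hash(photo_a), average_hash(photo_c))
```

A small distance flags two photos as probable near-duplicates even though their bytes differ, which is exactly why ordinary byte-level de-duplication gets nothing out of them.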
What a fascinating problem, especially when it comes to personal photos that aren't really that 'personal.'
A thousand people will all visit Paris today, and all take a picture of the Eiffel Tower, and all upload said picture to their favourite cloud storage platform.
Is there any reason why we need a thousand, ever so slightly different, pictures of the Eiffel Tower from one day in April, stored in the cloud for eternity? Would people even notice if their images were quietly 'de-duplicated'?
Reminds me of an art installation I saw somewhere with the "21st century camera" (can't remember if that was the actual name). It was a black box with a single red button which, when pressed, captured the lat-long of the box and the current time, so you could search on Picasa, Flickr, etc. for geo-tagged pictures taken around that time. Fascinating to think about.
"Would people even notice if their images were quietly 'de-duplicated'?"
If my wife's face does not appear next to the Eiffel Tower in the de-duplicated version I might be somewhat concerned.
Also photos are, at least in their highest form, an emotional response to light. One thousand good photographers shooting the Eiffel Tower on the same day and at the same hour will probably generate 5,000 or more unique and interesting shots.
Music: many people seem to use Dropbox to sync iTunes libraries (granted, iTunes Match is probably eating into that) but I'd easily believe that there are many large, duplicate media files from the same stores or torrent sites.
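For what it's worth, this is the easy case. A minimal sketch of exact, content-addressed de-duplication, assuming the files are byte-for-byte identical; the `dedupe` helper, the paths, and the byte strings are all hypothetical, and this is not a claim about how Dropbox actually does it:

```python
import hashlib


def dedupe(files):
    """Store each distinct byte string once, keyed by its SHA-256 digest.

    `files` maps a (hypothetical) user path to the file's bytes.
    Returns (store, index): store maps digest -> bytes, index maps
    path -> digest.
    """
    store = {}
    index = {}
    for path, data in files.items():
        key = hashlib.sha256(data).hexdigest()
        store.setdefault(key, data)  # keep only the first copy
        index[path] = key
    return store, index


# Two users with the same purchased track, and two different photos.
uploads = {
    "alice/song.mp3": b"same-store-bought-track",
    "bob/song.mp3": b"same-store-bought-track",
    "alice/eiffel.jpg": b"photo-bytes-from-alice",
    "bob/eiffel.jpg": b"photo-bytes-from-bob",
}
store, index = dedupe(uploads)
```

The two copies of the track collapse into one stored blob, while the two Eiffel Tower photos stay separate because their bytes differ.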