So ~650M daily active users, and ~4PB of data warehouse created each day: that works out to ~7MB of new data per active user per day. Given that it's a data warehouse, I'm going to guess it's not images, and that seems like a lot to me. I guess it shouldn't surprise anyone that every interaction, on and off the site, is heavily tracked.
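If you want to sanity-check that figure, here's the back-of-the-envelope arithmetic (a quick sketch; I'm reading "4PB" as binary petabytes, since the decimal reading comes out closer to ~6.2MB):

```python
# Back-of-the-envelope: new warehouse bytes per daily active user.
# Assumes "4PB" means 4 pebibytes; both input figures are from the thread.
daily_bytes = 4 * 2**50            # ~4.5e15 bytes/day
dau = 650_000_000                  # daily active users

per_user_mb = daily_bytes / dau / 1e6
print(f"~{per_user_mb:.1f} MB per user per day")   # ~6.9 MB
```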
A lot of that data is duplicated to allow for efficient querying or transformation. It's often too slow to process the data as it comes in, so an initial process writes it in a raw form; some other process might then select a subset of the data, process it, and write it back in an "annotated" form (filling in, say, the AS number of the client IP). Another process runs later in a batched fashion and perhaps annotates the full set of information and summarizes it into a bunch of easily-queried tables.
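To make that raw -> annotated -> rollup flow concrete, here's a minimal sketch; the event shape, the ASN lookup table, and all names are invented for illustration (a real system would use an actual IP-to-ASN database and a real warehouse, not in-memory dicts):

```python
# Hypothetical sketch of the pipeline described above.
from collections import Counter
from ipaddress import ip_address, ip_network

# Stand-in for a real IP-to-ASN database (e.g. a RouteViews or MaxMind dump).
ASN_TABLE = {ip_network("203.0.113.0/24"): 64496,
             ip_network("198.51.100.0/24"): 64497}

def annotate(raw_event: dict) -> dict:
    """Second copy of the event, with the client's AS number filled in."""
    ip = ip_address(raw_event["client_ip"])
    asn = next((a for net, a in ASN_TABLE.items() if ip in net), None)
    return {**raw_event, "client_asn": asn}

def rollup(annotated_events: list[dict]) -> Counter:
    """Batch path: summarize annotated events into an easily-queried form."""
    return Counter(e["client_asn"] for e in annotated_events)

raw = [{"client_ip": "203.0.113.7", "path": "/feed"},
       {"client_ip": "198.51.100.9", "path": "/feed"}]
annotated = [annotate(e) for e in raw]   # duplicated, annotated copy
print(rollup(annotated))                 # Counter({64496: 1, 64497: 1})
```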
A lot of that data also isn't tied to individuals - for example, the access logs for the CDN (which, being on a different domain by design, doesn't share cookies and so isn't attached to an account) are probably tens of gigabytes a day even when reasonably heavily sampled, and get rolled up into efficient forms for querying in various ways. A lot of it isn't even about requests coming through the web site/API - it may just be internal inter-service request information, or inter-datacenter flow analysis, or per-machine service metrics ("Oh, look, process A on machines B through E went from 2GB resident to 24GB in 30 seconds a few seconds before the problem manifested").
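As a toy version of that last kind of per-machine metric analysis (the sample shape, window, and threshold are all made up for illustration):

```python
# Hypothetical sketch: flag processes whose resident memory jumps sharply
# between consecutive samples, like the 2GB -> 24GB example above.
from dataclasses import dataclass

@dataclass
class Sample:
    machine: str
    process: str
    ts: float          # seconds since epoch
    rss_gb: float      # resident set size in GB

def flag_spikes(samples: list[Sample], window_s: float = 30.0,
                factor: float = 5.0) -> list[str]:
    """Report processes whose RSS grew by >= factor within window_s."""
    alerts = []
    last: dict[tuple[str, str], Sample] = {}
    for s in sorted(samples, key=lambda s: s.ts):
        prev = last.get((s.machine, s.process))
        if prev and s.ts - prev.ts <= window_s and s.rss_gb >= factor * prev.rss_gb:
            alerts.append(f"{s.process} on {s.machine}: "
                          f"{prev.rss_gb:g}GB -> {s.rss_gb:g}GB in {s.ts - prev.ts:g}s")
        last[(s.machine, s.process)] = s
    return alerts

samples = [Sample("B", "A", 0.0, 2.0), Sample("B", "A", 25.0, 24.0)]
print(flag_spikes(samples))   # ['A on B: 2GB -> 24GB in 25s']
```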
(Not that it makes too much of a difference at this scale, but it is closer to 860M daily actives.)