If they've got memcached with their own patches, Linux with their own patches, Hadoop with their own patches, etc., plus tons of translations, I can see 8 gigs of text.
Why would they put that all in the same repository? I'm pretty sure this 8 GB repo is just their website code. A frontend dev working on a Timeline feature shouldn't have to check out the Linux kernel.