Hacker News

While I'd be interested in seeing this issue further unfold, just the prospect of a 1.3M-file repo gives me the creeps.

I'm not sure what the exact situation at Facebook is with this repository, but I'm positive that if they had to start with a clean slate, this repo would easily find itself broken up into at least a dozen different repos.

Not to mention the fact that if _git_ has issues dealing with 1.3M files, I wonder what other (D)VCS they're thinking of as an alternative that would be more performant.



A lot of big companies have repos 10 or 100 times that size. With tens of millions of files, sometimes up to 100 gigs or more of data under source control.


True, but don't most places organize one git repo per project, rather than one for the entirety of the company's source code?


Most people like that use Perforce (e.g., Google).

And no, they don't split into multiple repos, they might well have the entire company's source code in a single repository (code sharing is way easier this way).


That's a pretty terrible way to share code. Simple example: I work on a project and write some code. It turns out that code is useful for someone else, so they reach in and include it, which is easy since it's all in the same repo and all delivered to the build system. Now I make a change and break some customer I didn't even know existed. Oops.

This is what package systems are for.
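To make the package-system point concrete: the consumer depends on a released, pinned version rather than on the producer's source tree directly. The package name and version below are hypothetical, purely for illustration:

```
# requirements.txt in the consuming project (names are made up)
teamlib==1.4.2   # pinned release; the owning team can change trunk without breaking us
```

Moving to 1.5.0 then becomes an explicit, reviewable change on the consumer's side, validated against the consumer's own tests.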


Most companies of this size don't use git. ;)


Heh, at Apple it's currently 50/50 svn/git.


If you read further down the thread, they say that repo has already had the non-interlinked files split out. What they've got left isn't easily broken up.


Best argument I've ever seen for not wanting to work at Facebook... wow, that's a lot of intertwined spaghetti code.

Our source repo at work (a C++ compiler with full commit history going back to the early 90s...) is smaller and more componentized!


That C++ compiler is a single product (okay, you might have built a linker and an assembler as well - say 3-5 products). In even a medium enterprise (say, 500 employees, about 250 developers) you might have upwards of 35 different products, each with a 5-6 year active history.

Enterprise source control can be ugly - particularly if you have non-text resources (Art, Firmware Binaries, tools) that need to be checked in and version managed as well.

With all that said - I don't really understand why all the code is in a single repository. Surely a company of Facebook's size would experience some fairly great benefits from compartmentalization and published service interfaces. I guess I agree with the parent - sounds like a lot of intertwined spaghetti code. :-)


There're costs and benefits both ways. AFAIK, Microsoft and Amazon both use the separate repositories model, and Google and Facebook use a single large repository. Most people I know that have worked at both of these styles prefer the Google/Facebook style.

The biggest advantage of a single repository is pretty intangible - it's cultural. When anyone can change anything or can use any code, people feel like the whole company belongs to them, and they're responsible for Google/Facebook's success as a whole. People will spontaneously come together to accomplish some user need, and they can refactor to simplify things across component boundaries, and you don't get the sort of political infighting that tends to plague large organizations where people in subprojects never interact with each other.

I think if it were my company, I'd want the single repository model, but there need to be tools and practices to manage API complexity. I dunno what those tools & practices would look like; there are some very smart people in Google that are grappling with this problem though.


Why is a single repo required for everybody to see all the code? Tools like gerrit and github can handle multiple repos and provide commit access for multiple repos among a large group of people. If it were my company, I would keep separate repos but allow read and merge requests for all employees. That keeps everybody involved in projects across the entire company, but also allows them to notice when individual projects get spaghettified and thereby deserving of some cleanup/breakup into components. A GB-scale codebase does not help smart, new employees grok what the hell they can contribute.


It's not a matter of being able to see all the code, it's a matter of being able to see and modify all the code. It allows you to have a "just fix it" culture when people see something's broken, and it lets you write changes that span multiple projects without worrying about how your change will behave when it can't be committed atomically.


Pretty sure Perforce performs fine with that.


Well, of course, at some scale you're going to start having trouble with any DVCS maintaining a complete local copy of such a huge repository.

It's even worse than just disk space and performance issues.

I can totally imagine a huge, busy repository where by the time you've pulled and rebased/merged your stuff, the repo has already been committed to again, invalidating your fast-forward commit and forcing you to pull again and again before you have any chance of pushing back your changes.

This is an inherent problem with DVCS that just can't be solved (trivially) when working on huge repositories that span millions of files and involve thousands of developers.
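The pull/rebase/push race described above usually ends up as a retry loop in practice. Here's a minimal, self-contained sketch against a throwaway local "remote" (all paths and names are hypothetical; in a real busy repo, the loop spins because other developers keep pushing between your rebase and your push):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# Stand-in for the shared central repository.
git init --bare "$tmp/central.git" >/dev/null
git clone "$tmp/central.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.email you@example.com
git config user.name "you"

# A local commit we want to publish.
echo change > file.txt
git add file.txt
git commit -m "my change" >/dev/null

# The retry loop: if someone else pushed while we were rebasing, our push is
# rejected as non-fast-forward, so we pull --rebase and try again.
until git push origin HEAD:master >/dev/null 2>&1; do
    git pull --rebase origin master || exit 1   # replay our commits on the new tip
done
echo "pushed"
```

With thousands of committers, each rejection means another rebase, which is exactly the contention the comment describes; centralized systems sidestep it by locking or serializing at the server instead.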



