
Pointing at a big company and saying "they're doing it wrong" is easy enough to do, but you have to remember that every decision comes with tradeoffs. Take Google's codebase, since it's the one I know the best. A couple of the key decisions:

* Single rooted tree. Separate repositories would make it harder to share code, leading to more duplication.

* Build from head. We build everything from source, statically linked. No need to worry about multiple versions of dependencies, and no lag between a bug fix landing and it being available to any and all binaries that need it, whenever they're next rebuilt.

I don't think that an "internal Github" is going to be a magic bullet here. It's more likely it would be a matter of trading one set of hard problems for another, as we all of a sudden need to figure out how to do cross-repository dependencies sanely, deal with multiple versions of libraries, etc., at scale. You are correct that one monolithic Perforce repo is a bit of a pain point, but that doesn't necessarily mean that the right decision is to shatter our codebase into different pieces - we'd rather make our repo scale better. For reference, we've already got hundreds of millions of lines of code, 20+ changes/minute 6 months ago (so what, 30+ now?), and plans for scaling the next 10x are in motion.
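To make "multiple versions of libraries" concrete, here's a toy single-file sketch (all names invented, with the two versions simulated by namespaces so it compiles as one file) of the diamond-dependency problem that building everything from head at a single version sidesteps:

    #include <cstdio>

    // Hypothetical scenario: repo A pins logging v1, repo B pins logging v2.
    // A binary that statically links code from both repos would get two
    // layout-incompatible definitions of LogRecord - an ODR violation the
    // linker may resolve silently and wrongly.
    namespace logging_v1 {
    struct LogRecord { int level; };
    int Severity(const LogRecord& r) { return r.level; }
    }

    namespace logging_v2 {
    struct LogRecord { int level; int tag; };  // v2 added a field
    int Severity(const LogRecord& r) { return r.level; }
    }

    int main() {
      logging_v1::LogRecord a{1};
      logging_v2::LogRecord b{2, 42};
      // Build from head means there is exactly one LogRecord in the tree,
      // so this conflict can't arise in the first place.
      std::printf("v1 severity=%d, v2 severity=%d\n",
                  logging_v1::Severity(a), logging_v2::Severity(b));
      return 0;
    }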

If you're interested, I recommend http://google-engtools.blogspot.com/. It details a number of the problems we've run into, and our solutions for dealing with them at scale.



Single rooted tree. Separate repositories would make it harder to share code, leading to more duplication.

I'm not convinced that the choice between a singly rooted tree and a multiple-rooted tree is going to make that much difference. I mean, think about it... if you have 100k's or even millions of files, is anybody going to parse through all of that, looking for a reusable function, even if it is on their workstation?

And sure, a compiled language would catch naming collisions on functions or whatever, but nothing stops somebody from creating a method

doQuickSort( ... )

and somebody else creating

quickSortFoo(...)

where they are semantically equivalent (or very nearly so).

It seems to me that the problem of duplicating code, because you don't know that a method already exists to do what you're trying to do, is the same problem regardless of how your tree is laid out; and is ultimately more of a documentation / process / discipline issue. But I'd be curious to hear the counter-argument to that...


is anybody going to parse through all of that?

Yes, in fact. We have some great tools that give us full search over our entire codebase (think Google Code Search), and you can add a dependency on a piece of code without needing to have it on your workstation already. The magic filesystem our build tools use knows where to get it and can do so on demand. Combined with good code location conventions, an overall attitude that promotes reuse over rewrites* and mandatory code reviews where someone can suggest a better approach, we do a pretty good job. Not everything is eliminated, of course, but I'm pretty happy with the state of things.

To your example, we'd use the STL for most of our sorting needs, but if you were to want, say, case-insensitive string sorting, I can tell you where to find it (ASCII, UTF8, or other). If you want a random number, any RNG you could want is available. Most data structures you could name have been written and tested already. Libraries for controlling how your binaries dump core, how command line flags are parsed, how callbacks are invoked, etc. are readily available. We really do reuse code as much as possible, and it's wonderful to have ready access to all of this whenever you need it.

*At a method level, anyways...we're famous for writing ever more file systems ;).
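To make the case-insensitive sorting example concrete, here's a minimal ASCII-only sketch of what such a comparator looks like (my own illustration, not the internal library; real UTF8 handling needs proper collation, e.g. via ICU):

    #include <algorithm>
    #include <cctype>
    #include <string>
    #include <vector>

    // ASCII-only case-insensitive "less than", usable with std::sort.
    bool LessIgnoreCase(const std::string& a, const std::string& b) {
      return std::lexicographical_compare(
          a.begin(), a.end(), b.begin(), b.end(),
          [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
          });
    }

    int main() {
      std::vector<std::string> v = {"Banana", "apple", "Cherry"};
      std::sort(v.begin(), v.end(), LessIgnoreCase);
      // v is now {"apple", "Banana", "Cherry"}
      return 0;
    }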


Reply to mindcrime's sibling post:

I've seen that before myself at other companies, and it's a shame. A healthy codebase is an investment in the future - if you're not taking the time to cultivate it, you're sacrificing long-term usability for short-term gains. The larger the codebase, the more difficult the task, of course, but for us that's just an excuse to solve the next hard problem :).

One more good link on the topic: our use of Clang to find and fix bugs in our existing codebase, as we find new classes of 'gotchas'. http://google-engtools.blogspot.com/2011/05/c-at-google-here...
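To give a flavor of those bug classes, here's a toy example of my own (not necessarily one from the post) of the kind of mechanical mistake a Clang diagnostic can flag across a large codebase:

    #include <string>

    // Returning a reference to a local: the object dies when the function
    // returns, so any use of the result is undefined behavior. Clang flags
    // this pattern at compile time via -Wreturn-stack-address.
    const std::string& Greeting() {
      std::string s = "hello";
      return s;  // warning: reference to stack memory associated with
                 // local variable 's' returned
    }

    int main() {
      (void)&Greeting;  // not called; the point is the compile-time warning
      return 0;
    }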


Awesome, glad to hear you guys take reuse so seriously. I'm a little surprised, only because - in my experience - so few organizations put in the effort that you guys do.


Knowing that somebody is making the effort to get this sort of thing right is really, unspeakably awesome. Though I guess not literally unspeakable, given that I'm posting about it. This sort of thing is one of the main reasons I read HN.

Hopefully the methodology will filter out into the wider world one day... Anyway, thank you for posting it!




