Is there any technical reason why you would NOT? The JVM is a proven and stable ...

weavejester · on March 26, 2012

As a disclaimer, I wasn't intending to bait anyone. I'm genuinely interested if there are any reasons for writing a 100k+ LOC application, rather than dividing the functionality up between smaller applications and libraries.

Operating systems like the Linux kernel are usually over 100k LOC, but Linus has made some compelling arguments about why you'd want a monolithic architecture in an operating system you want to be reasonably performant.

But that argument doesn't appear to apply to server systems running under the JVM. Building an application out of small, interchangeable, independent components seems rather more sensible than one large monolithic architecture. My current hypothesis is that any server application over, say, 10K LOC is badly designed.

I'm therefore quite interested in anyone who maintains or develops very large server systems, and if they could perhaps offer any reasoning why systems that large are the best solution.

Admittedly this is a little off-topic, though :)

skybrian · on March 26, 2012

Well-designed large apps actually are divided up into smaller components - there'd be no way to maintain them otherwise. However, it doesn't help as much as you'd think. If it's a layered architecture, sometimes adding new feature means you have to modify code in every layer. And dividing code up into libraries doesn't help build times when you need to modify a base library that everything uses.

If you have trouble imagining this, pretend that every dependency you use is actually part of your code base, and sometimes you have to modify them because they're not completely baked. If you have non-trivial functionality, you are certainly using way more than 100k LOC.

weavejester · on March 27, 2012

I'm afraid I still can't imagine a large system that could not conceivably be divided into small, independent components.

You mention dependencies that are "not completely baked", but isn't that just another way of saying "bad design"?

batista · on March 27, 2012

I'm afraid I still can't imagine a large system that could not conceivably be divided into small, independent components.

Well, examine some large systems then, and see whether you can fare better.

Depends on what you describe as independent components. The 200LOC thing is probably also comprised of independent components. As in NGINX that somebody mentions elsewhere. That doesn't mean those independent components don't have to work together and a change in one doesn't affect the other.

Consider a plugin host and a plugin. They are independent, alright, but the plugin takes advantage of certain things the host offers. If you change the host in those areas, you'll have to change the plugins too. Components only provide independency until the place where they meet each other, i.e. the "joins". (Even in a functional language, a pure function only provides independency until the point you call it --there you have to adhere to its interface).

weavejester · on March 27, 2012

Well, examine some large systems then, and see whether you can fare better.

Examining very large codebases takes time, so this is easier said than done! I have worked with several large projects, but all of them would have been better factored out into smaller services.

Components only provide independency until the place where they meet each other, i.e. the "joins".

I don't think that's true. Two components can share the same contract, but still not be dependent on one another.

For example, the Unix applications "wc" and "grep" have no dependency on one another, but they can be piped together because they share the same STDIN/STDOUT interface.

Similarly, the functions "+" and "*" can be used together because they have compatible type signatures, but this does not mean they are not independent.

skybrian · on March 28, 2012

You have the dependency graph wrong. There's a script that uses both wc and grep and it depends on both of them. At the lowest level they are just streams of characters, but that doesn't mean anything. If you change the output format of wc or grep in any significant way then you will break every script that depends on them. Which is impossible, and that's why Unix commmands don't change. (Unless it's a new flag or something like that.)

In a well-designed system where the interfaces between components isn't actually public, you can change the API by finding all the usages and fixing them. And frequently you have to do so because the API is not a frozen standard - it's still being worked on.

weavejester · on March 29, 2012

You can change the API, but why would you?

There are some functions that never change their functionality. The expression "1 + 1" will always equal "2". The functionality of "+" might be expanded, for instance to cover complex numbers, but it will never be incompatible with previous versions.

So if the API of "+" is not expected to change, why do you expect the API of other components to change? Why assume that an API of a service has to be mutable?

I'd argue that an API can be immutable if the component is simple, by which I mean a component that only attempts to do one thing. The function "+" is simple, because it does only one thing: add numbers together. Because it is simple, the API doesn't have to change.

If your API of each individual component is frozen, this means they can be considered to be independent. You might have components that use other components, but if their APIs are frozen, unchangeable, then you might as well consider them as entirely separate applications.

In my opinion, a web service of 100K+ LOC indicates that the interfaces between components are not frozen, that the components are not simple, and to me this just seems like bad design.

skybrian · on March 30, 2012

Well sure, and if we wrote programs that have no bugs then we'd never have to debug them. Your position is seriously naive.

Getting an API right the first time is hard. For example, most languages get date libraries wrong the first time (for example, see Java and Python), even with a lot of design effort up front. And most library designers are not that good, especially when they're only part-time library designers whose main concern is writing an app.

weavejester · on March 30, 2012

Who said anything about bugs?

Getting an API right is hard, but if your API is simple, then by definition there are fewer things that can be changed, and that means there's less need to refactor.

I'd rather create 3 simple APIs and throw away the two that didn't work well, than create 1 complex API that needs to be constantly refactored. Small, immutable, simple components are preferable over large, mutable, complex components.

Saying I am "seriously naive" implies I don't have experience with designing components in this way, but it is precisely because of my experience that I advocate this position in the first place. The APIs I've had to change and refactor have been complex; the APIs I have not had to have been simple. Over the past few years, I have been slowly moving away from complex APIs, and there has been a dramatic decline in the amount of refactoring work I've done.

This is not to say that it's easy to create simple APIs, but good design is never easy, and I don't think it's naive to say that if you want good design, you need good designers.

JoachimSchipper · on March 26, 2012

Data point: nginx-1.0.14 has 124 574 lines in the src/ directory.

ExpiredLink · on March 26, 2012

> I'm genuinely interested if there are any reasons for writing a 100k+ LOC application, rather than dividing the functionality up between smaller applications and libraries.

IIRC, Google has such 'large' settings (for whatever reason). All companies I've seen use the latter approach.