*A third table, let's call it "FavoriteRestaurant" would have two columns [1]: a...

LogicHoleFlaw · on March 26, 2008

adding join tables like that has traditionally been considered an anti-pattern for relational databases because it increases duplication and denormalizes the data

I thought that join tables express a normalization of the data? You are then not storing restaurant data explicitly as a column in the Contact table, which reduces data duplication and gives you more fine-grained control over your structure.

Join tables (especially reflexive ones) gave me a bit of a headache when I first started working with SQL databases. Once I finally wrapped my head around them I started seeing a lot of uses for and advantages of them. However, I've had little formal training in database techniques; only a little bit of relational algebra. Is there something I'm missing here?

pius · on March 26, 2008

That sounds very plausible and, indeed, I have join tables throughout my apps. This is why I put the proviso in, "correct me if I'm wrong." :)

My understanding was that hardcore relational database guys would say that join operations are necessary when the data's totally denormalized, but having a join table wasn't necessarily a best practice because now you've got an additional table that could potentially get out of sync.

edw519 · on March 26, 2008

"If this is wrong, please do correct me."

It's not right or wrong. It's a trade-off, a design decision.

Just because a tool can do something, doesn't mean it should in every case.

Normalize as far as it makes sense for your app.

pius · on March 26, 2008

The "please do correct me" part was in regards to my characterization of the arguments that RDBMS gurus make against join tables. I wasn't asking you to judge my architectural decisions, thanks. ;)

I think it's besides the point whether or not an RDBMS can handle a given app's data; it almost certainly can. The real question is should people dogmatically choose an RDBMS for every single data persistence problem they need to solve.

While the article's title is obviously hyperbole, I think the dissent against choosing the RDBMS model of storing knowledge is a good one. I see the decision that system architects are faced with here as being an end-to-end argument: should the protections and optimizations provided by relational databases be enforced at such a low level or are dumb databases that delegate those features to other layers better design? There's decent evidence for the latter.

edw519 · on March 26, 2008

What you call "hyperbole", I call flamebait.

"Choosing the right tool for the job" does not equal "the old tool is dead".

pius · on March 26, 2008

"Choosing the right tool for the job" does not equal "the old tool is dead".

Can't argue with that.

demallien · on March 27, 2008

Exactly. It's a bit like Blub in the database space. Sure, you can do anything in a relational database, but there are certain problems that are better handled other ways.

Relational databases have been optimised to do a certain task very well: retrieve information very rapidly from a large dataset, that has well-defined, relatively static, relationships between entities. They were designed for things such as storing government census data, tracking customer details for very large corporations, insurance data etc etc. You may like to think of relational databases as being the C equivalent in the database space: relatively low-level, fast.

But there is a new type of database out there. It is more human in it's scale, with tables that might only have thousands, rather than millions, of entries. But the data has very fluid relationships between entities: Hank's example in the comments here about wanting to track a contact's favourite restaurant is a classic example. We might also want to track the contact's favourite film, their car model, and where they last went for their holidays. A bit later we may also consider it really important to know what brand of dishwashing detergent they use.

This sort of problem is not well handled by relational databases. They aren't optimised for it. They are optimised for speed over large datasets with static relationships. To get fluid relationships, you need to use join tables, which have a large size cost, and which degrade the reliability of the database. But speed is not typically a problem for these types of problems. There are lots and lots of small tables, instead of a few large tables. Manually examining every item in the table to find one that matches your criteria is not necessarily a prohibitively expensive operation.

Thinking about this kind of problem from a Rails perspective, for example, you might decide to modify ActiveRecord such that its has_one, belongs_to style declarators, instead of working on static fields in a table, instead attack one table for the object, containing a bunch of tuples that look like {"relationship_name", table_id, [item_id1, item_id2,...]} Operations on the relationships between objects modifies this table, rather than the actual tables that contain object information. As such, the structure of links between tables becomes just another table. It wouldn't be as fast as a relational database, but for this type of task, it is far better suited, as it would be far more expressive - a new relationship between objects can be created just by inserting a new row into the table, and is hence modifiable programatically. It's the lisp of the database world.