Well, as I said, there is nothing that can't be expressed in an RDBMS, at least at first. But let me give an example of the kind of use cases we see.
First, imagine having a database that allows one to freely create record types. One might have standard data types like contacts, events, emails, checks, expense reports, etc.
These record types are nodes. Now imagine being able to connect these nodes using any type of edge you like. For example, a contact might be connected to an event as an "invitee"; that's how the edge would be labeled. Now the relational folks will say that this is a relationship that could be predicted. But at some point, some new type of record is created, and you as a user want to connect that record to existing records. For example, you have added a "shoe" record type to keep track of all of your shoes. You then decide you want shoes to be connected to events so that you can map which shoes you wore to which events. You don't want to modify your schema. You don't want to add a new mapping table; you just want to connect the record. And you want to be able to query the graph for all the things of any type that are connected to that record. More importantly, you want the end user to be able to decide that it would be useful to connect shoes to events, since no self-respecting programmer is ever going to design such a system.
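To make the shoes-and-events scenario concrete, here is a minimal in-memory sketch of a store where record types and edge labels are free-form strings. The `Node`/`Graph` classes, the `"worn_at"` label and the sample records are all hypothetical illustrations, not the poster's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    kind: str                                  # record type: "contact", "event", "shoe", ...
    attrs: dict = field(default_factory=dict)  # free-form fields for the record


class Graph:
    """Toy schema-less store: any node may be linked to any other with any label."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)  # node id -> [(label, target id)]
        self.in_edges = defaultdict(list)   # node id -> [(label, source id)]
        self._next_id = 1

    def add(self, kind, **attrs):
        node = Node(self._next_id, kind, attrs)
        self.nodes[node.id] = node
        self._next_id += 1
        return node.id

    def connect(self, src, label, dst):
        # No schema check: labels and endpoint types are chosen by the user at runtime.
        self.out_edges[src].append((label, dst))
        self.in_edges[dst].append((label, src))

    def neighbors(self, node_id):
        """Every node of any type linked to node_id, in either direction."""
        outgoing = [(label, self.nodes[d]) for label, d in self.out_edges[node_id]]
        incoming = [(label, self.nodes[s]) for label, s in self.in_edges[node_id]]
        return outgoing + incoming


g = Graph()
alice = g.add("contact", name="Alice")
party = g.add("event", title="Launch party")
boots = g.add("shoe", model="Red boots")  # brand-new record type, nothing to migrate
g.connect(alice, "invitee", party)
g.connect(boots, "worn_at", party)

for label, node in g.neighbors(party):
    print(label, node.kind, node.attrs)
```

Adding the "shoe" type and the "worn_at" edge required no declaration anywhere; the query for "everything connected to the party" returns both the contact and the shoe.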
This is the type of flexibility that you need in a web application that will evolve over time. But the minute you want to connect that new record type to the existing objects, you either have to modify your schema, or you have created a database that is highly flexible via totally generalized mapping tables, but is not optimized for these kinds of structures. For example, just creating a giant mapping table to connect objects will work in an RDBMS, but it is not at all optimized and will fall over at scale. Since we are building something that will handle awesome scale, using an RDBMS in this way was a non-starter. Philosophically, we probably have more in common with Google BigTable than with an RDBMS.
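For reference, the "giant mapping table" approach might look roughly like this in SQLite. Table and column names are assumptions for illustration; the point is that every relationship between every pair of record types funnels through a single `edges` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, data TEXT);
    -- One generic table holds every relationship between every pair of record types.
    CREATE TABLE edges (
        src   INTEGER NOT NULL REFERENCES nodes(id),
        label TEXT    NOT NULL,
        dst   INTEGER NOT NULL REFERENCES nodes(id)
    );
    CREATE INDEX edges_src ON edges(src);
    CREATE INDEX edges_dst ON edges(dst);
""")


def add_node(kind, data):
    return conn.execute("INSERT INTO nodes (kind, data) VALUES (?, ?)", (kind, data)).lastrowid


def connect(src, label, dst):
    conn.execute("INSERT INTO edges (src, label, dst) VALUES (?, ?, ?)", (src, label, dst))


alice = add_node("contact", "Alice")
party = add_node("event", "Launch party")
boots = add_node("shoe", "Red boots")
connect(alice, "invitee", party)
connect(boots, "worn_at", party)  # new kind of relationship, no ALTER TABLE needed

# Everything of any type connected to the party, following edges in either direction.
rows = conn.execute("""
    SELECT e.label, n.kind, n.data FROM edges e JOIN nodes n ON n.id = e.src WHERE e.dst = ?
    UNION ALL
    SELECT e.label, n.kind, n.data FROM edges e JOIN nodes n ON n.id = e.dst WHERE e.src = ?
    ORDER BY 1
""", (party, party)).fetchall()
print(rows)
```

This is flexible, but the database sees only anonymous integer pairs: every lookup for every record type hits the same table and the same two indexes, which is the scaling concern raised above.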
you either have to modify your schema, or you have created a database that is highly flexible via totally generalized mapping tables, but is not optimized for these kinds of structures
A generalizable mapping schema with tables for edges may not be optimal, but your comparison seems to be a bit of a bait-and-switch. Why compare the optimality of such a schema to a rigid schema instead of comparing it to the optimality of an alternative "graph-based" data store?
Granted, an extensible schema will be slow to query, etc. What makes you say that you can achieve better efficiency using a non-RDBMS approach? (Not that you can't, but I didn't see your argument to that effect. I'd say that without such an argument, the optimality/speed point is insignificant.)
You can. Definitely. In fact, I've implemented this a few times (most recently last week); some for specific problems, some for more generic graph support.
In a nutshell:
The problem with RDBMS approaches is that the good ones assume you can pack your complex logic into a monster query or stored procedure and let the query optimizer do its thing. But if you're implementing an attribute-value system or graph traversal on top of an SQL database, you end up generating a ginormous number of queries just to do some basic traversal. You could potentially wrap those into a stored procedure that was doing selects into a temporary table, but that's not really the sort of thing that most query optimizers go to town on.
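Here is a small sketch of the "ginormous number of queries" problem: a breadth-first traversal layered on top of SQL has to ask the database for one node's edges at a time. The tree-shaped sample data and the `traverse` helper are hypothetical.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, label TEXT, dst INTEGER)")
conn.execute("CREATE INDEX edges_src ON edges(src)")
# Hypothetical data: a binary tree of 15 nodes (node n links to 2n and 2n+1).
conn.executemany("INSERT INTO edges VALUES (?, 'child', ?)",
                 [(n, c) for n in range(1, 8) for c in (2 * n, 2 * n + 1)])


def traverse(start, max_depth):
    """Breadth-first traversal that fetches one node's edges per query."""
    seen, queries = {start}, 0
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand nodes at the depth limit
        queries += 1  # one database round trip per expanded node
        for (dst,) in conn.execute("SELECT dst FROM edges WHERE src = ?", (node,)):
            if dst not in seen:
                seen.add(dst)
                frontier.append((dst, depth + 1))
    return seen, queries


nodes, queries = traverse(1, max_depth=3)
print(len(nodes), "nodes reached with", queries, "queries")
```

Each query is cheap and well indexed, but the optimizer only ever sees one tiny `SELECT` at a time, so it has nothing to optimize across; the number of round trips grows with the number of nodes expanded.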
On the other hand, there are a number of systems out there that attempt to be full object-oriented databases, object-relational mappings, or RDF-based stores, but the current off-the-shelf ones tend to perform poorly since they're not very mature (and I get the feeling they're more focused on just being able to conveniently store stuff than on actually hitting it very hard).
When I first started looking at the sort of problems that Hank's addressing (in a series of talks I did in 2004 titled "Beyond Hierarchical Interfaces"), I naïvely thought that you could do everything with an SQL backend, tried, and failed. I could blab on about the sort of indexing that you need for these sorts of storage, but I'll duck out for now.
Edit: Just one example of where I've done this, if anyone cares, was replacing the old SQL backend with a dynamic (schema-less) attribute-value system and basic query language, for my current job: http://grunge-nouveau.net/Kore.mp4
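As a rough idea of what a schema-less attribute-value store with a basic query can look like, here is a toy in-memory version that keeps an inverted index from (attribute, value) pairs to entity ids. This is a hypothetical illustration, not the Kore system linked above; the class name, methods and sample data are assumptions, and values must be hashable.

```python
from collections import defaultdict


class AttributeValueStore:
    """Toy schema-less store: each entity is just a bag of attribute/value pairs."""

    def __init__(self):
        self.entities = defaultdict(dict)  # entity id -> {attribute: value}
        self.index = defaultdict(set)      # (attribute, value) -> {entity ids}

    def put(self, entity, attribute, value):
        record = self.entities[entity]
        if attribute in record:  # keep the index consistent when a value changes
            self.index[(attribute, record[attribute])].discard(entity)
        record[attribute] = value
        self.index[(attribute, value)].add(entity)

    def query(self, **conditions):
        """Ids of entities matching every attribute=value condition."""
        if not conditions:
            return set(self.entities)
        matches = [self.index.get((a, v), set()) for a, v in conditions.items()]
        return set.intersection(*matches)


store = AttributeValueStore()
store.put(1, "type", "shoe")
store.put(1, "color", "red")
store.put(2, "type", "shoe")
store.put(2, "color", "black")
store.put(3, "type", "event")
store.put(3, "title", "Launch party")  # attributes need not be declared up front

print(sorted(store.query(type="shoe", color="red")))
store.put(2, "color", "red")
print(sorted(store.query(type="shoe", color="red")))
```

A query is answered by intersecting index sets rather than scanning entities, which is one simple example of the kind of indexing such stores lean on.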
Now, I may be pretty naive here, but if you're doing full-on graph traversal, why not just extract the full graph from the database and traverse it in memory on your own terms, instead of leaving it to large, unoptimized traversal queries?
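The approach suggested here could be sketched as follows: one query loads the entire edge list into an adjacency map, and traversal then happens purely in memory. The sample edges and the `reachable` helper are hypothetical, and edges are treated as undirected for the reachability check.

```python
import sqlite3
from collections import defaultdict, deque

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, label TEXT, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                 [(1, "invitee", 2), (2, "located_at", 3), (4, "worn_at", 2), (5, "invitee", 6)])

# A single query pulls the whole edge list; after that, no more database round trips.
adjacency = defaultdict(list)
for src, dst in conn.execute("SELECT src, dst FROM edges"):
    adjacency[src].append(dst)
    adjacency[dst].append(src)  # undirected: follow edges both ways


def reachable(start):
    """All nodes reachable from start, via breadth-first search in memory."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable(1)))
```

The trade-off is that the whole graph has to fit in memory and the copy can go stale as the database changes, so this works best for graphs of modest size or ones that are cheap to reload.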
Good points. I've run into this problem a lot, and generally handle it in one of 2 ways:
1. Make it easy for semi-technical project managers who are not coders to extend the schema. This is the 95% solution for us, and the core of our system.
2. Use n-to-n lookup tables, or lookup fields that use a second field to determine what you reference. We don't do this a lot, but we do it in a few places where there can be a more or less unbounded set of things that can be referenced. These do have problems, so we try to avoid them, especially in high-volume situations.
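Option 2's "lookup field that uses a second field to determine what you reference" might look like this in SQLite: a `ref_type` column names the target table and `ref_id` the row. The tables, the `tags` example and the `resolve` helper are hypothetical; the sketch also shows two of the problems hinted at, namely that no foreign key can enforce the reference and that resolving rows means one extra query per row, dispatched on type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE shoes  (id INTEGER PRIMARY KEY, model TEXT);
    -- ref_type says which table ref_id points into. The database cannot enforce
    -- this with a foreign key, so integrity is left to application code.
    CREATE TABLE tags (
        id       INTEGER PRIMARY KEY,
        tag      TEXT NOT NULL,
        ref_type TEXT NOT NULL,
        ref_id   INTEGER NOT NULL
    );
    CREATE INDEX tags_ref ON tags(ref_type, ref_id);
    INSERT INTO events VALUES (1, 'Launch party');
    INSERT INTO shoes  VALUES (1, 'Red boots');
    INSERT INTO tags (tag, ref_type, ref_id)
        VALUES ('favorite', 'events', 1), ('favorite', 'shoes', 1);
""")

# Hypothetical whitelist of referencable tables and the column to display for each.
DISPLAY_COLUMN = {"events": "title", "shoes": "model"}


def resolve(tag):
    """Follow each polymorphic reference with one extra query per row."""
    refs = conn.execute(
        "SELECT ref_type, ref_id FROM tags WHERE tag = ? ORDER BY id", (tag,)).fetchall()
    results = []
    for ref_type, ref_id in refs:
        # Table names can't be bound as SQL parameters, so only whitelisted names are
        # interpolated; an unknown ref_type raises KeyError here.
        column = DISPLAY_COLUMN[ref_type]
        row = conn.execute(f"SELECT {column} FROM {ref_type} WHERE id = ?", (ref_id,)).fetchone()
        results.append((ref_type, row[0] if row else None))
    return results


print(resolve("favorite"))
```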
Then again, note that this solution (a) requires using our framework to be effective, and (b) has RDBMS purists seeing red. So maybe you're right.