Hacker News | ethangunderson's comments

Hi, software developer at Cars.com here. While I appreciate the ego boost of thinking that we're somehow better than average, I don't believe this to be true. You can get away with sloppy Elixir code just as easily as you can in PHP, Ruby, Python, etc.

> they thought that switching to Phoenix was a bad idea

If you have the time, I recommend re-watching Zack's talk. That is neither a takeaway nor implied.


Comparing map/reduce in Mongo and Couch is really apples and oranges. They are designed to do two different things, i.e., data processing vs. building views.

Mongo is designed from the ground up to deal with large datasets. Take a look at their sharding architecture.
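For what it's worth, the mechanics are the same shape in both systems; it's the purpose that differs. A minimal word-count map/reduce in plain Python (names and data are purely illustrative, not either database's API):

```python
from collections import defaultdict

def map_phase(docs):
    # Emit (key, value) pairs, like a Couch view's map function.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Fold values per key, like the reduce step in Mongo's mapReduce.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

docs = ["riak mongo", "mongo couch mongo"]
counts = reduce_phase(map_phase(docs))
# counts == {"riak": 1, "mongo": 3, "couch": 1}
```

Couch runs this incrementally to maintain a materialized view; Mongo runs it as a batch job over the collection. Same primitive, different job.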



Damn, didn't realize you had to be logged in to see the posting about it.


It was a thinly veiled bash on Mongo.


Not just Mongo though, there are a lot of NoSQL companies out there right now whose marketing claims the impossible.


It's kind of sad too. There are lots of use cases where I want fast datastores, and you know what, if the database goes down, who cares?

For example, I do lots of experiment logging to a mongodb. If the power goes out and the data is lost, who cares? The data was no longer valid or useful -- but if I slow down my writes for 'safety', I will cause problems by introducing delay in ways that could cause conflict.


Is there a backstory to Mongo vs. Riak? I've noticed that there's definitely a "Mongo user" and a "Riak user" and they tend not to agree about stuff.


Basho has always had an issue with the way Mongo was architected and marketed, and they have no issue with letting people know. (several blog posts, killdashnine parties)

I actually like both Mongo and Riak. I think they both solve a different problem set, and can actually complement each other in a polyglot persistence setup. It's a shame that there has to be so much negativity between them, because at the end of the day, this type of whiny blog post doesn't really help anyone.


To be honest, Mongo's execs have done pretty much the same thing. As I said in another comment, the Changelog episode on Mongo was very illuminating with regards to the marketing tactics of 10gen.

I do like both, as well.



Yep, that's the one


If they're doing the same thing, that's just as shitty. But I've been meaning to listen to that episode of the Changelog for awhile now, so thanks for the reminder!


I hate thinly veiled attacks like these ... if you have a problem with someone, come out and say it; don't make me figure out what or who you're getting at.

And with regards to Mongo ... they've since added a feature that allows 'safe' writes to your database (confirms the data is written before returning a response) ... so what's the rant about?


The 'safe' feature isn't on by default, yet. Also, the benchmarks 10gen publishes are based on default setup, so basically, Mongo writes to RAM, therefore it's fast.

I love Mongo and am using it in a few apps, but their marketing does blow, I admit.

Also, Eliot Horowitz came out and bashed on Riak's eventual consistency promise by basically misleading devs into thinking that writing to MongoDB will always result in 'full consistency'. Listen to the ChangeLog episode on Mongo to hear that.


10gen doesn't publish any benchmarks. See http://www.mongodb.org/display/DOCS/Benchmarks for the official position.

I transcribed the MongoDB vs. Riak part of the Changelog webcast (available at http://thechangelog.com/post/3742814720/episode-0-5-1-mongod...):

------------------------

Riak and all the dynamo-style databases are really distributed key/value stores and I think, you know, I've never used Riak in production, but I have no reason not to believe it's not a very good, highly scalable distributed key/value store.

The difference between something like Riak and Mongo is that Mongo tries to solve a more generic problem. A couple of key points: one is consistency. Mongo is fully consistent, and all dynamo implementations are eventually consistent and for a lot of developers and a lot of applications, eventual consistency just is not an option. So I think for the default data store for a web site, you need something that's fully consistent.

The other major difference is just data model and query-ability and being able to manipulate data. So for example with Mongo you can index on any fields you want, you can have compound indexes, you can sort, you know, all the same types of queries you do with a relational database work with Mongo. In addition, you can update individual fields, you can increment counters, you can do a lot of the same kinds of update operations you would do with a relational database. It maps much closer to a relational database than to a key/value store. Key/value stores are great if you've got billions of keys and you need to store them, they'll work very well, but if you need to replace a relational database with something that is pretty feature-comparable, they're not designed to do that.

-----------------------

It starts at minute 17.

edited: formatting.


>Mongo is fully consistent

Can you please explain this for a case where there are multiple replica sets, the database is sharded and nodes are across data centers? What's sacrificed? Something must be.


When we talk about consistency, we're talking about taking the database from one consistent state to another.

With replica sets, we're still only dealing with one master. We can get inconsistent reads from the replicas, but we're always writing to a single master, which allows that master to determine the integrity of a write.

With sharding, we're still only dealing with one canonical home for a specific key (defined by the shard key). (Besides latency, I'm not sure how data centers would affect this.)

What we're giving up in this case is availability. If an entire replica set goes down, we can't read or write any data for the key ranges contained on those machines. This is where Riak shines.

With Riak, any node can accept writes, and nodes contain copies of several other nodes' data. What that means is, as long as we have one node up, we can write to the database. Because of this, there is the possibility of nodes having different views of the data. This is handled in a number of ways (read repairs, vector clocks, etc). Check out the Amazon Dynamo paper for more info, great read.

I'm sure I'm missing some stuff, but I think that covers the gist of it.

EDIT: One thing that I want to make clear, I don't think that one architecture is better than the other. They each have their own pros and cons, and are really suited to solve different problems.
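For the curious, vector clocks are simpler than they sound. Here's a toy version in Python (just the idea, nothing to do with Riak's actual implementation): each node bumps its own counter on a write, and two clocks conflict when neither has seen everything the other has.

```python
def increment(clock, node):
    # A vector clock is just a dict of node -> counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    # True if clock `a` has seen every event clock `b` has.
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a, b):
    # Neither clock dominates: concurrent writes, candidates for read repair.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "node_a")   # {"node_a": 1}
v2 = increment(v1, "node_b")   # {"node_a": 1, "node_b": 1}
v3 = increment(v1, "node_c")   # {"node_a": 1, "node_c": 1}

conflict(v1, v2)   # False -- v2 descends from v1
conflict(v2, v3)   # True  -- concurrent siblings to resolve
```

When a conflict is detected on read, the database either picks a winner or hands both versions back to the application, which is the "read repair" part.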


None of this is guaranteed by default. By default, writes are flushed every 60 seconds. By default, there's no journaling. How can one claim full consistency if the former two points are true?

Don't get me wrong, I love mongo. I'm building a web app backed by it. But the marketing talk is grating, which is what this post nails.


I think those two issues are orthogonal to consistency. In ACID, consistency and durability are two different letters and CAP doesn't even mention durability. Are you referring to another definition of consistency?


How is flushing a write every 60 seconds orthogonal to consistency? If there's a server crash between the write to RAM and the subsequent flush, the data is lost, is it not? How do you guarantee the data is there in that case?
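A toy model of that failure mode (hypothetical names, nothing to do with Mongo's actual internals): acknowledge the write as soon as it hits memory, flush to disk later, and a crash in the window between the two loses acknowledged data.

```python
class LazyFlushStore:
    """Toy store that acks writes to memory and flushes to 'disk' later."""

    def __init__(self):
        self.memory = {}   # writes land here first
        self.disk = {}     # only durable after flush()

    def write(self, key, value):
        self.memory[key] = value
        return "ok"        # acked before anything hits disk

    def flush(self):
        # In the analogy, this runs on a 60-second timer.
        self.disk.update(self.memory)

    def crash_and_recover(self):
        # Power loss: everything not yet flushed is gone.
        self.memory = dict(self.disk)

store = LazyFlushStore()
store.write("a", 1)
store.flush()              # "a" is now durable
store.write("b", 2)        # acked, but only in RAM
store.crash_and_recover()
sorted(store.memory)       # ['a'] -- "b" was acknowledged, then lost
```

Whether you file that loss under durability or consistency is exactly the argument happening in this thread.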


That would mean the data set was not durable, it doesn't speak to consistency at all. DB consistency is about transaction ordering. Transaction 1 always comes before transaction 2, but 2 may exist or not as it pleases. Transaction 1 must be present if 2 is present.


MongoDB is partition tolerant and consistent.

You can never have multi-master with MongoDB, which is required for "always writable." However, it can be readable. Our CEO did a series of posts on distributed consistency, see http://blog.mongodb.org/post/475279604/on-distributed-consis....


If a slave can continue serving reads whilst partitioned from a master that continues to accept writes then you cannot guarantee consistency. If a slave cannot serve reads when partitioned then you aren't available. If a master cannot accept writes when partitioned then you aren't available. See this excellent post from Coda Hale on why it is meaningless to claim a system is partition tolerant http://codahale.com/you-cant-sacrifice-partition-tolerance/.

One love.

- Lil' B


I interpreted "what is sacrificed?" as asking which letter of CAP MongoDB was giving up. Coda's article actually explains exactly the tradeoffs MongoDB makes for CP:

-------------------

Choosing Consistency Over Availability

If a system chooses to provide Consistency over Availability in the presence of partitions (again, read: failures), it will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose "master" node is inside the partition component (like Membase).

This is perfectly reasonable. There are plenty of things (atomic counters, for one) which are made much easier (or even possible) by strongly consistent systems. They are a perfectly valid type of tool for satisfying a particular set of business requirements.

-------------------


In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed. (You can optionally mark reads as "slaveOk", but then you admit inconsistency.)

This does sacrifice availability (in the CAP sense), but I haven't heard anyone claim otherwise.


"In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed."

One would hope that reading and writing a single node database was consistent. This is table stakes for something calling itself a persistent store. Claiming partition tolerance in the above is the same as claiming availability. The former claim has been made. Rest left as exercise for the reader.

Namasté.

- Lil' B


If a slave is partitioned from its master, it won't be able to serve requests. (Unless the request is a read query marked as "slaveOk", in which case you admit inconsistency.) I highly doubt anyone would claim otherwise.


Which C is lying? The CEO or CAP? I'll leave that to the pure of heart and the late-night sysadmin to decide.


Lil' B, stop trying to outsmart us all, MongoDB works, supports JSON, and autoshards.


Thank God. At least it's not Cassandra. :)


The implication is that the people for whom eventual consistency is not an option will never reach a data set size or availability requirement that'll require them to use replication and experience the lag (and eventual consistency) involved.


That's not completely true. Take a look at Google's Megastore paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf

James Hamilton has a good summary of the ideas in the paper: http://perspectives.mvdirona.com/2011/01/09/GoogleMegastoreT...


I think you're viewing my statement out of the necessary context.


Among the major features touted are auto-sharding and replica sets. I don't know if the implication is that it's only for web apps/websites that won't need those.


In the sharded case, at any given moment each object will still live on exactly one replica set, which will have at most one master. You can do operations (such as findAndModify http://bit.ly/ilomQo) that require a "current" version of an object because all writes are always sent to the master for that object. You can also choose to accept a weaker form of consistency for some reads by directing them to slaves for performance. This decision can be made per-operation from most languages.

As for trade-offs: Relative to a relational db, there is no way to guarantee a consistent view of multiple objects because they could live on different servers which disagree about when "now" is. Relative to an eventually consistent system, you are unable to do writes if you can't contact the master or a majority of nodes are down.
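That per-operation choice boils down to a routing decision. A toy sketch of the idea (illustrative only, not any driver's real API): writes always go to the master, and reads only hit a slave when the caller explicitly opts into possibly-stale data.

```python
import itertools

class ReplicaSetRouter:
    """Toy router: writes go to the master; reads go to the master
    unless the caller opts into stale reads (the 'slaveOk' idea)."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over slaves

    def route(self, op, slave_ok=False):
        if op == "write" or not slave_ok:
            return self.master      # consistent, but the master must be up
        return next(self._slaves)   # possibly stale, spread for throughput

router = ReplicaSetRouter("master-1", ["slave-1", "slave-2"])
router.route("write")                 # 'master-1'
router.route("read")                  # 'master-1' -- consistent by default
router.route("read", slave_ok=True)   # one of the slaves
```

The CAP trade-off is visible right in the `route` method: the consistent paths all depend on a single master being reachable.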


Software engineering is a subset of programming, particularly the subset where people give a shit about the code they ship.


The issue with that is if your DB gets backed up and starts queuing writes, the timestamps will be skewed. Obviously this isn't always a show stopper, but a caveat nonetheless.


Actually the _id is usually created client-side so the timestamp will reflect when the application created the object rather than when it was processed by the DB. The only major exception is when the DB creates the _id for an upsert.
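Right -- the first four bytes of an ObjectId are a big-endian Unix timestamp, so the creation time can be recovered from the id itself. A dependency-free sketch (real ObjectIds fill the remaining eight bytes with machine/pid/counter fields; here they're just random):

```python
import os
import struct
import time

def make_objectid(ts=None):
    # First 4 bytes: big-endian Unix timestamp, set by the client.
    ts = int(time.time()) if ts is None else ts
    return struct.pack(">I", ts) + os.urandom(8)

def generation_time(oid):
    # Recover when the client created the id.
    return struct.unpack(">I", oid[:4])[0]

oid = make_objectid(ts=1300000000)
generation_time(oid)   # 1300000000
```

This is why the timestamp reflects application-side creation time: the client mints the id before the write ever reaches the server.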


The first item in the list really bugs me. It's not enough to just tell me not to do something, teach me why it's bad. That way, I can explain to others why this is a mistake instead of just regurgitating the same explanation of 'it's a performance disaster'.


It's worth noting that Twitter has built a lot on top of MySQL to get to the scale they're at. Take FlockDB for example, https://github.com/twitter/flockdb

So I guess that statement should be written as "you can't scale with just those". :)


FWIW, Foursquare's situation was kind of unique. They have a need for all of their data to be in RAM. In most applications, you can define a working set of data. As long as you can keep that working set and indexes in RAM, you'll be fine.

