Hacker News | ethangunderson's comments

Hi, software developer at Cars.com here. While I appreciate the ego boost of thinking that we're somehow better than average, I don't believe this to be true. You can get away with sloppy Elixir code just as easily as you can in PHP, Ruby, Python, etc.

> they thought that switching to Phoenix was a bad idea

If you have the time, I recommend re-watching Zack's talk. That is neither a takeaway nor implied.


Comparing map/reduce in Mongo and Couch is really apples and oranges. They are designed to do two different things, i.e., data processing vs. building views.

Mongo is designed from the ground up to deal with large datasets. Take a look at their sharding architecture.
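For what it's worth, the mechanics are the same shape in both systems; it's the purpose that differs. A minimal word-count map/reduce in plain Python (names and data are purely illustrative, not either database's API):

```python
from collections import defaultdict

def map_phase(docs):
    # Emit (key, value) pairs, like a Couch view's map function.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Fold values per key, like the reduce step in Mongo's mapReduce.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

docs = ["riak mongo", "mongo couch mongo"]
counts = reduce_phase(map_phase(docs))
# counts == {"riak": 1, "mongo": 3, "couch": 1}
```

Couch runs this incrementally to maintain a materialized view; Mongo runs it as a batch job over the collection. Same primitive, different job.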



Damn, didn't realize you had to be logged in to see the posting about it.


It was a thinly veiled bash on Mongo.


Not just Mongo though, there are a lot of NoSQL companies out there right now whose marketing claims the impossible.


It's kind of sad too. There are lots of use cases where I want fast datastores, and you know what, if the database goes down, who cares?

For example, I do lots of experiment logging to a mongodb. If the power goes out and the data is lost, who cares? The data was no longer valid or useful -- but if I slow down my writes for 'safety', I will cause problems by introducing delay in ways that could cause conflict.


Is there a backstory to Mongo vs. Riak? I've noticed that there's definitely a "Mongo user" and a "Riak user" and they tend not to agree about stuff.


Basho has always had an issue with the way Mongo was architected and marketed, and they have no issue with letting people know. (several blog posts, killdashnine parties)

I actually like both Mongo and Riak. I think they both solve a different problem set, and can actually complement each other in a polyglot persistence setup. It's a shame that there has to be so much negativity between them, because at the end of the day, this type of whiny blog post doesn't really help anyone.


To be honest, Mongo's execs have done pretty much the same thing. As I said in another comment, the Changelog episode on Mongo was very illuminating with regards to the marketing tactics of 10gen.

I do like both, as well.



Yep, that's the one


If they're doing the same thing, that's just as shitty. But I've been meaning to listen to that episode of the Changelog for awhile now, so thanks for the reminder!


I hate thinly veiled attacks like these ... if you have a problem with someone, come out and say it; don't make me figure out what or who you're getting at.

And with regards to Mongo ... they've since added a feature that allows 'safe' writes to your database (confirms the data is written before returning a response) ... so what's the rant about?


The 'safe' feature isn't on by default, yet. Also, the benchmarks 10gen publishes are based on default setup, so basically, Mongo writes to RAM, therefore it's fast.

I love Mongo and am using it in a few apps, but their marketing does blow, I admit.

Also, Eliot Horowitz came out and bashed on Riak's eventual consistency promise by basically misleading devs into thinking that writing to MongoDB will always result in 'full consistency'. Listen to the ChangeLog episode on Mongo to hear that.


10gen doesn't publish any benchmarks. See http://www.mongodb.org/display/DOCS/Benchmarks for the official position.

I transcribed the MongoDB vs. Riak part of the Changelog webcast (available at http://thechangelog.com/post/3742814720/episode-0-5-1-mongod...):

------------------------

Riak and all the dynamo-style databases are really distributed key/value stores and I think, you know, I've never used Riak in production, but I have no reason not to believe it's not a very good, highly scalable distributed key/value store.

The difference between something like Riak and Mongo is that Mongo tries to solve a more generic problem. A couple of key points: one is consistency. Mongo is fully consistent, and all dynamo implementations are eventually consistent and for a lot of developers and a lot of applications, eventual consistency just is not an option. So I think for the default data store for a web site, you need something that's fully consistent.

The other major difference is just data model and query-ability and being able to manipulate data. So for example with Mongo you can index on any fields you want, you can have compound indexes, you can sort, you know, all the same types of queries you do with a relational database work with Mongo. In addition, you can update individual fields, you can increment counters, you can do a lot of the same kinds of update operations you would do with a relational database. It maps much closer to a relational database than to a key/value store. Key/value stores are great if you've got billions of keys and you need to store them, they'll work very well, but if you need to replace a relational database with something that is pretty feature-comparable, they're not designed to do that.

-----------------------

It starts at minute 17.

edited: formatting.


>Mongo is fully consistent

Can you please explain this for a case where there are multiple replica sets, the database is sharded and nodes are across data centers? What's sacrificed? Something must be.


When we talk about consistency, we're talking about taking the database from one consistent state to another.

With replica sets, we're still only dealing with one master. We can get inconsistent reads from the replicas, but we're always writing to a single master, which allows that master to determine the integrity of a write.

With sharding, we're still only dealing with one canonical home for a specific key (defined by the shard key). (Besides latency, I'm not sure how data centers would affect this.)

What we're giving up in this case is availability. If an entire replica set goes down, we can't read or write any data for the key ranges contained on those machines. This is where Riak shines.

With Riak, any node can accept writes, and nodes contain copies of several other nodes' data. What that means is, as long as we have one node up, we can write to the database. Because of this, there is the possibility of nodes having different views of the data. This is handled in a number of ways (read repairs, vector clocks, etc). Check out the Amazon Dynamo paper for more info, great read.

I'm sure I'm missing some stuff, but I think that covers the gist of it.

EDIT: One thing that I want to make clear, I don't think that one architecture is better than the other. They each have their own pros and cons, and are really suited to solve different problems.
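For the curious, vector clocks are simpler than they sound. Here's a toy version in Python (just the idea, nothing to do with Riak's actual implementation): each node bumps its own counter on a write, and two clocks conflict when neither has seen everything the other has.

```python
def increment(clock, node):
    # A vector clock is just a dict of node -> counter.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    # True if clock `a` has seen every event clock `b` has.
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a, b):
    # Neither clock dominates: concurrent writes, candidates for read repair.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "node_a")   # {"node_a": 1}
v2 = increment(v1, "node_b")   # {"node_a": 1, "node_b": 1}
v3 = increment(v1, "node_c")   # {"node_a": 1, "node_c": 1}

conflict(v1, v2)   # False -- v2 descends from v1
conflict(v2, v3)   # True  -- concurrent siblings to resolve
```

When a conflict is detected on read, the database either picks a winner or hands both versions back to the application, which is the "read repair" part.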


None of this is guaranteed by default. By default, writes are flushed every 60 seconds. By default, there's no journaling. How can one claim full consistency if the former two points are true?

Don't get me wrong, I love mongo. I'm building a web app backed by it. But the marketing talk is grating, which is what this post nails.


I think those two issues are orthogonal to consistency. In ACID, consistency and durability are two different letters and CAP doesn't even mention durability. Are you referring to another definition of consistency?


How is flushing a write every 60 seconds orthogonal to consistency? If there's a server crash between the write to RAM and the subsequent flush, the data is lost, is it not? How do you guarantee the data is there in that case?
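A toy model of that failure mode (hypothetical names, nothing to do with Mongo's actual internals): acknowledge the write as soon as it hits memory, flush to disk later, and a crash in the window between the two loses acknowledged data.

```python
class LazyFlushStore:
    """Toy store that acks writes to memory and flushes to 'disk' later."""

    def __init__(self):
        self.memory = {}   # writes land here first
        self.disk = {}     # only durable after flush()

    def write(self, key, value):
        self.memory[key] = value
        return "ok"        # acked before anything hits disk

    def flush(self):
        # In the analogy, this runs on a 60-second timer.
        self.disk.update(self.memory)

    def crash_and_recover(self):
        # Power loss: everything not yet flushed is gone.
        self.memory = dict(self.disk)

store = LazyFlushStore()
store.write("a", 1)
store.flush()              # "a" is now durable
store.write("b", 2)        # acked, but only in RAM
store.crash_and_recover()
sorted(store.memory)       # ['a'] -- "b" was acknowledged, then lost
```

Whether you file that loss under durability or consistency is exactly the argument happening in this thread.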


That would mean the data set was not durable, it doesn't speak to consistency at all. DB consistency is about transaction ordering. Transaction 1 always comes before transaction 2, but 2 may exist or not as it pleases. Transaction 1 must be present if 2 is present.


MongoDB is partition tolerant and consistent.

You can never have multi-master with MongoDB, which is required for "always writable." However, it can be readable. Our CEO did a series of posts on distributed consistency, see http://blog.mongodb.org/post/475279604/on-distributed-consis....


If a slave can continue serving reads whilst partitioned from a master that continues to accept writes then you cannot guarantee consistency. If a slave cannot serve reads when partitioned then you aren't available. If a master cannot accept writes when partitioned then you aren't available. See this excellent post from Coda Hale on why it is meaningless to claim a system is partition tolerant http://codahale.com/you-cant-sacrifice-partition-tolerance/.

One love.

- Lil' B


I interpreted "what is sacrificed?" as asking which letter of CAP MongoDB was giving up. Coda's article actually explains exactly the tradeoffs MongoDB makes for CP:

-------------------

Choosing Consistency Over Availability

If a system chooses to provide Consistency over Availability in the presence of partitions (again, read: failures), it will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose "master" node is inside the partition component (like Membase).

This is perfectly reasonable. There are plenty of things (atomic counters, for one) which are made much easier (or even possible) by strongly consistent systems. They are a perfectly valid type of tool for satisfying a particular set of business requirements.

-------------------


In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed. (You can optionally mark reads as "slaveOk", but then you admit inconsistency.)

This does sacrifice availability (in the CAP sense), but I haven't heard anyone claim otherwise.


"In a replica set configuration, all reads and writes are routed to the master by default. In this scenario, consistency is guaranteed."

One would hope that reading and writing a single node database was consistent. This is table stakes for something calling itself a persistent store. Claiming partition tolerance in the above is the same as claiming availability. The former claim has been made. Rest left as exercise for the reader.

Namasté.

- Lil' B


If a slave is partitioned from its master, it won't be able to serve requests. (Unless the request is a read query marked as "slaveOk", in which case you admit inconsistency.) I highly doubt anyone would claim otherwise.


Which C is lying? The CEO or CAP? I'll leave that to the pure of heart and the late-night sysadmin to decide.


Lil' B, stop trying to outsmart us all, MongoDB works, supports JSON, and autoshards.


Thank God. At least it's not Cassandra. :)


The implication is that the people for whom eventual consistency is not an option will never reach a data set size or availability requirement that'll require them to use replication and experience the lag (and eventual consistency) involved.


That's not completely true. Take a look at Google's Megastore paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf

James Hamilton has a good summary of the ideas in the paper: http://perspectives.mvdirona.com/2011/01/09/GoogleMegastoreT...


I think you're viewing my statement out of the necessary context.


Among the major features touted are auto-sharding and replica sets. I don't know if the implication is that it's only for web apps/websites that won't need those.


In the sharded case, at any given moment each object will still live on exactly one replica set, which will have at most one master. You can do operations (such as findAndModify http://bit.ly/ilomQo) that require a "current" version of an object because all writes are always sent to the master for that object. You can also choose to accept a weaker form of consistency for some reads by directing them to slaves for performance. This decision can be made per-operation from most languages.

As for trade-offs: Relative to a relational db, there is no way to guarantee a consistent view of multiple objects because they could live on different servers which disagree about when "now" is. Relative to an eventually consistent system, you are unable to do writes if you can't contact the master or a majority of nodes are down.
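That per-operation choice boils down to a routing decision. A toy sketch of the idea (illustrative only, not any driver's real API): writes always go to the master, and reads only hit a slave when the caller explicitly opts into possibly-stale data.

```python
import itertools

class ReplicaSetRouter:
    """Toy router: writes go to the master; reads go to the master
    unless the caller opts into stale reads (the 'slaveOk' idea)."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over slaves

    def route(self, op, slave_ok=False):
        if op == "write" or not slave_ok:
            return self.master      # consistent, but the master must be up
        return next(self._slaves)   # possibly stale, spread for throughput

router = ReplicaSetRouter("master-1", ["slave-1", "slave-2"])
router.route("write")                 # 'master-1'
router.route("read")                  # 'master-1' -- consistent by default
router.route("read", slave_ok=True)   # one of the slaves
```

The CAP trade-off is visible right in the `route` method: the consistent paths all depend on a single master being reachable.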


Software engineering is a subset of programming, particularly the subset where people give a shit about the code they ship.


The issue with that is if your DB gets backed up and starts queuing writes, the timestamps will be skewed. Obviously this isn't always a show stopper, but a caveat nonetheless.


Actually the _id is usually created client-side so the timestamp will reflect when the application created the object rather than when it was processed by the DB. The only major exception is when the DB creates the _id for an upsert.
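Right -- the first four bytes of an ObjectId are a big-endian Unix timestamp, so the creation time can be recovered from the id itself. A dependency-free sketch (real ObjectIds fill the remaining eight bytes with machine/pid/counter fields; here they're just random):

```python
import os
import struct
import time

def make_objectid(ts=None):
    # First 4 bytes: big-endian Unix timestamp, set by the client.
    ts = int(time.time()) if ts is None else ts
    return struct.pack(">I", ts) + os.urandom(8)

def generation_time(oid):
    # Recover when the client created the id.
    return struct.unpack(">I", oid[:4])[0]

oid = make_objectid(ts=1300000000)
generation_time(oid)   # 1300000000
```

This is why the timestamp reflects application-side creation time: the client mints the id before the write ever reaches the server.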


The first item in the list really bugs me. It's not enough to just tell me not to do something, teach me why it's bad. That way, I can explain to others why this is a mistake instead of just regurgitating the same explanation of 'it's a performance disaster'.


It's worth noting that Twitter has built a lot on top of MySQL to get to the scale they're at. Take FlockDB for example, https://github.com/twitter/flockdb

So I guess that statement should be written as "you can't scale with just those". :)


FWIW, Foursquare's situation was kind of unique. They have a need for all of their data to be in RAM. In most applications, you can define a working set of data. As long as you can keep that working set and indexes in RAM, you'll be fine.

