MongoDB still has an awful reputation on Hacker News but I really appreciate the take from "Why RethinkDB Failed" [0]:
> People wanted RethinkDB to be fast on workloads they actually tried, rather than “real world” workloads we suggested. For example, they’d write quick scripts to measure how long it takes to insert ten thousand documents without ever reading them back. MongoDB mastered these workloads brilliantly, while we fought the losing battle of educating the market.
> almost everyone was asking “how is RethinkDB different from MongoDB?” We worked hard to explain why correctness, simplicity, and consistency are important, but ultimately these weren’t the metrics of goodness that mattered to most users.
> But over time I learned to appreciate the wisdom of the crowds. MongoDB turned regular developers into heroes when people needed it, not years after the fact. It made data storage fast, and let people ship products quickly. And over time, MongoDB grew up. One by one, they fixed the issues with the architecture, and now it is an excellent product. It may not be as beautiful as we would have wanted, but it does the job, and it does it well.
In my mind Mongo is a database that had great developer experience, excellent marketing, and some seriously bad technical gotchas. But marketing drove momentum long enough to cover bills, grab the market, and address most of the gotchas, so now it's a decent DB.
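For illustration, here is roughly what those quick benchmark scripts look like, sketched in Python with pymongo (the database and collection names are invented). The write-concern knob is the classic gotcha: w=0 "fire and forget" writes are what made early MongoDB look so fast in exactly this kind of test.

```python
# A naive benchmark of the kind described above: time 10,000 inserts and
# never read anything back. Database/collection names are invented.
import time

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client["bench"]

for w in (0, 1):  # w=0: unacknowledged "fire and forget"; w=1: acknowledged
    db.drop_collection("docs")  # reset between runs
    coll = db.get_collection("docs", write_concern=WriteConcern(w=w))
    start = time.monotonic()
    for i in range(10_000):
        coll.insert_one({"n": i, "payload": "x" * 100})
    print(f"w={w}: {time.monotonic() - start:.2f}s")
```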
It also helps that the creators of the default storage engine (WiredTiger) are Keith Bostic of BerkeleyDB fame [0] and Michael Cahill, whose PhD thesis on serializable snapshot isolation [1] formed the basis of Postgres's concurrency control [2]. Notably, both of them still work for MongoDB.
I think MongoDB 3.6 is when it became a "decent" DB.
Azure CosmosDB provides protocol-level compatibility with MongoDB 3.6. This offers developers a neat way of using the power of distributed CosmosDB without sacrificing cloud portability.
Cosmos' API is an emulation of MongoDB which differs in features, compatibility, and implementation from an actual MongoDB deployment. Cosmos' claim of API version support (e.g. 3.6) refers to the MongoDB wire protocol rather than the full MongoDB server feature set for that version. There are also some inherent differences, such as Cosmos' Request Units (RUs), which need to be considered for capacity planning and costs: https://docs.microsoft.com/en-us/azure/cosmos-db/request-uni....
Those differences may be fine for some use cases, but definitely compromise portability if you want to run or test the same application with a database deployment on GCP, AWS, or your own infrastructure. The lowest common denominator is based on Cosmos DB's underlying limits and features (not a MongoDB server feature set): https://docs.microsoft.com/en-us/azure/cosmos-db/concepts-li....
Anybody who makes complex technology decisions based on marketing copy doesn't deserve to be taken seriously either. Not that I think these people actually exist other than in the minds of many HN commenters.
Oracle on their website right now says "we lead the market in autonomous, cloud, and applications technologies".
I assume you think developers are going to suddenly abandon AWS, Python, etc. and move entirely to an Oracle stack based on this quote?
at large firms the clueless technology management often buys based on the marketing crap, then forces it down the throats of the engineers
at my firm we're being all but forced to use Azure even though it costs more and makes absolutely no sense whatsoever for the domain, but since it's the "technology strategy" fighting it becomes exceptionally difficult
In seven years working at MongoDB, the one thing I have learned is that technology is never "forced down the throat of the engineers". Developers have choices and would not stay in an organisation that made decisions that way. The reason developer relations exists is the acknowledgement that developers ARE the decision makers in most technology selection.
I recently worked for a company where a different popular JSON document database was in fact forced down my throat.
I proved that a regular database would be orders of magnitude faster, with a real example on real data, yet the project/product was canceled for marketing/sales reasons. Until it was canceled, we were told by multiple levels of "technical" managers to use the worse database, for reasons that were never explained.
Technology selection is a black art in many companies. Sounds like people were open to listening to the opinion of a developer in your case. Often-times the selection heuristic is habit or history. The right companies will pay more attention to reasoned arguments of their own developers.
The wild part about it was that they were trying to use Elasticsearch for relational queries because some manager thought it would be faster than a relational DB. (It was about 60x slower.)
This video is pretty funny! I've used MongoDB extensively over the years. It's pretty solid and enjoyable to use, but yeah, the JS community tends to have a higher percentage of "fake it until you make it" type people, and it seems like a lot of them use Mongo too...
Plus, it was designed by people with no database expertise. Lots of people were burned by MongoDB stupidity. I have heard the company hired competent people since. Once burned, twice shy.
Mongo is much maligned here and I hesitate to even comment for fear of attack, but in my mind it has some compelling use cases, and after WiredTiger became the default storage engine, that pretty much solved the compaction issues I had seen in the past while delivering a lot more performance.
I've built systems with it where we didn't own the schema - we were scraping data from other places and schemaless was a feature. And I benchmarked it against Postgres and a couple other things and I just couldn't get the same performance - note our operations were idempotent to the db, so even in a hard crash, we could just re-run the scraping job and we wouldn't really "lose" any data -- or even if the data was stale by a day, not a big deal... That system would do tens of millions of upserts per night on not crazy AWS hardware and it ran for years like this without problems.
Would I use Mongo for situations where I needed transactions? Almost never. I actually like Postgres a lot and it's my default for run-of-the-mill CRUD apps, since you can do geo, crypto, search, etc. by just installing a few extensions. Do I go on HN on every Mongo article and bash them constantly? No, I think they've built a pretty decent thing if you understand the implications of not confirming writes to replicas and whatnot and tune it to your needs.
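For concreteness, a minimal sketch of the idempotent upsert pattern described above, using pymongo (all names and fields invented): keying each document on a natural source ID means a re-run of the scraping job rewrites the same documents instead of duplicating them.

```python
# Idempotent bulk upserts: replaying the same batch after a crash rewrites
# the same documents rather than duplicating them. All names are invented.
from pymongo import MongoClient, UpdateOne

coll = MongoClient("mongodb://localhost:27017")["scraper"]["listings"]

def upsert_batch(records):
    # Key on a natural ID from the source site so that re-runs are harmless:
    # the last write simply wins.
    ops = [
        UpdateOne({"source_id": r["source_id"]}, {"$set": r}, upsert=True)
        for r in records
    ]
    result = coll.bulk_write(ops, ordered=False)  # unordered for throughput
    return result.upserted_count, result.modified_count

print(upsert_batch([
    {"source_id": "abc-1", "price": 10},
    {"source_id": "abc-2", "price": 12},
]))
```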
Lots of interesting info, and I really like working with MongoDB. But I am baffled by the claim that "MongoDB is the king." In all the circles I work in, I only hear Mongo dismissed as a joke. Unfortunately the company's dismissiveness of RDBMSes, their hubris in pushing NoSQL, and their blunders over what are extremely poor default settings all combine to make MongoDB something I don't see anyone taking seriously. I use it in a project where it's basically just serving as a big cache, so the reliability and durability of the data is not critical. It's certainly way easier to query than any other document-oriented database, but getting a foothold to use for anything requiring long-term storage would be a major challenge.
I interviewed recently at a payment provider that is rewriting its PHP/MySQL monolith in Java & Go microservices with MongoDB.
The architect would praise static typing but preferred MongoDB "because it's easier to add a column". It felt weird, but I've never used MongoDB so I could not really argue about it.
Sounds like a bad architect or someone who only knows surface features and doesn't have experience with the actual databases.
Adding a column is not hard in any relational database, and pretty much all modern ones support no-downtime transactional schema updates with backfills, concurrent index builds, etc.
Also, the schema always exists somewhere, and it's usually best to put it in the database so it's next to (and validates) your data, rather than keeping it spread out in your application code.
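As a concrete illustration of how routine this is, here is a sketch against Postgres via psycopg2 (table and column names invented). In Postgres 11+, adding a column with a default is a metadata-only change, and CREATE INDEX CONCURRENTLY builds an index without blocking writes.

```python
# No-downtime schema changes in Postgres, via psycopg2 (names invented).
import psycopg2

conn = psycopg2.connect("dbname=app")

# In Postgres 11+ this is a metadata-only change, even with a default.
with conn, conn.cursor() as cur:
    cur.execute("ALTER TABLE orders ADD COLUMN discount numeric DEFAULT 0")

# CONCURRENTLY builds the index without blocking writes; it must run
# outside a transaction block, hence autocommit.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE INDEX CONCURRENTLY idx_orders_discount ON orders (discount)")
```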
Adding columns can be hard at scale. MongoDB allows you to add fields incrementally without stopping the world: you just write the value and it's materialized in the DBMS. The point is you don't have to manually define a schema; instead the database works with the structure of the data.
I'm a dyed-in-the-wool RDBMS user but I can see the value of this feature. The trade-off, of course, is that your application has to handle varying schema levels. MongoDB also won't protect you against typos, inconsistent types, and other foolishness. I would not judge people who choose to make this trade; it's a sensible one for many use cases.
Many analytic databases are headed in this direction due to the amount of data that arrives in the form of nested JSON structures. I can't speak for other DBMS types but it's something we're very interested in for ClickHouse.
Isn't adding a column even easier in column-oriented OLAP databases?
I agree that more complex row formats are needed, though. BigQuery has done well with nested/repeating structures, and Snowflake uses the PAX data format for JSON, which has been very useful (however their JSON/VARIANT column doesn't support structured types).
It's definitely easier than in a conventional RDBMS. Adding a column in ClickHouse is just a metadata operation. BigQuery has great nested structure support.
Still, it's hard to beat MongoDB in this respect. In my first app I was amazed that I could just insert a BSON object and MongoDB created a queryable table automatically. You pay for it of course in other ways but the ease of use is quite extraordinary.
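That experience looks something like this in pymongo (names invented): no CREATE TABLE, no migration, and nested fields are queryable immediately.

```python
# Insert a document and query it immediately: no table definition needed.
# Database, collection, and fields are all invented.
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["app"]["events"]

events.insert_one({"user": "ada", "kind": "login", "meta": {"ip": "10.0.0.1"}})

# Nested fields are queryable right away, no schema declared anywhere.
for doc in events.find({"meta.ip": "10.0.0.1"}):
    print(doc["user"], doc["kind"])
```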
You can use JSON Schema to lock down your schemas like SQL, but few developers choose to do that, as they prefer the flexibility that MongoDB offers. Changes in the schema are almost always due to upstream changes, so the schema is a follower rather than a leader.
Encoding your business logic in the database schema is rarely a good idea.
> Encoding your business logic in the database schema is rarely a good idea.
If by this you mean not defining tables with explicit columns in general I would disagree. It's one of the most effective ways to get speed in analytic applications because it optimizes compression. And in OLTP applications it's the best way to ensure consistency of data.
But perhaps you were referring to a narrower scope like just JSON & upstream changes?
No; typing is great, whether it's BSON types in MongoDB or column types in an RDBMS. Anything past that generally gets in the way of the flexibility to respond to new business requirements. Business changes rarely originate at the schema level, so it's a bad place to encode business semantics.
This just isn't true for a lot of use cases. If your requirement is to get low latency response on large data sets you need to cluster data carefully and compress it to reduce I/O. That's why data warehouses use column storage and strong typing.
Compressed storage size for uniform datatypes can be phenomenally efficient. I've seen 10,000x size reduction in ideal cases like monotonically varying integers stored using double delta codec + ZSTD compression.
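For the curious, a sketch of what that looks like in ClickHouse DDL, issued here through the clickhouse-driver Python package (table and column names are invented); DoubleDelta suits monotonically varying values, and chaining ZSTD on top is what produces the extreme ratios mentioned.

```python
# Column codecs in ClickHouse (table/columns invented): DoubleDelta encodes
# the deltas-of-deltas, then ZSTD compresses the residue.
from clickhouse_driver import Client

client = Client(host="localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        ts      DateTime CODEC(DoubleDelta, ZSTD),
        counter UInt64   CODEC(DoubleDelta, ZSTD)
    ) ENGINE = MergeTree ORDER BY ts
""")
```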
The comparison here is DB schema + DB schema migration + shared data access layer vs shared data access layer
Given that, I am not clear on what you think the problem is without domain knowledge.
I can understand broadly that an RDBMS is better when you have joins and foreign-key constraints, and that NoSQL is cheaper at "web" scale, plus there are ACID considerations; yet without domain knowledge it's plausible NoSQL is better.
"Web scale" isn't a real thing and I don't know what "shared data access layer" refers to but relational databases have decades of features and tooling that prove to be very useful in most cases. Even if you don't use ACID, transactions, key relations and other features, maintaining the basic type information in the database still has numerous advantages in data consistency, integrity and performance.
Also everything can scale, and it's all using the same fundamental primitives (sharding, etc) to do so anyway. Some just make it easier with built-in functionality vs external layers.
This is not to say that non-relational systems aren't useful, but that they are rarely used correctly instead of chosen for marketing hype, and hearing things like "adding a column is easier" from an architect usually points to the latter situation.
Database schemas only validate the data type and whether it's null or not.
It's next to useless for ensuring data integrity, which is why every app you see has data validation inside the code itself, whether it's checking that an email has a valid structure or that a payment is not negative.
The majority of the logic will be in the code, so it makes no sense to me why it must be enforced at the database level as well, other than when you have multiple clients accessing that database.
The examples you give are perfectly possible in any decent SQL server (mssql, postgres). If anything, database servers are made to ensure data consistency, checks like "payment not negative" are no brainers. If you don't define these constraints at the database level you are leaving its power on the table and are re-inventing the wheel by putting it in your application layer somewhere.
That doesn't mean that treating your database as a dumb datastore and having the "smarts" in the application layer doesn't work better for some applications. Both ways have trade-offs.
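For example, both of the checks mentioned above fit in a couple of lines of Postgres DDL, sketched here via psycopg2 (the table name and the email regex are illustrative only, not production-grade validation):

```python
# Database-level constraints in Postgres via psycopg2 (names and the email
# regex are illustrative only). Bad rows are rejected no matter which
# application wrote them.
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS payments (
            id     serial  PRIMARY KEY,
            email  text    NOT NULL CHECK (email ~ '^[^@]+@[^@]+[.][^@]+$'),
            amount numeric NOT NULL CHECK (amount >= 0)
        )
    """)
```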
The shape of a record, property types, foreign keys and more are all critical part of a schema.
Business logic validation is a separate layer but still requires proper types and data integrity underneath, and that's where strong schemas in the database help. More so when you have multiple apps interacting with the same database.
In companies with DBAs the constraints are baked into the DB; in places without DBAs (or with weak DBA/dev relations) that is absorbed into the code.
To be fair, depending on the culture of the company, it may be easier to put it in code and update it there than to wait for it to be approved, discussed, and scheduled by the DBA.
Oh man. Payment provider data in Mongo... even if by now MongoDB doesn't lose data, you REALLY want your ledger or OLTP to be in a relational, transactional database.
Coinbase uses MongoDB.
Barclays uses MongoDB.
BBVA uses MongoDB.
Capital One uses MongoDB.
Charles Schwab uses MongoDB.
FICO uses MongoDB.
Goldman Sachs uses MongoDB.
HSBC uses MongoDB.
Intuit uses MongoDB.
UK Inland Revenue uses MongoDB.
UK Dept of Work and Pensions uses MongoDB.
This is the tip of the iceberg when it comes to MongoDB's use in financial services. These organisations REALLY use MongoDB for financial data, and they are all public references. Relational databases are great for financial data, and so is MongoDB.
I am biased but you are definitely missing out by just listening to your friends on this one. These days MongoDB is pretty mature and the “MongoDB sucks” meme is getting pretty long in the tooth.
It turns out it takes a decade to build a new database that’s half decent and has all the features people want. It’s really hard! Ask anyone that’s tried. The path is littered with skeletons.
Of course, there are those databases that are “perfect” from the start and never made any mistakes but is anyone talking about them today? Even Postgres gets it wrong sometimes.
And that's just a publicly available example. I have a client who paid MongoDB Inc. to send an "expert" down to assess the viability of a project, who flat out said the project couldn't be done and left it there; a week later the official MongoDB Inc. report said "We can definitely get it done. Why don't you move to our managed MongoDB Atlas service? That'll be $10K." For the record, my professional assessment was also that it'd be impossible to do it on a data model like Mongo's.
----
> Of course, there are those databases that are “perfect” from the start and never made any mistakes but is anyone talking about them today? Even Postgres gets it wrong sometimes.
1. The snark isn't helping your cause.
2. Then there are databases that claim to be "perfect" from the start, having always put up a facade without ever admitting any of their own flaws. Like MongoDB.
The thing MongoDB does best — though not something a DBMS can be judged by — is marketing. Not just the marketing they push themselves — "MongoDB is web scale!", "90% of RDBMS use can be replaced by MongoDB!" — but also the marketing it can get its fans to push.
I think the core issue here is not the quality of the marketing, but simply the fact that it exists. People need some database, and nobody is spending money on advertising Postgres, so they don't know anything about it.
Also, Mongo wins by a large margin on "how little you need to know to get started" compared with any relational db.
MongoDB is the fastest and easiest to scale schema-on-read document store.
So if your domain model is document-oriented (e.g. what would otherwise be a star schema with dozens of joins), or you don't know the schema upfront, or you have polymorphic relationships, it is a really useful way to store your data.
I would also add that the replica set concept that is based on Raft [0] allows for built-in high availability, so the individual servers can be maintained while the whole set is running and servicing clients.
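From the client side that high availability is mostly transparent; a sketch with pymongo (hosts and replica-set name are placeholders): the driver discovers the current primary, and retryable writes ride out a failover during rolling maintenance.

```python
# Connecting to a replica set with pymongo (hosts and set name are
# placeholders). The driver finds the primary; retryable writes survive a
# failover during rolling maintenance of individual members.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://db1.example:27017,db2.example:27017,db3.example:27017",
    replicaSet="rs0",
    retryWrites=True,
)
client["app"]["events"].insert_one({"ok": True})
```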
That's actually kind of the point – if you're working on a new project whose requirements might change rapidly Mongo can be a really great fit (eg a toy project; a prototype for a new internal service; a hackathon; a pre-traction startup).
Schemaless data (or at least schema-on-read rather than on-write) is the primary feature. Store JSON documents and index on any field.
Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.
These days there's not much benefit over a JSON field in a relational database, unless you're really invested in JSON/Javascript through your entire stack and want that to reach into the database as well.
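For comparison, the relational-database-with-a-JSON-field approach might look like this in Postgres via psycopg2 (names invented): a jsonb column with a GIN index supports indexed containment queries on arbitrary document structure.

```python
# A jsonb column with a GIN index in Postgres, via psycopg2 (names invented).
# The @> containment query below can use the index.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_docs_body ON docs USING gin (body)")
    cur.execute("INSERT INTO docs (body) VALUES (%s)", (Json({"user": "ada", "tags": ["a"]}),))
    cur.execute("SELECT id FROM docs WHERE body @> %s", (Json({"user": "ada"}),))
    print(cur.fetchall())
```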
Most JSON-field databases treat the JSON field as text and index it as free text. With MongoDB you can index at any level of the document, and since 4.2 you can use wildcard indexes to automatically index a document and any new fields that are subsequently added.
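A sketch of the wildcard-index feature in pymongo (collection name invented); the "$**" key tells MongoDB 4.2+ to index every field, including ones that only appear in future documents:

```python
# A wildcard index (MongoDB 4.2+) via pymongo; collection name is invented.
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["app"]["profiles"]
coll.create_index([("$**", 1)])  # index every field, present and future

coll.insert_one({"field_nobody_planned_for": {"nested": 42}})
print(coll.find_one({"field_nobody_planned_for.nested": 42}))
```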
The whole point of having a JSON field is that it has more structure than just text; otherwise you could just use a text field (like SQL Server does). Also, they all support various JSON querying and indexing functions, including subfield access with optional computed properties.
Sure MongoDB has some extra ergonomics for dealing with JSON/BSON data, but how much benefit this really adds is still up for debate. As horizontal scalability becomes more natively supported, MongoDB will lose even more of its benefits.
> Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.
What applications are much better in your opinion?
I'd recommend sticking with relational databases since they all support JSON columns now. If you need horizontal scalability then there are many choices like CockroachDB, Yugabyte, TiDB, Vitess, MemSQL, and others.
If you still want a document store, then RavenDB is a great choice, with proper clustering, full-text search, SQL-like querying, graph queries, etc. ArangoDB is also a good choice.
I apologize, because I literally don't know, but have you used any of these solutions at the scale where there might be billions or trillions of rows in a table/collection? I'm currently using Mongo at that scale and would love to evaluate some alternatives.
If it helps for context, we have accepted that ad-hoc queries are not possible, and we have our own solution for searching.
We used MemSQL with 100s of billions of rows for years in production.
If you're working at that scale, it sounds like that's more of an OLAP use-case where MemSQL and other column-oriented databases would suit better than an OLTP document-store. Maybe you can share more details for better recommendations.
I can share why I used it for 2 products (and regret not using it in 1):
- if you are using an ORM library, it avoids unnecessary sync/migration steps by keeping the schema definition only in the ORM (as opposed to defining it in the ORM and syncing it to postgres/mysql)
- It is fastest when used in-memory. I run the full suite of integration tests for a medium-complexity API in 30 seconds. On one product (w/ Postgres) we ran tests on SQLite, but it was much slower (10x).
- It includes a lot of small features common in web sites or APIs: media storage, queues, auto-removal of old rows, full-text search, geo queries. A dedicated solution (Solr, S3, Redis) would be better, but for small-scale projects Mongo was just fine (and a single thing to maintain, back up, and monitor)
- easy to learn: I never hired someone with previous Mongo experience, but it was never a problem; a less expressive query language is easier to pick up
It is a free and open source NoSQL database that can horizontally scale if needed without any additional plumbing. A lot of projects use it due to its simplicity in developer workflow.
There is this myth that schemas must be enforced at the database level.
But the majority of databases are only accessed by one web app. And in that web app you can enforce the schema in code. In fact, in code you have much safer and more powerful options, e.g. enforcing business rules such as "this string field must start with aaa".
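As a sketch of what "enforce the schema in code" can mean in practice, here is that "must start with aaa" rule expressed with pydantic's v1-style validators (the model itself is invented for illustration):

```python
# Application-level schema enforcement sketched with pydantic v1-style
# validators; the model and the "aaa" rule are invented for illustration.
from pydantic import BaseModel, validator

class Widget(BaseModel):
    name: str
    quantity: int

    @validator("name")
    def name_must_start_with_aaa(cls, v):
        if not v.startswith("aaa"):
            raise ValueError("name must start with aaa")
        return v

doc = Widget(name="aaa-widget", quantity=3).dict()  # validate before insert
```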
>There is this myth that schemas must be enforced at the database level.
You must have a single point to enforce anything. This is very rarely the app, where a) there will be 20 places that access the database and b) some tasks are often done by operating on the database directly.
Some rules cannot be enforced by the database, sure, but "a field must exist and be a string" is infinitely better than nothing.
The thing about MongoDB is that it does support $jsonSchema for enforcing a schema in a very flexible way. Rather than having a strict schema for every piece of data, you can apply $jsonSchema to as little or as much of your data as you see fit, so you really can have the best of both worlds.
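A sketch of that in pymongo (collection and fields invented): the validator below requires only that "email" exists and is a string, while leaving every other field free-form.

```python
# Partial schema enforcement with $jsonSchema, via pymongo (names invented):
# only "email" is constrained; everything else stays free-form.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]
db.create_collection("users", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email"],
        "properties": {"email": {"bsonType": "string"}},
    }
})
```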
You do have a single point to enforce everything: code.
In most cases only a single web app connects to a database, and in microservices architectures you can enforce it through a shared database-access library.
And any company that allows users to make direct changes to a database without going through some security layer is pretty incompetent. Quite sure you wouldn't be able to get PCI/HIPAA certified with that sort of behaviour either.
>You do have a single point to enforce everything: code
"code" usually is made of many smaller parts, what will keep those in sync to enforce anything? You are placing a burden on a developer (even more likely - on a group of developers), that just doesn't work in practice.
> And any company that allows users to make direct changes to a database without going through some security layer is pretty incompetent
Sure. But without a schema at the database level, there is no "security layer" to rely on. And you will eventually need to make a change that cannot be done via the UI.
In reality, people don't do this. When people use schemaless databases, they usually don't even know what their schema is, and it gets enforced in an accidental, half-assed way.
[0]: https://www.defmacro.org/2017/01/18/why-rethinkdb-failed.htm...