MongoDB is the fastest and easiest-to-scale schema-on-read document store.
So if your domain model is document-oriented (e.g. one that would otherwise need a star schema with dozens of joins), if you don't know the schema up front, or if you have polymorphic relationships, it is a really useful way to store your data.
I would also add that replica sets, whose replication protocol is based on Raft [0], provide built-in high availability: individual servers can be taken down for maintenance while the set as a whole keeps running and serving clients.
That's actually kind of the point: if you're working on a new project whose requirements might change rapidly, Mongo can be a really great fit (e.g. a toy project, a prototype for a new internal service, a hackathon, or a pre-traction startup).
Schemaless data (or at least schema-on-read rather than schema-on-write) is the primary feature: store JSON documents and index any field.
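To make "schema-on-read" concrete, here is a minimal sketch in plain Python (no database involved). The documents, the `type` discriminator, and the `Card`/`Bank` classes are all hypothetical: documents of different shapes are written as-is, and a schema is applied only when they are read back, which is also one way to handle the polymorphic relationships mentioned above.

```python
import json
from dataclasses import dataclass
from typing import Union

# Hypothetical raw documents as they might sit in one collection:
# different shapes, no schema enforced when they were written.
RAW_DOCS = [
    '{"_id": 1, "type": "card", "last4": "4242"}',
    '{"_id": 2, "type": "bank", "iban": "DE89370400440532013000"}',
    '{"_id": 3, "type": "card", "last4": "1881", "nickname": "work"}',
]


@dataclass
class Card:
    id: int
    last4: str
    nickname: str = ""  # optional field; older documents may lack it


@dataclass
class Bank:
    id: int
    iban: str


Payment = Union[Card, Bank]


def read_payment(raw: str) -> Payment:
    """Apply the schema at read time: choose a type by discriminator, default missing optional fields."""
    doc = json.loads(raw)
    if doc["type"] == "card":
        return Card(id=doc["_id"], last4=doc["last4"], nickname=doc.get("nickname", ""))
    if doc["type"] == "bank":
        return Bank(id=doc["_id"], iban=doc["iban"])
    raise ValueError(f"unknown payment type: {doc['type']!r}")


payments = [read_payment(r) for r in RAW_DOCS]
```

The trade-off is visible in `read_payment`: nothing stops a malformed document from being written, so the reader has to decide what to do with unknown types and missing fields.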
Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.
These days there's not much benefit over a JSON column in a relational database, unless you're heavily invested in JSON/JavaScript throughout your entire stack and want that to extend into the database as well.
Most databases with JSON fields treat the JSON field as text and index it as free text. With MongoDB you can index at any level of the document, and since 4.2 you can use wildcard indexes to index a document, including any new fields that are added to it later, automatically.
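As a sketch of what a wildcard index looks like, the helper below builds the document for MongoDB's `createIndexes` command, using the `$**` wildcard key pattern. The collection name `events`, the `attributes` path, and the index names are hypothetical. Nothing is sent to a server here; with pymongo the result could be passed to `db.command(...)`, or the same index created with `collection.create_index([("$**", 1)])`.

```python
from typing import Any, Dict, Optional


def wildcard_index_command(collection: str, path: Optional[str] = None) -> Dict[str, Any]:
    """Build a createIndexes command for a wildcard index.

    path=None indexes every field path in every document ("$**");
    a path such as "attributes" indexes only that subtree ("attributes.$**"),
    including sub-fields added to documents later.
    """
    field = "$**" if path is None else f"{path}.$**"
    name = "all_fields_wildcard" if path is None else f"{path}_wildcard"
    return {
        "createIndexes": collection,  # command name must come first
        "indexes": [{"key": {field: 1}, "name": name}],
    }


# Hypothetical usage: one index over everything, one over a single subtree.
all_fields = wildcard_index_command("events")
attributes_only = wildcard_index_command("events", "attributes")
```

Scoping the wildcard to a subtree is usually the more conservative choice, since indexing every path also indexes fields that are never queried.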
The whole point of having a JSON field is that it has more structure than plain text; otherwise you could just use a text field (like SQL Server does). They all support various JSON querying and indexing functions, including sub-field access with optional computed properties.
Sure, MongoDB has some extra ergonomics for dealing with JSON/BSON data, but how much benefit that really adds is still up for debate. As horizontal scalability becomes more natively supported in relational databases, MongoDB will lose even more of its advantages.
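As an illustration of indexing a sub-field of a JSON column in a relational database, here is a sketch using SQLite through Python's built-in `sqlite3` module. SQLite is a stand-in chosen only because it ships with Python (the comment does not name a specific database), and it assumes the SQLite build includes the JSON functions, as current builds do. The table, column, and index names are hypothetical. An index on the expression `json_extract(doc, '$.contact.email')` lets queries that use the identical expression avoid a full scan.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

# Expression index on a nested field inside the JSON column.
conn.execute(
    "CREATE INDEX idx_users_email ON users (json_extract(doc, '$.contact.email'))"
)

rows = [
    {"name": "Ada", "contact": {"email": "ada@example.com"}},
    {"name": "Lin", "contact": {"email": "lin@example.com", "phone": "555-0100"}},
]
conn.executemany("INSERT INTO users (doc) VALUES (?)", [(json.dumps(r),) for r in rows])

# The WHERE clause repeats the indexed expression exactly, so the planner can use the index.
query = (
    "SELECT json_extract(doc, '$.name') FROM users "
    "WHERE json_extract(doc, '$.contact.email') = ?"
)
name = conn.execute(query, ("lin@example.com",)).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = " ".join(
    str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("lin@example.com",))
)
```

Other relational databases expose the same idea with their own syntax (expression or computed-column indexes), so the details differ by engine.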
> Also it was great at sharding and scaling horizontally when first released, and one of the few options available at that time. It's since been eclipsed by much better systems that don't have such a convoluted and fragile setup.
What applications are much better in your opinion?
I'd recommend sticking with relational databases, since they all support JSON columns now. If you need horizontal scalability, there are many choices, such as CockroachDB, Yugabyte, TiDB, Vitess, MemSQL, and others.
If you still want a document store, RavenDB is a great choice, with proper clustering, full-text search, SQL-like querying, graph queries, etc. ArangoDB is also a good choice.
I apologize, because I literally don't know, but have you used any of these solutions at a scale of billions or trillions of rows in a table/collection? I'm currently using Mongo at that scale and would love to evaluate some alternatives.
If it helps for context, we have accepted that ad-hoc queries are not possible, and we have our own solution for searching.
We used MemSQL with 100s of billions of rows for years in production.
If you're working at that scale, it sounds like more of an OLAP use case, where MemSQL and other column-oriented databases would suit better than an OLTP document store. Maybe you can share more details for better recommendations.
I can share why I used it for two products (and regret not using it in one other):
- If you are using an ORM library, it avoids unnecessary sync/migration steps by keeping the schema definition only in the ORM (as opposed to defining it in the ORM and syncing it to Postgres/MySQL).
- It is the fastest when run in-memory: I run the full integration test suite for a medium-complexity API in 30 seconds. On one product (with Postgres) we ran the tests on SQLite instead, but that was much slower (10x).
- It includes a lot of small features common in websites or APIs: media storage, queues, automatic removal of old rows, full-text search, geo queries. A dedicated solution (Solr, S3, Redis) would be better, but for small-scale projects Mongo was just fine (and a single thing to maintain, back up, and monitor).
- It is easy to learn: we never hired anyone with previous Mongo experience, and it was never a problem, since a less expressive query language is easier to pick up.
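Three of the built-in features listed above map to ordinary MongoDB index types: TTL indexes (`expireAfterSeconds`) for automatic removal of old documents, `"text"` indexes for basic full-text search, and `"2dsphere"` indexes for geo queries. The sketch below only builds the `createIndexes` command document; the collection name `posts`, the field names, and the 30-day expiry are hypothetical. Media storage (GridFS) and queues are not shown.

```python
from typing import Any, Dict, List

THIRTY_DAYS_SECONDS = 30 * 24 * 3600  # hypothetical retention period


def index_specs() -> List[Dict[str, Any]]:
    """Index definitions for a hypothetical "posts" collection."""
    return [
        # TTL index: MongoDB's background task deletes documents once
        # "createdAt" is older than expireAfterSeconds (checked periodically, not instantly).
        {"key": {"createdAt": 1}, "name": "createdAt_ttl", "expireAfterSeconds": THIRTY_DAYS_SECONDS},
        # Text index over two string fields for basic full-text search.
        {"key": {"title": "text", "body": "text"}, "name": "title_body_text"},
        # Geospatial index over GeoJSON points stored in "location".
        {"key": {"location": "2dsphere"}, "name": "location_2dsphere"},
    ]


def create_indexes_command(collection: str) -> Dict[str, Any]:
    """Build a createIndexes command document (e.g. for pymongo's db.command)."""
    return {"createIndexes": collection, "indexes": index_specs()}


command = create_indexes_command("posts")
```

Each of these replaces a small piece of separate infrastructure (a cron job, a search engine, a geo library), which matches the "single thing to maintain" argument at small scale.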
It is a free, source-available (SSPL-licensed since 2018) NoSQL database that can scale horizontally when needed without any additional plumbing. A lot of projects use it because of its simple developer workflow.
There is this myth that schemas must be enforced at the database level.
But the majority of databases are accessed by only one web app, and in that web app you can enforce the schema in code. In fact, code gives you much safer and more powerful options, e.g. enforcing business rules such as "this string field must start with aaa".
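A minimal sketch of enforcing the schema in application code, assuming a hypothetical `Account` model whose `code` field must start with "aaa" (the business rule from the comment) and a hypothetical `to_document` step that is the only way objects become stored documents. Validation happens when the object is constructed, so invalid data never reaches the database layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical model: "code" must start with "aaa" (a rule a plain
    # database type check would not express on its own).
    code: str
    email: str

    def __post_init__(self) -> None:
        if not isinstance(self.code, str) or not self.code.startswith("aaa"):
            raise ValueError(f"code must be a string starting with 'aaa', got {self.code!r}")
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError(f"email looks invalid: {self.email!r}")


def to_document(account: Account) -> dict:
    """Only validated Account objects are turned into documents for storage."""
    return {"code": account.code, "email": account.email}


doc = to_document(Account("aaa-123", "a@example.com"))
```

The catch, raised in the replies below, is that this only holds if every writer goes through `Account`; a script or a second service writing directly to the collection bypasses it.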
>There is this myth that schemas must be enforced at the database level.
You must have a single point of enforcement. The app is very rarely that point, because a) there will be 20 places that access the database, and b) some tasks are often done by operating on the database directly.
Some rules cannot be enforced by the database, sure, but "a field must exist and be a string" is infinitely better than nothing.
The thing about MongoDB is that it does support $jsonSchema for enforcing a schema in a very flexible way. Rather than needing a strict schema for every piece of data, you can apply $jsonSchema to as little or as much of your data as you see fit, so you really can have the best of both worlds.
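A sketch of the partial-schema approach: a `$jsonSchema` validator that only requires `email` to exist and be a string, leaving every other field free-form (because `additionalProperties` is not set to false). The collection name `users` and the pattern are hypothetical. The `collMod` command attaches the validator to an existing collection; `validationLevel: "moderate"` applies the rules to inserts and to updates of documents that already pass, and does not check updates to existing documents that already fail. The command is only built here, not sent; with pymongo it could be passed to `db.command(...)`.

```python
import json
from typing import Any, Dict

# Constrain only what matters; all other fields remain schemaless.
users_validator: Dict[str, Any] = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email"],
        "properties": {
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
        },
    }
}

coll_mod_command: Dict[str, Any] = {
    "collMod": "users",              # hypothetical existing collection
    "validator": users_validator,
    "validationLevel": "moderate",   # don't block updates to legacy non-conforming docs
    "validationAction": "error",     # reject writes that violate the schema
}

# The command is plain JSON-compatible data.
serialized = json.dumps(coll_mod_command)
```

This is the database-level "a field must exist and be a string" check from the previous comment, applied to just one field.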
You do have a single point to enforce everything: code.
In most cases only a single web app connects to a database, and in microservice architectures you can enforce it through a shared database access library.
And any company that allows users to make direct changes to a database without going through some security layer is pretty incompetent. I'm quite sure you wouldn't be able to get PCI/HIPAA certified with that sort of behaviour either.
>You do have a single point to enforce everything: code
"Code" is usually made of many smaller parts; what keeps those in sync to enforce anything? You are placing a burden on a developer (or, even more likely, on a group of developers) that just doesn't work in practice.
> And any company that allows users to make direct changes to a database without going through some security layer is pretty incompetent
Sure. But without a schema at the database level, there is no "security layer" to rely on. And you will eventually need to make a change that cannot be done through the UI.
In reality, people don't do this. When people use schemaless databases, they usually don't even know what their schema is, and it gets enforced in an accidental, half-assed way.