For multi-tenant architecture, the following topics are always top priority for me:
- Shared compute with tenant context passed via JWT
- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements
- Enforcing tenant context at the API gateway
- Always leveraging policies and ACLs via JWT to enforce secure data retrieval
- Sometimes using RLS within the database
- Either universal data encryption or per tenant depending on requirements
- JWT is fine and webscale but plain sessions are also fine. Associating logins with tenants is the important bit.
- Shared compute is actually the part that to me means diddly squat and customers seem to prefer dedicated. It costs nothing to spin up more stateless-ish app servers dedicated to a tenant. It’s the db, logs, caching, load balancers, queues, monitoring I don’t want to split up. Also nothing is still wrong with normal sessions stores in Redis.
- Separate schemas are not preferred but fine kinda but at very least don’t create separate db accounts per tenant. The credential/connection management will make your life a living nightmare and doesn’t work with SQL proxies.
- We must seriously have vastly different JWT experiences. Every super businessy app I’ve made hits the ceiling fast of how much junk you can store in the JWT before having to punt to the db for user permissions.
- RLS is dope and you should choose it every time when you can. Not having to do #customers schema migrations is worth it.
>- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements
It's interesting to me that you and @abraae seem to take the exact opposite view on the topic of data isolation where he/she has a much... harsher opinion:
>We took the decision to use separate databases (schemas in the Oracle world) ... I now think of this approach as maybe getting into heroin (no direct experience). Feels great, powerful, you can do anything. But ultimately it will kill you.
Of course this doesn't apply to the case of column association, but I'm interested on your take on this.
If your app is super-sensitive and absolute security is an absolute must then sure, using a separate database for every customer will help you answer those pesky security questions "How do you ensure that resources are not accessed by other tenants?".
My experience though is that if you are after velocity, and ease of maintenance, then you need a single database and tables with a "tenant id" column in them.
Even simple things get hard when you have separate databases. Say I want to know how many customers have how many widgets on average. If everyone's in one database, that's a SQL query. If they're all separate, it's ten times harder to answer.
This isn’t possible if you’re expecting all transactions to be in a single table, at some point you’re going to have to share things making any query at scale more complex.
Can you clarify? You can vertically scale a database a few orders of magnitude. My view is that “some point” is a long way off for most OLTP workloads.
It pretty much always is, but people are very wary of doing anything directly in the database these days, even stuff that's security critical and should apply to every query.
I mean it’s not super common, people usually opt for separate servers/schemas first. I’ve only been at one shop that’s actually done multi-tenant with RLS.
- Shared compute with tenant context passed via JWT
- Data isolation by either physical separation (i.e. separate database) or logical separation (i.e. separate schema, table, or column association) depending on requirements
- Enforcing tenant context at the API gateway
- Always leveraging policies and ACLs via JWT to enforce secure data retrieval
- Sometimes using RLS within the database
- Either universal data encryption or per tenant depending on requirements