
There are basically two options for multi-tenancy with their own tradeoffs.

1. An account/tenant_id field for each table

2. A schema for each tenant wrapping all of the tables

Option 2 gives you cleaner separation but complicates your deployment process, because you have to run every database change against every schema on every deploy. It gets harder still while a deploy is in flight: the code and the schemas can drift out of sync, you may need a rollback, or a migration can fail partway through because of an issue with one tenant's specific data.
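
To make the "every change across every schema" cost concrete, here is a minimal sketch using Python's sqlite3, with attached in-memory databases standing in for per-tenant schemas. The tenant names, the `users` table, and the `migrate_all` helper are all hypothetical, and a real deployment would also need locking, versioning, and retries.

```python
import sqlite3

def migrate_all(conn, schemas, ddl_template):
    """Apply one DDL statement to every tenant schema.

    Returns (ok, failed) so a partial failure is visible instead of
    leaving the deploy in an unknown state.
    """
    ok, failed = [], []
    for schema in schemas:
        try:
            conn.execute(ddl_template.format(schema=schema))
            ok.append(schema)
        except sqlite3.Error as exc:
            failed.append((schema, str(exc)))
    return ok, failed

# Autocommit mode keeps ATTACH and DDL outside implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
tenants = ["tenant_a", "tenant_b", "tenant_c"]  # hypothetical, trusted names
for name in tenants:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    conn.execute(f"CREATE TABLE {name}.users (id INTEGER PRIMARY KEY, email TEXT)")

# Simulate drift: one tenant already received the change in an earlier, aborted deploy.
conn.execute("ALTER TABLE tenant_b.users ADD COLUMN plan TEXT")

ok, failed = migrate_all(
    conn, tenants, "ALTER TABLE {schema}.users ADD COLUMN plan TEXT"
)
```

Here the migration succeeds for two tenants and fails for the one that had drifted, which is exactly the mid-deploy state you then have to reconcile by hand or with more tooling.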

The benefits of this approach are that you can set different backup policies for different customers, moving specific customers to specific instances is easier, and you avoid the extra index on tenant_id in every table.

Option 1 is significantly easier to shard out horizontally and simplifies the database change process, but you pay in disk space for the extra indexes. Plus, in many databases you can partition on tenant_id.
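
A rough sketch of option 1, again with sqlite3 so it runs anywhere. The `invoices` table, its columns, and the `invoices_for` helper are made up for illustration; the point is only that every table carries tenant_id, every query filters on it, and the index on it is the extra space cost mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    id           INTEGER PRIMARY KEY,
    tenant_id    INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL
);
-- The extra per-table index that option 2 avoids.
CREATE INDEX invoices_by_tenant ON invoices (tenant_id);
""")
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount_cents) VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 900)],
)

def invoices_for(conn, tenant_id):
    """Return one tenant's invoice amounts; the WHERE clause is the isolation."""
    rows = conn.execute(
        "SELECT amount_cents FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [row[0] for row in rows]
```

One schema change here is one ALTER TABLE, regardless of how many tenants exist. The flip side is that forgetting the tenant_id filter in a single query leaks data across customers.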

Most people typically end up with option 1 after dealing with or reading horror stories about the operational complexity of option 2.



The hidden bomb in option 1 is that you generally need smarter primary keys that fully embrace multitenancy. Atlassian hires smart folks and I'm sure they know this at some level, but it's a relatively hard retrofit to work into an existing system.
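
One possible reading of "smarter primary keys" (my assumption, the comment doesn't spell it out) is composite keys that lead with tenant_id, so foreign keys can't point across tenants. A small sqlite3 sketch with hypothetical `projects` and `issues` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE projects (
    tenant_id  INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (tenant_id, project_id)
);
CREATE TABLE issues (
    tenant_id  INTEGER NOT NULL,
    issue_id   INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, issue_id),
    -- The tenant is part of the reference, so it must match on both sides.
    FOREIGN KEY (tenant_id, project_id)
        REFERENCES projects (tenant_id, project_id)
);
""")
conn.execute("INSERT INTO projects VALUES (1, 1, 'alpha')")
conn.execute("INSERT INTO issues VALUES (1, 10, 1)")  # same tenant: accepted

def try_cross_tenant_insert(conn):
    """Tenant 2 tries to attach an issue to tenant 1's project."""
    try:
        conn.execute("INSERT INTO issues VALUES (2, 11, 1)")
        return True
    except sqlite3.IntegrityError:
        return False
```

With a plain single-column id as the key, that second insert would go through and the database would happily link rows across customers. Changing every key and every foreign key to this shape after the fact is the hard retrofit.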


Option 2 has many unforeseen consequences.

Business wants to run a query across customers? In most DBs you need either custom code or to create a stored procedure to iterate across schemas.
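
As an illustration of the "custom code" route, here is a sketch that builds a UNION ALL over every tenant schema. It uses sqlite3 with attached in-memory databases as stand-ins for schemas; the `orders` table and the `revenue_by_tenant` helper are hypothetical, and the schema names are interpolated into SQL, so they have to come from a trusted list.

```python
import sqlite3

# Autocommit mode keeps ATTACH outside implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
orders = {"tenant_a": [100, 250], "tenant_b": [40]}  # made-up sample data
for schema, amounts in orders.items():
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.orders (amount_cents INTEGER NOT NULL)")
    conn.executemany(
        f"INSERT INTO {schema}.orders VALUES (?)", [(a,) for a in amounts]
    )

def revenue_by_tenant(conn, schemas):
    """Generate one SELECT per schema and glue them together with UNION ALL."""
    parts = [
        f"SELECT '{s}' AS tenant, SUM(amount_cents) AS total FROM {s}.orders"
        for s in schemas
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY tenant"
    return conn.execute(sql).fetchall()
```

With option 1 the same report is a single `GROUP BY tenant_id`. Here the query text grows with the number of customers, and it breaks if any one schema has drifted.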

Every table that you create is multiplied by the number of customers. This has implications for some database systems (like PG's vacuum).

Your migrations will take _forever_ to run.

Etc.


The second problem is mitigated by the fact that schemas are trivially migratable between database servers. Once you grow too big for one cluster just make another.



