i hate deleting things. prefer flags that hide things instead (like a boolean deleted flag in an rdbms table).
prevents data integrity issues in relational databases, makes debugging easier and prevents disasters.
ideally also include a timestamp, both for bookkeeping and so that cleanup tools can safely remove only things that have been soft deleted for some time, without compromising the integrity of anything that is not deleted (this is especially important in relational data models)
Better still: a field that registers at what date a record was supposedly marked as deleted. Because otherwise you still can't bulk recover from an error.
yep. but at least in the rdbms case, and probably in all cases, a flag (and an index on it) tends to be essential for query performance since the state of the flag will appear in most, if not all queries.
that's okay though, queries that reference the timestamp can be slow since they're housekeeping.
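for what it's worth, here is a minimal sketch of the flag-plus-timestamp idea using python's built-in sqlite3. the table, column, and function names (`customers`, `deleted`, `deleted_at`, `soft_delete`, `bulk_restore`) are made up for illustration, and timestamps are plain unix seconds to keep comparisons simple. treat it as one possible shape, not a recommendation for any particular database.

```python
import sqlite3


def make_db():
    """Create an in-memory demo database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            deleted    INTEGER NOT NULL DEFAULT 0,  -- flag that almost every query filters on
            deleted_at INTEGER                      -- unix seconds, NULL while the row is live
        );
        -- index on the flag, since it appears in most queries
        CREATE INDEX idx_customers_deleted ON customers (deleted);
    """)
    return conn


def soft_delete(conn, customer_id, now):
    """Hide a row instead of removing it, and record when that happened."""
    conn.execute(
        "UPDATE customers SET deleted = 1, deleted_at = ? "
        "WHERE id = ? AND deleted = 0",
        (now, customer_id),
    )


def live_names(conn):
    """Everyday query: only rows that are not soft deleted."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE deleted = 0 ORDER BY id"
    )
    return [r[0] for r in rows]


def bulk_restore(conn, start, end):
    """Undo every soft delete made in [start, end), e.g. after a bad script run.

    This is the housekeeping path, so it is allowed to be slow.
    """
    cur = conn.execute(
        "UPDATE customers SET deleted = 0, deleted_at = NULL "
        "WHERE deleted = 1 AND deleted_at >= ? AND deleted_at < ?",
        (start, end),
    )
    return cur.rowcount
```

the timestamp is what makes `bulk_restore` possible: with only the boolean you can't tell the rows a bad run hid from the ones that were deleted on purpose last year.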
The GDPR and various things have made companies more skittish in doing things this way, because they get scared.
Perhaps an effective measure would be to create a key that encrypts a customer's data, and give them a copy of the key, and let them know that after a certain point your copy of the key will be deleted, and if they want a restore past that point they'll need to provide the key.
You may as well just delete it, then. I guarantee a high percentage of users won't save that key and be able to find it later. GH (edit: or similarly nerdy sites) might (might!) be able to get away with that, but as soon as part of your process is "give the user a cryptographic key" you've just guaranteed yourself a support nightmare, with normal users. It's why the only cryptographic person-to-person communication systems that've been broadly successful haven't involved keeping track of anything, and don't have a setup process more complex than "point camera at QR code".
Yeah, you end up in the case where you "officially" cannot recover after X, but then you make sure that "accidentally" you might be able to recover by keeping copies around somewhere ... until someone realizes and you get sued.
that's an interesting question, i've given a little thought to this multi tenant saas stuff...
not sure if the right way forward is some sort of innovation in operating system and software design where people write and run apps that feel like single tenant apps attached to dedicated per tenant datastores where os and framework magic handle per tenant encryption and segmentation (tenant id as an os level concept)
or... if it makes more sense to encrypt at the record level with keys that only the customers hold, using homomorphic encryption (assuming it's up to the task) for things like searches and other backend functions.
either way, for now, soft deleting and following up with an automatic daily hard delete of things soft deleted more than x days ago is a totally reasonable approach.
ops scripts should require typing "yes i know what i'm doing" if someone attempts to hard delete things that have not yet been soft deleted.
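a rough sketch of what that could look like, again with python's built-in sqlite3. everything here is hypothetical: the `items` table, the `deleted_at` column (unix seconds, NULL while live), and the function names. the daily job only touches rows that were soft deleted at least x days ago, and the manual path refuses to hard delete a live row unless the operator passes the confirmation phrase.

```python
import sqlite3

CONFIRM_PHRASE = "yes i know what i'm doing"
SECONDS_PER_DAY = 86400


def make_demo_db():
    """In-memory table used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, deleted_at INTEGER)"
    )
    return conn


def purge_soft_deleted(conn, now, min_age_days):
    """Scheduled job: hard delete rows soft deleted at least min_age_days ago."""
    cutoff = now - min_age_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM items WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def hard_delete(conn, item_id, confirmation=""):
    """Manual ops path: live rows need the explicit confirmation phrase."""
    row = conn.execute(
        "SELECT deleted_at FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return False  # nothing to delete
    if row[0] is None and confirmation != CONFIRM_PHRASE:
        raise PermissionError(
            "row %d has not been soft deleted; pass the confirmation phrase"
            % item_id
        )
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    conn.commit()
    return True
```

in a real ops script the phrase would probably come from an interactive prompt rather than a function argument; it is an argument here so the sketch stays testable.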
Yeah, soft delete is the way to go in 99.99% of the cases, with a system set up to eventually hard delete on some schedule (preferably don't hard delete until X number of backups have caught the soft deleted data safely, for example).
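One way to express the "wait for X backups" condition, as a small sketch. It assumes you can list the completion times of your backups and that a backup taken after the soft delete has captured the row in its soft deleted state; the function names and the use of unix-second timestamps are illustrative assumptions, not part of any particular backup tool.

```python
def backups_covering(deleted_at, backup_times):
    """Count backups completed strictly after the row was soft deleted."""
    return sum(1 for t in backup_times if t > deleted_at)


def safe_to_hard_delete(deleted_at, backup_times, required_backups):
    """True only if the row is soft deleted and enough backups have caught it."""
    if deleted_at is None:
        return False  # never soft deleted, so never eligible for the purge
    return backups_covering(deleted_at, backup_times) >= required_backups
```

A scheduled purge could call `safe_to_hard_delete` per row (or fold the same condition into its query) before removing anything.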
Hi, this is Mike from Atlassian Engineering. Strongly agree with this. I'd say that if you can afford it, don't do the hard deletes on a schedule though. You never know when there's a system out there referring to soft deleted data that fails once the data is hard deleted. Hard deletes should feel frightening because they are frightening.
i disagree for one reason. you really don't want the tooling or the process to rot. running it automatically normalizes the scary. otherwise you have bespoke tools in indeterminate states being run by people who are learning how to run them again. that's when i believe things get dangerous.
if it forces additional fail safes or backups to be able to do so safely, then that's probably a good thing to have anyway, no?
> The GDPR and various things have made companies more skittish in doing things this way, because they get scared.
They may be scared. But are they scared enough to reload every single backup they have, purge the desired records, and resave each and every one of them? And to do so without worrying that they will corrupt/break the backups in the process?
GDPR compliance is a mess of contradictions and unreasonable asks which all seem to amount to "depends on who you ask."