I've been thinking a lot about how I manage my own data lately (notes, photos, code, reference material, etc.) and have concluded that the primary feature I'm looking for is longevity. I'm saddened by the amount of data I've lost over the years, either because of hard disk failures or third-party services going out of business, making it difficult to extract things, or getting too expensive.
In light of this, I'm biasing toward simple file formats managed by tools I write myself, and optimizing for cost in a way that I otherwise don't, since any recurring costs incurred by the system are effectively a lifelong commitment. I am relying on S3 for primary storage (so that it is accessible anywhere) but with a sync to offline backup.
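The sync-to-offline half is just a one-way mirror. A minimal stdlib sketch of the newer-file-wins logic that tools like `aws s3 sync` or rclone apply (this is an illustration, not the actual tool):

```python
import shutil
from pathlib import Path

def mirror(src: Path, dst: Path) -> list[str]:
    """One-way mirror: copy files from src that are missing from dst,
    or newer than dst's copy. Returns the relative paths copied."""
    copied = []
    for f in src.rglob("*"):
        if not f.is_file():
            continue
        target = dst / f.relative_to(src)
        if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves mtime, so re-runs are no-ops
            copied.append(str(f.relative_to(src)))
    return copied
```

Because `copy2` preserves mtimes, running the mirror twice copies nothing the second time, which is the property that makes a nightly cron sync cheap.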
So far, I've implemented a personal Zettelkasten tool (with built-in spaced repetition, so doubles as an Anki replacement) and a search engine that's based on Presto (via AWS Athena) so that I don't need to keep an Elasticsearch instance alive. I'm planning to build out other repository tools as I go.
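For illustration, the search side reduces to running a SQL query against an external table that sits over the raw files. A sketch, where `notes_index` and its columns are hypothetical names (not the actual tool's schema) and the Athena call is left commented out:

```python
def search_notes(term: str) -> str:
    """Build a Presto/Athena full-text query over a hypothetical
    external table of note files. Table/column names are illustrative."""
    safe = term.replace("'", "''")  # naive quote-escaping, fine for a sketch
    return (
        "SELECT path, body FROM notes_index "
        f"WHERE lower(body) LIKE lower('%{safe}%')"
    )

# Hypothetical execution against Athena (requires boto3 and an AWS account):
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=search_notes("zettelkasten"),
#     ResultConfiguration={"OutputLocation": "s3://..."},
# )
```

The appeal is that nothing stays running between searches; you pay per query instead of per hour of an Elasticsearch node.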
It's been very liberating to build tools that are never meant to be used by anyone other than myself, and with the confidence that the tools don't matter too much anyway since the underlying files are stored in evergreen formats.
What's the optimal setup for long-term, large-scale (personal) data storage?
I want to build one big backup. Some initial research has pointed me to something like Bacula to manage the backup process from a machine. Following the 3-2-1 rule, I know the backup itself needs at least 3 copies, on at least 2 different media (cloud/hard disk), at least one of which is off-site.
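The 3-2-1 rule is mechanical enough to write down as a check. A toy sketch (the example plan is mine, not a recommendation):

```python
def satisfies_3_2_1(copies: list[dict]) -> bool:
    """Check a backup plan against 3-2-1: at least 3 copies,
    on at least 2 media, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c["medium"] for c in copies}) >= 2
        and any(c["offsite"] for c in copies)
    )

plan = [
    {"medium": "hdd", "offsite": False},    # primary copy
    {"medium": "hdd", "offsite": False},    # local backup disk
    {"medium": "cloud", "offsite": True},   # e.g. an object-storage bucket
]
```

Dropping the cloud copy from `plan` fails the check twice over: only two copies remain, and none is off-site.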
As an individual, what's the best way to implement such a system? Should I buy one giant hard drive, build a RAID array out of several drives, or something else?
Oooh. I've been wrestling with this problem for a while now.
Basically I'm working on a tiered system. Files/dirs are categorized by size (<10MB, <25GB, >25GB) and by sensitivity (public, confidential, secure; importance is usually proportional to sensitivity). Fortunately, I've found that sensitivity is usually inverse to size. Anything public that makes sense as a repo goes on GitHub/GitLab. Confidential small stuff (sans keys) is just stored in Gmail/Drive. Big, boring stuff (music, ebooks) is just kept on external hard drives.
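That routing is simple enough to sketch. The fall-through for mid-size confidential data is my assumption, not something stated above:

```python
MB = 1_000_000

def route(size_bytes: int, sensitivity: str) -> str:
    """Route a file to a storage tier by the size buckets and
    sensitivity levels described above."""
    if sensitivity == "public":
        return "github/gitlab"
    if sensitivity == "secure":
        return "unsolved"            # the hard case
    if size_bytes < 10 * MB:
        return "gmail/drive"         # small confidential stuff, sans keys
    return "external-hdd"            # big, boring stuff (assumed fall-through)
```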
Secure, ultra-important stuff, I don't really have a system for.
The system I'm leaning towards is to encrypt archives and store the key/password securely, then treat the encrypted archives like any other boring data: a local NAS plus a cloud backup service of some sort, or drives stored offsite.
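A minimal sketch of that encrypt-then-store-anywhere flow, assuming the third-party `cryptography` package (a real setup would stream large archives rather than read them into memory, and would likely use a tool like age or gpg instead):

```python
import tarfile
from pathlib import Path
from cryptography.fernet import Fernet  # third-party 'cryptography' package

def encrypt_archive(src_dir: Path, out_path: Path) -> bytes:
    """Tar a directory, encrypt the tarball, write it to out_path.
    Returns the key, which must be stored safely and separately."""
    tar_path = out_path.with_suffix(".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    key = Fernet.generate_key()
    out_path.write_bytes(Fernet(key).encrypt(tar_path.read_bytes()))
    tar_path.unlink()  # drop the plaintext tarball
    return key
```

The resulting `.enc` file is exactly the "boring data" the comment describes: it can sit on a NAS, a cloud bucket, or an offsite drive without further protection, as long as the key lives elsewhere.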
Do you feel comfortable using cloud storage for so much of your content? My ideal is to be entirely self-backed-up: a personal git server, photo archive, etc., given the bandwidth, service costs, and vendor issues involved (dealing with Google seems like a nightmare, from what I've read online).
How did you construct your NAS? Is it a single system, or multiple hard drives/storage solutions connected to your network?
It depends. GitHub is not going down. Gmail is not going down. If they do, it's bug-out-bag time, and I'm working on curating the subset of information I'd need for that.
Ideally, yes, I would have my own entire backup system, but I frankly don't trust myself enough to do it right; hence some redundancy in the cloud.
You mention S3 and Athena, but also that you're building for longevity. Are you planning for the future obsolescence of AWS, or going to cross that bridge when you get to it?
The S3 files are mirrored to a local drive as a collection of plain .md, .jpg, etc. The Athena search index is secondary in importance to the source data and not necessarily permanent (presumably the options for "take this folder full of files and let me search it" will only improve over time).
That being said, one of the reasons I chose S3 vs. other AWS services or other companies is because I expect it to be around for a very long time. (Just because I've preserved the option of migrating away doesn't mean I relish the idea.)