
I run a small home lab, about 30 services

One day I decided to change my main disk and used the opportunity to rebuild everything from scratch and from backups. I was up in about an hour.

And then I spent a week fixing this and that, ah yes I changed that too and, crap, I cannot remember why this thingie is set up this way. And some more.

This is a one-man lab, with simple services, all on docker. I also work in IT.

Recovering from scratch a whole infrastructure managed by many people over the years is a titanic task.

I helped to recover my nearby hospital as a volunteer when it was ransomwared. The two poor IT guys over there had no idea how to recover, and the official help was pitiful.

I also helped with a ransomware attack on a large company. The effort people had to put in to remember why something was the way it was, or just to remember anything at all, was colossal. Sure, a lot of things were "documented" and "tested", but reality hit hard.



I had to rebuild a significant percentage of my homelab after my house was raided by the police and they took about $10k-worth of my gear; desktop, laptop, NAS, hard drives.

However, because in a previous life I'd been responsible for backups and involved in disaster recovery planning I was already kind of prepared with:

- a mirrored on site copy of backups (that they either didn't find or chose to leave behind)

- older hardware that had once been performing the duties of the existing seized gear (I'm a bit of a hoarder, I like repurposing or keeping for just such an occasion)

- multiple off site backups

- pretty good documentation of my setup

I was back up and running within a day or two and had lost maybe a couple of days of data. And it's a home lab, so nothing super important anyway, but a (not really) nice resilience test.

It also gave me the experience to work out a few structural changes to further limit the impact of an event that takes out a bunch of processing and storage.

(After 8 months they told me to pick up all my gear, they found nothing, but thanks for traumatising my kids)


Why did they raid you?


Short version:

Possibly the worst thing to be raided for: distribution of CSAM.

Apparently based purely on the 'evidence' of my IP address being on some list - that's the only explanation I ever got.

Funny thing is, they did so little background research they didn't even know to expect kids in the house when they raided at 6:30am.

It still triggers me. This was in August 2022. I wrote pages and pages of my memories and thoughts about it, and it still makes me angry for about ten different reasons.

The long version I haven't written yet and probably never will. I don't want to dwell on it, I want to get on with my life and have an even worse drama to deal with at the moment: https://news.ycombinator.com/item?id=44533637

I know I'm alive, that's for sure. I'm trying to make lemonade by the goddamn bucket load.

P.S. I have written prior HN comments referring to the raid if you care enough to go back that far.


> It still triggers me. This was in August 2022. I wrote pages and pages of my memories and thoughts about it, and it still makes me angry for about ten different reasons.

As someone who was arrested in his PJs at 4am due to a false accusation that the police did not investigate and for which they did not have probable cause, I feel this in my bones.

$15k in legal fees, a day in jail, and three months later, the charges were dropped because, as per the DA, “we cannot in good conscience pursue this case”.

No consequences for the person who made the false accusation, or the officer that enacted an arrest without probable cause.

My heart still skips a beat whenever I think I hear a knock at the door or noises in the middle of the night. I’ll wake up from a dead sleep in a panic. In theory I could pursue a lawsuit against both the accuser and officer, but that feels overwhelming — I’ve just tried to move on.

It completely changed the way I see the police and the criminal justice system. The process is, in and of itself, punishment.

I was fortunate enough to be able to afford good legal representation, and I now have a great deal of empathy for those who are railroaded by the system because they cannot.


I'm only postulating here, but I have a strong suspicion that my ability to get past it as well as I have (not perfectly, but I don't have nightmares or break into a sweat when I hear police sirens - although it does remind me almost every time) is because I had to give the impression to my kids that there was nothing to worry about; this is not a big deal; this is not traumatic and will not change the course of your life.

> It completely changed the way I see the police and the criminal justice system. The process is, in and of itself, punishment.

One million times this. Even if it's not completely true at its base, it's true for those unfortunate enough to have experienced this.

One of the funny (?) things about it, is that most of the individual officers were nice enough people, personable (other than one inflammatory officer who should not be employed in a role that comes into contact with people), one even gave the cat a pat. It's the system / systems, and likely the perverse incentives for their monthly stats.

I'm glad I didn't end up needing legal representation.

The way I deal with it is to actively let it go, forgive the situation in my mind, treat it as one of those anomalous circumstances that happen every second on a global scale. Treat it as a reminder of the unpredictability of life and an opportunity to be thankful that it's not my 'normal'.


>It completely changed the way I see the police and the criminal justice system.

Well, hopefully it has also changed the way you see society in general. It's terrifying how commonly people willingly (even when they have options) defer to the govt./authorities. The system didn't get this way overnight. Covid was one example of the tyranny of the majority.


That's absolutely horrifying! Glad to hear you've managed to move past it, as it would have absolutely broken me.

My home was searched by the police for something much less serious (buying lab equipment, completely legally), and the experience left me having panic attacks every time there was a knock at the door.


It makes me crazy that police in the U.S. nowadays can get a search warrant permitting seizure of large amounts of valuable computer and networking gear along with digital devices certain to massively disrupt anyone's life - only from buying things which are completely legal to buy and possess. Apparently all it takes is "a suspicious pattern of behaviors" to get a judge to issue an expansive warrant. The "suspicious pattern" is often defined ad hoc by police under no objective standard and never detailed in the warrant request. Judges are really failing in their duties because there are too many cases like this happening.

Depriving people of their valuable property for 8 months or more is also abusively punitive. In warrants that grant seizures of all or most digital devices, judges should require police to return the items within 30 days if they don't either file charges or go back to the judge with good cause for an extension. If police can't get around to actually looking at the evidence they claimed was so crucial within 30 days, maybe it's not a high-priority crime. And if having a reasonable time limit makes it too hard to look through so much stuff, they're free to more narrowly tailor their seizure requests so they don't have so much to trawl through.


And my experience was in Australia, so the "decline of policing" has well and truly reached our shores as well.

To possibly make this even more frustrating, when I was told I could pick up my gear, the detective in charge said that a few things they found were flagged as suspicious:

1. I had / used virtual machines

2. I had "Tor" on my computer(s)

3. I had downloaded stuff from Megaupload

Now I'm not entirely sure whether these comments were based on what they found on my seized gear, or whether these were actually sufficient 'red flags' to make them think the warrant was justified initially, but, my god, how completely out of their depth, and therefore totally unqualified, they are to make life-changing adjudications about these things. And their access to metadata only makes it more likely that they'll make false-positive mistakes (which is just terrible for society overall).

I'm literally not sure what they meant by saying "you have tor on your computer", whether there's evidence of my having visited the dark web, or just having a (way outdated) copy of the tor browser saved somewhere.

And I think the only things I'd ever downloaded from Megaupload were Android ROMs.

Regarding Virtual Machines: I can't even... they're obviously non-technical so couldn't possibly understand, and yet... gah, I can't even...


This is extremely concerning. I was reading this thread thinking thank god this could only happen in the US.

My concern is around the sequence of events that needed to take place for this to happen to you. Also as a former network operator I want to know how laws like the data retention act, identify and disrupt, etc play a role in these situations - ie who triggered what. I think I’ll review your comment history.

Sounds like you have handled it in about as healthy a manner as one could hope. I say that as a compliment.


Were you actually charged and prosecuted for anything, or did they just steal your gear without due process?


They had a warrant for the raid. Or at least they showed me a piece of paper, but my mind was so thrown that I literally couldn't read it (I've never experienced such a thing before or since - I literally couldn't make out letters on the page, such was my state of shock at the time).

I wasn't arrested or charged, they found nothing of what they were looking for on the multi terabytes of disks they seized. No further action other than the raid.


Did they eventually return your gear?


> (After 8 months they told me to pick up all my gear, they found nothing, but thanks for traumatising my kids)


Don't forget yourself, the breadwinner of the household!



In the USA? Where you can be sued if someone slips on your sidewalk? Can't you sue the gvmt?


Australia.

We looked into anything that could be done to minimise the chances of such a thing happening to innocent parties, but the only option was to make a complaint about an individual officer. There's no (easy, obvious) way to question the system they use to determine "validity" of raids or due diligence prior to requesting a warrant, or evidence required to justify a warrant.

The whole thing just felt to me like it was blindly rubber stamped all the way through because "protect the children". Pity my daughter was a child and absorbs such experiences... My son was also a child, but he's less affected by such things.


It's good for children in the US to learn early that they can't trust the police.


Same goes for almost any other country I've been to. But the US does seem to be one of the worst places from what I see online.


In which country can the children trust the police?


I know I'm going to get a lot of flak for looking like I'm shilling, but the UAE, ironically. They do not mess around with kids, and make sure they're not exposed to whatever issue the parents might be facing. In many cases, the police allow for lenient visitations for the mother and children. These instances are often not portrayed online because 1.) family guys tend to be less involved in open crime whatsoever, 2.) the UAE has a large singles population so whatever instances happen are very rare, and 3.) the surveillance state ensures that the police already know who's at fault and who isn't.

But God forbid if you're ever caught for any crime whatsoever. Or if you're detained for domestic violence. Especially if not in Dubai (which is miles more lenient than other emirates).


In general, the most civilized part of the world: Western and Northern Europe.

So, Norway, Denmark, Sweden, Finland, Germany, Belgium, the Netherlands, France, Spain, Austria, Switzerland


You can be, and people have been, arrested and sentenced to prison in most of those countries for posts on social media.


Do you have some specific examples? You present it as if it's a bad thing but I can imagine scenarios where it makes perfect sense. As an obvious example the person could be sharing CP on social media.

So what exactly are you referring to?



The Telegraph article does seem a bit ridiculous but the others seem fairly fine to me. One of them wasn't even an arrest, it was just a politician pressing charges which is his right.

I think it's fair to demand that people follow laws on the internet as in person. Germany has laws against supporting nazism and for good reason. I don't give a crap about those people's right to free speech. The laws against insulting politicians seem a bit less reasonable but honestly just don't call people names online. Can't say I'm bothered by these articles.


Most Germans wouldn't be able to recognize Nazism if it was doing the harlem shake on their dining table.

Mention to them that the only way Communism and Nazism differ is that Nazism followed Engels's nationalist approach to overthrowing capitalism rather than Marx's international one, and that the Nazis were obsessed with spiritual and Iranian occultism, and you're met with blank stares.

Yet those are the fundamentals of Nazism from which everything else derives. Including the Holocaust.

The same goes for the understanding of the Nazis' "nationalism". Most Germans just won't believe you when you tell them that it was an absurd use of that term and much more similar to the modern "pro-EU" mindset than to the modern understanding of nationalism.

The Germans are propagandized into thinking the definition of Nazis is being called "rightwing" by the media and wanting decentralization of power and cultural homogeneity (modern nationalism). Yet the left:

- already has its SA precursors back on the streets (Hammerbande, Antifa, "NGOs")

- is getting rid of free speech to the maximum extent they can somehow pull off within the limits of the law, and beyond that through mandatory voluntary industry initiatives (e.g. "Trusted Flaggers")

- is manufacturing external enemies at fault for all wrongs (the Russians)

- is spinning up the war machine (massive military spending at 5% of GDP, near limitless commitment to finance the Ukraine war instead of trying to push for peace)

- is centralizing more and more power in the hands of highly opaque or even unelected institutions (e.g. the EU or international treaties)

And various other stuff, e.g. some of the things you can hear at party congresses of the German "Die Linke".


Are you by any chance a European?


He said not to call people names, dummy.


China?


Japan, England


In England the police arrest you for a tweet under hate speech laws, and they threw the post office workers under the bus to protect the politicians and Fujitsu's buggy software. Not a place where I'd trust law enforcement at all.

And Japan, while clean, safe and kawaii, has a legal system with something like a 90%+ conviction rate, so make of that what you will.


Can they? I've heard of police in Japan pinning murder cases on people they don't like. I believe there has been some reporting on this related to why they have such high clearance rates. Don't the police in the UK still have a lot of sexual misconduct scandals?


> In the USA? Where you can be sued if someone slips on your sidewalk? Can't you sue the gvmt?

Sure you can sue anybody for anything. Whether your case actually gets heard or not is another consideration. And even if it gets heard, the judge can simply dismiss it for a variety of reasons before proceeding to trial.

Also, state and federal governments have sovereign immunity, and their officials have qualified immunity. Basically, the government has to allow itself to be sued.

True, this doesn't apply to counties or cities; however, there is still a much higher bar for tort claims even against local police. Generally, if they are operating within the law, like executing a valid search warrant, the standard is much higher than it would be for an average citizen.


You can sue the government, but the grounds for winning are much narrower.

Merely suffering harm from government action is not sufficient. Having property impounded as part of an investigation, pursuant to a warrant, is likely not actionable unless there was malice involved. Using slim evidence isn't really actionable.


The government has endless resources; you would go bankrupt unless a law firm saw a huge payout in taking your case. The system is rigged in favor of the government. They could have burned down his house and the neighbor's house, and not been responsible. Land of the free, God Bless America......


Also, there is almost no deterrent effect. The people who authorized or perpetrated the abuse are not punished if you sue and win a settlement. They don't even have to hire and pay the lawyers. The payment comes out of everyone's taxes, perhaps with interest if the government has to pay by issuing debt.

When the police abuse their power, it's the community that pays their salaries that feels the pain.


This was in Australia but your point stands.


Why would you be able to sue the government for conducting a search authorized by a judge? It's expected that the result of some searches is "Oopsie doopsie nothing found".


It's even worse than that: in the US, police have broad latitude to destroy property, kill pets, seize any cash or assets (theoretically related to the crime, but very easy to abuse), etc. while executing a search, with little to no recourse.


Not just kill pets, but kill people. Even if they raided the wrong home.


Yeah, it’s true.


I think it's fair to expect that the authorities must have a very good probable cause to perform a search of your home, and that any search that turned out to be unwarranted results in a big compensation and a public announcement stating that the specific police department and judge violated the right to privacy.


I'm guessing search should still happen in a way to limit damage (physical, psychological) to other parties (in this case, the kids present).


You could sue if they made some major mistakes or were fabricating evidence or some other significant malpractice. It's a pretty high bar.


You can but chances are you'll still lose.


If your downspout is draining onto the sidewalk and turning it into an invisible ice rink...


I woulda just left it at "screen name checks out"


Yeah, BLKNSLVR came from "Black and Silver" based on some crappy piece of art I did way long ago. It wasn't until using it as an in-game handle that a friend read it out to me as "Black Enslaver" that I even realised it could read another way.

I don't like that it can be interpreted that way, but I also refuse to stop using it since that's someone else's reading.

The only factor I use to treat one human different from another is whether they're a jerk or not, and I have to know them well enough to work that out. I've come across a few jerks in my time, and they take many and varied forms.


Thank you for the explanation! I totally forgot about this, and I appreciate the clarification. Now I can read your screen name as it's meant to be. Btw, if I came off like a jerk - not at all my intent. My brain made the association and I couldn't unthink it.


This is why documenting is so crucial. Even on a software architecture level.

A few months from now, I'd love to have written down decisions for my current project:

- Why did I decide to use Kysely over Drizzle, Knex, Prisma, TypeORM or another ORM/SQL tool?

- How am I going to do migrations?

- Why am I using one of Deno/Bun over sticking to nodejs?

- Why did I structure the project as a directory per feature over controllers/models/services directories?

- Why did I fork this library and what are the steps to keep this thing updated? Do I plan to upstream my changes? Is there a GitHub issue or PR about it?

- Why am I hosting in one of AWS/GCP/Azure? Why not lambda functions? Why docker?

- Why did I pick this specific distribution of kubernetes over the other also lightweight alternatives?

- Why did I even start this project and what do I aim to accomplish with it?

So I created a # Decisions section in README.md

This way I don't keep doubting my own decisions and wasting time opening 20 documentation tabs to compare solutions yet again.
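
For anyone who wants a concrete picture: a minimal sketch of what such a section can look like (the dates, tools and reasons here are invented purely for illustration):

    # Decisions

    ## 2024-03-01 - Kysely over Prisma/Drizzle/Knex
    Wanted plain SQL with type safety and no schema DSL.
    Revisit if we ever need heavily nested relation queries.

    ## 2024-03-05 - Migrations
    Plain .sql files in ./migrations, applied by a small runner script.
    Rationale: no magic, every change is diffable in review.

Two or three lines per entry is enough; the point is that future-you only has to read, not re-research.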


I use GitHub Issues for this. It works so well - any time I make a decision I drop a comment on the relevant issue (often formatted as "Decision: ..."). Now they are archived, searchable, accessible via API and easy to navigate to from my source code because my commits all reference the issue number that relates to the change.


TIL GitHub deletes your account randomly! It happened to a friend of mine recently, and he didn't get any explanation or recourse.

Of course, you have a relatively high profile, so could probably avoid it/get it reversed.


What do you use for archiving github issues?


I've tried a couple of things. I wrote a tool for exporting them to SQLite: https://github.com/dogsheep/github-to-sqlite

I've also tried a mechanism where I have GitHub Actions write them out as JSON files in the repo itself, then I can git clone them in one go: https://gist.github.com/simonw/0f906759afd17af7ba39a0979027a... and https://github.com/simonw/fetch-github-issues


you could just call the github API
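
For what it's worth, a minimal sketch of that approach in Python using only the standard library (the repo name is a placeholder; this only fetches the first page, and note that GitHub's /issues endpoint paginates via the Link header and also returns PRs):

    import json
    import urllib.request

    REPO = "OWNER/REPO"  # placeholder
    url = f"https://api.github.com/repos/{REPO}/issues?state=all&per_page=100"

    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        # "Authorization": "Bearer <token>",  # needed for private repos / higher rate limits
    })

    with urllib.request.urlopen(req) as resp:
        issues = json.load(resp)

    # Archive titles, bodies, labels, comment URLs, etc. as raw JSON.
    with open("issues-archive.json", "w") as f:
        json.dump(issues, f, indent=2)

The tools mentioned upthread already handle pagination, comments and incremental updates, so this is only worth doing if you want zero dependencies.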


Every project I work on has a technical-decisions.org file. Also a daily-notes.org file with every failed experiment, test, install command, etc. The top level headings are dates.

Technical decisions used to be in the daily-notes.org file, but keeping them in a separate file makes them more accessible to LLMs. I actually started that practice before LLMs were in common use; I struggle to remember why.


> I struggle to remember why.

Should that "why" be in technical-decisions.org or daily-notes.org?


It should have been the first entry in some project's technical-decisions.org file!


This is why in 2023 I started livestreaming whenever I work at my PC. I also take these kinds of daily and project notes, but it's a bit tedious and can take you out of the flow. So I just let YouTube capture everything I'm doing, and if I need to go back and remind myself of something (or ask an LLM a question about my livestream history, in the not-too-distant future), it's all right there.


We just recently started using ADRs (Architectural Decision Records). They are deliberately stored (in markdown) in the same repository as the source code for our SaaS business lives. If we can recover the source, chances are high that we can also recover the "why's". If we cannot do that, we are screwed anyways.


This. I encouraged my team to use a templated (standardized) ADR for any big decisions that don't have an obvious answer or complete consensus, and it has reduced the second-guessing and relitigation of decisions to nearly zero. It also gave us a good snapshot of where we were when we made that call, so historic decisions weren't disparaged.


Could you share the template you're using?


There is an open community proposed standard template for ADRs, but I don't have the link
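
The one I see most often is (roughly) Michael Nygard's format, give or take the MADR variations. From memory, so treat the exact headings as approximate; the example decision is made up:

    # ADR 0007: Use PostgreSQL as the primary datastore

    ## Status
    Accepted (supersedes ADR 0003)

    ## Context
    The forces at play: constraints, requirements, options considered.

    ## Decision
    What we chose, stated in full sentences, and why.

    ## Consequences
    What becomes easier or harder as a result, and any follow-up work.

Searching for "adr-tools" or "MADR template" should turn up the canonical versions.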


You also have to document alternative workflows for your business for the period before you get everything back to normal.

Lots of things can keep going with pen and paper or some cloud software.

At the very least, you have to communicate with your clients.


Modern IT practices don’t really contemplate disaster recovery. Even organisations with strict backup procedures seldom test recovery (most never at all).

Everything is quickly strapped together due to teams being understaffed. Preparing infrastructure in a way such that it can easily be recreated is easily twice the effort as “just” setting it up the usual way.


Actually I think this is hard to properly implement. If you're a small shop, really setting up backups with redundancies, writing the documentation, and testing disaster recovery, that's so much more work than people anticipate, and it has implications on all areas of the business, not just IT. So usually it's hard to justify to management why you would put in all that work and slow down operations—which leads to everyone postponing it.

Either that bites you sooner or later, or you're lucky and grow; suddenly, you're a larger organisation, and there are way too many moving parts to start from scratch. So you do a half-hearted attempt of creating a backup strategy held together by duct-tape and hope, that kinda-sorta should work in the worst case, write some LLM-assisted documentation that nobody ever reads, and carry on. You're understaffed and overworked anyway, people are engaging in shadow IT, your actual responsibilities demand attention, so that's the best you can do.

And then you've grown even bigger, you're a reputable company now, and then the consultants and auditors and customers with certification requirements come in. So that's when you actually have to put in the work, and it's going to be a long, gruelling, exhausting, and expensive project. Given, of course, that nobody fucks up in the meantime.


Indeed. Setting up infrastructure properly and documenting it properly is even more complex than coding, to me.

I can go back to code I wrote months or years ago, and assuming I architected and documented it idiomatically, it takes me only a bit of time to start being able to reason about it effectively.

With infrastructure it is a whole different story. Within weeks of not touching it (which happens if it just works), I start to have trouble retaining a good mental model of it. If I have to dig into it, I'll have to spend a lot of time getting re-acquainted with how it all fits together again.


As much as Cloudformation and Terraform annoy me (thankfully I’ve never been burdened with k8s) there is something magical about having your infrastructure captured in code.


Just the other day one of my clients had a production critical server failing and we started restoring it from backups.

Turns out some of the software running on it had some weird licensing checks tied to the hardware so it refused to start on the new server.

It turns out that the company that made this important piece of software doesn't even exist anymore.


Virtualization really helps. We have a lot of weird software which requires hardware dongles, but they're all USB dongles and they're all virtualized, one of the DC racks has a few U worth of just USB socket -> dongle wired up so that if we spin up a VM it can say "Hey, give me a USB socket with a FooCorp OmniBloat dongle on it" and get one unless they're all used.


would certainly be interested to learn more about this
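
Not the parent, but the usual building blocks are either dedicated network-attached USB hubs or USB/IP, which is built into Linux and exports a USB device over the network so a VM elsewhere can claim it. A rough sketch, with bus IDs and hostnames obviously being placeholders:

    # on the machine that physically hosts the dongle
    modprobe usbip-host
    usbipd -D                       # start the USB/IP daemon
    usbip list -l                   # find the dongle's bus id, e.g. 1-1.4
    usbip bind -b 1-1.4             # export it

    # on the VM that needs the dongle
    modprobe vhci-hcd
    usbip list -r dongle-host       # see what's exported
    usbip attach -r dongle-host -b 1-1.4

The commercial "USB over IP" appliances do essentially the same thing with nicer management and per-port access control.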


> Turns out some of the software running on it had some weird licensing checks tied to the hardware so it refused to start on the new server.

This is around the time when you call that one guy on your team that can reverse engineer and patch out the license check.


An interoperability exception might allow this in exigent circumstances when you do have a valid license, but I wouldn't do this without running it by the software vendor whose license you are using. In a recovery situation, you'll probably need to be on the phone a lot, so I can see how you might think it's quicker to bypass the license check, but that is one person giving some or all of their attention just to that. Disaster recovery isn't a one-person job unless that one person was the whole team anyway, so I think this idea needs to be calibrated somewhat to expectations.


This is a nightmare kind of discovery. I had a similar one, but fortunately, it wasn't as impactful.

This is why I like Docker: if you keep the sources, you can recover no matter what (at least as long as the "no matter what" holds water).


> This is why I like docker,

my understanding is that docker would not have helped in that scenario


it really depends on the scenario but if the application was dockerized and they had an image, it would be just starting it again, somewhere else.

Possibly with the same network settings if the licensing check was based on that.

But of course it can easily go south, though testing the recovery of a container based off an image and mounted volume is simple and quickly shows you if it works or not.

But of course it may work today but not tomorrow because the software was not ready for Y2K and according to it we are in the XX century or something and the license is 156 years ... young. Cannot allow this nonsense to proceed, call us at <defunct number>

IT is full of joy and happiness
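
Concretely, the drill I have in mind is something like the following (image and volume names are placeholders). If the service comes back up on a clean host from nothing but the saved image and the volume tarball, the recovery path works:

    # backup: the image and the data volume
    docker save myapp:prod | gzip > myapp-image.tar.gz
    docker run --rm -v myapp_data:/data -v "$PWD":/backup alpine \
        tar czf /backup/myapp-data.tar.gz -C /data .

    # restore test, ideally on a different machine
    gunzip -c myapp-image.tar.gz | docker load
    docker volume create myapp_data
    docker run --rm -v myapp_data:/data -v "$PWD":/backup alpine \
        tar xzf /backup/myapp-data.tar.gz -C /data
    docker run -d --name myapp -v myapp_data:/data myapp:prod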


> it really depends on the scenario

yeah and that scenario was clear:

> Turns out some of the software running on it had some weird licensing checks tied to the hardware so it refused to start on the new server.


"hardware" does not mean "bare metal". It could be a MAC, a serial number or similar things that may be linked to a generic or clonable value in virtualization.


but docker isn't virtualization, you understand this, right ?


To some extent, yes -- having developed apps that were dockerized, and having managed virtualization systems (ESXi and similar), as well as docker engines.

I am not sure I see your point, though.


If you’re doing it right, the DR process is basically the deployment process, and gets tested every time you do a deployment. We used chef, docker, stored snapshot images, and every deploy basically spun up a new infrastructure from scratch, and once it had passed the automated tests, the load balancers would switch to the new instance. DBs were created from binary snapshots which would then slave off the live DB to catch up (never more than an hour of diff), which also ensured we had a continuously tested DB backup process. The previous instance would get torn down after 8 hours, which was long enough to allow any straggling processes to finish and to have somewhere to roll back to if needed.

This all got stored in the cloud, but also locally in our office, and also written onto a DVD-R, all automatically, all verified each time.

Our absolute worst case scenario would be less than an hour of downtime, less than an hour of data loss.

Similarly our dev environments were a watered down version of the live environment, and so if they were somehow lost, they could be restored in the same manner - and again, frequently tested, as any merge into the preprod branch would trigger a new dev environment to automatically spin up with that codebase.

It takes up-front engineering effort to get in place, but it ended up saving our bacon twice, and made our entire pipeline much easier and faster to manage.
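
For readers who haven't seen a pipeline like this, the cutover step itself can be surprisingly small. A rough, nginx-flavoured sketch of the idea (project names, ports and paths are all invented, and the smoke-test script is hypothetical):

    # build and start the candidate stack alongside the live one
    docker compose -p app-green up -d --build
    ./smoke_tests.sh http://localhost:8081 || exit 1     # hypothetical test runner

    # flip the load balancer from blue to green
    ln -sfn /etc/nginx/upstreams/green.conf /etc/nginx/upstreams/active.conf
    nginx -t && nginx -s reload

    # keep the old stack around for ~8h as a rollback target, then:
    docker compose -p app-blue down

The interesting part of the parent's setup is that this same machinery doubles as the continuously tested DR procedure.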


> Modern IT practices don’t really contemplate disaster recovery. Even organisations with strict backup procedures seldom test recovery (most never at all).

I think this is an outdated view. In modern enterprises DR is often one of the most crucial (and difficult) steps in building the whole infra. You select what is crucial for you, you allocate the budget, you test it, and you plan the date of the next test.

However, I'd say it's very rare to do DR of everything. It's terribly expensive and problematic. You need to choose what's really important to you based on defined budgets.


Budgets - and lowering them - win every time. I do budgeting and forecasting for SaaS companies and this kind of work is always the first cut


Is there a recurring theme for why? There is huge risk exposure.


People round down small risk to zero risk. Meanwhile the cost to run a full DR drill is a certain and immediate cost to their budget.


That's a choice that companies make. I've certainly worked at places which don't test DR, while at my current job we do annual DR runs, where we'll bring up a complete production ready environment from scratch to prove that the backups work, and the runbook for doing a restore actually works.


I'm retired now, but the last place I worked estimated it would take months to do a full restore from off site backups, assuming that the data center and hardware were intact. If the data center was destroyed... Longer.


Say what you want about European financial organizations, but they are legally obliged to practice their recovery strategies. So every other month, production clusters with all user data get torn down in one cloud region and set up in another one overnight. This works surprisingly well. I guess they would never do that without the legal requirements.


I used to find it amusing how many people thought Backup was a requirement.

"No, Restore is" I would say to stunned faces...


In the 1990s mainframes got so stable and redundant that some were not rebooted in over a decade - they could even upgrade the kernel without rebooting. Then one company had a power failure and the backup generators failed. When the power came back, it was months before they figured out everything it was doing and then how to start that service when the guy who originally set it up had quit years ago.

Most companies then started rebooting the mainframe every six months to ensure they could restart it.


That's why I delete all my company's data stores every quarter too!


I was very supportive of the infrastructure IT team when they moved their datacenter. I also had popcorn when watching the switch being figuratively flipped on.

It went surprisingly well despite having stayed 15 years in the old DC without rebooting. They were super scared of exactly the case you described but except for some minor issues (and a lot of cussing) it was OK.


The data center where I work self-tests this stuff unintentionally a couple of times a year. The typical case: UPS maintenance, room is put on bypass, load drops when switching back.


This is arguably better than not having any tests.

This is why I reboot my server from time to time after having applied patches or made more significant changes, despite the fact that "it should not change anything". This is a good moment to realize that it did change something, and you have the opportunity to fix the issue while it is fresh in your mind, and possibly with more time.


On the other hand, I’ve worked in places where the total destruction of IT (so as to start again from a clean slate) was within the Overton window of options for how to transform the business.


> I helped to recover my nearby hospital as a volunteer when it was ransomwared.

I'm curious about how you got in the door here. Very cool, but isn't healthcare IT notoriously cagey about access? I've had to do PHI training and background checks before getting into the system at my (admittedly only 2) PHI-centered jobs.

Granted, if it was such an emergency, I could see them rushing you through a lite version of the HR onboarding process. Did you have a connection in the hospital through whom you offered your services?


The nature and place of my work helped to quickly clear this.

I volunteered to help because I knew that even broadly planning the recovery, evidence preservation etc. would be completely beyond the capabilities of the two IT folks (they were extremely nice and helpful, and glad that there was someone to help).

I was there to draw things on the board and ask the questions that would help with the recovery. I did not have (nor want, nor need) access to patient information. This is something I warned them about early in the process, as the chaos was growing.

You need to imagine a large hospital completely blocked, with patients during an operation being stabilized and driven away.

I am used to crisis situations, and someone who will anticipate the things you do not think about (how to communicate, how to reach people who have planned procedures, who does what and who talks with whom) is a useful person to have before the authorities kick in.

My wife had a planned operation that morning and I was on site when the ransomware hit; that's all there is to it. Nothing James Bond-like, just sheer luck to have been around.

The hospital made a recovery but it took about a year IIRC


That's really cool. I was mostly envisioning hands-on admin stuff (because that's the work I'm most familiar with), but I hadn't thought about how much of a boon it would be to have someone with incident management experience arriving to help out. If you ever do a write-up about your experience, I'd love to read it.


In that case, to be frank, doing just incident response would not have been enough. They needed guidance on what to do and what not to do, technically speaking, so that on the one hand they had some hope of getting things started up again, while also preserving evidence.

Even the sequencing (recover and secure the network, then the AD, then some Tier-2 apps etc.) was something they were not ready for. I cannot blame them - the way these things are managed is really messy, with no clear responsibilities beyond the everyday operations.

My hope is that the continuous attacks on the national infrastructure (such as hospitals) will build a more coordinated and homogeneous approach. That would be a great lesson learned.


> Recovering from scratch a whole infrastructure managed by many people over the years is a titanic task.

Half of the work is to know what you need, the other half is to know how to do it, while the third half is to cope with all the undocumented tinkering which happened along the way. So in that regard, starting from scratch can be acceptable, as long as you are not starting from zero and can build on the knowledge and experience of the previous run(s). I mean, there is a whole gaming genre about this, which is quite popular. And usually you have the benefit that you might be able to fix some fundamental failures which you had to ignore before because nobody wanted to take the risk.


wait, what games are in that genre?


I think they are talking about roguelikes/roguelites


I've worked in tech all my life, and long ago learned how important it is to be impeccably diligent in documenting build processes whenever creating, deploying or adjusting new architecture.

Now it's simply become part of my engineering hygiene - as natural and effortless as brushing my teeth.

Actually drilling your DR is also crucial. If you never put it to the test, your documentation isn't worth the paper it's printed on.

In fact, the last few years I've been thinking about ways for these systems to rebuild themselves on a continuous basis. E.g. I'd love a smartphone that completely restores itself from backup every night, even to brand new identical hardware, including secure element artifacts (either via private keys I securely control or by reregistering everything in an automated fashion), with no user-noticeable impact.


Yeah, I had similar experiences, but now I use nix, which solves these problems.


I don't really know nix, but have used Ansible to try to have all configuration version-controlled and automated. But if there's any possibility of making changes outside of that, you have to be very disciplined. As soon as someone makes a one-off manual change to a crontab or a systemd unit, you're screwed.


NixOS just doesn't let you do that in the nominal case; most of /etc consists of symlinks to a read-only partition that is managed by nix. It is actually more difficult to do one-off scripts or config changes via files than it is to do so via nix, at least nominally. There is of course software that has its own special config format or that keeps its config in a database, but those get snapshotted and backed up anyway.

Imo, nix is more finicky, but more of a complete solution than ansible.


> but now I use nix, which solves these problems

Um, sorry but what do you mean ?


Everything is configured via nix, I can swap out the hardware and redeploy everything from 0 with a single command invocation.
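
To make that concrete, a tiny sketch of what a fragment of such a configuration can look like (hostname, container and paths are invented; real setups usually split this across modules):

    # configuration.nix (fragment)
    { config, pkgs, ... }:
    {
      services.openssh.enable = true;

      # containers are declared alongside the OS config
      virtualisation.oci-containers.containers.pihole = {
        image = "pihole/pihole:latest";
        ports = [ "53:53/udp" "8053:80/tcp" ];
        volumes = [ "/var/lib/pihole:/etc/pihole" ];
      };
    }

    # rebuild / redeploy the whole machine (flake-based setup):
    #   nixos-rebuild switch --flake .#myhost

New hardware then just means: install NixOS, point it at the repo, run the rebuild.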


What if something happens to where you keep your configuration?


It is in git, I have backups. The secrets are not backed up, but those I can recycle if need be.


Chamath's new company 80/90 is targeting this pain. Large firms often have no idea what their software is trying to do. Rebuilding it is cheaper and leads to better software.


If you are professionally responsible for infrastructure, and you haven't thought very hard about the "how do we rebuild from scratch" case, you're committing professional malpractice.

The problem is environments like hospitals, which "cost center" their IT department to death, where even the most seasoned pro has no chance to ever do the right thing. There should be liability at the board level. There never will be.


- I script the full setup of my servers where I can (mostly NixOS and some Debian).

- daily backups, local + remote (Backblaze with a 60-day, read-only retention strategy, one bucket per service; see the sketch below)

- monthly offline backup

- a preprod server where my users can restore the entire environment for testing purposes (CI)

In case of a full house fire, I can be back online within a working day.

PS: I only have some TBs of data, so it's quite easy to do.
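
For the remote part mentioned above, a sketch of one common way to get that kind of retention with restic and Backblaze B2 (bucket names and paths are placeholders, and my actual tooling may well differ from yours):

    # one repository (bucket) per service
    export RESTIC_REPOSITORY=b2:homelab-nextcloud:/
    export RESTIC_PASSWORD_FILE=/root/.restic-pass
    export B2_ACCOUNT_ID=...         # B2 application key id
    export B2_ACCOUNT_KEY=...        # a key without delete capability approximates "read-only"

    restic backup /srv/nextcloud --tag daily

    # keep ~60 days of dailies, drop the rest
    restic forget --keep-daily 60 --prune

Note that --prune needs delete rights, so it is typically run with a separate, more privileged key (or from a trusted host) rather than from the machine being backed up.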


> I helped to recover my nearby hospital as a volunteer when it was ransomwared.

How did they prevent threat actors presenting themselves as volunteers, were you vetted?


A real person showing up is a huge cost and risk. No threat actor will continue an attack on just a hospital like that. The economics make no sense and any money is already extracted. Ransomware shops are very happy to just shotgun the internet from afar.

A far bigger risk is accepting incompetent volunteers if anything.


I didn't say anything about original attacker continuing the attack.


The same answer still applies: That attack vector doesn't have a positive ROI.


The nature of my work helped to quickly sort that out


I would hope that military facilities follow better standards and are capable of recovering more quickly than a hobbyist...


>This is a one-man lab, with simple services, all on docker. I also work in IT.

TBH your mistake was only running one layer of virtualization. What I do on my home setup is run a docker in a VM in a VM in a docker in a docker in a VM in a docker in a VM in a VM in a docker. This, I feel, ought to be the minimum level of indirection and virtualization in any technical configuration in perpetuity. Anything less is bush league and prone to errors.


side remark: I like the ambiguity of titanic (giant) task and Titanic (1912) task :-)


Don't forget Titanic (1997)! :D


That's why Infra as Code is very, very important.


Not really, the OP was already using docker, but even with IAC on a small home lab like this you're going to modify one or two small things manually here and there over the years.

Sure it can help, but it's just not the one-shot fix people think it is. If you want a good test of your IAC, try provisioning a brand new environment for the first time using only your IAC.


The first rule of IAC club is that you do not hand modify your infrastructure.

The second rule of IAC club is that you do not hand modify your infrastructure.


If you hand-modify a system that's already under IAC, you are not doing IAC. IAC with CI/CD is what we do. We don't even use the AWS Console; we do everything in terraform/opentofu code.


> you're going to modify one or two small things manually here and there over the years.

Huh? This is a strange assumption to make. Is your premise that IAC can't ever be truly reproducible?

If you are modifying things manually then you're not doing IAC.


Yeah as soon as you start hand tweaking the system it breaks IAC.


30 services is a small home lab?


Out of these 30 services, ~8 are useful on a daily basis, some from time to time, and the rest are either forgotten or tests.

What I meant is that I do not have the kind of setup a lot of people show: a DC-grade shelf, with a 2U switch, etc. I have an old server, my ISP box, a UPS, a switch and maybe something else I forgot. But since the server runs both Home Assistant and Pi-hole, it is critically important.

OTOH this criticality allows me not to invest too much in monitoring: I have family and friends yelling immediately when something is down :)



