Hacker News

Hot reload of code is nothing new nowadays, but people use it only locally during development, for a REPL-like development style.

In actual production, people prefer to operate at the container level + traffic management, and don't touch anything deeper than the container.



> In actual production, people prefer to operate at the container level + traffic management, and don't touch anything deeper than the container.

How do you think video games like World of Warcraft or Path of Exile deploy restartless hotfixes to millions of concurrent players without killing instances? I don't think it's a matter of "prefer to"; it's a matter of "can we completely disrupt the service for users and potentially lose some of the state?" Even if that disruption lasts a mere millisecond, in some contexts it's not acceptable.


Most of those hotfixes are data-driven, as in database updates. The game server just reloads the data; the binary itself is not touched.

I've never seen a game where they hot reload code inside the game server itself; it's usually downtime or rolling updates.


> Most of those hotfixes are data-driven, as in database updates. The game server just reloads the data; the binary itself is not touched.

And since the data from the disk/database (whether it's a Lua table, XML structure, JSON object, or a query) is then represented as a low-level data structure, that's essentially what hot reloading is - you deserialize the new data and hot-swap the pointers, in the simplest terms.
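In the simplest terms, that swap can be sketched like this (a toy Python sketch with made-up names, not any actual game server's code):

```python
import json
import threading

# Hypothetical boss-tuning data, standing in for whatever a game
# server loads from its database or data files.
_BOSS_STATS = {"health": 1_000_000, "damage": 500}
_lock = threading.Lock()

def hotfix(serialized: str) -> None:
    """Deserialize the new data and hot-swap the live reference."""
    new_stats = json.loads(serialized)
    global _BOSS_STATS
    with _lock:
        _BOSS_STATS = new_stats  # in-flight fights see the new values on their next read

def current_stats() -> dict:
    with _lock:
        return _BOSS_STATS

hotfix('{"health": 2000000, "damage": 750}')
print(current_stats()["health"])  # 2000000
```

The running process never stops; the "hotfix" is a deserialize followed by an atomic rebind of the reference the game loop reads.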

> I've never seen a game where they hot reload code inside the game server itself; it's usually downtime or rolling updates.

In World of Warcraft, you will literally have bosses despawn mid-fight and spawn again with new stats or you will see their health values update mid-fight, all without the players getting interrupted, their spell state getting desynced, or spawned items in the instance disappearing. This can be observed with the release of every single new raid on live streams as Blizzard employees are watching the world first attempts and tweaking/tuning the fights as they happen.

EDIT: Here's such an example: for the majority of the fight the extra tank could keep a spawned monster away from the boss, then mid-fight the monster suddenly started one-shotting the tank, without disrupting the instance. This was Blizzard's way of addressing a cheese strat, forcing the players to do the fight as designed: https://www.youtube.com/watch?v=7gMm60BXAjU


Yes, but again, it's not hot swapping code as in Erlang; the C++ code is unchanged, they just change some XML somewhere.

By your definition every CRUD app has hot reloading capabilities.


> Yes, but again, it's not hot swapping code as in Erlang; the C++ code is unchanged, they just change some XML somewhere.

Right, not on the C++ side, but on the Lua side that WoW uses - you load the new gameplay code that pulls the new data, and override the globals with new functions.
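For what it's worth, the pattern is trivial to sketch outside Lua too. Here's the same "override the globals with new functions" idea in Python (illustrative only, nothing to do with WoW's actual code):

```python
# Gameplay code lives behind a global name; a hotfix simply
# rebinds that name to a new implementation.

def compute_damage(base: int) -> int:  # v1 gameplay logic
    return base * 2

def apply_hotfix() -> None:
    def compute_damage_v2(base: int) -> int:  # v2, shipped as a hotfix
        return base * 3
    globals()["compute_damage"] = compute_damage_v2  # swap the global

print(compute_damage(10))  # 20
apply_hotfix()
print(compute_damage(10))  # 30 - callers pick up v2 with no restart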


Why does the language matter? C++ has built-in tooling to allow hot swapping, no?


C++ because 99% of the major games are built in that language.


LPMUDs ran almost entirely on hot reloadable code written in a quirky language called LPC, which later inspired the Pike language.

I believe that only the "driver" code, which handles system calls and hosts the LPC interpreter and is written in C, couldn't be hot reloaded; everything else running in the game could be reloaded without restarting the server.

I'd guess in the modern day, there would be some games where Lua scripts can be hot-reloaded like any other data, from a database or object store.


It's a rather fun language and programming environment, I'd recommend playing around with it over doing AoC.


In addition to what most people said: many game servers simply announce upcoming maintenance and take the services offline until the patches are deployed.

This way they can properly test everything and roll back any problematic fixes if required. Even banking systems regularly go down for maintenance.


WoW restarts every week. Not sure that’s better than zero-downtime deployments.


That's just how it works when your backend is hybrid software that combines a low-level compiled language with a high-level language running in its own VM. You can use the latter for gameplay features and hotfix those on the go; for core changes you have to restart. That's also why WoW hotfixes the high-level side on the go, usually every day around an expansion launch, while deferring the bulk of backend changes to the next weekly restart instead of continuously disrupting the game for players.


That’s a very big assumption that they do code hotpatching.

It would seem far more likely that they separate the stateful layer (database) from the stateless layer (game logic), and just spin up a new instance of the stateless server behind a reverse proxy while spinning down the old instance. It’s basically how all websites update without downtime.


A website that just proxies to another server does not need to do much to restore the previous state and make it look seamless to a user; the client will just perform another GET request that triggers a few SELECT queries. It's far more complex in the context of a video game.


Games do in fact have downtime on major releases, and you have to restart the client too before connecting.


For major patches/backend changes that require recompiling - yes, for gameplay tweaks/hotfixes - no, hot reloading is preferable where possible.


I work at a company that deploys Elixir/Erlang, and while we do /prefer/ to push a fully tested build in a new container, sometimes things get nasty and we need to console in and re-define a module in production. It's not a "best practice", but it stems the bleeding while the best-practice fix is going through its test suite.


> In actual production, people prefer to operate at the container level + traffic management, and don't touch anything deeper than the container.

Fred Hebert (and many of the folks he has worked with) do not operate that way: <https://ferd.ca/a-pipeline-made-of-airbags.html>

One nice quote (out of many) from the article:

> The thing that stateless containers and kubernetes do is handle that base case of "when a thing is wrong, replace it and get back to a good state." The thing it does not easily let you do is "and then start iterating to get better and better at not losing all your state and recuperating fast".

(And if one wants to argue with that quote, please read the entire essay first. There's important context that's relevant to fully understanding Hebert's opinion here.)


People may "prefer" simply replacing containers, but as some siblings mention, some applications might require more reliability guarantees.

Erlang was originally designed for implementing telephony protocols, where interrupted phone calls were not an acceptable side effect of application updates.


FWIW as soon as you start using containers you should be able to handle those containers spinning up/down. Pretty much the whole point of containers. At which point you don’t need to bother with code hot swapping since you already have a mechanism for newer containers to spin up while older ones spin down.

The sibling post “that’s how they update without downtime” is super naive. It is absolutely not how they do it.


That's kinda what Erlang does, just on a different level: your Docker and your load balancer are both inside your app.


If we were to wedge how Erlang does hot code swapping into a container metaphor, then to get what Erlang does you'd need to have a container per function call.

Given that it would be absurdly wasteful to use OS processes in containers to clone Erlang's code reload system, AnotherGoodName might take ten minutes to watch Erlang: The Movie to get a better sense of the capabilities of that system. The movie is available from many places, including archive.org.


> If we were to wedge how Erlang does hot code swapping into a container metaphor, then to get what Erlang does you'd need to have a container per function call.

You have a container that responds to HTTP requests sitting behind a load balancer; then you spawn a new container and tell the load balancer to redirect calls to the new one. From the point of view of whoever is calling the load balancer, you have hot swapping. You may even separate containers into logical groups and call it a microservices architecture. Or you can define a process as something having a qualified name and a mailbox, sending messages to other processes.
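That flow can be modelled in a few lines (a toy Python model where the "load balancer" is just a swappable reference; all names are invented):

```python
# A toy model, not real infrastructure: the "load balancer" holds an
# atomically swappable reference to whichever "container" is live.

class Container:
    def __init__(self, version: str):
        self.version = version

    def handle(self, request: str) -> str:
        return f"{self.version}: handled {request}"

class LoadBalancer:
    def __init__(self, backend: Container):
        self._backend = backend

    def route(self, request: str) -> str:
        return self._backend.handle(request)

    def swap(self, new_backend: Container) -> None:
        self._backend = new_backend  # redirect traffic to the new container
        # the old container would now be drained and spun down

lb = LoadBalancer(Container("v1"))
print(lb.route("GET /"))   # v1: handled GET /
lb.swap(Container("v2"))   # deploy: new container up, traffic redirected
print(lb.route("GET /"))   # v2: handled GET /
```

From the caller's point of view the service was never down, which is the whole argument: whether you call that "hot swapping" depends on which side of the load balancer you're standing on.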

Now, reasonable people may disagree about what's wasteful, but the market seems to tolerate places where adding a checkbox to a form is a half-year process involving five different departments, and the market can't be wrong.


Sure, you can shut down and restart your entire application. You could do that back in 1990 without containers, too.

The thing is that Erlang does hot reload at a per-function (or -according to Hebert- sometimes more-fine-grained) level, so nuking the entire program and paying the cost to start it up again is not at all the same thing as -say- using a not-absurdly-priced AWS Lambda [0] or similar to get per-function hot reloading.
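For contrast, the closest off-the-shelf analogue in Python is importlib.reload, which works at module granularity rather than per function, and doesn't migrate live state the way Erlang's code_change callback can. A scratch sketch, where hotmod_demo.py is an invented throwaway module:

```python
import importlib
import pathlib
import sys

# Write v1 of a throwaway module to disk, then import it.
mod_path = pathlib.Path("hotmod_demo.py")
mod_path.write_text("def answer():\n    return 1\n")
sys.path.insert(0, ".")

import hotmod_demo
print(hotmod_demo.answer())    # 1

# "Deploy" a new version of the source, then reload it in place.
mod_path.write_text("def answer():\n    return 2000\n")
importlib.reload(hotmod_demo)  # the running process now sees the new code
print(hotmod_demo.answer())    # 2000

mod_path.unlink()  # clean up the scratch file
```

Even this reload-the-whole-module granularity is coarser than what the BEAM gives you, which is the point of the comparison above.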

By the way, have you read "A Pipeline Made of Airbags"? If not, you should give it a read: <https://ferd.ca/a-pipeline-made-of-airbags.html>. It might be old news to you, maybe, but maybe not.

[0] Good luck finding one, though.


I hadn't read that one before, but I share the sentiment. We can't have cool things, and it all got dumbed down, so the worst case became the default mode of operation. This didn't happen only with hot reload in Erlang; it happens all the time, at all levels.


Amusingly, this reminds me sort of about the story of a person who joins a new company only to discover that their programming framework is intricately linked to their version control system.


> In actual production, people prefer to operate at the container level + traffic management, and don't touch anything deeper than the container.

I mean, this seems to be "best practices" these days, but I certainly don't prefer it. At least the orchestration I use is amazingly slow. And cold loading changes is terrible for long running processes... this makes deployment a major chore.

It's less terrible if you're just doing mostly stateless web stuff, but that's not my world.

In the time it takes to run terraform plan, I could have pushed Erlang code to all my machines, loaded it, and (usually) confirmed I fixed what I wanted to fix.

Low cost of deploy means you can do more updates which means they can be smaller which makes them easier to review.



