Hacker News

The author left off the part where it was invented by a mother and how dentists hate it! /s

I suppose it technically improves stability, but the cause seems like a flaw in the Windows operating system, if I'm understanding correctly.

> Stalling the main process led to a smaller increase in tab crashes – which are also unpleasant for the user even if not nearly as annoying as a full browser crash – so we’re cutting those down too.

I wonder if this has anything to do with my recent experience with Firefox Nightly on macOS. In the last few weeks I started to experience spontaneous tab crashes that couldn't be explained by anything from what I could tell. I could open a new tab with the exact same page and it would manage to not crash. Then I noticed Firefox would sometimes prevent me from viewing tabs until I restarted to update the software. Haven't seen it in the last few days, but it was incredibly frustrating. IMO, Firefox should make a best effort to render a page and not have its core functionality stop working entirely until updating.



> I suppose it technically improves stability, but the cause seems like a flaw in the Windows operating system, if I'm understanding correctly.

It's not a flaw at all, when you understand what is going on. Part of the issue is that in 2022 so many developers come from Linux backgrounds that they assume that the Linux way of doing things is the "normal" or "correct" way.

The NT kernel does not overcommit, and thus does not have an OOM killer. If the kernel cannot commit pages, the system call fails. That's it. No process terminations.

Firefox would crash because (even on Windows) it uses a customized version of jemalloc that is configured to be infallible by default. If jemalloc requests pages and those pages cannot be committed, the heap allocation will fail, and thus Firefox will self-terminate. That's simply a policy decision on Firefox's part.

Going back to Windows: suppose that the commit request failed because the swap file was full. Assuming that Windows was configured to automatically resize the swap file, the OS will then grow the swap file. That's why pausing for a bit and then retrying the VM allocation works: the swap file was grown, and now the pages can be committed.
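The "pause and retry" behavior can be sketched portably. This is a hypothetical helper (`alloc_with_retry` is a made-up name, and Firefox's real code stalls around VirtualAlloc commits rather than malloc), but the control flow has the same shape:

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical sketch of the "stall and retry" idea: if an allocation
 * fails, sleep briefly (on Windows this gives the OS time to grow the
 * page file) and try again a few times before reporting failure. */
static void *alloc_with_retry(size_t size, int attempts, long delay_ms) {
    for (int i = 0; i < attempts; i++) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);  /* stall, then retry the allocation */
    }
    return NULL;  /* still failing: report OOM to the caller */
}
```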


The Linux design to allow for overcommit is weird to me. Why would you not want to be up-front about the fact that you don't have enough memory available to hand over to a process? The process requested a certain amount of memory to be allocated so surely it expects to use it, no?


> Why would you not want to be up-front about the fact that you don't have enough memory available to hand over to a process?

Just because a process allocates a certain amount of address space doesn't mean it will use all of it.

> The process requested a certain amount of memory to be allocated so surely it expects to use it, no?

No, not really. I think you are confusing memory usage with address space allocation.

You may wish to allocate a huge amount of address space for convenience purposes, as it could allow you to use much simpler data structures (such as sparse arrays) which can be a lot more performant than more complicated data structures and use a lot less memory (by orders of magnitude) than the address space that you allocated for them.

An example of this is AddressSanitizer's shadow memory, which is sort of a giant sparse bitmap whose address space is allocated up front, but only a small part of it is actually used at any one time.

From the blog post, it sounds like Windows has a separate system call that is required to be used in-between allocating address space and actually using it. I think this is a good design and I personally prefer it over Linux's overcommit design.

However, I think it has a disadvantage: it could require many more system calls (and therefore slower performance) than would otherwise be needed on Linux, especially if the access patterns are unpredictable in advance.

Also, if some process misbehaves and starts using more memory than there actually exists in the system, it can cause significant problems in other processes/applications that have nothing to do with it (as their allocations would start failing). Linux handles this by killing the misbehaving process, which allows the system to recover transparently / without intervention (at the risk of possibly killing innocent processes before actually killing the misbehaving one), hopefully allowing other processes to continue working as if nothing happened.

I think a reasonable approach could be to use Windows' system by default and guarantee that processes are never killed by the OS (which would benefit well-designed applications), but allow an application to optionally use overcommit if it wants the added performance (at the risk of being killed by the OOM killer if the system runs out of memory).

Unfortunately, I suspect that applications which actually handle out-of-memory conditions gracefully are very rare and that most of them would just crash or kill themselves if this happens, which on average is probably a worse outcome than letting the OS kill the misbehaving application.


> Just because a process allocates a certain amount of address space doesn't mean it will use all of it.

Indeed.

> From the blog post, it sounds like Windows has a separate system call that is required to be used in-between allocating address space and actually using it.

Precisely. You can reserve pages in the address space[1] without committing physical memory to back those pages. You can then later commit pages[2] when you want to use parts of that address space, or decommit[3] pages when you're done using them, without changing the reserved address space.

Coming from a Windows background I much prefer this system. I understand why Linux, with its reliance on forking, has gone with the overcommit route, but I consider it a sub-par solution. At least for non-server systems.

[1]: https://learn.microsoft.com/en-us/windows/win32/memory/virtu...

[2]: https://learn.microsoft.com/en-us/windows/win32/api/memoryap...

[3]: https://learn.microsoft.com/en-us/windows/win32/api/memoryap...


> Coming from a Windows background I much prefer this system.

I come from a Linux background and I much prefer that system too.

Well, at least in theory, because in practice I don't usually run into these kinds of problems nowadays due to my hardware having significantly more memory than my software needs to run. But I still think Windows is designed more elegantly in that respect.

> I understand why Linux, with its reliance on forking, has gone with the overcommit route, but I consider it a sub-par solution.

Well, if Linux actually supported committing pages like Windows does, then there would be significantly less need for Linux to overcommit memory, because when forking you'd only have to reserve swap space for the committed memory of the process.

This means Linux wouldn't have to choose between reserving no swap space or reserving swap space for the entire address space (or some constant fraction of it, which Linux supports but doesn't make much sense either)!


> You may wish to allocate a huge amount of address space for convenience purposes, as it could allow you to use much simpler data structures (such as sparse arrays) which can be a lot more performant than more complicated data structures and use a lot less memory (by orders of magnitude) than the address space that you allocated for them.

Windows does allow you to reserve address space without committing it, but it then requires you to commit the chunks before you use them (see VirtualAlloc + MEM_RESERVE)


> Windows does allow you to reserve address space without committing it, but it then requires you to commit the chunks before you use them (see VirtualAlloc + MEM_RESERVE)

Personally, I think this is the right approach.

Because you shouldn't have to decide between memory overcommit (which can fail badly in some reasonable scenarios) or "having to reserve all allocated address space of all processes in swap", which is way too pessimistic and would start causing unnecessary memory allocation failures in many reasonable scenarios.

And I also think that even if you decide to support memory overcommit, it should be done per-process (at least), not at the whole system level like Linux is doing.


I'm not convinced I like the overcommit, but it now makes more sense to me with your distinction of asking for address space versus using memory.



> See https://unix.stackexchange.com/a/521737 for example.

I'm not convinced that the fork()/exec() issue is the correct justification, at least not entirely. As an example, Solaris supports fork()/exec() yet it doesn't do overcommit (at least not by default, I think) and has no OOM killer. And Solaris has a long history of being a very robust OS.

Also, in most cases I don't see what is the problem with reserving more disk space for swap (disk space which would almost never be used anyway), especially if the system becomes more robust as a result of that.

I think I recall Linus justifying memory overcommit by observing that the vast majority of applications don't handle out-of-memory conditions gracefully, so it's better for the OS to kill a misbehaving process (that is allocating too much memory) and let the others continue working normally than have every single process in the system have to handle out-of-memory failures, which they don't usually handle gracefully. But I'm not sure if I'm recalling this correctly.

I don't think either the overcommit design or the naive "all allocated address space must be reserved" design is the correct approach. I know almost nothing about Windows memory management, but having "commit/uncommit memory" system calls separate from the "allocate/deallocate address space" system calls sounds like a better approach to me.

But I think the OS should try to reserve more space for the swap file before it starts returning ENOMEM to the "commit memory" system calls like Windows seems to be doing (judging from the blog post)...


> I don't think either the overcommit design or the naive "all allocated address space must be reserved" design is the correct approach.

Fortunately, this is tunable. You can also turn off overcommit entirely.

https://www.kernel.org/doc/Documentation/vm/overcommit-accou...


> Fortunately, this is tunable. You can also turn off overcommit entirely.

I know, but as soon as I tried doing that, my systems started to experience unavoidable failures (although I don't remember exactly why, but I certainly don't think it was simply due to lack of swap space).

Again, I don't remember exactly, but I suspect I experienced these failures because there are applications that expect Linux to be doing overcommit, or at least, they couldn't work without overcommit unless Linux added new features.

I could be wrong (I'm not exactly a Linux memory management expert), but I think the root issue is that Linux is handling committed memory badly, because when you disable overcommit, Linux tries to reserve swap space for the entire allocated address space of all processes. But actually, this is not needed -- it's way too pessimistic and is extremely likely to lead to unnecessary memory allocation failures.

What is needed is a way for applications to communicate with the kernel about which parts of the address space they might actually use (or not use) at any point in time, which may be significantly less (by orders of magnitude) than the amount of address space that they have allocated.

Then you wouldn't need to reserve swap space for all the address space that processes have allocated, you'd only need to reserve swap space for the amount that the processes have declared to be (possibly) using.

This could have a performance cost, but there are ways to reduce it, for example, by allowing this information to be declared in mmap() as a flag (which avoids doing 2 separate system calls in the typical case), by batching several syscalls into just one (similar to readv()/writev()), by using io_uring(), etc.

I also think that, even if you want to use memory overcommit, then enabling overcommit should be done with (at least) process granularity, not for the whole system at once, which again, greatly limits the usefulness of disabling overcommit.


You may find mmap and mlock may be helpful.


> You may find mmap and mlock may be helpful.

I know about mmap and mlock. Can you be more specific as to how they are useful in the scenario I mentioned above?

Specifically, when memory overcommit is disabled, I can use mmap() to allocate address space but this causes Linux to unnecessarily reserve swap space for the entire amount of the address space allocation.

This means that if the address space allocation is big enough, it would almost certainly fail even though I would only need to use a very tiny fraction of it.

Do you understand what I mean? Just because I allocate address space, it doesn't mean I will use all of it.

As far as I know, Linux can't handle this problem, because there's no way to communicate to the kernel which chunks of the address space allocation I wish to use.

Which means that disabling overcommit in Linux can be completely useless, because many mmap() calls start failing unnecessarily.

I don't think mlock() has anything to do with this problem.


They ensure that regions of memory are backed by actual memory. This sounds like how to tell the kernel that you actually need and will use that memory.


mlock() forces pages which have already been allocated and are already being used within an address range to stay resident in memory, so that they are not swapped out to disk. These pages were already being accounted for in swap reservations.

When disabling overcommit, what you'd need to tell Linux is actually quite different: it's that you are about to start to use a new address range, so please make sure that there is enough free swap space for this address range (and reserve it!).

Only after the latter is completed would the kernel be allowed to allocate new pages for this range and the program would be allowed to use them. The kernel would also be free to swap them out to disk like normal, unless mlock() would be called (but only after those pages are already being used, not before!).

So as you can see, mlock() accomplishes something very different and is orthogonal to the functionality I'm discussing, which means it can be used (or not) independently of this new feature to reserve swap space.

This new functionality (to notify Linux that you are about to use a new address range) would also implicitly allow Linux not to reserve swap space for any address range for which it hasn't been notified, which would allow Linux to use swap space much more efficiently and would allow users to disable memory overcommit on Linux without causing a bunch of unnecessary program failures / crashes.

mmap(), on the other hand, normally does two things:

1. It allocates a range of address space.

2. It reserves swap space for this address range (when memory overcommit is disabled).

Notably (and people get confused by this a lot), mmap() doesn't actually allocate memory (i.e. memory pages), it only assigns a range of address space for the program to use, exactly as large as the program requested, and then reserves swap space for it, if required.

What I'm proposing is to separate those two things, i.e. allow programs to allocate address ranges separately from doing the swap space reservation.

If you don't separate these two things then programs allocating a huge range of address space will simply fail due to lack of available swap space (again, when memory overcommit is disabled, which is the goal here).


> But I think the OS should try to reserve more space for the swap file before it starts returning ENOMEM to the "commit memory" system calls like Windows seems to be doing (judging from the blog post)...

One could just as easily argue similarly with respect to Unix and needing to loop over write watching for EINTR et al.


> One could just as easily argue similarly with respect to Unix and needing to loop over write watching for EINTR et al.

I'm sorry, but I fail to see how that is related.

EINTR is useful so that you can handle a signal in case it is received in the middle of a very long system call. For example, if an application is doing a write() on an NFS filesystem and the NFS server is not reachable (due to some network outage), the write() syscall could take minutes or hours before it completes.

So it's good, for example, that you can Ctrl-C the process or send it a SIGTERM signal and abort the syscall in the middle of it, letting the application handle the signal gracefully.
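The classic pattern being referred to is a write loop that retries on EINTR and continues after short writes; a sketch:

```c
#include <errno.h>
#include <unistd.h>

/* Write all of buf, retrying when interrupted by a signal (EINTR)
 * and continuing after short writes. Returns 0 on success, -1 on error. */
static int full_write(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted mid-syscall: just retry */
            return -1;      /* a real error */
        }
        p += n;             /* short write: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```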

What I'm talking about is not related because allocating disk space on local disk (where the swap file would be located) is generally a very quick process. Mind you, only the disk space allocation is needed for reserving swap space -- it's not necessary to write anything to the swap file. In fact, it's not even necessary to actually allocate disk space, but I digress.

And even if reserving swap space would take a long time, allowing the syscall to fail with EINTR would also work fine.

What is not fine is letting the application believe that the system has completely run out of memory when in fact a lot of disk space can still be used for swap reservation.


That Stack Exchange answer is kind of weird in the sense that it conflates different things. You can have demand paging without overcommitting (as NT does); the kernel simply needs to assure that there is somewhere to commit a particular page, even if that page isn't committed yet.


The answer may not be the best-argued one, but it links to other useful discussions, and it is correct that the forking mechanism is an important factor.

Committing is limited by RAM plus swap. You’d have to reserve much more swap than is typically ever actually used by processes, at any given time.


> You’d have to reserve much more swap than is typically ever actually used by processes, at any given time.

But if the swap wouldn't actually be typically used, then what's the problem with that?

Especially considering that the amount of disk space required for that is cheap.

And why not let users decide for themselves if they prefer to guarantee that their applications never get killed by the OS, at the cost of reserving some disk space they probably wouldn't ever use anyway (as most filesystems' performance nosedives above 90% disk usage anyway)?


> most filesystems' performance nosedives after >90% disk usage anyway

With modern filesystems using delayed allocation to reduce fragmentation, and SSDs reducing the cost of fragmentation, you can often get good performance at higher occupancy nowadays.


Sure, I know, but if you're talking about really modern filesystems (which do copy-on-write/CoW), then the fragmentation caused by CoW is even worse than that avoided by delayed allocation.

SSDs certainly alleviate this problem, but even in SSDs, sequential I/O can be much faster than random I/O.

Anyway, I guess my point is that the vast majority of systems don't run with >90% disk space usage, so reserving up to 10% of the filesystem for swap space is not unreasonable.

Note that this would just be a space reservation. You wouldn't need to actually allocate specific disk blocks or write anything to the swap file, unless the system starts running out of memory.

In reality, you'd need much less than 10% (in the vast majority of cases), especially if you have a Windows-like API where you can allocate address space separately from committing memory (which means uncommitted address space doesn't need swap space reservation).


Operating in a low memory environment is inherently about tradeoffs.

Where is the blame?

Maybe Firefox is too bloated or memory-inefficient. Maybe Mozilla didn't understand Windows's memory management strategy until now.

Or maybe Windows is too bloated and memory-inefficient. Or maybe the memory management tradeoffs were suboptimal.

Or maybe nobody is to blame, and they are taking advantage of something in a novel way that allows them to squeeze more juice out of the same fruit than others.


(Former Mozilla engineer who can speak Windows)

> Maybe Mozilla didn't understand Windows's memory management strategy until now.

That is part of it. A lot of FLOSS engineers come from a Linux background and tend to assume that the Linux way of doing things is the "normal" way. While I was there I had to explain to more than one developer that Windows doesn't have an OOM killer because the NT kernel doesn't overcommit.


> Where is the blame?

Not sure, but sound stopped working in VLC about a year ago and was restored this month, at the same time that dropping files stopped working. Each time it was after a Windows update, with the same VLC version.

I don't think Windows is too bloated or memory-inefficient, but I do believe that they're abusing the AV and "monitoring" stuff. It's a cat & mouse game, with me trying to disable crap and them re-enabling it with every update or simply making it impossible to disable annoyances.

Also I suspect they're trying to "fix" drivers that worked before the fixes.


> I don't think Windows is too bloated or memory-inefficient.

You know I used to be on that boat up until very very recently. I have a 2019 HP Stream 10" with 32GB eMMC and soldered 4GB DDR4. Windows got it bricked on a botched update, so I installed Linux Mint to get it back up quickly.

Not only is Linux using a lot less storage (which was always an issue with Windows Update), but the RAM usage is about 700MB without any applications running, and well under 4GB for most common uses. Sure, Chrome(ium) will eat a hefty chunk of it when you run it, but between zram-config and some extra swap space, Linux handles things a lot better than Windows ever did.

Another point of anecdata: shared libraries are actually shared a lot more between processes on Linux than on Windows. That tends to mean a lot less RAM usage in certain processes. I know it's down to how they handle sharing things and that Windows takes the "safe" approach (pretty much what the article talks about!), but it ends up hitting memory a lot more than Linux and macOS.


The safe approach to me is exactly their answer to the tradeoff question.

Forcing the application developers to be cautious is just changing who needs to be safe, because reducing problems for your app while increasing overall system instability is still bumping up against the issue at hand.

It is sort of like how someone might be upset with speed limits, but the people who set the speed limits aren't thinking about your particular desire to get to work today, but the overall flow of traffic and public safety needs.

I realize linux still has out-of-memory handling and various safeguards in their memory management, but it is sort of like arguing whether to set the speed limit at 50 or 60. Nobody is actually right, it is just different preferences.


Eh—the blame comes down to the swap.

If the swap were allocated to the whole HD that wasn't used for actual files, then this hack wouldn't work.

If the swap were leaner than it already is, this hack would be necessary in every program.

If I had to point a finger at who is to blame, it's the Windows swap allocation team. Some combination of predictive analytics or even just a saner ratio of swap to free HD for an incoming file would fix this problem for most users most of the time.

But computers are hard and people want to keep them running for days on end. I get that memory just slowly, slowly gets eaten up by all the zombie procs out there.


> If the swap were allocated to the whole HD that wasn't used for actual files, then this hack wouldn't work.

Then the hack wouldn't be necessary, as they'd have far more commit space to waste.


Right. What I'm saying is that swap vs available HD is a tradeoff the OS should be making and it's probably going too far in the "have avail HD" side of the spectrum or at least not employing enough predictive analytics to figuring out the right balance for any given time.


That update behavior where ff is still running and existing tabs mostly keep working, but new tabs don't, has been a thing for years and everyone hates it, not just you or me.

I thought it was possibly related to ff on ubuntu switching to being a snap by default (even though I thought I had forced my system to have no snaps and no snapd and added a special ppa for ff) and said something in a comment on hn, and several people clued me in it's way older than that and I'm not the only one who hates it.

It's like ff devs don't actually use browsers, which is crazy of course. But they really are ok with always having to blow away everything you have going at any random time in the middle of the day? (It's always the middle of someone's day or stretch of involved work.)

They never have tabs open with partially filled forms or search results or web apps that "restore tabs" won't restore the way they were? Or this just doesn't bother them?

It feels like a case of "you're holding it wrong", as in the user should shape their usage pattern around ff's update strategy: always do an apt upgrade before sitting down, never after starting to work, and if you leave tabs and work open overnight, well, I guess just don't do that?


> But, they really are ok with always having to blow away everything ypu have going at any random time middle of the day?

Y'all don't get your tabs restored when you restart your browser?

For me, the restart experience pre-snap was very easy - close, re-open, and you're right back. Most 'serious' webapps will happily save your drafts if, for whatever reason, you don't want to finish and send that half-composed slack message before restarting.

It got much worse with the switch to snaps.


>Y'all don't get your tabs restored when you restart your browser?

Ironically, the tab restore breaks when you open a new tab and it hits the "Restart required" page.

On restart, the browser found a clever way to maximize user frustration. It simply attempts to restore the "Restart required" page again (and fails), leaving the new tabs you tried to open blank after the restart.

I still have updates enabled, despite the "Restart required" page providing a strong push to disable them. But at the current rate I might give in eventually.


Not only does it leave them blank, but most importantly it forgets the address of the page you were trying to visit...


> Y'all don't get your tabs restored when you restart your browser?

I addressed that.


The weird update behaviour happens because Firefox's files got replaced on disk while it was running. On Windows and macOS the updates are applied on the next start (whenever you choose that to be), so it's not an issue; on Linux updates are handled by your system package manager, so they couldn't line it up as nicely.

Of course that also means you could end up having queued but not applied updates for a long time on Windows and macOS…


You can avoid this issue (which pretty much only ever happens on Linux, mind you) by installing the package directly from their website instead of from your distro's package manager. If you'd like to help improve stability or try features before they're fully stable, try the beta, dev, or nightly channels.


I run Arch and upgrade ~weekly, but I am very rarely inconvenienced by this restart behavior. Restore Tabs works pretty well on the modern web, and since old tabs still work, you can always complete whatever outstanding forms you have first.


Problem is when you click a link or enter a URL, restarting your browser restores the previous page from before you had triggered the restart prompt, discarding the page you were trying to open.


No, I can't complete the in-progress forms, because it requires doing research in other tabs.


A flaw in the Windows operating system?

This is one of those many instances where Windows is doing absolutely the right thing and it's Linux that's screwed up.


Why ?


Overcommit exists mostly because of fork(). fork() exists because it was the simplest possible process creation API the Unix designers could get away with, which was an important consideration on a PDP-7. Now that computers are far more capable, we no longer need to sacrifice the ability for applications to handle low memory conditions in a more useful way than crashing.

Microsoft put out an interesting PDF paper about fork():

https://www.microsoft.com/en-us/research/uploads/prod/2019/0...

I do not use Windows myself, but this is one thing I think they got right.


> Overcommit exists mostly because of fork().

Then why does Solaris support fork() yet it doesn't do overcommit nor have an OOM killer? And also has a long history of being a very robust OS.

For more details, see my comment here: https://news.ycombinator.com/item?id=33708633


I suspect a reputation for robustness is at least partly a self-fulfilling prophecy. People who care about robustness will be more likely to write software for an OS they believe to be robust, and that software is also more likely to be robust.

fork() + no overcommit + no OOM killer can work if you're very careful with allocations in processes that fork, but it would be a disaster on Linux. Willingness to put up with the drawbacks of Solaris is a good signal that you value robustness/stability very highly. IMO, most people developing for Linux have different priorities.


I think I agree with you, but would like to add that from reading the blog post, it sounds like Windows is actually doing this better than both Linux and Solaris (but my knowledge is a bit limited in this area).

I don't quite like the overcommit approach nor the "all address space must be reserved in swap" approach, because both can fail badly in some reasonable scenarios.

I think having a separate system call to commit/uncommit memory like Windows seems to have is probably a better approach than just having mmap()/munmap() system calls (without a way of communicating with the kernel about which parts of the allocated space you are using), because then you can have the advantages of sparse address space usage while not having the drawbacks of having the OS kill innocent processes.

This would also have the advantage that in fork(), the kernel would just need to reserve swap space for the amount of committed memory, not the amount of allocated address space, the latter of which could be much larger (by orders of magnitude).


Thanks, very informative!



