Oh man, the memories. It was such a revelation discovering how much faster it was to POKE pixels directly into memory in QB rather than use the PSET built-in. And then much of that knowledge was directly transferable to Turbo Pascal and later C.
It really boggled the mind that it was possible to blit a sprite in rows rather than a pixel at a time, and that under the hood the computer was actually copying 4 bytes per instruction. Anyone else remember implementing the 320-byte row offset as the sum of two bit shifts? By the early 2000s, I seriously doubt that was any faster, but we all did it anyway for "performance".
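For anyone who never did this: a minimal sketch of the trick, assuming a Borland-style DOS compiler (far pointers, MK_FP from dos.h); the function name is mine.

    #include <dos.h>

    /* Mode 13h frame buffer: one byte per pixel at segment A000h,
       320 bytes per row. y*320 == y*256 + y*64, hence the two shifts. */
    void put_pixel(int x, int y, unsigned char color)
    {
        unsigned char far *vga = (unsigned char far *) MK_FP(0xA000, 0);
        vga[(y << 8) + (y << 6) + x] = color;
    }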
And don't even get me started on palette hacks: redefining the 256 colors into bars of dark-to-light runs of a single color, so that you could do smoke or halo effects by just shifting every affected pixel by one or two in either direction.
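If it helps jog memories, here is roughly what that looked like; a sketch only, assuming Borland-style outportb and the standard VGA DAC ports (3C8h is the write index, 3C9h the data register), with a ramp layout and names of my own invention.

    #include <dos.h>

    /* Lay out the 256 colors as 16 ramps of 16 shades, dark to light.
       DAC components are 6-bit (0..63); base_r/g/b hold each ramp's
       full-intensity color. */
    void build_ramp_palette(const unsigned char base_r[16],
                            const unsigned char base_g[16],
                            const unsigned char base_b[16])
    {
        int hue, shade;
        outportb(0x3C8, 0);  /* DAC write index: start at color 0 */
        for (hue = 0; hue < 16; hue++) {
            for (shade = 0; shade < 16; shade++) {
                outportb(0x3C9, base_r[hue] * shade / 15);
                outportb(0x3C9, base_g[hue] * shade / 15);
                outportb(0x3C9, base_b[hue] * shade / 15);
            }
        }
    }

    /* A halo or glow pass is then a saturating +1 on the affected
       pixels, being careful not to step out of the 16-entry ramp. */
    void brighten(unsigned char *pix, int count)
    {
        while (count--) {
            if ((*pix & 0x0F) != 0x0F)
                ++*pix;
            ++pix;
        }
    }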
I don't know if he's around these days on HN, but a shout-out to Mark Sibly (Blitz) for bringing a lot of this stuff to the QBasicNews forum back in the day.
But you are also right that it is not a huge difference :) In no-optimization mode, it keeps the imul in for Multiply; the result is 1.2477 vs. 1.2668, relative to Noop.
The site is really cool, thanks for pointing me to it!
Interestingly, gcc-8 does some really weird stuff. Its version of Multiply is 1.4 times slower than its version of Shift, but according to the disassembly, the Shift implementation is actually using an imul, whereas the Multiply implementation is doing a lea/shl.
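For context, this is presumably roughly what the three functions under test look like (the bodies are my assumption; only the names come from the thread):

    int Multiply(int y) { return y * 320; }
    int Shift(int y)    { return (y << 8) + (y << 6); }
    int Noop(int y)     { return y; }

With optimizations on, you would expect a compiler to canonicalize Multiply and Shift to the same imul or lea/shl sequence, which makes the gcc-8 result above all the stranger.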
Brought a smile to my face. I see this as a shibboleth for the old-school coders.
For those missing the joke, those are the assembly instructions to switch to VGA mode from DOS: set the AX register to 13 hex and call interrupt 10 hex. That gives us VGA graphics mode 13h, 320x200 pixels in 256 colors!
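In C, the same thing is a one-liner through the BIOS; a sketch using Turbo/Borland C's int86 from dos.h:

    #include <dos.h>

    void set_mode_13h(void)
    {
        union REGS r;
        r.x.ax = 0x0013;      /* AH=00h set video mode, AL=13h: 320x200x256 */
        int86(0x10, &r, &r);  /* BIOS video services interrupt */
    }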
Related: in the 1990s, Michael Abrash, via Dr. Dobb's Journal, introduced the world to "Mode X" at 320x240, which had the advantage of square pixels. That series of articles was responsible for my long-standing subscription to Dr. Dobb's Journal.
320x240 was one of the benefits, but the more important part was that it allowed you to use all the video memory, double buffer, use split screens (for HUDs), etc.
Addressing became more complicated, though. (And IIRC, Doom ran in 320x200 Mode X, for some design/monitor reasons)
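The complication in a nutshell: a row is now 80 bytes and each pixel lives in one of four planes, selected through the Sequencer's Map Mask. A sketch, assuming a Borland-style DOS compiler; the name is mine.

    #include <dos.h>

    void put_pixel_unchained(int x, int y, unsigned char color)
    {
        unsigned char far *vga = (unsigned char far *) MK_FP(0xA000, 0);
        outportb(0x3C4, 0x02);           /* Sequencer: Map Mask register */
        outportb(0x3C5, 1 << (x & 3));   /* enable only this pixel's plane */
        vga[y * 80 + (x >> 2)] = color;  /* 320 pixels / 4 planes = 80 bytes/row */
    }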
Along with this interesting explanation https://meatfighter.com/puls/ of what's happening with the "binary-search" raytracing, the lattice effect, and the tricks to make it fit in 256 bytes.
I share the same feelings as you and all the people who replied. That's why, a few days ago, I started a set of single-file libraries for opening a window and pushing pixels in the same fashion: by writing directly to memory (to an RGB or indexed (256-color palette) frame buffer).
The goal is to have multiple implementations, in different languages and OSs. For example, right now I have written (partial or full) support (in C) for:
PNG, Windows GDI, Linux X11, Linux OpenGL, Linux framebuffer, and JavaScript Canvas.
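To give an idea of the shape of it, here is a hypothetical sketch of that kind of API (not the actual library; all names invented): one backend-independent buffer you write to directly, and one per-backend present call.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        int width, height;
        uint32_t *pixels;  /* 0x00RRGGBB, row-major */
    } FrameBuffer;

    FrameBuffer fb_create(int w, int h)
    {
        FrameBuffer fb;
        fb.width = w;
        fb.height = h;
        fb.pixels = calloc((size_t)w * h, sizeof(uint32_t));
        return fb;
    }

    void fb_put(FrameBuffer *fb, int x, int y, uint32_t rgb)
    {
        fb->pixels[y * fb->width + x] = rgb;  /* direct write, no API in the hot path */
    }

    /* Each backend (GDI, X11, OpenGL, /dev/fb*, canvas) then supplies a
       single blit function, e.g. void fb_present(const FrameBuffer *fb); */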
A starting point... after which the countless "out_port" incantations begin.
All just to get to unchained mode (aka Mode X) with hardware scrolling and split screen, 256 kB of VRAM, VGA latch fills, and VRAM-to-VRAM latch blitting. It was amazing to be able to set or copy four 256-color pixels by writing just a single byte.
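For the curious, the core of those incantations is surprisingly small. A sketch with Borland-style outportb, using the register values popularized by Abrash's articles (set mode 13h first, then unchain):

    #include <dos.h>

    /* Turn chained mode 13h into unchained, Mode X-style addressing. */
    void unchain_vga(void)
    {
        outportb(0x3C4, 0x04);  /* Sequencer: Memory Mode register */
        outportb(0x3C5, 0x06);  /* disable chain-4: all 4 planes, 256 kB usable */
        outportb(0x3D4, 0x14);  /* CRTC: Underline Location register */
        outportb(0x3D5, 0x00);  /* disable doubleword addressing */
        outportb(0x3D4, 0x17);  /* CRTC: Mode Control register */
        outportb(0x3D5, 0xE3);  /* byte addressing mode */
    }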
Of course, VGA latch blitting didn't make any sense on 486s, but on an 8088 it could be almost a 4x performance boost: 256 colors with a CPU load similar to CGA's 4 colors! At least as long as you needed to blit something to an x-coordinate divisible by 4...
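What such a latch blit looks like in C, as a sketch (assumes the unchained setup; names and details are mine): the read loads all four plane latches and the write dumps them back, so every byte moved is four pixels.

    #include <dos.h>

    void latch_copy(unsigned src_off, unsigned dst_off, unsigned nbytes)
    {
        volatile unsigned char far *vga =
            (volatile unsigned char far *) MK_FP(0xA000, 0);
        unsigned char latches;
        unsigned i;

        outportb(0x3C4, 0x02);  /* Sequencer: Map Mask */
        outportb(0x3C5, 0x0F);  /* enable writes to all four planes */
        outportb(0x3CE, 0x05);  /* Graphics Controller: Mode register */
        outportb(0x3CF, 0x01);  /* write mode 1: writes come from the latches */

        for (i = 0; i < nbytes; i++) {
            latches = vga[src_off + i];  /* read fills the four latches */
            vga[dst_off + i] = latches;  /* CPU value ignored; latches written */
        }

        outportb(0x3CE, 0x05);
        outportb(0x3CF, 0x00);  /* back to normal write mode 0 */
    }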