Reading the title my mind immediately went to mov ax, 13h int 10h I'm old.

mikepurvis · on June 5, 2019

Oh man, the memories. It was such a revelation discovering how much faster it was to POKE pixels directly into memory in QB rather than use the PSET built-in. And then much of that knowledge was directly transferable to Turbo Pascal and later C.

it really boggled the mind that it was possible to blt a sprite in rows rather than a pixel at a time, and under the hood, the computer was actually copying 4 bytes per instruction. Anyone else remember implementing the 320 row offset as the sum of two bit shifts? By the early 2000s, I seriously doubt that was any faster, but we all did it anyway for "performance".

And don't even get me started on palette hacks— redefining the 256 colors into bars of dark-to-light runs of a single colour, so that you could do smoke or halo effects by just shifting every affected pixel by one or two in either direction.

I don't know if he's around these days on HN, but a shout-out to Mark Sibly (Blitz) for bringing a lot of this stuff to the QBasicNews forum back in the day.

rzzzt · on June 5, 2019

y << 8 + y << 6 + x

(I think the bit shifting and addition trick would still win over an IMUL.)

Edit: also, wasn't there a way of (ab-)using LEA to do some of this?

mikepurvis · on June 5, 2019

It seems clang-7 agrees with you— it compiles the multiplication down to shl+add, so the perf is completely identical, see:

http://quick-bench.com/CdxqB_qPD3VS5H7igB31FFoqLLo

rzzzt · on June 5, 2019

But you are also right that it is not a huge difference :) In no-optimization mode, it keeps the imul in for Multiply. the result is 1.2477 vs. 1.2668, relative to Noop.

The site is really cool, thanks for pointing me to it!

mikepurvis · on June 5, 2019

Interestingly, gcc-8 does some really weird stuff. Its version of Multiply is 1.4 times slower than its version of Shift, but according to the disassembly, the Shift implementation is actually using an imul, whereas the Multiply implementation is doing a lea/shl.

jackhack · on June 5, 2019

Brought a smile to my face. I see this as a shibboleth for the oldschool coders.

For those missing the joke, that's the assembly language commands to switch to VGA mode from DOS. Set AX register to 13 hex and call interrupt 10. That gives us VGA graphics mode 320x200 pixels! Related: In the 1990s Michael Abrash via Dr. Dobbs Magazine, introduced the world to "Mode X" @ 320x240, which had the advantage of square pixels. That series of articles was responsible for my long-standing subscriptions to Dr Dobbs Journal.

mhd · on June 5, 2019

320x240 was one of the benefits, but the more important part was that it allowed you to use all the memory, double buffer, use split screens (for HUDs) etc. Addressing became more complicated, though. (And IIRC, Doom ran in 320x200 Mode X, for some design/monitor reasons)

Supersaiyan_IV · on June 6, 2019

If that brought a tear to your eye, you might enjoy:

http://www.pouet.net/prod.php?which=53816 "Puls" by Řrřola, 256b, MS-Dos

Along with this interesting explanation https://meatfighter.com/puls/ of what's happening with the "binary-search" raytracing, lattice-effect, and tricks to make it fit in 256b.

raverbashing · on June 5, 2019

I'm not that old, I would have used a higher level C function to call the bios

(And use VESA modes like 640x480x256 - thank Deity for DJGPP)

feiss · on June 5, 2019

I share the same feelings with you and all the people that replied. That's why some days ago I started a set of single-file libraries for opening a window and pushing pixels in the same fashion: by writing directly to memory (to a RGB or Indexed (256 palette) frame buffer).

The goal is to have multiple implementations, in different languages and OSs. For example, right now I wrote (partial or full) support (in C) to:

PNG, windows GDI, linux X11, linux OpenGL, linux Framebuffer and Javascript Canvas

https://github.com/feiss/zero

(I've just started! expect it to be incomplete, broken, faulty and buggy!)

vardump · on June 5, 2019

A starting point... after which the countless "out_port" incantations begin.

All just to get to unchained mode (aka Mode-X) with hardware scrolling & split screen, 256 kB VRAM, VGA latch fills and VRAM to VRAM latch blitting. It was amazing to be able to set or copy four 256 color pixels by writing just a single byte.

Of course VGA latch blitting didn't make any sense on 486s, but on 8088 it could be almost 4x performance boost — 256 colors with similar CPU load as CGA's 4 colors! At least as long as you needed to blit something to an x-coordinate that's divisible by 4...

krychu · on June 5, 2019

Same here. Felt cheated when I saw SDL mentioned :)

nurettin · on June 5, 2019

I must have done this in DOS like, a million times!

fersab · on June 5, 2019

So much memories! Thank you!

also old!

JustSomeNobody · on June 5, 2019

Ah the days.