Mandelbrot Set with SIMD Intrinsics

w0utert · on July 11, 2015

5.20ms per image to render a Mandelbrot set at 1440x800 using AVX. My mind boggles when I think back to the time (a little over 20 years ago) when I typed in a Mandelbrot program from a magazine on my Commodore 128, ran it, went to school, to return 8 hours later to just see it complete the last few pixels of a monochrome Mandelbrot at a glorious 320x200 resolution ;-)

drxzcl · on July 11, 2015

Just in case you are tired of old fashioned fixed function CPUs, check out this awesome realtime full HD FPGA renderer!

http://hamsterworks.co.nz/mediawiki/index.php/Mandelbrot_NG_...

panic · on July 11, 2015

The standard visualization for the Mandelbrot set (which is used here) colors pixels according to the number of iterations it takes for the corresponding point to get far enough away from zero that it can no longer be part of the set. It's worth noting that coloring pixels according to the estimated distance to the fractal instead (using a technique like http://iquilezles.org/www/articles/distancefractals/distance...) can produce a more detailed image: pixels very close to a thin edge of the fractal are included in a distance-estimation visualization but can be missed by an escape-time coloring.

anon4 · on July 11, 2015

A quick, very pedantic note - the images don't show colouring based on linear distance. The problem is that your monitor works in non-linear colourspace - sRGB, while the image is generated in linear RGB but not converted to sRGB afterwards. If you want to show proper colours, you need to convert the final value from RGB to sRGB. For people writing shaders, here are two handy functions I keep around:

    vec3 srgb_from_rgb(vec3 rgb) {
        vec3 a = vec3(0.055, 0.055, 0.055);
        vec3 ap1 = vec3(1.0, 1.0, 1.0) + a;
        vec3 g = vec3(2.4, 2.4, 2.4);
        vec3 ginv = 1.0 / g;
        vec3 select = step(vec3(0.0031308, 0.0031308, 0.0031308), rgb);
        vec3 lo = rgb * 12.92;
        vec3 hi = ap1 * pow(rgb, ginv) - a;
        return mix(lo, hi, select);
    }

    vec3 rgb_from_srgb(vec3 srgb) {
        vec3 a = vec3(0.055, 0.055, 0.055);
        vec3 ap1 = vec3(1.0, 1.0, 1.0) + a;
        vec3 g = vec3(2.4, 2.4, 2.4);
        vec3 select = step(vec3(0.04045, 0.04045, 0.04045), srgb);
        vec3 lo = srgb / 12.92;
        vec3 hi = pow((srgb + a) / ap1, g);
        return mix(lo, hi, select);
    }

They don't have any measurable performance impact, as far as I can tell.

Edit: comparison between non-colourspace-corrected and colourspace-corrected versions of the shadertoy link http://www.screenshotcomparison.com/comparison/134695

panic · on July 11, 2015

It's not pedantic -- interpolation in linear space makes a major difference! It doesn't look like this shader is meant to show distance directly, though: it's taking a fourth root and some other things:

    // do some soft coloring based on distance
    d = clamp( 8.0*d/zoo, 0.0, 1.0 );
    d = pow( d, 0.25 );
    vec3 col = vec3( d );

anon4 · on July 12, 2015

I'm calling it pedantic, because as far as I can see nobody cares outside of people doing movie VFX and the like, where it is actually really really important that you get it right. Your GUI for example doesn't care - elements are blended assuming a linear colourspace, which means that all font rendering is just slightly off. Even Apple don't care to make their blur effects correct - see https://www.youtube.com/watch?v=LKnqECcg6Gw . Even photoshop doesn't do it correctly by default. Almost everybody is perfectly happy treating the values as linear colour intensity.

The bottom line is that sRGB is hideously hard to work in. You really want to only use it for storage and only if you really must. It's an optimisation to allow you better range on the low end, at the expense of the high intensity end, to match human colour vision. However, that means using 48bit RGB and that's not a price everyone wants to pay. People who do professional graphics work simply use 32bit floats per colour channel - 128bit RGBA, or even 192bit RGBRaGaBa (separate alpha for each channel) and have workstations with 32 or 64GB of RAM. However, your normal everyday GUI application needs to run on a phone with about 1GB of RAM. Or going back farther, it has to run on an Intel 486 with 16MB of RAM. That kind of explains where the culture of "just treat it as linear, it's not that wrong" came from :) .

melling · on July 11, 2015

I noticed the plug for https://handmadehero.org

How's that going? Seems like progress has slowed.

RoboSeldon · on July 11, 2015

Casey has took a week off for a medical urgency in his family.

If you want to see how alive is Hand Made Hero just follow Casey on Twitter and check the Forums from handmadehero.org.

krylon · on July 11, 2015

> I didn't use C99's complex number support because -- continuing to follow the approach Handmade Hero -- I intended to port this code directly into SIMD intrinsics.

Which makes me wonder - do GCC and/or Clang compile complex arithmetic so SIMD instructions?

A couple of years back, I rewrote a mandelbrot renderer from C89 to C99 and used complex numbers, and I noticed that it ran faster afterwards (not dramatically so, but I noticed the difference). I never checked, though, if the compiler emitted SIMD instructions, and sometime since I lost the source code to a faulty hard disk and was too lazy to write another one.

Narishma · on July 11, 2015

I believe NEON was introduced with ARMv7, not ARMv6 as claimed in the article. Though some Intel ARMv5 CPUs (when they were still making them) did support a SIMD extension called WMMX, which was based on MMX and/or SSE.