Moreover, the baseline _c function is compiled with -march=generic and -fno-tree-vectorize on GCC, so it's the most favorable possible baseline for the handcrafted AVX512 code. The assembly is obviously faster, and that's very cool, but boasting about 100x may be misinterpreted by outside readers.
I mean, I love ffmpeg; I use it a lot and it's fantastic for my needs. But I've found their public persona often misleading, and well, this just confirms my bias.
> We made a 100x improvement over incredibly unoptimized C by writing heavily specific cpu instructions that the compiler cannot use because we don't allow it!
2x is still an improvement, but far less outstanding than they want to make it look just because they used assembly.
> Updating HDD firmware is something you do to resolve a very specific problem, not ... just because it's available.
It is important to check whether there is an update and what it fixed. Like any software, it may introduce new bugs, but blindly advising "don't touch it if it's not broken" is harmful too. Some time ago Samsung shipped SSDs that were self-destructing after a very short period of time and fixed this in firmware. If your SSD breaks or starts having problems, it is already too late to update; you have to be proactive. And hardware vendors don't release firmware updates for nothing; in most cases there is a very good reason for them.
That wasn't an actual "fix", it was just a workaround: the flash they were using was far too leaky and lost its charge very quickly, so they decided to have the firmware constantly rewrite it in the background. Even the updated firmware won't help for machines that are powered off for months at a time.
I think the manufacturers needed no encouragement, and at this point it would take multi-national intervention to get the genie back in the bottle. The poster you replied to simply recognized the situation for what it is: Samsung is still a going concern after releasing SSDs with firmware that would brick them after a relatively short service lifespan.
Just checking for updates is fine; actually installing every single firmware version is bad.
This is because embedded firmware has fewer moving parts, packed more tightly, which makes its failure modes inevitably catastrophic: it is incapable of degrading progressively the way Electron apps do. Instead, the whole system spontaneously crashes into the wall and dies from a single typo, and you don't want that.
Though I wouldn't be myself if I didn't have a few nitpicks ;)
- You are mixing lossy and lossless compression in your rant, and you are mixing containers with formats. The thing about lossy compression is that it is by design stupidly slow and complex, in order to produce the smallest possible result with the best perceived quality: compress once, and trade time for both image and compression quality. Lossless is not that bad, although your simple solution is, well, simple :)
- I feel like the benchmark suite is lacking. For a better overview you should probably include libpng results at both the maximum and the lowest compression level. The lossless modes of AVIF and WEBP would be nice too (you could also throw in a similar project like https://github.com/catid/Zpng). Not saying the benchmark is bad, but IMHO it doesn't paint the full picture. From a quick test I got significantly better compression with libpng, at the expense of time of course, but you didn't configure libpng for speed either. So we have some results, but they are not really representative IMHO.
- I understand it is a pet project, but don't forget the importance of image metadata, like colorspace and so on.
- Also keep in mind that compression is easy at the beginning; the tricky part is how to squeeze out more.
I wrote a small test bench for my own use, as I wanted to compare different codecs and see how "raw compression" would perform, how a QOI+ZSTD combination would work, and so on.
I think that just-in-time compilers are better at doing their thing. Sure, it's a nice project that can interpret and print preprocessed JS, but I suspect it might in fact not bring a speedup in most cases.
And in its current state it doesn't even know how to constant-fold this loop:
function foo() {
  const bar = 42;
  for (let i = 0; i <= bar; i++) {
    if (i === bar) {
      return bar;
    }
  }
}
Ahead-of-time optimizations that don't have a large enough negative drawback on another criterion (speed vs. size) are welcome in my book, even if the JIT can do the same optimizations too.
Regarding your code example: maybe prepack has changed in the past week and a half, but it folded quite fine when I tried it:
(function() {
  function foo() {
    const bar = 42;
    for (let i = 0; i <= bar; i++) {
      if (i === bar) {
        return bar;
      }
    }
  }
  console.log(foo());
})();
I commented there with some suggested changes, and you can find more performance comparisons [0].
For example, with a small adjustment to the C code and compiling it for AVX512:
Also, I argued that it may be a little misleading to post a comparison without stating the compiler and flags used for said comparison [1]. P.S. There is related work to enable -ftree-vectorize by default [2].
[0] https://ffmpeg.org/pipermail/ffmpeg-devel/2025-July/346813.h...
[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2025-July/346794.h...
[2] https://ffmpeg.org/pipermail/ffmpeg-devel/2025-July/346439.h...