You should compare against a GPU implementation of SHA256 (hashcat has a good one).
You may also want to consider ASICs - although you won't be able to build your own ASIC, you can look at the hashrates offered by bitcoin mining hardware, and extrapolate.
Also remember that native code can use multithreading, so if your challenge is something that could be farmed out to multiple CPU threads until one finds the solution, that's another factor in favor of native code performance.
Lets just assume that you solve the "it takes a while to run" thing through some clever bits of hard-to-optimise math, that's difficult to parallelise or multithread or whatever.
If all it takes is computer time, then that's a cheap thing for the botnet operator to solve. They can just spin up another instance, and split up the crawling task to another computer (or 200).
I haven't encountered a case where multithreading will make the algorithm weaker, but I do have a variation of the benchmark code(on disk, at the moment) that will spin up multiple worker threads to compute PoW.