If you toss a coin 100,000,000 ≈ 2^27 times, you should only expect a longest run of around 27 heads. To have a good chance of getting 100 in a row, you need roughly 2^73 ≈ 10^22 times as many tosses.
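A quick sanity check in Python (a sketch; the simulation size and the log2 rule of thumb for the longest run are my own additions, not from the comment above):

    import math
    import random

    def simulate_longest_run(n_tosses):
        """Toss a fair coin n_tosses times and return the longest run of heads."""
        longest = current = 0
        for _ in range(n_tosses):
            if random.random() < 0.5:   # heads
                current += 1
                longest = max(longest, current)
            else:
                current = 0
        return longest

    # Rule of thumb: the longest run in n fair tosses is roughly log2(n).
    print(math.log2(100_000_000))            # ~26.6, i.e. runs of about 27 heads
    print(simulate_longest_run(1_000_000))   # typically around 19-20 for 10^6 tosses
    # A full 10^8-toss simulation works too, it just takes a while in pure Python.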
And the problem is MUCH worse than described above: let's say you test 1,000 wrong hypotheses at p=0.05; about 50 of them will be accepted as true, even though all are wrong. If you test 980 wrong hypotheses and 20 right ones, more than half of those that pass the p=0.05 "golden" significance test will in fact be wrong.
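To make that arithmetic explicit (a minimal sketch; the 80% statistical power figure is my assumption, the numbers above implicitly assume every true effect is detected):

    def fraction_of_positives_that_are_wrong(n_right, n_wrong, alpha=0.05, power=0.8):
        """Expected share of 'significant' results that are actually false."""
        false_positives = alpha * n_wrong   # wrong hypotheses that pass anyway
        true_positives = power * n_right    # right hypotheses that are detected
        return false_positives / (false_positives + true_positives)

    # 980 wrong hypotheses and 20 right ones, all tested at p = 0.05
    print(fraction_of_positives_that_are_wrong(20, 980))   # ~0.75, i.e. most 'findings' are wrong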
Now, when you see a medical journal with 20 articles using p=0.05, which do you think is more probable - that 19 are right and one is wrong, or 19 are wrong and one is right? The latter has a much higher likelihood.
Clinical researchers too. Because lives are at stake.
The whole field of systematic reviews and meta-analyses has developed around the need to aggregate results from multiple studies of the same disease or treatment, because you can't just trust one isolated result -- it's probably wrong.
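For anyone curious what "aggregate" means in practice: the usual starting point is inverse-variance weighting of per-study effect estimates. A minimal fixed-effect sketch (the study numbers below are invented for illustration):

    import math

    def fixed_effect_pool(effects, std_errors):
        """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
        weights = [1.0 / se ** 2 for se in std_errors]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        return pooled, math.sqrt(1.0 / sum(weights))

    # Three hypothetical studies of the same treatment (e.g. log odds ratios)
    print(fixed_effect_pool([-0.40, -0.10, -0.25], [0.20, 0.15, 0.30]))
    # -> pooled effect ~ -0.21 with standard error ~ 0.11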
Statisticians working in EBM have developed techniques for detecting the 'file-drawer problem' of unpublished negative studies, and correcting for multiple tests (data-dredging). Other fields have a lot to learn...
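As one concrete example of a multiple-testing correction, here is a minimal Benjamini-Hochberg (false discovery rate) sketch; the p-values are made up, and BH is a general-purpose statistical procedure rather than anything EBM-specific:

    def benjamini_hochberg(p_values, fdr=0.05):
        """Indices of hypotheses rejected while controlling the false discovery rate."""
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])   # ranks by p-value
        k_max = 0
        for rank, idx in enumerate(order, start=1):
            if p_values[idx] <= rank / m * fdr:               # BH step-up criterion
                k_max = rank
        return sorted(order[:k_max])

    p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p_values))   # -> [0, 1]; only two of the five 'p < 0.05' results survive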
Clinical researchers working for non-profits / universities do, occasionally. I suspect it has become popular recently not because lives are at stake, but because it lets you publish something meaningful without having to run complex, error-prone and lengthy experiments.
Regardless of the true reason, these reviews are never carried out before a new drug or treatment is approved (because there are usually only one or two studies supporting said treatment, both positive).
And if you have pointers to techniques developed for/by EBM practitioners, I would be grateful. Being a Bayesian guy myself and having spent some time reading Lancet, NEJM and BMJ papers, I'm so far unimpressed, to say the least.