From the end of the article: "... such as compelling researchers to share both data sets and the code for statistical models."
This would be a welcome change. It seems very strange to me that it is not the accepted practice to make available any source code used in the analysis. It's an essential part of their methodology, and any bugs in that code could produce hard-to-detect flaws in their results (dropped data, rounding errors, etc). Just look at the work jgrahamc did on climate change analysis: http://news.ycombinator.com/item?id=1128782.
It is stunning that work can count as 'science', as a piece of 'scientific research', when the code employed is kept secret. In fact, the use of proprietary software in science is at least as dubious as the use of the .doc format for government documents.
The first portion of the article focuses entirely on multiple testing. If you don't perform alpha spending or a Bonferroni-type correction despite looking at your data multiple times, you will - of course! - find spurious associations. This is simply bad science and/or lazy refereeing on the part of the journals.
In human genetics we have Mark Daly, David Altshuler, and Eric Lander to thank for doing the thoughtful theoretical work beforehand to establish the alpha threshold for genome-wide significance in any study: 5E-8. To my knowledge, GWAS results are much less commonly found to be invalid in later studies.
The message is this: everyone knows at least one method of correction for multiple testing, and virtually everyone knows that they should be doing this. No journal should publish papers that merely show nominal, instead of corrected, P values.
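To make the point concrete, here's a minimal sketch (not from the article; the numbers are illustrative) showing how testing many subgroups of pure noise yields "significant" nominal P values, and how a simple Bonferroni threshold removes them:

    # Test 20 subgroups of pure noise for a treatment effect. At alpha = 0.05,
    # roughly one subgroup per run comes up "significant" by chance alone;
    # the Bonferroni-corrected threshold (alpha / number of tests) kills these.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n_tests = 0.05, 20

    pvals = np.array([
        stats.ttest_ind(rng.normal(size=50), rng.normal(size=50)).pvalue
        for _ in range(n_tests)
    ])

    print("nominal hits:   ", np.sum(pvals < alpha))            # typically >= 1, all spurious
    print("Bonferroni hits:", np.sum(pvals < alpha / n_tests))  # almost always 0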
A big problem is that there's often no objective way to pick the "right" statistical test. This gives the experimenter the freedom to choose the procedure that favors a positive conclusion. Here's a classic example where the statistician has the freedom to choose between a binomial and a negative binomial test:
Two experimenters compare treatments A and B. Both observe that A is preferred to B in the first 5 patients and B is preferred in the 6th. The first experimenter planned to run 6 comparisons and count the number of successes, and gets a P value of 0.11 for the hypothesis that A is better. The second experimenter planned to run comparisons until B was preferred, up to a maximum of 6, and gets a P value of 0.03.
(see Appendix A of http://www.annals.org/content/130/12/995.full.pdf+html)
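For the curious, the two P values can be reproduced in a few lines, assuming a one-sided null hypothesis of p = 0.5 (a sketch, not code from the paper):

    # Same data (A preferred in the first 5 patients, B in the 6th), two stopping
    # rules, two P values, under the one-sided null hypothesis p = 0.5.
    from scipy import stats

    # Rule 1: fixed n = 6 trials, count successes. P(X >= 5) under Binomial(6, 0.5).
    p_binomial = stats.binom.sf(4, 6, 0.5)   # ~0.109

    # Rule 2: run until B is first preferred (at most 6 trials). Getting 5 straight
    # successes before the first failure has probability 0.5 ** 5.
    p_negative_binomial = 0.5 ** 5           # ~0.031

    print(round(p_binomial, 3), round(p_negative_binomial, 3))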
Another example is choosing between a one-tailed and a two-tailed t-test. When you ask for the probability of the effect being as extreme as the observed x under the null hypothesis, should you ask for the probability that effect > x, or that |effect| > |x|?
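The difference is easy to see in code; a quick sketch (simulated data, scipy assumed) where the one-sided P value is half the two-sided one:

    # Same data, same t statistic: the one-sided P value is half the two-sided
    # one (when the effect points the "right" way), which can be the difference
    # between "significant" and "not significant".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    effect = rng.normal(loc=0.4, scale=1.0, size=30)    # simulated measurements

    two_sided = stats.ttest_1samp(effect, 0.0)                         # P(|effect| > |x|)
    one_sided = stats.ttest_1samp(effect, 0.0, alternative='greater')  # P(effect > x)

    print(two_sided.pvalue, one_sided.pvalue)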
The most illustrative example of the subjectivity of hypothesis testing is probably the problem of testing strings for randomness. There are many tests of whether a particular string of bits was generated by a Bernoulli process, and not one of them has a legitimate claim to being "the right one".
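A toy illustration of how two perfectly reasonable tests can disagree (my own example, not from any standard test suite): the alternating string 0101...01 sails through a simple frequency test but fails a runs test spectacularly.

    # The alternating bit string 0101...01 has exactly half ones, so a monobit
    # (frequency) test calls it random; a Wald-Wolfowitz runs test does not.
    import math

    bits = [i % 2 for i in range(1000)]      # 0101...01
    n, ones = len(bits), sum(bits)

    # Monobit test: z-score of the count of ones.
    z_freq = (ones - n / 2) / math.sqrt(n / 4)
    p_freq = math.erfc(abs(z_freq) / math.sqrt(2))   # ~1.0: looks random

    # Runs test: number of runs vs. its expectation under randomness.
    runs = 1 + sum(bits[i] != bits[i - 1] for i in range(1, n))
    mu = 2 * ones * (n - ones) / n + 1
    var = (mu - 1) * (mu - 2) / (n - 1)
    z_runs = (runs - mu) / math.sqrt(var)
    p_runs = math.erfc(abs(z_runs) / math.sqrt(2))   # ~0.0: clearly not random

    print(p_freq, p_runs)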
One way to remove the bias is three-way, triple-blind testing: measure the effects of the existing treatment, the new treatment, and a placebo, and have the statistician analyze the datasets while blinded to their true labels.
There are plenty of methods (cough report Bayesian likelihood ratios cough and publish your raw data cough) that are simple enough for even average scientists to use. They would rather use much more complicated statistics, though, because then they get to publish more papers with "significant" results.
If you can handle calculus, which most scientists take, then you can handle likelihood ratios, believe me.
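For what a likelihood-ratio report could look like in practice, here is a minimal sketch (the hypotheses and numbers are purely illustrative):

    # How much more probable are the data under "the treatment works in 70% of
    # patients" than under "the treatment does nothing" (50%)? A reader can then
    # combine this ratio with their own prior; no significance threshold needed.
    from scipy import stats

    successes, trials = 14, 20

    lik_effect = stats.binom.pmf(successes, trials, 0.7)   # hypothesized effect
    lik_null   = stats.binom.pmf(successes, trials, 0.5)   # null hypothesis

    print("likelihood ratio:", lik_effect / lik_null)      # ~5: modest evidence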
But the incentives are terrible, which is quite a different thing from supposing that the average PhD is too dumb to learn good statistics if the incentives were strong.
It can be if it's used incorrectly. This is why they said
> reviewers needed to hold studies to a minimal standard of biological plausibility
There are two "good" ways to do this (as far as I can see). 1) Come up with a biologically plausible idea and test it, using statistics to evaluate the results. 2) Find a pattern in the statistics and then find a biologically plausible explanation.
The biology alone isn't enough; you need the statistics to back it up and show actual results. However, using statistics alone, in the way described in the article (looking at every test and every subgroup, etc.), is exactly what you're saying: avoiding rigorous thinking in favor of getting a result.
Most medical studies have a small sample size because it is difficult and often prohibitively expensive to do studies with large numbers of subjects. Small sample size increases the likelihood of spurious results.
The other problem is that mainstream journalism does an abysmal job of reporting science. How many times have you read a newspaper article or seen a segment on the evening news telling you that X is good for your health or X is bad for your health, as if it were an absolute truth, a law of nature on par with V=IR or F=ma? Then, out of curiosity, you look up the actual journal article published by the scientists and find their claims to be considerably more modest.
Three kids, same pediatrician. First kid: "Drink lots of whole milk". 2nd kid: "drink only skim milk; kids are too fat". 3rd kid: "drink 2% milk in moderation".
I have a friend who's a doctor. A few years ago I read a few books about nutrition for laypeople, and when I had questions, I tried asking him. He couldn't tell me much. Virtually everything I asked him about, I knew more about it than he did, just from reading a few books. He obviously knew more about enzymes, metabolic pathways, and cardiac arteries than I did, but when it came to practical questions like how much protein I need to consume or how well calcium is absorbed from broccoli and collard greens, he had learned literally nothing in medical school. The closest he came to studying nutrition was when he took a special elective unit on diabetes in rural Hispanic populations.
The way he sees it, nutrition is a separate specialty done by people who study less than doctors and get paid less than doctors, and it isn't taught in medical school, so he isn't embarrassed at all by his ignorance. He regularly gives advice to people who have risk factors like high blood pressure or high cholesterol, but the advice he gives them (eat in moderation, eat more vegetables and less junk food, exercise a little) is pretty generic, and none of them follow it anyway.
Doctors are typically educated in how to fix problems, not so much in how to prevent them. This is unfortunate, since most people assume doctors are experts in everything, including nutrition, and treat them as such, but they're really not.
Maybe if doctors were paid to keep their patient base healthy rather than just fix them when they're sick, they'd look into more of that.
Capitation is one approach to solving that problem.
http://en.wikipedia.org/wiki/Capitation_%28healthcare%29
Unfortunately the time frames are still fairly short, because patients often move around. So even when doctors are compensated through a capitation system, there isn't enough incentive to encourage patients to make healthy lifestyle choices that will only pay off many years in the future.
>He regularly gives advice to people who have risk factors like high blood pressure or high cholesterol, but the advice he gives them (eat in moderation, eat more vegetables and less junk food, exercise a little) is pretty generic, and none of them follow it anyway.
Doesn't that kind of justify his ignorance, though? Doctors don't have time to be experts on everything (or on anything, really), and it seems unlikely that an ability to hold informed discourse on the nutritional value of broccoli would have anything more than a marginal effect on outcomes for his patients.
All it means is that patients who are interested in nutrition don't discuss it with their doctors. I think the people who actually ask him for advice on nutrition tend to be older, poorly educated, and very dependent on authority. Everybody else takes it for granted that doctors don't have any useful information on living a healthy life. A doctor is just somebody who diagnoses and prescribes. My friend got his head stuffed chock full of arcane knowledge in medical school, and he gets a torrent of "education" from drug companies, but when it comes to the effect of lifestyle on health, he's less educated than most of my yuppie friends.
This is a waste, because the people who are in the best position to understand and evaluate information about living a healthy lifestyle have no influence on the production or consumption of this information, except when they themselves get in on the business.
He could do better, though, not just with nutrition but with other lifestyle factors. For instance, he could explain to older women that doing weight-bearing exercise will reduce their risk of osteoporosis, and he could tell them how to get calcium if they have problems with dairy products.
But my point was that very few people even bother asking him for guidance. It's silly that educated people just assume that by reading a little in their spare time they can acquire a better practical knowledge of nutrition than a professional with four or more years of intense training in human medicine. It's mind-boggling that they're right. For instance, if my friend could name a couple of high-calcium non-dairy foods -- which I'm not confident he could -- it's only because he read some diet books looking for ways to lose weight. He could probably name more symptoms of Chagas disease than dietary sources of vitamin D.
Patient-care medicine is not a science. That is, doctors do not use the scientific method when dealing with patients. It is certainly informed by science regarding best practices and techniques, but diagnoses and treatments ultimately come down to a doctor's intuition about a given situation.
Oh, I could tell you stories. OBGYNs who didn't know the answer to a question and spent all of 10 minutes Googling it (I'm not kidding) before coming back with an "authoritative" recommendation. But really, there is a big difference between researchers and practitioners.
You do point out a very important phenomenon, however. A good many doctors don't know a thing about proper statistical analysis or Bayesian reasoning. They implicitly trust whatever research falls into their laps, even if they don't fully understand it, and they begin to treat patients based on their faulty understanding.
One of the problems I've run into in my current work is that statistics deals with problems a lot like the drunk man looking for his keys under the lamp. As far as I can tell there is one man in the entire world who works on stable distributions (like normal distributions, but with tunable skew and heavy-tailedness parameters), which have been quite useful for me. If you have, or believe you have, data with subtle dependencies among the random variables, you could use a copula, but multivariate Archimedean copulas are hard to compute with (as far as we've been able to tell, at least), copula fitting is a research problem, and copula choice is black magic.
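If it helps anyone, scipy does expose stable distributions as levy_stable; a minimal sketch (the parameter values are illustrative, and the numerics can be slow):

    # alpha < 2 gives heavy tails, beta != 0 gives skew. The stable law puts far
    # more mass in the tails than a normal distribution does.
    from scipy.stats import levy_stable, norm

    alpha, beta = 1.7, 0.5   # illustrative tail-heaviness and skew parameters

    for x in (2.0, 5.0, 10.0):
        print(x, levy_stable.sf(x, alpha, beta), norm.sf(x))

    # Sampling works like any other scipy distribution.
    draws = levy_stable.rvs(alpha, beta, size=1000, random_state=2)
    print("sample max:", draws.max())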
I'm not a fan of the common sense check. A lot of science is, or at least was, completely contrary to common sense.
However, reproducibility is absolutely vital. If future researchers can't replicate the results of a study, then statistical flukes are going to be a real problem. If studies can be replicated, then statistical error can be handled very efficiently: each time a given piece of research is reproduced (even if it's only reproduced as a set-up control for a future study), the odds of a statistical fluke surviving shrink exponentially.
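A back-of-the-envelope version of the "exponential" claim, assuming independent replications that each have a 5% false-positive rate:

    # The chance that a pure fluke survives k independent replications is ~0.05**k.
    for k in (1, 2, 3, 4):
        print(k, "replications ->", 0.05 ** k)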
Most likely, not nearly as much. In medicine, tests are expensive and ethically complicated, because you're experimenting on living beings whose inner workings we barely understand. So the ability to draw firm conclusions, the way you can in the hard sciences, just isn't there.
Gravity is proved. Evolution is rock solid. We still don't know if coffee is good or bad for us.
> Gravity is proved. Evolution is rock solid. We still don't know if coffee is good or bad for us.
Gravity is not proven. Falling is easily observed, and general relativity is a pretty good theory for it, but the jury is still out on how exactly gravity works. Evolution is also easily observed (e.g., drug resistant pathogens); its basis in natural selection and genetics is the part that is rock solid.
As for coffee, it's some combination of good and bad. Which part dominates in which situations is up for debate.
(Apologies for being pedantic -- caffeine high just set in.)
Sure, but that's old physics/biology. You can't compare that to state-of-the-art medicine. You need to look into modern cosmology, string theory, or quantum mechanics and make the comparison. Unfortunately, I'm not in a position to make such a comparison.
That aside, the incentives are different. Medicine --> $$$. String theory doesn't.
In the humanities it is a bit harder than medicine because you can't even do controlled experiments in most situations, and observational statistics is complicated.
> In the humanities it is a bit harder than medicine because you can't even do controlled experiments in most situations, and observational statistics is complicated.
That's not the hard part. In the humanities, people care about the answers. They want to find economic value in diversity, for example.
No one went into medicine to prove that a particular compound cured gout.
In many of the social sciences, people come into the field with agendas and political leanings that at best subtly bias what they are doing and at worst override their search for truth and cause them to find the results they want, whether those results are real or not.
In the hard sciences, there is rarely an agenda. People often care very much about the results, but they rarely bring with them deeply rooted political biases.
Usually not. The researchers' grants depend on figuring out the mechanisms around a certain class of compounds curing gout. A pharma company is going to take that research and find a compound that can be patented (a process that takes a while and rarely gets research funding).
Many researchers also have for-profit companies on the side, and care about whether the results they publish in their day job are good or bad press for those companies. (Especially common in biomed.)
Strictly speaking the "humanities" (philosophy, literature, history) don't make any kind of attempt at science at all, whereas the social sciences (economics, sociology) have the difficulties you point out.