One might say we have acquired something of an obsession with the scientific method. A notable characteristic of recent news articles and culture pieces is the ever-present terms "the science says" or "the experts say". Militant positions on the production of scientific knowledge are often celebrated, particularly when posed against the apparently corrupting influence of spiritual practice. Richard Dawkins is a convenient example, the subject of countless undergraduate debates:
Not only is science corrosive to religion; religion is corrosive to science. It teaches people to be satisfied with trivial, supernatural non-explanations and blinds them to the wonderful real explanations that we have within our grasp. It teaches them to accept authority, revelation and faith instead of always insisting on evidence.
There is, of course, a certain anti-intellectualism that pervades our political landscape, but by and large our modern culture is a culture that reveres science and reflective understanding as the only legitimate routes to knowledge production.
The scientific method is a belief system like any other
In this, we are mistaken. At the core of the scientific method is the hypothesis—a suggested explanation for some phenomenon or other. Crucially, we never accept a hypothesis. We can only ever reject or fail to reject it. As Stephen Jay Gould put it:
Science is a procedure for testing and rejecting hypotheses, not a compendium of certain knowledge. Claims that can be proved incorrect lie within its domain.
Thus a scientific fact is rarely a fact at all, but merely the steady winnowing of the possible explanations of a thing. It is a belief—that an explanation is the correct explanation, not because it's proven but because we have not yet disproven it.1
This attitude suffers from the same affliction as all beliefs—they are liable to errors of lazy application.
Consider, for example, one shining example of the scientific method: the test of statistical significance. This is the primary method for testing a hypothesis, employed in almost all scientific disciplines. It's basically a test of probability—if we assume nothing should have happened, how unlikely is it that we'd get the results we got? It's an odd concept to wrap your head around, comparing something to nothing. Far better explained here:
Sometimes you'll flip a penny and get several heads in a row, but that doesn't mean the penny is rigged. Suppose, for instance, that you toss a penny 10 times. A perfectly fair coin (heads or tails equally likely) will often produce more or fewer than five heads. In fact, you'll get exactly five heads only about a fourth of the time. Sometimes you'll get six heads, or four. Or seven, or eight. In fact, even with a fair coin, you might get 10 heads out of 10 flips (but only about once for every thousand 10-flip trials).
So how many heads should make you suspicious? Suppose you get eight heads out of 10 tosses. For a fair coin, the chances of eight or more heads are only about 5.5 percent.
What we're measuring here is how unlikely our result is, assuming that nothing should be happening. We assume the coin should just be evenly splitting heads and tails, and if so there's only a 5.5% chance we'd get 8 or more heads instead of 5.
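The arithmetic behind the quoted figures can be checked directly from the binomial distribution. A minimal sketch (the function name `prob_heads` is my own, for illustration):

```python
from math import comb

def prob_heads(k: int, n: int) -> float:
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2**n

# Exactly five heads out of ten: only about a quarter of the time.
print(round(prob_heads(5, 10), 3))   # 0.246

# Ten heads out of ten: about once per thousand 10-flip trials.
print(round(prob_heads(10, 10), 4))  # 0.001

# The "eight or more heads" figure: sum the tail of the distribution.
p_value = sum(prob_heads(k, 10) for k in range(8, 11))
print(round(p_value, 3))             # 0.055, i.e. about 5.5 percent
```

Note that the p-value sums the whole tail (8, 9, and 10 heads), not just the probability of exactly 8.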
That 5.5% figure might make you suspicious. But it wouldn't make most modern scientists suspicious. The typical threshold in many scientific disciplines to determine statistical significance is 5%. A 5% chance or lower of getting our result is considered statistically significant. Why 5%? Well, because one time a clever statistician used 5% as an example when explaining the concept. He never meant it to become the default value across most major scientific disciplines for decades. He just used it as a convenient threshold to assess the results of a gambling game, and explicitly noted that a researcher should:
give... his mind to each particular case in the light of his evidence and his ideas.
This is only the most egregious of the misuses of significance testing. For example, it doesn't tell us how likely our result is. Only how unlikely it is if we assume that nothing should be happening.
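The gap between those two things can be made concrete with a back-of-envelope Bayes calculation. The numbers below are entirely made up for illustration, not from any real study:

```python
# Why a small p-value is not the probability our hypothesis is true.
# All three rates below are illustrative assumptions, not real data.
base_rate = 0.10  # suppose 10% of tested hypotheses are actually true
power = 0.80      # chance a real effect passes the significance test
alpha = 0.05      # chance a null effect passes anyway (false positive)

# Of all "statistically significant" results, what fraction reflect
# a real effect?
true_hits = base_rate * power
false_hits = (1 - base_rate) * alpha
prob_real = true_hits / (true_hits + false_hits)
print(round(prob_real, 2))  # 0.64
```

Under these assumptions, roughly a third of "significant" findings would be false positives, despite every one of them clearing the 5% bar.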
There are also important questions about how we determine that nothing should be happening. In a coin toss, nothing is an even number of heads and tails. But what is 'nothing' in comparison to a treatment for depression? Some kind of assumption about the stability of a group of people's symptoms—how much variance could be considered a product of chance and not our treatment. This is substantially fuzzier, and significance testing becomes meaningless if you haven't chosen a 'nothing' that makes sense or is 'true'.
There's also the fact that the unlikeliness of something happening doesn't tell you anything in particular about how significant it is. Rather, it just tells you something about how much information you have about the likelihood. For example, if you run the same test with a small group of people and again with a large group, you'll get a smaller percentage in the larger group. With more people you have more information about the likelihood and it becomes more precise (which, for any real effect, tends to make it smaller).
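You can see this with the coin again. A sketch comparing the same observed effect (55% heads) at two sample sizes, using the exact binomial tail (the helper name `binom_tail` is my own):

```python
from math import comb

def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Identical effect size either way: 55% heads against a fair coin.
small = binom_tail(55, 100)     # 55 heads in 100 flips
large = binom_tail(550, 1000)   # 550 heads in 1000 flips

print(small)  # roughly 0.18: nowhere near "significant"
print(large)  # well under 0.005: comfortably "significant"
```

The coin's bias, if any, is the same in both runs. Only the amount of information changed, yet one result clears the 5% threshold and the other doesn't.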
Indeed, just because something is unlikely according to your arbitrary threshold doesn't mean it has any kind of practical significance at all. This is why manuals on the treatment of psychological difficulty have to consider four models of mental health, of which statistical significance is only one, and we still have an enormous number of problems with diagnosis. This is why we celebrate the 'randomised controlled trial', in which researchers desperately try to minimise the impact of any possible 'confound'—that is any variable that might interact with the thing we want to look at. This seems sensible until we consider Mollison's full quote on the subject:
Scientific method is one of the ways to know about the real world… Observation and contemplative understanding is another. We can find out many things… by timing, measuring, and observing them; enough to make calendars, computers… but not ever enough to understand the complex actions in even a simple living system. You can hit a nail on the head, or cause a machine to do so, and get a fairly predictable result. Hit a dog on the head, and it will either dodge, bite back, or die, but it will never again react in the same way. We can predict only those things we set up to be predictable, not what we encounter in the real world of living and reactive processes.
Science can be a ritual too
And so, with these kinds of mind-bending difficulties in interpretation, it might not surprise you to learn of the raging replication crisis—the inability of scientists to reproduce the results of major studies across disciplines like psychology, medicine, economics, and more.
The scientific method is a belief system, subject to ritualisation and the consequent errors of lazy application. Here are some choice quotes from the earlier, excellent explainer article to drive the point home:
Andrew Gelman puts it bluntly: "The scientific method that we love so much is a machine for generating exaggerations."
Gerd Gigerenzer calls this the "null ritual," a mindless process for producing data that researchers seldom interpret correctly. Psychology adopted the null ritual early on, and then it spread (like a disease) to many other fields, including biology, economics, and ecology. "People do this in a ritualistic way," Gigerenzer says. "It's like compulsive hand washing."
And we are left in a position in which scientists all over the world are signing petitions to ban the practice. Or here's the American Statistical Association complaining of the "statistics wars" and recommending significance testing be put aside:
"We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term "statistically significant" entirely. Nor should variants such as "significantly different"... and "nonsignificant" survive, whether expressed in words, by asterisks in a table, or in some other way."
This is not to say that the scientific method should be put aside. Although I gripe about it, I've written many times about those places I think it [adds value].
Not to mention that there are other methods for assessing results, and there is a minority of researchers who utilise significance testing and other measures thoughtfully.
This is all merely to say that, as Pi puts it:
It was my first clue that atheists [and the spiritual] are... brothers and sisters of a different faith... they go as far as the legs of reason will carry them—and then they leap.
And sometimes (often?), researchers leap lazily.