Black Swan Explorations

(*from my year 1 studio project)


March 20, 2020


This is Ronald A. Fisher, popularly known as the father of modern statistics, looking rather contemplative. The obvious age of the photo is intended to make you, the reader, aware of just how long the validity of experimentation has been measured by p-values.

In actual fact, we can go back to 1900 for the first formal usage of the value, by Karl Pearson in his chi-squared test. But it was our pictured practitioner, Fisher, who popularised its use, aided by the publication of “Statistical Methods for Research Workers” in 1925. Ironically, given the controversial subject of p-hacking, Fisher gained a degree of posthumous infamy for refusing to accept that smoking causes cancer. His biography makes for entertaining reading and comes highly recommended.

In his own words, from the book (p. 44), he introduces the value:

“The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty.”
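Fisher's figures are easy to check for yourself. The short sketch below is mine, not Fisher's: it assumes the deviation follows a standard normal distribution and that deviations in either direction count (a two-sided test), and the function name `two_sided_p` is just an illustrative choice. Under those assumptions it shows where the 1.96 and the "once in 22 trials" come from.

```python
from statistics import NormalDist

# Assumption: deviations are measured in standard deviations of a normal distribution.
std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z: float) -> float:
    """Probability of a deviation at least |z| standard deviations from the mean, in either direction."""
    return 2 * (1 - std_normal.cdf(abs(z)))


p_at_1_96 = two_sided_p(1.96)  # Fisher's exact cut-off for P = 0.05
p_at_2 = two_sided_p(2.0)      # his rounded rule of "twice the standard deviation"
trials_per_false_alarm = 1 / p_at_2

print(f"P at 1.96 standard deviations: {p_at_1_96:.4f}")
print(f"P at 2 standard deviations:    {p_at_2:.4f}")
print(f"Roughly one false indication in {trials_per_false_alarm:.0f} trials")
```

The rounding explains the apparent mismatch in the quote: 1.96 standard deviations gives 1 in 20, while the more convenient 2 standard deviations is slightly stricter and gives about 1 in 22.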

For readers with time and a dose of curiosity, Glenn Shafer’s 2019 paper “On the nineteenth-century origins of significance testing and p-hacking” goes into far greater detail on the underpinnings of this article. It discusses a lot of the statistical maths involved and quotes many of the key historical figures on their technical viewpoints. It makes clear how divisive the topic has been among researchers, especially when it comes to identifying significant results or achievements.

Interestingly, despite the recent increase in awareness of ‘hacking’, we cannot call it a modern phenomenon. In fact, we have to assume it has always happened, whether knowingly or not, which casts a sceptical shadow over historical research. A multitude of reasons exist for this behaviour: confirmation bias, publishing pressure, poor methodology, naivety, lack of skill, researcher ‘degrees of freedom’ and misinterpretation, to name just a few. These will be tackled in subsequent posts.
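To give a feel for why researcher ‘degrees of freedom’ matter, here is a minimal sketch of my own. It assumes a simplified situation: every test is independent, there is no real effect anywhere, and the researcher reports a finding whenever any single test dips below 0.05. The function name `chance_of_false_positive` is hypothetical, and real p-hacking is messier than this.

```python
ALPHA = 0.05  # conventional significance threshold


def chance_of_false_positive(num_tests: int, alpha: float = ALPHA) -> float:
    """Probability that at least one of num_tests independent tests of a true null falls below alpha."""
    return 1 - (1 - alpha) ** num_tests


# Each extra outcome, subgroup or analysis choice is another chance for a spurious 'significant' result.
for num_tests in (1, 5, 20):
    chance = chance_of_false_positive(num_tests)
    print(f"{num_tests:>2} tests: {chance:.1%} chance of at least one false positive")
```

Under these assumptions, trying twenty different analyses makes a spurious ‘significant’ result more likely than not, even though each individual test still uses the 1 in 20 threshold.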

One could reason that the above applies to any attempt at research verification. So why has the p-value endured? Despite the criticisms, could it be argued that there is simply no better way? Can it really be that, in the intervening century, no other methodology has emerged to fix the aforementioned flaws in scientific validation? Or is the momentum of the scientific machinery, with its traditions and methods of communication and signalling, too great? Perhaps there is a need to frame new results in the same paradigm as those that already exist?

Nuanced conversation and debate are difficult in the current media climate. But debate needs to happen to invite focus and potentially strengthen a hypothesis. Is there an underappreciation of the nuanced definition of the p-value, one that its original, highly skilled practitioners had internalised yet is missed today? “Almost all of them (scientists) think it gives some direct information about how likely they are to be wrong, and that’s definitely not what a p-value does,” says Goodman (FiveThirtyEight blog).

At this stage, it is perhaps worth reiterating the important questions to be asked:

What has enabled the p-value to endure?
Why do researchers p-hack?
Is there another method we should use?

Answers to these questions will be attempted in upcoming posts outlined in the footer. But to cap off this one and stick with the historical theme, I’ll share this:

“All scientific work is incomplete,” he said. “All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time.”
– Austin Bradford Hill, 1965, The Royal Society of Medicine.

Thanks for reading!

