
P-hacking with Taleb

Nassim Nicholas Taleb (profile link) is the inspiration for this blog series, hence the Black Swan Explorations title. His books are best sellers, known for his distinctive writing style and often strong, contrarian opinions. I admit to being a fan of his work, although I hope to come across as more impartial in this post!

I recommend his books because they are written in a non-technical way, yet their real-world implications are astounding. What makes the topics so interesting, and authentic, is that he backs them up with real-world evidence from his career and with mathematical proofs.

TOPICS

This post showcases two of his papers: a model for verifying p-values (pdf download here) and an analysis of correlation mistakes (link). Included are his own video walkthroughs of the concepts and the thinking behind his proofs, after which I attempt to summarise the key points. Finally, I've included some code that you can run yourself to get a practical feel for the implications of Taleb's work.

Perhaps most controversially, and most relevant to our blog series, he will vehemently call out errors, especially from those in positions of authority. In Taleb we have a famous author, a successful trader and a skilled mathematician ready to fight false claims and bad science. He is particularly attuned to identifying the bias that pervades research. If you're in any doubt as to what I mean, have a quick scan of his Twitter feed.

One very current example of his expertise and courage to speak out comes from the coronavirus outbreak. The statistical reports from China were enough for him to sound the alarm in mid-January, when he co-authored this paper (link) on the precautionary principle. Sadly, it took much longer before the potential of this disease was acted upon by people and governments in the West. (Update: he has just released this draft version of a paper looking at covid-19 in more detail.)

THE META DISTRIBUTION OF STANDARD P-VALUES (LINK)

https://youtube.com/watch?v=8qrfSh07rT0

His verbal style in this walkthrough is quite casual, so hopefully the following clarifies his points and provides an accurate summary of the paper without diving into the extremely advanced mathematics. Just before the 7-minute mark he demonstrates adjusting the p-value, with a graph output.

KEY POINTS

  1. What is the paper about?

P-value distributions. That is to say, a statistical enquiry into what Taleb calls the 'true' p-value, as opposed to one that is decided upon or achieved by hacking.

  2. A skewed meta-distribution

Meta-analysis of published p-values shows how improbable these results are: the values are heavily skewed towards the threshold of 0.05. The assumption in the modelling is that researchers, while p-hacking, will select the lowest of their p-value results.

This graph shows how the minimum p-value decreases as more and more trials are carried out.
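To get a feel for this effect, here is a minimal sketch (my own code, not Taleb's): every trial compares two samples from the same distribution, so the true effect is zero, yet the minimum p-value keeps shrinking as more trials are added.

```python
# Minimal sketch (my own, not Taleb's code): each trial compares two samples
# drawn from the SAME distribution, so there is no real effect, yet the
# minimum p-value across m trials keeps shrinking as m grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def mean_min_p(m, n=30, runs=2000):
    """Average minimum p-value a researcher sees after m identical trials."""
    mins = []
    for _ in range(runs):
        ps = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
              for _ in range(m)]
        mins.append(min(ps))
    return np.mean(mins)

for m in (1, 2, 5, 10, 20):
    print(f"trials = {m:2d}   mean minimum p-value ~ {mean_min_p(m):.3f}")
```

With a single trial the average p-value sits around 0.5, as it should under a null effect; by twenty trials the average minimum is already flirting with 0.05.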

  3. Volatility

Taleb's research shows that p-values are inherently random, regardless of the sample size.

  4. Optionality

The option to showcase one trial's p-value without disclosing the number of trials is classed as a convex payoff: the researcher keeps the upside of a lucky low p-value while quietly discarding the rest.

  5. Law of large numbers

This theorem (image below from link) states that "the average of the results obtained from a large number of trials should be close to the expected value", a tool Taleb uses in deriving the p-value distribution. You can watch his own video on the topic here (link).
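A one-minute demonstration of the theorem, using a toy example of my own (dice rolls):

```python
# Law of large numbers in miniature: the running mean of fair-die rolls
# approaches the expected value of 3.5 as the number of rolls grows.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # fair six-sided die
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}   running mean = {rolls[:n].mean():.4f}   (expected 3.5)")
```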

  6. Resulting evidence

For a 'true' p-value of .12, around 60% of realized p-values will fall below .05. This implies serious gaming and 'p-hacking' by researchers, even under a moderate amount of repetition of experiments.

  7. Significance illusion

Using a Monte Carlo simulation (code that simulates the stochastic, i.e. random, process) creates the output below. It looks significant because of the skewness of the data.
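Here is a toy Monte Carlo in the same spirit (not the paper's exact model: the sample size and effect size are my own assumptions, tuned so the median p-value sits near .12):

```python
# Toy version of the significance illusion. The effect size and sample size
# are assumptions of mine, tuned so the MEDIAN ("true") p-value is near .12.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect, runs = 30, 0.40, 9_999

ps = np.array([
    stats.ttest_ind(rng.normal(effect, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(runs)
])
best_of_3 = ps.reshape(-1, 3).min(axis=1)   # researcher reports best of 3

print(f"median ('true') p-value   : {np.median(ps):.3f}")
print(f"single run below .05      : {(ps < .05).mean():.1%}")
print(f"best of 3 runs below .05  : {(best_of_3 < .05).mean():.1%}")
```

On my machine a single run lands below .05 roughly a third of the time, and taking the best of just three repetitions pushes the share past the 60% mark quoted above: the skewness does all the work.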

  8. Requirements

For a researcher or scientist to realistically infer significance, "one needs a p-value of at least one order of magnitude smaller".

THE RANDOMNESS OF CORRELATION AND ITS HACKING BY BIGDATAISTS (LINK)

https://youtube.com/watch?v=yUooqL47akM

In Taleb’s own words “This tutorial presents the intuitions of the randomness of sample correlation (spurious correlation) and the methodologies in derivations.”  

Around the seven-minute mark, Taleb runs a p-hacking demonstration, showing how two completely independent random variables can sometimes appear to produce 'significant results'. I feel this video provides a far more accessible insight into how random correlation can be, and how it can be hacked to produce a desired answer.
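A minimal sketch of the same demonstration (my own code, not the one in the video): two independent variables, many small samples, and we simply keep the most impressive correlation we find.

```python
# Correlation hacking in miniature: x and y are INDEPENDENT, so the true
# correlation is zero, but keeping the best of many small samples can
# produce a strikingly "significant" result.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, attempts = 20, 100        # small sample, many tries

best_r, best_p = 0.0, 1.0
for _ in range(attempts):
    x, y = rng.normal(size=n), rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    if abs(r) > abs(best_r):
        best_r, best_p = r, p

print(f"best |r| over {attempts} attempts: {best_r:+.2f}  (p = {best_p:.3f})")
```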

As he goes on to explain the supporting paper, he points out that he uses elements of an equation from 1915! Interestingly, that mathematics was authored by R. A. Fisher, the 'godfather' of the p-value. Read our earlier blog discussing his contributions to research methodology here.

RUN THE CODE

As an ode to Taleb and his background as a trader, I've included some Python code on finance-oriented data (here is the link to the original author and article). The key reason for featuring this code is to look at what's called a Monte Carlo simulation, a process mentioned multiple times in his books. The overall aim of this section is to provide a more detailed, practical look at p-value verification. If you haven't already, visit my other code-oriented post first to get a feel for changing variables to change p-values.

Essentially, it generates randomness which can provide an unbiased range of outcomes and, here, demonstrates the p-value volatility to which Taleb alluded in his paper. There are three types of simulation: an entirely random Monte Carlo, a Monte Carlo drawing randomly from a normal distribution, and a simple random trade order; a minimal sketch of all three follows below.
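Here is that sketch, as I read the three styles (the names and numbers are my own illustration, not the author's exact code):

```python
# Three randomisation styles, illustrated on one year of made-up daily P&L.
import numpy as np

rng = np.random.default_rng(3)
trades = rng.normal(0.001, 0.02, size=250)   # hypothetical daily returns

# 1. Entirely random Monte Carlo: ignore the data, draw uniform outcomes.
purely_random = rng.uniform(-0.02, 0.02, size=250)

# 2. Random within a normal distribution: resample from a Gaussian fitted
#    to the observed trades.
normal_mc = rng.normal(trades.mean(), trades.std(), size=250)

# 3. Random trade order: the same trades, shuffled; the return distribution
#    is unchanged, only the path (and so the drawdowns) differs.
shuffled = rng.permutation(trades)

for name, series in [("purely random", purely_random),
                     ("normal MC", normal_mc),
                     ("shuffled order", shuffled)]:
    equity = (1 + series).cumprod()
    print(f"{name:>14}: final equity {equity[-1]:.3f}")
```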

The author of this code wrote an accompanying blog in a very interesting and personal way. He defines his usage of the entirely random, or 'pseudo', Monte Carlo in tribute to our man of focus, Taleb. He states that the point of the code is "to model the probability of complex events by compiling thousands – millions of various outcomes with a pre-determined 'random' (changing) variable" or "to determine the probability of outcomes". For us, in relation to p-values, the ability to measure differences that are dependent on variables is key.

Please download the notebook from the original source here and view it in a Jupyter Notebook. The author's blog and inline code comments provide excellent insight, so I will simply highlight the most interesting aspects.

The screenshot below (from the Jupyter Notebook) is from further along in the code, where the Monte Carlo simulation begins. Earlier code in the notebook sets up the data from an imaginary investment portfolio "to predict expected returns, variance and worst-case scenarios".

Below is a key part of the Monte Carlo algorithm, set to 1000 loops. As the author warns, it's a very computationally intensive loop: changing this figure to 8000 iterations added ten seconds before the visualization appeared and spiked my processor to maximum capacity!
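For readers without the notebook to hand, the loop has roughly this shape (a sketch under my own made-up numbers, standing in for the portfolio statistics the notebook fits):

```python
# Sketch of the simulation loop: 1000 Monte Carlo runs of a year of returns.
# mu and sigma are made-up stand-ins for the notebook's fitted portfolio stats.
import numpy as np

rng = np.random.default_rng(11)
mu, sigma, days, loops = 0.0004, 0.012, 252, 1000

paths = np.empty((loops, days))
for i in range(loops):                     # the "1000 loops" discussed above
    daily = rng.normal(mu, sigma, days)    # one simulated year of returns
    paths[i] = (1 + daily).cumprod()       # equity curve for this run

final = paths[:, -1]
print(f"expected return : {final.mean() - 1:+.1%}")
print(f"worst case (1%) : {np.percentile(final, 1) - 1:+.1%}")
```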

Next, we have a visualization of the code output showing what is intuitive: that higher returns come hand in hand with more volatility (that is to say, risk or unpredictability). He also highlights what researchers do when p-hacking: "you can make these return and volatility columns maximize anything you'd like", such as "correlation".

'Correlation doesn't mean causation' is a meme at this point. Using randomness to detect and prevent what Taleb calls 'spurious' correlation is clearly valuable. The question we ask of those who claim they were not consciously p-hacking is: how hard did they try to disprove the apparent correlation? Our author Zach includes a final visualization (a heatmap) for this task. In his colourful data example, the squares represent companies that appear to correlate over these 1000 runs.
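If you want to recreate the flavour of that heatmap without the original data, here is a sketch of mine using independent random walks, so any strong colour that appears is spurious by construction:

```python
# Heatmap of pairwise correlations between INDEPENDENT random walks.
# Every strong colour here is spurious by construction.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(21)
walks = rng.normal(size=(250, 8)).cumsum(axis=0)   # 8 fake "companies"
corr = np.corrcoef(walks, rowvar=False)

plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="pairwise correlation")
plt.title("Correlations between independent random walks")
plt.show()
```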

Richard Feynman's sound advice on principled scientific thinking is often referenced when discussing research errors. I liked this advice from another blog!

“There should be a drinking game in which everybody has to drink when somebody cites Feynman in the context of the replication crisis. Just make sure that you use a drink with < 5% or maybe even < .05% alcohol” (link).

NOTES

  1. Proofs and his other research are either in his appendices or published on the web. (His website: link, his web-only mathematics book: link part 1, and his published papers on the research site arxiv.org: link)
  2. This paper (https://arxiv.org/pdf/1906.06711.pdf), produced by researchers at the University of California in 2019, takes a similar approach to detecting p-hacking, also using Monte Carlo simulations. It features extensive data visualisation.
  3. For further code to look at and try, see my other post on 'practical p-values'.
  4. A bonus for any reader curious or sceptical about how prescient Taleb can be: this post highlights his outspokenness on the replication crisis.
