So far my p-hacking posts have been largely theoretical, which was necessary to build a solid understanding of a complex subject. But if you are anything like me, you will find a lot of value in practical examples and demonstrations, even more so if they are interactive.
First up is a simplified p-value calculator. A Google search will typically turn up several calculators, but they are often very complex. With that in mind, I edited an example calculator (link to the original creator here) for you to play around with.
In my version, the p-value threshold is the well-known 0.05, and you can see how many cases in each region it takes to produce a significant result. I feel the choice of a health example is quite relevant in today’s coronavirus-dominated world.
https://codepen.io/grepmalc/embed/preview/gOpZdjy?height=300&slug-hash=gOpZdjy&default-tabs=js,result&host=https://codepen.io
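If you would rather see the arithmetic than the sliders, here is a minimal sketch of the same kind of calculation. I don't know exactly which test the original calculator uses, so this assumes a pooled two-proportion z-test comparing case rates in two regions; the function name `two_proportion_p_value` and the region figures are hypothetical, made up purely for illustration.

```python
import math


def two_proportion_p_value(cases_a, pop_a, cases_b, pop_b):
    """Two-tailed p-value for the difference between two case rates (pooled z-test).

    Assumption: this is a standard textbook test, not necessarily the one the
    CodePen calculator uses.
    """
    rate_a = cases_a / pop_a
    rate_b = cases_b / pop_b
    # Under the null hypothesis both regions share one underlying rate
    pooled = (cases_a + cases_b) / (pop_a + pop_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pop_a + 1 / pop_b))
    z = (rate_a - rate_b) / se
    # Probability of a z at least this extreme in either tail of a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # the well-known threshold

# Hypothetical figures: Region A has 30 cases per 1,000 people;
# Region B's count is increased step by step.
for cases_b in (35, 40, 45, 50):
    p = two_proportion_p_value(30, 1000, cases_b, 1000)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"Region A 30/1000 vs Region B {cases_b}/1000: p = {p:.3f} ({verdict})")
```

With these made-up numbers, only the largest gap between the regions crosses the 0.05 line, which is the same effect you can play with in the calculator.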
The first graph below (a pseudo-data demo from Highcharts) shows a normal density plot, or distribution. You can think of it as a pattern of results. The dummy data in the code is shown as a scatter plot, and this scatter of results is what generates the familiar ‘normal’ bell curve, also known as a Gaussian distribution. The most important reference point is the axis on the right-hand side, which shows the probability density. The density is not itself a probability: the probability of a result falling within a range is the area under the curve over that range, and the total area under the curve equals 1 (100%).
https://codepen.io/grepmalc/embed/preview/mdJaawG?height=300&slug-hash=mdJaawG&default-tabs=js,result&host=https://codepen.io
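To make the difference between density and probability concrete, here is a small sketch. It assumes a standard normal distribution (mean 0, standard deviation 1), which may not match the parameters of the Highcharts demo, and approximates areas with the trapezoidal rule; `normal_pdf` and `area_under_curve` are my own illustrative helpers, not part of any library.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve (probability density) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


total_area = area_under_curve(normal_pdf, -8.0, 8.0)     # whole curve (tails beyond 8 sd are negligible)
within_one_sd = area_under_curve(normal_pdf, -1.0, 1.0)  # probability of landing within 1 sd of the mean
narrow_peak = normal_pdf(0.0, sigma=0.1)                 # a narrow curve's density can exceed 1

print(f"total area under the curve: {total_area:.4f}")
print(f"area within one standard deviation: {within_one_sd:.4f}")
print(f"density at the peak of a narrow curve: {narrow_peak:.4f}")
```

The last line is the giveaway: a density can be greater than 1, so it can't be read directly as a probability. Only areas under the curve are probabilities.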
Next is another normal distribution graph. I’ve edited the CSS a little from the original author’s version (link, which also describes his creation process). It gives a little insight into the math, but for our purposes it creates a visual representation of the 0.05 p-value threshold, which lies in the extremes of the distribution: what’s known as a tail probability. The closer a result is to the middle of the distribution, the higher its p-value.
https://codepen.io/grepmalc/embed/preview/gOpZZzE?height=300&slug-hash=gOpZZzE&default-tabs=js,result&host=https://codepen.io
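The link between "distance from the middle" and the p-value can be sketched in a few lines. This assumes a two-tailed test on a standard normal distribution (the graph above may shade one or both tails), and `two_tailed_p` is a hypothetical helper name; it relies only on `math.erfc` from Python's standard library.

```python
import math


def two_tailed_p(z):
    """Probability of a result at least |z| standard deviations from the mean, in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))


# Move from the middle of the curve out towards the tails
for z in (0.0, 0.5, 1.0, 1.96, 2.5, 3.0):
    p = two_tailed_p(z)
    marker = "below 0.05" if p < 0.05 else ""
    print(f"z = {z:4.2f} -> p = {p:.4f} {marker}")
```

A result sitting right in the middle (z = 0) gets p = 1, and the p-value shrinks as you move outwards; in a two-tailed test the 0.05 threshold sits at roughly 1.96 standard deviations from the mean.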
With the concept defined, you might want to visit the data specialists at fivethirtyeight.com for a fantastic interactive tool, shown in a screenshot below. There you can adjust the variables (there are 1800 combinations!) to produce a statistically significant result in a very real-world scenario. This is your chance to be a p-hacker!
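Why do so many combinations make p-hacking easy? Here is a rough sketch of the underlying arithmetic. It assumes every test is independent and that no real effect exists, in which case each p-value is uniformly distributed between 0 and 1. The combinations in the fivethirtyeight tool reuse the same data, so they are not independent; treat this as an illustration of the direction of the effect, not of the tool's actual numbers. The helper names are hypothetical.

```python
import random


def family_wise_error(k, alpha=0.05):
    """Chance of at least one p < alpha among k independent tests when no real effect exists."""
    return 1 - (1 - alpha) ** k


def simulate(k, alpha=0.05, trials=10_000, seed=1):
    """Monte Carlo check: under the null hypothesis each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Run k "analyses" and keep the best-looking one, as a p-hacker would
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / trials


for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: formula {family_wise_error(k):.3f}, simulated {simulate(k):.3f}")
```

Even at 20 independent tries, the chance of finding at least one "significant" result in pure noise is already around 64%, far above the 5% the threshold is supposed to guarantee.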
One last tool you could try: a prototype app that trains you to be a better p-hacker! Start with the introductory blog post here and see what you can achieve!
Finally, I wanted to share an explainer video from the Chief Decision Scientist at Google. I’m sharing it not only because of her position at one of the world’s most prominent companies, but because she does such a good job of avoiding the monotony of most statistics videos! The example is quite vivid!