WHAT DOES IT ALL MEAN FOR YOU?
This series on p-hacking has been very rewarding on a personal level, as I feel it’s given me a much better understanding of what the research world entails. I hope I’ve been able to convey as much in my writing. If you’re a budding data scientist and, like me, a first-year undergraduate, then you probably have a few questions or reservations about the series as a whole. Let me round off the series by answering them.
WHY WRITE? WHY NOT A PAPER?
I admit that every aspect of this series is self-serving, in that I want to challenge myself to learn new skills and new topics. At the same time, I want to do something that can hopefully be of general value to others outside of Noroff University.
It appears absolutely vital for a data scientist to have good written and verbal communication skills, more so than in other computer-science-related fields. This is because the job involves conveying the difficult topic of prediction, or the complexity of research, to a board of managers or other project stakeholders.
A paper is not a typical form of communication in this sense, so we chose to write a blog, as it’s the format from which we ourselves have received so much value. In addition, we wanted to write in an accessible, natural way, communicating technical topics in a manner that is digestible but also enjoyable. We view our project as a form of curation, translation and highlighting of a unique blend of information that most first years are not presented with.
A final point is that this web-based blog format allowed us to highlight some key videos from the blog’s namesake. I’ve also provided an audio version of my series, as I know many people prefer that format to reading, and it allows for listening on the go.
WHY A RESEARCH TOPIC?
So why did we choose a research-based topic over something from the pure computing science domain? Quite simply, we sought the biggest challenge, throwing ourselves into something that we had little background in. I don’t want to speak on Timo’s behalf, but I knew literally nothing about the process of science before tackling this series.
Getting to grips with the research topic of the replication crisis involved a lot of background reading that, looking back, I am very thankful for having done. It was Timo’s ambition in tackling a topic of far-reaching consequence that made it very appealing to join him on this year-long project.
I ended up focusing my side of the project on p-hacking for a couple of reasons. The experienced data scientist spends a lot of their time designing or running experiments, so it made sense to get a better understanding of the prime ‘scoring’ method of this process. Equally, having read all of Taleb’s work, I felt there was a bigger picture of real-world effects that could be better understood from this angle. I wanted to dive deeper into how a data scientist experiments and makes predictions.
WHAT HAVE YOU LEARNED?
From the outside looking in, science and research can be very intimidating. People have dedicated lifetimes to their work with little knowledge of what lies on the other side. Part of that impression also relates to how important their work can be, with one successful experiment changing the course of the world. Over the last six months I’ve learned what role a data scientist can play in this field.
I’d always viewed data science in the narrow sense, as a part of computer science that calls for mastery of software engineering and the ability to work with machine learning. Now I know that many data scientists come from a diametrically opposite position, from math and statistics, and use code to express and define their experiments.
There is definitely one interpretation of the role that suggests being a scientist first and foremost, and a skilled data engineer second. This also implies the need for domain prowess. For example, I intuit that competing as a data scientist in the world of finance would be almost impossible when pitted against others who have years of experience understanding how markets work. It appears from much of my reading that a data scientist, even when working in larger organizations and teams, has a great deal of autonomy, and so this domain knowledge goes a long way towards being effective in the job. It helps both in choosing which problems to work on and in communicating a prediction or strategy to the rest of the team.
Reading NN Taleb’s work was the starting point for me in turning towards learning the skills that would result in a career change. It was obvious his skills and philosophical angle created a different outlook on the world, one devoid of bias, one with a hunger to see the way things are, able to spot errors behind the shield of apparent intellectualism. His discussions and paper on p-hacking were instrumental in choosing it as a topic for this series.
I’m aware I perhaps sound like a follower of a faith or movement, but that is just rampant enthusiasm. His style is unique and definitely not for everyone, and I can’t say I always agree with his Twitter use! If you want to get a taste of his style, here are a couple of excerpts from one of his books, a short collection of aphorisms. He has also published a few other book chapters on Medium.com.
“They think that intelligence is about noticing things that are relevant (detecting patterns); in a complex world, intelligence consists in ignoring things that are irrelevant (avoiding false patterns).”
“Half of the people lie with their lips; the other half with their tears”
“They will envy you for your success, your wealth, for your intelligence, for your looks, for your status – but rarely for your wisdom.”
“Academia is to knowledge what prostitution is to love; close enough on the surface but, to the nonsucker, not exactly the same thing”
“In science you need to understand the world; in business you need others to misunderstand it.”
“Almost all those caught making a logical fallacy interpret it as a ‘disagreement.’”
Thanks for reading. If you have any feedback I’d love to hear from you. Happy studying going forwards!