PeerJ Computer Science’s most cited paper: Probabilistic programming in Python using PyMC3 – Author Interview with Thomas Wiecki
Impact is more than just a number. For this interview series we revisited some of the most cited papers published in PeerJ Computer Science and asked the authors to tell us about the real-world impact and applications of their research beyond the number of citations they garnered.
In the first interview, Thomas Wiecki, co-author of PeerJ Computer Science’s most cited paper, ‘Probabilistic programming in Python using PyMC3’, talks about some of the myriad applications of the research and the impact of its publication.
Can you tell us a bit about yourself?
I studied bioinformatics in Tübingen, Germany, where I worked for many years as a research assistant at the Max Planck Institute for Intelligent Systems. That time was very influential for me and showed me how machine learning and statistics can be used to solve the difficult problems that arise when trying to unravel the mysteries of the brain.
This interest led me to pursue a PhD at Brown University, where I built neural network models of the basal ganglia, a structure at the center of the brain that is involved in decision making and affected by various neurological and psychiatric disorders. To validate these models, it is important to combine the simulated results with behavioral data from psychological experiments. I found Bayesian modeling to be an invaluable tool for analyzing that data, which led me to become involved in PyMC3, the software project we eventually published in PeerJ Computer Science.
After I finished my PhD, I worked as the VP of Data Science at Quantopian Inc, a Boston-based startup in the quant finance domain. Interestingly, I found Bayesian statistics and PyMC3 to be a perfect tool for most of the problems I encountered in this completely different domain too. These days I consult for companies on how to solve challenging problems using Bayesian statistics and teach corporate training workshops on the topic. I’m also a podcaster and blogger, and I like to tweet about Data Science and Python. If you are interested in working with me, please get in touch by email.
Can you briefly explain the research you published in PeerJ Computer Science?
PyMC3 is an open-source probabilistic programming package for Python that allows users to easily build complex statistical models. A central theme of Bayesian statistics is uncertainty quantification: whenever we fit a model, we also get an uncertainty estimate, in the form of a probability distribution, of how certain we are about our estimates and whether other parameter values could describe the data equally well.
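To make the idea of an uncertainty estimate concrete, here is a minimal sketch that deliberately does not use PyMC3 itself. It assumes a made-up dataset (14 successes in 20 trials) and a flat Beta(1, 1) prior, a case where the posterior is known in closed form, so the samples can be drawn with Python's standard library alone. The function names are hypothetical and chosen for illustration only; for models without a closed-form posterior, a package like PyMC3 would produce comparable samples by numerical sampling instead.

```python
import random
import statistics


def beta_binomial_posterior_samples(successes, trials, n_samples=20000, seed=0):
    """Draw samples from the posterior of a success rate.

    Assumes a flat Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + (trials - successes)
    return [rng.betavariate(a, b) for _ in range(n_samples)]


def credible_interval(samples, mass=0.94):
    """Return the central interval containing roughly `mass` of the samples."""
    ordered = sorted(samples)
    tail = (1 - mass) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper


# Hypothetical data: 14 successes observed in 20 trials.
samples = beta_binomial_posterior_samples(successes=14, trials=20)
posterior_mean = statistics.fmean(samples)
low, high = credible_interval(samples)

print(f"posterior mean: {posterior_mean:.3f}")
print(f"94% credible interval: ({low:.3f}, {high:.3f})")
```

The result is not a single number but a whole distribution: the interval shows which other parameter values remain plausible given the data.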
What’s really cool is that these uncertainty estimates can be carried through to the decision making process. So whenever we have to make decisions in uncertain conditions, Bayesian modeling is a powerful ally. I wrote a blog post about this technique, called Bayesian decision making, here.
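As a rough illustration of carrying uncertainty through to a decision, the sketch below compares two actions by averaging a profit function over posterior samples and picking the action with the highest expected profit. Everything in it is an assumption made for the example: the conversion counts, the visitor and revenue figures, the rollout cost, and the function names are invented, and the posterior again comes from a closed-form Beta distribution rather than from PyMC3.

```python
import random
import statistics

rng = random.Random(42)


def posterior_rate_samples(conversions, visitors, n_samples=20000):
    """Posterior samples of a conversion rate under a flat Beta(1, 1) prior."""
    a = 1 + conversions
    b = 1 + (visitors - conversions)
    return [rng.betavariate(a, b) for _ in range(n_samples)]


def profit(rate, future_visitors=10000, revenue_per_conversion=5.0, rollout_cost=0.0):
    """Hypothetical profit for a given conversion rate."""
    return rate * future_visitors * revenue_per_conversion - rollout_cost


# Hypothetical experiment: variant A converted 120 of 1000 visitors, B 140 of 1000.
rate_a = posterior_rate_samples(120, 1000)
rate_b = posterior_rate_samples(140, 1000)

# Push every posterior sample through the profit function for each action.
profit_keep_a = [profit(r) for r in rate_a]
profit_switch_b = [profit(r, rollout_cost=500.0) for r in rate_b]

expected_profit = {
    "keep_A": statistics.fmean(profit_keep_a),
    "switch_to_B": statistics.fmean(profit_switch_b),
}
best_action = max(expected_profit, key=expected_profit.get)

# Probability, under the posterior, that switching actually pays off.
prob_b_better = statistics.fmean(
    [1.0 if b > a else 0.0 for a, b in zip(profit_keep_a, profit_switch_b)]
)

print(expected_profit)
print("best action:", best_action)
print(f"P(switching to B is more profitable): {prob_b_better:.2f}")
```

Because the decision is computed from the full posterior, it also reports how likely the chosen action is to be the better one, which a single point estimate could not provide.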
Can you tell us anything about how your research has been applied and re-used since its publication?
The software is being used successfully in academia as well as industry by researchers and data scientists around the world.
I am thrilled by the impact PyMC3 has had on academia: the paper has been cited over 650 times so far (268 of those citations in 2019 alone). It is being used across many different domains including chemistry, physics, neuroscience, astronomy, genetics, psychology, and many more. In astronomy, the software is used to detect planets outside the solar system, which gave rise to the exoplanet software built on PyMC3. In seismology, the software is used to detect earthquakes, which gave rise to the BEAT software, which also uses PyMC3.
In industry, the software is being used by companies including Google, SpaceX, Salesforce, Novartis, Hotels.com, trivago.de, and Channel 4 News. At SpaceX, for example, the software was used to optimize supply chains. At trivago, the software was used to assess which website modification generates the most profit. Osvaldo Martin also wrote a book on it called “Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ”.
We currently have 20 core contributors to the repository who meet regularly online as well as in person for developer sprints (which have been generously sponsored by Google). At one of our recent sprints, we started development of the next version: PyMC4, a complete rewrite based on TensorFlow rather than Theano.
What persuaded you to publish with PeerJ Computer Science when it was still a relatively unknown quantity?
Several reasons. PeerJ Computer Science is one of the few journals that realize the importance and scientific value of pure software contributions, and it has been ahead of the curve in that regard. It is also open access, which was very important in the decision-making process. The fact that it was relatively unknown at the time was actually a benefit, because it was obvious that the journal was trying to do something different from what came before, and we really were not happy with the old publishing model.
How would you describe your experience of PeerJ?
We found the whole review and publishing process to be very enjoyable and would definitely publish in PeerJ again.
Thank you for sharing your research journey, Thomas! You can read his monumental paper here.
Join Thomas and thousands of other satisfied authors, and submit your next article to PeerJ Computer Science