Stop Hiding Your Code
We, @PLOS, @PLOSONE and the open source community, will discuss why and how to #ShareYourCode in a tweet chat on 25 April, 10-11am Pacific Daylight Time/6-7pm British Summer Time. Join us!
A cornerstone of science is reproducibility, the notion that any colleague anywhere should be able to read your paper, repeat experiments or analysis, and come to the same conclusions as you did. Difficulties reproducing results are, in the vast majority of cases, not due to the authors having made mistakes or, even worse, forged data. Rather, the lack of detail about methods, data, and code in a paper is often the greatest impediment. Code written to simulate a system, analyse data, or generate figures forms a central part of the methods, at least in physics. It would be common sense to make it as easy as possible for colleagues to read and run the code to pre-empt and mitigate reproducibility and repeatability issues that could arise from any lack of detail.
However, this is not standard practice. Only a fraction of the authors who publish results relying on computations make code available upon request, even if the journal mandates it. Researchers express concerns that their code is “inelegant”, a piece of DIY with rough edges that works but looks like it was written by a self-taught physicist rather than a software engineer. Cleaning it up and documenting everything properly would take so long that it would be a disproportionate burden on a student’s or postdoc’s time. The sentiment is that it is often quicker to write code anew than to try to understand what someone else did. The argument also works the other way around: the PI spent years or decades writing the code base that is at the core of his or her work, and the group makes gradual changes to get new results. All in FORTRAN 77.
There are many compelling reasons to open up your source code, irrespective of whether you are a graduate student or the PI of a massive grant. In the latter case, you are probably handling taxpayers’ money anyway, so returning value to them is a moral obligation, which makes being open and honest with your results an ethical necessity of sorts.
The easier it is for someone else to build on your results, the more impact the paper will have. Once your paper is accepted, it faces competition from hundreds of thousands of papers that are published around the same time in the same field. To make it stand out, you have to pay attention to the ecosystem around your research. Part of it is the ease of building on your results. Your paper does not live in isolation after all. There is evidence that sharing details about your data will increase the impact the work makes.
For a generation that grew up with computers, code is sometimes easier to understand than equations. It is undeniable that most academic coding efforts are one-off, ad-hoc solutions, but making such code available online is still valuable for people who learned programming before calculus. If done well, you might attract developers who polish up the less glorious pieces.
Are you worried that instead of constructive contributions you will receive countless support requests? You must address a serious pain-point for a wide range of users for this to happen. Unless your discovery is the best thing since sliced bread, you probably won’t have to worry about a thousand support requests. And if it is, congratulations, making the code available will mean that the impact of the work is going to be huge. Furthermore, by knowing that the code could be studied and used by others, you force yourself to make it more maintainable and better documented, which will make your life easier if you ever want to use that code again, particularly when one student leaves and hands the baton to the next one.
Finally, the developers of the code are often PhD students or postdocs and it is a PI’s duty to consider their future. By making the code available in appropriate repositories, they get fair attribution for their work, which is especially important if they do not continue their career in academia. Having a great GitHub profile makes it more likely that they get a swanky job at Google, IBM and co.
If you have made up your mind that you want to open up your code, you will realize that it is much easier to do than you think. You have countless options for sharing: GitHub is the most common platform, but there is also GitLab, Bitbucket, and many others. These commercial offerings are great, widely used, and make your work easy to find, but as scientists, we crave something less ephemeral. Zenodo gives you a DOI for a particular version of your code, for instance, the one that was used in a paper. It was created by CERN researchers, continues to be non-commercial and integrates with GitHub, so getting a DOI for your code is a matter of a few clicks.
Once you have picked your sharing platform, you can’t go wrong. The only forbidden thing is not having a license accompanying your code, because then people don’t know what they are allowed to do with it. GPL-type licenses can put off non-academic users, since they require anyone who distributes a modified version to release it under the same license. If you want to maximize your impact, choose a permissive BSD-style or MIT license. The sharing platforms listed above offer you a variety of licenses when you create a new repository, so your cognitive burden is minimal.
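A quick way to avoid publishing code without a license is to check for the file before you push. The following is a minimal sketch in Python. The helper name `find_license_file` and the list of accepted file names are assumptions made for illustration, not a rule enforced by any particular platform.

```python
from pathlib import Path
from typing import Optional

# File names commonly used for license texts. This list is an assumption
# for illustration and can be extended as needed.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")


def find_license_file(repo_dir: str) -> Optional[Path]:
    """Return the first license file at the top level of repo_dir, or None."""
    root = Path(repo_dir)
    for name in LICENSE_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Check the current working directory, assumed to be the repository root.
    found = find_license_file(".")
    if found is None:
        print("No license file found: others cannot tell what they may do with this code.")
    else:
        print(f"License file found: {found.name}")
```

Running the script from the top level of a repository only tells you whether a license file is present. Which license to put there is still your decision.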
With the choice of sharing platform and a license, your overhead is more or less done. It takes a day of effort to clean up your code and add some comments; if your programming skills are abysmal, Software Carpentry provides a series of lectures that can be covered in a few days and will result in vast improvements in how you develop scientific code.
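To give a sense of what "cleaned up" can mean in practice, here is a small hypothetical example of a function with a descriptive name, units in the argument names, a docstring that states the formula and its limits, and a basic input check. The pendulum calculation is chosen purely for illustration and does not come from any particular code base.

```python
import math


def pendulum_period(length_m: float, gravity_m_s2: float = 9.81) -> float:
    """Return the small-angle period of a simple pendulum in seconds.

    Uses T = 2 * pi * sqrt(L / g), which is valid only for small
    oscillation amplitudes.
    """
    # Reject unphysical input early instead of failing inside sqrt.
    if length_m <= 0 or gravity_m_s2 <= 0:
        raise ValueError("length and gravity must be positive")
    return 2.0 * math.pi * math.sqrt(length_m / gravity_m_s2)


if __name__ == "__main__":
    print(f"Period of a 1 m pendulum: {pendulum_period(1.0):.3f} s")
```

A few lines of documentation like this are often enough for the next student to reuse the function without having to reverse-engineer it.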
If you want to take the pedagogical angle seriously, Jupyter notebooks are for you. They provide a framework for literate programming, that is, you can explain what you did along with equations, plots, and the code in the same environment. Once the code is clean, creating a nice, educational notebook takes about half a day.
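To illustrate how a notebook keeps explanation and code side by side, the sketch below builds a minimal notebook as a plain dictionary, assuming the version 4 notebook file layout (a list of cells, each marked as either markdown or code). The function name `build_notebook` and the example cell contents are hypothetical. In practice you would write the notebook in the Jupyter interface rather than by hand.

```python
import json


def build_notebook(explanation: str, code: str) -> dict:
    """Build a minimal version 4 notebook with one text cell and one code cell."""
    return {
        "cells": [
            # Markdown cell: prose and equations explaining the step.
            {"cell_type": "markdown", "metadata": {}, "source": explanation},
            # Code cell: the code itself, not yet executed (no outputs).
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    notebook = build_notebook(
        "The period of a pendulum is $T = 2\\pi\\sqrt{L/g}$.",
        "import math\nprint(2 * math.pi * math.sqrt(1.0 / 9.81))",
    )
    # Print the notebook as JSON; saved to a file ending in .ipynb,
    # this text should open as a two-cell notebook.
    print(json.dumps(notebook, indent=1))
```

The point of the structure is that the explanation and the code travel together in one file, so a reader sees the reasoning right next to the computation.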
With new efforts at PLOS ONE to be of service to the physics communities, we are working on changing the culture of secrecy around code. We started by having a call for papers to invite contributions that value openness in the way research is conducted. Our hope is that physics will embrace open science and we will look back at the era of closedness and isolation as the dark ages.
Image: In the 21st century, sharing code does not require snail mail. Image by Arnold Reinhold, Wikimedia, shared under a Creative Commons Attribution-Share Alike 2.5 license.