Written by Jänis Järvilehto (https://linkedin.com/in/janisj)
The Catalysis research group has published its first-ever open code! Named DReaM-ALD, the recently published MATLAB script implements a diffusion–reaction model developed by Ylilammi et al. (J. Appl. Phys. 123, 205301 (2018), DOI: 10.1063/1.5028178). The model simulates atomic layer deposition in high-aspect-ratio structures and generates saturation profiles, which show how the film thickness varies with penetration depth into the structure. This MATLAB implementation was originally written by Emma Verkama in 2019 at the request of Prof. Riikka Puurunen, and the code was later published by Jänis Järvilehto. In addition to GitHub, the code is also available on Zenodo.
There was a significant delay (~3 years) between the creation and publication of the code. As there was no prior history of open code in the group, the barrier to publication was relatively high. Questions, such as…
Where should we publish the code? What kind of information would be useful for a potential user? How does the process work in general? Where do I click???
…may arise. While the best practices may seem obvious to a software engineer, generating research code can be a messy affair in other fields. Researchers in these fields are often less acquainted with the tools of the trade, such as the Git version control system and hosting platforms like GitHub. Someone in the group has to figure out how and where to publish the code so that it is easily accessible and citable. As a result, useful and interesting projects end up forgotten on university network drives instead of being available for anyone to extend and experiment with. It is also difficult to replicate the results of a published work without access to the original code.
As a starting point or inspiration for other groups that are unsure about open code, we’ve summarized the steps we went through to get our code out there. We’ve also linked some helpful resources we’ve run into during the process!
Step 1: Set up an Organization on GitHub
If your research group has not published on GitHub yet, it is a good idea to think about how the repositories should be organized. We opted to create an Organization for the research group, so that the repository’s association with the Catalysis research group is clear. Remember to assign a second Organization Owner for your Organization, just in case the first one becomes unavailable. The other research group members can be added as Organization Members.
Step 2: Decide on a license
Choose a license for the software you intend to publish, if you haven’t already done so. Keep in mind that there may be restrictions on the type of license you can use, set by your organization or funding source. You can find an overview of common licenses at https://choosealicense.com/, for example. In our case, we opted for the MIT license.
Step 3: Set up a repository
Next, it is time to set up a repository for the project. There are two paths to take here: either create the repository that will later be published directly, or create a draft repository first. On GitHub, all changes to a repository can be traced back, so the editing history of, for example, the README.md file will be publicly visible once the repository is public. We decided to forgo the draft repository, as we had already prepared the repository content elsewhere. However, we set the repository to Private at this stage. At this point, you can push your code and a LICENSE.txt file to the repository.
Step 4: Create a README.md file
The README.md file is displayed whenever someone visits the repository. You can use the README.md as an opportunity to give a brief description of the project, reference relevant literature, explain how to use the code, and state how you would like others to cite it in their work. We also decided to include a DOI badge, among others, at the top of the README.md. We used https://shields.io/ to generate the badges (shields) for our README.md file.
Step 5: Make the repository (easily) citable
Anyone can cite the repository as-is; however, there are some things you can do to make citing more straightforward. For starters, we included a Citing section in our README.md to give an example of how to cite the repository. Furthermore, we created a CITATION.cff file in the root of the repository using cffinit. The CITATION.cff file causes GitHub to display a Cite this repository button in the About section of the repository, which provides a preformatted citation and BibTeX entry. The file contains citation information in a machine-readable format; you can find further information at https://citation-file-format.github.io/.
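To give a feel for what a CITATION.cff file contains, here is a minimal Python sketch that renders one as text. We generated our own file with cffinit, so this is only an illustration of the file's shape, not the method we used. The function name `render_citation_cff` and all values (title, author, DOI) are hypothetical placeholders; the keys shown are a small subset of the Citation File Format.

```python
import json


def _quote(value):
    # A JSON string is also a valid YAML double-quoted scalar,
    # so json.dumps gives safe quoting without a YAML library.
    return json.dumps(value, ensure_ascii=False)


def render_citation_cff(title, authors, license_id, doi=None):
    """Render minimal CITATION.cff text.

    authors is a list of (family_names, given_names) tuples.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {_quote(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {_quote(family)}")
        lines.append(f"    given-names: {_quote(given)}")
    lines.append(f"license: {_quote(license_id)}")
    if doi is not None:
        # The DOI is only known after the first Zenodo release.
        lines.append(f"doi: {_quote(doi)}")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder values, not the DReaM-ALD metadata.
print(render_citation_cff(
    title="Example research code",
    authors=[("Doe", "Jane")],
    license_id="MIT",
    doi="10.5281/zenodo.0000000",
))
```

The printed text can be saved as CITATION.cff in the repository root. For a real project, a generator such as cffinit is the safer route, as it also checks the file against the format's schema.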
At this point, we also created a .zenodo.json file for the Zenodo synchronization (enabled in Step 7). This JSON file is placed at the root of the repository, and Zenodo uses it to generate the deposit metadata. There are various fields you can fill out in the .zenodo.json file; a complete list can be found in the Zenodo documentation. We also validated our file with a JSON validator before pushing it, as it is easy to mess up the formatting.
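If you prefer to validate the file locally, the check can be sketched in a few lines of Python using the standard `json` module. This is one possible approach, not the validator we used. The function name `check_zenodo_metadata`, the example metadata, and the set of fields treated as required are assumptions made for illustration; the Zenodo documentation remains the authoritative source for the supported fields.

```python
import json


def check_zenodo_metadata(text):
    """Return a list of problems found in the text of a .zenodo.json file."""
    try:
        metadata = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(metadata, dict):
        return ["top level must be a JSON object"]
    problems = []
    # Assumed minimal set of fields for this sketch only.
    for field in ("title", "upload_type", "creators"):
        if field not in metadata:
            problems.append(f"missing field: {field}")
    return problems


# Hypothetical example content, not the actual DReaM-ALD metadata.
example = """
{
  "title": "Example research code",
  "upload_type": "software",
  "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
  "license": "MIT"
}
"""

print(check_zenodo_metadata(example))            # an empty list means no problems
print(check_zenodo_metadata('{"title": "x",}'))  # a trailing comma is invalid JSON
```

To check a real file, read it first, for example with `pathlib.Path(".zenodo.json").read_text(encoding="utf-8")`, and pass the result to the function.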
Step 6: Publish!
Here, the procedure varies slightly depending on whether a draft repository was created or not. If you created a draft repository, simply copy its contents to a new, public repository. As we were satisfied with our repository, we simply changed its visibility to Public. Now the code is publicly available on GitHub!
Step 7: Synchronize the repository with Zenodo
To obtain a DOI for our repository, we archived it as a Zenodo deposit. The most efficient way to achieve this is to connect your GitHub account to Zenodo, enable synchronization for the repository on Zenodo, and finally create a Release on GitHub. You can find more information on how to connect the accounts on https://help.zenodo.org/. After this, Zenodo will automatically issue a new versioned DOI whenever a new Release is created on GitHub. Notably, this only works if no Releases were created before synchronization was enabled. After obtaining our first DOI, we decided to update it in the README.md and CITATION.cff files as well.
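Updating the DOI by hand in several files is easy to get wrong, so the replacement can be scripted. The following Python sketch is one possible way to do it, not the procedure we followed. It assumes the files contain a placeholder or older Zenodo DOI of the form 10.5281/zenodo.NNNN; the function name `update_zenodo_doi` and both DOIs are hypothetical.

```python
import re

# Zenodo DOIs share the prefix 10.5281/zenodo. followed by a record number.
ZENODO_DOI = re.compile(r"10\.5281/zenodo\.\d+")


def update_zenodo_doi(text, new_doi):
    """Replace every Zenodo DOI found in text with new_doi."""
    # A function is used as the replacement so that new_doi is
    # inserted literally, without any escape processing.
    return ZENODO_DOI.sub(lambda _match: new_doi, text)


# Hypothetical placeholder and new DOI, for illustration only.
old_line = 'doi: "10.5281/zenodo.0000000"'
print(update_zenodo_doi(old_line, "10.5281/zenodo.1234567"))
```

Note that this replaces every Zenodo DOI in the text, including any that cite other Zenodo records, and it will not match DOIs written in URL-encoded form (for example inside some badge URLs), so review the result before committing it.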
Hopefully this blog post has been helpful in condensing the most relevant learnings of the process we went through to get our project published. While the steps may seem convoluted, the whole process should only take a few minutes, provided that the files have been prepared in advance. For further reading on research code, you can check out:
- Git — Aalto Scientific Computing (ASC)
- Information for new group leaders — Aalto Scientific Computing (ASC)
- P. J. Mineault & The Good Research Code Handbook Community (2021). The Good Research Code Handbook. Zenodo. doi:10.5281/zenodo.5796873
- G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, T. K. Teal (2017). Good enough practices in scientific computing. PLoS Comput. Biol. 13 (6) e1005510. https://doi.org/10.1371/journal.pcbi.1005510
Update 28.9.2023 (Riikka Puurunen): A video in which code is published following these instructions is available on YouTube: https://youtu.be/ksxAIaytv68?si=3eWQF9lG2zbQ7ftP.