Category Archives: Data analysis

Remove some sources of frustration during data analysis

doit: a Python alternative to make

In an earlier post, I demonstrated the wonderfulness that is having a build system and introduced the venerable make utility. You can use it to add plumbing to your analysis pipeline that keeps track of which analysis steps have been run for which subjects, and whether any scripts have changed and need to be run again. When properly implemented, it can save a lot of headaches, especially when deadlines are approaching, by making sure that everything is always ‘up to date’.

However, make is not the easiest tool to wrap your head around. The syntax is archaic, and we needed to add “phony” targets and use a custom-made, magical-looking function to iterate over subjects. So let me introduce you to another tool, called doit, which can be easier to use, especially if you are familiar with Python syntax.
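To give a rough idea of what iterating over subjects looks like in doit, here is a minimal sketch of a dodo.py file. The subject IDs, the filter.py script and the data/ paths are hypothetical placeholders I made up for illustration; only the task dictionary keys (name, file_dep, targets, actions) are doit's own.

```python
# dodo.py: a minimal sketch. The subject IDs, script name and file
# paths below are hypothetical placeholders, not from a real project.

SUBJECTS = ['sub01', 'sub02', 'sub03']


def task_filter():
    """Filter the raw data of each subject (one sub-task per subject)."""
    for subject in SUBJECTS:
        raw = f'data/{subject}_raw.csv'
        filtered = f'data/{subject}_filtered.csv'
        yield {
            'name': subject,                 # sub-task name, e.g. filter:sub01
            'file_dep': ['filter.py', raw],  # re-run when any of these change
            'targets': [filtered],           # files this step produces
            'actions': [f'python filter.py {raw} {filtered}'],
        }
```

With a file like this in place, running `doit` in the same folder should execute only the sub-tasks whose dependencies have changed since the last run. The loop over subjects is plain Python, so no magical-looking functions are needed.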

make: intelligent plumbing for your analysis pipeline

Most of our data analysis happens in scripts. When these scripts grow and become more awesome, they also tend to take longer to run. We’re talking hours, sometimes even days, of computation time here. Naturally, it becomes impractical to re-run the entire thing every time we change something. Instead, we only re-run the parts of it that changed. This post is about a tool that has been around for ages and can be incredibly helpful here, but that you may not have heard of: make!
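The core idea of "only re-run what changed" can be sketched in a few lines of Python. This is only an illustration of the kind of timestamp comparison make performs, not how make is actually implemented, and needs_rebuild is a hypothetical helper name of my own.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True  # never built, so it must be (re)built
    target_time = os.path.getmtime(target)
    # Out of date as soon as one dependency was modified after the target
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)
```

An analysis step would then be skipped whenever needs_rebuild returns False for its output file, given the script and input files it depends on.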