Using GNU Make to Manage the Workflow of Data Analysis Projects

Dublin Core

Title

Using GNU Make to Manage the Workflow of Data Analysis Projects

Subject

GNU Make, Make, reproducible research, R, rmarkdown, Sweave, Stata, SAS

Description

Data analysis projects invariably involve a series of steps such as reading, cleaning, summarizing and plotting data, statistical analysis and reporting. To facilitate reproducible research, rather than employing a relatively ad hoc point-and-click, cut-and-paste approach, we typically break down these tasks into manageable chunks by employing separate files of statistical, programming or text processing syntax for each step, including the final report. Real-world data analysis often requires an iterative process because many of these steps may need to be repeated any number of times. Manually repeating these steps is problematic in that some necessary steps may be left out or some reported results may not reflect the most recent data set or syntax.
GNU Make may be used to automate the mundane task of regenerating output given dependencies between syntax and data files. In addition to facilitating the management and documentation of the workflow of a complex data analysis project, such automation can help minimize errors and make the project more reproducible. It is relatively simple to construct Makefiles for small data analysis projects. As projects increase in size, difficulties arise because GNU Make does not have inbuilt rules for statistical and related software. Without such rules, Makefiles can become unwieldy and error-prone.
This article addresses these issues by providing GNU Make pattern rules for R, Sweave, rmarkdown, SAS, Stata, Perl and Python to streamline management of data analysis and reporting projects. Rules are used by adding a single line to project Makefiles. Additional flexibility is incorporated for modifying standard program options. An overall strategy is outlined for Makefile construction and illustrated via simple and complex examples.
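
As a rough illustration of the kind of pattern rule the description refers to, the following minimal sketch shows how a Makefile might regenerate R output when either the script or its data changes. The file names (analysis.R, data.csv) are hypothetical, and the rule is a generic example rather than one of the rules distributed with the article.

```makefile
# Hypothetical project: analysis.R reads data.csv; its output log is analysis.Rout.
# All file names here are illustrative assumptions, not taken from the article.

.PHONY: all
all: analysis.Rout

# Pattern rule: build any .Rout file from the .R file with the same stem.
# $< is the .R prerequisite matched by the pattern, $@ is the target.
# Note: the recipe line must begin with a tab character.
%.Rout: %.R
	R CMD BATCH --no-save $< $@

# Extra prerequisite with no recipe: rerun the analysis when the data changes.
analysis.Rout: data.csv
```

With files like these in place, running `make` would rerun the script only when analysis.R or data.csv is newer than analysis.Rout, which is the dependency-driven regeneration the description outlines.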

Creator

Peter Baker

Source

https://www.jstatsoft.org/article/view/v094c01

Publisher

University of Queensland

Date

June 2020

Contributor

Fajar bagus W

Format

PDF

Language

English

Type

Text

Citation

Peter Baker, “Using GNU Make to Manage the Workflow of Data Analysis Projects,” Repository Horizon University Indonesia, accessed April 19, 2025, https://repository.horizon.ac.id/items/show/8149.