Data Validation Infrastructure for R

Dublin Core

Title

Data Validation Infrastructure for R

Subject

data checking, data quality, data cleaning, R

Description

Checking data quality against domain knowledge is a common activity that pervades
statistical analysis from raw data to output. The R package validate facilitates this
task by capturing and applying expert knowledge in the form of validation rules: logical
restrictions on variables, records, or data sets that should be satisfied before they are
considered valid input for further analysis. In the validate package, validation rules are
objects of computation that can be manipulated, investigated, and confronted with data
or versions of a data set. The results of a confrontation are then available for further
investigation, summarization or visualization. Validation rules can also be endowed with
metadata and documentation, and they may be stored in or retrieved from external sources
such as text files or tabular formats. This data validation infrastructure thus allows for
systematic, user-defined specification of data quality requirements that can be reused for
various versions of a data set or by data correction algorithms that are parameterized by
validation rules.

Creator

Mark P. J. van der Loo

Source

https://www.jstatsoft.org/article/view/v097i10

Publisher

Mark P. J. van der Loo, Edwin de Jonge

Date

January 2021

Contributor

Fajar Bagus w

Format

PDF

Language

English

Type

Text

Citation

Mark P. J. van der Loo, “Data Validation Infrastructure for R,” Repository Horizon University Indonesia, accessed April 12, 2025, https://repository.horizon.ac.id/items/show/8185.