Data Validation Infrastructure for R
Dublin Core
Title
Data Validation Infrastructure for R
Subject
data checking, data quality, data cleaning, R
Description
Checking data quality against domain knowledge is a common activity that pervades
statistical analysis from raw data to output. The R package validate facilitates this
task by capturing and applying expert knowledge in the form of validation rules: logical
restrictions on variables, records, or data sets that should be satisfied before they are
considered valid input for further analysis. In the validate package, validation rules are
objects of computation that can be manipulated, investigated, and confronted with data
or versions of a data set. The results of a confrontation are then available for further
investigation, summarization or visualization. Validation rules can also be endowed with
metadata and documentation, and they may be stored in or retrieved from external sources
such as text files or tabular formats. This data validation infrastructure thus allows for
systematic, user-defined specification of data quality requirements that can be reused for
various versions of a data set or by data correction algorithms that are parameterized by
validation rules.
Creator
Mark P. J. van der Loo
Source
https://www.jstatsoft.org/article/view/v097i10
Publisher
Mark P. J. van der Loo, Edwin de Jonge
Date
January 2021
Contributor
Fajar Bagus w
Format
PDF
Language
English
Type
Text
Citation
Mark P. J. van der Loo, “Data Validation Infrastructure for R,” Repository Horizon University Indonesia, accessed April 12, 2025, https://repository.horizon.ac.id/items/show/8185.