intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset
Dublin Core
Title
intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset
Subject
nearest neighbors, likelihood-based method, heterogeneous intrinsic dimension,
Bayesian mixture model, R
Description
This article illustrates intRinsic, an R package that implements novel state-of-the-art
likelihood-based estimators of the intrinsic dimension of a dataset, an essential quantity
for most dimensionality reduction techniques. In order to make these novel estimators
easily accessible, the package contains a small number of high-level functions that rely on
a broader set of efficient, low-level routines. Generally speaking, intRinsic encompasses
models that fall into two categories: homogeneous and heterogeneous intrinsic dimension
estimators. The first category contains the two nearest neighbors estimator, a method
derived from the distributional properties of the ratios of the distances between each data
point and its first two closest neighbors. The functions dedicated to this method carry out
inference under both the frequentist and Bayesian frameworks. In the second category, we
find the heterogeneous intrinsic dimension algorithm, a Bayesian mixture model for which
an efficient Gibbs sampler is implemented. After presenting the theoretical background,
we demonstrate the performance of the models on simulated datasets. This way, we can
facilitate the exposition by immediately assessing the validity of the results. Then, we
employ the package to study the intrinsic dimension of the Alon dataset, obtained from a
famous microarray experiment. Finally, we show how the estimation of homogeneous and
heterogeneous intrinsic dimensions allows us to gain valuable insights into the topological
structure of a dataset.
Creator
Francesco Denti
Source
https://www.jstatsoft.org/article/view/v106i09
Publisher
Università Cattolica del Sacro Cuore
Date
March 2023
Contributor
Fajar bagus W
Format
PDF
Language
English
Type
Text
Citation
Francesco Denti, “intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/8300.