MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso
Dublin Core
Title
MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso
Subject
penalized regression, correlated variables, hierarchical clustering, group selection, R
Description
The R package MLGL, standing for multi-layer group-Lasso, implements a new variable selection procedure for settings where the explanatory variables are redundant, as is typically the case with high-dimensional data. A sparsity assumption is made: only a few variables are relevant for predicting the response variable. In this
context, the performance of classical Lasso-based approaches strongly deteriorates as the
redundancy increases.
The proposed approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides
at each level a partition of the variables into groups. Then, the set of groups of variables
from the different levels of the hierarchy is given as input to group-Lasso, with weights
adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter.
The versatility offered by package MLGL to choose groups at different levels of the
hierarchy a priori induces a high computational complexity. MLGL, however, exploits the
structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final
time cost. The final choice of the regularization parameter – and therefore the final choice
of groups – is made by a multiple hierarchical testing procedure.
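The clustering-then-grouping step described above can be illustrated with a minimal Python sketch. It is not the MLGL implementation and does not use the package's API. The function names, the merge-list encoding of the hierarchy, the column-duplication trick for overlapping groups, and the square-root-of-size weights are all assumptions chosen for illustration; the description only states that the weights are adapted to the structure of the hierarchy.

```python
from math import sqrt


def groups_from_merges(n_vars, merges):
    """Collect every cluster that appears at any level of a hierarchy.

    Assumed encoding: ids 0..n_vars-1 are single variables, and the k-th
    merge (a pair of cluster ids) creates the cluster with id n_vars + k.
    """
    clusters = {i: (i,) for i in range(n_vars)}
    for k, (a, b) in enumerate(merges):
        clusters[n_vars + k] = tuple(sorted(clusters[a] + clusters[b]))
    return list(clusters.values())


def expand_for_group_lasso(groups):
    """Duplicate variables so that overlapping groups become disjoint blocks.

    Returns the column index to copy for each expanded column, the group
    label of each expanded column, and one weight per group.
    """
    columns, group_index, weights = [], [], []
    for g, members in enumerate(groups):
        columns.extend(members)
        group_index.extend([g] * len(members))
        # Hypothetical weight: sqrt(group size), a common group-Lasso default.
        weights.append(sqrt(len(members)))
    return columns, group_index, weights


# Toy hierarchy on 4 variables: merge (0, 1), merge (2, 3), then merge both.
merges = [(0, 1), (2, 3), (4, 5)]
groups = groups_from_merges(4, merges)
columns, group_index, weights = expand_for_group_lasso(groups)
print(groups)
```

In this toy case each variable belongs to three groups (itself, its pair, and the root), so the expanded design has 12 columns for 4 original variables. This growth is the source of the computational cost the description mentions.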
Creator
Quentin Grimonprez
Source
https://www.jstatsoft.org/article/view/v106i03
Publisher
Inria Lille-Nord Europe
Date
March 2023
Contributor
Fajar bagus W
Format
PDF
Language
English
Type
Text
Citation
Quentin Grimonprez, “MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/8294.