Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

Dublin Core

Title

Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

Subject

dynamic programming, optimal changepoint detection, peak detection, genomic data, R

Description

We describe a new algorithm and R package for peak detection in genomic data sets
using constrained changepoint models. These detect changes from background to peak
regions by imposing the constraint that the mean should alternately increase then decrease.
An existing algorithm for this problem gives state-of-the-art accuracy,
but it is computationally expensive when the number of changes is large. We
propose a dynamic programming algorithm that jointly estimates the number of peaks
and their locations by minimizing a cost function which consists of a data fitting term and
a penalty for each changepoint. Empirically, the running time of this algorithm is O(N log(N))
for analyzing data of length N. We also propose a sequential search algorithm that finds
the best solution with K segments in O(log(K)N log(N)) time, which is much faster than
the previous O(KN log(N)) algorithm. We show that our disk-based implementation in
the PeakSegDisk R package can be used to quickly compute constrained optimal models
with many changepoints, which are needed to analyze typical genomic data sets that have
tens of millions of observations.
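
To make the penalized cost in the description concrete, here is a minimal sketch in Python. It is not the GFPOP algorithm and not the PeakSegDisk implementation. It assumes a Poisson loss as the data fitting term, drops the up-down constraint on segment means, and uses the classical quadratic-time optimal partitioning recursion instead of functional pruning. The function names `poisson_segment_cost` and `optimal_partitioning` are hypothetical and chosen only for this illustration.

```python
import math


def poisson_segment_cost(cum_sum, start, end):
    """Poisson loss (up to a data-only constant) of one mean fit to data[start:end]."""
    total = cum_sum[end] - cum_sum[start]
    if total == 0:
        return 0.0  # limit of the loss when the segment mean is zero
    mean = total / (end - start)
    # sum over the segment of (mean - x * log(mean)) = total - total * log(mean)
    return total - total * math.log(mean)


def optimal_partitioning(data, penalty):
    """Minimize (sum of segment losses) + penalty * (number of changepoints).

    Assumption: unconstrained model, O(N^2) time. Returns the changepoint
    positions (start index of each segment after the first) and the optimal cost.
    """
    n = len(data)
    cum_sum = [0]
    for x in data:
        cum_sum.append(cum_sum[-1] + x)
    # best_cost[t] is the optimal penalized cost of data[:t];
    # starting at -penalty means the first segment is not penalized.
    best_cost = [0.0] * (n + 1)
    best_cost[0] = -penalty
    last_start = [0] * (n + 1)
    for t in range(1, n + 1):
        best, arg = math.inf, 0
        for s in range(t):  # s is the start of the last segment ending at t
            cost = best_cost[s] + penalty + poisson_segment_cost(cum_sum, s, t)
            if cost < best:
                best, arg = cost, s
        best_cost[t] = best
        last_start[t] = arg
    # Backtrack through the stored segment starts to recover the changepoints.
    changes = []
    t = n
    while last_start[t] > 0:
        t = last_start[t]
        changes.append(t)
    changes.reverse()
    return changes, best_cost[n]


# Toy count data: low background, one high "peak" region, low background.
counts = [1, 2, 1, 1, 20, 22, 19, 21, 2, 1, 1, 2]
changes, cost = optimal_partitioning(counts, penalty=5.0)
print(changes)  # -> [4, 8]
```

The penalty controls how many changepoints are selected: a larger penalty gives a model with fewer changes, and a large enough penalty gives none. A search over penalty values for a target model size relies on this monotone relationship.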

Creator

Toby Dylan Hocking

Source

https://www.jstatsoft.org/article/view/v101i10

Publisher

Northern Arizona University

Date

January 2022

Contributor

Fajar bagus W

Format

PDF

Language

English

Type

Text

Citation

Toby Dylan Hocking, “Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/8244.