stringi: Fast and Portable Character String Processing in R

Dublin Core

Title

stringi: Fast and Portable Character String Processing in R

Subject

stringi, character strings, text, ICU, Unicode, regular expressions, data cleansing,
natural language processing, R.

Description

Effective processing of character strings is required at various stages of data analysis
pipelines: from data cleansing and preparation, through information extraction, to report
generation. Pattern searching, string collation and sorting, normalization, transliteration,
and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package
for fast and portable handling of string data based on ICU (International Components
for Unicode), should be included in each statistician’s or data scientist’s repertoire to
complement their numerical computing and data wrangling skills.

Creator

Marek Gagolewski

Source

https://www.jstatsoft.org/article/view/v103i02

Publisher

Deakin University
Polish Academy of Sciences

Date

July 2022

Contributor

Fajar bagus W

Format

PDF

Language

English

Type

Text

Files

v103i02.pdf

Collection

Volume 103, Year 2022

Citation

Marek Gagolewski, “stringi: Fast and Portable Character String Processing in R,” Repository Horizon University Indonesia, accessed April 14, 2026, https://repository.horizon.ac.id/items/show/8258.