stringi: Fast and Portable Character String Processing in R
Dublin Core
Title
stringi: Fast and Portable Character String Processing in R
Subject
character strings, text, ICU, Unicode, regular expressions, data cleansing, natural language processing, R.
Description
Effective processing of character strings is required at various stages of data analysis
pipelines: from data cleansing and preparation, through information extraction, to report
generation. Pattern searching, string collation and sorting, normalization, transliteration,
and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package
for fast and portable handling of string data based on ICU (International Components
for Unicode), should be included in each statistician’s or data scientist’s repertoire to
complement their numerical computing and data wrangling skills.
Creator
Marek Gagolewski
Source
https://www.jstatsoft.org/article/view/v103i02
Publisher
Deakin University
Polish Academy of Sciences
Date
July 2022
Contributor
Fajar bagus W
Format
PDF
Language
English
Type
Text
Citation
Marek Gagolewski, “stringi: Fast and Portable Character String Processing in R,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/8258.