Package: textclean 0.9.7

textclean: Text Cleaning Tools

Tools to clean and process text. The tools check for substrings that are not optimal for analysis and either replace them with more analysis-friendly substrings or remove them (normalizing; see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>), or extract them into new variables. For example, emoticons are often used in text but are not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
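As a minimal sketch of the emoticon case described above, the snippet below calls replace_emoticon() on a few made-up strings. The sample text is an assumption for illustration, and the exact replacement words depend on the package's default lookup table, so no particular output is claimed here.

```r
library(textclean)

# Made-up example strings (not from the package); any character vector works
x <- c("Great talk today :)", "Missed the bus again :(", "No emoticons here.")

# Replace each emoticon with a word equivalent using the default lookup table
cleaned <- replace_emoticon(x)

# Show the original and cleaned text side by side
print(data.frame(original = x, cleaned = cleaned))
```

The function takes a character vector and returns one of the same length, so it can be applied directly to a text column of a data frame.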

Authors: Tyler Rinker [aut, cre], ctwheels StackOverflow [ctb], Surin Space [ctb]

textclean_0.9.7.tar.gz
textclean_0.9.7.zip (r-4.5), textclean_0.9.7.zip (r-4.4), textclean_0.9.7.zip (r-4.3)
textclean_0.9.7.tgz (r-4.4-any), textclean_0.9.7.tgz (r-4.3-any)
textclean_0.9.7.tar.gz (r-4.5-noble), textclean_0.9.7.tar.gz (r-4.4-noble)
textclean_0.9.7.tgz (r-4.4-emscripten), textclean_0.9.7.tgz (r-4.3-emscripten)
textclean.pdf | textclean.html
textclean/json (API)
NEWS

# Install 'textclean' in R:
install.packages('textclean', repos = c('https://trinker.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/trinker/textclean/issues

Datasets:
  • DATA - Fictitious Classroom Dialogue

data-munging, emoticons, regex, text-analysis, text-cleaning

65 exports, 245 stars, 9.84 score, 33 dependencies, 21 dependents, 1 mentions, 676 scripts, 4.4k downloads

Last updated 3 years ago from:5443d7484c. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target            Result   Date
Doc / Vignettes   OK       Sep 25 2024
R-4.5-win         NOTE     Sep 25 2024
R-4.5-linux       NOTE     Sep 25 2024
R-4.4-win         NOTE     Sep 25 2024
R-4.4-mac         NOTE     Sep 25 2024
R-4.3-win         NOTE     Sep 25 2024
R-4.3-mac         NOTE     Sep 25 2024

Exports: %like%, %LIKE%, %slike%, %SLIKE%, add_comma_space, add_missing_endmark, as_ordinal, available_checks, check_text, drop_element, drop_element_fixed, drop_element_regex, drop_empty_row, drop_NA, drop_row, fgsub, fix_mdyyyy, glue, glue_collapse, has_endmark, is_it, keep_element, keep_element_fixed, keep_element_regex, keep_row, make_plural, match_tokens, mgsub, mgsub_fixed, mgsub_regex, mgsub_regex_safe, replace_contraction, replace_curly_quote, replace_date, replace_email, replace_emoji, replace_emoji_identifier, replace_emoticon, replace_from, replace_grade, replace_hash, replace_html, replace_incomplete, replace_internet_slang, replace_kern, replace_misspelling, replace_money, replace_names, replace_non_ascii, replace_non_ascii2, replace_number, replace_ordinal, replace_rating, replace_symbol, replace_tag, replace_time, replace_to, replace_tokens, replace_url, replace_white, replace_word_elongation, strip, sub_holder, swap, which_are

Dependencies: cli, cpp11, data.table, dplyr, dtt, english, fansi, generics, glue, lattice, lexicon, lifecycle, magrittr, mgsub, NLP, pillar, pkgconfig, purrr, qdapRegex, R6, rlang, slam, stringi, stringr, syuzhet, textshape, tibble, tidyr, tidyselect, utf8, vctrs, withr, zoo

Readme and manuals

Help Manual

Help page: Topics
SQL Style LIKE: %LIKE%, %like%, %SLIKE%, %slike%
Ensure Space After Comma: add_comma_space
Add Missing Endmarks: add_missing_endmark
Check Text For Potential Problems: available_checks, check_text
Fictitious Classroom Dialogue: DATA
Filter Elements in a Vector: drop_element, drop_element_fixed, drop_element_regex, keep_element, keep_element_fixed, keep_element_regex
Filter Rows That Contain Markers: drop_empty_row, drop_NA, drop_row, keep_row
Replace a Regex with a Functional Operation on the Regex Match: fgsub
Coerce Character m/d/yyyy to Date: fix_mdyyyy
Test for Incomplete Sentences: has_endmark
Make Plural (or Verb to Singular) Versions of Words: make_plural
Find Tokens that Match a Regex: match_tokens
Multiple 'gsub': mgsub, mgsub_fixed, mgsub_regex, mgsub_regex_safe
Prints a check_text Object: print.check_text
Prints a sub_holder Object: print.sub_holder
Prints a which_are_locs Object: print.which_are_locs
Replace Contractions: replace_contraction
Replace Dates With Words: replace_date
Replace Email Addresses: replace_email
Replace Emojis With Words/Identifier: replace_emoji, replace_emoji_identifier
Replace Emoticons With Words: replace_emoticon
Replace Grades With Words: replace_grade
Replace Hashes: replace_hash
Replace HTML Markup: replace_html
Denote Incomplete End Marks With "|": replace_incomplete
Replace Internet Slang: replace_internet_slang
Replace Kerned (Spaced) with No Space Version: replace_kern
Replace Misspelled Words: replace_misspelling
Replace Money With Words: replace_money
Replace First/Last Names: replace_names
Replace Common Non-ASCII Characters: replace_curly_quote, replace_non_ascii, replace_non_ascii2
Replace Numbers With Text Representation: as_ordinal, replace_number
Replace Mixed Ordinal Numbers With Text Representation: replace_ordinal
Replace Ratings With Words: replace_rating
Replace Symbols With Word Equivalents: replace_symbol
Replace Handle Tags: replace_tag
Replace Time Stamps With Words: replace_time
Grab Begin/End of String to/from Character: replace_from, replace_to
Replace Tokens: replace_tokens
Replace URLs: replace_url
Remove White Space Characters: replace_white
Replace Word Elongations: replace_word_elongation
Strip Text: strip, strip.character, strip.default, strip.factor, strip.list
Hold the Place of Characters Prior to Subbing: sub_holder
Swap Two Patterns Simultaneously: swap
Text Cleaning Tools: package-textclean, textclean
Detect/Locate Potential Non-Normalized Text: is_it, which_are
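
To give a feel for how a few of the topics listed above fit together, here is a hedged sketch that chains several replace_* functions, runs a multiple substitution with mgsub(), and filters with the SQL-style %like% operator. The input strings are made up, every function is called with its default arguments, and the exact output depends on the package's lookup tables, so none is shown. See the individual help pages for the full argument lists.

```r
library(textclean)

# Made-up input for illustration only
x <- c("I'm   not sure about this :-)",
       "Visit http://example.com  now, it's great!")

# Each replace_* function takes and returns a character vector,
# so the steps can be applied one after another.
no_url      <- replace_url(x)               # handle URLs
expanded    <- replace_contraction(no_url)  # e.g. "I'm", "it's"
no_emoticon <- replace_emoticon(expanded)   # emoticons to words
cleaned     <- replace_white(no_emoticon)   # white space handling
print(cleaned)

# mgsub(): several pattern/replacement pairs in a single call
swapped <- mgsub("hello world", c("hello", "world"), c("goodbye", "moon"))
print(swapped)

# %like%: SQL-style matching, where "%" stands for any run of characters
a_states <- state.name[state.name %like% "A%"]
print(a_states)
```

The order of the steps is a judgment call in this sketch: URLs are handled first so that punctuation inside them is not mistaken for an emoticon.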