Package: textclean 0.9.7
textclean: Text Cleaning Tools
Tools to clean and process text. The tools check for substrings that are not optimal for analysis and either replace or remove them (normalizing) with more analysis-friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extract them into new variables. For example, emoticons are often used in text but are not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
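The emoticon replacement described above can be sketched in a few lines of R. This is a minimal illustration, assuming textclean is already installed and that replace_emoticon() is called with its default lookup table; the sample strings are made up for the example and do not come from the package.

```r
# Assumes textclean is installed; the input strings are hypothetical.
library(textclean)

# A small character vector mixing emoticons and plain text.
x <- c("Great talk today :-)", "Missed the bus again :(", "No emoticons here.")

# Replace emoticons with word equivalents using the default lookup table.
cleaned <- replace_emoticon(x)
print(cleaned)
```

The result is a character vector of the same length as the input, so the call can be dropped into an existing cleaning pipeline ahead of tokenization.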
textclean_0.9.7.tar.gz
textclean_0.9.7.zip (r-4.5), textclean_0.9.7.zip (r-4.4), textclean_0.9.7.zip (r-4.3)
textclean_0.9.7.tgz (r-4.4-any), textclean_0.9.7.tgz (r-4.3-any)
textclean_0.9.7.tar.gz (r-4.5-noble), textclean_0.9.7.tar.gz (r-4.4-noble)
textclean_0.9.7.tgz (r-4.4-emscripten), textclean_0.9.7.tgz (r-4.3-emscripten)
textclean.pdf | textclean.html
textclean/json (API)
NEWS
# Install 'textclean' in R:
install.packages('textclean', repos = c('https://trinker.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/trinker/textclean/issues
- DATA - Fictitious Classroom Dialogue
data-munging, emoticons, regex, text-analysis, text-cleaning
Last updated 3 years ago from: 5443d7484c. Checks: OK: 1, NOTE: 6. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Oct 25 2024 |
R-4.5-win | NOTE | Oct 25 2024 |
R-4.5-linux | NOTE | Oct 25 2024 |
R-4.4-win | NOTE | Oct 25 2024 |
R-4.4-mac | NOTE | Oct 25 2024 |
R-4.3-win | NOTE | Oct 25 2024 |
R-4.3-mac | NOTE | Oct 25 2024 |
Exports: %like%, %LIKE%, %slike%, %SLIKE%, add_comma_space, add_missing_endmark, as_ordinal, available_checks, check_text, drop_element, drop_element_fixed, drop_element_regex, drop_empty_row, drop_NA, drop_row, fgsub, fix_mdyyyy, glue, glue_collapse, has_endmark, is_it, keep_element, keep_element_fixed, keep_element_regex, keep_row, make_plural, match_tokens, mgsub, mgsub_fixed, mgsub_regex, mgsub_regex_safe, replace_contraction, replace_curly_quote, replace_date, replace_email, replace_emoji, replace_emoji_identifier, replace_emoticon, replace_from, replace_grade, replace_hash, replace_html, replace_incomplete, replace_internet_slang, replace_kern, replace_misspelling, replace_money, replace_names, replace_non_ascii, replace_non_ascii2, replace_number, replace_ordinal, replace_rating, replace_symbol, replace_tag, replace_time, replace_to, replace_tokens, replace_url, replace_white, replace_word_elongation, strip, sub_holder, swap, which_are
Dependencies: cli, cpp11, data.table, dplyr, dtt, english, fansi, generics, glue, lattice, lexicon, lifecycle, magrittr, mgsub, NLP, pillar, pkgconfig, purrr, qdapRegex, R6, rlang, slam, stringi, stringr, syuzhet, textshape, tibble, tidyr, tidyselect, utf8, vctrs, withr, zoo