This function gives general statistics for a character vector, e.g., one obtained by loading a text file with the readLines or stri_read_lines function, where each text line is represented by a separate string.

Usage

stri_stats_general(str)

Arguments

str

character vector to be aggregated

Value

Returns an integer vector with the following named elements:

  1. Lines - number of lines (number of non-missing strings in the vector);

  2. LinesNEmpty - number of lines with at least one non-WHITE_SPACE character;

  3. Chars - total number of Unicode code points detected;

  4. CharsNWhite - number of Unicode code points that are not WHITE_SPACEs;

  5. ... (other elements may be added in future releases of stringi).
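Because the result is a named integer vector, individual statistics can be read by name. The following is a minimal sketch, assuming the stringi package is installed; the input vector `s` and the variable `stats` are made up for illustration.

```r
library(stringi)

# Illustrative input: one ordinary line, one empty string,
# one whitespace-only string, and one missing value
s <- c("Lorem ipsum dolor sit amet.", "", "   ", NA)

stats <- stri_stats_general(s)

# Extract single statistics by element name
stats[["Lines"]]        # non-missing strings only, so NA is not counted
stats[["LinesNEmpty"]]  # strings with at least one non-white-space code point
stats[["Chars"]]
stats[["CharsNWhite"]]
```

Accessing elements by name rather than by position is likely the safer choice, since further elements may be appended in later releases.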

Details

None of the strings may contain \r or \n characters; otherwise, you will get an error.

By `white space` we mean the Unicode binary property WHITE_SPACE; see stringi-search-charclass.
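If the text arrives as a single string with embedded line breaks, one possible approach is to split it into lines first. The sketch below assumes stringi's stri_split_lines1 function is used for the split; the variables `txt`, `res`, and `txt_lines` are illustrative only.

```r
library(stringi)

# A single string with embedded newline sequences (illustrative data)
txt <- "first line\nsecond line\r\nthird line"

# Passing it directly is expected to fail because of the \n and \r characters
res <- try(stri_stats_general(txt), silent = TRUE)
inherits(res, "try-error")

# Split into one string per line, then compute the statistics
txt_lines <- stri_split_lines1(txt)
stats <- stri_stats_general(txt_lines)
stats[["Lines"]]
```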

See also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other stats: stri_stats_latex()

Author

Marek Gagolewski and other contributors

Examples

s <- c('Lorem ipsum dolor sit amet, consectetur adipisicing elit.',
       'nibh augue, suscipit a, scelerisque sed, lacinia in, mi.',
       'Cras vel lorem. Etiam pellentesque aliquet tellus.',
       '')
stri_stats_general(s)
#>       Lines LinesNEmpty       Chars CharsNWhite 
#>           4           3         163         142