Approximates the number of text columns the `cat()` function might use to print a string using a mono-spaced font.
Details
The Unicode standard does not formalize the notion of a character width. Roughly based on http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, https://github.com/nodejs/node/blob/master/src/node_i18n.cc, and UAX #11, we proceed as follows. The following code points are of width 0:
code points with general category (see stringi-search-charclass) Me, Mn, and Cf,
C0 and C1 control codes (general category Cc), for compatibility with the nchar function,
Hangul Jamo medial vowels and final consonants (code points with enumerable property UCHAR_HANGUL_SYLLABLE_TYPE equal to U_HST_VOWEL_JAMO or U_HST_TRAILING_JAMO; note that applying the NFC normalization with stri_trans_nfc is encouraged),
ZERO WIDTH SPACE (U+200B).
Characters with the UCHAR_EAST_ASIAN_WIDTH enumerable property equal to U_EA_FULLWIDTH or U_EA_WIDE are of width 2.
Most emojis and characters with general category So (other symbols) are of width 2.
SOFT HYPHEN (U+00AD) (for compatibility with nchar) and all other characters have width 1.
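To make the rules above concrete, the R snippet below applies stri_width to one sample string per rule and contrasts it with stri_length, which counts code points rather than columns. It is a sketch assuming a stringi version whose stri_width follows the rules exactly as described; the sample strings are arbitrary illustrative choices, and the outputs shown are the values those rules imply.

```r
library("stringi")

# One sample string per width rule described above:
x <- c(
  "a\u0301",  # 'a' + COMBINING ACUTE ACCENT (general category Mn, width 0)
  "\u200B",   # ZERO WIDTH SPACE (width 0)
  "\t",       # a C0 control code (general category Cc, width 0)
  "\u4E2D",   # a CJK ideograph with East Asian Width = Wide (width 2)
  "\u00AD",   # SOFT HYPHEN (width 1, for compatibility with nchar)
  "x"         # any other character (width 1)
)
stri_length(x)  # number of code points
#> [1] 2 1 1 1 1 1
stri_width(x)   # approximate number of text columns
#> [1] 1 0 0 2 1 1

# A Hangul syllable decomposed into conjoining jamo: the leading consonant
# has width 2, the medial vowel and final consonant have width 0, so the
# total width matches that of the NFC-recomposed syllable.
h <- stri_trans_nfkd("\ubc1f")
stri_length(h)
#> [1] 3
stri_width(h)
#> [1] 2
stri_width(stri_trans_nfc(h))
#> [1] 2
```

The contrast between the two vectors shows why padding and column alignment should rely on the width rather than on the number of code points.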
References
East Asian Width – Unicode Standard Annex #11, https://www.unicode.org/reports/tr11/
See also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other length:
%s$%(),
stri_isempty(),
stri_length(),
stri_numbytes(),
stri_pad_both(),
stri_sprintf()
Author
Marek Gagolewski and other contributors
Examples
stri_width(LETTERS[1:5])
#> [1] 1 1 1 1 1
stri_width(stri_trans_nfkd('\u0105'))
#> [1] 1
stri_width(stri_trans_nfkd('\U0001F606'))
#> [1] 2
stri_width( # Full-width equivalents of ASCII characters:
stri_enc_fromutf32(as.list(c(0x3000, 0xFF01:0xFF5E)))
)
#> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [77] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
stri_width(stri_trans_nfkd('\ubc1f')) # includes Hangul Jamo medial vowels and final consonants
#> [1] 2