NEWS.md
- Added the `tokenize_ptb()` function for Penn Treebank tokenizations (@jrnold) (#12).
- Added `chunk_text()` to split long documents into pieces (#30).
- New function `tokenize_tweets()` preserves usernames, hashtags, and URLs (@kbenoit) (#44).
- The `stopwords()` function has been removed in favor of using the stopwords package (#46).
- The package now follows the recommendations of the tif (Text Interchange Formats) package (#49).
- `tokenize_skip_ngrams()` has been improved to generate unigrams and bigrams, according to the skip definition (#24).
- Documentation now covers the tokenizers the package supports (@ironholds) (#26).
- `tokenize_skip_ngrams()` now supports stopwords (#31).
- Tokenizers now handle `NA` values consistently (#33).
- `tokenize_words()` gains arguments to preserve or strip punctuation and numbers (#48); see the usage sketch below.
- Fixed `tokenize_skip_ngrams()` and `tokenize_ngrams()` to return properly marked UTF-8 strings on Windows (@patperry) (#58).
- `tokenize_tweets()` now removes stopwords prior to stripping punctuation, making its behavior more consistent with `tokenize_words()` (#76).
- Added the `tokenize_character_shingles()` tokenizer.
- Improvements to `tokenize_words()` and `tokenize_word_stems()`.
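
The entries above refer to the user-facing functions by name only. The minimal sketch below illustrates how a few of them might be called; the argument names (`strip_punct`, `strip_numeric`, `chunk_size`, `n`) are assumptions inferred from these entries, not confirmed by this changelog.

```r
# Illustrative sketch only; argument names are assumed rather than documented here.
library(tokenizers)

text <- "Many a quick brown fox jumps over 2 lazy dogs, again and again."

# tokenize_words() with the punctuation/number options mentioned in #48
tokenize_words(text, strip_punct = FALSE, strip_numeric = TRUE)

# chunk_text() splits a long document into smaller pieces (#30)
chunk_text(text, chunk_size = 5)

# tokenize_character_shingles() produces overlapping character shingles
tokenize_character_shingles(text, n = 3)
```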