corpustext.com - Text Corpus Analysis • corpus


Corpus is an R text processing package with full support for international text (Unicode). It includes functions for reading data from newline-delimited JSON files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies (including n-grams).
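The paragraph above lists the package's core operations. The following is a minimal sketch of three of them: tokenizing, counting unigrams and bigrams, and searching for a term. It assumes the package is installed from CRAN and that the default text filter is acceptable. The two sample sentences are made up for illustration. The exact arguments may differ between versions, so check ?text_tokens, ?term_stats and ?text_locate in your installed release.

```r
# Sketch only: assumes the corpus package is installed and uses its default text filter.
library("corpus")

# Two short, made-up documents; a plain character vector is accepted as input.
text <- c(a = "The quick brown fox jumps over the lazy dog.",
          b = "The lazy dog sleeps; the quick fox runs away.")

# Normalize and tokenize: returns a list with one token vector per document.
toks <- text_tokens(text)
print(toks)

# Term occurrence frequencies for n-grams of length 1 and 2
# (a data frame with term, count and support columns).
stats <- term_stats(text, ngrams = 1:2)
print(stats)

# Search for occurrences of a term; each match is reported with its context.
hits <- text_locate(text, "fox")
print(hits)
```

Because the same filter drives tokenizing, counting and searching, a term is matched in the same normalized form wherever it is used.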

Corpus does not provide any language models, part-of-speech tagging, topic models, or word vectors, but it can be used in conjunction with other packages that provide these features.

Corpus is available on CRAN. To install the latest released version, run the following command in R: