Sample from Large Movie Review Dataset (Maas et al. 2011)

A sample of 100 positive and 100 negative reviews from the Maas et al. (2011) dataset for sentiment classification. The original dataset contains 50,000 highly polar movie reviews.

Usage

data_corpus_LMRDsample

Format

The corpus docvars consist of:

docnumber: serial (within set and polarity) document number
rating: user-assigned movie rating on an integer scale from 1 to 10
polarity: either neg or pos, indicating whether the movie review was negative or positive. See Maas et al. (2011) for the cut-off values that governed this assignment.
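
The relationship between rating and polarity can be sketched in base R. This is a minimal illustration, not code from the package: the helper polarity_from_rating() and the example ratings are hypothetical, and the cut-offs used here (4 or lower is negative, 7 or higher is positive, anything in between unlabeled) are an assumption that should be checked against Maas et al. (2011).

```r
# Hypothetical helper, not part of any package: maps a 1-10 rating to a
# polarity label, ASSUMING cut-offs of <= 4 (neg) and >= 7 (pos).
polarity_from_rating <- function(rating) {
  stopifnot(all(rating %in% 1:10))
  out <- rep(NA_character_, length(rating))
  out[rating <= 4] <- "neg"
  out[rating >= 7] <- "pos"
  # Ratings of 5 and 6 stay NA under the assumed cut-offs
  factor(out, levels = c("neg", "pos"))
}

# Made-up ratings for illustration only
example_ratings <- c(1L, 4L, 5L, 7L, 10L)
polarity_from_rating(example_ratings)
```

Under these assumed cut-offs, mid-scale ratings receive no label, which is consistent with the description of the reviews as highly polar.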

Source

http://ai.stanford.edu/~amaas/data/sentiment/

References

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis". The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Examples

if (requireNamespace("quanteda", quietly = TRUE)) {
  # Inspect the corpus
  summary(data_corpus_LMRDsample)

  # Show the first three reviews
  head(data_corpus_LMRDsample, 3)
}
#> Corpus consisting of 3 documents and 3 docvars.
#> 1035_3.txt :
#> "A frustrating documentary. Louis Kahn's son, who saw his fat..."
#> 
#> 3540_3.txt :
#> "I truly was disappointed by this film which I had high hopes..."
#> 
#> 4526_4.txt :
#> "Rather foolish attempt at a Hitchcock-type mystery-thriller,..."
#>
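
A further sketch shows how the docvars described under Format might be tabulated. It assumes that both the quanteda and quanteda.textmodels packages are installed and that the corpus is lazy-loaded from quanteda.textmodels; neither assumption is stated on this page, so the whole block is guarded and does nothing if either package is missing.

```r
# Assumption: the corpus ships with quanteda.textmodels and the docvars
# are reachable through quanteda::docvars().
if (requireNamespace("quanteda", quietly = TRUE) &&
    requireNamespace("quanteda.textmodels", quietly = TRUE)) {
  lmrd <- quanteda.textmodels::data_corpus_LMRDsample

  # Document-level variables as a data frame
  dv <- quanteda::docvars(lmrd)

  # Number of reviews per polarity label
  polarity_counts <- table(dv$polarity)
  print(polarity_counts)

  # Cross-tabulate ratings against polarity to see where the labels split
  print(table(rating = dv$rating, polarity = dv$polarity))
}
```

The cross-tabulation is a quick way to check the cut-off values empirically rather than relying on the documentation alone.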
