“Practical introduction to text mining”

This DSI workshop, led by Associate Director Dr. Carl Stahmer, will focus on doing and interpreting basic text analyses.

Topics will include:

  • word frequency and distribution
  • bigram networks
  • parts of speech tagging
  • named entity extraction
  • sentiment analysis

Prerequisites: Beginner R skills and a working R environment with the following packages installed:

  • tm
  • koRpus
  • RWeka
  • zipfR
  • sentimentr
  • openNLP
  • openNLPmodels (Note: openNLPmodels must be compiled from source: install.packages("openNLPmodels.en", repos="http://datacube.wu.ac.at/", type="source"))
  • NLP
  • ngram
  • hunspell
  • ggplot2
  • ggraph
  • dplyr
  • rJava (Note: rJava can be tricky to install. Come to Office Hours prior to the workshop if you need help getting it running. If you get errors when calling rJava on a 64-bit Windows machine, check that you have the latest version of R installed, along with both the 64-bit and the 32-bit versions of Java. Then re-install the rJava package and load the library again.)
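
To check your setup before the workshop, something like the following R sketch may help. It is not part of the official workshop materials: the helper name missing_packages and the variable workshop_pkgs are made up for illustration, and the sketch assumes every package in the list except openNLPmodels can be installed from your default repository. The install step only runs in an interactive R session, so sourcing the script from the command line will not change anything on your machine.

```r
# Packages from the prerequisites list, except openNLPmodels,
# which is installed separately from source (see below).
workshop_pkgs <- c("tm", "koRpus", "RWeka", "zipfR", "sentimentr",
                   "openNLP", "NLP", "ngram", "hunspell",
                   "ggplot2", "ggraph", "dplyr", "rJava")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(workshop_pkgs)
print(to_install)

# Only install when running interactively (e.g. in the R console or RStudio).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# openNLPmodels.en is installed from source, using the command
# given in the prerequisites list.
if (interactive() &&
    length(missing_packages("openNLPmodels.en")) > 0) {
  install.packages("openNLPmodels.en",
                   repos = "http://datacube.wu.ac.at/",
                   type = "source")
}
```

If print(to_install) shows an empty character vector, all of the listed packages are already installed. Note that rJava, RWeka, and openNLP also need a working Java installation, which this check does not verify.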

Repository with R scripts and data files