Text Mining Fundamentals

This workshop will cover two NLP methods for assessing the syntactic and semantic complexity of texts and identifying the key concepts they represent. Specific topics will include hapax richness, author attribution, and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Participants must have some programming experience in R and be familiar with the Unix command line and Git. (Participating in the January 20th workshop, Slash and Burn Command Line and Git, and the February 3rd workshop, Text Mining Fundamentals, will prepare you well for this workshop.) Please come to the workshop with a working R development environment and with Git already installed and operational on your system.