Slides, annotated documents, videos, and screencasts from the Web Scraping & Services Workshop. Materials are available here.

Date: Saturday 04/04/2015
Time: 9:45 am - 4:30 pm
Venue: DSI classroom (Room 360)

This one-day workshop will illustrate how to access data from HTML documents and Web Services. We will discuss tools for working with HTML and XML, and for querying these hierarchical documents with XPath. We will also explore how to process dynamic content created with JavaScript. Having dealt with HTML documents, we will move on to Web Services, which provide significantly more structured access to data.

Instructors

Duncan Temple Lang (UC Davis; member of the R-core development team; Director of the Data Sciences Initiative; Prof. of Statistics) &
Deborah Nolan (UC Berkeley; Prof. of Statistics), authors of

Topics

  • Reading data from HTML tables
  • Ethics and Best Practices for Web Scraping
  • Extracting links from HTML documents
  • Dealing with JavaScript & Dynamic Content
  • Web Forms
  • Web Services & Application Programming Interfaces (APIs)
  • Fundamentals of HTTP Requests
  • XML
  • JSON
  • Using API keys in Requests
  • Using Passwords in Requests
  • Authentication with OAuth
  • OAuth1 and OAuth2 for accessing, e.g., Twitter