Live online training

R Recommender Systems and Collaborative Filtering

Create highly optimized recommender systems for your applications

Phil Rennert

This course will provide an introduction to recommender systems, particularly collaborative filters, on a heterogeneous dataset with more than just user ratings. We’ll discuss the spectrum of recommender systems, starting with simple What’s Hot recommendations, and gradually increase in sophistication and power to collaborative filters and model-based systems as you learn more about the person you’re recommending for.

In this hands-on course, we’ll build a collaborative filter from user movie ratings in R with the recommenderlab package. We’ll make recommendations with it and compare them to actual user ratings from a holdout test set to assess accuracy. Then we’ll learn how to add more data, such as information about the movie and ratings by critics, to improve our recommendations. We’ll learn about ensembles, which combine recommenders built on different types of data to improve performance, and about methods for evaluating recommenders. Finally, we’ll go beyond the recommender algorithms to incorporate considerations like trust, diversity, novelty, anti-spookiness, and product information. These topics will be illustrated with lessons learned and stories of recommender successes and failures.

This live online training course will give you the tools you need to build a high-quality recommender for your own products of interest.

What you'll learn, and how you can apply it

Automatic recommender systems are used for a large and growing variety of products. You’ll learn how to build your own system and specialize it for your application.

  • You’ll build a collaborative filter in R using the recommenderlab package.
  • You’ll learn ways to incorporate additional data to improve your recommender.

You can use this knowledge to produce high-quality recommendations for your products of interest.

This training course is for you because...

This course is for developers unfamiliar with collaborative filters and recommender systems who want to learn to employ these powerful techniques in their work, to produce recommendations for various products. You’ll get hands-on experience with different types of recommenders, on a mixed dataset.

Prerequisites

  • Coding Basics (R, Python)
  • A standard laptop with the course datasets loaded

This class will be demonstrated in R, but you can use the language and tools you prefer.

Recommended Preparation:

Building a Recommendation System with R (book)

Materials, downloads, or Supplemental Content needed in advance

  • Download the recommender dataset: MovieLens + IMDb/Rotten Tomatoes
  • Optional: an IDE for R (standard course format: RStudio)
  • Optional: a text editor (standard course format: Notepad++)
  • R package recommenderlab (or something equivalent, but the instructor will be using R)
  • install.packages("recommenderlab")

About your instructor

  • After starting out as a NASA aerospace technologist working on Space Shuttle entry guidance, Phil switched to artificial intelligence. Phil is in the overall business of extracting wisdom from information overload. He has 25 years’ experience in machine learning, natural language processing, data mining, and statistics. He has worked on a wide variety of projects in these areas, including recommending movies and TV shows for Comcast, and colleges for prospective students. He writes code in many languages, but still prefers Perl for short scripts and rapid prototyping.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Day 1

Section 1: Overview of recommender systems and applications (1 hour)

  • 1.0 Recommender systems - overview, theory, examples of what they're used for
  • 1.1 Applications - movies, general products, books, grocery, financial, dating, popup ads and bidding...
  • 1.2 General types of recommenders: un-personalized, demographics, content-based, memory-based, model-based

Break 10 mins

Section 2: Collaborative filters (1 hour 15 minutes)

  • 2.0 Collaborative filters - succession of types
  • 2.1 What’s hot, demographics, item-based, user-based, essence-based - gradient of knowledge
  • 2.2 Highly parallelizable for big data applications
  • 2.3 Model-based methods: eigenvalue approaches; strengths and weaknesses
  • 2.4 Ensembles: combining, using weights varying with knowledge of user from cold start to maximum confidence
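The item-based filter listed in 2.1 can be sketched in a few lines. The block below is a minimal illustration in plain Python rather than the recommenderlab code used in class. The toy ratings, the function names, and the choice of cosine similarity over co-rating users are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dan": {"A": 5, "B": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity restricted to users who rated both items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(u[k] ** 2 for k in common))
    norm_v = sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items.

    Returns None when no rated item shares a rater with the target item.
    """
    items = item_vectors(ratings)
    target = items.get(item, {})
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        s = cosine(target, items[other])
        num += s * r
        den += abs(s)
    return num / den if den else None
```

Calling `predict(ratings, "ann", "D")` estimates how ann would rate item D from her ratings of A, B, and C, weighted by how similarly those items were rated by the users who also rated D.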

Break 10 mins

Section 3: Running a filter on the HetRec dataset (1 hour 15 minutes)

  • 3.0 Data set 1 (HetRec) - overview, discovery, tasks, including all files in dataset
  • 3.1 Demonstration of basic item-based collaborative filter on user ratings with recommenderlab
  • 3.2 Brief scan of papers from HetRec 2011 for further ideas
  • Lab - coding practice: Run other recommenderlab options on chosen train/test sets.
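The train/test comparison in this section follows a common pattern: hold out some ratings, predict them, and measure the error. The sketch below shows that pattern in plain Python, assuming ratings stored as (user, item, rating) triples. The function names and the item-mean baseline are illustrative stand-ins and not part of recommenderlab.

```python
import random
from math import sqrt

def holdout_split(triples, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and split into train and test."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def item_mean_predictor(train):
    """Baseline: predict each item's mean training rating.

    Falls back to the global mean for items unseen in training.
    """
    sums, counts = {}, {}
    for _, item, r in train:
        sums[item] = sums.get(item, 0.0) + r
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(r for _, _, r in train) / len(train)
    means = {item: sums[item] / counts[item] for item in sums}
    return lambda user, item: means.get(item, global_mean)

def rmse(predict, test):
    """Root mean squared error of predict(user, item) on held-out ratings."""
    squared = [(predict(u, i) - r) ** 2 for u, i, r in test]
    return sqrt(sum(squared) / len(squared))
```

A typical use is `train, test = holdout_split(data)` followed by `rmse(item_mean_predictor(train), test)`. Any recommender that exposes the same `predict(user, item)` shape can be scored the same way, which makes the baseline a useful reference point.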

Break 10 mins

Section 4: Using additional data to improve (40 minutes)

  • 4.0 Different approaches, different kinds of data in our dataset
  • 4.1 Discussion, Q&A, other sources of data?

Overnight Exercise / Lab 2 (1½ to 2 hours): Produce recommendations for a test set, using additional data from the HetRec dataset. How can you improve on what we did in class by using additional files from our dataset?

Day 2

Section 5: Overnight task results and discussion (1 hour 15 minutes)

  • 5.0 Results of overnight exercise, discussion, Q&A
  • 5.1 Summary of student approaches and results
  • 5.2 Lessons learned, and how applicable are they to other applications?
  • 5.3 Places to look for additional data in actual applications.
  • Lab - coding practice: Experiment with different ways of ensembling class recommenders.
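One simple way to ensemble two recommenders is to shift weight from an un-personalized score to a collaborative-filter score as more is known about the user, in the spirit of topic 2.4. The sketch below assumes a linear ramp and a hypothetical `saturation` threshold of 20 ratings. Both are illustrative choices and not values from the course.

```python
def blend_weight(n_ratings, saturation=20):
    """Weight on the personalized score: 0 at cold start, 1 at saturation."""
    return min(max(n_ratings, 0), saturation) / saturation

def ensemble_score(popularity_score, cf_score, n_ratings, saturation=20):
    """Blend an un-personalized score with a collaborative-filter score.

    Falls back to the popularity score when the filter has no prediction.
    """
    if cf_score is None:
        return popularity_score
    w = blend_weight(n_ratings, saturation)
    return (1 - w) * popularity_score + w * cf_score
```

For a brand-new user (`n_ratings=0`) the result is the popularity score alone. Once the user has reached the saturation count, the collaborative filter's score is used unchanged.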

Break 10 mins

Section 6: Filter’s up! What else do I need? (1 hour)

  • 6.0 Importance of feedback, tools for eliciting it (gamification)
  • 6.1 Discussion of measuring recommender effectiveness
  • 6.2 Shilling: countering manipulation attempts

Break 10 mins

Section 7: Recommendations: going beyond the algorithms (1 hour 15 minutes)

  • 7.0 Cautions, recommender considerations going beyond the algorithms
  • 7.1 Trust, novelty, diversity, serendipity, privacy, repeats? Anti-spookiness (why this recommendation?)
  • 7.2 Recommender stories (Target, pushing products in pairs, discovery of patterns)
  • 7.3 Demographics, zip code, patterns vs. time of day and time of action, any other public data?

Break 10 mins

Section 8: Summary and wrap-up

  • 8.0 Summary, high points, wrap-up
  • 8.1 On-Demand exercise on another dataset
  • 8.2 Class discussion, extended Q&A