by Krešo Josić
Today, UH math professor Krešo Josić talks about math and your movie choice. The University of Houston presents this series about the machines that make our civilization run, and the people whose ingenuity created them.
The companies that dominate the web are the ones that are best at pointing you to the information you seek — to the book, the CD or the movie you'll enjoy most. This is largely accomplished using linear algebra, a branch of mathematics that deals with the manipulation of columns and arrays of numbers.
Almost everyone is familiar with the online DVD rental service Netflix. After you return a movie to Netflix, you are asked to rate it between one and five stars. An important part of the company's operation is Cinematch, an algorithm that determines what movies you may enjoy, based on those ratings.
A simple way to determine whether you would like "The Seven Samurai", for instance, would be to average all the ratings that movie has received from other users. However, this is not a very good approach because it ignores whether you like or dislike samurai movies. Indeed, the current Netflix algorithm bases its recommendations on the ratings of users like you.
But how to decide which users are like you? This is where linear algebra comes in. Imagine a giant spreadsheet with columns named after every one of the tens of thousands of movies available on Netflix. Every one of the several million subscribers corresponds to a row in this spreadsheet. When a user rates a movie, an entry is filled in. For instance, if I choose to give "The Matrix" four stars, the column corresponding to that movie would be updated at the row corresponding to my name.
Few rate more than a couple hundred movies, so only a fraction of a percent of this giant spreadsheet is actually filled in. The job of the recommendation algorithm is to fill out the entire spreadsheet based on this very sparse information. How can this be done efficiently?
About three years ago Netflix offered a million dollars to anyone who improved the accuracy of their recommendations by 10%. Teams of software engineers and enthusiasts from around the world have taken part in that challenge. The most successful approaches are based on linear algebra: While the spreadsheet of user preferences is enormous, there are perhaps a few dozen stereotypical rating profiles. Everyone's tastes are described as a mixture of these profiles. For instance, you may like Westerns and an occasional slapstick comedy. Linear algebra can identify these stereotypical profiles, and provide the magic mix that describes you.
Improvements of 8 and 9 percent on the existing Netflix algorithm were achieved quickly after the prize was announced. However, the 10 percent goal proved elusive. After three years, a team of programmers from around the world finally cracked it in July, 2009. They did so when they took into account that our preferences change considerably over time.
Powerful algorithms shape what you see in your web browser. They may also subtly steer you to the next book you'll read and CD you'll listen to. When you answer these suggestions by providing feedback you are shaping not just your future recommendations, but those of everybody else using the same service. You are taking part in a conversation between machines and humanity.
I'm Krešo Josic, at the University of Houston, where we're interested in the way inventive minds work.
A nice discussion of the Netflix Prize can be found in the following Wired article (http://www.wired.com/epicenter/2009/06/1-million-netflix-prize-so-close-they-can-taste-it/). Wikipedia has a dryer, but more up-to-date description. More detail about one of the likely winners of the prize, and the approach he used can be found here http://cacm.acm.org/news/32450-award-winning-paper-reveals-key-to-netflix-prize/fulltext.
Another interesting article suggesting that a psychologist may outdo computer scientists and mathematicians in their quest for the prize also appeared in Wired (http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix#previouspost). "Just a guy in a garage" still finished in the top 20.
To get some idea about the type of mathematics used to predict user preferences,you can read about Singular Valued Decomposition here https://en.wikipedia.org/wiki/Singular_value_decomposition.