I don’t think I could possibly be any more giddy about anything than I am about the Netflix Prize.
In short: Netflix’s vote prediction algorithm predicts your vote for a movie with a deviation of 0.95 stars. If you can do 10% better, they’ll give you $1 million.
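To make the target concrete, here is a minimal sketch of the scoring arithmetic. It assumes the "deviation" is a root mean squared error (RMSE) between predicted and actual star ratings, and it uses the 0.95 figure quoted above as the baseline. The `rmse` helper and the toy ratings are made up for illustration and are not Netflix's code or data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

BASELINE_DEVIATION = 0.95    # figure quoted in this post
REQUIRED_IMPROVEMENT = 0.10  # "10% better"

# The score to beat: 10% below the baseline deviation.
target = BASELINE_DEVIATION * (1 - REQUIRED_IMPROVEMENT)

# Toy example (made-up numbers): four predictions vs. four real votes.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
score = rmse(predicted, actual)

print(f"target deviation: {target:.3f}")
print(f"toy score:        {score:.3f}")
```

Under these assumptions, a 10% improvement means getting the deviation down to roughly 0.855 stars, so the whole contest is fought over about a tenth of a star.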
That’s awesome and all, but what’s really awesome is their amazing training dataset. This is every data miner’s wet dream: 100,000,000 votes, 17,000 movies, 500,000 users.
They have two tests that you can run: one against your known data, and one that you submit to Netflix. As far as I can tell, your standing (that is, your current deviation) is made public. The lower your number, the higher your rank. Every year that the algo isn’t improved by 10%, $50,000 is paid out to the current leader.
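The local test against known data could look something like the sketch below: hold back part of the ratings you already have, predict them from the rest, and score the result. Everything here is an assumption for illustration. The `(user, movie, stars)` tuple layout is not the real file format, the per-movie average is just a stand-in predictor (not Netflix's algorithm), and the score is assumed to be an RMSE.

```python
import math
import random
from collections import defaultdict

def split_ratings(ratings, holdout_fraction=0.2, seed=0):
    """Shuffle the known ratings and hold back a fraction for local testing."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]  # (train, holdout)

def movie_means(train):
    """Average star rating per movie, computed from the training part only."""
    totals = defaultdict(lambda: [0.0, 0])
    for _user, movie, stars in train:
        totals[movie][0] += stars
        totals[movie][1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}

def evaluate(train, holdout):
    """RMSE of the movie-mean predictor on the held-back ratings."""
    means = movie_means(train)
    # Fall back to the overall average for movies unseen in training.
    global_mean = sum(stars for _, _, stars in train) / len(train)
    squared = [(means.get(movie, global_mean) - stars) ** 2
               for _user, movie, stars in holdout]
    return math.sqrt(sum(squared) / len(squared))

# Made-up toy data: 20 users x 5 movies, ratings from 1 to 5.
ratings = [(u, m, 1 + (u + m) % 5) for u in range(20) for m in range(5)]
train, holdout = split_ratings(ratings)
local_score = evaluate(train, holdout)
print(f"local deviation on {len(holdout)} held-back votes: {local_score:.3f}")
```

A local number like this only says how you did on data you already had the answers to. The submitted score is the one that counts for the standings.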
Another thing I find interesting: Netflix gets the score that they do without assuming anything about the movie titles, genres, actors, etc. They just do straight number crunching. I’m impressed.
I’ve already got some techniques that I wanna try. I’ve got a feeling that I’m overly optimistic at this point, and that I’m going to be highly disappointed when I see my first score. But first, I have to generate my test bed and get to work. This is so cool.
I don’t know what it is with me and large, nicely formatted datasets, but I don’t think there’s anything that can get me more excited.