We’re taking an innovative new approach to providing students with exercises in the new Khan Academy exercise framework (which will be released for beta testing soon). In the old framework a problem would be randomly generated and provided to the user. This would result in a near-infinite number of randomly generated problems.
This ends up being a double-edged sword. While it’s great to provide a ton of problem variety to the students, it makes analysis tricky: we want a manageable sample size of problems so that we can analyze student behavior and responses. For this reason we want to reduce the pool of possible problems to a more manageable size (about 200 or so; we’ll experiment with exact figures as time goes by). Most students will only do about 10-20 problems before moving on (since we define a ‘streak’ as having completed 10 problems in a row correctly), although we want to provide a large enough pool to allow adventurous users to explore more.
Having a smaller pool size creates another issue though: Potential for student overlap. If there’s one thing that we’ve learned so far it’s that students are quite resourceful and identify patterns very quickly. They will realize if they start on the same problem together and if the problems go in the same order.
On top of this we need to make it so that every time a student comes back to an exercise they are presented with the exact same series of problems. The first problem will always be the same, and the 50th problem will always be the same, for that user. (This will allow us to reproduce the problems at a later date, showing students their problems/answers or analyzing the results.)
Thus we need two pieces of information to determine which problem should be presented to a user: The user’s ID and the problem # that they’re currently tackling.
We start by placing the user into a simple “bin”: We take the CRC32 of the student’s ID and mod 200 to start the user at one of the 200 possible exercises. (We picked CRC32 as it’s a simple function and we only need a basic level of granularity in the results.) Next we use the CRC32 to figure out how the user should jump through the exercises.
We can’t step through the problems one by one, since the students will instantly recognize that pattern, so we have to take a less predictable approach: one developed by Khan Academy intern Ben Alpert.
We start with a pool of prime numbers (the 23 primes below 100 other than 2 and 5, to be exact) and use the user’s CRC32 to determine which of those primes will become the “jump distance” for that user.
    var primes = [3, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                  61, 67, 71, 73, 79, 83, 89, 97];
    var num = crc32("jeresig");                   // num = 1411847358
    var problemNum = num % 200;                   // problemNum = 158
    var jumpNum = primes[ num % primes.length ];  // jumpNum = 71
Thus when I visit a particular exercise I will start at problem #158, then go to problem 29 (158 + 71 = 229, which wraps around to 29 in a pool of 200), then problem 100, then problem 171, etc. On the other hand, if user “jeresig2” visits the same exercise they start at problem 171, then visit problems 182, 193, 4, etc.
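The selection described above can be sketched as a single function from a user ID and a problem index to a problem number. This is only a minimal sketch, not the framework's actual code: the `crc32` below is the standard bitwise CRC-32 and is merely assumed to behave like the `crc32` call in the snippet above, and `POOL_SIZE`, `problemFromHash` and `problemFor` are hypothetical names.

```javascript
// Primes below 100, leaving out 2 and 5.
var primes = [3, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
              61, 67, 71, 73, 79, 83, 89, 97];
var POOL_SIZE = 200; // hypothetical name for the size of the problem pool

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320) over the
// string's char codes; assumes ASCII input. Assumed, not confirmed, to
// match the crc32() used in the post.
function crc32(str) {
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i) & 0xFF;
    for (var bit = 0; bit < 8; bit++) {
      crc = (crc & 1) ? (crc >>> 1) ^ 0xEDB88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0; // force an unsigned 32-bit result
}

// Map a raw hash value and a zero-based problem index to a problem number.
function problemFromHash(num, index) {
  var start = num % POOL_SIZE;
  var jump = primes[num % primes.length];
  return (start + index * jump) % POOL_SIZE;
}

// Same thing, starting from the user's ID.
function problemFor(userId, index) {
  return problemFromHash(crc32(userId), index);
}

// Using the hash value quoted in the post for "jeresig":
var seq = [0, 1, 2, 3].map(function (i) {
  return problemFromHash(1411847358, i);
});
console.log(seq.join(", ")); // 158, 29, 100, 171
```

Because the result depends only on the hash and the index, the 50th problem for a given user can be recomputed at any later date without storing the sequence.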
(Note that we leave out 2 and 5 because they are the prime factors of 200, so a jump of 2 or 5 would loop back to the start before covering the whole pool: 2 will result in 100 repeating problems, 5 will result in 40 repeating problems. Of course we could just make the number of problems prime to achieve a similar result. We’ll see!)
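The loop lengths quoted here follow from a standard fact: stepping by a fixed jump through a pool of N items revisits the start after N / gcd(jump, N) steps. A small sketch to check the numbers (the function names are made up for illustration):

```javascript
// Greatest common divisor via Euclid's algorithm.
function gcd(a, b) {
  while (b !== 0) {
    var t = b;
    b = a % b;
    a = t;
  }
  return a;
}

// Number of distinct problems visited before the sequence repeats,
// when stepping by `jump` through a pool of `poolSize` problems.
function cycleLength(jump, poolSize) {
  return poolSize / gcd(jump, poolSize);
}

console.log(cycleLength(2, 200));  // 100
console.log(cycleLength(5, 200));  // 40
console.log(cycleLength(71, 200)); // 200
```

Any jump that shares no factor with 200 walks through all 200 problems before repeating, which is why every prime other than 2 and 5 works.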
Even though we’re not using any built-in random number generator in this process, the distribution of chosen problems appears to be similar to a random distribution (since we’ve effectively implemented a simple pseudo-random generator ourselves).
To re-emphasize: We’re not trying to stop problem overlap (in fact we want overlap to occur, to give us data to study), but this approach does avoid the easily recognizable patterns that students would otherwise pick up on.
We chose a collection of 23 primes because it is sufficiently large, and because 23 shares no common factor with 200, so users who happen to have an identical starting position can still get a different jump amount.
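To see why the size of the prime list matters, one can count how many distinct (start position, jump index) pairs are reachable as the hash value varies. The sketch below is an illustration only; `distinctPairs` is a hypothetical helper, and the list of 20 is a made-up counterexample, not something the framework uses.

```javascript
// Count the distinct (start, jump index) pairs produced by hash values
// 0 .. poolSize * primeCount - 1.
function distinctPairs(poolSize, primeCount) {
  var seen = {};
  var count = 0;
  for (var num = 0; num < poolSize * primeCount; num++) {
    var key = (num % poolSize) + ":" + (num % primeCount);
    if (!seen[key]) {
      seen[key] = true;
      count++;
    }
  }
  return count;
}

console.log(distinctPairs(200, 23)); // 4600: every start pairs with every jump
console.log(distinctPairs(200, 20)); // 200: the start fully determines the jump
```

With 23 entries, two users who share a starting position can still land on any of the 23 jumps. With a list of 20 entries, the same starting position would always imply the same jump.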
Also note that in the final solution we’re going to be combining the user’s ID with the name of the current problem – so that even if, somehow, two students end up with identical start positions and jump rates on one exercise they will have a different start position and jump rate on other exercises.
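A rough sketch of what that could look like follows. How the two values are actually combined is not specified here, so joining them with a separator is purely an assumption, as are the name `selectionFor` and the exercise names; the `crc32` is again the standard bitwise CRC-32, assumed rather than known to match the one the framework uses.

```javascript
var primes = [3, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
              61, 67, 71, 73, 79, 83, 89, 97];

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320), ASCII input.
function crc32(str) {
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i) & 0xFF;
    for (var bit = 0; bit < 8; bit++) {
      crc = (crc & 1) ? (crc >>> 1) ^ 0xEDB88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Assumption: the user ID and exercise name are joined with "|" before
// hashing. The real combination may differ.
function selectionFor(userId, exerciseName) {
  var num = crc32(userId + "|" + exerciseName);
  return { start: num % 200, jump: primes[num % primes.length] };
}

// Hypothetical exercise names, for illustration only.
console.log(selectionFor("jeresig", "addition_1"));
console.log(selectionFor("jeresig", "subtraction_1"));
```

Since the exercise name feeds into the hash, the same user gets an independently chosen start and jump for each exercise, while the result for any one exercise stays reproducible.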
If all goes well we’ll be opening up the new exercise rewrite for beta testing within the next week or two.
Note: I’ve been posting interesting dev updates over on Google Plus. Feel free to follow me there if you wish to find out more about what I’m working on with Khan Academy or jQuery.