Cycling Elo method

The Cycling Elo rating system is the cycling equivalent of the Elo rating system used in chess. The main difference is that the chess Elo rating is based on a two-player game, while this rating system is based on a multiplayer game, which is in essence what cycling is. The method used for the Cycling Elo rating is loosely based on TrueSkill, developed by Microsoft for multiplayer games. TrueSkill uses a Bayesian approach to estimate a player’s skill, which is represented as a normal distribution characterized by a mean and a variance around it. With each game (or race, in this case) the prior (the old rating) is updated with the new information, which gives the posterior (the new rating) in Bayesian language. If the result is in line with the old rating, the mean rating doesn’t change, but the uncertainty (the variance) decreases.
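The effect of a result that matches the old rating can be illustrated with the simplest Bayesian update, a Gaussian prior combined with one noisy Gaussian observation. This is only a sketch of the principle and not the update TrueSkill itself performs; the function name and the numbers are made up for illustration.

```python
def gaussian_update(prior_mean, prior_var, observed, obs_var):
    """Combine a Gaussian prior with one noisy Gaussian observation.

    Simplified illustration only: TrueSkill's real update works on
    rankings and is more involved than this conjugate formula.
    """
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + observed / obs_var)
    return post_mean, post_var


# A result exactly in line with the old rating (illustrative numbers).
same_mean, same_var = gaussian_update(25.0, 69.0, observed=25.0, obs_var=17.0)

# A result better than the old rating suggested.
up_mean, up_var = gaussian_update(25.0, 69.0, observed=35.0, obs_var=17.0)

print(same_mean, same_var)
print(up_mean, up_var)
```

In the first case the mean stays where it was and only the variance shrinks; in the second the mean moves towards the observed value, and the variance shrinks by the same amount.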

Rating based on mean and variance

The rating isn’t represented by the mean of the Gaussian distribution alone, but by the mean minus three times the standard deviation (the square root of the variance). That way we are more than 99% certain that the real skill of a particular cyclist is higher than the published rating. In other words, it is a conservative estimate of a rider’s skill. This also means that a rider with a small number of results, or with results that vary more, has a lower rating than a rider with the same mean but less uncertainty.
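As a small illustration of the conservative rating, the sketch below assumes, as in TrueSkill, that the spread is measured by the standard deviation (sigma). The function names and rider numbers are hypothetical.

```python
import math


def conservative_rating(mean, sigma, k=3.0):
    """Published rating: the mean minus k standard deviations."""
    return mean - k * sigma


def prob_skill_above(k=3.0):
    """Probability that a normally distributed skill exceeds mean - k * sigma."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))


# Two riders with the same mean but different uncertainty (made-up numbers).
established = conservative_rating(mean=30.0, sigma=1.5)
newcomer = conservative_rating(mean=30.0, sigma=5.0)

print(established, newcomer)
print(prob_skill_above())
```

The rider with few or erratic results (the larger sigma) ends up with the lower published rating, even though both means are equal.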

Procedure to update rating

The biggest difference from the original TrueSkill algorithm is that the method is adjusted to work for a 200-player game, which the original algorithm isn’t really set up for. To solve this problem, one race result is split into 100 smaller results with a maximum of 30 riders each. For each smaller result, 30 riders are randomly sampled from the race result and play the multiplayer game, with updated ratings for those 30 riders as output. This is done 100 times. In a 200-rider race, a rider will therefore play on average 15 smaller games (100 × 30 / 200) against different samples of riders. The updated rating of a rider after the race is the mean of the ratings from those 100 samples.
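The sampling step can be sketched as follows. This is a minimal illustration under assumptions, not the actual implementation: `sample_subraces`, `average_updates` and `toy_update` are hypothetical names, `toy_update` is a placeholder that stands in for the real TrueSkill multiplayer update, and averaging each rider over the smaller games that rider took part in is one reading of "the mean of those 100 samples".

```python
import random
from collections import defaultdict


def sample_subraces(result, n_samples=100, max_size=30, rng=None):
    """Draw n_samples smaller games from one finishing order.

    Each game keeps the sampled riders in their original finishing order.
    """
    rng = rng or random.Random()
    size = min(max_size, len(result))
    games = []
    for _ in range(n_samples):
        positions = sorted(rng.sample(range(len(result)), size))
        games.append([result[i] for i in positions])
    return games


def toy_update(game):
    """Placeholder, NOT TrueSkill: score riders by position within the game."""
    last = max(len(game) - 1, 1)
    return {rider: 1.0 - idx / last for idx, rider in enumerate(game)}


def average_updates(games, update_game):
    """Average each rider's updated rating over the games the rider appeared in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for game in games:
        for rider, rating in update_game(game).items():
            totals[rider] += rating
            counts[rider] += 1
    return {rider: totals[rider] / counts[rider] for rider in totals}


# A 200-rider race: rider_1 won, rider_200 finished last.
result = [f"rider_{i}" for i in range(1, 201)]
games = sample_subraces(result, rng=random.Random(42))
ratings = average_updates(games, toy_update)

# Average number of smaller games per rider: 100 * 30 / 200.
games_per_rider = sum(len(game) for game in games) / len(result)
print(games_per_rider)
```

Passing the update as a function keeps the sampling logic separate from the rating model, so the placeholder could be swapped for a real multiplayer update without touching the rest.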

Some points on the interpretation of the ratings

The ratings of different types of races aren’t comparable. The top riders in the time trial ranking have higher ratings than the top riders in the hills ranking. Hill races are much more diverse (some suit sprinters who climb a little better than most, others are almost for pure climbers), and it’s harder for the top riders to always come out on top, for example because of team tactics (which hardly play a role in time trials).

A final condition is that a rider needs a minimum number of results in the past year to get into the rankings. A rider needs roughly one year of race results to get a reliable rating.

The Cycling Elo method uses all UCI races from 2006 onwards.

More information on the method

To calculate and update the ratings, the R package trueskill by Brendan Houng is used. For more detailed information about the TrueSkill method, see the Microsoft page or this description by Jeff Moser.