Difficulty Adjusted Results in Running Competitions
This article describes how to calculate a Difficulty Factor that makes it possible to compare results for the same or different runners, within one running competition or across several.
The most basic comparison uses the finish time, for competitions over the same distance. But finish time alone is not enough when we want to:
- compare across a range of events with different distances
- provide a “fair comparison” across age groups
- compare events run in different conditions that affect results, including terrain, elevation and weather
The proposed Difficulty Factor should provide a tool to resolve these comparison requirements.
Comparing across different age groups and events
We already have a powerful tool at our disposal: Age Graded Results. This allows us to compare the performance of runners belonging to different age groups, and across different events.
Every result for road or track events, on every distance and every age and gender, can be scored using “age grading”, resulting in a percentage. This percentage can be compared across all results.
You can read a bit of history here:
and read the latest proposed age grading tables from the World Masters Athletics.
Comparing across different racing conditions
For road running in particular, sometimes it is not enough to use age grading, as the race conditions may be very different from course to course.
One of the main factors is the elevation, positive and negative, of a given race. It wouldn’t be fair to compare a result on a flat 10k with one on an undulating course that has a total of 200m positive and 200m negative elevation over the same distance.
Other factors may also affect performance, and we may want to account for them when comparing results across different races:
- terrain, e.g. grass, trails, technical terrain
- weather: rain, heat, icy conditions may have a detrimental effect on results
- the distance itself may not be exactly the advertised one
Proposal: Difficulty and Age Graded Result
We propose to introduce a Difficulty and Age Graded result (DAG result) which is calculated as:
DAG Result = Difficulty Factor * Age Graded Result
The Difficulty Factor (DF) should reflect the race conditions on competition day. A DF of 1 indicates no additional difficulty, a DF > 1 indicates additional difficulty, and a DF < 1 signals better than expected conditions (e.g. a descending route).
For example, a DF of 1.05 implies that the course was 5% more difficult.
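As a small worked example of the formula above, the sketch below multiplies a Difficulty Factor by an Age Graded result. The function name dag_result and the sample values are illustrative assumptions, not part of the proposal.

```python
def dag_result(difficulty_factor: float, age_graded_pct: float) -> float:
    """Difficulty and Age Graded result, expressed as a percentage."""
    return difficulty_factor * age_graded_pct


# Illustrative values: a 70% age graded result on a course with DF = 1.05
print(round(dag_result(1.05, 70.0), 2))
```

With these sample values, a 70% age graded result on a 5% harder course is credited as 73.5%.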
The solution we suggest is based on a retrospective analysis of the results of a given competition, together with the recent results of all the runners who took part in it. In summary, the solution is based on the following steps:
- For each of the runners involved in the competition, calculate the ratio between their expected performance (based on recent results) and the actual performance on the day
- Remove the outliers from the set of calculated ratios
- Calculate the Difficulty Factor (DF) as the median of the remaining ratios
The expected performance of each runner Exp(i) can be calculated by looking at their Age Graded results for past, recent performances, and averaging across the top ones. The different algorithm parameters are discussed in the notes:
- We consider a fixed period of time for past results (*)
- We introduce a linear time decay factor, to reduce the probability of picking an older result vs. a recent one (**)
- We average the Age Graded results across, at most, a fixed number of top results (***)
We only consider performances that can be predictors for the current race, and in particular we look at distances within certain ranges, based on the distance of the event we are analysing (****)
Finally, we take the competitor’s Age Graded result on the day, AG(i), and calculate the ratio:
Ratio(i) = Exp(i) / AG(i)
Some runners will perform better than expected (Ratio(i) < 1), and others worse than expected (Ratio(i) > 1).
(*) We consider the past 12 months. We could use a longer period, but in any case the decay factor (see below) makes older results more and more unlikely to be picked to calculate the expected result
(**) We consider a linear decay factor of 5% per 12 months. To have a sense of what this means, let’s take an example: a male runner ran a 37:00 10K on his 40th birthday, which resulted in a 74.05% age graded result. After 6 months this result would decay by 2.5% to the equivalent of 37:57 (72.20% age graded) when ranking recent top results. After 1 year, this would decay by 5% to the equivalent of a 38:57 result (70.35% age graded).
(***) We only consider the top 5 results (or fewer, if 5 results are not available in the period under consideration). We could in principle only consider the best result, but we believe the expected result should, more conservatively, be the average of a relatively small number of top results.
(****) In order to predict a certain event distance, we only consider results that offer similar challenges, e.g. races up to 3k can only predict up to 10k, races from 3 to 5k can predict up to half marathon, etc.
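The notes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function and constant names are hypothetical, a year is taken as 365 days, the past results are assumed to be already restricted to distances that can predict the race (****), and the average is taken over the decayed values (the text only says the decay is applied when ranking results).

```python
from datetime import date

WINDOW_DAYS = 365        # (*) look back 12 months
DECAY_PER_YEAR = 0.05    # (**) 5% linear decay per 12 months
TOP_N = 5                # (***) average at most the top 5 results


def decayed_ag(ag_pct: float, result_date: date, race_date: date) -> float:
    """Age graded percentage reduced linearly with the age of the result."""
    age_years = (race_date - result_date).days / 365  # assumption: 365-day year
    return ag_pct * (1 - DECAY_PER_YEAR * age_years)


def expected_performance(past_results, race_date):
    """Exp(i): mean of the top decayed results, or None if there are none.

    past_results is a list of (result_date, ag_pct) tuples, assumed to be
    pre-filtered to distances that can predict the race in question.
    """
    decayed = [
        decayed_ag(ag, d, race_date)
        for d, ag in past_results
        if 0 < (race_date - d).days <= WINDOW_DAYS
    ]
    if not decayed:
        return None  # no usable results: no ratio for this runner
    top = sorted(decayed, reverse=True)[:TOP_N]
    # Assumption: the decayed values themselves are averaged
    return sum(top) / len(top)


def ratio(expected_pct: float, ag_on_day_pct: float) -> float:
    """Ratio(i) = Exp(i) / AG(i); a value above 1 means worse than expected."""
    return expected_pct / ag_on_day_pct


# The 74.05% result from note (**), one year (365 days) before the race
print(round(decayed_ag(74.05, date(2023, 1, 1), date(2024, 1, 1)), 2))
```

The final line reproduces the one-year case from note (**): 74.05% reduced by 5% gives approximately 70.35%.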
Why does this work?
Let’s look at a number of scenarios, and why the above parameters should offer a reasonable solution:
- Runners who haven’t run at the best of their abilities over the recent past, or have only run at their best once or twice: their expected performance Exp(i) does not reflect their real ability, so their Ratio(i) is likely to lie far from the others in the race, and it will probably be discarded as an outlier
- Runners who don’t have any race results that can predict the race in question: they won’t be considered, as their ratio cannot be calculated.
- Runners whose best results are relatively old, and recent results are a bit worse: older results will only be selected as top results if, when the time decay factor is applied (up to 5% for 1-year-old results) they still remain better than recent ones, but this will be unlikely.
Ratio Outliers and Median of the Remaining Ones
To remove the outliers, we treat the ratios as if they followed a “normal distribution”: we calculate the median (M) and the standard deviation (sigma) of the ratios, and exclude all ratios more than 1 sigma away from M.
Finally, we calculate the median of the remaining ratios, which gives the Difficulty Factor.
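The outlier removal and the final median can be sketched as below. This is only an illustration under assumptions: the text does not say whether sigma is the population or the sample standard deviation (the population version, statistics.pstdev, is used here), and the min_kept threshold for declaring a race unusable is a hypothetical parameter, not a value from the proposal.

```python
from statistics import median, pstdev


def difficulty_factor(ratios, min_kept=5):
    """Median of the ratios lying within 1 sigma of the overall median.

    Returns None when too few ratios are available or survive the filter.
    min_kept is a hypothetical threshold chosen for illustration.
    """
    if len(ratios) < 2:
        return None
    m = median(ratios)
    sigma = pstdev(ratios)  # assumption: population standard deviation
    kept = [r for r in ratios if abs(r - m) <= sigma]
    if len(kept) < min_kept:
        return None
    return median(kept)


# Illustrative ratios: a core around 1.05 plus two outliers
sample = [1.03, 1.04, 1.05, 1.05, 1.06, 1.07, 1.40, 0.80]
print(difficulty_factor(sample))
```

In this made-up sample the two extreme ratios (1.40 and 0.80) fall outside 1 sigma of the median and are dropped, and the median of the remaining six ratios is used as the DF.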
Why does this work?
Consider an extreme case: with only a small number of runners who have expected results and usable ratios, the distribution may be unsuitable for eliminating outliers (there would be more outliers than ratios in the “normal” range). This means that we cannot calculate a Difficulty Factor for races with too few runners, or where only a few have ratios in a reasonable range, that is, runners who have a few results that can predict the race in question and who ran the race aiming for their best.
But if we do have a reasonable core of runners with good ratios, they will support the elimination of outliers, and the identification of the Difficulty Factor. We assume we operate on races with a sufficient number of usable data points.
Summary of Assumptions:
- Many of the runners will race to the best of their abilities on the day. We don’t need a majority, but we need a core of runners giving their best
- These runners have a few past (relatively recent) results that can be used as predictors for the current race, and they also ran those races to the best of their abilities
- For the others, while in rare circumstances there might be surprises, with runners overperforming by a substantial margin with respect to their usual performances, the most typical case will be runners who decide not to push to their maximum, or who couldn’t push to their maximum for other reasons (fitness being the most common)