Covid-19: Should we test as many people as possible, only the ones at risk, or only the ones with symptoms? It’s about choice and probability — and it’s NOT intuitive
We are referring to the type of test used almost everywhere in the world so far to check whether a person is currently infected with the virus, typically by taking nasal swabs and running lab tests (reverse transcription PCR).
There’s a fierce debate in almost every country about the number of Covid-19 tests that are done, and that should be done, on a daily basis. South Korea tested a lot, and it was part of a successful strategy for early containment. Italy is testing a lot, but it’s recording the worst mortality rate. Germany is testing more than anyone else per head of population (planning 500,000 tests per week) and is still reporting a low mortality rate. The US has ramped up testing massively over the last week. Finally, the UK is testing very little (not even 10% of Germany, but “ramping up” to 10,000 per day soon), and this has become a major political issue: a lack of chemical reagents for the labs (were you expecting this?), and front-line NHS personnel not yet being systematically tested. A lot of questions are being asked about what is going on here.
Intuitively, testing more is good: you can build a more accurate picture of the spread of the virus, and take control measures to isolate it and slow it down. In practice, we can’t just randomly test a lot of people, as we would produce a massive number of false positives (in addition to taking forever to find people with the virus). The reason has to do with test accuracy and with the percentage of the population currently infected.

The following could be one of those classic problems given to pupils in one of their first probability and statistics classes. Let’s try to solve it together.
- If a Covid-19 test has 90% accuracy (the probability of giving the correct result, i.e. a positive result if you have the virus and a negative result if you don’t)
- And if 1% of the population currently has Covid-19
- What is the probability that a randomly tested person who is given a “positive” result really has Covid-19?
Probabilities are NOT intuitive — only 8.33% (1 in 12) of the ones given a positive result are really positive!!! The remaining 11 out of 12 would be false positives.
Allow me to go through the calculation steps before I apply this argument to Covid-19 testing.
In our random test:
A) if we pick a person with the virus (1%) then we have a 90% probability of delivering a positive result. p(A) = 1% × 90% = 0.9% real positives
B) if we pick a person without the virus (99%) then we have a 10% probability of delivering a positive result. p(B) = 99% × 10% = 9.9% false positives
So overall, we deliver a positive result to 10.8% (p(A)+p(B)) of the tested people, but only 0.9% of the tested people (1 in 12 of those positives) are really positive: a 91.67% probability of error!
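The same arithmetic can be written as a few lines of Python. This is a minimal sketch, assuming (as in the example) that one "accuracy" figure applies equally to infected and non-infected people. The function name `share_of_false_positives` is my own, chosen only for illustration.

```python
def share_of_false_positives(prevalence, accuracy):
    """Fraction of positive results that are wrong, for one round of testing.

    Assumption: a single accuracy figure covers both infected and
    non-infected people, as in the worked example.
    """
    true_positives = prevalence * accuracy                # case A
    false_positives = (1 - prevalence) * (1 - accuracy)   # case B
    return false_positives / (true_positives + false_positives)


error = share_of_false_positives(prevalence=0.01, accuracy=0.90)
print(f"{error:.2%} of positive results are false")
```

Changing `prevalence` or `accuracy` in the call shows how sensitive the result is to each of the two inputs.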
To double check we can look at the negative results:
C) if we pick a person with the virus (1%) then we have a 10% probability of delivering a negative result. p(C) = 1% × 10% = 0.1% false negatives
D) if we pick a person without the virus (99%) then we have a 90% probability of delivering a negative result. p(D) = 99% × 90% = 89.1% true negatives
So among the negative results (0.1% + 89.1% = 89.2% of the tested people), only 0.1% are incorrect. A small 0.11% chance of error: the false negatives are not an issue.
So, we are fairly confident of the result when we receive a negative result, but we have no confidence at all when we receive a positive result.
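As a cross-check, here is a small sketch that splits the tested population into the four cases A, B, C and D and confirms that they add up to 100%. It rests on the same assumptions as the example (1% infected, a single 90% accuracy figure), and the names `outcome_shares` and `wrong_negative` are illustrative only.

```python
def outcome_shares(prevalence, accuracy):
    """Split the tested population into the four cases A, B, C and D."""
    return {
        "true_positive": prevalence * accuracy,                # A
        "false_positive": (1 - prevalence) * (1 - accuracy),   # B
        "false_negative": prevalence * (1 - accuracy),         # C
        "true_negative": (1 - prevalence) * accuracy,          # D
    }


shares = outcome_shares(prevalence=0.01, accuracy=0.90)

# The four cases cover everyone, so they must sum to 1 (up to rounding).
assert abs(sum(shares.values()) - 1.0) < 1e-12

# Share of negative results that are wrong: C / (C + D).
negatives = shares["false_negative"] + shares["true_negative"]
wrong_negative = shares["false_negative"] / negatives

for name, value in shares.items():
    print(f"{name:>15}: {value:.1%}")
print(f"wrong among negatives: {wrong_negative:.2%}")
```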
But most countries are doing better than this. They actually retest the people who get a first positive result. So, let’s check how the probabilities stack up when we add a second test only for the ones who had a positive result on the first test (10.8% of the people we tested), assuming the errors of the second test are independent of those of the first.
A,A) if we pick one who was really positive (0.9%) then this person will receive a positive result in 90% of the cases: p(A,A) = 0.9% × 90% = 0.81% real positives
A,B) if we pick one who was really negative (9.9%) then this person will receive a positive result in 10% of the cases: p(A,B) = 9.9% × 10% = 0.99% false positives
As we can see, we are not in a much better position: the real positives are still only 0.81% / (0.81% + 0.99%) = 45% of the ones who are given a second positive result. We still cannot deliver a confident message to the ones who tested positive twice.
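One way to look at the second test is as a repeat of the first calculation, where the starting probability is no longer 1% but whatever the first positive result left us with. The sketch below does exactly that. It assumes the errors of successive tests are independent (which may not hold for real swabs), and the function names are illustrative only.

```python
def update_after_positive(prior, accuracy):
    """Probability of infection after one positive result (Bayes' rule)."""
    true_positives = prior * accuracy
    false_positives = (1 - prior) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)


def probability_after_positives(prevalence, accuracy, n_tests):
    """Chain n positive results, assuming independent test errors."""
    probability = prevalence
    for _ in range(n_tests):
        probability = update_after_positive(probability, accuracy)
    return probability


for n in (1, 2):
    p = probability_after_positives(prevalence=0.01, accuracy=0.90, n_tests=n)
    print(f"after {n} positive test(s): {p:.1%} really infected")
```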
We can conclude that testing at random when only 1% of the population (or less!) is infected doesn’t really help.
How much would this change if, say, 3% of the population had the virus? Note that it is quite unrealistic for 3% of the population to have the virus at the same time, as most people recover in a couple of weeks! This would be nearly 2 million people in the UK, and even 1% is a fairly high figure for concurrently infected people. Nevertheless, even at 3% infection, there would still be a 78% chance that a positive result is false after 1 test, and 29% after 2 tests, which is still quite high uncertainty.
So what strategy can be adopted to perform more tests, but still have confidence in the results we deliver, especially when they are positive?
The variable to play with is the population we select from: instead of randomly selecting people from the general population, we need to select from a group of people where we suspect there is a much higher probability of infection, e.g.:
- People with serious symptoms (persistent cough, high fever)
- People who have been in contact with infected people for significant time (e.g. front-line health workers, family, co-workers, friends, etc.)
- People who have been in places where infected people have recently spent significant time (e.g. public places, transport, etc.)
Approach 1 (the first item in the list) is pretty obvious and has been followed by every country. Until today it has been, save some exceptions, the only approach used in Italy and the UK: only test people who show serious symptoms.
If we assume that a person in the “population” showing serious symptoms has a 50% probability of having Covid-19, how do the “false positives” figures change with our tests?
After the first test only 10% of the positive results would be false, and after the second test only 1.2%. A massive improvement. The second test would be very reliable by any medical standard, but even the first one would be ok.
Approaches 2 and 3 (the second and third items in the list) have been systematically followed by South Korea and Germany, by tracing all the movements of infected people over the previous couple of weeks.
It’s difficult to estimate the probability of infection for these 2 groups, but for the sake of this exercise let’s assume it is 20% for the first and 10% for the second. How would our test outcomes change with these “population” probabilities?
At 20% infection rate, our test (which has 90% accuracy, remember) would deliver 31% false positives after 1 test, and under 5% false positives on the second test. A very good outcome for the second test.
At 10% infection rate these figures would increase to 50% (test 1) and 10% (test 2) probability of a false positive. Still a reasonable outcome, compatible (after 2 tests) with the general accuracy of our Covid-19 test.
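To compare the scenarios side by side, the sketch below tabulates the share of false positives after one and two tests for each infection rate discussed in this article. It assumes a single 90% accuracy figure and independent errors between the two tests. `false_positive_share` and `ACCURACY` are names I made up for the illustration.

```python
def false_positive_share(prevalence, accuracy, n_tests):
    """Share of people positive on all n tests who are not infected.

    Assumptions: a single accuracy figure, and independent errors
    from one test to the next.
    """
    true_positives = prevalence * accuracy ** n_tests
    false_positives = (1 - prevalence) * (1 - accuracy) ** n_tests
    return false_positives / (true_positives + false_positives)


ACCURACY = 0.90  # the figure used throughout the example

print("prevalence  1 test  2 tests")
for prevalence in (0.01, 0.03, 0.10, 0.20, 0.50):
    one = false_positive_share(prevalence, ACCURACY, 1)
    two = false_positive_share(prevalence, ACCURACY, 2)
    print(f"{prevalence:>10.0%}  {one:>6.1%}  {two:>7.1%}")
```

Under these assumptions the rows should line up with the figures quoted in the text, for example roughly 78% and 29% at a 3% infection rate, and 10% and 1.2% at 50%.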
Importantly, the figure I used for test accuracy is actually fairly optimistic. While we don’t know for certain, some articles online provide lower figures (e.g. 60% to 70% accuracy when you are infected, 90% accuracy when you are not). So the argument about false positives holds even more strongly.
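When the accuracy differs between infected and non-infected people, the calculation needs two inputs instead of one. Statisticians usually call them sensitivity (accuracy when you have the virus) and specificity (accuracy when you don’t). The sketch below is only an illustration of how the result moves with the lower figures mentioned above, at a 1% infection rate. The figures are the rough ones quoted in this article, not measured values, and the function name is my own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person with a positive result is really infected.

    sensitivity: probability of a positive result if infected.
    specificity: probability of a negative result if not infected.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Illustrative inputs only: 1% infected, 90% specificity,
# and three candidate sensitivities.
for sensitivity in (0.60, 0.70, 0.90):
    ppv = positive_predictive_value(0.01, sensitivity, specificity=0.90)
    print(f"sensitivity {sensitivity:.0%}: {ppv:.1%} of positives are real")
```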
The key point to take away is that testing more is good, because it lets you see how the virus is spreading and take control measures such as isolating people and communities, but only when you can identify a population with a reasonably high probability of infection (due to symptoms, contact with infected people, or time spent in places where there has been a significant presence of infected people). So countries need a solid method and a strategy: they can’t just wait for sick people to call in, and they can’t just test everyone.
A game changer will be the ability to test whether people have had the virus in the past (and may be free from it now), often referred to as “antibody tests”. In this case the all-important variable, the percentage of “positive” people (including past positivity to Covid-19), would be much larger and continuously growing, which would result in a much smaller share of false positives.