The Wise son:
One may find this topic confusing, and indeed it is very confusing!
The goal of a diagnostic test is to determine whether or not a person has the disease. As statisticians, we are here to evaluate the performance of such diagnostic tests. We do this by comparing the test results to the Oracle Truth. If the Oracle is not available, we use the results of a reference test, sometimes called a gold-standard test. The data contain the results of both tests for N individuals and are summarized in a 2 by 2 table:
                 | Gold standard + | Gold standard - |
Test result +    | TP              | FP              |
Test result -    | FN              | TN              |
The test is a good one if there is 100% agreement with the gold standard. That is, the test found all the True-Positive (TP) and all the True-Negative (TN) cases. However, tests are not always perfect, and two types of mistakes can happen: the test result was a False-Positive (FP) or a False-Negative (FN). Even if there are some mistakes, the test may still be considered a good one if the mistakes are very rare. The test performance is evaluated with these statistics: agreement, sensitivity and specificity. Simple, would you take it from here?
The Simple son:
Yes, yes, I will give you the formal definitions:
Sensitivity = TP / (TP+FN)
Specificity = TN / (TN+FP)
Agreement = (TP+TN) / (TP+FP+TN+FN)
So you see, sensitivity is the chance of a positive test for cases, specificity is the chance of a negative test for non-cases, and agreement is the chance that the test gives a correct result. Once you see the formal definitions, it is no longer confusing.
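To make the Simple son's definitions concrete, here is a minimal Python sketch. It assumes the paired results of the test and the gold standard are available as booleans (True = positive). The class name TwoByTwo, the helper functions and the ten-person toy data are hypothetical, made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # test +, gold standard +
    fp: int  # test +, gold standard -
    fn: int  # test -, gold standard +
    tn: int  # test -, gold standard -


def tabulate(test, gold):
    """Cross-tabulate paired results (True = positive) into a 2 by 2 table."""
    pairs = list(zip(test, gold))
    tp = sum(t and g for t, g in pairs)
    fp = sum(t and not g for t, g in pairs)
    fn = sum((not t) and g for t, g in pairs)
    tn = sum((not t) and (not g) for t, g in pairs)
    return TwoByTwo(tp, fp, fn, tn)


def sensitivity(t):
    # chance of a positive test among gold-standard cases
    return t.tp / (t.tp + t.fn)


def specificity(t):
    # chance of a negative test among gold-standard non-cases
    return t.tn / (t.tn + t.fp)


def agreement(t):
    # share of all N individuals where the test matches the gold standard
    return (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn)


# Hypothetical toy data for N = 10 individuals
test = [True, True, True, False, False, False, False, True, False, False]
gold = [True, True, False, True, False, False, False, True, False, False]
table = tabulate(test, gold)
print(table)  # TwoByTwo(tp=3, fp=1, fn=1, tn=5)
print(sensitivity(table), specificity(table), agreement(table))
```

With these made-up data the test finds 3 of the 4 cases (sensitivity 0.75), clears 5 of the 6 non-cases (specificity about 0.83) and agrees with the gold standard for 8 of the 10 individuals (agreement 0.8).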
The Wicked son:
Ha? Let’s see! In a sensitive test, is the FN rate low or high? And what about this: a test is 100% sensitive if you set a rule that it always gives a positive result. Sounds good, doesn't it? But what would the specificity be in that case? Think about that. There is always a give-and-take relationship between sensitivity and specificity. The blanket is always too short to cover both of them.
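Both of Wicked's questions can be checked with a few lines of Python. This is a sketch on hypothetical data (4 cases and 6 non-cases). The false-negative rate here is taken among the cases, FN / (TP+FN), which is exactly 1 minus the sensitivity. A more sensitive test therefore has a lower FN rate. The "always positive" rule buys perfect sensitivity at the price of zero specificity.

```python
def rates(test, gold):
    """Return (sensitivity, specificity, false-negative rate) for paired booleans."""
    pairs = list(zip(test, gold))
    tp = sum(t and g for t, g in pairs)
    fp = sum(t and (not g) for t, g in pairs)
    fn = sum((not t) and g for t, g in pairs)
    tn = sum((not t) and (not g) for t, g in pairs)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    fn_rate = fn / (tp + fn)  # share of cases the test misses = 1 - sensitivity
    return sens, spec, fn_rate


# Hypothetical gold standard: 4 cases followed by 6 non-cases
gold = [True] * 4 + [False] * 6
always_positive = [True] * len(gold)  # the "always say positive" rule
print(rates(always_positive, gold))  # (1.0, 0.0, 0.0)
```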
As far as I know, the test results are given on a continuous scale, for example, the level of a chemical or the optical absorbance. The manufacturer decides on the cut-off level that corresponds to a positive test. The difficulty is how to evaluate the test when the sensitivity and specificity depend on the cut-off point.
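The dependence on the cut-off point is easy to see in a small Python sketch. The marker levels below are invented, and the rule "positive when the level is at or above the cut-off" is an assumption. Some real assays work the other way round. As the cut-off moves up, sensitivity can only fall and specificity can only rise.

```python
# Hypothetical marker levels (e.g. a chemical concentration) by gold-standard status
cases = [4.1, 5.3, 6.0, 6.8, 7.5]      # gold standard +
non_cases = [2.0, 2.9, 3.5, 4.4, 5.1]  # gold standard -


def sens_spec(cutoff):
    """Assumed rule: a result is positive when the level is >= cutoff."""
    tp = sum(x >= cutoff for x in cases)
    tn = sum(x < cutoff for x in non_cases)
    return tp / len(cases), tn / len(non_cases)


for cutoff in [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]:
    sens, spec = sens_spec(cutoff)
    print(f"cut-off {cutoff:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these toy values, a cut-off of 2.0 gives sensitivity 1.00 and specificity 0.00, while 5.0 gives 0.80 for both and 8.0 gives sensitivity 0.00 and specificity 1.00. No single cut-off maximizes both, which is Wicked's short blanket.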
He who couldn’t ask:
I’m an especially sensitive guy, and you all know that. But I want to take up Wicked's challenge! When I take a test and increase the sensitivity… the FN rate is then… hmmm. Well, my heart rate has increased… and so has my blood pressure… and my ability to think about the FN rate has decreased. This is a very nasty question after all. No wonder my data-scientist girlfriend calls this table a confusion matrix.
The Wicked son:
Couldn’t ask, tell your girlfriend that the sensitivity and specificity rates also depend on the ratio of cases to non-cases, or, as my epidemiologist girlfriend says, the disease frequency. We might show them how to work around it in our next post.