by Jess Behrens
© 2005-2019 Jess Behrens, All Rights Reserved
The 2019 tournament came and went with a mix of successes and failures. Prior to the tournament, the network predicted Michigan State would play Virginia in the final. That was close. Duke was also pinned as a pretender who would lose in the second round (and they almost did, I'd like to point out! They also barely escaped Virginia Tech in the Sweet 16). I consider that a success, given that the rest of the world wanted to give Zion Williamson & RJ Barrett's team the championship before the First Four. Both Auburn & Purdue were slated as extremely good, Elite Eight caliber teams as well. Also of note, as you can see in Figure 1 & Figure 2, the tournament is definitely a time series. In fact, the last
Figure 1. Index 15, Ranks 37-39 & Index 29, Rank 25, 2017-2019
Figure 2. Index 14, Rank 68, 2017-2019
Figure 3. Index 16, Rank 62, 2006-2008 & Index 1, Rank 25, 2007-2009
However, aside from the above figure, the network missed, utterly, totally, & completely, on Texas Tech. The network makes a decision about a team by taking into consideration the position of a team across both the Win & Loss networks. While the above figures make it tempting to pick a champion off of one observation, doing so is rarely a good idea. So, how did it miss on Texas Tech? I don't believe those who say, "Texas Tech just played well." And I found an answer that explains why I'm correct in that belief.
Over the next few posts, I plan to enumerate where problems crept into the methods I've outlined here & how I've fixed them, starting with Texas Tech. The very first lesson is something that I should have seen from the get-go, because it relates to work I've done in the various epidemiology jobs I've held over the years.
Sensitivity & Specificity are vitally important aspects of any statistical endeavor because they relate to the two different types of statistical error, Type I & Type II. Sensitivity is defined as the probability that a test result will be positive when the disease is present. Specificity is the probability that a test will be negative when the disease is not present. So, in the case of this project, each 'query' is really a test. The win queries test for the 'disease' of a first round win (which isn't really a disease but is still a valid goal). Logically, the loss queries then test for the 'disease' of a first round loss. Because each of these queries is trying to make an affirmative prediction of the presence of something, it is naturally biased toward finding what it is looking for, be it a win or a loss.
Thus, the method I was using to weight the networks prior to this tournament, as described in Chapter 3, was over-weighting the probability of finding either a win or a loss, depending on whether the query was a win query or a loss query. From a statistical modeling standpoint, I was over-fitting. It's that simple. And Texas Tech fell inside some very specific loss queries that over-valued 'finding' a loss. The only fix to this problem is to re-weight both networks & see if that changes the results.
Figures 4 & 5, created using plotly & cufflinks in python, show a scatter plot of the Sensitivity & Specificity for all Win & Loss queries. Given that many of these queries are simply highly specific, with little to no sensitivity, a significant portion of them can be & were eliminated from use. They were too specific & made the entire system prone to error. These are the queries colored with black dots in Figures 4 & 5.
Figure 4. Sensitivity vs. Specificity, Win Network Queries
The queries colored blue were kept, but are still primarily specific & thus biased toward finding a win or a loss, depending on the network. Instead of weighting these queries using both the Positive Likelihood Ratio (PLR) & Negative Likelihood Ratio (NLR), as is described in Chapter 3, these queries were re-weighted using only the AIC output from the NLR.
For those unfamiliar with these two statistical measures, PLR is the ratio between the probability of a positive test result given the presence of the disease and the probability of a positive test result given the absence of the disease. NLR is the ratio between the probability of a negative test result given the presence of the disease and the probability of a negative test result given the absence of the disease. Both of them can be calculated using Sensitivity & Specificity.
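To make those definitions concrete, here is a minimal Python sketch that computes all four measures from the counts of a 2x2 table for a single query. The function name `diagnostic_measures` and the example counts are made up for illustration; they are not taken from the project's data or code.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PLR & NLR from 2x2 counts for one query.

    tp: query flagged the outcome and it happened (true positives)
    fp: query flagged the outcome but it did not happen (false positives)
    fn: query did not flag the outcome but it happened (false negatives)
    tn: query did not flag the outcome and it did not happen (true negatives)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")

    sensitivity = tp / (tp + fn)          # P(test + | outcome present)
    specificity = tn / (tn + fp)          # P(test - | outcome absent)
    false_positive_rate = fp / (tn + fp)  # 1 - specificity
    false_negative_rate = fn / (tp + fn)  # 1 - sensitivity

    # PLR = sensitivity / (1 - specificity); infinite if there are no false positives
    plr = sensitivity / false_positive_rate if false_positive_rate > 0 else float("inf")
    # NLR = (1 - sensitivity) / specificity; infinite if specificity is zero
    nlr = false_negative_rate / specificity if specificity > 0 else float("inf")

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
    }


# Illustrative counts only (hypothetical, not real tournament data)
measures = diagnostic_measures(tp=30, fp=5, fn=10, tn=55)
print(measures)
```

A query with high specificity but very low sensitivity will have an NLR close to 1, which is another way of saying that a negative result from that query tells you almost nothing.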
The NLR was chosen as a weighting method because the entire system, being built to find a win or a loss, is prone to bias toward finding one. Thus, what is missing is the likelihood that the test misses a positive prediction. The reality that each test is making a positive prediction is affirmed by the fact that the network links all of these teams to each other as a structural building block. Weighting those connections by the likelihood that the test will correctly make a true positive prediction, especially when the sensitivity of that building block is low, is redundant and prone to bias.
Figure 5. Sensitivity & Specificity, Loss Network Queries
A third class of query (or 'test') is also identified in Figures 4 & 5. These are the queries which are both sensitive & specific. Technically speaking, they are the win/loss queries in which the NLR <= 0.7. The weight for these queries was determined using the method outlined in Chapter 3, where the PLR AIC is subtracted from the NLR AIC. These represent the 'best' queries/tests within the system.
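The three classes of query and their weights can be sketched roughly as follows. This is only an illustration under stated assumptions: the NLR <= 0.7 cutoff and the two weighting rules come from the text above, but the `min_sensitivity` cutoff used to eliminate queries is a hypothetical placeholder (the actual elimination rule isn't spelled out here), and the AIC values are simply passed in, since how they are produced is covered in Chapter 3. The function name and class labels are also made up.

```python
def query_weight(sensitivity, specificity, nlr_aic, plr_aic,
                 nlr_cutoff=0.7, min_sensitivity=0.05):
    """Classify one query and return (class_label, weight).

    nlr_aic / plr_aic: AIC outputs for the NLR & PLR, computed elsewhere.
    nlr_cutoff: 0.7, per the text.
    min_sensitivity: HYPOTHETICAL cutoff for dropping a query; an assumption.
    """
    if specificity <= 0:
        raise ValueError("specificity must be positive to compute the NLR")

    nlr = (1 - sensitivity) / specificity

    # Queries with little to no sensitivity are eliminated
    # (assumed rule, see min_sensitivity above).
    if sensitivity < min_sensitivity:
        return ("eliminated", None)

    # Both sensitive & specific: PLR AIC subtracted from the NLR AIC.
    if nlr <= nlr_cutoff:
        return ("sensitive_and_specific", nlr_aic - plr_aic)

    # Kept, but primarily specific: weight by the NLR AIC alone.
    return ("primarily_specific", nlr_aic)


# Illustrative inputs only (hypothetical values)
print(query_weight(0.75, 0.92, nlr_aic=12.0, plr_aic=4.0))
print(query_weight(0.20, 0.95, nlr_aic=12.0, plr_aic=4.0))
print(query_weight(0.01, 0.99, nlr_aic=12.0, plr_aic=4.0))
```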
So, back to Texas Tech. It turns out that the majority of the 'loss' queries in which they fell were among the 'green' queries shown in Figure 5. Duke was not, however, and continued to be a 'pretender' after re-weighting the network. I'll address the breakdown of teams by 'type', including their Evolutionary Game Theory Strategy, in the next post. However, for now, I will just say that Texas Tech provides a tangible example of how it is possible to unintentionally bias a project like this one, and is thus a great learning example.