by Jess Behrens
© 2005-2018 Jess Behrens, All Rights Reserved
The linear nature of the Hawk vs. Owl competitive relationship within the NCAA Tournament data set is not the only factor affecting the relative strength of these two primary species types. Whether or not the two have energetically separated from each other at a population level is also a powerful determinant of the relative success of each species in a given tournament year. Figure 1 is a box plot showing the 95% confidence intervals for the total energy (or success) of each species type across all tournament years.
Figure 1. Total Energy by Species Type with Conf. Intervals, All Tournament Years
Figure 1 was first displayed in Chapter 5 and shows that, when all tournament years are considered together, only the Owls have a statistically significant separation from the Doves & Dove-Owls. There is also a large amount of overlap between the Hawks & each of the Doves, Owls, & Dove-Owls. Given the more stochastic nature of the Hawk strategy, in that it includes an additional cost term, this overlap makes a lot of sense.
From what we already know about the tournament, given that only Owls & Hawks have actually won the tournament and most often meet in the final, it seems logical to wonder: Do these two species types exhibit energetic separation in some of the tournaments? If they do, which of the two species types is favored by this separation? Years in which the 95% confidence intervals of Hawks & Owls do not overlap would, ecologically speaking, mean that the two of them occupy separate energetic 'niches'. Niche theory states that two species cannot occupy identical niches and that, when species compete for a single, sharable resource, that competition will force one or the other, or both, to change behavior (strategy) to minimize it. In nature, species are normally able to mitigate this competition by turning to alternative food sources. In the tournament, we have only one resource (food), and that is experience, as I stated in Chapter 1.
Thus, if there are years where Hawks & Owls are energetically separated, it would seem to suggest that one or the other had 'won' the competition. From what we know about game theory, Owls win at the population level most of the time, meaning that as a group they have greater total energy. Figure 1 shows this in that the average population value for the Owl species is above the top edge of the Hawks' 95% confidence interval. However, on average, individual Hawks outcompete the Owls, as was shown by the Average Predicted Energy values in Chapter 6. Based on these facts, a coherent hypothesis is that in years where the two populations separate, the energetic advantage enjoyed by the Hawks would result in a significantly high Poisson statistic for tournament winners. The lack of overlap would indicate that the two species are, technically, not competing any longer.
Figure 2. Total Energy by Species Type with Conf. Intervals, Separated vs. Un-separated Tournaments
As Figure 2 shows, there are definitely years where the two species have separate 95% confidence intervals. In fact, it happens more often than not. Here's the breakdown:
Separated Years: 2005, 2007, 2008, 2010, 2011, 2012, 2013, 2014, 2015, & 2016
Un-Separated Years: 2006, 2009, 2017, & 2018
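For readers who want to try the separated vs. un-separated test themselves, here is a minimal sketch in Python. It assumes a normal-approximation 95% confidence interval (mean plus or minus 1.96 standard errors) and that each species has a list of total energy values for a given year. Both are my assumptions for illustration, and the function names and numbers are hypothetical, not the data behind Figure 2.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(values):
    """Normal-approximation 95% confidence interval for the mean (an assumption)."""
    center = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return center - half_width, center + half_width


def is_separated(hawk_energy, owl_energy):
    """True when the two 95% intervals do not overlap at all."""
    hawk_low, hawk_high = ci95(hawk_energy)
    owl_low, owl_high = ci95(owl_energy)
    return hawk_high < owl_low or owl_high < hawk_low


# Hypothetical energy totals for a single tournament year.
hawks = [10.0, 11.0, 12.0, 13.0]
owls_far = [20.0, 21.0, 22.0, 23.0]
owls_near = [11.0, 12.0, 13.0, 14.0]

print(is_separated(hawks, owls_far))   # intervals are far apart
print(is_separated(hawks, owls_near))  # intervals overlap
```

Running a check like this once per tournament year would sort the years into the two lists above.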
Figure 2 clearly shows a high degree of overlap between the Owls & Hawks in Un-separated years, suggesting heavy competition between the two species. But does the hypothesis hold? Are Hawks at a distinct advantage in years where they enjoy a separate niche from the Owls? According to Table 1, yes, the Hawks do enjoy a statistical advantage, at least according to the Poisson, in these years. Of note is the fact that the 'Less Linear Years' described in Chapter 6, where the Hawks are at a distinct advantage, are also 'Separated' years. Also, it's not that the Owls don't win in these years; in fact, they have the same number (5) of champions as the Hawks. It's that there are so many more Owls that having 5 Owl champions is not significant, at least according to the Poisson.
Table 1. Poisson Significance of Tournament Results by Round & Species Type, Separated Years
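To make the 'so many more Owls' point concrete, here is a minimal sketch of a Poisson significance check. It assumes an upper-tail test: the probability of seeing at least the observed number of champions when the expected number comes from the species' share of the field. The observed count of 5 champions comes from the text above; the expected counts are hypothetical values I picked for illustration, and this is not necessarily how Table 1 was calculated.

```python
from math import exp, factorial


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        exp(-expected) * expected**k / factorial(k) for k in range(observed)
    )
    return 1.0 - below


# Hypothetical expected champion counts over the ten separated years:
# a rarer species (expected 2) vs. a more common one (expected 4).
p_rare = poisson_upper_tail(5, 2.0)
p_common = poisson_upper_tail(5, 4.0)

print(round(p_rare, 3), round(p_common, 3))
```

The same 5 champions look far more surprising for the rarer species than for the common one, which is the pattern described above.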
Is the opposite true as well? Do Owls enjoy a statistical advantage in the four years where Hawks & Owls don't separate? As Table 2 shows, clearly the answer is again, yes. And, again, it's not that the Hawks don't win it all in these years. It's that the 1 time they do win is not significant according to the Poisson.
Table 2. Poisson Significance of Tournament Results by Round & Species Type, Un-separated Years
Do the other relationships described in Chapter 6 hold? What do the population and individual species fitness plots say? Figures 3 & 4 show the population fitness plots by species type for, respectively, Separated & Un-separated tournament years. In Figure 3 (All species p<0.001; r-values: Hawks 0.82, Owls 0.54, Doves 0.66, Dove-Owls 0.55), the Hawk & Owl lines are separate and cross at the equivalence point, 48%. This rather large percentage is similar to the high crossing point found in the Less Linear fitness plot shown in Chapter 6, perhaps making it an indicator of Hawk strength across tournament structural considerations.
Figure 3. Population Fitness by Species, Separated Tournament Years
Figure 4 (All species p<0.001; r-values: Hawks 0.84, Owls 0.95, Doves 0.87, Dove-Owls 0.80), on the other hand, is markedly different from Figure 3. As would be predicted by our hypothesis that the lack of energetic separation between Hawks & Owls in these years would lead to increased competition, and subsequent suppression of the Hawks' individual energetic advantage, the two lines are essentially the same. They do not cross, either. The Hawks are slightly higher across all percentages.
Figure 4. Population Fitness by Species, Un-separated Tournament Years
Individually, the fitness plots for Separated & Un-separated years, Figures 5 & 6 respectively, confirm the relationships found in Figures 3 & 4. As seen in the Less Linear years individual fitness plot, Figure 5 (All Species p<0.001; r-values: Hawks -0.08, Owls -0.35, Doves 0.45, Dove-Owls 0.17) also contains a sharp, positively sloped Dove-Owl regression line that crosses both the Hawk & Owl regression lines.
Figure 5. Individual Fitness by Species, Separated Years
Furthermore, the small number of stochastic results at low percentages for Dove-Owls that is present in the Less Linear individual fitness plots (Chapter 6) is also repeated in Figure 5. Finally, the scatter plots in Figure 3 clearly show a high degree of separation between the Hawks & Owls.
Figure 6 (All Species p<0.001; r-values: Hawks -0.16, Owls -0.03, Doves 0.63, Dove-Owls 0.07) also conforms to expectations. The Dove-Owl line exhibits a high degree of stochasticity at low percentages, which flattens the regression line and prevents it from crossing the Owl & Dove lines. Furthermore, the Owl scatter plot is very much at the center of a much more stochastic Hawk scatter plot. Unlike the Individual fitness plot shown for linear years in Chapter 6, however, the Hawk line in un-separated years maintains significance at p<0.001. At the population level the fitness plot, shown in Figure 4, clearly shows the Hawk & Owl lines cross at about 38% (Figure 6). This would suggest that the Owls in un-separated years are at an advantage because they are energetically equivalent to the Hawks, but don't experience the same degree of stochasticity.
Figure 6. Individual Fitness by Species, Un-separated Years
One of the best tools for understanding the structure behind a data set is called a cluster map. I tried to find a succinct but robust explanation of what a cluster map does, but I couldn't find one. So, here's my attempt: data analytics is all about measurement, and in any given analysis, that measurement will have a representative statistic. As a good example, consider 'Runs Batted In' (RBI) in baseball. Or Passing Yards in football. Both are measures that are used to help understand how well an athlete is doing week in and week out. A cluster map essentially takes that statistic and does two things:
1. It shows how that measurement varies with respect to two additional index variables.
2. It rearranges the index variables to reflect the similarity between them given the underlying measurement.
Those of you familiar with Access and Excel, or who work with data even casually, will know what a pivot table is. Well, a cluster map takes a pivot table and rearranges the rows and columns so that the closer two of them are (rows or columns), the more similar they are.
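Here is a minimal sketch of that rearranging step, written in plain Python so that nothing is hidden. It assumes average-linkage clustering on Euclidean distance between rows, which is one common choice and my own assumption here; in practice, seaborn's clustermap function does this for both rows and columns and draws the heat map as well. The little pivot table is made up for illustration and is not my tournament data.

```python
from itertools import combinations
from math import dist


def cluster_order(rows):
    """Order row labels so that similar rows end up next to each other.

    rows maps a label to a list of measurements (one pivot-table row).
    Uses average-linkage agglomerative clustering on Euclidean distance.
    """
    clusters = [[label] for label in rows]

    def linkage(a, b):
        # Average distance between every member of a and every member of b.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(rows[x], rows[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical pivot table: species in rows, three made-up years in columns.
pivot = {
    "Hawks": [9.0, 8.0, 1.0],
    "Doves": [2.0, 1.0, 7.0],
    "Owls": [3.0, 2.0, 8.0],
    "Dove-Owls": [7.0, 9.0, 2.0],
}
order = cluster_order(pivot)
print(order)
```

Applying the same function to the transposed table would reorder the columns, which gives the second half of what a cluster map does.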
This all makes a cluster map an ideal tool for visualizing how energetic (fitness) separation affects NCAA Tournament outcomes. Figure 7 shows an example of a cluster map where the rows are species type & the columns are the years included in my data set. The color scheme goes from high (red) to low (blue). Since things that are close in Figure 7 are more similar, Doves & Owls are more similar than Doves & Hawks. Likewise, 2006 & 2018 are maximally different, while 2006 & 2017 are maximally similar. From Figure 7, we can thus deduce that the similarity between Doves & Owls is high and positive.
Figure 7. Cluster Map Showing Total Population Species Type Energy (Fitness) by Year
The lines on the top and the side are very important. They show the structure of the clusters within the data. Looking at the set of lines on the left, there are 2 clusters: one of Doves & Owls and the other of Hawks & Dove-Owls. From the top set of lines, we can deduce that there are quite a few clusters, some of which have smaller clusters within them. Look closely at the middle 10 columns in Figure 7. These are the columns running between 2011 & 2013. If you compare the years referenced in these columns with the list of 'separated years' above, you'll see that they are one and the same. That means that the un-separated years (2006, 2009, 2017, & 2018) are grouped together on the two edges of the cluster map. Thus, Figure 7 gives credence to the results presented here because it verifies the assertion that separated years are more similar to each other than to un-separated years. That assertion is reinforced by the cluster lines shown at the top of Figure 7. There is a break linking 2006 & 2017 as well as one linking 2009 & 2018.
Unlike the Linear/Less Linear split, Separated vs. Un-separated years have no relationship to the number of upsets among highly seeded teams. Table 4 shows the number of upsets among 1, 2, & 3 seeds in separated & un-separated tournaments. Neither of these structural elements produces enough upsets to be significant, at least with respect to the Poisson. This result is of note because all of the Less Linear years, which do produce a significant number of upsets among the top-seeded tournament teams, are also separated years. Of course, the separated years are closer to significance than the un-separated years, but p<0.21 is not significant in anyone's book.
Table 4. Poisson Significance of First Round Loss by Seeds 1-3, Separated vs. Un-separated Years
Predicted Average Energy (Success)
As covered in a previous post, considering the average predicted energy for Hawks & Owls may provide more evidence that one species type is favored over another in a given tournament year. Tables 5 & 6 show the predicted average energy for Separated & Un-separated tournament years.
Table 5. Average Predicted Energy, Hawks & Owls, Separated Tournament Years
As was outlined in Chapter 6, where this method was first described, here is a step by step account of how the values in Tables 5 & 6 were produced:
1. Use the network to classify teams as Hawks, Owls, Dove-Owls, or Doves.
2. Run the evolutionary game theory simulations using the percent of total population represented by the totals calculated in Step 1.
3. Calculate the linear best fit regression model using the simulations from Step 2 for each species type.
4. Plug the percentages from Step 1 into the model developed in Step 3 to calculate the predicted total population energy (or success) for each species and divide by the number of individuals within that species for the given tournament year.
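Steps 3 & 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an ordinary least-squares line, one fit per species, and made-up simulation numbers standing in for the output of Step 2. The function and variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_average_energy(sim_percents, sim_energy, observed_percent, n_teams):
    """Step 3: fit the species' line. Step 4: predict total energy, then average."""
    slope, intercept = fit_line(sim_percents, sim_energy)
    predicted_total = slope * observed_percent + intercept
    return predicted_total / n_teams


# Hypothetical simulation output for one species: percent of population
# vs. total population energy.
sim_percents = [10.0, 20.0, 30.0, 40.0]
sim_energy = [25.0, 45.0, 65.0, 85.0]

# Hypothetical tournament year: the species is 25% of the field, 11 teams.
avg_energy = predicted_average_energy(sim_percents, sim_energy, 25.0, 11)
print(avg_energy)
```

Repeating this for Hawks & Owls in every tournament year would give one pair of values per year, which is the layout of Tables 5 & 6.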
If energetic separation, or the 'niche' hypothesis described earlier, is correct, one would expect that the T-Test comparing Hawks & Owls would be significantly different in Separated years & not significantly different in un-separated years. If the percent of total population for each species follows a different fitness path in Separated years, as shown in figure 3, one would expect more stochasticity in Table 5 (a significant T-Test) because those results are used to estimate the energy state of the real data. Conversely, the opposite is true in Table 6. A lack of separation in simulated energy totals for Hawks & Owls, as seen in figure 4, produced regression lines that were essentially identical. With such similarity between the Hawks & Owls in un-separated years, similar predictions in average energy should be expected as well. Thus, Tables 5 & 6 seem to confirm this hypothesis in that the T-Test for Table 5 is significant, while the T-Test for Table 6 is not significant.
Table 6. Average Predicted Energy, Hawks & Owls, Un-separated Tournament Years
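For anyone who wants to run a comparison like this on their own numbers, here is a minimal sketch of a two-sample T-Test statistic. It assumes Welch's unequal-variance version, which is my choice for illustration and may not match the exact test behind Tables 5 & 6. The sample values are made up. Turning the statistic into a p-value needs the t distribution, which a statistics package such as SciPy can supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a**2 / (len(sample_a) - 1) + var_b**2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical average predicted energy values, one per tournament year.
hawk_avg = [2.0, 4.0, 6.0]
owl_avg = [1.0, 2.0, 3.0]

t_stat, dof = welch_t(hawk_avg, owl_avg)
print(t_stat, dof)
```

A large absolute t statistic points toward the 'separated' pattern, while a value near zero points toward the 'un-separated' one.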
I've now identified two structural factors that affect NCAA Tournament results: the linear, competitive nature of the Hawk & Owl species types and whether or not these two dominant species are energetically separate from one another in a given tournament year. But what happens when we consider both of these elements simultaneously? That will be the subject of my next post!
#SpeciesFitnessPlots #SpeciesCompetitionPlots #MonteCarloSimulations #niche #basketball #NetworkAnalysis #EvolutionaryGameTheory #NCAATournament #NCAA #MensCollegeBasketball #Seaborn #Matplotlib #ClusterMap