by Jess Behrens
© 2005-2018 Jess Behrens, All Rights Reserved
In my last post, I covered entropy and how it can be used to estimate how 'disordered' a given Tournament year really is. In that process, I showed that much of our surprise as fans when a lower-seeded team 'upsets' a 'better', higher-seeded team results from the fact that seeding is a necessary, but artificial, grouping (1 & 2).
1. Introduce the idea of entropy and why it is the ideal tool for quantifying tournament results.
2. Cover the difference in entropy by tournament seed vs. Evolutionary Game Theory (EGT) Strategy.
3. Show how the Tournaments group together into 2 clusters (Seaborn clustermap) based on regression results from EGT simulations.
4. Examine the 2nd Round Game Theory equilibrium state in these two clusters.
5. Successfully predict the entropy in each of the Tournaments using EGT simulation totals within a multivariate regression.
In this post, I'm going to cover how we can use simulation results, based on Evolutionary Game Theory (EGT) strategy, to group yearly tournaments into 'clusters', and then examine the Game Theory equilibrium for these clusters (3 & 4).
As covered in previous chapters, after grouping all tournament teams by their respective EGT strategy, I run random Evolutionary Game Theory (EGT) simulations. In these simulations, teams have no identification other than their strategy type. Each iteration simulates 6 independent rounds for each of the Tournament years from 2005-2018, and the process is repeated for 10,000 iterations. Each round has the same number of 'games' as it would in the real tournament (1st - 32; 2nd - 16, etc.), and no 'rules' are applied to who participates in those games. Instead, two competitors are selected at random based on the proportion of each EGT strategy in a given tournament year. As I said before, the rounds are not tied together in any meaningful way; you don't have to 'win' a game to move on.
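The simulation loop described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the code used for this series: the `PAYOFF` values are hypothetical placeholders (the actual Hawk/Owl/Dove energy payoffs are covered in earlier chapters), and `simulate_iteration` and `simulate_year` are illustrative names. Competitors are drawn with replacement in proportion to each strategy's share of that year's field.

```python
import numpy as np

STRATEGIES = ["Hawk", "Owl", "Dove-Owl", "Dove"]
# HYPOTHETICAL payoffs: PAYOFF[i, j] is the energy strategy i gains when it
# meets strategy j. Placeholders only, not the values used in the series.
PAYOFF = np.array([
    [0.0, 1.0, 2.0, 2.0],
    [0.5, 1.0, 1.5, 1.5],
    [0.0, 1.0, 1.5, 1.5],
    [0.0, 1.0, 1.5, 1.0],
])
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # rounds 1-6, as in the real tournament


def simulate_iteration(proportions, rng):
    """Total energy per strategy for one iteration of 6 independent rounds."""
    energy = np.zeros(len(STRATEGIES))
    for n_games in GAMES_PER_ROUND:
        # Rounds are independent: nobody has to 'win' to appear in the next one.
        a = rng.choice(len(STRATEGIES), size=n_games, p=proportions)
        b = rng.choice(len(STRATEGIES), size=n_games, p=proportions)
        # np.add.at accumulates correctly when a strategy appears repeatedly.
        np.add.at(energy, a, PAYOFF[a, b])
        np.add.at(energy, b, PAYOFF[b, a])
    return energy


def simulate_year(proportions, n_iter=10_000, seed=0):
    """Energy totals with shape (n_iter, 4) for one tournament year."""
    rng = np.random.default_rng(seed)
    return np.array([simulate_iteration(proportions, rng) for _ in range(n_iter)])
```

For a hypothetical year with 20 Hawks, 18 Owls, 10 Dove-Owls & 16 Doves, `simulate_year(np.array([20, 18, 10, 16]) / 64)` returns a 10,000 x 4 array of energy totals for that year.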
This is because the goal is not to predict the winner, but to tabulate the total amount of 'energy' that accumulates for each EGT type over the course of a given iteration. Evolutionary Game Theory is an ecological concept, and the 'games' outlined in that literature are designed to illustrate important ecological concepts. In the case of the Hawk/Owl/Dove game, the one we are using, the goal is to illustrate how species that cooperate with one another, sharing energy, can outcompete those that do not. If you want to know more about Evolutionary Game Theory, check here. So, this is about how Owls & Doves combine to outcompete Hawks, where outcompeting means accumulating more 'energy' as a group.
Once these totals have been tabulated for each iteration across all 14 tournaments (14 x 10,000 = 140,000), it becomes possible to look at how the energy total for each strategy type predicts every other strategy across all random tournament iterations, two years at a time: for example, how Doves predict Owls, and vice versa (Owls predict Doves), between 2005 & 2006, 2005 & 2007, etc. Doing this yields a linear regression equation for each combination of strategies & years. It then becomes possible to subset these regression values and look at how one of the regression statistics (slope, r_value, intercept, or the ratio slope/r_value) clusters using Seaborn's clustermap function.
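One way to compute these pairwise regressions is sketched below. It assumes `totals` maps each year to a 10,000 x 4 array whose columns follow the order Hawk, Owl, Dove-Owl, Dove, and whose row i in every year comes from the same simulation iteration. `regression_stats` and `summed_stat_matrix` are hypothetical helpers; NumPy's `polyfit` and `corrcoef` stand in for whatever regression routine was actually used (`scipy.stats.linregress` would give the same slope, intercept and r-value).

```python
import numpy as np


def regression_stats(x, y):
    """OLS fit of y on x: slope, intercept, Pearson r and the slope/r ratio."""
    slope, intercept = np.polyfit(x, y, 1)
    r_value = np.corrcoef(x, y)[0, 1]
    return {"slope": slope, "intercept": intercept,
            "r_value": r_value, "slope_over_r": slope / r_value}


def summed_stat_matrix(totals, pairs, stat="r_value"):
    """Year-by-year matrix of one regression statistic, summed over strategy pairs.

    totals: dict year -> (n_iter, n_strategies) array of energy totals.
    pairs:  list of (predictor_index, response_index) strategy pairs, e.g.
            (1, 3) regresses Dove energy in year b on Owl energy in year a.
    """
    years = sorted(totals)
    out = np.zeros((len(years), len(years)))
    for i, year_a in enumerate(years):
        for j, year_b in enumerate(years):
            for pred, resp in pairs:
                x = totals[year_a][:, pred]
                y = totals[year_b][:, resp]
                out[i, j] += regression_stats(x, y)[stat]
    return years, out
```

With that column order, `summed_stat_matrix(totals, [(1, 3), (3, 1), (0, 2), (2, 0)])` would sum the Owl->Dove and Hawk->Dove-Owl r-values, in both directions, for every pair of years.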
But which groupings are the 'right' ones? Specifically, which groupings are the 'right' ones for predicting the total adjusted entropy for a given tournament year? To answer this question, I ran through all the possible combinations of total r-value by year from the simulation results & compared them to the total adjusted entropy by EGT strategy for the 2nd through the 5th rounds. I used the Pearson correlation coefficient as a proxy for how a given total would perform as an independent variable in a multivariate regression. Two performed acceptably well:
1. Hawk Total Energy as a predictor of Dove-Owl Energy & vice versa -> ρ = -0.4754
2. Owl Total Energy as a predictor of Dove Total Energy & vice versa -> ρ = 0.5304
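Screening candidate groupings by their Pearson correlation with entropy can be sketched as follows. The input format is an assumption: each candidate is a list of per-year totals (for example, a per-year sum of r-values) given in the same year order as the entropy values, and `rank_candidates` is a hypothetical helper name.

```python
import numpy as np


def rank_candidates(candidate_totals, entropy):
    """Rank candidate predictors by absolute Pearson correlation with entropy.

    candidate_totals: dict name -> per-year totals (same year order as entropy).
    entropy:          per-year total adjusted entropy.
    Returns a list of (name, r) sorted from strongest to weakest |r|.
    """
    entropy = np.asarray(entropy, dtype=float)
    scores = {
        name: float(np.corrcoef(np.asarray(values, dtype=float), entropy)[0, 1])
        for name, values in candidate_totals.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Ranking by the absolute value keeps strong negative correlations, like the Hawk/Dove-Owl grouping, on equal footing with positive ones.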
These two groupings aren't surprising. In fact, they make a lot of sense, given the results of the homophily analysis I did in Chapter 20. Figure 1, which shows the rate of Out vs. In caste link & triad formation in both the 'win' & 'loss' networks, is reproduced here to illustrate why.
Figure 1. Out vs. In Connections by Evolutionary Game Theory Strategy
As you can see in Figure 1, in the loss network Hawks & Dove-Owls are 'closer' to each other than to the other strategies, as are Owls & Doves. In fact, Owls seem to be a more moderate version of the Doves, & the Hawks a more moderate version of the Dove-Owls.
If we then isolate these two groupings, Owls -> Doves & Hawks -> Dove-Owls, from the simulation results, we can feed them into Seaborn's clustermap function. Figure 2 shows the result of that cluster analysis.
Figure 2. Seaborn Cluster Map by Strategy & Year Combination, Total R-Value, Hawk->Dove-Owl & Owl->Dove
I've highlighted two clusters, A & B, of tournament years. As noted earlier, a regression statistic can serve as the 'sum' being clustered in a Seaborn clustermap; Figure 2 shows the total (sum) of the r-value for these groupings across all strategy & year combinations. The red squares are '1.0' because they mark where a given tournament year is matched with itself.
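`seaborn.clustermap` draws the heatmap and its dendrograms, and by default it clusters rows and columns hierarchically using average linkage and Euclidean distance. The sketch below reproduces only the row clustering with SciPy and cuts the dendrogram into two groups with `fcluster`. Cutting at exactly two clusters is an assumption standing in for the visual selection of clusters A & B in Figure 2, and `two_clusters` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def two_clusters(years, matrix, method="average", metric="euclidean"):
    """Split tournament years into two groups by clustering rows of the matrix.

    The defaults mirror seaborn.clustermap's method and metric arguments.
    Returns a dict year -> cluster label (1 or 2).
    """
    Z = linkage(np.asarray(matrix, dtype=float), method=method, metric=metric)
    labels = fcluster(Z, t=2, criterion="maxclust")
    return {year: int(label) for year, label in zip(years, labels)}
```

To draw the figure itself, the same matrix could be wrapped in a pandas `DataFrame` indexed by year and passed to `sns.clustermap`.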
But are these two cluster groups fundamentally different in any meaningful way? One of the easiest & most direct ways to measure the difference between them is to perform a t-test & an f-test on the entropy scores, treating cluster A as separate from cluster B. The results of these tests are listed below:
1. t-test of entropy in cluster A vs. cluster B = 0.948; p<=0.1
2. f-test of entropy in cluster A vs. cluster B = 0.135
The t-test result is significant at p<0.1 & very nearly significant at p<=0.05, while the f-test is not at all significant. This suggests that the variance of entropy (f-test) does not differ between clusters A & B, consistent with both belonging to the same population, while their mean entropy values are beginning to separate.
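Both tests can be run with SciPy as sketched below. The post does not say which t-test variant or f-test implementation was used, so this sketch assumes a pooled-variance two-sample t-test (`equal_var=True`) and a two-sided variance-ratio F-test. `compare_clusters` is a hypothetical helper, and any numbers passed to it in an example are placeholders rather than the tournament entropy data.

```python
import numpy as np
from scipy import stats


def compare_clusters(entropy_a, entropy_b, equal_var=True):
    """Two-sample t-test on means and two-sided F-test on variances."""
    a = np.asarray(entropy_a, dtype=float)
    b = np.asarray(entropy_b, dtype=float)
    t_stat, t_p = stats.ttest_ind(a, b, equal_var=equal_var)
    # Variance ratio with sample (ddof=1) variances.
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = a.size - 1, b.size - 1
    # Two-sided p-value: double the smaller tail, capped at 1.
    f_p = min(1.0, 2 * min(stats.f.cdf(f_stat, dfn, dfd),
                           stats.f.sf(f_stat, dfn, dfd)))
    return {"t": float(t_stat), "t_p": float(t_p),
            "F": float(f_stat), "F_p": float(f_p)}
```

A small `t_p` with a large `F_p` is the pattern described above: the means differ while the spreads do not.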
Another way to determine how clusters A & B differ is to use the game theoretic tools developed by John Nash. To make this work, we need to identify a consistent set of players & strategies. The strategies are self-explanatory & easy to identify: the EGT strategies Hawk/Owl/Dove-Owl/Dove. The two players, however, are not as easily identified. The only possibility is to use seed to determine which player is which. In a Nash square, the row player is traditionally labeled player 1, which leaves the column for player 2. Here, we'll make the higher/better seeded team in any given tournament game player 1 & the lower/worse seeded team player 2. This makes it possible to examine all games from the first through the fourth round of the tournament. In the fifth round (Final Four), teams with the same seed can play one another, so seed no longer distinguishes the two players.
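Orienting each game so that player 1 is always the higher seed could look like the sketch below. How the cell values in Figure 3 were computed is not spelled out in this post, so the sketch assumes each cell holds a win fraction for that strategy matchup; the tuple layout of `games` and the name `win_rate_bimatrix` are hypothetical. Games between equal seeds are skipped, mirroring the Final Four problem just described.

```python
import numpy as np

STRATEGIES = ["Hawk", "Owl", "Dove-Owl", "Dove"]
IDX = {s: i for i, s in enumerate(STRATEGIES)}


def win_rate_bimatrix(games):
    """Row (higher seed) and column (lower seed) payoff matrices as win fractions.

    games: iterable of (seed_a, strategy_a, seed_b, strategy_b, a_won).
    Player 1 is the better (numerically smaller) seed; equal seeds are skipped.
    Cells with no games are left at 0 for both players.
    """
    wins = np.zeros((4, 4))
    counts = np.zeros((4, 4))
    for seed_a, strat_a, seed_b, strat_b, a_won in games:
        if seed_a == seed_b:
            continue  # players cannot be told apart by seed
        if seed_a > seed_b:  # reorder so 'a' is always the higher seed
            seed_a, strat_a, seed_b, strat_b = seed_b, strat_b, seed_a, strat_a
            a_won = not a_won
        i, j = IDX[strat_a], IDX[strat_b]
        counts[i, j] += 1
        wins[i, j] += 1.0 if a_won else 0.0
    p1 = np.divide(wins, counts, out=np.zeros_like(wins), where=counts > 0)
    p2 = np.where(counts > 0, 1.0 - p1, 0.0)
    return p1, p2
```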
Based on my previous posts, there seem to be possible Nash mixed-strategy equilibria in the second round, depending on how you group the tournament years (which years you include & which you leave out). Those previous attempts did not justify which tournaments were grouped together as explicitly as I do here.
What this approach tests is the rational nature of the tournament games. While no single 'person' decides which strategy to choose in a given matchup, one could argue that a 'team', as an entity separate from the players involved, does have a different rational nature & motivation depending on its seed and the seed of the team it is playing. Thus, examining the second round with a game theoretic method is an examination of the potential for a given EGT strategy to confer a competitive advantage as a function of the proportion of each strategy within the overall tournament population.
Figure 3. Nash Equilibrium Analysis, 2nd Round Tournament Games
Figure 3 shows the output of 3 separate Nash equilibrium analyses. The top, labeled 'All', is just that: all matchups regardless of cluster type & year, as labeled in Figure 2. The next two show the Nash equilibrium analysis for each of the two cluster types, 'A' & 'B', respectively. Where the probabilities are stacked on top of one another, the top probability applies to the higher seed & the bottom to the, uh... lower seed. Where the probabilities are side by side, the first applies to the higher seed & the second (after the comma) to the lower seed.
Each of the three game theory 'squares' reduces from a 4x4 game to a 2x2 game. This is because, for both players, the Dove & Dove-Owl strategies are dominated by the other two strategy types. For those unfamiliar with game theory, this means that there is never a situation in which either player would benefit from 'choosing' to be a Dove or a Dove-Owl in the second round.
As an example, consider the top 4x4 square (All). For the higher seed, the row probabilities for 'Dove' (the first of the two probabilities in each cell) are all zero, while every row probability above them is larger than zero. Thus, assuming the high seed is a completely rational player, an assumption at the base of game theory, the high seed would never choose to be a 'Dove'. The same is true for the 'Dove-Owl' row, except where it intersects with the 'Dove' column (1 > 0). That column is eliminated, however, because in every situation it is completely dominated by all other columns for the lower seeded player.
Yes, considered on its own, the higher seed would treat 'Dove-Owl' as a valid choice because of how it fares against the lower seed's 'Dove' strategy. However, assuming the higher seed had perfect knowledge of the lower seed's choices, it would still never choose 'Dove-Owl', because it would know that the lower seed would never choose 'Dove'.
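The iterated elimination described above can be sketched as a loop that removes any pure strategy strictly dominated by another pure strategy of the same player, re-checking after every removal. This is a hypothetical helper, not the tool that produced Figure 3, and it ignores dominance by mixed strategies, which a full analysis would also check. Payoffs are passed as two arrays: `p1` for the higher seed (rows) and `p2` for the lower seed (columns).

```python
import numpy as np


def iterated_elimination(p1, p2):
    """Iteratively remove strictly dominated pure strategies from a bimatrix game.

    p1[i, j]: row player's payoff; p2[i, j]: column player's payoff.
    Returns the surviving row and column indices.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    rows = list(range(p1.shape[0]))
    cols = list(range(p1.shape[1]))
    changed = True
    while changed:
        changed = False
        # A row is dropped if another row pays strictly more against every column left.
        for r in rows:
            if any(np.all(p1[o, cols] > p1[r, cols]) for o in rows if o != r):
                rows.remove(r)
                changed = True
                break
        # A column is dropped if another column pays strictly more against every row left.
        for c in cols:
            if any(np.all(p2[rows, o] > p2[rows, c]) for o in cols if o != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols
```

In the 4x4 example used in the test, the third row only becomes dominated after the lower seed's fourth column has been removed, which is the same chain of reasoning as the 'Dove-Owl' row and 'Dove' column above.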
Iteratively eliminating each of these dominated strategies leaves us with a 2x2 square in all 3 cases. In the first case, where all years are grouped together, the higher seed has a mixed-strategy equilibrium while the lower seed is purely Hawk dominant. That mixed-strategy equilibrium falls apart when the tournament years are separated into their respective clusters, A & B: the A cluster is Owl dominant for the high seed & Hawk dominant for the low seed, while the B cluster is purely Hawk dominant for both players.
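Once a 2x2 square remains, a fully mixed equilibrium can be found from the indifference conditions: each player mixes so that the opponent earns the same expected payoff from both remaining strategies. The sketch below solves those two equations. It is a hypothetical helper that only reports equilibria in which both players mix; it does not search for pure or partially mixed equilibria, such as the 'All' square (higher seed mixing, lower seed pure Hawk) or cluster B's pure Hawk outcome.

```python
def mixed_equilibrium_2x2(a, b):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game, if one exists.

    a[i][j], b[i][j]: row and column player's payoffs.
    Returns (p, q) with p = P(row plays strategy 0) and
    q = P(column plays strategy 0), or None if there is no interior solution.
    """
    # Row indifference: q*a00 + (1-q)*a01 = q*a10 + (1-q)*a11
    denom_q = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    # Column indifference: p*b00 + (1-p)*b10 = p*b01 + (1-p)*b11
    denom_p = b[0][0] - b[0][1] - b[1][0] + b[1][1]
    if denom_q == 0 or denom_p == 0:
        return None
    q = (a[1][1] - a[0][1]) / denom_q
    p = (b[1][1] - b[1][0]) / denom_p
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None
```

For example, a game with payoffs `[[2, 0], [0, 1]]` for the row player and `[[1, 0], [0, 2]]` for the column player has a fully mixed equilibrium with p = 2/3 and q = 1/3.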
So, to answer the question posed earlier: do these clusters differ in any meaningful way? The answer is absolutely, yes. These two groupings of tournament years show that the distribution of species type by year has a dramatic impact on the form the tournament takes & who wins. Furthermore, these effects really start to take shape in the second round of the tournament.
That's it for this post. Next up, I will cover the use of multivariate regression as a tool to predict the total adjusted entropy by tournament year.