Data Visualization Examples


by Jess Behrens

© 2005-2019 Jess Behrens, All Rights Reserved

These four examples are reproduced from my blog, where I describe them in much greater detail. I'm including them here as examples of how I've used Python to analyze & visualize output from my analysis of the Men's NCAA Tournament.

Overall, my project models the NCAA Tournament as a function of two separate but related processes:

  1. a team's tendency to win at least one game &

  2. a team's tendency to lose in the first round.

After the 2019 Tournament, I discovered that the way I was weighting these two networks over-emphasized specificity at the expense of sensitivity. A more thorough description of the math can be found here. To convey all of this, I constructed two graphics, one for each network (win & loss), to illustrate how a thorough analysis of the sensitivity & specificity made it clear that many of the structural queries I had written needed to be eliminated. The 'Win Network' version of these graphics is shown below.
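For readers who want the two terms pinned down, here is a minimal Python sketch of how sensitivity & specificity could be computed for a single structural query. The function name and the toy `flagged`/`won` lists are hypothetical illustrations, not the project's actual data or weighting scheme.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # flagged & did win
    fn = sum(1 for p, a in pairs if not p and a)      # missed a winner
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly not flagged
    fp = sum(1 for p, a in pairs if p and not a)      # flagged a non-winner
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical example: did a query flag the team, and did the team win a game?
flagged = [True, True, False, False, True, False]
won = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(flagged, won)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A query that rarely flags anyone can score very high on specificity while missing most winners, which is the kind of imbalance described above.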


After building the two networks, I use several of the network centralities calculated from them to classify all tournament teams into their Evolutionary Game Theory Species archetype. That process is described in this post. I've also created this interactive plotly visualization to help people see how all 1002 teams, from 2005-2019, break into their archetypes based on 3 important centrality measures: Loss Network Harmonic Closeness Centrality, Win Network Betweenness Centrality, & Net 'Key Player' Score. A 'Key Player' cost/benefit value is calculated for each team in each network & the Net score is simply the Win Key Player Value minus the Loss Key Player Value.
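As a rough illustration of two of these measures, the sketch below computes an unweighted harmonic closeness centrality with a breadth-first search and the Net 'Key Player' score as a simple difference. The toy network, the team labels, and the function names are assumptions for illustration; the project's real networks may be weighted or normalized differently.

```python
from collections import deque

def harmonic_closeness(adjacency, source):
    """Sum of 1/d(source, v) over every other reachable node v (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sum(1.0 / d for d in dist.values() if d > 0)

def net_key_player(win_value, loss_value):
    """Net score = Win Key Player Value - Loss Key Player Value."""
    return win_value - loss_value

# Hypothetical 4-team loss network laid out as a chain: A - B - C - D
toy_loss_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
hc_a = harmonic_closeness(toy_loss_network, "A")
hc_b = harmonic_closeness(toy_loss_network, "B")
print(hc_a, hc_b)
```

Team B sits closer to everyone else than team A does, so it gets the larger harmonic closeness value.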

If you're wondering, the Dove-Owl (Orange Square) with a net benefit value of around 8,000 is the only 15 seed to ever make the Sweet 16, 2013 Florida Gulf Coast, which was a-maz-ing-ly well positioned in the win network. They are still classified as a Dove-Owl, however, because of how centrally they were located in the loss network as well. This is common among most of the lower seeded teams who score huge upsets.


Homophily is an important principle that is theorized to structure much of network science, and there are many different ways to define & quantify it within the field. I've used the Out vs. In Triad/Link ratio published by Chandrasekhar & Jackson in this project. The interactive figure below illustrates that work. When you examine this graphic, keep in mind that the 4 lines (2 for the win network & 2 for the loss network) represent the same teams grouped two separate ways: once by seed & once by EGT species.

To help you understand this graphic, consider that a point's distance away from the gray line is an indication that it is forming non-random triads & links within the two networks. Thus, as a point moves away from the gray line & grows larger along either the x or y axis, homophily decreases & triad/link formation among the members of the network becomes less random. Furthermore, as the ratio, whether Out links / In links or Out triads / In triads, grows, it is an indication that a group (seed or EGT species) has less of a cohesive identity within the given network (i.e. they are forming 'Out' triads or links with other seeds/species faster than they are forming 'In' connections with their own seed/species).

None of the points in the figure fall on or below the gray line, so none of them are 'random', although the 'Dove' species in the loss network is the closest to random. Having all the points above the 1 to 1 line indicates that both networks have a type of 'social pressure' to form triads with other teams regardless of that team's identity. If you think about how the network is formed, this makes sense. The structural queries are weighted by their ability to group together teams with a similar outcome. And, while seed may be an important psychological factor in how a team plays, from the graphic below you can see that it is obviously not a driving factor in how teams group together in the network. These are teams fighting/competing to win, and at the end of the day, it is more important simply to have characteristics that are in common/be grouped with teams who have been successful in the past than it is to have a separate identity as a seed. What the EGT species designation very clearly does is mitigate a large portion of a team's seed-based identity. It provides a much better picture of what a team 'is' than seed.


You can see that homophily is much lower when the teams are grouped by seed by comparing the blue & pink (seed) lines to the orange & yellow lines (EGT Species). The Y axis represents the ratio of out caste triangles to in caste links. Don't let the word 'caste' confuse you. For the blue & pink lines, caste means seed; for the orange & yellow lines caste means EGT Species. An out 'caste' Triangle occurs when three teams are linked together in either network & any one of them is different than the other two.


So, for example, a 6 seed linked to another 6 seed in which both also have links to a common 7 seed is counted as an out 'caste' triangle in the blue & pink lines. A Hawk linked to a Hawk where both share links to a common Owl is an out 'caste' triangle by Species. Likewise, out & in 'caste' links work the same way, but for pairs of teams. Thus, from our first example, the link between the two 6 seeds is an in 'caste' link while the 6 seed linked to the 7 seed is an out caste link. It should be apparent from this description that a triangle is composed of 3 links.
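The counting rules in this example can be sketched in a few lines of Python. This only tallies in & out 'caste' links and triangles for a small undirected network; it does not reproduce the full Chandrasekhar & Jackson ratio, and the team labels and function name are made up for illustration.

```python
from itertools import combinations

def caste_counts(edges, caste):
    """Count in/out 'caste' links & triangles (edges assumed unique, undirected)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    in_links = sum(1 for u, v in edges if caste[u] == caste[v])
    out_links = len(edges) - in_links

    in_tri = out_tri = 0
    for a, b, c in combinations(sorted(neighbors), 3):
        # A triangle requires all three pairwise links to exist.
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            if caste[a] == caste[b] == caste[c]:
                in_tri += 1
            else:
                out_tri += 1
    return {"in_links": in_links, "out_links": out_links,
            "in_triangles": in_tri, "out_triangles": out_tri}

# The example from the text: two 6 seeds linked to each other & to a common 7 seed.
edges = [("six_a", "six_b"), ("six_a", "seven"), ("six_b", "seven")]
caste = {"six_a": 6, "six_b": 6, "seven": 7}
counts = caste_counts(edges, caste)
print(counts)
```

Swapping the `caste` dictionary from seeds to EGT species regroups the same teams & links without touching the network itself, which is how the same teams end up plotted two separate ways.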


What the above figure is telling us, then, is that the last blue dot, the 15 seeds in the Win network, has an out 'seed' triangle to link ratio that is nearly 1600. This means that the 15 seeds have very little 'identity' as such within either network - they don't bond with each other as much as they bond with teams of other seeds. If you compare the location of the 4 species win & loss network points (yellow & orange) to the seed points (blue & pink), you can see that the yellow & orange points have much smaller ratios. This indicates that a team's species provides a much better 'picture' of its identity within the network than does seed.

Finally, as a former student of ecology, I was interested to know what Lotka-Volterra population equations could tell me about the NCAA Tournament. Thus, I put together a series of interactive plotly graphics, one of which is reposted below, that looked at how each EGT species affected every other species. Surprisingly, after 20,000 Evolutionary Game Theory simulations, it indicated that the Hawks & Doves are in a stable balance.


The lines in the figure above represent a carrying capacity isocline for each species & represent the high end of the 95% confidence interval as estimated using the 20,000 EGT simulation results. Regression slope was used to calculate the impact a species has on all of the other species.

The points where the isoclines meet the x & y axes are very important. Each represents either a species' own carrying capacity (i.e. Hawks on the x-axis & Doves on the y-axis) or its projected equivalent in terms of the competing species (i.e. Hawks on the y-axis & Doves on the x-axis). The process of projecting from one species to another is possible because Evolutionary Game Theory (EGT) simulations record the accumulation of energy that results from random interactions among competing species. Using regression, it is possible to translate each species into the functional equivalent of every other species.
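To make the regression step concrete, here is a minimal ordinary least squares slope in plain Python. Which simulation outputs were actually regressed against each other is not spelled out here, so the `hawk_energy` & `dove_energy` lists are hypothetical stand-ins and the numbers are invented.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical per-simulation energy totals for two competing species.
hawk_energy = [10.0, 20.0, 30.0, 40.0]
dove_energy = [48.0, 41.0, 29.0, 22.0]
slope = ols_slope(hawk_energy, dove_energy)
print(f"Each extra unit of Hawk energy changes Dove energy by {slope:.2f} units")
```

Under this reading, the magnitude of a negative slope plays the role of a conversion factor between the two species.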

The graph pictured above is said to be in stable equilibrium because the two carrying capacity isoclines cross, with the x-axis species dominating the y-axis species above the intersection & vice versa. What this means is that below the intersection point and between the two lines, Doves are favored because they have a higher carrying capacity than Hawks (i.e. the Dove isocline intersects the x-axis at a larger value than the Hawk isocline). Above the intersection point and between the two lines, Hawks are favored. Neither is favored in the areas above and below the two lines. While only one of the tournament average points falls between the two lines, & the application of EGT simulations to a single-elimination basketball tournament is a stretch, I find it very compelling that the two species are energetically balanced. If this were a real bird community, these two species would stabilize one another & be able to co-exist.
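The isocline argument can be checked numerically with the textbook two-species Lotka-Volterra competition model, where the Hawk isocline is N_H = K_H - a_HD * N_D and the Dove isocline is N_D = K_D - a_DH * N_H. The sketch below assumes that standard form, with Hawks on the x-axis; the carrying capacities and competition coefficients are made-up numbers, not values from the simulations.

```python
def isocline_intercepts(k_own, alpha):
    """Return (own-axis intercept, other-axis intercept) of N_own = K_own - alpha * N_other."""
    return k_own, k_own / alpha

def stable_coexistence(k_hawk, k_dove, alpha_hd, alpha_dh):
    """True when each isocline hits the rival's axis beyond the rival's own carrying capacity."""
    hawk_on_x, hawk_on_y = isocline_intercepts(k_hawk, alpha_hd)
    dove_on_y, dove_on_x = isocline_intercepts(k_dove, alpha_dh)
    return dove_on_x > hawk_on_x and hawk_on_y > dove_on_y

def equilibrium(k_hawk, k_dove, alpha_hd, alpha_dh):
    """Intersection of the two isoclines as (N_hawk, N_dove)."""
    denom = 1.0 - alpha_hd * alpha_dh
    n_hawk = (k_hawk - alpha_hd * k_dove) / denom
    n_dove = (k_dove - alpha_dh * k_hawk) / denom
    return n_hawk, n_dove

# Hypothetical parameters: alpha_hd is the effect of one Dove on Hawks, alpha_dh the converse.
is_stable = stable_coexistence(100.0, 80.0, 0.5, 0.5)
n_hawk, n_dove = equilibrium(100.0, 80.0, 0.5, 0.5)
print(is_stable, n_hawk, n_dove)
```

With these numbers the Dove isocline reaches the x-axis at 160, beyond the Hawk carrying capacity of 100, and the Hawk isocline reaches the y-axis at 200, beyond the Dove carrying capacity of 80, so the lines cross at a stable point where both species persist.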

Given that each species' population size was assigned using an equation based on two social networks & a hypothesis for how the network centralities relate to Evolutionary Game Theory, these results seem to support this network-based method of classifying the teams. The output reflects a population biology 'state' that is well published and known to exist in nature.

#LotkaVolterra #PredatorPrey #BetweennessCentrality #KeyPlayer #EvolutionaryGameTheory #basketball #MensCollegeBasketball #NCAA #NetworkAnalysis #Homophily #MonteCarloSimulations
