by Jess Behrens
© 2005-2019 Jess Behrens, All Rights Reserved
My first foray into projects that involved the integration of database programming for attribute management of a project intended for statistical analysis came during graduate school. I worked as a GIS Technician on the West Nile Encephalitis Monitoring Program, which required me to learn SQL & VBA. At the time, this was in the internet's infancy, data was still being transmitted primarily by FTP, & that was how we received daily updates from the states. My job was to integrate, clean, & aggregate it by county for mapping.
Since then, I've utilized this skillset on numerous projects, because every GIS project requires attribute management. Because it is ubiquitous, that management is often done in Microsoft Access on a mapping project basis (Postgresql & MySQL are also utilized, but less often). Furthermore, these attributes are often combined with census data to develop predictive models. As such, I've included two examples of my work in this area:
The database from my master's thesis project, the Worldwide Vector Density Database.
A description of the factor analysis work I did with Northwestern University
Worldwide Vector Density Database
While working on the West Nile project, I also was developing a linked, multi-Access database to store disease vector density data obtained anywhere on earth. Designed to be fully integrated with any GIS, this database was intended to make it relatively easy for users to enter and retrieve data, while also providing basic reports that could be customized by the end user. This was 1998 - 2003, so the options were much more limited & functioned using different technologies (i.e. ODBC) & I was a beginner at programming. However, I was able to accomplish many of the goals we set out for the project, despite the fact that my effort became obsolete with the development of the internet as a tool for recording, mapping, & analyzing data.
I've attached a copy of the WVDD (WVDD.zip) to this website for you to download. As I said, it is a linked set of Access databases. So, the files in this .zip will need to be kept together. I have an autoexec function that will run the first time you open the main database, labeled WVDD.mdb. You will get an error message indicating that the database can't find its linked tables. That's supposed to happen. After you open WVDD.mdb a second time, everything will run fine. I've also included a copy of the user's manual I authored for the project in WVDD.zip.
Exploratory Factor Analysis by US Census Block Group
While at Northwestern, I performed 3 separate backwards (subtractive) factor analyses for the entire US by Block Group that utilized 2010 US Census & 2011-2015 ACS 5-yr. estimates. Each of these analyses were based on different 'topics' and included:
Single origin ethnicity by age & educational attainment
Languages spoken in the home by age
The goal was to produce a national dataset that could be reliably used to make predictions at a national level based on regression analysis of local data (i.e. predict diabetes at a national level from Chicago area case data). While census & ACS data is widely used for regression analysis across multiple fields, it is notoriously unreliable, in part because the 'meaning' of census variables varies geographically. Performing a factor analysis helps to correct this problem by adjusting the variables based on their correlations at a national level while maintaining a degree of interpretability through the use of factor weightings. These adjusted correlations are far less likely to produce spurious results when making predictions at a national scale using a regionally based regression.
Liquor Licenses & Gunshot Wounds
A unique project that melded traditional statistical programming & GIS, while with Northwestern University's Medical School, I worked on a project looking at the relationship between gunshot wounds & packaged goods liquor stores. The method involved using the gravity model as described in the FCA 2 Step Method for estimating the probable relationship between a gunshot wound and all of the liquor licenses within the study area.
These gravity estimates were then aggregated by census tract and used as an independent variable in an iterative geographically weighted regression (GWR) analysis. GWR derives from multi-variate Ordinary Least Squares (OLS) regression but focuses on the spatial non-stationarity of the independent variables (measured using the Koenker statistic). Thus, significant, traditional OLS work must first be done prior to GWR as well as during each step of breaking the study areas into regions. Logistic regression was then used to identify the Odds Ratio for liquor licenses within each region.
To ensure that the relationship was not spurious, the process was done repeatedly using multiple business and location types. Schools, churches, grocery stores, etc. were included in the gravity calculations used in this second stage of the analysis. Two comparisons were made:
The first where the liquor licenses were included with all other business types
A second where only the other business types were included (liquor licenses left out)
While risk remained with the liquor licenses included, the risk disappeared when the liquor licenses were dropped.
Of note is the fact that risk was unequal across the study area, and the liquor licenses actually decreased the overall risk of gunshot wounds in some areas. Obviously, the focus was on those areas where the risk significantly increased, but noting this contrary fact is vitally important. The relationship between gunshot wounds and liquor licenses is a complex one that involves multiple social, economic, & spatial dynamics.
Due to publication rules, I can't reproduce the maps and tables here. If you're interested in reading about the study, it has been published in the American Journal of Surgery.
Crandall M, Kucybala K, Behrens J, Schwulst S, Esposito T. Geographic association of liquor licenses and gunshot wounds in Chicago American Journal of Surgery 2015;210(1): 99-105.