Abstract

This thesis will use data analysis to explore changes in voting in Allegheny County over the last six years. Allegheny County provides information on polling locations, voting district boundaries, congressional district boundaries, and House of Representatives district boundaries. In addition to these political data sets, demographic information such as school district boundaries, police zones, and census data will also be considered. Using data from 2014 to the present, the analysis will examine what has changed about voting in Allegheny County over time. This paper will also explore trends in voting in other metropolitan areas, such as Atlanta, and how they compare to Pittsburgh. Atlanta has seen significant changes in recent history and will serve as a useful case study for comparison with Allegheny County.

Introduction

The goal of this project was to perform data analysis using publicly accessible data to see how voting in Pittsburgh compares to other areas and how it has changed over time. Initially, the project was intended to focus on polling booth locations and voting districts, but the formatting of that data did not fit the methodology that was used for data analysis. The project still includes an analysis of demographic data describing Allegheny County and discusses what that information implies about voter suppression.

Background

Discourse about the 2020 election was nearly impossible to avoid in the months before it. Discussion of voter suppression, voter fraud, and access to voting was prominent during this time. Across the nation there were debates over how to make voting accessible during the pandemic, and that conversation included how to avoid the voter suppression that was already widespread in America.
These current events led to the idea of conducting data analysis on these topics, using the data available on Allegheny County to learn what it implies about the state of voter suppression within the county. This location was chosen partly because of our proximity to it and partly because Pittsburgh is in Allegheny County, so the metropolitan area could provide more interesting results.

Literature Review

The purpose of this literature review is to gather background information on what voter suppression is, what caused it in the past, and what contributes to it today. Sufficient background knowledge on voter suppression is necessary for further data analysis. “Passive Voter Suppression: Campaign Mobilization and the Effective Disfranchisement of the Poor” looks at old versus new voter suppression, passive voter suppression, and how to combat passive voter suppression. “The Politics of Voter Suppression: Defending and Expanding Americans’ Right to Vote” is a broader overview of how voter suppression affects voting and elections in America. Both resources were helpful in gaining a basic understanding of what may indicate voter suppression in my research.

Historical Context

The United States has a long history of discrimination affecting voting. When America was first founded, voting laws were decided by each state, and in most cases only white men who owned land had the right to vote (The Library of Congress). “African Americans, women, Native Americans, non-English speakers, and citizens between the ages of 18 and 21 had to fight for the right to vote in this country” (The Library of Congress). The Voting Rights Act was enacted in 1965 to ensure that people of every race had the right to vote (Parish, 2014, p. 1). Even though everyone now has the legal right to vote, there were and still are ways to suppress voting.
Voter Suppression

Voter suppression is the attempt to “restrict the right to vote” (Wang, 2016, p. x). There have been many attempts at voter suppression throughout American history. Examples include poll taxes, grandfather clauses, and literacy tests (Wang, 2016, p. 20). Voter suppression is harmful to democracy, both in the United States and elsewhere. “Every effort at vote suppression harms democracy, and it harms democratic citizenship” (Wang, 2016, p. 109).

Access to Voting

Some states have taken steps to make voting more accessible. In the 2008 presidential election, North Carolina introduced in-person early voting and allowed people to register and vote at the polling place during the early voting period (Wang, 2016, p. 91). North Carolina had 236,700 new voters who took advantage of same-day registration, and it had the largest increase in new voters in the country (Wang, 2016, p. 91).

Analysis Plan

For the data collection process, Data.gov, “The Home of the U.S. Government’s open data,” was used to find data about Allegheny County (U.S. General Services Administration). It provided Census data from 2000 and 2010, information on school districts in Allegheny County, and employment data. There was also data about voting districts and polling locations, but unfortunately its formatting did not work with the program that was used to perform data analysis. For the analysis plan, demographic data describing Allegheny County was used to determine what may be implied about voting. Another goal of the analysis plan is to determine how Pittsburgh compares to other areas in relation to voter suppression. On its own, the information from Allegheny County does not say much about voting, so comparing it with information from other places is an important part of the plan.

Data Design

After finding data to use and developing a plan for how to analyze it, a data design was established to perform that analysis.
The program used to analyze the data, SAS, required a connection to a server, so it was necessary to emulate one. Oracle VM VirtualBox runs a virtual machine that emulates a server, so SAS can run without connecting to a specific network or a specific server. The virtual machine runs in its own window, and SAS can then be used from a web browser on the host computer.

The data that was collected about Allegheny County was uploaded into SAS. Most of the data sets came with a data dictionary, which lists each field and gives a brief description of the data in that field. Since the data came from a few different sources, it was important to read each data dictionary to understand the data before trying to do anything with it. Once the data was uploaded into SAS, different data models were used to visualize each component of the data to make it easier to understand.

Results

This graph shows the average number of riders on different public transportation routes each month. Each symbol represents a different bus route. While this may not initially seem relevant to voting, we can look at where each of these routes is located and see how that compares to the voting districts. Is access to voting proportionate to the number of people who live and work in these areas? Will people who rely on public transportation be able to reach their assigned voting locations? We also see that the average number of riders drops significantly in the time leading up to the election due to the pandemic. Will these people still be able to visit their voting locations, or will they be able to vote remotely?

“In the 2016 presidential election, there was, according to the United States Census, a 30% reported turnout gap between the wealthy and the poor” (Ross & Spencer, 2019, p. 656).
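
The monthly ridership averages and the drop described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the SAS workflow used in this thesis: the route labels, months, rider counts, and the function names `average_by_month` and `percent_change` are all hypothetical and are not taken from the county data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: (route label, "YYYY-MM", average daily riders).
# These values are illustrative only and are NOT taken from the county data.
SAMPLE_RIDERSHIP = [
    ("Route A", "2020-01", 4200),
    ("Route A", "2020-02", 4100),
    ("Route A", "2020-04", 1300),
    ("Route B", "2020-01", 3900),
    ("Route B", "2020-02", 4000),
    ("Route B", "2020-04", 1100),
]


def average_by_month(records):
    """Return {month: mean riders across all routes reported for that month}."""
    by_month = defaultdict(list)
    for _route, month, riders in records:
        by_month[month].append(riders)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def percent_change(before, after):
    """Percent change from `before` to `after`; a negative value is a drop."""
    return (after - before) / before * 100.0


monthly = average_by_month(SAMPLE_RIDERSHIP)
drop = percent_change(monthly["2020-01"], monthly["2020-04"])

for month, riders in monthly.items():
    print(month, riders)
print(f"Change from 2020-01 to 2020-04: {drop:.1f}%")
```

With real data, the same grouping could be done per route instead of across routes, which would show whether the drop in riders is concentrated in particular parts of the county.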
In addition to looking at bus usage throughout Pittsburgh, we can also look at property values to build a stronger profile of what the demographic information tells us about voting activity. In this graph each dot represents a different property, and the dots are sorted by value and location. Income is a strong indicator of whether someone votes. If there is a clear disparity between property values in Allegheny County, that may indicate that there is also a disparity in income and, in turn, a disparity in voting.

This chart shows the annual salary in 2017 and 2020 of workers in Pittsburgh, sorted by their starting date. The different symbols represent different departments. Aside from some outliers, there is no significant difference between the two charts from which to draw a conclusion. We can use the visualization of annual salaries in Allegheny County to make some assumptions in relation to voting, but nothing more.

Discussion

The data analysis did not find any statistically significant evidence of voter suppression in Allegheny County. While none of the data collected implies that there is voter suppression, that does not mean that it does not happen. In addition, Allegheny County does not represent the United States as a whole, and there may still be voter suppression elsewhere.

Comparison to Atlanta

Recently, state lawmakers in Georgia passed provisions to restrict access to early or absentee voting and to require an approved form of identification. Senate Bill 202 has been compared to the voting laws present during the Jim Crow era (Cox, 2021). While some of the data that was analyzed may indicate signs of voter suppression, that seems insignificant when there are clear signs of it occurring right now.
While this project was looking for de facto signs of voter suppression in Allegheny County, a movement for de jure voter suppression was underway in the state where Atlanta is located. The initial plan was to do a case study comparing results about Allegheny County to information on Atlanta, but as of right now that is not a fair comparison to make.

Conclusion

Going into this project, I expected to find gradual changes over the years that might indicate a mild amount of voter suppression. While I did not find that in Allegheny County, there is still clear evidence of voter suppression, just not in Pittsburgh. While I was looking for subtle trends in voting that changed gradually, highly impactful changes to suppress voting were being made in plain sight. This is not the conclusion that I imagined giving when I began my thesis, but it feels fitting. While data analysis and data mining can be used to find deeper insights and structure in the data that we already have, they are not tools to predict the future. My research did not go as planned, but that is part of this as a learning experience.

References

Cox, C. (2021, April 10). Georgia voting law explained: Here’s what you need to know about the state’s new election rules. USA Today. https://www.usatoday.com/story/news/politics/2021/04/10/georgia-new-votinglawexplained/7133587002/

The Library of Congress. (n.d.). The Founders and the Vote. https://www.loc.gov/classroommaterials/elections/right-to-vote/the-founders-and-the-vote/

Parish, H. (2014). Voting Rights Act: Historical Context and Associated Issues and Trends. Nova Science Publishers, Inc.

Roiger, R. J. (2017). Data Mining: A Tutorial-Based Primer, Second Edition. Chapman and Hall/CRC.

Ross, B. L., II, & Spencer, D. M. (2019). Passive Voter Suppression: Campaign Mobilization and the Effective Disfranchisement of the Poor. Northwestern University Law Review, 114(3), 633–703.

U.S. General Services Administration. (n.d.). The home of the U.S. Government’s open data. https://www.data.gov/

Wang, T. (2016). The Politics of Voter Suppression: Defending and Expanding Americans’ Right to Vote. Cornell University Press.