Your browser does not support the HTML5 canvas tag.

Data Analysis

Vermont Department of Financial Regulation Projects

The following are data projects I have done with the state of Vermont that I can share publicly.

Auto Study

Starting in 2023, at the request of the state legislature, the Vermont department of financial regulation began collection data from auto body repair shops, auto insurance companies, and consumer complaints. The data collected in ther study included labor rates (from both the insurer and the auto shop sides), the use of preferred shops, the use of aftermarket parts, the practice of virtual adjusting, the use of supplements, and the effect on the consumer of current insurance and auto shop practices. The goal of the study was to make reccomendations to the state legislature on the need for updated/new legislation and on the need for updated regulation/regulator practices. My role in the auto study was to clean, analyze, and summarize the data that had been collected. I did this with the help of our summer interns. You can read the study here: Auto Study .

Network Adequacy

CMS provides network adequacy standards for health insurers (See CMS network adequacy). States are required to assess health insurance companies according to these standards on a yearly basis. The standard work essentially as follows: for each specialty of health provider (primary care, dermatlogy, etc..) and each population designation (rural, micro, metro, and large metro), the CMS sets a maxmium distance and a maximum time that an insured will have to travel to recieve in-network care. For example, in a large metro area, an insured should not have to travel more than 30 minutes to see an allergist/immunologist. The network adequacy criteria is applied at the county level. A company "passes" for a given county and specialty if 90% of people in that county can travel to an in-network provider of that specialty within the specified time and distance. Health insurance companies provided the Vermont department of financial regulation with a table of providers, their specialties, and their addresses. CMS provided a table of the standards for times and distances. CMS also provided census data with latitude/longitude pairs for a representative sample of Vermont citizens. I cleaned and analyzed this data in R (usigng a package that can find the travel time/distance between two latitude/longitude pairs) and created a scorecard assessing the netowrk adequacy of our health insurers.

AI Use Survey

In collaboration with other states and the NAIC, the Vermont department of financial regulation sent a multi-hundred question questionnaire on the use of AI to the major health insurers operating in Vermont. The questionnaire featured both quantitative and qualititative questions. I analyzed and summarized this data in order to give regulators both in and out of Vermont a clear picture of the level of AI use of these health insurers, their current AI governance structure, and the potential consumer impact.

Projects

Norwich IPEDS Data

The goal of this project was to help the faculty at Norwich University make informed decisions about the future of the school, with a particular focus on budget matters. Using the Integrated Postsecondary Education Data System (IPEDS), I collected information on core revenue and expenses from 2006 to 2022 and on staffing from 2012 to 2022. I also used data from the American Association of University Professors to analyze tenure status from 2002 to 2022. The data from these sources is already mostly clean, so I only had to clean up names and perform some pivots. I then used the tidyverse package to create the visuals and compute some extra columns such as the discount rate. Some of my key observations are that the discount rate when from 27% in 2008 to 59% in 2022 and that the number of part-time faculty has grown from 72 in 2002 to 239 in 2022 while the number of full-time tenure track and tenured faculty numbers have remained relatively constant. Below you will find a summary and full report of the data, a dictionary outlining the specific meanings of the categories, and a repository of the files I used for the analysis.

Best Original Score Data Analysis

In analyzing the best original score category for the Oscars, I hoped to challenge myself with a particularly unclean data set while learning more about two of my interests: movies and music. I scraped the data from the wikipedia page. On that page the nominees and winners are stored in separate tables by decade, values span across multiple rows and columns, category names are intermixed with film names, the nominees are listed in the same cell, separated by commas and the word "and," and the winners are merely highlighted in the tables. In addition to this the category name changed over a dozen times over the years and there is a many-many relationship between years and category names. There is also a many-many relationship between films and nominees. Using power query, I organized this data into one table with columns for year, award show number, film name, category, whether or not it won, and the nominee name. I then used the organized data to show the films and nominees, find the nominees with the most nominations and the most wins, and highlight how the categories have changed over time. I made sure to use slicers and drill-through effectively. The results clearly showed that John Williams was an outlier and that the Newman family was exceptional. Acting on this, I made separate pages dedicated to John Williams and the Newman family. To understand the Newman family I imported the hierarchy chart visual, wrote a table summarizing the family tree and joined it with the table of films. Below you will find a pdf sample of the power BI report as well as the power BI file itself. I do not have the correct license to share a web version of the interactive dashboard, otherwise I would.

Medical Spending of the US Elderly (Updated for 2021)

In 2016, De Nardi, French, Jones, and McCauley analyzed medical spending of the US elderly (link here). They used the Medicare current beneficiary survey (MCBS). I used this paper and the MCBS as on opportunity to learn about processing medical information. In the 2016 paper they found that when looking from 1996 to 2010, a higher percentage of women's health care is paid by Medicaid, and a higher percentage of women's health care costs come from nursing homes. I pulled the MCBS data from the most recent year (2021) to see if these trends had changed. The MCBS data is weighted, so they recommend using the survey package for R to process it. I followed their instructions along with the definitions in their legend and recreated the tables from the 2016 paper with the data from 2021. One major limitation is that for 2021 the MCBS excluded information on people who had spent time in nursing homes from the publicly available data set. Still, I was able to detect some interesting new trends in increased medicine expenses and decreased hospital expenses. Below you will find my report as well as a repository of the files I used for my analysis.

Training

You can see a good portion of my training progress on my LinkedIn Page. Beyond that I've worked through the following resources: