Adding Minority % to the Social Vulnerability Index Calculation
During Week 6 of my US broadband divide analysis project, I ran into a problem. The initial correlation matrix for the entire United States showed virtually no correlation (-0.00) between the Social Vulnerability Index (SVI) and the maximum download speed. This was puzzling, since my earlier Alabama-level analysis had suggested a negative correlation. It made me realize that a relationship that holds at the state level does not necessarily hold in the national aggregate.
To address this, I hypothesized that adding another socioeconomic metric might help uncover hidden associations. I suspected that the minority (non-white) share of the population could be a significant factor in social vulnerability (SVI) and, consequently, in broadband access. To test this, I downloaded two U.S. Census Bureau tables at the block group level, Total Population and White-alone population, and used them to calculate the minority percentage for each block group: the share of the total population that is not White alone.
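The calculation described above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual script: the column names `total_pop` and `white_alone_pop` and the helper `add_minority_pct` are hypothetical, and I am assuming block groups with zero population should get a missing value rather than a percentage.

```python
import pandas as pd


def add_minority_pct(df, total_col="total_pop", white_col="white_alone_pop"):
    """Return a copy of df with a minority_pct column (0-100).

    Column names are hypothetical placeholders for the Census
    Total Population and White-alone population fields.
    """
    out = df.copy()
    total = out[total_col].astype(float)
    white = out[white_col].astype(float)
    # Replace zero totals with NaN so unpopulated block groups
    # end up with a missing percentage instead of a division by zero.
    safe_total = total.where(total > 0)
    out["minority_pct"] = (total - white) / safe_total * 100
    return out


# Tiny made-up example: three block groups, one of them unpopulated.
block_groups = pd.DataFrame(
    {"total_pop": [100, 200, 0], "white_alone_pop": [60, 50, 0]}
)
with_pct = add_minority_pct(block_groups)
```

Keeping unpopulated block groups as missing values, rather than 0%, avoids treating empty areas as if they had no minority residents.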
The next step was to integrate this new feature. I adjusted my feature engineering Python script to calculate the minority percentage and incorporate it alongside median income in the SVI calculation. The resulting correlation matrix still didn't show the strong negative link I anticipated between `max_down_speed` and `SVI` (the coefficient was a low positive 0.02), but including `minority_pct` was still a crucial step. It enriched the dataset and gave a more detailed view of the social makeup of each community, which proved invaluable in the subsequent clustering and supervised modeling stages.
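To make the idea concrete, here is one way such a composite could be built and checked against download speed. The post does not spell out the SVI formula, so everything here is an assumption for illustration: the min-max scaling, the equal weighting of the two components, and the helper names `min_max` and `add_svi`. The demo values are made up.

```python
import pandas as pd


def min_max(series):
    """Scale a series to the 0-1 range (all zeros if it is constant)."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


def add_svi(df):
    """Add an illustrative SVI column; this is an assumed formula."""
    out = df.copy()
    # Lower income means higher vulnerability, so invert the scaled income.
    income_vuln = 1.0 - min_max(out["median_income"])
    minority_vuln = min_max(out["minority_pct"])
    # Equal weighting is an assumption, not the project's documented method.
    out["SVI"] = (income_vuln + minority_vuln) / 2
    return out


# Made-up demo rows, only to show the mechanics.
demo = pd.DataFrame(
    {
        "median_income": [30000, 60000, 90000],
        "minority_pct": [80.0, 40.0, 0.0],
        "max_down_speed": [50.0, 100.0, 1000.0],
    }
)
scored = add_svi(demo)
# Pearson correlation matrix across the engineered and raw features.
corr = scored[["max_down_speed", "SVI", "minority_pct", "median_income"]].corr()
```

Reading `corr.loc["max_down_speed", "SVI"]` gives the single coefficient discussed above; on real national data it can sit near zero even when individual states show a clear trend.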
Ultimately, this adjustment to the EDA provided a richer set of features for my K-Means clustering. The later clustering analysis identified a critical "underserved" archetype (Cluster #2) characterized by high SVI, low income, and a high minority percentage (79.6%). This cluster had an average `max_down_speed` of 85.6 Mbps, which is below the FCC's 100 Mbps download benchmark for broadband. Successfully identifying the most vulnerable target group for digital inclusion programs validated the decision to bring `minority_pct` into the analysis.
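For readers who want to try a similar clustering step, the sketch below uses scikit-learn's `StandardScaler` and `KMeans` and then averages each feature per cluster to read off the archetypes. It is not the project's actual pipeline: the feature list, the helper `cluster_profiles`, the default number of clusters, and the demo rows are all assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed feature set; the real analysis may have used more columns.
FEATURES = ["SVI", "median_income", "minority_pct", "max_down_speed"]


def cluster_profiles(df, n_clusters=3, random_state=42):
    """Label each row with a K-Means cluster and summarize the clusters."""
    # Standardize so income (tens of thousands) does not dominate SVI (0-1).
    scaled = StandardScaler().fit_transform(df[FEATURES])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    labeled = df.assign(cluster=labels)
    # Per-cluster feature means describe each archetype.
    profiles = labeled.groupby("cluster")[FEATURES].mean()
    return labeled, profiles


# Made-up rows with two obvious groups, only to show the mechanics.
demo = pd.DataFrame(
    {
        "SVI": [0.90, 0.85, 0.95, 0.10, 0.15, 0.05],
        "median_income": [28000, 30000, 26000, 95000, 90000, 99000],
        "minority_pct": [80.0, 78.0, 82.0, 10.0, 12.0, 8.0],
        "max_down_speed": [80.0, 90.0, 85.0, 900.0, 950.0, 1000.0],
    }
)
labeled, profiles = cluster_profiles(demo, n_clusters=2)
```

Cluster numbers from K-Means are arbitrary, so an "underserved" group has to be identified from the `profiles` table (high SVI, low income, high minority share, low speed) rather than from the label itself.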