Feature Engineering and Exploratory Data Analysis
This week, I carried out Feature Engineering and Exploratory Data Analysis (EDA), using the Alabama data as a prototype. I created the custom features that define the originality of the project, and then ran statistical and visual analyses to validate my core hypotheses.

In the Feature Engineering phase, I used a Python program to merge the block-group-level broadband speed data (summarized to show the maximum speed) with the median income data and the U.S. Census shapefiles, producing a unified GeoDataFrame. From this GeoDataFrame, I derived three critical features:

Socioeconomic Vulnerability Index (SVI): Calculated by inverting the normalized median income, this score quantifies digital inequity risk, with values closer to 1 indicating high vulnerability (low income).

Neighbor Average Speed: This spatial feature captures the average broadband speed of all adjacent block groups, serving as a proxy for regional infrastructure investment.

...
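
The SVI and Neighbor Average Speed calculations described above can be sketched roughly as follows. This is a minimal illustration, not the program used in the project. The column names (median_income, max_speed), the block-group labels, the toy values, and the choice of min-max scaling for the "normalized" income are all assumptions. The adjacency mapping is written by hand here; in the real pipeline it would presumably be derived from the shapefile geometries.

```python
import pandas as pd


def add_svi(df, income_col="median_income"):
    """Add an 'svi' column: 1 minus the min-max normalized income.

    Min-max scaling is an assumption; the report only says "normalized".
    If every income is identical, the result is NaN (0/0).
    """
    income = df[income_col].astype(float)
    normalized = (income - income.min()) / (income.max() - income.min())
    out = df.copy()
    out["svi"] = 1.0 - normalized  # low income -> value near 1
    return out


def add_neighbor_avg_speed(df, neighbors, speed_col="max_speed"):
    """Add 'neighbor_avg_speed': mean speed over each row's adjacent block groups.

    `neighbors` maps an index label to a list of adjacent index labels
    (hypothetical structure). Rows without neighbors get NaN.
    """
    averages = []
    for label in df.index:
        adjacent = neighbors.get(label, [])
        if adjacent:
            averages.append(df.loc[adjacent, speed_col].mean())
        else:
            averages.append(float("nan"))
    out = df.copy()
    out["neighbor_avg_speed"] = averages
    return out


# Toy data: four made-up block groups arranged in a row (A-B-C-D).
block_groups = pd.DataFrame(
    {
        "median_income": [20000, 40000, 60000, 100000],
        "max_speed": [25, 100, 300, 1000],
    },
    index=["A", "B", "C", "D"],
)
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

features = add_neighbor_avg_speed(add_svi(block_groups), adjacency)
print(features[["svi", "neighbor_avg_speed"]])
```

In this toy example, block group A has the lowest income and therefore an SVI of 1, while D has the highest income and an SVI of 0. Each block group's neighbor average excludes its own speed, so the feature reflects only the surrounding area.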