Posts

Showing posts from September, 2025

Feature Engineering and Exploratory Data Analysis

This week I worked on Feature Engineering and Exploratory Data Analysis (EDA), using the Alabama data as a prototype. I created the custom features that define the originality of the project, then ran statistical and visual analyses to validate my core hypotheses.
In the Feature Engineering phase, I used a Python program to merge the block-group-level broadband speed data (summarized to the maximum speed per block group) with the median income data and the U.S. Census shapefiles, producing a unified GeoDataFrame. From this GeoDataFrame, I derived three critical features:
Socioeconomic Vulnerability Index (SVI): Calculated by inverting the normalized median income, this score quantifies digital inequity risk. Values closer to 1 indicate high vulnerability (low income).
Neighbor Average Speed: This spatial feature captures the average broadband speed of all adjacent block groups, serving as a proxy for regional infrastructure investment. ...
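The two features described above can be sketched in a few lines. This is a dependency-free illustration, not the project's actual program: it assumes min-max normalization (the excerpt does not say which normalization was used), it uses plain dictionaries instead of a GeoDataFrame, and the function names, the adjacency mapping, and the sample values are all hypothetical.

```python
def svi_scores(median_income):
    """Min-max normalize income, then invert so low income scores near 1.

    Assumption: min-max normalization; the original method is not stated.
    """
    lo, hi = min(median_income.values()), max(median_income.values())
    span = hi - lo
    if span == 0:
        # All incomes identical: no relative vulnerability to express.
        return {geoid: 0.0 for geoid in median_income}
    return {geoid: 1.0 - (value - lo) / span
            for geoid, value in median_income.items()}


def neighbor_average_speed(max_speed, neighbors):
    """Mean of the max speeds of each block group's adjacent block groups."""
    result = {}
    for geoid, adjacent in neighbors.items():
        speeds = [max_speed[n] for n in adjacent if n in max_speed]
        # None marks a block group with no neighbors that have speed data.
        result[geoid] = sum(speeds) / len(speeds) if speeds else None
    return result


# Made-up values for three hypothetical block groups.
income = {"A": 30000, "B": 60000, "C": 90000}
speed = {"A": 100, "B": 500, "C": 1000}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

svi = svi_scores(income)
avg = neighbor_average_speed(speed, adjacency)
```

In a GeoDataFrame the adjacency mapping would come from the block group geometries rather than being typed in by hand; the arithmetic stays the same.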

Spatial Joins in QGIS

This week, I visualized broadband availability across Alabama at the census block group level using FCC broadband data. Here's how I did it.
Step 1: Clean and Prepare the Data
I started with two FCC datasets (cable and fiber broadband availability) and a Census dataset for median income.
Merge FCC datasets: Using Python and pandas, I combined the cable and fiber datasets, took the first 12 digits of block_geoid to get the block group, and calculated the maximum advertised download speed for each block group.
Clean Census data: Also in Python, I dropped rows from the Census median income file whose income is the missing-value code ( -666666666 ) and constructed a 12-digit census block group GEOID by combining the state, county, tract, and block group codes.
Output: Two clean CSVs, one for max download speed per block group and one for median income per block group.
Step 2: Load Data into QGIS
Loaded the Alabama census block group shapefile ( tl_2024_01_bg.shp ...
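The cleaning in Step 1 can be sketched as follows. This is a minimal standard-library version rather than the pandas code the post refers to. Only block_geoid and the -666666666 code come from the text above; the column names max_advertised_download_speed, B19013_001E, state, county, tract, and block group are assumptions about the input files, and the sample rows are made up.

```python
MISSING = -666666666  # missing-value code mentioned in the post


def max_speed_by_block_group(rows):
    """Keep the highest advertised download speed per 12-digit block group."""
    best = {}
    for row in rows:
        block_group = row["block_geoid"][:12]  # 15-digit block -> block group
        speed = float(row["max_advertised_download_speed"])  # assumed column
        best[block_group] = max(best.get(block_group, speed), speed)
    return best


def income_by_block_group(rows):
    """Build 12-digit GEOIDs and skip rows carrying the missing-value code."""
    out = {}
    for row in rows:
        income = int(row["B19013_001E"])  # assumed median income column
        if income == MISSING:
            continue
        # state (2) + county (3) + tract (6) + block group (1) = 12 digits
        geoid = (row["state"].zfill(2) + row["county"].zfill(3)
                 + row["tract"].zfill(6) + row["block group"])
        out[geoid] = income
    return out


# Made-up rows standing in for the combined cable and fiber files.
fcc_rows = [
    {"block_geoid": "010010201001000", "max_advertised_download_speed": "300"},
    {"block_geoid": "010010201001001", "max_advertised_download_speed": "1000"},
    {"block_geoid": "010010201002000", "max_advertised_download_speed": "100"},
]
census_rows = [
    {"state": "01", "county": "001", "tract": "020100", "block group": "1",
     "B19013_001E": "52000"},
    {"state": "01", "county": "001", "tract": "020100", "block group": "2",
     "B19013_001E": "-666666666"},
]

speeds = max_speed_by_block_group(fcc_rows)
incomes = income_by_block_group(census_rows)
```

Reading the GEOID parts as strings matters here: Alabama's state code is 01, and the leading zero is lost if the column is parsed as a number.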

Downloading data from census.gov's ACS

The census.gov website hosts ACS (American Community Survey) data, which holds a wealth of information about demographics, median household income, and more. It provides an API for downloading the data. In this tutorial, I will explain how to download the data using API calls, and how to create a Python script that automates the download across multiple API calls.
Step 1: Get an API key. The API key is needed to make the API calls. Sign up and get the key at this URL: https://api.census.gov/data/key_signup.html
Step 2: Assemble the API call URL. The URL depends on the information needed. For example: https://api.census.gov/data/2023/acs/acs5?get=NAME,B19013_001E&for=block%20group:*&in=state:36%20county:119&key=xxxxxxxxxxxxxxxxx
In the above API call URL, B19013_001E is the data point for median household income. in=state:36%20county:119 represents NY state and Westchester county. &...
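As a rough sketch of Step 2, the URL can be assembled in Python so that the same call is repeated for several counties. The helper name build_acs_url, the placeholder key YOUR_KEY, and the second county code are hypothetical and only there for illustration; the endpoint and parameters are the ones from the example URL above. The sketch only builds the URLs and does not send any request.

```python
from urllib.parse import quote

BASE_URL = "https://api.census.gov/data/2023/acs/acs5"


def build_acs_url(state, county, api_key, variables=("NAME", "B19013_001E")):
    """Assemble an ACS 5-year URL for every block group in one county."""
    params = [
        ("get", ",".join(variables)),
        ("for", "block group:*"),
        ("in", f"state:{state} county:{county}"),
        ("key", api_key),
    ]
    # Keep ':', '*' and ',' readable; spaces become %20 as in the example URL.
    query = "&".join(f"{name}={quote(value, safe=':*,')}"
                     for name, value in params)
    return f"{BASE_URL}?{query}"


# One URL per (state, county) pair; the second county code is illustrative.
counties = [("36", "119"), ("36", "005")]
urls = [build_acs_url(state, county, "YOUR_KEY") for state, county in counties]
```

Each URL in the list could then be fetched in a loop, for example with urllib.request.urlopen, and the JSON responses written out as CSV files.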

MSDS Practicum Project - US Broadband divide

The term "US broadband divide" refers to the fact that broadband internet is not available to all Americans. According to the FCC, a download speed of 100 Mbps is the threshold for a connection to be called broadband. The site Ookla.net collects speed tests and shows that 60% of the tests exceed 100 Mbps download speed.
The data from the FCC BDC as of December 2024 shows a steady increase in locations with access to high-speed internet. According to the FCC, an additional one million locations gained access to 100/20 Mbps speeds between June and December 2024. In the same time frame, nearly 7 million additional locations gained access to even faster gigabit speeds (1 Gbps download and 100 Mbps upload). While the overall trend is positive, the persistent percentage of unserved and underserved locations represents a significant population that is still being left behind.
The good news is over the years with expanding cable infrastructure and advancement in other technologies - Terrestrial fixed wireless, Cell...