Data Trimming vs Winsorization
During the data preprocessing stage, I came across several census block groups reporting broadband speeds above 1500 Mbps, some as high as 7500 Mbps. These extreme values could skew my statistical models, especially when analyzing national-level patterns.
At this point, I had two choices. One option was data trimming, which removes extreme outliers from the dataset entirely. This got rid of the very high-speed block groups that were skewing my analysis. However, trimming also means losing geographic coverage: my total went from about 240,000 block groups to about 55,000, something I wanted to avoid in a spatial study, since the dropped block groups showed up as black spaces in the QGIS maps.
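To make the coverage cost of trimming concrete, here is a minimal sketch in plain Python. The block-group IDs, the speeds, and the 1000 Mbps cutoff are all made up for illustration; they are not the values from my dataset.

```python
def trim_above(speeds_by_block_group, max_mbps):
    """Keep only the block groups whose speed is at or below max_mbps."""
    return {
        geoid: mbps
        for geoid, mbps in speeds_by_block_group.items()
        if mbps <= max_mbps
    }

# Hypothetical block-group IDs and speeds (Mbps), for illustration only.
speeds_by_block_group = {
    "bg_001": 25.0,
    "bg_002": 300.0,
    "bg_003": 940.0,
    "bg_004": 1500.0,
    "bg_005": 7500.0,
}

# The 1000 Mbps cutoff is an assumed threshold, not the one used in the study.
kept = trim_above(speeds_by_block_group, max_mbps=1000.0)
dropped = sorted(set(speeds_by_block_group) - set(kept))

print(f"kept {len(kept)} of {len(speeds_by_block_group)} block groups")
print(f"dropped: {dropped}")
```

Every dropped ID is a block group with no value left to draw, which is exactly what produces holes on a map.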
I ultimately chose the second option, Winsorization, which caps extreme values at a chosen percentile instead of removing them. It provided a balanced approach: every block group stays in the dataset, while the modeled broadband speeds stay within a realistic range. This small but important preprocessing decision improved both the stability and interpretability of my final analysis.
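As a rough sketch of the idea, the snippet below caps only the upper tail, since the problem here was unrealistically high speeds. The function name `winsorize_upper`, the nearest-rank percentile rule, the 80th-percentile cutoff, and the sample speeds are all assumptions for illustration, not the exact settings I used.

```python
import math

def winsorize_upper(values, upper_pct=95.0):
    """Cap values above the upper_pct percentile at that percentile.

    Uses the nearest-rank percentile: the smallest observed value that has
    at least upper_pct percent of the data at or below it.
    Returns the capped values (same order, same length) and the cap itself.
    """
    if not values:
        return [], None
    ordered = sorted(values)
    rank = max(1, math.ceil(upper_pct / 100.0 * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap

# Made-up speeds in Mbps, for illustration only.
speeds = [25.0, 50.0, 100.0, 150.0, 300.0, 500.0, 940.0, 1000.0, 1500.0, 7500.0]

# The 80th percentile is an assumed cutoff chosen to suit this tiny sample.
capped, cap = winsorize_upper(speeds, upper_pct=80.0)

print(f"cap: {cap} Mbps")
print(f"rows before: {len(speeds)}, rows after: {len(capped)}")
print(f"max before: {max(speeds)}, max after: {max(capped)}")
```

Unlike trimming, the output has the same number of rows as the input, so no block group disappears from the map; only the values above the cap change.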