Data Prep and EDA
Data Preparation & Exploratory Data Analysis (EDA)
Data Sources and Collection Methods
For this project, three datasets were gathered. Two primary datasets support the goal of forecasting mental health service demand based on projected population growth in Colorado: the Colorado Population Projections dataset and the U.S. Census Bureau's ACS 5-Year Estimates.

This project also uses the Low-Value Care (LVC) dataset from Colorado’s Affordability Dashboard. The dataset summarizes medical services that do not improve patient outcomes, add avoidable costs, and in some cases may cause harm. It enables stakeholders (communities, providers, payers, policymakers, and agencies) to identify where wasteful utilization occurs so that efforts to improve value can be targeted. The data was requested from the Center for Improving Value in Healthcare (CIVHC), the administrator of the Colorado All Payer Claims Database (CO APCD). The CO APCD is the state’s most comprehensive health care claims database, representing the majority of covered lives and payers. It includes the percentage of the population represented in each county, race and ethnicity data, behavioral health services, dental code volume, and vision claims.

Source details for each dataset:
- Colorado Population Projections Dataset
- Source: Colorado Information Marketplace
- Download Format: CSV
- Download Raw Dataset
- U.S. Census Bureau API – ACS 5-Year Estimates
- API Source: https://www.census.gov/data/developers/data-sets.html
- API GET Example:
https://api.census.gov/data/2022/acs/acs5?get=NAME,B01003_001E&for=county:*&in=state:08&key=YOUR_API_KEY
- View Python Notebook for API Collection
- Low-Value Care Dataset
- Source: Colorado Affordability Dashboard (CO APCD data, requested from CIVHC)
- Download Format: Excel
- Download Raw Dataset
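The Census API GET example listed above can be reproduced in Python. The sketch below is an illustration, not the code from the linked notebook. `build_acs_url` and `acs_rows_to_frame` are hypothetical helper names. The sketch relies only on the documented response shape of the Census Data API: a JSON array of arrays whose first row holds the column names, with every value returned as a string.

```python
import pandas as pd

# Hypothetical helper: rebuild the ACS 5-year request shown above.
# B01003_001E is the ACS total-population estimate; state FIPS 08 is Colorado.
def build_acs_url(year: int, api_key: str) -> str:
    return (
        f"https://api.census.gov/data/{year}/acs/acs5"
        f"?get=NAME,B01003_001E&for=county:*&in=state:08&key={api_key}"
    )

# Hypothetical helper: the API returns a list of rows whose first row is the header.
def acs_rows_to_frame(rows: list[list[str]]) -> pd.DataFrame:
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Every value arrives as a string, so cast the population estimate to int.
    df["B01003_001E"] = df["B01003_001E"].astype(int)
    return df

# Usage (needs network access and a real API key):
# import requests
# rows = requests.get(build_acs_url(2022, "YOUR_API_KEY"), timeout=30).json()
# acs = acs_rows_to_frame(rows)
```

Keeping the parsing step separate from the HTTP call makes it possible to test against a saved response without spending API requests.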
Why These Datasets Were Chosen
Population projections provide a foundational view of where service demand is likely to grow, especially among specific age groups and geographic regions. Combining this with U.S. Census estimates allows for a more complete picture of baseline population and demographic trends across Colorado counties.
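A minimal sketch of combining the two sources follows. It assumes the projections table has hypothetical `countyfips`, `county`, `year`, and `totalpopulation` columns, possibly with several rows (for example, age groups) per county and year. It also assumes the ACS table carries the `B01003_001E` and `county` (three-digit FIPS code, such as "001") columns returned by the Census API query listed above. The function name `combine_sources` is illustrative.

```python
import pandas as pd

def combine_sources(projections: pd.DataFrame, acs: pd.DataFrame, year: int) -> pd.DataFrame:
    # Collapse the projections to one row per county for the chosen year.
    proj = (
        projections[projections["year"] == year]
        .groupby(["countyfips", "county"], as_index=False)["totalpopulation"]
        .sum()
        .rename(columns={"totalpopulation": "projected_pop"})
    )
    # ACS values are strings such as "001"; cast both columns to integers.
    acs_small = pd.DataFrame({
        "countyfips": acs["county"].astype(int),
        "acs_pop": acs["B01003_001E"].astype(int),
    })
    # validate= raises an error if either side has duplicate county codes.
    merged = proj.merge(acs_small, on="countyfips", how="left", validate="one_to_one")
    # Positive values mean the projection exceeds the ACS estimate.
    merged["difference"] = merged["projected_pop"] - merged["acs_pop"]
    return merged
```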
Raw Data Snapshots
Example from Colorado Population Projections CSV:

Example from Census API Dataset:

Data Cleaning and Preparation Steps
| Step | Description |
|---|---|
| Check for Missing/Null Values | Checked for missing or null population values to ensure no unexpected entries needed cleaning; none were found. |
| Generate Summary Statistics | Generated summary statistics (mean, min, max, and standard deviation) to identify potential outliers or incorrect values. |
| Data Type Conversion | Converted numeric fields from string to integer for accurate calculations. |
| Sort the Data | Sorted all DataFrame rows from largest to smallest population value. |
| Preview Top Values | Displayed the top 10 rows after sorting to check which counties have the highest reported populations. |
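The cleaning steps in the table can be sketched with pandas as follows. The column name `totalpopulation` is an assumption about the CSV, and `clean_projections` is a hypothetical name. In this sketch the string-to-integer conversion runs before `describe()`, so the summary statistics are computed on numeric values.

```python
import pandas as pd

def clean_projections(df: pd.DataFrame, value_col: str = "totalpopulation") -> pd.DataFrame:
    # Check for missing/null values in every column.
    print(df.isnull().sum())
    # Convert the population field from string to integer, stripping
    # thousands separators such as "1,234" (assumed format).
    df = df.copy()
    df[value_col] = df[value_col].astype(str).str.replace(",", "", regex=False).astype(int)
    # Summary statistics: count, mean, std, min, quartiles, max.
    print(df[value_col].describe())
    # Sort from largest to smallest population value.
    df = df.sort_values(value_col, ascending=False)
    # Preview the 10 largest rows.
    print(df.head(10))
    return df
```

The `astype(int)` call fails if any nulls remain, so the null check has to come first.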
Exploratory Data Analysis (EDA)
Below are example visualizations that were created to explore population trends relevant to mental health service planning:
Least Populated Counties

Top 10 Counties by Population
Most Populated Counties

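The least populated, top 10, and most populated county charts all reduce to a per-county total followed by `nsmallest` or `nlargest`. Here is a sketch, assuming hypothetical `county`, `year`, and `totalpopulation` columns and a hypothetical helper name `extreme_counties`:

```python
import pandas as pd

def extreme_counties(df: pd.DataFrame, year: int, n: int = 10):
    # Total population per county for one year (sums any age/sex rows).
    totals = df[df["year"] == year].groupby("county")["totalpopulation"].sum()
    # nlargest/nsmallest return the n highest and lowest values, already sorted.
    return totals.nlargest(n), totals.nsmallest(n)
```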
Population Projections by County and Year

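A county-by-year view like this one can be built with `pivot_table`. The sketch assumes the hypothetical `county`, `year`, and `totalpopulation` columns. Summing handles datasets that hold several rows per county and year, and missing combinations are filled with 0.

```python
import pandas as pd

def county_year_table(df: pd.DataFrame) -> pd.DataFrame:
    # Rows = counties, columns = projection years, values = summed population.
    return df.pivot_table(
        index="county", columns="year", values="totalpopulation",
        aggfunc="sum", fill_value=0,
    )
```

Calling `county_year_table(df).T.plot()` would then draw one line per county with years on the x-axis.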
Histogram of Total Population Values
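The histogram groups county population values into bins. The sketch below computes the bin counts with NumPy; `plt.hist` draws the same counts. The optional log-spaced bins are a suggestion added here, not part of the original chart. Colorado county populations span several orders of magnitude, so equal-width bins can lump most rural counties into the first bin. `population_histogram` is a hypothetical name.

```python
import numpy as np

def population_histogram(values, bins: int = 10, log_scale: bool = False):
    values = np.asarray(values, dtype=float)
    if log_scale:
        # Log-spaced edges between the smallest and largest value
        # (requires all values to be positive).
        bin_spec = np.logspace(np.log10(values.min()), np.log10(values.max()), bins + 1)
    else:
        # An integer asks NumPy for that many equal-width bins.
        bin_spec = bins
    # Returns the count per bin and the bin edges (one more edge than bins).
    counts, edges = np.histogram(values, bins=bin_spec)
    return counts, edges
```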
Total Projected Population Over Time

Total Population by Year

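Both statewide charts rest on one aggregation: summing county values for each year. The following sketch uses the hypothetical `year` and `totalpopulation` columns and the hypothetical helper `statewide_totals`, and also returns the year-over-year percentage change:

```python
import pandas as pd

def statewide_totals(df: pd.DataFrame) -> pd.DataFrame:
    # Sum every county (and any age/sex rows) within each projection year.
    totals = df.groupby("year")["totalpopulation"].sum().sort_index()
    return pd.DataFrame({
        "total": totals,
        # Percentage change versus the previous year; NaN for the first year.
        "pct_change": (totals / totals.shift(1) - 1) * 100,
    })
```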
Top 5 Counties by Population Share

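A population-share chart divides each county's total by the statewide total. This sketch keeps the top `n` counties and groups the rest into one remainder slice so the shares add up to 100 percent. The column names, the helper name `top_county_share`, and the "All other counties" label are assumptions.

```python
import pandas as pd

def top_county_share(df: pd.DataFrame, year: int, n: int = 5) -> pd.Series:
    # Total population per county for the chosen year.
    totals = df[df["year"] == year].groupby("county")["totalpopulation"].sum()
    # Each county's percentage of the statewide total.
    share = totals * 100 / totals.sum()
    top = share.nlargest(n)
    # Append one remainder slice covering every other county.
    rest = pd.Series({"All other counties": 100 - top.sum()})
    return pd.concat([top, rest])
```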
Distribution of County Populations in Colorado

County Population Spread with Outliers
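Box plots flag outliers with the 1.5 × IQR rule, which is matplotlib's default whisker length. The same rule can list the outlying counties explicitly. `iqr_outliers` is a hypothetical helper, and `k` is the whisker multiplier.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    # First and third quartiles (pandas uses linear interpolation by default).
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the values outside the whisker range.
    return values[(values < lower) | (values > upper)]
```

Passing a Series indexed by county name returns the outlying counties with their populations.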
Summary Observations
Based on initial exploratory data analysis, Colorado’s population projections show notable growth concentrated in urban counties such as Denver, El Paso, and Adams, while many rural counties face flat growth or population decline. This uneven distribution highlights future challenges for mental health service planning. Rapidly growing areas may face demand that local systems are not currently equipped to handle, while rural areas may continue to struggle with limited resources despite smaller populations. Age distribution and population shifts in both the state-level projections and the U.S. Census data suggest that specific demographic groups, such as working-age adults and older adults, will drive much of this demand. These findings underscore the value of predictive modeling tied to population growth as a proactive strategy for identifying future service gaps and ensuring equitable access to mental health care across Colorado’s diverse regions.
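The growth pattern described in this summary can be checked county by county by comparing two projection years. In this pattern, urban counties grow while some rural counties stay flat or decline. The sketch again assumes hypothetical `county`, `year`, and `totalpopulation` columns, and `county_growth` is a hypothetical name. The analyst chooses which two years to compare.

```python
import pandas as pd

def county_growth(df: pd.DataFrame, start_year: int, end_year: int) -> pd.DataFrame:
    # One row per county, one column per year, summed across any age/sex rows.
    totals = df[df["year"].isin([start_year, end_year])].pivot_table(
        index="county", columns="year", values="totalpopulation", aggfunc="sum"
    )
    growth = pd.DataFrame({
        "abs_change": totals[end_year] - totals[start_year],
        "pct_change": (totals[end_year] / totals[start_year] - 1) * 100,
    })
    # Largest absolute gains first; declining counties sort to the bottom.
    return growth.sort_values("abs_change", ascending=False)
```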
Linked Files and Resources
- Colorado Population Projections CSV
- Population_Projections_in_Colorado Code
- Colorado Population Projections Code
- Census API Python Notebook
