Introduction

Abstract

This project analyzes how county-level conditions relate to upward economic mobility for children from low-income families, specifically children whose parents are at the 25th percentile of the national income distribution. The data come from the 1990 birth cohort. The main goal is to identify the county-level characteristics associated with higher economic mobility and to investigate whether growing up in a poorer county necessarily means lower upward mobility. The analysis also seeks to identify the county-level characteristics that best offset the disadvantages associated with poverty. The primary tool is a multiple linear regression model that uses predictors drawn from two Opportunity Insights datasets to predict the response variable: the mean percentile rank in the national distribution of household income at age 27 for children whose parents are at the 25th percentile of national income, pooled across races and genders. Results show that, once other county characteristics are held constant, poverty share is not a statistically significant predictor of mobility (p = 0.091), whereas higher employment rates and larger shares of college-educated adults are associated with significantly higher mobility, suggesting that these characteristics can offset some of the disadvantage of growing up in a poorer county.

Research Questions

  • Which county-level characteristics are associated with higher income mobility for children from low-income families?

  • Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

  • How do county-level income mobility outcomes vary across the United States geographically? Are there notable regional patterns in upward mobility for children from low-income families?

Source

The data come from Opportunity Insights, a nonprofit research organization based at Harvard University that aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that help people rise out of poverty. The sample contains 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.

Background/Significance

Understanding the drivers of economic mobility is a central question in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who need upward mobility the most. Economic mobility reflects how well a society provides a level playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with better or worse outcomes can inform efforts to mitigate the disadvantages of growing up poor.

Data

Variables Used

Response variable:

  • kfr_pooled_pooled_p25: Mean percentile rank, relative to other children born the same year, in the national distribution of household income at age 27, for children whose parents are at the 25th percentile of national income, pooled across races and genders.

Explanatory variables:

  • emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.

  • hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.

  • poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).

  • frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor’s or higher), pooled across races/genders, 1990.

  • singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.

  • share_black1990: Fraction of population identified as Black in the 1990 Census.

  • foreign_share1990: Fraction of residents who are foreign-born in 1990.

  • gini1990: Gini coefficient of income inequality for the county in 1990 (0 = perfect equality, 1 = maximal inequality).

  • pop_pooled1990: Total county population in 1990.

Data Cleaning

Making the dataset

I began by importing two county‑level datasets from the Opportunity Atlas: one containing mobility outcomes by cohort and one containing county covariates. I restricted the sample to the 1990 birth cohort, to keep a single, comparable group of children across counties.

From the outcomes file, I kept only the variables needed for the analysis: state and county identifiers, names, and the mean income rank at age 27 for children whose parents were at the 25th percentile (kfr_pooled_pooled_p25). From the covariates file, I selected county characteristics measured around 1990, including employment rate, median household income, poverty share, college‑educated share, share of single‑parent families, racial composition, immigrant share, inequality (Gini), and population.

Then I merged the two datasets using state and county FIPS codes and checked for missing values across all variables. Observations with any missing values were removed using complete‑case analysis, resulting in a cleaned dataset (df_complete) with only counties that have non‑missing information on both mobility and covariates. This cleaned dataset is used for all descriptive analysis and regression models.

Complete‑case analysis (dropping rows with any missing values) simplifies the workflow and ensures that all variables are observed for every county. However, it may reduce the sample size and can introduce bias if the counties with missing data differ systematically from those with complete data (for example, if poorer or smaller counties are more likely to have missing covariates). The results primarily describe counties with relatively complete economic and demographic records.
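The concern about systematic missingness can be checked directly. The sketch below is a minimal illustration on a small made-up data frame (the `toy` object and its values are hypothetical, not Opportunity Insights data): it flags the rows that complete-case analysis would drop and compares a fully observed variable, population, between dropped and retained counties.

```r
# Hypothetical toy data: six counties, two with a missing covariate.
toy <- data.frame(
  county = c("A", "B", "C", "D", "E", "F"),
  pop_pooled1990 = c(1200, 250000, 900, 48000, 75000, 310000),
  poor_share_pooled1990 = c(NA, 0.12, NA, 0.20, 0.15, 0.09),
  kfr_pooled_pooled_p25 = c(0.41, 0.47, 0.39, 0.44, 0.46, 0.50)
)

# TRUE for rows with no missing values (the rows complete-case analysis keeps).
toy$retained <- complete.cases(toy)

# How many rows are dropped, and which variables cause the drops?
n_dropped <- sum(!toy$retained)
missing_by_variable <- colSums(is.na(toy))

# Compare a fully observed variable across dropped and retained counties.
# A large gap would suggest the data are not missing completely at random.
median_pop <- tapply(toy$pop_pooled1990, toy$retained, median)

print(n_dropped)
print(missing_by_variable)
print(median_pop)
```

In this toy example the dropped counties are much smaller than the retained ones, which is the pattern that would signal possible bias in the real data.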

EDA

Outcome Distribution

The distribution of economic mobility is roughly bell‑shaped and centered a little below the middle of the national income distribution. Most counties cluster between about the 40th and 55th percentiles, with relatively few counties at very low or very high mobility. This supports treating mobility as a continuous outcome and suggests that, while there is meaningful variation across counties, extreme mobility outcomes are uncommon.

Mobility and County Income

This scatterplot shows that most counties cluster between about $30,000 and $70,000 in median income and around the 40th–50th percentiles of mobility. There is no strong upward or downward trend: richer counties do not consistently have higher mobility, and some relatively low‑income counties achieve fairly high mobility. This suggests that median income alone is not a strong predictor of upward mobility, motivating the need to include other county characteristics in the regression model.

Summary Statistic Table

| Variable | Mean | SD | Min | Max |
|---|---:|---:|---:|---:|
| Income mobility (p25) | 0.459 | 0.058 | 0.203 | 0.917 |
| Employment rate (27) | 0.680 | 0.077 | 0.307 | 0.863 |
| Median income | 59005.645 | 16275.520 | 21125.921 | 145716.012 |
| Poverty rate | 0.167 | 0.079 | 0.022 | 0.600 |
| College-educated share | 0.135 | 0.066 | 0.037 | 0.534 |
| Single-parent share | 0.203 | 0.067 | 0.048 | 0.602 |
| Black population share | 0.089 | 0.145 | 0.000 | 0.862 |
| Foreign-born share | 0.718 | 0.152 | 0.135 | 0.972 |
| Gini (inequality) | 0.424 | 0.038 | 0.271 | 0.592 |
| Population | 79891.963 | 264827.279 | 675.000 | 8863164.000 |

Correlation Among Predictors

Correlation Heatmap

Strongest correlations:

  • Median household income with both employment rate and share of college-educated adults
  • Employment and education
  • Poverty and both employment and education
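Strong correlations among predictors can inflate the standard errors of regression coefficients. One common way to quantify this is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes it in base R on simulated data; the variables `income`, `employment`, and `college` and the helper `vif_base()` are hypothetical stand-ins, and this check was not part of the original analysis.

```r
set.seed(42)
n <- 500
# Simulated predictors: income is built to correlate with the other two.
employment <- rnorm(n)
college <- 0.5 * employment + rnorm(n)
income <- 0.7 * employment + 0.6 * college + rnorm(n, sd = 0.5)
sim <- data.frame(employment, college, income)

# Hypothetical helper: VIF for every column of a data frame of predictors.
vif_base <- function(predictors) {
  sapply(names(predictors), function(v) {
    # Regress predictor v on all remaining predictors.
    fit <- lm(reformulate(setdiff(names(predictors), v), response = v),
              data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_base(sim)
print(round(vifs, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient is estimated less precisely because of overlap with the other predictors.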

Map of U.S.

Methods

The Model

Modeling Approach

This project uses multiple linear regression to model the relationship between county-level characteristics and economic mobility for children from low-income families. The response variable is the mean national household income percentile at age 27 for children whose parents were at the 25th percentile of the income distribution (kfr_pooled_pooled_p25). This modeling approach is appropriate because the response variable is continuous, and the goal is to quantify how it changes with several county-level characteristics measured on numeric scales, such as poverty rate, employment rate, education, and income inequality. This method allows estimating the expected difference in economic mobility associated with a one-unit change in each predictor while holding the others constant.

Regression Model

The first regression model estimates the mean percentile rank at age 27, relative to other children born in the same year, for children whose parents are at the 25th percentile of national income.

Each coefficient reflects the expected difference in the outcome (income rank) associated with a one-unit change in that characteristic, holding the other predictors constant. A positive coefficient indicates that counties with more of that characteristic tend to have higher mobility; a negative coefficient indicates the opposite.

The model is given by:

kfr_pooled_pooled_p25 = 0.5079 + 0.1875 * emp_pooled1990 - 1.044e-06 * hhinc_median_pooled1990 + 0.03723 * poor_share_pooled1990 + 0.1123 * frac_coll_pooled1990 - 0.4581 * singlepar_pooled1990 - 0.05009 * share_black1990 + 0.003764 * foreign_share1990 - 0.09823 * gini1990 + 7.532e-09 * pop_pooled1990
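As a quick sanity check on the fitted equation, the sketch below plugs the predictor means from the summary statistic table into the reported coefficients. An ordinary least squares fit with an intercept predicts the mean of the response when every predictor is at its mean, so the result should land near the mean mobility of 0.459. The match can only be approximate here, because the coefficients and means are rounded and the summary table may not be computed on exactly the regression sample.

```r
# Coefficients as reported in the model equation.
coefs <- c(
  intercept = 0.5079,
  emp_pooled1990 = 0.1875,
  hhinc_median_pooled1990 = -1.044e-06,
  poor_share_pooled1990 = 0.03723,
  frac_coll_pooled1990 = 0.1123,
  singlepar_pooled1990 = -0.4581,
  share_black1990 = -0.05009,
  foreign_share1990 = 0.003764,
  gini1990 = -0.09823,
  pop_pooled1990 = 7.532e-09
)

# Predictor means as reported in the summary statistic table; the leading 1
# multiplies the intercept.
means <- c(1, 0.680, 59005.645, 0.167, 0.135, 0.203, 0.089, 0.718, 0.424,
           79891.963)

# Predicted mobility for a county that is average on every predictor.
pred_at_means <- sum(coefs * means)
print(round(pred_at_means, 3))
```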

[Diagnostic plots for the regression model: Residuals vs. Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage]

Interpretation

The diagnostic plots for the regression model suggest that the linear model is a reasonable approximation for these data.

  • In the Residuals vs Fitted plot, residuals are scattered around zero but with a slight curved pattern, especially at lower and higher fitted values. This suggests some mild nonlinearity. The linear model may not capture the relationship perfectly across the whole range of predicted mobility. The vertical spread of residuals is fairly similar across fitted values, so heteroskedasticity does not appear severe, though some larger residuals occur at the extremes.

  • In the Normal Q–Q plot, most points follow the reference line closely, with heavier tails at both extremes, telling us that the residuals are approximately normal but with a few more extreme counties than the model would predict under perfect normality.

  • The Scale–Location plot shows a relatively even spread of standardized residuals across fitted values with only a gentle upward trend in the red line, again suggesting that any non-constant variance is modest.

  • The Residuals vs Leverage plot identifies a few counties with higher leverage and larger residuals, but almost all points lie well inside the Cook’s distance contours, so no single county appears to dominate the fit.

Taken together, these diagnostics support using the multiple regression model for interpretation, while acknowledging that p‑values and confidence intervals should be viewed as approximate because of mild tail non-normality and slightly changing variance. The departures from ideal assumptions (slight nonlinearity and heavier tails) are mild, so the linear model remains useful for describing broad relationships between county characteristics and economic mobility. They mainly affect the precision of the coefficient estimates and the accuracy of the p-values rather than the overall direction of the associations. The model’s findings should therefore be interpreted as approximate associations rather than causal effects, yet they remain informative for addressing the research questions.
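Because the diagnostics hint at mildly non-constant variance, one common response is to compare the usual standard errors with heteroskedasticity-robust ones. The sketch below computes HC1 robust standard errors in base R on simulated data. It illustrates the technique only and is not part of the original analysis; the simulated `x`, `y`, and `fit` objects are hypothetical.

```r
set.seed(1)
n <- 200
x <- runif(n)
# Simulated outcome whose noise grows with x (heteroskedastic by design).
y <- 0.5 + 0.2 * x + rnorm(n, sd = 0.02 + 0.05 * x)
fit <- lm(y ~ x)

X <- model.matrix(fit)   # n x p design matrix, including the intercept
e <- residuals(fit)      # OLS residuals

# Sandwich estimator: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat <- crossprod(X * e)  # multiplies row i of X by e_i, then forms the cross-product
vcov_hc0 <- bread %*% meat %*% bread

# HC1 applies a small-sample degrees-of-freedom correction to HC0.
vcov_hc1 <- vcov_hc0 * n / (n - ncol(X))

robust_se <- sqrt(diag(vcov_hc1))
classical_se <- sqrt(diag(vcov(fit)))
print(cbind(classical_se, robust_se))
```

If the robust and classical standard errors are close, conclusions drawn from the usual p-values are unlikely to change.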

Results

Research question 1

Which county-level characteristics are associated with higher income mobility for children from low-income families?

According to the regression model, counties with higher employment rates, a larger share of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a larger Black population share, greater income inequality, and, holding the other predictors constant, higher median household income tend to have lower upward mobility. Each of these predictors is statistically significant at the 5% level; poverty share (p = 0.091) and foreign-born share (p = 0.497) are not. The single-parent share has by far the largest test statistic (t = -25.7).

Regression Coefficient Table

| term | estimate | std.error | statistic | p.value |
|---|---:|---:|---:|---:|
| (Intercept) | 0.508 | 0.020 | 25.242 | 0.000 |
| emp_pooled1990 | 0.187 | 0.015 | 12.329 | 0.000 |
| hhinc_median_pooled1990 | 0.000 | 0.000 | -11.390 | 0.000 |
| poor_share_pooled1990 | 0.037 | 0.022 | 1.688 | 0.091 |
| frac_coll_pooled1990 | 0.112 | 0.017 | 6.704 | 0.000 |
| singlepar_pooled1990 | -0.458 | 0.018 | -25.735 | 0.000 |
| share_black1990 | -0.050 | 0.008 | -6.605 | 0.000 |
| foreign_share1990 | 0.004 | 0.006 | 0.679 | 0.497 |
| gini1990 | -0.098 | 0.032 | -3.047 | 0.002 |
| pop_pooled1990 | 0.000 | 0.000 | 2.588 | 0.010 |
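Because most predictors are fractions between 0 and 1, a one-unit change corresponds to moving from 0% to 100%, which no county does. A more interpretable scale is a 10 percentage point change. The sketch below rescales the reported estimates that way and expresses the result in percentile points of income rank (the outcome is also stored as a fraction). The 10-point step is an illustrative choice, not one made in the original analysis.

```r
# Estimates for the share-type predictors, from the coefficient table.
estimates <- c(
  emp_pooled1990 = 0.187,
  poor_share_pooled1990 = 0.037,
  frac_coll_pooled1990 = 0.112,
  singlepar_pooled1990 = -0.458,
  share_black1990 = -0.050
)

# A 10 percentage point change is 0.10 on the 0-1 scale; multiplying by 100
# converts the outcome from a fraction to percentile points.
step <- 0.10
change_in_percentile_points <- estimates * step * 100

print(round(change_in_percentile_points, 2))
```

For example, a county whose single-parent share is 10 percentage points higher is predicted to have a mean income rank about 4.6 percentile points lower, holding the other predictors constant.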

Research Question 2

Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

To answer this question, I estimated regression models with and without college-educated share and employment rate and used AIC to compare model fit. The AIC rose from -10901.95 to -10658.01 when these two variables were removed, so they explain variation in mobility that poverty and the remaining predictors do not. This is consistent with the idea that growing up in a poorer county does not always mean lower upward mobility: counties with similar poverty shares can differ in mobility depending on their education and employment levels.

AIC Comparison of Regression Models

| Model | Description | AIC |
|---|---|---:|
| Model 1 | All predictors | -10901.95 |
| Model 2 | No college, no employment | -10658.01 |

As shown in the table, the AIC of the full model is about 244 units lower, meaning that adding education and employment substantially improves fit even after the penalty for two extra parameters.
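The comparison can be reproduced in outline as follows. The sketch fits a full and a reduced model on simulated data, compares them with `AIC()`, and adds a nested F-test with `anova()` as a complementary check. The simulated variables and the `full` and `reduced` model objects are hypothetical stand-ins for the dashboard's `model1` and `model2`, and the F-test is an addition that the original analysis did not use.

```r
set.seed(7)
n <- 300
poverty <- runif(n, 0.02, 0.60)
college <- runif(n, 0.04, 0.50)
employment <- runif(n, 0.30, 0.90)
# Simulated outcome in which college and employment genuinely matter.
mobility <- 0.40 + 0.10 * college + 0.20 * employment - 0.05 * poverty +
  rnorm(n, sd = 0.03)
sim <- data.frame(mobility, poverty, college, employment)

full <- lm(mobility ~ poverty + college + employment, data = sim)
reduced <- lm(mobility ~ poverty, data = sim)

# Lower AIC indicates better fit after penalizing extra parameters,
# so a positive difference favors the full model.
aic_diff <- AIC(reduced) - AIC(full)

# Nested F-test: do college and employment jointly improve the fit?
f_test <- anova(reduced, full)
p_value <- f_test[["Pr(>F)"]][2]

print(aic_diff)
print(p_value)
```

In the dashboard's table the same difference is -10658.01 - (-10901.95) = 243.94.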

In the bar chart below, the coefficients for college-educated share and employment rate are several times larger than the coefficient for poverty rate, suggesting that they are more strongly associated with later economic success.
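Raw coefficients are easier to compare when the predictors vary by similar amounts. One hedged way to check the comparison is to standardize each coefficient by the standard deviations of the predictor and the outcome, using the values reported in the summary statistic and coefficient tables. This is a supplementary sketch rather than a calculation from the original analysis, and the summary-table standard deviations may come from a slightly different sample than the regression.

```r
# Estimates from the coefficient table.
b <- c(poverty = 0.037, college = 0.112, employment = 0.187)
# Standard deviations from the summary statistic table.
sd_x <- c(poverty = 0.079, college = 0.066, employment = 0.077)
sd_y <- 0.058  # SD of income mobility (p25)

# Standardized coefficient: expected change in the outcome, in SD units,
# for a one-SD increase in the predictor.
beta_std <- b * sd_x / sd_y

print(round(beta_std, 3))
```

On this scale employment remains the largest of the three and poverty the smallest, in line with the bar chart.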

Research Question 3

How do county-level income mobility outcomes vary across the United States geographically? Are there notable regional patterns in upward mobility for children from low-income families?

Map of Mean Economic Mobility (p25) by State

The map shows clear geographic variation in upward mobility across the United States. Children from low‑income families tend to experience the highest average mobility in the Great Plains, Upper Midwest, and parts of the Mountain West, where states are shaded in lighter colors. In contrast, much of the Southeast and parts of the Rust Belt show darker shades, indicating lower mobility outcomes. The coasts are more mixed, with pockets of both high and low mobility. Overall, a child’s chances of moving up economically are strongly associated with the region in which they grow up.
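The state values on the map are simple averages of county values, so a small rural county counts as much as a large urban one. The sketch below contrasts that unweighted mean with a population-weighted mean on made-up numbers. The `toy` data frame is hypothetical, and weighting was not used in the original map.

```r
# Hypothetical counties in two states.
toy <- data.frame(
  state_name = c("X", "X", "X", "Y", "Y"),
  pop_pooled1990 = c(2000, 5000, 900000, 40000, 60000),
  kfr_pooled_pooled_p25 = c(0.52, 0.50, 0.40, 0.45, 0.47)
)

# Unweighted mean: every county counts equally (as in the map).
unweighted <- tapply(toy$kfr_pooled_pooled_p25, toy$state_name, mean)

# Population-weighted mean: larger counties count for more.
weighted <- sapply(split(toy, toy$state_name), function(d) {
  weighted.mean(d$kfr_pooled_pooled_p25, w = d$pop_pooled1990)
})

print(round(cbind(unweighted, weighted), 3))
```

In state X the single large county pulls the weighted mean well below the unweighted one, which shows how the choice of averaging can change a state's shade on the map.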

Conclusion

Discussion

In doing this project, I learned a lot about which variables help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis suggests that counties with more job opportunities and higher levels of education tend to have higher mobility, which can offset some of that disadvantage.

For Research Question 1, the regression results suggest that higher employment rates and a larger share of college-educated adults are strong positive predictors of mobility, while a higher single-parent share and greater inequality are linked to lower mobility; poverty share itself is not statistically significant once the other predictors are included. For Research Question 2, the AIC comparison shows that adding education and employment greatly improves model fit relative to a model that omits them, indicating that strong labor markets and education can partly offset the disadvantages of growing up in a poorer county. For Research Question 3, the geographic patterns in mobility show that upward mobility is not evenly distributed across the United States: children from low-income families tend to face better prospects in parts of the Great Plains, Upper Midwest, and Mountain West than in much of the Southeast and Rust Belt, underscoring how place itself shapes opportunity.

However, the analysis is limited by cross-sectional, observational data and missing factors such as school quality or neighborhood conditions, so we cannot make strong causal claims. Future work could use richer neighborhood measures and causal methods to better understand mechanisms. These findings still point toward policies that strengthen local labor markets and expand educational opportunities as promising ways to promote upward mobility.

About the Author

My name is Scott Robbins, and I am currently a Senior at the University of Dayton pursuing a Bachelor of Arts in Economics with a minor in Data Analytics. Over the summer I worked as a sales intern at a crane rental company. After graduation, I hope to work in data analytics or economic research.

References

Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf

Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf

---
title: "Drivers of Economic Mobility, by Scott Robbins"
output: 
  flexdashboard::flex_dashboard:
    theme: simplex

    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
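library(flexdashboard)
library(broom)      # tidy() for the coefficient table
library(knitr)      # kable() for tables
library(corrplot)   # correlation heatmap
library(MASS)       # stepAIC(); attached before tidyverse so MASS::select() does not mask dplyr::select()
library(tidyverse)  # attaches dplyr, ggplot2, readr, and stringr
library(maps)       # state and county polygons for map_data()
library(gridExtra)
library(plotly)     # ggplotly() for the interactive map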
df1 <- read_csv("~/Downloads/county_by_cohort_estimates.csv")
df2 <- read_csv("~/Downloads/Table_8_county_covariates.csv")
# Keep the 1990 birth cohort and only the columns needed for the analysis.
outcomes_small <- df1 %>%
  filter(cohort == 1990) %>%
  dplyr::select(state, county, state_name, county_name,
                kfr_pooled_pooled_p25)

covars_small <- df2 %>%
  dplyr::select(
    state, county,
    emp_pooled1990,
    hhinc_median_pooled1990,
    poor_share_pooled1990,
    frac_coll_pooled1990,
    singlepar_pooled1990,
    share_black1990,
    foreign_share1990,
    gini1990,
    pop_pooled1990
  )

df <- outcomes_small %>%
  inner_join(covars_small, by = c("state", "county"))

colSums(is.na(df))

df_complete <- df[complete.cases(df), ]

colSums(is.na(df_complete))
# Full model: all nine county-level predictors.
model1 <- lm(kfr_pooled_pooled_p25 ~ emp_pooled1990 + hhinc_median_pooled1990 +
               poor_share_pooled1990 + frac_coll_pooled1990 + singlepar_pooled1990 +
               share_black1990 + foreign_share1990 + gini1990 + pop_pooled1990,
             data = df_complete)
# Reduced model: drops employment rate and college-educated share.
model2 <- lm(kfr_pooled_pooled_p25 ~ hhinc_median_pooled1990 +
               poor_share_pooled1990 + singlepar_pooled1990 +
               share_black1990 + foreign_share1990 + gini1990 + pop_pooled1990,
             data = df_complete)

quant_vars <- c("emp_pooled1990", "hhinc_median_pooled1990", "poor_share_pooled1990", 
                "frac_coll_pooled1990", "singlepar_pooled1990", "share_black1990", 
                "foreign_share1990", "gini1990", "pop_pooled1990")

stepwise_aic <- stepAIC(model1, direction = "both", trace = TRUE)
stepwise_aic2 <- stepAIC(model2, direction = "both", trace = TRUE)
pretty_names <- c(
  "Income mobility (p25)",
  "Employment rate (27)",
  "Median income",
  "Poverty rate",
  "College-educated share",
  "Single-parent share",
  "Black population share",
  "Foreign-born share",
  "Gini (inequality)",
  "Population"
)
# One row per variable: mean, SD, min, and max (missing values ignored).
summary_vars <- c("kfr_pooled_pooled_p25", quant_vars)
summary_long <- data.frame(
  Variable = pretty_names,
  Mean = sapply(df[summary_vars], mean, na.rm = TRUE),
  SD   = sapply(df[summary_vars], sd, na.rm = TRUE),
  Min  = sapply(df[summary_vars], min, na.rm = TRUE),
  Max  = sapply(df[summary_vars], max, na.rm = TRUE)
)

summary_table <- df %>%
  dplyr::select(all_of(c("kfr_pooled_pooled_p25", quant_vars))) %>%
  summarise(across(everything(),
                   list(Mean = ~mean(., na.rm=TRUE),
                        SD = ~sd(., na.rm=TRUE),
                        Min = ~min(., na.rm=TRUE),
                        Max = ~max(., na.rm=TRUE)), 
                   .names = "{.col}_{.fn}"))

cor_matrix <- cor(df[, c("kfr_pooled_pooled_p25", quant_vars)], use = "complete.obs")
county_map <- map_data("county")
county_map <- county_map %>%
  mutate(
    state = tolower(region),
    county = tolower(subregion)
  )

df_map <- df %>%
  mutate(
    state = tolower(state_name),   
    county = tolower(county_name)
  )

plot_data <- inner_join(county_map, df_map, by = c("state", "county"))
coef_table <- tidy(model1)



aic1 <- AIC(model1)
aic2 <- AIC(model2)


aic_table <- data.frame(
  Model = c("Model 1", "Model 2"),
  Description = c("All predictors", "No college, no employment"),
  AIC = c(aic1, aic2)
)

main_effects <- data.frame(
  Variable = c("Poverty Rate", "College-Educated Share (age 25+)", "Emp. Rate at age 27"),
  Coefficient = c(
    coef(model1)[["poor_share_pooled1990"]],
    coef(model1)[["frac_coll_pooled1990"]],
    coef(model1)[["emp_pooled1990"]]
  )
)


us_states <- map_data("state")

us_states$region <- str_to_title(us_states$region)

df_group <- df_complete |>
  group_by(state_name) |>
  summarize(kfr_pooled_pooled_p25 = mean(kfr_pooled_pooled_p25))

map_df <- us_states %>%
  left_join(
    df_group,
    by = c("region" = "state_name")
  )

map_df$kfr_pooled_pooled_p25_percent <- round(map_df$kfr_pooled_pooled_p25*100, 2)


p1 <- ggplot(map_df, aes(long, lat)) +
  # `text` is not a ggplot2 aesthetic; it is passed through for the ggplotly() tooltip.
  geom_polygon(aes(group = group,
                   fill = kfr_pooled_pooled_p25,
                   text = paste0(region, ":\n",
                                 kfr_pooled_pooled_p25_percent, " %")),
               colour = "white") +
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "C", name = "Average Mobility") +
  labs(title = "Average Economic Mobility by State") +
  theme_void()
```

Introduction
===
Column {data-width=330}
---

### Abstract
This project analyzes how county-level conditions relate to upward economic mobility for children from low-income families, specifically children whose parents are at the 25th percentile of the national income distribution. The data come from the 1990 birth cohort. The main goal is to identify the county-level characteristics associated with higher economic mobility and to investigate whether growing up in a poorer county necessarily means lower upward mobility. The analysis also seeks to identify the county-level characteristics that best offset the disadvantages associated with poverty. The primary tool is a multiple linear regression model that uses predictors drawn from two Opportunity Insights datasets to predict the response variable: the mean percentile rank in the national distribution of household income at age 27 for children whose parents are at the 25th percentile of national income, pooled across races and genders. Results show that, once other county characteristics are held constant, poverty share is not a statistically significant predictor of mobility (p = 0.091), whereas higher employment rates and larger shares of college-educated adults are associated with significantly higher mobility, suggesting that these characteristics can offset some of the disadvantage of growing up in a poorer county.


Column {data-width=330}
-----------------------------------------------------------------------

### Research Questions

* Which county-level characteristics are associated with higher income mobility for children from low-income families?

* Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

* How do county-level income mobility outcomes vary across the United States geographically? Are there notable regional patterns in upward mobility for children from low-income families?

### Source
The data come from Opportunity Insights, a nonprofit research organization based at Harvard University that aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that help people rise out of poverty. The sample contains 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.


Column {data-width=340}
---

### Background/Significance

Understanding the drivers of economic mobility is a central question in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who need upward mobility the most. Economic mobility reflects how well a society provides a level playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with better or worse outcomes can inform efforts to mitigate the disadvantages of growing up poor.







Data
===


Column {.tabset}
---






### Variables Used

#### Response variable: 
- kfr_pooled_pooled_p25: Mean percentile rank, relative to other children born the same year, in the national distribution of household income at age 27, for children whose parents are at the 25th percentile of national income, pooled across races and genders.

#### Explanatory variables:
- emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.

- hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.

- poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).

- frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor's or higher), pooled across races/genders, 1990.

- singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.

- share_black1990: Fraction of population identified as Black in the 1990 Census.

- foreign_share1990: Fraction of residents who are foreign-born in 1990.

- gini1990: Gini coefficient of income inequality for the county in 1990 (0 = perfect equality, 1 = maximal inequality).

- pop_pooled1990: Total county population in 1990.


### Data Cleaning

#### Making the dataset

I began by importing two county‑level datasets from the Opportunity Atlas: one containing mobility outcomes by cohort and one containing county covariates. I restricted the sample to the 1990 birth cohort, to keep a single, comparable group of children across counties.

From the outcomes file, I kept only the variables needed for the analysis: state and county identifiers, names, and the mean income rank at age 27 for children whose parents were at the 25th percentile (kfr_pooled_pooled_p25). From the covariates file, I selected county characteristics measured around 1990, including employment rate, median household income, poverty share, college‑educated share, share of single‑parent families, racial composition, immigrant share, inequality (Gini), and population.

Then I merged the two datasets using state and county FIPS codes and checked for missing values across all variables. Observations with any missing values were removed using complete‑case analysis, resulting in a cleaned dataset (df_complete) with only counties that have non‑missing information on both mobility and covariates. This cleaned dataset is used for all descriptive analysis and regression models.

Complete‑case analysis (dropping rows with any missing values) simplifies the workflow and ensures that all variables are observed for every county. However, it may reduce the sample size and can introduce bias if the counties with missing data differ systematically from those with complete data (for example, if poorer or smaller counties are more likely to have missing covariates). The results primarily describe counties with relatively complete economic and demographic records.



EDA
===



Column {.tabset}
---

### Outcome Distribution

#### 

```{r}
# Histogram of the outcome across counties.
ggplot(df_complete, aes(x = kfr_pooled_pooled_p25)) +
  geom_histogram(bins = 30, fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Distribution of Economic Mobility",
    x = "Average Economic Mobility",
    y = "Number of counties"
  )

```

#### 

The distribution of economic mobility is roughly bell‑shaped and centered a little below the middle of the national income distribution. Most counties cluster between about the 40th and 55th percentiles, with relatively few counties at very low or very high mobility. This supports treating mobility as a continuous outcome and suggests that, while there is meaningful variation across counties, extreme mobility outcomes are uncommon.


### Mobility and County Income

####

```{r}
# Scatterplot of mobility against county median household income.
ggplot(df_complete, aes(x = hhinc_median_pooled1990,
                        y = kfr_pooled_pooled_p25)) +
  geom_point(alpha = 0.3) +
  theme_minimal() +
  labs(
    title = "Mobility vs. Median Household Income",
    x = "Median household income (2023 dollars)",
    y = "Income percentile at age 27"
  )
```

####

This scatterplot shows that most counties cluster between about $30,000 and $70,000 in median income and around the 40th–50th percentiles of mobility. There is no strong upward or downward trend: richer counties do not consistently have higher mobility, and some relatively low‑income counties achieve fairly high mobility. This suggests that median income alone is not a strong predictor of upward mobility, motivating the need to include other county characteristics in the regression model.






### Summary Statistic Table
```{r}
kable(
  summary_long,
  row.names = FALSE,
  digits = 3,
  caption = "Summary Statistic Table"
)
```

### Correlation Among Predictors

#### Correlation Heatmap
```{r}
corrplot(cor_matrix, method = "color")

```

#### Strongest correlations: 

- Median household income with both employment rate and share of college-educated adults
- Employment and education
- Poverty and both employment and education

### Map of U.S.

#### 

```{r}
ggplotly(p1, tooltip = "text") 
```







Methods
===
Row {.tabset data-width=340}
---

### The Model

#### Modeling Approach

This project uses multiple linear regression to model the relationship between county-level characteristics and economic mobility for children from low-income families. The response variable is the mean national household income percentile at age 27 for children whose parents were at the 25th percentile of the income distribution (kfr_pooled_pooled_p25). This modeling approach is appropriate because the response variable is continuous, and the goal is to quantify how it changes with several county-level characteristics measured on numeric scales, such as poverty rate, employment rate, education, and income inequality. This method allows estimating the expected difference in economic mobility associated with a one-unit change in each predictor while holding the others constant.

#### Regression Model

The first regression model estimates the mean percentile rank at age 27, relative to other children born in the same year, for children whose parents are at the 25th percentile of national income.

Each coefficient reflects the expected difference in the outcome (income rank) associated with a one-unit change in that characteristic, holding the other predictors constant. A positive coefficient indicates that counties with more of that characteristic tend to have higher mobility; a negative coefficient indicates the opposite.

The model is given by:

kfr_pooled_pooled_p25 = 0.5079 +
  0.1875 * emp_pooled1990 -
  1.044e-06 * hhinc_median_pooled1990 +
  0.03723 * poor_share_pooled1990 +
  0.1123 * frac_coll_pooled1990 -
  0.4581 * singlepar_pooled1990 -
  0.05009 * share_black1990 +
  0.003764 * foreign_share1990 -
  0.09823 * gini1990 +
  7.532e-09 * pop_pooled1990

Row {.tabset data-width=333}
---

### Residuals vs. Fitted

```{r}
par(mfrow = c(1, 1))
plot(model1, which = 1)
```

### Normal Q–Q

```{r}
par(mfrow = c(1, 1))
plot(model1, which = 2)
```

### Scale–Location
```{r}
par(mfrow = c(1, 1))
plot(model1, which = 3)
```

### Residuals vs Leverage

```{r}
par(mfrow = c(1, 1))
plot(model1, which = 5)
```

Column {.tabset data-width=333}
---

### Interpretation
The diagnostic plots for the regression model suggest that the linear model is a reasonable approximation for these data. 

- In the Residuals vs Fitted plot, residuals are scattered around zero but with a slight curved pattern, especially at lower and higher fitted values. This suggests some mild nonlinearity. The linear model may not capture the relationship perfectly across the whole range of predicted mobility. The vertical spread of residuals is fairly similar across fitted values, so heteroskedasticity does not appear severe, though some larger residuals occur at the extremes.

- In the Normal Q–Q plot, most points follow the reference line closely, with heavier tails at both extremes, telling us that the residuals are approximately normal but with a few more extreme counties than the model would predict under perfect normality. 

- The Scale–Location plot shows a relatively even spread of standardized residuals across fitted values with only a gentle upward trend in the red line, again suggesting that any non-constant variance is modest.

- The Residuals vs Leverage plot identifies a few counties with higher leverage and larger residuals, but almost all points lie well inside the Cook’s distance contours, so no single county appears to dominate the fit.

Taken together, these diagnostics support using the multiple regression model for interpretation, while acknowledging that p‑values and confidence intervals should be viewed as approximate due to mild tail non-normality and slightly changing variance.

Despite these mild violations, the linear model remains useful for describing broad relationships among county characteristics and economic mobility. The departures from ideal assumptions (slight nonlinearity and heavier tails) are mild enough that they mainly affect the precision of coefficient estimates rather than the overall direction or relative importance of predictors. Therefore, the model’s findings should be interpreted as approximate trends rather than exact causal effects, yet they remain informative and appropriate for addressing the research questions.
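
The visual checks above can be supplemented with a few numeric summaries. The sketch below is a minimal illustration on simulated data, since `model1` is fit elsewhere in this dashboard. The variable names, the simulated coefficients, and the `4 / n` Cook's distance cutoff (a common rule of thumb, not a formal test) are all assumptions made for the example.

```r
set.seed(1)

# Simulated stand-in for the county data (hypothetical values).
n <- 200
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- 0.4 + 0.2 * sim$x1 - 0.3 * sim$x2 + rnorm(n, sd = 0.05)

fit <- lm(y ~ x1 + x2, data = sim)

# Influence and residual measures from base R.
cooks_d <- cooks.distance(fit)   # influence of each observation on the fit
std_res <- rstandard(fit)        # standardized residuals

# Flag observations above the rule-of-thumb cutoff of 4 / n.
flagged <- which(cooks_d > 4 / n)

diagnostic_summary <- c(
  max_cooks_distance  = max(cooks_d),
  n_flagged           = length(flagged),
  share_abs_res_gt_2  = mean(abs(std_res) > 2)
)
print(round(diagnostic_summary, 4))
```

Replacing `fit` with the fitted model object would give the same summaries for the real data. Flagged counties are candidates for a closer look, not automatic exclusions.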

Results
===


Column {.tabset}
---
### Research Question 1
#### Which county-level characteristics are associated with higher income mobility for children from low-income families?

According to the regression model, counties with higher employment rates, a greater fraction of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a greater Black population share, higher income inequality, and higher median household income tend to have lower upward mobility, holding the other predictors constant. These variables were the most statistically significant in the model, which makes them the most reliable predictors of mobility among those considered; the associations are not necessarily causal.

#### Regression Coefficient Table
```{r}
kable(coef_table, digits = 3)
```
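
For reference, a table like this can be assembled from a fitted `lm` object using base R. The sketch below uses simulated data and hypothetical column names. It is not necessarily how `coef_table` was built in this dashboard.

```r
set.seed(7)

# Simulated stand-in data (hypothetical values and effect sizes).
n <- 300
sim <- data.frame(
  emp       = runif(n, 0.40, 0.80),
  frac_coll = runif(n, 0.05, 0.50)
)
sim$mobility <- 0.40 + 0.20 * sim$emp + 0.10 * sim$frac_coll + rnorm(n, sd = 0.04)

fit <- lm(mobility ~ emp + frac_coll, data = sim)

# summary()$coefficients is a matrix with estimates, SEs, t values, and p-values.
est <- summary(fit)$coefficients
ci  <- confint(fit)  # 95% confidence intervals by default

coef_sketch <- data.frame(
  Term      = rownames(est),
  Estimate  = est[, "Estimate"],
  Std_Error = est[, "Std. Error"],
  t_value   = est[, "t value"],
  p_value   = est[, "Pr(>|t|)"],
  CI_low    = ci[, 1],
  CI_high   = ci[, 2],
  row.names = NULL
)
print(coef_sketch)
```

Including the confidence interval alongside the p-value makes it easier to judge the size of each association as well as its statistical significance.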


### Research Question 2

#### Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

To answer this question, I estimated regression models with and without college-educated share and employment rate and used AIC to compare model fit. The AIC rose sharply when these variables were removed, indicating that they explain variation in mobility that poverty alone does not. This is evidence that growing up in a poorer county does not always mean lower upward mobility: counties with more education and employment tend to have higher mobility than their poverty level alone would predict.

```{r}
kable(aic_table, caption = "AIC Comparison of Regression Models")
```

As shown in the table, the AIC is over 200 units lower in the full model, meaning adding education and employment greatly improves explanatory power. 
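
The comparison can be sketched with base R's `AIC()`. The example below uses simulated data in which education and employment are built to matter, so it shows the mechanics only and does not reproduce the result. The variable names, the effect sizes, and the choice of a poverty-only reduced model are assumptions for illustration.

```r
set.seed(42)

# Simulated stand-in data (hypothetical values and effect sizes).
n <- 500
sim <- data.frame(
  poor_share = runif(n, 0.05, 0.40),
  frac_coll  = runif(n, 0.05, 0.50),
  emp        = runif(n, 0.40, 0.80)
)
sim$mobility <- 0.45 - 0.10 * sim$poor_share + 0.15 * sim$frac_coll +
  0.20 * sim$emp + rnorm(n, sd = 0.03)

# Reduced model omits education and employment; full model includes them.
reduced <- lm(mobility ~ poor_share, data = sim)
full    <- lm(mobility ~ poor_share + frac_coll + emp, data = sim)

aic_sketch <- data.frame(
  Model = c("Reduced", "Full"),
  AIC   = c(AIC(reduced), AIC(full))
)
# Difference from the best (lowest-AIC) model; lower AIC indicates better fit
# after penalizing the number of parameters.
aic_sketch$Delta <- aic_sketch$AIC - min(aic_sketch$AIC)
print(aic_sketch)
```

AIC values are only comparable between models fit to the same rows of data, so any counties with missing values for the added variables need to be dropped from both models before comparing.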

In the bar chart below, the coefficients on the share of college-educated adults and the employment rate are larger in magnitude than the coefficient on the poverty rate, suggesting that education and employment are more strongly associated with future economic success.




```{r, fig.width= 10, fig.height=5}
ggplot(main_effects, aes(x = Variable, y = Coefficient, fill = Variable)) +
  geom_col(width = 0.7) +
  labs(title = "Main County-Level Effects on Upward Mobility",
       y = "Estimated Coefficient",
       x = "") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  geom_text(aes(label = round(Coefficient, 3)), vjust = -0.5)
```


### Research Question 3

#### How do county-level income mobility outcomes vary across the United States geographically? Are there notable regional patterns in upward mobility for children from low-income families?



#### Map of Mean Economic Mobility (p25) by State
```{r}
ggplotly(p1, tooltip = "text") 
```


The map shows clear geographic variation in upward mobility across the United States. Children from low‑income families tend to experience the highest average mobility in the Great Plains, Upper Midwest, and parts of the Mountain West, where states are shaded in lighter colors. In contrast, much of the Southeast and parts of the Rust Belt show darker shades, indicating lower mobility outcomes. The coasts are more mixed, with pockets of both high and low mobility. Overall, a child’s chances of moving up economically are strongly associated with the region in which they grow up.
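
State-level values for a map like this can be obtained by averaging county outcomes within each state. The sketch below uses a tiny made-up data frame. The `state` column and all values are hypothetical, and whether the map object `p1` uses simple or population-weighted means is not shown in this section, so both are illustrated.

```r
# Made-up county rows for two hypothetical states.
counties <- data.frame(
  state                 = c("A", "A", "B", "B", "B"),
  kfr_pooled_pooled_p25 = c(0.40, 0.44, 0.35, 0.38, 0.41),
  pop_pooled1990        = c(1000, 3000, 2000, 2000, 1000)
)

# Simple (unweighted) mean of county outcomes within each state.
state_means <- aggregate(kfr_pooled_pooled_p25 ~ state, data = counties, FUN = mean)

# Population-weighted mean, so that larger counties count for more.
state_weighted <- sapply(
  split(counties, counties$state),
  function(d) weighted.mean(d$kfr_pooled_pooled_p25, d$pop_pooled1990)
)

print(state_means)
print(state_weighted)
```

The two approaches can give noticeably different pictures in states where a few large counties differ from many small ones, so it is worth stating which one a map uses.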


Conclusion
===
Column {data-width=250}
---
### Discussion

In doing this project, I learned a lot about what variables can help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis shows its negative effect can be substantially offset in counties with more job opportunities and higher levels of education.

For Research Question 1, the regression results suggest that higher employment rates and a larger share of college-educated adults are strong positive predictors of mobility, while a larger share of single-parent households and greater income inequality are linked to lower mobility. For Research Question 2, the AIC comparison shows that adding education and employment greatly improves model fit relative to a poverty-only model, indicating that strong labor markets and education can partly offset the disadvantages of growing up in a poorer county. For Research Question 3, the geographic patterns in mobility show that upward mobility is not evenly distributed across the United States: children from low-income families tend to face better prospects in parts of the Great Plains, Upper Midwest, and Mountain West than in much of the Southeast and Rust Belt, underscoring how closely place is tied to opportunity.

However, the analysis is limited by cross-sectional, observational data and missing factors such as school quality or neighborhood conditions, so we cannot make strong causal claims. Future work could use richer neighborhood measures and causal methods to better understand mechanisms. These findings still point toward policies that strengthen local labor markets and expand educational opportunities as promising ways to promote upward mobility.

Column {data-width=330}
-----------------------------------------------------------------------
### About the Author

My name is Scott Robbins, and I am currently a senior at the University of Dayton pursuing a Bachelor of Arts in Economics with a minor in Data Analytics. Over the summer I worked as a sales intern at a crane rental company. After graduation, I hope to work in data analytics or economic research.


### References

Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf

Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf