Plot coefficients

Here is a super quick tutorial on plotting coefficients from OLS models. Coefficient plots are highly recommended for presenting results in research papers and vital for conference presentations.

The first thing to do is to load the necessary packages.

library(tidyverse) # always bring in the tidy tools
library(haven) # to read in Stata .dta files
library(jtools) # contains the plotting procedures
library(broom.mixed) # separate package called on by plot_coefs

Let’s read in some educational data for our example.

mlmdata <- read_dta("https://stats.idre.ucla.edu/stat/examples/imm/imm10.dta")
glimpse(mlmdata)
## Rows: 260
## Columns: 19
## $ schid    <dbl> 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7~
## $ stuid    <dbl> 3, 8, 13, 17, 27, 28, 30, 36, 37, 42, 52, 53, 61, 64, 72, 83,~
## $ ses      <dbl> -0.13, -0.39, -0.80, -0.72, -0.74, -0.58, -0.83, -0.51, -0.56~
## $ meanses  <dbl> -0.4826087, -0.4826087, -0.4826087, -0.4826087, -0.4826087, -~
## $ homework <dbl> 1, 0, 0, 1, 2, 1, 5, 1, 1, 2, 1, 1, 1, 2, 1, 4, 1, 2, 1, 1, 1~
## $ white    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ parented <dbl> 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 1, 2, 3, 3, 1, 3, 3, 3~
## $ public   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ ratio    <dbl> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1~
## $ percmin  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ math     <dbl> 48, 48, 53, 42, 43, 57, 33, 64, 36, 56, 48, 48, 44, 35, 50, 3~
## $ sex      <dbl> 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2~
## $ race     <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4~
## $ sctype   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ cstr     <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ scsize   <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3~
## $ urban    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ region   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ schnum   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~

Create a dummy variable for female. Estimate two models: one predicting mathematics scores and one predicting the amount of homework done per week.

mlmdata$female <- ifelse(mlmdata$sex == 2, 1, 0)
ols1 <- lm(math ~ ses + female + white, mlmdata)
ols2 <- lm(homework ~ ses + female + white, mlmdata)
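Before trusting a recode like this, it can help to cross-tabulate the original and new variables. The sketch below uses a made-up `sex` vector (hypothetical values, not the tutorial data) and assumes, as the recode above does, that 2 codes female. It also shows that `ifelse()` keeps missing values as `NA` rather than turning them into 0.

```r
# Hypothetical toy vector for illustration; assumes 1 = male, 2 = female
sex <- c(1, 2, 2, 1, NA)

# Same recode as in the tutorial: 1 if female, 0 otherwise, NA stays NA
female <- ifelse(sex == 2, 1, 0)

# Cross-tabulate original and recoded values, keeping NA visible
check <- table(sex, female, useNA = "ifany")
print(check)
```

Running the same `table()` call on `mlmdata$sex` and `mlmdata$female` gives the equivalent check for the real data.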

summary(ols1)
## 
## Call:
## lm(formula = math ~ ses + female + white, data = mlmdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6697  -6.0448  -0.4286   6.2627  22.7777 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51.74425    1.26629  40.863   <2e-16 ***
## ses          7.11861    0.61874  11.505   <2e-16 ***
## female       0.07826    1.08977   0.072    0.943    
## white        0.05375    1.34520   0.040    0.968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.778 on 256 degrees of freedom
## Multiple R-squared:  0.3858, Adjusted R-squared:  0.3786 
## F-statistic: 53.61 on 3 and 256 DF,  p-value: < 2.2e-16

summary(ols2)
## 
## Call:
## lm(formula = homework ~ ses + female + white, data = mlmdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8782 -0.9882 -0.3111  0.9573  5.0651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.32032    0.20375  11.388  < 2e-16 ***
## ses          0.73129    0.09956   7.346 2.74e-12 ***
## female       0.17143    0.17535   0.978   0.3292    
## white       -0.45126    0.21645  -2.085   0.0381 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.412 on 256 degrees of freedom
## Multiple R-squared:  0.1788, Adjusted R-squared:  0.1692 
## F-statistic: 18.58 on 3 and 256 DF,  p-value: 6.15e-11

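Here we can see that socio-economic status is a strong positive predictor of both math scores and homework (p < 0.001 in both models). Also, holding SES and sex constant, white students do less homework than non-white students (coefficient of -0.45, p = 0.038). Sex is not a significant predictor in either model.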

Now we’ll plot out these coefficients.

plot_coefs(ols1, ols2,
           coefs = c("SES" = "ses","Female" = "female","White" = "white"),
           scale = FALSE, # summ() option: standardized coefficients when TRUE; takes effect with plot_summs()
           robust = FALSE, # summ() option: robust standard errors when TRUE; takes effect with plot_summs()
           legend.title = "Academics",
           model.names = c("Math scores","Homework")) 

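As you can see, this is a visually appealing way to present regression coefficients side by side. Each point is a coefficient estimate and each error bar is its confidence interval (95% by default). A coefficient is statistically significant at the 0.05 level when its error bar does not cross the dotted vertical line at zero.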
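The rule for reading the error bars can be checked by hand. The sketch below is a rough check, assuming the bars are 95% confidence intervals built from the t distribution (the usual default for `lm` fits). It rebuilds the intervals for the homework model from the estimates and standard errors printed by `summary(ols2)` above. The object names are made up for illustration.

```r
# Estimates and standard errors copied from the summary(ols2) output
est <- c(ses = 0.73129, female = 0.17143, white = -0.45126)
se  <- c(ses = 0.09956, female = 0.17535, white = 0.21645)
df_resid <- 256  # residual degrees of freedom reported by summary(ols2)

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df_resid)

# Interval = estimate plus or minus critical value times standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# An interval excludes zero when both bounds fall on the same side of zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0

print(round(ci, 3))
print(excludes_zero)
```

With these inputs, `excludes_zero` is `TRUE` for `ses` and `white` and `FALSE` for `female`, which agrees with the p-values in the summary output. On the fitted model itself, `confint(ols2)` should give the same intervals without the rounding in the printed table.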

In this case, the SES error bars in both models lie entirely to the right of the dotted line, indicating positive and significant relationships. The error bar for the White coefficient in the Homework model sits just to the left of the dotted line, indicating a negative and significant relationship. All other error bars cross the dotted line, so those coefficients are not statistically significant.
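For a paper or a slide deck, the plot usually has to be written to a file. The following is a minimal sketch, assuming `plot_coefs()` returns a ggplot object (as the jtools documentation describes), so that ggplot2 layers and `ggsave()` apply to it. The built-in `mtcars` data stands in for the tutorial data so the example is self-contained. The model names, title, and file name are hypothetical.

```r
library(jtools)   # plot_coefs(); also needs broom (and broom.mixed) installed
library(ggplot2)  # labs() and ggsave()

# Stand-in models on built-in data (assumption: not the tutorial's models)
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(qsec ~ wt + hp, data = mtcars)

# Build the coefficient plot and keep it as an object
p <- plot_coefs(fit1, fit2, model.names = c("MPG", "Quarter-mile time"))

# Add a title the same way as for any other ggplot
p <- p + labs(title = "OLS coefficients with 95% confidence intervals")

# Write a high-resolution PNG to a temporary folder
out_file <- file.path(tempdir(), "coef_plot.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Swapping the `.png` extension for `.pdf` produces a vector file, which often scales better in printed papers.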

Further documentation on this procedure (plot_summs() and plot_coefs() share one reference page): https://jtools.jacob-long.com/reference/plot_summs.html