Here is a quick tutorial on plotting coefficients from OLS models. Coefficient plots are an effective way to present regression results in research papers and at conference presentations.
The first thing to do is to load the necessary packages.
library(tidyverse) # always bring in the tidy tools
library(haven) # to read in Stata .dta files
library(jtools) # contains the plotting procedures
library(broom.mixed) # separate package called on by plot_coefs
Let’s read in some educational data for our example.
mlmdata <- read_dta("https://stats.idre.ucla.edu/stat/examples/imm/imm10.dta")
glimpse(mlmdata)
## Rows: 260
## Columns: 19
## $ schid <dbl> 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7~
## $ stuid <dbl> 3, 8, 13, 17, 27, 28, 30, 36, 37, 42, 52, 53, 61, 64, 72, 83,~
## $ ses <dbl> -0.13, -0.39, -0.80, -0.72, -0.74, -0.58, -0.83, -0.51, -0.56~
## $ meanses <dbl> -0.4826087, -0.4826087, -0.4826087, -0.4826087, -0.4826087, -~
## $ homework <dbl> 1, 0, 0, 1, 2, 1, 5, 1, 1, 2, 1, 1, 1, 2, 1, 4, 1, 2, 1, 1, 1~
## $ white <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ parented <dbl> 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 1, 2, 3, 3, 1, 3, 3, 3~
## $ public <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ ratio <dbl> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1~
## $ percmin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ math <dbl> 48, 48, 53, 42, 43, 57, 33, 64, 36, 56, 48, 48, 44, 35, 50, 3~
## $ sex <dbl> 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2~
## $ race <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4~
## $ sctype <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ cstr <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ scsize <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3~
## $ urban <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ region <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ schnum <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
Create a dummy variable for female. Estimate two models: one predicting mathematics scores and one predicting the amount of homework done per week.
mlmdata$female <- ifelse(mlmdata$sex == 2, 1, 0)
ols1 <- lm(math ~ ses + female + white, data = mlmdata)
ols2 <- lm(homework ~ ses + female + white, data = mlmdata)
summary(ols1)
##
## Call:
## lm(formula = math ~ ses + female + white, data = mlmdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.6697 -6.0448 -0.4286 6.2627 22.7777
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.74425 1.26629 40.863 <2e-16 ***
## ses 7.11861 0.61874 11.505 <2e-16 ***
## female 0.07826 1.08977 0.072 0.943
## white 0.05375 1.34520 0.040 0.968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.778 on 256 degrees of freedom
## Multiple R-squared: 0.3858, Adjusted R-squared: 0.3786
## F-statistic: 53.61 on 3 and 256 DF, p-value: < 2.2e-16
summary(ols2)
##
## Call:
## lm(formula = homework ~ ses + female + white, data = mlmdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8782 -0.9882 -0.3111 0.9573 5.0651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.32032 0.20375 11.388 < 2e-16 ***
## ses 0.73129 0.09956 7.346 2.74e-12 ***
## female 0.17143 0.17535 0.978 0.3292
## white -0.45126 0.21645 -2.085 0.0381 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.412 on 256 degrees of freedom
## Multiple R-squared: 0.1788, Adjusted R-squared: 0.1692
## F-statistic: 18.58 on 3 and 256 DF, p-value: 6.15e-11
Here we can see that socio-economic status is a strong, positive predictor of both math scores and homework. The homework model also shows that, holding SES and sex constant, non-white students report doing more homework than white students.
Now we’ll plot out these coefficients.
plot_coefs(ols1, ols2,
           coefs = c("SES" = "ses", "Female" = "female", "White" = "white"),
           scale = FALSE,  # standardized coefficients when TRUE
           robust = FALSE, # robust standard errors when TRUE
           legend.title = "Academics",
           model.names = c("Math scores", "Homework"))
As you can see, this is a visually appealing way to present regression coefficients side by side. A coefficient is statistically significant when its error bar does not cross the dotted vertical line at zero.
In this case, the SES error bars are far to the right of the dotted line, indicating positive and significant relationships. The error bar for the White coefficient in the Homework model is just to the left of the dotted line, indicating a negative and significant relationship. All other coefficient error bars cross the dotted line, indicating insignificant relationships.
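If you want to confirm the visual read numerically, the error bars correspond to 95% confidence intervals by default, so confint() on each model should tell the same story:
confint(ols1)
confint(ols2)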
Further documentation on this procedure: https://jtools.jacob-long.com/reference/plot_summs.html
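Finally, because plot_coefs() returns a ggplot2 object, the figure can be saved for a manuscript or slide deck with ggsave(). The file name and dimensions below are placeholders; adjust them to your outlet's requirements.
coef_plot <- plot_coefs(ols1, ols2,
                        coefs = c("SES" = "ses", "Female" = "female", "White" = "white"),
                        legend.title = "Academics",
                        model.names = c("Math scores", "Homework"))
ggsave("coefficient_plot.png", coef_plot, width = 7, height = 4, dpi = 300)  # placeholder file name and size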