This tutorial takes on two super important tasks. First, how can we generate tables that present side-by-side results from regression analyses? Nested tables help us to better understand social processes by examining change in coefficients across models. Second, how can we export those tables to a format that is useful for interpretation and presentation? Conducting analyses means running a plethora of models, so having a quick and easy way to export these results is indispensable. The R package “huxtable” helps us to accomplish both goals.
The first thing to do is to load the necessary packages.
library(tidyverse) # always load the tidy tools
library(haven) # to read in Stata .dta files
library(huxtable) # package for estimating margins
Let’s read in some educational data for our example.
mlmdata <- read_dta("https://stats.idre.ucla.edu/stat/examples/imm/imm10.dta")
glimpse(mlmdata)
## Rows: 260
## Columns: 19
## $ schid <dbl> 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7472, 7~
## $ stuid <dbl> 3, 8, 13, 17, 27, 28, 30, 36, 37, 42, 52, 53, 61, 64, 72, 83,~
## $ ses <dbl> -0.13, -0.39, -0.80, -0.72, -0.74, -0.58, -0.83, -0.51, -0.56~
## $ meanses <dbl> -0.4826087, -0.4826087, -0.4826087, -0.4826087, -0.4826087, -~
## $ homework <dbl> 1, 0, 0, 1, 2, 1, 5, 1, 1, 2, 1, 1, 1, 2, 1, 4, 1, 2, 1, 1, 1~
## $ white <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ parented <dbl> 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 1, 2, 3, 3, 1, 3, 3, 3~
## $ public <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ ratio <dbl> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1~
## $ percmin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ math <dbl> 48, 48, 53, 42, 43, 57, 33, 64, 36, 56, 48, 48, 44, 35, 50, 3~
## $ sex <dbl> 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2~
## $ race <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4~
## $ sctype <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ cstr <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ scsize <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3~
## $ urban <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ region <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ schnum <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
Create a dummy variable for female. Estimate a model predicting mathematics scores.
mlmdata$female <- ifelse(mlmdata$sex == 2, 1, 0)
ols <- lm(math ~ homework + ses + female + white, mlmdata)
summary(ols)
##
## Call:
## lm(formula = math ~ homework + ses + female + white, data = mlmdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.9356 -5.8326 0.5589 5.2318 23.8706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.7198 1.4598 32.005 < 2e-16 ***
## homework 2.1654 0.3648 5.936 9.51e-09 ***
## ses 5.5351 0.6394 8.656 5.61e-16 ***
## female -0.2930 1.0254 -0.286 0.775
## white 1.0309 1.2741 0.809 0.419
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.244 on 255 degrees of freedom
## Multiple R-squared: 0.4604, Adjusted R-squared: 0.4519
## F-statistic: 54.39 on 4 and 255 DF, p-value: < 2.2e-16
Homework and socio-economic status are positively and significantly associated with math scores. Gender and race differences in math scores are not statistically significant.
Let’s use huxreg to generate a table display these findings.
huxreg(ols)
(1) | |
---|---|
(Intercept) | 46.720 *** |
(1.460) | |
homework | 2.165 *** |
(0.365) | |
ses | 5.535 *** |
(0.639) | |
female | -0.293 |
(1.025) | |
white | 1.031 |
(1.274) | |
N | 260 |
R2 | 0.460 |
logLik | -914.862 |
AIC | 1841.724 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
The table presents the information in a format that is common to see in social science journals. Before we get to exporting, let’s take a look at some of the options.
One can adjust various aspects of the table. First, it is possible to decide on the summary statistics. In this case, we are asking R for only the number of observations and the R-squared value, with the original quotations referring to the label for the statistic (e.g., ‘N’) and the name of the object (e.g., “nobs”) in the summary list.
Second, we can control the note at the bottom of the table. In this case, the script asks for a note about the stars for significance and adds text describing the data.
Third, we can adjust the coefficient labels. Again, the first quoted text describes what we want the label to be (e.g., “Homework (hours)”), while the second calls on the variable name (“homework”).
huxreg(ols,
statistics = c('N' = "nobs", 'R-squared' = "r.squared"),
note = "{stars}
Source: UCLA example education data",
coefs = c("Homework (hours)" = "homework",
"Socio-economic status (SES)" = "ses",
"Female" = "female",
"White" = "white"))
(1) | |
---|---|
Homework (hours) | 2.165 *** |
(0.365) | |
Socio-economic status (SES) | 5.535 *** |
(0.639) | |
Female | -0.293 |
(1.025) | |
White | 1.031 |
(1.274) | |
N | 260 |
R-squared | 0.460 |
*** p < 0.001; ** p < 0.01; * p < 0.05 Source: UCLA example education data |
Looks nice, right!
There are several options for exporting these tables, but probably the most useful option is exporting to a Word document.
The script below is identical to the previous script, with one important exception. After generating the table, I added a piping option (%>%) in order to export the table to a Word doc via the quick_docx command. Note that you will likely need to set a working directory in order to save this file to wherever you want it to go.
huxreg(ols,
statistics = c('N' = "nobs", 'R-squared' = "r.squared"),
note = "{stars}
Source: UCLA example education data",
coefs = c("Homework (hours)" = "homework",
"Socio-economic status (SES)" = "ses",
"Female" = "female",
"White" = "white")) %>%
quick_docx(file = "example_huxreg.docx")
The last thing I want to show is how to generate and export a second model to appear alongside the first. Below I save the results from a model that is the same as the initial ols model, except that it includes an interaction between SES and white.
Once these results are saved, we need to generate a list including the name of the two models. Then we can feed that list into the huxreg command to generate the nested table. Below I display what it looks like before adding the export option.
ols2 <- lm(math ~ homework + ses + female + white + ses:white, mlmdata)
models <- list("Model 1" = ols, "Model 2" = ols2)
huxreg(models,
statistics = c('N' = "nobs", 'R-squared' = "r.squared"),
note = "{stars}
Source: UCLA example education data",
coefs = c("Homework (hours)" = "homework",
"Socio-economic status (SES)" = "ses",
"Female" = "female",
"White" = "white",
"SES*white" = "ses:white"))
Model 1 | Model 2 | |
---|---|---|
Homework (hours) | 2.165 *** | 2.102 *** |
(0.365) | (0.359) | |
Socio-economic status (SES) | 5.535 *** | 2.411 * |
(0.639) | (1.171) | |
Female | -0.293 | -0.699 |
(1.025) | (1.016) | |
White | 1.031 | 3.129 * |
(1.274) | (1.417) | |
SES*white | 4.217 ** | |
(1.333) | ||
N | 260 | 260 |
R-squared | 0.460 | 0.481 |
*** p < 0.001; ** p < 0.01; * p < 0.05 Source: UCLA example education data |
And now let’s run the same command with the export option to create a new Word document.
huxreg(models,
statistics = c('N' = "nobs", 'R-squared' = "r.squared"),
note = "{stars}
Source: UCLA example education data",
coefs = c("Homework (hours)" = "homework",
"Socio-economic status (SES)" = "ses",
"Female" = "female",
"White" = "white",
"SES*white" = "ses:white")) %>%
quick_docx(file = "example_huxreg2.docx")
With the output exported to Word, one can make any number formatting changes to the table.
For more on this package, I recommend the following bookdown chapter. Note that you can use huxtable to generate descriptive statistics tables as well: https://bookdown.org/sarahwerth2024/RegressionLabsBook/lab-5-r.html