class: inverse, center, title-slide, middle

# Creating publication-ready summary tables with {gtsummary}

### Emily C. Zabor

#### RUG at HDSI

#### December 15, 2022

<p align="center"><img src="Images/CC_hires_r.png" width=30%></p>

---

# About me

.left-column[
<img src="Images/UMN.jpg" width=50%>

<img src="Images/msk-black-logo.png" width=95%>

<img src="Images/CU.jpg" width=50%>

<img src="Images/Taussig.jpg" width=75%>
]

.right-column[
.medium[
* MS in Biostatistics from the **University of Minnesota**

<br>
<br>

* 9 years as a Research Biostatistician at **Memorial Sloan Kettering Cancer Center**

<br>

* DrPH in Biostatistics from **Columbia University**

<br>

* Faculty Biostatistician at **Cleveland Clinic**
]
]

---
class: inverse, center, middle

# Background and Introduction

---

# The reproducibility crisis

.pull-left[
.large[
- Quality of medical research is often low

- **Low-quality code** in medical research is part of the problem

- Low-quality code is more likely to **contain errors**

- Reproducibility is often **cumbersome** and **time-consuming**
]
]

.pull-right[
<p align="center"><img src="Images/reproducibility-graphic-online1.jpeg" width=90%></p>
]

.footnote[Image source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970; Slide source: http://www.danieldsjoberg.com/rmedicine-gtsummary]

---

# Need reproducible, publication-ready tables

.pull-left[
.large[
How do we get from **code** in R:

```r
summary(df$age)
table(df$sex)
table(df$race)
table(df$ethnicity)
```
]
]

.pull-right[
.large[
To a **table** in a publication:

<p align="center"><img src="Images/table1_example.png" width=90%></p>
]
]

.small[
Brierley CK, Zabor EC, Komrokji RS, DeZern AE, Roboz GJ, Brunner AM, Stone RM, Sekeres MA, Steensma DP. Low participation rates and disparities in participation in interventional clinical trials for myelodysplastic syndromes. *Cancer.* 2020 Aug 7. doi: 10.1002/cncr.33105. PMID: 32767690.
]

---

# {gtsummary} overview

.pull-left[
.large[
* Create **tabular summaries** with sensible defaults that remain highly customizable

* Types of summaries:
    - "Table 1"-types
    - Cross-tabulation
    - Regression models
    - Survival data
    - Survey data
    - Custom tables
]
]

.pull-right[
.large[
* Report statistics from {gtsummary} tables **inline** in R Markdown

* Stack and/or merge any table type

* Use **themes** to standardize across tables

* Choose from different **print engines**
]

<img src="Images/gtsummary_logo.png" width=40% align="middle">
]

---

# Example dataset

.pull-left[
.large[
* The `trial` dataset is included with {gtsummary}

* Simulated dataset of baseline characteristics for 200 patients who received Drug A or Drug B

* Variables were assigned labels using the `labelled` package
]
]

.pull-right[
```r
library(gtsummary)
library(tidyverse)
head(trial) |> gt::gt()
```
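| trt    | age | marker | stage | grade | response | death | ttdeath |
|--------|-----|--------|-------|-------|----------|-------|---------|
| Drug A | 23  | 0.160  | T1    | II    | 0        | 0     | 24.00   |
| Drug B | 9   | 1.107  | T2    | I     | 1        | 0     | 24.00   |
| Drug A | 31  | 0.277  | T1    | II    | 0        | 0     | 24.00   |
| Drug A | NA  | 2.067  | T3    | III   | 1        | 1     | 17.64   |
| Drug A | 51  | 2.767  | T4    | III   | 1        | 1     | 16.43   |
| Drug B | 39  | 0.613  | T4    | I     | 0        | 1     | 15.64   |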
]

---

# Example dataset

.large[
This presentation will use a subset of the variables.
]

```r
sm_trial <- trial |> select(trt, age, grade, response)
```
| Variable | Label                  |
|----------|------------------------|
| trt      | Chemotherapy Treatment |
| age      | Age                    |
| grade    | Grade                  |
| response | Tumor Response         |
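
A minimal sketch of one way to list these variable labels, assuming each column stores its label in a `label` attribute (the object names `labels` and `label_tbl` are illustrative and not part of {gtsummary}):

```r
library(gtsummary)  # provides the `trial` example dataset
library(dplyr)

sm_trial <- trial |> select(trt, age, grade, response)

# Pull the "label" attribute from every column into a named character vector
labels <- vapply(sm_trial, function(x) attr(x, "label"), character(1))

# Arrange the variable names and their labels side by side
label_tbl <- data.frame(
  Variable = names(labels),
  Label = unname(labels)
)
label_tbl
```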
---
class: inverse, center, middle

# tbl_summary()

---

# Basic tbl_summary()

.pull-left[
```r
sm_trial |>
  select(-trt) |>
  tbl_summary()
```

.medium[
- Four types of summaries: `continuous`, `continuous2`, `categorical`, and `dichotomous`

- Statistics are `median (IQR)` for continuous, `n (%)` for categorical/dichotomous

- Variables coded `0/1`, `TRUE/FALSE`, `Yes/No` are treated as dichotomous

- Lists `NA` values under "Unknown"

- Label attributes are printed automatically
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_1.png" width=50%></p>
]

---

# Customize tbl_summary() output

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt
  )
```

- `by`: specify a column variable for cross-tabulation
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_2a.png" width=80%></p>
]

---

# Customize tbl_summary() output

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2"
  )
```

- `by`: specify a column variable for cross-tabulation

- `type`: specify the summary type
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_2b.png" width=80%></p>
]

---

# Customize tbl_summary() output

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic = list(
      age ~ c("{mean} ({sd})", "{min}, {max}"),
      response ~ "{n} / {N} ({p}%)"
    )
  )
```

- `by`: specify a column variable for cross-tabulation

- `type`: specify the summary type

- `statistic`: customize the reported statistics
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_2c.png" width=80%></p>
]

---

# Customize tbl_summary() output

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic = list(
      age ~ c("{mean} ({sd})", "{min}, {max}"),
      response ~ "{n} / {N} ({p}%)"
    ),
    label = grade ~ "Pathologic grade"
  )
```

- `by`: specify a column variable for cross-tabulation

- `type`: specify the summary type

- `statistic`: customize the reported statistics

- `label`: change or customize variable labels
]

.pull-right[
<p
align="center"><img src="Images/tbl_summary_2d.png" width=80%></p>
]

---

# Customize tbl_summary() output

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic = list(
      age ~ c("{mean} ({sd})", "{min}, {max}"),
      response ~ "{n} / {N} ({p}%)"
    ),
    label = grade ~ "Pathologic grade",
    digits = age ~ 1
  )
```

- `by`: specify a column variable for cross-tabulation

- `type`: specify the summary type

- `statistic`: customize the reported statistics

- `label`: change or customize variable labels

- `digits`: specify the number of decimal places for rounding
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_2e.png" width=80%></p>
]

---

# Customize tbl_summary() output

.large[**provide argument** = **select variables** ~ **give instructions**]

```r
sm_trial |>
  tbl_summary(
    statistic = all_continuous() ~ "{mean} ({sd})",
    label = starts_with("grade") ~ "Pathologic grade",
    digits = age ~ 2
  )
```

<br>

.large[Use **lists** to pass 2 or more choices:]

```r
label = list(age ~ "Patient age (years)",
             grade ~ "Pathologic tumor grade")
```

---

# Add-on functions in {gtsummary}

.xlarge[
`tbl_summary()` objects can also be updated using related functions.

- `add_*()` add an **additional column** of statistics or information, e.g.
p-values, q-values, overall statistics, N obs., and more

- `modify_*()` **modify** table headers, spanning headers, and footnotes

- `bold_*()/italicize_*()` **style** labels, variable levels, significant p-values
]

---

# Update tbl\_summary() with add\_\*()

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt
  ) |>
  add_p() |>
  add_q(method = "fdr")
```

.medium[
* `add_p()`: adds a column of p-values

* `add_q()`: adds a column of p-values adjusted for multiple comparisons through a call to `p.adjust()`
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_3a.png" width=100%></p>
]

---

# Update tbl\_summary() with add\_\*()

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_overall()
```

.medium[
* `add_overall()`: adds a column of overall statistics
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_4a.png" width=100%></p>
]

---

# Update tbl\_summary() with add\_\*()

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_overall() |>
  add_n()
```

.medium[
* `add_overall()`: adds a column of overall statistics

* `add_n()`: adds a column with the sample size
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_4b.png" width=100%></p>
]

---

# Update tbl\_summary() with add\_\*()

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_overall() |>
  add_n() |>
  add_stat_label(
    label = all_categorical() ~ "No. (%)"
  )
```

.medium[
* `add_overall()`: adds a column of overall statistics

* `add_n()`: adds a column with the sample size

* `add_stat_label()`: adds a description of the reported statistic
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_4c.png" width=100%></p>
]

---

# Update tbl\_summary() with add\_\*()

```r
trial |>
  select(trt, marker, response) |>
  tbl_summary(
    by = trt,
    statistic = list(marker ~ "{mean} ({sd})",
                     response ~ "{p}%"),
    missing = "no"
  ) |>
  add_difference()
```

<p align="center"><img src="Images/tbl_summary_4d.png" width=80%></p>

.large[
- `add_difference()`: adds mean and rate differences between groups. Differences can optionally be adjusted; see the `adj.vars` argument
]

---

# Update tbl\_summary() with add\_\*()

```r
sm_trial |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_stat(...)
```

.large[
- Write custom statistic functions and add them to the table with `add_stat()`

- Added statistics can be placed on the label or the level rows
]

---

# Update tbl\_summary() with bold\_\*()/italicize\_\*()

.pull-left[
```r
sm_trial |>
  tbl_summary(
    by = trt
  ) |>
  add_p() |>
  bold_labels() |>
  italicize_levels() |>
  bold_p(t = 0.8)
```

.medium[
* `bold_labels()`: bold the variable labels

* `italicize_levels()`: italicize the variable levels

* `bold_p()`: bold p-values according to a specified threshold
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_3b.png" width=90%></p>
]

---

# Update tbl\_summary() with modify\_\*()

.pull-left[
.tiny[
```r
sm_trial |>
  select(age, response, trt) |>
  tbl_summary(
    by = trt
  ) |>
  modify_header(
    update = list(
      stat_1 ~ "**A**",
      stat_2 ~ "**B**"
    )) |>
  modify_spanning_header(
    all_stat_cols() ~ "**Drug**") |>
  modify_footnote(
    all_stat_cols() ~
      "median (IQR) for continuous; n (%) for categorical"
  )
```
]
]

.pull-right[
<p align="center"><img src="Images/tbl_summary_5.png" width=90%></p>
]

---

# Column names

* Use `show_header_names()` to see the internal header names available for use in `modify_header()`

* `all_stat_cols()` selects, for example, columns "stat_1" and "stat_2"

.small[
```r
tbl <- sm_trial |> tbl_summary(by = trt)
show_header_names(tbl)
```

```
## ℹ As a usage guide, the code below re-creates the current column headers.
```

```
## modify_header(
##   label = "**Characteristic**",
##   stat_1 = "**Drug A**, N = 98",
##   stat_2 = "**Drug B**, N = 102"
## )
```

```
## 
## 
## Column Name   Column Header
## ------------  --------------------
## label         **Characteristic**
## stat_1        **Drug A**, N = 98
## stat_2        **Drug B**, N = 102
```
]

---

# Add-on functions in {gtsummary}

.xlarge[
And many more!

See the documentation at http://www.danieldsjoberg.com/gtsummary/reference/index.html

And a detailed `tbl_summary()` vignette at http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html
]

---

# Cross-tabulation with tbl_cross()

.large[`tbl_cross()` is a wrapper for `tbl_summary()` for **n x m** tables]

.pull-left[
<br>

```r
sm_trial |>
  tbl_cross(
    row = trt,
    col = grade,
    percent = "row",
    margin = "row"
  ) |>
  add_p(source_note = TRUE)
```
]

.pull-right[
<p align="center"><img src="Images/tbl_cross_1.png" width=90%></p>
]

---

# Continuous summaries with tbl_continuous()

.large[`tbl_continuous()` summarizes a continuous variable by 1, 2, or more categorical variables]

.pull-left[
```r
sm_trial |>
  tbl_continuous(
    variable = age,
    by = trt,
    include = grade
  )
```
]

.pull-right[
<p align="center"><img src="Images/tbl_cont_1.png" width=100%></p>
]

---

# Survey data with tbl_svysummary()

.pull-left[
```r
survey::svydesign(
  ids = ~1,
  data = as.data.frame(Titanic),
  weights = ~Freq
) |>
  tbl_svysummary(
    by = Survived,
    include = c(Class, Sex)
  ) |>
  add_p() |>
  modify_spanning_header(
    all_stat_cols() ~ "**Survived**"
  )
```
]

.pull-right[
<p align="center"><img src="Images/tbl_svy_1.png" width=90%></p>
]

---

# Survival outcomes with tbl_survfit()

```r
library(survival)

fit <- survfit(Surv(ttdeath, death) ~ trt, trial)

tbl_survfit(
  fit,
  times = c(12, 24),
  label_header = "**{time} Month**"
) |>
  add_p()
```

<p align="center"><img src="Images/tbl_surv_1.png" width=70%></p>

---
class: inverse, center, middle

# tbl_regression()

---

# Traditional model summary()

.pull-left[
```r
m1 <- glm(
  response ~ age + stage,
  data = trial,
  family = binomial(link = "logit")
)

summary(m1)
```

.medium[
Looks **messy** and it's not easy for others to understand.
]
]

.pull-right[
<p align="center"><img src="Images/messy-model-output.png" width=100%></p>
]

---

# Basic tbl_regression()

.pull-left[
```r
tbl_regression(m1)
```

.medium[
- Displays **p-values** for covariates

- Shows **reference levels** for categorical variables

- **Model type recognized** as logistic regression with log(OR) appearing in header
]
]

.pull-right[
<p align="center"><img src="Images/m1_tbl_1.png" width=90%></p>
]

---

# Customize tbl_regression() output

.pull-left[
```r
tbl_regression(
  m1,
  exponentiate = TRUE
) |>
  add_global_p() |>
  add_glance_table(
    include = c(nobs, logLik, AIC, BIC)
  )
```

.medium[
- Display **odds ratio** estimates and **confidence intervals**

- Add **global p-values**

- Add various **model statistics**
]
]

.pull-right[
<p align="center"><img src="Images/m1_tbl_2.png" width=60%></p>
]

---

# Supported models in tbl_regression()

```
##  [1] "`biglm::bigglm()`" "`biglmm::bigglm()`" "`brms::brm()`"
##  [4] "`cmprsk::crr()`" "`fixest::feglm()`" "`fixest::femlm()`"
##  [7] "`fixest::feNmlm()`" "`fixest::feols()`" "`gam::gam()`"
## [10] "`geepack::geeglm()`" "`glmmTMB::glmmTMB()`" "`lavaan::lavaan()`"
## [13] "`lfe::felm()`" "`lme4::glmer()`" "`lme4::glmer.nb()`"
## [16] "`lme4::lmer()`" "`logitr::logitr()`" "`MASS::glm.nb()`"
## [19] "`MASS::polr()`" "`mgcv::gam()`" "`mice::mira`"
## [22] "`multgee::nomLORgee()`" "`multgee::ordLORgee()`" "`nnet::multinom()`"
## [25] "`ordinal::clm()`" "`ordinal::clmm()`" "`parsnip::model_fit`"
## [28] "`plm::plm()`" "`rstanarm::stan_glm()`" "`stats::aov()`"
## [31] "`stats::glm()`" "`stats::lm()`" "`stats::nls()`"
## [34] "`survey::svycoxph()`" "`survey::svyglm()`" "`survey::svyolr()`"
## [37] "`survival::clogit()`" "`survival::coxph()`" "`survival::survreg()`"
## [40] "`tidycmprsk::crr()`" "`VGAM::vglm()`"
```

.large[
- **Custom tidiers** can be written and passed to `tbl_regression()` using the `tidy_fun` argument.
]

---

# Univariable models with tbl_uvregression()

.pull-left[
```r
sm_trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
```

.medium[
- Specify model `method`, `method.args`, and the `response` variable

- Arguments and helper functions like `exponentiate`, `bold_*()`, `add_global_p()` can also be used with `tbl_uvregression()`
]
]

.pull-right[
<p align="center"><img src="Images/tbl_uvreg.png" width=90%></p>
]

---
class: inverse, center, middle

# inline_text()

---

# {gtsummary} reporting with inline_text()

.large[
- Tables are important, but we often need to **report results in-line** in a report.

- Any statistic reported in a {gtsummary} table can be extracted and reported in-line in an R Markdown document with the `inline_text()` function.

- The pattern of what is reported can be modified with the `pattern = ` argument.

- Default is `pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})"` for regression summaries.
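
As a hedged sketch of the `pattern = ` argument described above, the call below reports only the odds ratio and its confidence limits for age. The model matches the logistic regression from the earlier slides; the object names `m1_tbl` and `age_or` and the pattern wording are illustrative assumptions:

```r
library(gtsummary)

# Same logistic regression as on the tbl_regression() slides
m1 <- glm(
  response ~ age + stage,
  data = trial,
  family = binomial(link = "logit")
)
m1_tbl <- tbl_regression(m1, exponentiate = TRUE)

# Custom pattern: odds ratio with confidence limits, no p-value
age_or <- inline_text(
  m1_tbl,
  variable = age,
  pattern = "OR {estimate} ({conf.low} to {conf.high})"
)
age_or
```

The curly-brace placeholders are the same ones used in the default pattern, so any subset of them can be rearranged to match a journal's reporting style.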
]

---

# {gtsummary} reporting with inline_text()

<p align="center"><img src="Images/m1_tbl_3.png" width=30%></p>

**In Code:** The odds ratio for age is '` r inline_text(m1_tbl_3, variable = age)`'

**In Report:** The odds ratio for age is 1.02 (95% CI 1.00, 1.04; p=0.087)

---
class: inverse, center, middle

# tbl_merge()/tbl_stack()

---

# tbl_merge() for side-by-side tables

.pull-left[
A **univariable** table:

```r
tbl_uvsurv <- trial |>
  select(age, grade, death, ttdeath) |>
  tbl_uvregression(
    method = coxph,
    y = Surv(ttdeath, death),
    exponentiate = TRUE
  ) |>
  add_global_p()
```

<p align="center"><img src="Images/tbl_uvsurv.png" width=50%></p>
]

.pull-right[
A **multivariable** table:

```r
tbl_mvsurv <- coxph(
  Surv(ttdeath, death) ~ age + grade,
  data = trial
) |>
  tbl_regression(
    exponentiate = TRUE
  ) |>
  add_global_p()
```

<p align="center"><img src="Images/tbl_mvsurv.png" width=50%></p>
]

---

# tbl_merge() for side-by-side tables

```r
tbl_merge(
  list(tbl_uvsurv, tbl_mvsurv),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)
```

<p align="center"><img src="Images/tbl_surv_merge.png" width=50%></p>

---

# tbl_stack() to combine vertically

.pull-left[
An **unadjusted** model:

```r
t3 <- coxph(Surv(ttdeath, death) ~ trt,
            data = trial) |>
  tbl_regression(
    show_single_row = trt,
    label = trt ~ "Drug B vs A",
    exponentiate = TRUE
  )
```

<p align="center"><img src="Images/t3.png" width=60%></p>
]

.pull-right[
An **adjusted** model:

```r
t4 <- coxph(Surv(ttdeath, death) ~ trt + grade + stage + marker,
            data = trial) |>
  tbl_regression(
    show_single_row = trt,
    label = trt ~ "Drug B vs A",
    exponentiate = TRUE,
    include = "trt"
  )
```

<p align="center"><img src="Images/t4.png" width=60%></p>
]

---

# tbl_stack() to combine vertically

```r
tbl_stack(
  list(t3, t4),
  group_header = c("Unadjusted", "Adjusted")
)
```

<p align="center"><img src="Images/tbl_surv_stack.png" width=40%></p>

---

# tbl_strata() for stratified tables

```r
sm_trial |>
  mutate(grade = paste("Grade", grade)) |>
  tbl_strata(
    strata = grade,
    ~tbl_summary(.x,
                 by = trt, missing = "no") |>
      modify_header(all_stat_cols() ~ "**{level}**")
  )
```

<p align="center"><img src="Images/tbl_strata.png" width=60%></p>

---
class: inverse, center, middle

# {gtsummary} themes

---

# {gtsummary} theme basics

.large[
- A **theme** is a set of customization preferences that can be easily set and reused.

- Themes control **default settings for existing functions**

- Themes control more **fine-grained customization** not available via arguments or helper functions

- Easily use one of the **available themes**, or **create your own**
]

---

# {gtsummary} default theme

.pull-left[
```r
reset_gtsummary_theme()

trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  add_stat_label() |>
  add_p() |>
  modify_caption("Default Theme")
```
]

.pull-right[
<p align="center"><img src="Images/no_theme.png" width=90%></p>
]

---

# {gtsummary} theme_gtsummary_journal()

.pull-left[
```r
reset_gtsummary_theme()
theme_gtsummary_journal(journal = "jama")

trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  add_stat_label() |>
  add_p() |>
  modify_caption("Journal Theme (JAMA)")
```
]

.pull-right[
<p align="center"><img src="Images/jama_theme.png" width=90%></p>
]

.medium[
Journal options include `jama`, `lancet`, `nejm`, `qjecon`.
**Contributions welcome!**
]

---

# {gtsummary} theme_gtsummary_language()

.pull-left[
```r
reset_gtsummary_theme()
theme_gtsummary_language(language = "hi")

trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  add_stat_label() |>
  add_p() |>
  modify_caption("Language Theme (Hindi)")
```
]

.pull-right[
<p align="center"><img src="Images/lang_theme.png" width=90%></p>
]

.medium[
Language options: "de" (German), "en" (English), "es" (Spanish), "fr" (French), "gu" (Gujarati), "hi" (Hindi), "is" (Icelandic), "ja" (Japanese), "kr" (Korean), "mr" (Marathi), "nl" (Dutch), "no" (Norwegian), "pt" (Portuguese), "se" (Swedish), "zh-cn" (Chinese - Simplified), "zh-tw" (Chinese - Traditional)
]

---

# {gtsummary} theme_gtsummary_compact()

.pull-left[
```r
reset_gtsummary_theme()
theme_gtsummary_compact()

trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  add_stat_label() |>
  add_p() |>
  modify_caption("Compact Theme")
```
]

.pull-right[
<p align="center"><img src="Images/compact_theme.png" width=90%></p>
]

.medium[
Reduces padding and font size
]

---

# {gtsummary} set_gtsummary_theme()

.large[
- `set_gtsummary_theme()` to create a custom theme

- See the {gtsummary} + themes vignette: http://www.danieldsjoberg.com/gtsummary/articles/themes.html
]

---
class: inverse, center, middle

# {gtsummary} print engines

---

# {gtsummary} print engines

<p align="center"><img src="Images/gtsummary_rmarkdown.png" width=50%></p>

---
class: inverse, center, middle

# In Closing

---

# {gtsummary} website

.large[http://www.danieldsjoberg.com/gtsummary/]

<p align="center"><img src="Images/gtsummary_website.png" width=60%></p>

---

# {gtsummary} installation

.pull-left[
Install the production version of {gtsummary} from CRAN:

```r
install.packages("gtsummary")
```

Install the development version of {gtsummary} from GitHub:

```r
remotes::install_github("ddsjoberg/gtsummary")
```
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-84-1.png" width="100%" />
]

---

# Package authors/contributors

.large[
.pull-left[
**Daniel D. Sjoberg** (maintainer)

Michael Curry

Joseph Larmarange

Jessica Lavery

Karissa Whiting

Emily C. Zabor

Xing Bai
]

.pull-right[
Esther Drill

Jessica Flynn

Margie Hannum

Stephanie Lobaugh

Shannon Pileggi

Amy Tin

Gustavo Zapata Wainberg
]
]

---

# Thank you!

.large[
Website: [emilyzabor.com](https://www.emilyzabor.com/)

Email: [zabore2@ccf.org](mailto:zabore2@ccf.org)

GitHub: [zabore](https://github.com/zabore/)

Twitter: [zabormetrics](https://twitter.com/zabormetrics)
]