class: inverse, center, title-slide, middle # Creating presentation-ready summary tables with {gtsummary} ### Emily C. Zabor #### Greater Cleveland R Group #### September 30, 2020 <p align="center"><img src="Images/CC_hires_r.png" width=30%></p> --- # About me .left-column[ <img src="Images/UMN.jpg" width=50%> <img src="Images/msk-black-logo.png" width=95%> <img src="Images/CU.jpg" width=50%> <img src="Images/Taussig.jpg" width=75%> ] .right-column[ .medium[ * MS in Biostatistics from the **University of Minnesota** in 2010 <br> <br> * 9 years as a Research Biostatistician at **Memorial Sloan Kettering Cancer Center** <br> * DrPH in Biostatistics from **Columbia University** in 2019 <br> * Faculty Biostatistician at **Cleveland Clinic** starting in 2019 ] ] --- class: inverse, center, middle # Motivation --- # The reproducibility crisis .pull-left[ .large[ - Quality of medical research is often low - **Low quality code** in medical research part of the problem - Low quality code is more likely to **contain errors** - Reproducibility is often **cumbersome** and **time-consuming** ] ] .pull-right[ <p align="center"><img src="Images/reproducibility-graphic-online1.jpeg" width=90%></p> ] .footnote[Image source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970; Slide source: http://www.danieldsjoberg.com/rmedicine-gtsummary] --- # Need reproducible, presentation-ready tables <p align="center"><img src="Images/table1_example.png" width=50%> .small[ Brierley CK, Zabor EC, Komrokji RS, DeZern AE, Roboz GJ, Brunner AM, Stone RM, Sekeres MA, Steensma DP. Low participation rates and disparities in participation in interventional clinical trials for myelodysplastic syndromes. *Cancer.* 2020 Aug 7. doi: 10.1002/cncr.33105. PMID: 32767690. ] --- # Our solution: internal package {biostatR} <img src="Images/msk-black-logo.png" width=60%> .xxlarge[Department of Epidemiology & Biostatistics] .large[We thought it was *so good* that we converted it into the public package {gtsummary}!] --- # {gtsummary} overview .pull-left[ * Create **tabular summaries** with sensible defaults but highly customizable * Types of summaries: - "Table 1"-types - Cross-tabulation - Regression models - Survival data - Survey data - Custom tables * Report statistics from {gtsummary} tables **inline** in R Markdown * Stack and/or merge any table type * Use **themes** to standardize across tables * Choose from different **print engines** ] .pull-right[ <img src="Images/gtsummary_logo.png" width=80%> ] --- # {gtsummary} example dataset * The `trial` dataset is included with {gtsummary} * Simulated dataset of baseline characteristics for 200 patients who receive Drug A or Drug B * Variables were assigned labels using the `labelled` package ```r library(gtsummary) library(tidyverse) head(trial) ``` ``` ## # A tibble: 6 x 8 ## trt age marker stage grade response death ttdeath ## <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl> ## 1 Drug A 23 0.16 T1 II 0 0 24 ## 2 Drug B 9 1.11 T2 I 1 0 24 ## 3 Drug A 31 0.277 T1 II 0 0 24 ## 4 Drug A NA 2.07 T3 III 1 1 17.6 ## 5 Drug A 51 2.77 T4 III 1 1 16.4 ## 6 Drug B 39 0.613 T4 I 0 1 15.6 ``` --- # {gtsummary} example dataset .pull-left[ This presentation will use a subset of the variables. ```r sm_trial <- trial %>% select(trt, age, grade, response) ``` ] .pull-right[ <p align="center"><img src="Images/sm_trial_info.png" width=90%></p> ] --- class: inverse, center, middle # tbl_summary() --- # Basic tbl_summary() .pull-left[ ```r tbl_summary_1 <- sm_trial %>% select(-trt) %>% tbl_summary() ``` .medium[ - Three types of summaries: `continuous`, `categorical`, and `dichotomous` - Statistics are `median (IQR)` for continuous, `n (%)` for categorical/dichotomous - Variables coded `0/1`, `TRUE/FALSE`, `Yes/No` treated as dichotomous - Lists `NA` values under "Unknown" - Label attributes are printed automatically ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_1.png" width=65%></p> ] --- # Customize tbl_summary() output .pull-left[ ```r tbl_summary_2 <- sm_trial %>% tbl_summary( by = trt, statistic = response ~ "{n} / {N} ({p}%)", label = grade ~ "Pathologic tumor grade", digits = age ~ 2 ) ``` .medium[ - `by`: specifies a column variable for cross-tabulation - `statistic`: customize the reported statistics - `label`: change or customize variable labels - `digits`: specify the number of decimal places for rounding ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_2.png" width=90%></p> ] --- # Customize tbl_summary() output .large[**provide argument** = **select variables** ~ **give instructions**] ```r sm_trial %>% tbl_summary( statistic = all_continuous() ~ "{mean} ({sd})", label = starts_with("grade") ~ "Pathologic grade", digits = age ~ 2 ) ``` <br> .large[Use **lists** to specify more than two choices:] ```r label = list(age ~ "Patient age (years)", grade = "Pathologic tumor grade") ``` --- # Add-on functions in {gtsummary} .xlarge[ `tbl_summary()` objects can also be updated using related functions. - `add_*()` add additional column of statistics or information, e.g. p-values, q-values, overall statistics, N obs., and more - `modify_*()` modify table headers, spanning headers, and footnotes - `bold_*()/italicize_*()` style labels, variable levels, significant p-values ] --- # Update tbl\_summary() with add\_\*() .pull-left[ ```r tbl_summary_3a <- sm_trial %>% tbl_summary( by = trt ) %>% add_p() %>% add_q( method = "bonferroni" ) ``` .medium[ * `add_p()`: adds a column of p-values * `add_q()`: adds a column of p-values adjusted for multiple comparisons through a call to `p.adjust()` ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_3a.png" width=100%></p> ] --- # Update tbl\_summary() with add\_\*() .pull-left[ ```r tbl_summary_3b <- sm_trial %>% tbl_summary( by = trt, missing = "no" ) %>% add_overall() %>% add_n() %>% add_stat_label( label = all_categorical() ~ "No. (%)" ) ``` .medium[ * `add_overall()`: adds a column of overall statistics * `add_n()`: adds a column with the sample size * `add_stat_label()`: adds a description of the reported statistic ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_3b.png" width=100%></p> ] --- # Update tbl\_summary() with bold\_\*()/italicize\_\*() .pull-left[ ```r tbl_summary_4 <- sm_trial %>% tbl_summary( by = trt ) %>% add_p() %>% bold_labels() %>% italicize_levels() %>% bold_p(t = 0.8) ``` .medium[ * `bold_labels()`: bold the variable labels * `italicize_levels()`: italicize the variable levels * `bold_p()`: bold p-values according a specified threshold ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_4.png" width=100%></p> ] --- # Update tbl\_summary() with modify\_\*() .pull-left[ .tiny[ ```r tbl_summary_5 <- sm_trial %>% select(age, response, trt) %>% tbl_summary( by = trt ) %>% modify_header( update = list( stat_1 ~ "**A**", stat_2 ~ "**B**" )) %>% modify_spanning_header( update = starts_with("stat_") ~ "Drug") %>% modify_footnote( update = starts_with("stat_") ~ "median (IQR) for continuous; n (%) for categorical" ) ``` ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_5.png" width=90%></p> ] * Use `show_header_names()` to see the internal header names available for use in `modify_header()` --- # Add-on functions in {gtsummary} .xlarge[ And many more! See the documentation at http://www.danieldsjoberg.com/gtsummary/reference/index.html And a detailed `tbl_summary()` vignette at http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html ] --- # Cross-tabulation with tbl_cross() .large[`tbl_cross()` is a wrapper for `tbl_summary()` for **n x m** tables] .pull-left[ <br> ```r tbl_cross_1 <- sm_trial %>% tbl_cross( row = trt, col = grade, percent = "row", margin = "row" ) %>% add_p(source_note = TRUE) ``` ] .pull-right[ <p align="center"><img src="Images/tbl_cross_1.png" width=90%></p> ] --- class: inverse, center, middle # tbl_regression() --- # Traditional model summary() .pull-left[ ```r m1 <- glm( response ~ age + stage, data = trial, family = binomial(link = "logit") ) summary(m1) ``` .medium[ Looks **messy** and it's not easy for others to understand. ] ] .pull-right[ <p align="center"><img src="Images/messy-model-output.png" width=100%></p> ] --- # Basic tbl_regression() .pull-left[ ```r m1_tbl_1 <- tbl_regression( m1 ) ``` .medium[ - Displays **p-values** for covariates - Shows **reference levels** for categorical variables ] ] .pull-right[ <p align="center"><img src="Images/m1_tbl_1.png" width=90%></p> ] --- # Customize tbl_regression() output .pull-left[ ```r m1_tbl_2 <- tbl_regression( m1, exponentiate = TRUE ) %>% add_global_p() ``` .medium[ - Display **odds ratio** estimates and **confidence intervals** - Add global p-values ] ] .pull-right[ <p align="center"><img src="Images/m1_tbl_2.png" width=90%></p> ] --- # Supported models in tbl_regression() .xlarge[ - From `stats`: `lm()`, `glm()` - From `survival`: `coxph()`, `clogit()`, `survreg()` - From `lme4`: `glmer()`, `lmer()` - From `geepack`: `geeglm()` ] .large[**Custom tidiers** can be written and passed to `tbl_regression()` using the `tidy_fun` argument.] --- # Univariate models with tbl_uvregression() .pull-left[ ```r tbl_uvreg <- sm_trial %>% tbl_uvregression( method = glm, y = response, method.args = list(family = binomial), exponentiate = TRUE ) ``` .medium[ - Specify model `method`, `method.args`, and the `response` variable - Arguments and helper functions like `exponentiate`, `bold_*()`, `add_global_p()` can also be used with `tbl_uvregression()` ] ] .pull-right[ <p align="center"><img src="Images/tbl_uvreg.png" width=90%></p> ] --- class: inverse, center, middle # inline_text() --- # {gtsummary} reporting with inline_text() .large[ - Tables are important, but we often need to report results in-line in a report. - Any statistic reported in a {gtsummary} table can be extracted and reported in-line in an R Markdown document with the `inline_text()` function. - The pattern of what is reported can be modified with the `pattern = ` argument. - Default is `pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})"`. ] --- # {gtsummary} reporting with inline_text() <p align="center"><img src="Images/m1_tbl_2.png" width=30%></p> **In Code:** The odds ratio for age is '` r inline_text(m1_tbl_2, variable = age)`' **In Report:** The odds ratio for age is 1.02 (95% CI 1.00, 1.04; p=0.091) --- class: inverse, center, middle # tbl_merge()/tbl_stack() --- # tbl_merge() for side-by-side tables .pull-left[ A **univariable** table: ```r library(survival) tbl_uvsurv <- trial %>% select(age, grade, death, ttdeath) %>% tbl_uvregression( method = coxph, y = Surv(ttdeath, death), exponentiate = TRUE ) %>% add_global_p() ``` ] .pull-right[ A **multivariable** table: ```r library(survival) tbl_mvsurv <- coxph( Surv(ttdeath, death) ~ age + grade, data = trial ) %>% tbl_regression( exponentiate = TRUE ) %>% add_global_p() ``` ] --- # tbl_merge() for side-by-side tables .pull-left[ A **univariable** table: <p align="center"><img src="Images/tbl_uvsurv.png" width=90%></p> ] .pull-right[ A **multivariable** table: <p align="center"><img src="Images/tbl_mvsurv.png" width=85%></p> ] --- # tbl_merge() for side-by-side tables ```r tbl_surv_merge <- tbl_merge( list(tbl_uvsurv, tbl_mvsurv), tab_spanner = c("**Univariable**", "**Multivariable**") ) ``` <p align="center"><img src="Images/tbl_surv_merge.png" width=50%></p> --- # tbl_stack() to combine vertically .pull-left[ An **unadjusted** model: ```r t3 <- coxph(Surv(ttdeath, death) ~ trt, data = trial) %>% tbl_regression( show_single_row = trt, label = trt ~ "Drug B vs A", exponentiate = TRUE ) ``` ] .pull-right[ An **adjusted** model: ```r t4 <- coxph(Surv(ttdeath, death) ~ trt + grade + stage + marker, data = trial) %>% tbl_regression( show_single_row = trt, label = trt ~ "Drug B vs A", exponentiate = TRUE, include = "trt" ) ``` ] --- # tbl_stack() to combine vertically .pull-left[ An **unadjusted** model: <p align="center"><img src="Images/t3.png" width=90%></p> ] .pull-right[ An **adjusted** model: <p align="center"><img src="Images/t4.png" width=90%></p> ] --- # tbl_stack() to combine vertically ```r tbl_surv_stack <- tbl_stack( list(t3, t4), group_header = c("Unadjusted", "Adjusted") ) ``` <p align="center"><img src="Images/tbl_surv_stack.png" width=40%></p> --- class: inverse, center, middle # {gtsummary} themes --- # {gtsummary} theme basics .large[ - A **theme** is a set of customization preferences that can be easily set and reused. - Themes control **default settings for existing functions** - Themes control more **fine-grained customization** not available via arguments or helper functions - Easily use one of the **available themes**, or **create your own** ] --- # {gtsummary} default theme .pull-left[ ```r reset_gtsummary_theme() no_theme <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt() %>% gt::tab_header("Default Theme") ``` ] .pull-right[ <p align="center"><img src="Images/no_theme.png" width=90%></p> ] --- # {gtsummary} theme_gtsummary_journal() .pull-left[ ```r reset_gtsummary_theme() theme_gtsummary_journal(journal = "jama") jama_theme <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt() %>% gt::tab_header("Journal Theme (JAMA)") ``` ] .pull-right[ <p align="center"><img src="Images/jama_theme.png" width=90%></p> ] .medium[ Journal options include `jama` and `lancet` ] --- # {gtsummary} theme_gtsummary_language() .pull-left[ ```r reset_gtsummary_theme() theme_gtsummary_language(language = "hi") lang_theme <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt() %>% gt::tab_header("Language Theme (Hindi)") ``` ] .pull-right[ <p align="center"><img src="Images/lang_theme.png" width=90%></p> ] .medium[ Language options: "de" (German), "en" (English), "es" (Spanish), "fr" (French), "gu" (Gujarati), "hi" (Hindi), "ja" (Japanese), "mr" (Marathi), "pt" (Portuguese), "se" (Swedish), "zh-cn" (Chinese - Simplified), "zh-tw" (Chinese - Traditional) ] --- # {gtsummary} theme_gtsummary_compact() .pull-left[ ```r reset_gtsummary_theme() theme_gtsummary_compact() compact_theme <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt() %>% gt::tab_header("Compact Theme") ``` ] .pull-right[ <p align="center"><img src="Images/compact_theme.png" width=90%></p> ] .medium[ Reduces padding and font size ] --- # {gtsummary} set_gtsummary_theme() ```r my_theme <- list( # Some gt customization "as_gt-lst:addl_cmds" = list( # make the font size small tab_spanner = rlang::expr(gt::tab_options(table.font.size = 'small')), # add a custom title and subtitle to every table user_added1 = rlang::expr(gt::tab_header( title = "Emily Zabor's Table", subtitle = "For Internal Use Only")), # add a custom data source note user_added2 = rlang::expr(gt::tab_source_note( source_note = "Source: very private internal data!")), # stripe the table rows user_added3 = rlang::expr(gt::opt_row_striping()), user_added4 = rlang::expr(gt::opt_table_lines("none")) ) ) ``` --- # {gtsummary} set_gtsummary_theme() ```r returns <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt(return_calls = TRUE) returns ``` <img src="Images/return_calls.png" width=80%> --- # {gtsummary} set_gtsummary_theme() .pull-left[ ```r reset_gtsummary_theme() set_gtsummary_theme(my_theme) my_theme_tbl <- trial %>% select(age, grade, trt) %>% tbl_summary(by = trt) %>% add_stat_label() %>% add_p() %>% as_gt() my_theme_tbl ``` * Made the font size small * Added custom title, subtitle, source note * Striped the rows * Removed all row lines ] .pull-right[ <p align="center"><img src="Images/my_theme_tbl.png" width=90%></p> ] --- # And many more options! .large[ See the {gtsummary} + themes vignette: http://www.danieldsjoberg.com/gtsummary/articles/themes.html ] --- class: inverse, center, middle # {gtsummary} print engines --- # {gtsummary} print engines <p align="center"><img src="Images/gtsummary_rmarkdown.png" width=90%></p> .footnote[.small[Slide source: http://www.danieldsjoberg.com/rmedicine-gtsummary]] --- # Example HTML output .large[ **Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_html.Rmd **Output**: http://www.emilyzabor.com/gtsummary_print_engine_html.html ] --- # Example PDF output .large[ **Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_pdf.Rmd **Output**: http://www.emilyzabor.com/gtsummary_print_engine_pdf.pdf ] --- # Example RTF output .large[ **Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_rtf.Rmd **Output**: http://www.emilyzabor.com/gtsummary_print_engine_rtf.rtf ] --- # Example Word output .large[ **Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_word.Rmd **Output**: http://www.emilyzabor.com/gtsummary_print_engine_word.docx ] --- class: inverse, center, middle # In Closing --- # {gtsummary} website .large[http://www.danieldsjoberg.com/gtsummary/] <p align="center"><img src="Images/gtsummary_website.png" width=80%></p> --- # {gtsummary} installation .pull-left[ Install the production version of {gtsummary} from CRAN: ```r install.packages("gtsummary") ``` Install the development version of {gtsummary} from GitHub: ```r remotes::install_github("ddsjoberg/gtsummary") ``` ] .pull-right[ <!-- --> ] --- # Thank you {gtsummary} authors: Daniel Sjoberg (Maintainer), Margaret Hannum, Karissa Whiting, Emily Zabor <img src="Images/sjoberg.jpg" width=20%> <img src="Images/hannum.jpg" width=20%> <img src="Images/whiting.jpg" width=20%> <img src="Images/zabor.jpg" width=20%> {gtsummary} contributors (not pictured): Michael Curry, Esther Drill, Jessica Flynn, Joseph Larmarange, Stephanie Lobaugh, Gustavo Zapata Wainberg Special thanks to Dan Sjoberg and Margie Hannum for sharing materials from previous {gtsummary} talks! R/Medicine, August 28, 2020: http://www.danieldsjoberg.com/rmedicine-gtsummary RLadies NYC, February 26, 2020: https://margarethannum.com/gtsummary-presentation-rladies <br> .medium[
: @zabormetrics,
: @zabore,
: zabore2@ccf.org ] --- class: inverse, center, middle # New in v1.3.5 (Released 2020-09-29)! --- # New summary type continuous2 .pull-left[ ```r tbl_summary_n1 <- sm_trial %>% select(-trt) %>% tbl_summary( type = age ~ "continuous2", statistic = all_continuous2() ~ "{mean} ({sd})" ) ``` .medium[ - Prints the continuous summary on the line below the variable label - Function `all_continuous2()` for selecting all `continuous2` type variables - `theme_gtsummary_continuous2()` makes `continuous2` the default summary type for all continuous variables ] ] .pull-right[ <p align="center"><img src="Images/tbl_summary_n1.png" width=45%></p> ] --- # New function add_glance_source_note() ```r m1 <- glm( response ~ age + stage, data = trial, family = binomial(link = "logit") ) broom::glance(m1) ``` ``` ## # A tibble: 1 x 8 ## null.deviance df.null logLik AIC BIC deviance df.residual nobs ## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> ## 1 229. 182 -112. 234. 250. 224. 178 183 ``` --- # New function add_glance_source_note() ```r m1_tbl_n1 <- tbl_regression( m1, exponentiate = TRUE ) %>% add_glance_source_note() ``` <p align="center"><img src="Images/m1_tbl_n1.png" width=70%></p> --- # And more... See package News for full details: http://www.danieldsjoberg.com/gtsummary/news/index.html