
# Creating presentation-ready summary tables with {gtsummary}

### Emily C. Zabor

#### Greater Cleveland R Group

#### September 30, 2020

---
# About me

<img src="Images/Taussig.jpg" width=75%>

* MS in Biostatistics from the **University of Minnesota** in 2010

* 9 years as a Research Biostatistician at **Memorial Sloan Kettering Cancer Center**

* DrPH in Biostatistics from **Columbia University** in 2019

* Faculty Biostatistician at **Cleveland Clinic** starting in 2019

---
# The reproducibility crisis

- **Low quality code** in medical research is part of the problem

- Low quality code is more likely to **contain errors**

- Reproducibility is often **cumbersome** and **time-consuming**

.footnote[Image source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970; Slide source: http://www.danieldsjoberg.com/rmedicine-gtsummary]

---
# Need reproducible, presentation-ready tables

.small[
Brierley CK, Zabor EC, Komrokji RS, DeZern AE, Roboz GJ, Brunner AM, Stone RM, Sekeres MA, Steensma DP. Low participation rates and disparities in participation in interventional clinical trials for myelodysplastic syndromes. *Cancer.* 2020 Aug 7. doi: 10.1002/cncr.33105. PMID: 32767690.
]

---
# Our solution: internal package {biostatR}

---
# {gtsummary} overview

.pull-left[
* Create **tabular summaries** with sensible defaults that are highly customizable
* Types of summaries:
  - "Table 1"-types
  - Cross-tabulation
  - Regression models
  - Survival data
  - Survey data
  - Custom tables

* Report statistics from {gtsummary} tables **inline** in R Markdown
* Stack and/or merge any table type
* Use **themes** to standardize across tables
* Choose from different **print engines**

]

---

# {gtsummary} example dataset

* The `trial` dataset is included with {gtsummary}

* Simulated dataset of baseline characteristics for 200 patients who received Drug A or Drug B

* Variables were assigned labels using the `labelled` package

```r
library(gtsummary)
library(tidyverse)

head(trial)
```

```
## # A tibble: 6 x 8
##   trt      age marker stage grade response death ttdeath
##   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
## 1 Drug A    23  0.16  T1    II           0     0    24
## 2 Drug B     9  1.11  T2    I            1     0    24
## 3 Drug A    31  0.277 T1    II           0     0    24
## 4 Drug A    NA  2.07  T3    III          1     1    17.6
## 5 Drug A    51  2.77  T4    III          1     1    16.4
## 6 Drug B    39  0.613 T4    I            0     1    15.6
```
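Each column's label is stored as an attribute, so the labels can be inspected with base R. The sketch below assumes only that every label lives in the column's `"label"` attribute. The `trial_labels` name is illustrative.

```r
library(gtsummary)

# Collect the "label" attribute of every column in the trial dataset
trial_labels <- vapply(
  trial,
  function(col) {
    lbl <- attr(col, "label")
    if (is.null(lbl)) NA_character_ else lbl  # NA for unlabeled columns
  },
  character(1)
)

trial_labels
```

An unlabeled column shows `NA` here. In that case {gtsummary} prints the column name instead.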

---
# {gtsummary} example dataset

This presentation will use a subset of the variables.

```r
sm_trial <-
  trial %>%
  select(trt, age, grade, response)
```

---
# Basic tbl_summary()

```r
tbl_summary_1 <-
  sm_trial %>%
  select(-trt) %>%
  tbl_summary()
```

- Statistics are `median (IQR)` for continuous, `n (%)` for categorical/dichotomous

- Variables coded `0/1`, `TRUE/FALSE`, or `Yes/No` are treated as dichotomous

- Lists `NA` values under "Unknown"

- Label attributes are printed automatically
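These default summary types and the label handling can be checked on a small toy data frame. This is a sketch with made-up data, and `toy`, `flag`, and `grp` are hypothetical names. It inspects `table_body`, the tibble underlying every {gtsummary} table.

```r
library(gtsummary)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  flag = c(0, 1, 1, 0, 1),                   # coded 0/1, so summarized as dichotomous
  grp  = factor(c("a", "b", "c", "a", "b"))  # factor, so summarized as categorical
)
attr(toy$flag, "label") <- "Flag indicator"  # label attribute used as the row label

tbl_toy <- tbl_summary(toy)

# table_body holds one row per printed line of the table
tbl_toy$table_body[, c("variable", "row_type", "label", "stat_0")]
```

The dichotomous variable occupies a single row. The categorical variable gets a label row plus one row per level.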

---
# Customize tbl_summary() output

```r
tbl_summary_2 <-
  sm_trial %>%
  tbl_summary(
    by = trt,
    statistic = response ~ "{n} / {N} ({p}%)",
    label = grade ~ "Pathologic tumor grade",
    digits = age ~ 2
  )
```

- `statistic`: customize the reported statistics

- `label`: change or customize variable labels

- `digits`: specify the number of decimal places for rounding
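The `statistic` strings are glue-style patterns: each `{...}` placeholder is replaced by the matching statistic. The sketch below checks the customized `response` row. It assumes that the `by` levels Drug A and Drug B map to the internal columns `stat_1` and `stat_2`, which `show_header_names()` can confirm.

```r
library(gtsummary)
library(dplyr)

tbl_resp <-
  trial %>%
  select(trt, response) %>%
  tbl_summary(
    by = trt,
    statistic = response ~ "{n} / {N} ({p}%)"
  )

# The dichotomous response is summarized on its label row, one stat_* column per arm
resp_row <- tbl_resp$table_body %>%
  filter(variable == "response", row_type == "label")
resp_row[, c("label", "stat_1", "stat_2")]
```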


---
# Customize tbl_summary() output

```r
sm_trial %>%
  tbl_summary(
    statistic    =    all_continuous()        ~    "{mean} ({sd})",
    label        =    starts_with("grade")    ~    "Pathologic grade",
    digits       =    age                     ~    2
    )
```

To customize several variables at once, pass a `list()` of formulas:

```r
label = list(age ~ "Patient age (years)", grade ~ "Pathologic tumor grade")
```
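The left-hand side of a formula can also be a selector such as `all_continuous()`, so one formula can cover many variables. The sketch below uses the `trial` data, and the `tbl_multi` name is illustrative.

```r
library(gtsummary)
library(dplyr)

tbl_multi <-
  trial %>%
  select(age, marker, grade) %>%
  tbl_summary(
    # one formula per variable in a list()
    label = list(
      age ~ "Patient age (years)",
      grade ~ "Pathologic tumor grade"
    ),
    # a selector on the left-hand side applies to every matching variable
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  )

tbl_multi$table_body %>%
  filter(row_type == "label") %>%
  select(variable, label, stat_0)
```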

---
# Add-on functions in {gtsummary}

- `add_*()`: add a column of additional statistics or information, e.g., p-values, q-values, overall statistics, N obs., and more

- `modify_*()`: modify table headers, spanning headers, and footnotes

- `bold_*()`/`italicize_*()`: style labels, variable levels, and significant p-values

---
# Update tbl\_summary() with add\_\*()

```r
tbl_summary_3a <-
  sm_trial %>%
  tbl_summary(
    by = trt
  ) %>%
  add_p() %>%
  add_q(
    method = "bonferroni"
  )
```

* `add_q()`: adds a column of p-values adjusted for multiple comparisons through a call to `p.adjust()`
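Because `add_q()` passes the p-values to `p.adjust()`, the Bonferroni adjustment can be previewed in base R. The p-values below are made up for illustration.

```r
# Raw p-values from three hypothetical tests (made-up numbers)
p_raw <- c(age = 0.01, grade = 0.02, response = 0.04)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
q_bonf <- p.adjust(p_raw, method = "bonferroni")
q_bonf
```

With three tests, each adjusted value is three times the raw p-value (0.03, 0.06, 0.12), and no value can exceed 1.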

---
# Update tbl\_summary() with add\_\*()

```r
tbl_summary_3b <-
  sm_trial %>%
  tbl_summary(
    by = trt,
    missing = "no"
  ) %>%
  add_overall() %>%
  add_n() %>%
  add_stat_label(
    label = all_categorical() ~ "No. (%)"
  )
```

.medium[
* `add_overall()`: adds a column of overall statistics
* `add_n()`: adds a column with the number of non-missing observations for each variable
* `add_stat_label()`: adds a description of the reported statistic
]
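The counts from `add_n()` can be cross-checked against the raw data. This sketch assumes the new column is named `n` in `table_body`, as in current {gtsummary} releases.

```r
library(gtsummary)
library(dplyr)

tbl_n <-
  trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt, missing = "no") %>%
  add_n()

# The added column sits on each variable's label row
n_col <- tbl_n$table_body %>%
  filter(row_type == "label") %>%
  select(variable, n)
n_col

# Direct count of non-missing ages, for comparison
sum(!is.na(trial$age))
```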

---
# Update tbl\_summary() with bold\_\*()/italicize\_\*()

```r
tbl_summary_4 <-
  sm_trial %>%
  tbl_summary(
    by = trt
  ) %>%
  add_p() %>%
  bold_labels() %>%
  italicize_levels() %>%
  bold_p(t = 0.8)
```

.medium[
* `bold_labels()`: bold the variable labels
* `italicize_levels()`: italicize the variable levels
* `bold_p()`: bold p-values below a specified threshold (`t`)
]

---
# Update tbl\_summary() with modify\_\*()

```r
tbl_summary_5 <-
  sm_trial %>%
  select(age, response, trt) %>%
  tbl_summary(
    by = trt
  ) %>%
  modify_header(
    update = list(
      stat_1 ~ "**A**",
      stat_2 ~ "**B**"
    )
  ) %>%
  modify_spanning_header(
    update = starts_with("stat_") ~ "Drug"
  ) %>%
  modify_footnote(
    update = starts_with("stat_") ~
      "median (IQR) for continuous; n (%) for categorical"
  )
```

* Use `show_header_names()` to see the internal header names available for use in `modify_header()`

---
# Add-on functions in {gtsummary}

See the documentation at http://www.danieldsjoberg.com/gtsummary/reference/index.html

See also the detailed `tbl_summary()` vignette at http://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html

---
# Cross-tabulation with tbl_cross()

```r
tbl_cross_1 <-
  sm_trial %>%
  tbl_cross(
    row = trt,
    col = grade,
    percent = "row",
    margin = "row"
  ) %>%
  add_p(source_note = TRUE)
```
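With `percent = "row"`, each percentage is computed within its row, so every row sums to 100%. The same row percentages can be sketched in base R with `prop.table()`.

```r
library(gtsummary)  # provides the trial dataset

# Counts of treatment (rows) by tumor grade (columns)
counts <- table(trial$trt, trial$grade)

# Row percentages: the proportions in each row sum to 1 (100%)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```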

---
# Traditional model summary()

```r
m1 <- glm(
  response ~ age + stage,
  data = trial,
  family = binomial(link = "logit")
)

summary(m1)
```

---
# Basic tbl_regression()

```r
m1_tbl_1 <-
  tbl_regression(
    m1
  )
```

- Shows **reference levels** for categorical variables

---
# Customize tbl_regression() output

```r
m1_tbl_2 <-
  tbl_regression(
    m1,
    exponentiate = TRUE
  ) %>%
  add_global_p()
```

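- `exponentiate = TRUE`: report odds ratios instead of log-odds coefficients

- `add_global_p()`: add a global p-value for each variable, e.g., a single test across all levels of `stage`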
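For a logistic model, `exponentiate = TRUE` turns each coefficient into an odds ratio, `exp(coef)`. The sketch below checks this for `age`. It also runs `drop1()`, a base R likelihood-ratio test for each term, as an illustrative analogue of a global p-value for `stage`. Per the package documentation, `add_global_p()` itself uses `car::Anova()`, so the two need not match exactly.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial(link = "logit"))

# Odds ratios: exponentiated logistic regression coefficients
exp(coef(m1))

# The estimate that tbl_regression() reports for age
tbl_or <- tbl_regression(m1, exponentiate = TRUE)
age_or <- tbl_or$table_body$estimate[tbl_or$table_body$variable == "age"]
age_or

# Illustrative base R analogue of a global test: likelihood-ratio test for
# dropping each term, so all levels of stage are tested jointly
d1 <- drop1(m1, test = "Chisq")
d1
```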

---
# Supported models in tbl_regression()

.xlarge[
- From `stats`: `lm()`, `glm()`
- From `survival`: `coxph()`, `clogit()`, `survreg()`
- From `lme4`: `glmer()`, `lmer()`
- From `geepack`: `geeglm()`
]

.large[**Custom tidiers** can be written and passed to `tbl_regression()` using the `tidy_fun` argument.]

---
# Univariate models with tbl_uvregression()

```r
tbl_uvreg <-
  sm_trial %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
```

- Arguments and helper functions like `exponentiate`, `bold_*()`, `add_global_p()` can also be used with `tbl_uvregression()`
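Conceptually, `tbl_uvregression()` fits one model per predictor, each with the same outcome. The base R sketch below illustrates the idea, using `reformulate()` to build `response ~ x` for each entry of a hypothetical `predictors` vector.

```r
library(gtsummary)  # provides the trial dataset

predictors <- c("trt", "age", "grade")

# One logistic model per predictor, each of the form response ~ predictor
uv_fits <- lapply(predictors, function(x) {
  glm(reformulate(x, response = "response"), data = trial, family = binomial)
})
names(uv_fits) <- predictors

# Odds ratios from each separate model, intercept dropped
lapply(uv_fits, function(m) exp(coef(m))[-1])
```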

---
# {gtsummary} reporting with inline_text()
.large[
- Tables are important, but we often need to report results inline, within the text of a report.

- Any statistic reported in a {gtsummary} table can be extracted and reported inline in an R Markdown document with the `inline_text()` function.

- The pattern of what is reported can be modified with the `pattern` argument.

- For `tbl_regression()` tables, the default is `pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})"`.
]
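The sketch below shows both the default and a custom `pattern`, using the same logistic model as the regression slides. The custom pattern string is an illustrative choice.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial(link = "logit"))
m1_tbl <- tbl_regression(m1, exponentiate = TRUE)

# Default pattern: estimate, confidence interval, and p-value
default_txt <- inline_text(m1_tbl, variable = age)

# Custom pattern: only the odds ratio and its confidence interval
custom_txt <- inline_text(
  m1_tbl,
  variable = age,
  pattern = "OR {estimate} (95% CI {conf.low} to {conf.high})"
)

default_txt
custom_txt
```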

---
# {gtsummary} reporting with inline_text()

**In Code:** 
The odds ratio for age is `` `r inline_text(m1_tbl_2, variable = age)` ``

**In Report:** 
The odds ratio for age is 1.02 (95% CI 1.00, 1.04; p=0.091)

---
# tbl_merge() for side-by-side tables

A **univariable** table:

```r
library(survival)

tbl_uvsurv <-
  trial %>%
  select(age, grade, death, ttdeath) %>%
  tbl_uvregression(
    method = coxph,
    y = Surv(ttdeath, death),
    exponentiate = TRUE
  ) %>%
  add_global_p()
```

A **multivariable** table:

```r
library(survival)

tbl_mvsurv <- coxph(
  Surv(ttdeath, death) ~ age + grade,
  data = trial
) %>%
  tbl_regression(
    exponentiate = TRUE
  ) %>%
  add_global_p()
```

---
# tbl_merge() for side-by-side tables

A **univariable** table:

<img src="Images/tbl_uvsurv.png" width=90%>

A **multivariable** table:

<img src="Images/tbl_mvsurv.png" width=85%>

---
# tbl_merge() for side-by-side tables

```r
tbl_surv_merge <- tbl_merge(
  list(tbl_uvsurv, tbl_mvsurv),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)
```
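The sketch below shows the structure underneath a merged table. Rows are matched by variable, and the columns of each input table receive the suffixes `_1`, `_2`, and so on. The object names `uv`, `mv`, and `merged` are illustrative, and `add_global_p()` is left out to keep the example short.

```r
library(gtsummary)
library(survival)

# Univariable Cox models, one per predictor
uv <- tbl_uvregression(
  trial[c("age", "grade", "death", "ttdeath")],
  method = coxph,
  y = Surv(ttdeath, death),
  exponentiate = TRUE
)

# Multivariable Cox model
mv <- tbl_regression(
  coxph(Surv(ttdeath, death) ~ age + grade, data = trial),
  exponentiate = TRUE
)

merged <- tbl_merge(list(uv, mv), tab_spanner = c("**Univariable**", "**Multivariable**"))

# Each input table's columns carry a numeric suffix
grep("^estimate", names(merged$table_body), value = TRUE)
```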

---
# tbl_stack() to combine vertically

An **unadjusted** model:

```r
t3 <-
  coxph(Surv(ttdeath, death) ~ trt, data = trial) %>%
  tbl_regression(
    show_single_row = trt,
    label = trt ~ "Drug B vs A",
    exponentiate = TRUE
  )
```

An **adjusted** model:

```r
t4 <-
  coxph(Surv(ttdeath, death) ~ trt + grade + stage + marker, data = trial) %>%
  tbl_regression(
    show_single_row = trt,
    label = trt ~ "Drug B vs A",
    exponentiate = TRUE,
    include = "trt"
  )
```

---
# tbl_stack() to combine vertically

An **unadjusted** model:

<img src="Images/t3.png" width=90%>

An **adjusted** model:

<img src="Images/t4.png" width=90%>

---
# tbl_stack() to combine vertically

```r
tbl_surv_stack <- tbl_stack(
  list(t3, t4),
  group_header = c("Unadjusted", "Adjusted")
)
```
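Unlike `tbl_merge()`, `tbl_stack()` appends rows rather than columns. The sketch below checks this on the same two Cox models, written without the pipe. The object names are illustrative.

```r
library(gtsummary)
library(survival)

fit_unadj <- coxph(Surv(ttdeath, death) ~ trt, data = trial)
fit_adj <- coxph(Surv(ttdeath, death) ~ trt + grade + stage + marker, data = trial)

# Each table keeps only the treatment hazard ratio
t_unadj <- tbl_regression(fit_unadj, show_single_row = trt,
                          label = trt ~ "Drug B vs A", exponentiate = TRUE)
t_adj <- tbl_regression(fit_adj, show_single_row = trt,
                        label = trt ~ "Drug B vs A", exponentiate = TRUE,
                        include = "trt")

stacked <- tbl_stack(list(t_unadj, t_adj), group_header = c("Unadjusted", "Adjusted"))

# The rows of the input tables are appended in order
nrow(stacked$table_body)
```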

---
# {gtsummary} theme basics

- Themes control **default settings for existing functions**

- Themes control more **fine-grained customization** not available via arguments or helper functions

- Easily use one of the **available themes**, or **create your own**
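A theme is a named list of settings that stays active for the R session until it is reset. The minimal sketch below uses the theme element `"tbl_summary-arg:statistic"` to change the default statistic for continuous variables. The theme name `mean_sd_theme` is hypothetical.

```r
library(gtsummary)
library(dplyr)

# Hypothetical custom theme: mean (SD) as the default continuous statistic
mean_sd_theme <- list(
  "tbl_summary-arg:statistic" = list(all_continuous() ~ "{mean} ({sd})")
)

set_gtsummary_theme(mean_sd_theme)
tbl_themed <- trial %>% select(age) %>% tbl_summary()

# Restore the defaults, i.e., median (IQR) for continuous variables
reset_gtsummary_theme()
tbl_default <- trial %>% select(age) %>% tbl_summary()
```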

---
# {gtsummary} default theme

```r
reset_gtsummary_theme()

no_theme <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt() %>%
  gt::tab_header("Default Theme")
```

---
# {gtsummary} theme_gtsummary_journal()

```r
reset_gtsummary_theme()

theme_gtsummary_journal(journal = "jama")

jama_theme <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt() %>%
  gt::tab_header("Journal Theme (JAMA)")
```

---
# {gtsummary} theme_gtsummary_language()

```r
reset_gtsummary_theme()

theme_gtsummary_language(language = "hi")

lang_theme <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt() %>%
  gt::tab_header("Language Theme (Hindi)")
```

.medium[
Language options: "de" (German), "en" (English), "es" (Spanish), "fr" (French), "gu" (Gujarati), "hi" (Hindi), "ja" (Japanese), "mr" (Marathi), "pt" (Portuguese), "se" (Swedish), "zh-cn" (Chinese - Simplified), "zh-tw" (Chinese - Traditional)
]

---
# {gtsummary} theme_gtsummary_compact()

```r
reset_gtsummary_theme()

theme_gtsummary_compact()

compact_theme <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt() %>%
  gt::tab_header("Compact Theme")
```

---
# {gtsummary} set_gtsummary_theme()

```r
my_theme <-
  list(
    # Some gt customization
    "as_gt-lst:addl_cmds" = list(
      # make the font size small
      tab_spanner = rlang::expr(gt::tab_options(table.font.size = 'small')),
      # add a custom title and subtitle to every table
      user_added1 = rlang::expr(gt::tab_header(
        title = "Emily Zabor's Table", subtitle = "For Internal Use Only")),
      # add a custom data source note
      user_added2 = rlang::expr(gt::tab_source_note(
        source_note = "Source: very private internal data!")),
      # stripe the table rows
      user_added3 = rlang::expr(gt::opt_row_striping()),
      # remove the table lines
      user_added4 = rlang::expr(gt::opt_table_lines("none"))
    )
  )
```

---
# {gtsummary} set_gtsummary_theme()

```r
returns <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt(return_calls = TRUE)

returns
```

---
# {gtsummary} set_gtsummary_theme()

```r
reset_gtsummary_theme()

set_gtsummary_theme(my_theme)

my_theme_tbl <-
  trial %>%
  select(age, grade, trt) %>%
  tbl_summary(by = trt) %>%
  add_stat_label() %>%
  add_p() %>%
  as_gt()

my_theme_tbl
```

* Made the font size small
* Added custom title, subtitle, source note
* Striped the rows
* Removed all row lines

---
# And many more options!

.large[
See the {gtsummary} + themes vignette: http://www.danieldsjoberg.com/gtsummary/articles/themes.html
]

---
# {gtsummary} print engines

---
# Example HTML output

.large[
**Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_html.Rmd

**Output**: http://www.emilyzabor.com/gtsummary_print_engine_html.html
]

---
# Example PDF output

.large[
**Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_pdf.Rmd

**Output**: http://www.emilyzabor.com/gtsummary_print_engine_pdf.pdf
]

---
# Example RTF output

.large[
**Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_rtf.Rmd

**Output**: http://www.emilyzabor.com/gtsummary_print_engine_rtf.rtf
]

---
# Example Word output

.large[
**Code**: https://github.com/zabore/cleveland-r-gtsummary/blob/master/gtsummary_print_engine_word.Rmd

**Output**: http://www.emilyzabor.com/gtsummary_print_engine_word.docx
]

---
# {gtsummary} website

---
# {gtsummary} installation

Install the released version of {gtsummary} from CRAN:

```r
install.packages("gtsummary")
```

Install the development version of {gtsummary} from GitHub:

```r
remotes::install_github("ddsjoberg/gtsummary")
```

![](index_files/figure-html/unnamed-chunk-67-1.png)

---
# Thank you

{gtsummary} authors: Daniel Sjoberg (Maintainer), Margaret Hannum, Karissa Whiting, Emily Zabor

{gtsummary} contributors (not pictured): Michael Curry, Esther Drill, Jessica Flynn, Joseph Larmarange, Stephanie Lobaugh, Gustavo Zapata Wainberg

Special thanks to Dan Sjoberg and Margie Hannum for sharing materials from previous {gtsummary} talks!

R/Medicine, August 28, 2020: http://www.danieldsjoberg.com/rmedicine-gtsummary

RLadies NYC, February 26, 2020: https://margarethannum.com/gtsummary-presentation-rladies

---
class: inverse, center, middle
# New in v1.3.5 (Released 2020-09-29)!

---
# New summary type continuous2
.pull-left[

```r
tbl_summary_n1 <-
  sm_trial %>%
  select(-trt) %>%
  tbl_summary(
    type = age ~ "continuous2",
    statistic = all_continuous2() ~ "{mean} ({sd})"
  )
```

- New selector `all_continuous2()` selects all `continuous2`-type variables

- `theme_gtsummary_continuous2()` makes `continuous2` the default summary type for all continuous variables
]
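A `continuous2` variable can also take a vector of statistics, each printed on its own row below the variable label. A sketch (the `tbl_c2` name is illustrative):

```r
library(gtsummary)
library(dplyr)

tbl_c2 <-
  trial %>%
  select(age) %>%
  tbl_summary(
    type = age ~ "continuous2",
    # one row per pattern in the vector
    statistic = all_continuous2() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})")
  )

tbl_c2$table_body[, c("row_type", "label", "stat_0")]
```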

---
# New function add_glance_source_note()

```r
m1 <-
  glm(
    response ~ age + stage,
    data = trial,
    family = binomial(link = "logit")
  )

broom::glance(m1)
```

```
## # A tibble: 1 x 8
##   null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
##           <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
## 1          229.     182  -112.  234.  250.     224.         178   183
```

---
# New function add_glance_source_note()

```r
m1_tbl_n1 <-
  tbl_regression(
    m1,
    exponentiate = TRUE
  ) %>%
  add_glance_source_note()
```

---
# And more...

See package News for full details: http://www.danieldsjoberg.com/gtsummary/news/index.html