Time-to-event endpoints are common in biomedical research

Common research questions relate to times and probabilities

  • What is the probability of remaining event-free for a certain number of years?
  • What is the average amount of event-free time?

Survival analysis methods needed when follow-up time varies

The {survival} package is the basis of the survival ecosystem in R


  • Began development in 1985
  • Total of 11.9M downloads
  • Active development ongoing
  • Many detailed vignettes covering both the basics and advanced topics
  • Includes the essential methods
install.packages("survival")
library(survival)

Happiness is reproducible publication-ready tables and plots

install.packages(c("gtsummary", "ggsurvfit"))
library(gtsummary)
library(ggsurvfit)

Example dataset

The colon data come from a clinical trial of adjuvant therapy for colon cancer, and are included in the {survival} package.


Variables of interest:

Variable Variable description Levels
rx Treatment assignment Obs(ervation), Lev(amisole), Lev(amisole)+5-FU
sex Sex 1=male, 0=female
years Years to status Continuous years
status Censoring status 0=censored, 1=recurrence


See appendix slide for code to generate altered example data colonr.

Survival tables with {gtsummary}

Use tbl_survfit() to create:

  • Tables of median event-free time
  • Tables of x-time event-free probability

Use tbl_uvregression() to create:

  • Tables of univariate Cox regression results

Use tbl_regression() to create:

  • Tables of multivariable Cox regression results

Tables are highly customizable, see https://www.danieldsjoberg.com/gtsummary/ for details and vignettes to get started

Table of median event-free time

Use tbl_survfit() with the probs = 0.5 argument:

list(
  survfit(
    Surv(years, status) ~ 1, 
    data = colonr),
  survfit(
    Surv(years, status) ~ rx, 
    data = colonr)
  ) |> 
  tbl_survfit(
    probs = 0.5,
    label_header = 
      "Median (95% CI)"
    )
Characteristic Median (95% CI)
Overall 5.5 (4.0, —)
Treatment
    Obs 3.4 (2.2, 5.6)
    Lev 3.2 (2.2, 5.7)
    Lev+5FU — (—, —)
  • Highly customizable
  • Can set labels, titles, subtitles, footnotes, and more

Table of x-year event-free probability

Use tbl_survfit() with the times = 5 argument:

list(
  survfit(
    Surv(years, status) ~ 1, 
    data = colonr),
  survfit(
    Surv(years, status) ~ rx, 
    data = colonr)
  ) |> 
  tbl_survfit(
    times = 5,
    label_header = 
      "{time}-year RFS (95% CI)"
    )
Characteristic 5-year RFS (95% CI)
Overall 51% (48%, 54%)
Treatment
    Obs 45% (40%, 51%)
    Lev 46% (41%, 52%)
    Lev+5FU 62% (56%, 67%)
  • Note the use of {glue} syntax in the label header to dynamically code the column header

Inline results reporting

Save the tbl_survfit() result to an object (here, tab1) so that table statistics can be extracted for use inline with the inline_text() function.


The inline code:

The median (95% CI) recurrence-free time for the Levamisole treatment group is `r inline_text(tab1, variable = rx, level = 'Lev', prob = 0.5)`.


Will show as:

The median (95% CI) recurrence-free time for the Levamisole treatment group is 3.2 (2.2, 5.7).

Univariate Cox regression table

Use tbl_uvregression():

colonr |> 
  tbl_uvregression(
    method = coxph,
    y = Surv(years, status),
    include = c(rx, sex),
    exponentiate = T
    ) |> 
  add_global_p()
Characteristic N HR 95% CI p-value
Treatment 929

<0.001
    Obs

    Lev
0.98 0.80, 1.21
    Lev+5FU
0.60 0.47, 0.76
Sex 929

0.4
    Female

    Male
0.92 0.77, 1.10
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
  • Fits models for all variables listed in include =
  • exponentiate = T to obtain hazard ratios
  • Add global p-values

Multivariable Cox regression table

Pass fitted model results directly to tbl_regression():

coxph(
  Surv(years, status) ~ 
    rx + sex, 
  data = colonr
  ) |> 
  tbl_regression(
    exponentiate = T
  ) |> 
  add_global_p()
Characteristic HR 95% CI p-value
Treatment

<0.001
    Obs
    Lev 0.99 0.80, 1.22
    Lev+5FU 0.60 0.47, 0.75
Sex

0.3
    Female
    Male 0.90 0.75, 1.08
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Time-to-event plots with {ggsurvfit}

Uses {ggplot2} as the basis so known customizations are available with the + operator.

  • Plot Kaplan-Meier curves using ggsurvfit
  • Plot cumulative incidence curves for competing risks using ggcuminc
  • Plot multi-state models using ggsurvfit

Options to:

  • Add confidence intervals
  • Add risk tables
  • Add quantiles
  • And more

See https://www.danieldsjoberg.com/ggsurvfit/ for details and vignettes to get started

Kaplan-Meier curves with ggsurvfit()

survfit2(
  Surv(years, status) ~ 1, 
  data = colonr
  ) |>
  ggsurvfit() + 
  add_confidence_interval() + 
  add_risktable() +
  scale_y_continuous(
    limits = c(0, 1)
    ) + 
  scale_x_continuous(
    limits = c(0, 8), 
    breaks = seq(0, 8, 2)
  ) +
  labs(
    x = "Years from treatment start",
    y = "Recurrence-free probability"
  )

Put it all together

Thank you, and please get in touch


zabore2@ccf.org

https://www.emilyzabor.com/

https://github.com/zabore

https://www.linkedin.com/in/emily-zabor-59b902b7/

Appendix: Create example dataset

  • colon dataset from {survival} altered to:

    • Limit to one row per patient with interest in recurrence rather than death
    • Scale time from days to years
    • Recode sex levels to informative values
    • Label variables of interest
colonr <- 
  colon |> 
  filter(etype == 1) |> 
  mutate(
    years = time / 365.25,
    sex = case_match(
      sex,
      0 ~ "Female",
      1 ~ "Male"
    )
  ) |> 
  labelled::set_variable_labels(
    rx = "Treatment", 
    sex = "Sex"
  )
include-after: |