Estimate conditional survival

library(condsurv)
library(dplyr)
library(survival)

If \(S(t)\) represents the survival function at time \(t\), then conditional survival is defined as

\[S(y|x) = \frac{S(x + y)}{S(x)}\]

where \(y\) is the number of additional survival years of interest and \(x\) is the number of years a subject has already survived.
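The definition follows from the rule for conditional probability. Writing \(T\) for a subject's survival time (a symbol introduced here only for illustration), and noting that surviving beyond \(x + y\) implies surviving beyond \(x\):

\[S(y|x) = P(T > x + y \mid T > x) = \frac{P(T > x + y)}{P(T > x)} = \frac{S(x + y)}{S(x)}\]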

Generating conditional survival estimates

The conditional_surv_est function will generate this estimate along with 95% confidence intervals.

The lung dataset from the survival package will be used to illustrate.

# Scale the time variable to be in years rather than days
lung2 <- 
  mutate(
    lung,
    os_yrs = time / 365.25
  )

First, generate a single conditional survival estimate: the probability of surviving to 1 year given that the subject has already survived 6 months (\(0.5\) years). In the notation above, t1 corresponds to \(x\) and t2 to \(x + y\). The function returns a list, where cs_est is the conditional survival estimate, cs_lci is the lower bound of the 95% confidence interval, and cs_uci is the upper bound.

myfit <- survfit(Surv(os_yrs, status) ~ 1, data = lung2)

conditional_surv_est(
  basekm = myfit,
  t1 = 0.5, 
  t2 = 1
)
#> $cs_est
#> [1] 0.58
#> 
#> $cs_lci
#> [1] 0.49
#> 
#> $cs_uci
#> [1] 0.66
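As a sanity check on the point estimate, the ratio in the definition can be computed directly from the Kaplan-Meier fit. This is a minimal sketch using only the survival package; the object names km and cs_by_hand are illustrative and not part of condsurv, and the result should agree with cs_est above up to rounding.

```r
library(survival)

# Rescale follow-up time to years, as in the example above
lung2 <- transform(lung, os_yrs = time / 365.25)
myfit <- survfit(Surv(os_yrs, status) ~ 1, data = lung2)

# Kaplan-Meier estimates at the conditioning time (0.5) and the target time (1)
km <- summary(myfit, times = c(0.5, 1))$surv

# Conditional survival is the ratio S(1) / S(0.5)
cs_by_hand <- km[2] / km[1]
round(cs_by_hand, 2)
```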

Use purrr::map_df to build a table of estimates for multiple time points. For example, the code below estimates conditional survival to 1, 1.5, 2, and 2.5 years for a subject who has already survived 6 months (0.5 years).

prob_times <- seq(1, 2.5, 0.5)

purrr::map_df(
  prob_times, 
  ~conditional_surv_est(
    basekm = myfit, 
    t1 = 0.5, 
    t2 = .x) 
  ) %>% 
  dplyr::mutate(years = prob_times) %>% 
  dplyr::select(years, everything()) %>% 
  knitr::kable()

years	cs_est	cs_lci	cs_uci
1.0	0.58	0.49	0.66
1.5	0.36	0.27	0.45
2.0	0.16	0.10	0.25
2.5	0.07	0.02	0.15

A note on confidence interval estimation

The confidence intervals are based on a variation of the log-log transformation, also known as the “exponential” Greenwood formula, where the conditional survival estimate is substituted in for the traditional survival estimate in constructing the confidence interval.

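If \(\hat{S}(y|x)\) is the estimated conditional survival to time \(y\) given having already survived to time \(x\) (in this section \(y\) denotes the target time itself, corresponding to t2, rather than the number of additional years), then the 95% confidence limits are

\[\hat{S}(y|x)^{\exp\left(\pm 1.96\sqrt{\hat{L}(y|x)}\right)}\]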

where

\[\hat{L}(y|x)=\frac{1}{\left[\log \hat{S}(y|x)\right]^2}\sum_{j:\, x \leq \tau_j \leq y}\frac{d_j}{(r_j-d_j)\,r_j}\]

and

\(\tau_j\) = distinct death time \(j\)

\(d_j\) = number of failures at death time \(j\)

\(r_j\) = number at risk at death time \(j\)
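The interval formula can be traced step by step with the components of a survfit object. The sketch below is an illustration under stated assumptions, not the package's source code: it assumes death times are selected with inclusive bounds at both \(x\) and \(y\), as the summation index suggests, and the names in_window, L_hat and ci are hypothetical. For the 6-month to 1-year example the limits should land close to the cs_lci and cs_uci values printed earlier.

```r
library(survival)

# Same Kaplan-Meier fit as in the examples above
lung2 <- transform(lung, os_yrs = time / 365.25)
myfit <- survfit(Surv(os_yrs, status) ~ 1, data = lung2)

x <- 0.5  # time already survived (t1)
y <- 1    # target time (t2)

# Point estimate: ratio of Kaplan-Meier estimates at y and x
km <- summary(myfit, times = c(x, y))$surv
cs <- km[2] / km[1]

# Distinct death times tau_j in [x, y], with failures d_j and number at risk r_j
in_window <- myfit$time >= x & myfit$time <= y & myfit$n.event > 0
d <- myfit$n.event[in_window]
r <- myfit$n.risk[in_window]

# L-hat: Greenwood-type sum scaled by the squared log of the estimate
L_hat <- sum(d / ((r - d) * r)) / log(cs)^2

# Because cs < 1, the positive exponent yields the lower limit
ci <- cs^exp(c(1, -1) * 1.96 * sqrt(L_hat))
round(c(cs_est = cs, cs_lci = ci[1], cs_uci = ci[2]), 2)
```

Raising the estimate to a power keeps both limits inside the interval from 0 to 1, which is the usual motivation for the log-log transformation.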

Emily C. Zabor

Last updated: 2022-10-20
