estimate_cs.Rmd
```r
library(condsurv)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.1
library(survival)
#> Warning: package 'survival' was built under R version 4.2.1
```
If \(S(t)\) represents the survival function at time \(t\), then conditional survival is defined as
\[S(y|x) = \frac{S(x + y)}{S(x)}\]
where \(y\) is the number of additional survival years of interest and \(x\) is the number of years a subject has already survived.
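As a rough illustration of this definition, the ratio can be computed directly from a Kaplan-Meier fit. The sketch below assumes only the `survival` package and its `lung` data, with time converted from days to years; the object names (`km_fit`, `s_hat`, `cs_manual`) are illustrative and not part of any package.

```r
library(survival)

# Kaplan-Meier fit on the lung data, with time rescaled from days to years
km_fit <- survfit(Surv(time / 365.25, status) ~ 1, data = lung)

x <- 0.5 # years already survived
y <- 0.5 # additional years of interest

# Kaplan-Meier estimates of S(x) and S(x + y)
s_hat <- summary(km_fit, times = c(x, x + y))$surv

# Conditional survival S(y | x) = S(x + y) / S(x)
cs_manual <- s_hat[2] / s_hat[1]
cs_manual
```

Because \(S(x) \leq 1\), the conditional estimate is never smaller than the unconditional estimate \(S(x + y)\).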
The `conditional_surv_est` function will generate this estimate along with 95% confidence intervals. The `lung` dataset from the `survival` package will be used to illustrate.
```r
# Scale the time variable to be in years rather than days
lung2 <-
  mutate(
    lung,
    os_yrs = time / 365.25
  )
```
First, generate a single conditional survival estimate: the probability of surviving to 1 year, conditional on having already survived 6 months (\(0.5\) years). The function returns a list, where `cs_est` is the conditional survival estimate, `cs_lci` is the lower bound of the 95% confidence interval, and `cs_uci` is the upper bound of the 95% confidence interval.
```r
myfit <- survfit(Surv(os_yrs, status) ~ 1, data = lung2)

conditional_surv_est(
  basekm = myfit,
  t1 = 0.5,
  t2 = 1
)
#> $cs_est
#> [1] 0.58
#>
#> $cs_lci
#> [1] 0.49
#>
#> $cs_uci
#> [1] 0.66
```
You can use `purrr::map_df` to get a table of estimates for multiple time points. For example, we can estimate the conditional survival to each of several time points, given that the subject has already survived 6 months (0.5 years).
```r
prob_times <- seq(1, 2.5, 0.5)

purrr::map_df(
  prob_times,
  ~ conditional_surv_est(
    basekm = myfit,
    t1 = 0.5,
    t2 = .x
  )
) %>%
  dplyr::mutate(years = prob_times) %>%
  dplyr::select(years, everything()) %>%
  knitr::kable()
```
| years | cs_est | cs_lci | cs_uci |
|---|---|---|---|
| 1.0 | 0.58 | 0.49 | 0.66 |
| 1.5 | 0.36 | 0.27 | 0.45 |
| 2.0 | 0.16 | 0.10 | 0.25 |
| 2.5 | 0.07 | 0.02 | 0.15 |
The confidence intervals are based on a variation of the log-log transformation, also known as the “exponential” Greenwood formula, where the conditional survival estimate is substituted in for the traditional survival estimate in constructing the confidence interval.
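If \(\hat{S}(y|x)\) is the estimated conditional survival to time \(y\) given survival to time \(x\) (here \(y\) is the target time point, matching `t2` in the function call, rather than the number of additional years used in the definition above), then the bounds of the 95% confidence interval are
\[\hat{S}(y|x)^{\exp(\pm1.96\sqrt{\hat{L}(y|x)})}\]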
where
\[\hat{L}(y|x)=\frac{1}{[\log \hat{S}(y|x)]^2}\sum_{j:x \leq \tau_j \leq y}\frac{d_j}{(r_j-d_j)r_j}\]
and
\(\tau_j\) = distinct death time \(j\)
\(d_j\) = number of failures at death time \(j\)
\(r_j\) = number at risk at death time \(j\)
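To make the formula concrete, the following is a minimal sketch that evaluates it by hand from a `survfit` object. It is an illustration of the expressions above under stated assumptions, not necessarily how `conditional_surv_est` is implemented: it uses the `time`, `n.event`, and `n.risk` components of the fit, takes \(x = 0.5\) and \(y = 1\) as time points in years, and all object names are illustrative.

```r
library(survival)

# Kaplan-Meier fit on the lung data, with time rescaled from days to years
km_fit <- survfit(Surv(time / 365.25, status) ~ 1, data = lung)

x <- 0.5 # time already survived (years)
y <- 1   # target time point (years)

# Conditional survival estimate S(y) / S(x)
s_hat <- summary(km_fit, times = c(x, y))$surv
cs <- s_hat[2] / s_hat[1]

# Distinct death times tau_j with x <= tau_j <= y
in_window <- km_fit$time >= x & km_fit$time <= y & km_fit$n.event > 0
d <- km_fit$n.event[in_window] # failures at each death time
r <- km_fit$n.risk[in_window]  # number at risk at each death time

# L-hat: Greenwood-type sum scaled by 1 / log(cs)^2
L_hat <- sum(d / ((r - d) * r)) / log(cs)^2

# Because cs < 1, the larger exponent gives the lower bound
z <- qnorm(0.975) # approximately 1.96
ci <- cs^exp(c(1, -1) * z * sqrt(L_hat))

round(c(cs_est = cs, cs_lci = ci[1], cs_uci = ci[2]), 2)
```

Raising the estimate to a positive power keeps both bounds inside \((0, 1)\), which is the practical appeal of the log-log construction over a symmetric interval on the probability scale.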