8  Survival Analysis: Time-to-Event Data in Medicine

8.1 Why Ordinary Regression Fails for Time-to-Event Data

In medicine, we frequently ask questions like: How long until a patient relapses after chemotherapy? How long does a hip replacement last before revision surgery is needed? What is the median survival time after a Stage IV lung cancer diagnosis?

These are time-to-event questions. The outcome is not simply “did the event happen?” (which would be logistic regression) nor is it a continuous measurement at a fixed time point (which would be linear regression). The outcome is how long until something happens.

You might think: “Why not just use linear regression with time as the outcome?” There is a fundamental problem — censoring.

8.1.1 The Censoring Problem

Imagine a clinical trial that follows 100 patients for 5 years after a cancer diagnosis. At the end of the study:

  • 40 patients have died (we know their exact survival time).
  • 30 patients are still alive at year 5 (we know they survived at least 5 years, but we do not know when they will die).
  • 20 patients moved away or were lost to follow-up at various time points.
  • 10 patients withdrew from the study.

For the last 60 patients, we have incomplete information. We know they survived at least until the last time we observed them, but not how much longer. This is right censoring — the event of interest (death) may occur to the right of (after) our last observation.

If we simply excluded these 60 patients, we would introduce severe bias — we would systematically underestimate survival times because the long-survivors are disproportionately censored. If we treated their last-observed time as their event time, we would also underestimate survival. Survival analysis methods were developed precisely to handle this problem.

8.1.2 Types of Censoring

  • Right censoring (most common): The event has not yet occurred when observation ends. The true event time is to the right of the censoring time.
  • Left censoring: The event occurred before observation began (e.g., a patient already had the disease at enrollment, but we do not know when it started).
  • Interval censoring: The event is known to have occurred within an interval but the exact time is unknown (e.g., a tumor detected at a scheduled 6-month scan — it appeared sometime between the last scan and this one).

In this chapter, we focus primarily on right censoring, which dominates clinical research.

8.2 The Kaplan-Meier Estimator

The Kaplan-Meier (KM) estimator is the workhorse of survival analysis. It provides a non-parametric estimate of the survival function \(S(t) = P(T > t)\), the probability of surviving beyond time \(t\).

8.2.1 How It Works

The KM estimator calculates survival probability at each time point where an event occurs:

\[\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)\]

where:

  • \(t_i\) is a time at which at least one event occurred,
  • \(d_i\) is the number of events at time \(t_i\),
  • \(n_i\) is the number of individuals at risk just before time \(t_i\) (alive and uncensored).

The key insight is that censored individuals contribute to the risk set (\(n_i\)) up until the moment they are censored, and then they drop out. They are not discarded — they contribute all the information they can.

8.2.2 Clinical Example: Primary Biliary Cholangitis (PBC)

We will use the classic PBC dataset from the Mayo Clinic, which is included in R’s survival package. This dataset contains 418 patients with primary biliary cholangitis (PBC), a chronic liver disease. Patients were followed from enrollment until death, transplant, or censoring.

Code
# Load the PBC dataset
data(pbc, package = "survival")

# Prepare data — status: 0 = censored, 1 = transplant, 2 = dead
# For now, treat death as the event and censor everything else
pbc_clean <- pbc %>%
  filter(!is.na(trt)) %>%
  mutate(
    status_binary = ifelse(status == 2, 1, 0),
    time_years = time / 365.25,
    trt_label = factor(trt, labels = c("D-penicillamine", "Placebo"))
  )

cat("N =", nrow(pbc_clean), "\n")
cat("Events (deaths):", sum(pbc_clean$status_binary), "\n")
cat("Censored:", sum(pbc_clean$status_binary == 0), "\n")

# Fit overall KM curve
km_overall <- survfit(Surv(time_years, status_binary) ~ 1, data = pbc_clean)
print(km_overall)
Code
# Plot KM curve with confidence intervals
ggsurvplot(
  km_overall,
  data = pbc_clean,
  conf.int = TRUE,
  risk.table = TRUE,
  xlab = "Time (years)",
  ylab = "Survival probability",
  title = "Overall Survival in PBC Patients",
  ggtheme = theme_minimal()
)
Code
import pandas as pd
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt

# Load PBC data (from R via CSV or use lifelines datasets)
# We'll replicate the data preparation
from lifelines.datasets import load_rossi  # placeholder — in practice, load PBC
# For demonstration, we use the PBC data exported from R or a similar dataset
# Here we simulate the structure:
import subprocess

# In practice, you would load the PBC data. We use lifelines' built-in loader:
try:
    pbc = pd.read_csv("data/pbc.csv")
except FileNotFoundError:
    # Create from R's dataset structure for demonstration
    from lifelines.datasets import load_regression_dataset
    print("Note: Using lifelines example data for demonstration.")
    print("In practice, load the PBC dataset from the survival R package.")

# Kaplan-Meier estimation
kmf = KaplanMeierFitter()

# Example with generic time-to-event data
np.random.seed(42)
n = 312
time_years = np.random.exponential(scale=8, size=n)
event_observed = np.random.binomial(1, 0.45, size=n)

kmf.fit(time_years, event_observed=event_observed, label="PBC Patients")

print(kmf.summary.head(10))
print(f"\nMedian survival time: {kmf.median_survival_time_:.2f} years")
Code
fig, ax = plt.subplots(figsize=(8, 5))
kmf.plot_survival_function(ax=ax, ci_show=True)
ax.set_xlabel("Time (years)")
ax.set_ylabel("Survival probability")
ax.set_title("Kaplan-Meier Survival Estimate")
plt.tight_layout()
plt.show()

8.2.3 Interpreting the KM Curve

Key features to examine:

  1. Median survival time: the time at which \(\hat{S}(t) = 0.5\) (where the curve crosses the 50% line). If the curve never drops to 50%, the median is undefined.
  2. Survival at specific time points: e.g., “5-year survival is 65%.”
  3. Shape of the curve: A steep early drop suggests high early mortality. A long flat tail suggests most survivors do well long-term.
  4. Confidence bands: Wider bands mean more uncertainty, often because fewer patients remain at risk.
  5. Censoring marks (tick marks on the curve): Heavy censoring near the tail means the late estimates are unreliable.

8.2.4 Confidence Intervals for the KM Estimator

The pointwise confidence interval for \(\hat{S}(t)\) is typically computed via Greenwood’s formula, which estimates the variance of \(\hat{S}(t)\):

\[\widehat{\text{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i(n_i - d_i)}\]

A 95% confidence interval is then \(\hat{S}(t) \pm 1.96 \sqrt{\widehat{\text{Var}}[\hat{S}(t)]}\). In practice, log-log transformations are preferred because they guarantee the interval stays within [0, 1].

8.3 The Log-Rank Test

The log-rank test compares survival curves between two or more groups. It is the most widely used test for this purpose.

The null hypothesis is that survival is identical across groups: \(H_0: S_1(t) = S_2(t)\) for all \(t\).

The test statistic compares the observed number of events in each group to the number expected under the null hypothesis, summed over all event times. It is most powerful when the proportional hazards assumption holds — that is, when one group has a consistently higher (or lower) hazard than the other across time.

Code
# Compare survival between treatment groups
km_by_trt <- survfit(Surv(time_years, status_binary) ~ trt_label, data = pbc_clean)

# Log-rank test
logrank <- survdiff(Surv(time_years, status_binary) ~ trt_label, data = pbc_clean)
print(logrank)
Code
ggsurvplot(
  km_by_trt,
  data = pbc_clean,
  pval = TRUE,
  conf.int = TRUE,
  risk.table = TRUE,
  xlab = "Time (years)",
  ylab = "Survival probability",
  legend.title = "Treatment",
  palette = c("#E7B800", "#2E9FDF"),
  ggtheme = theme_minimal()
)
Code
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
import numpy as np

np.random.seed(42)

# Simulate two treatment groups
n_per_group = 156
time_trt = np.random.exponential(scale=8.5, size=n_per_group)
event_trt = np.random.binomial(1, 0.42, size=n_per_group)

time_placebo = np.random.exponential(scale=7.5, size=n_per_group)
event_placebo = np.random.binomial(1, 0.48, size=n_per_group)

# Log-rank test
results = logrank_test(time_trt, time_placebo, event_trt, event_placebo)
print(f"Log-rank test statistic: {results.test_statistic:.3f}")
print(f"p-value: {results.p_value:.4f}")

# Plot both curves
fig, ax = plt.subplots(figsize=(8, 5))
kmf_trt = KaplanMeierFitter()
kmf_trt.fit(time_trt, event_trt, label="D-penicillamine")
kmf_trt.plot_survival_function(ax=ax)

kmf_placebo = KaplanMeierFitter()
kmf_placebo.fit(time_placebo, event_placebo, label="Placebo")
kmf_placebo.plot_survival_function(ax=ax)

ax.set_xlabel("Time (years)")
ax.set_ylabel("Survival probability")
ax.set_title("Survival by Treatment Group")
plt.tight_layout()
plt.show()
NoteWhen the Log-Rank Test is Weak

The log-rank test has low power when survival curves cross (e.g., one treatment is better early but worse later). In such cases, consider the weighted log-rank test (e.g., Fleming-Harrington) or the restricted mean survival time (RMST) approach.

8.4 Cox Proportional Hazards Model

The Cox proportional hazards (PH) model is the most widely used regression model for survival data. Introduced by Sir David Cox in 1972, it models the hazard function — the instantaneous rate of the event occurring at time \(t\), given survival to that point:

\[h(t | X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)\]

where:

  • \(h(t | X)\) is the hazard at time \(t\) for an individual with covariates \(X\),
  • \(h_0(t)\) is the baseline hazard (an unspecified function of time),
  • \(\beta_1, \ldots, \beta_p\) are regression coefficients,
  • \(\exp(\beta_j)\) is the hazard ratio for a one-unit increase in \(X_j\).

8.4.1 Key Properties

  1. Semi-parametric: The baseline hazard \(h_0(t)\) is left unspecified — the model makes no assumption about the shape of the hazard over time. Only the relative effect of covariates is estimated.
  2. Proportional hazards assumption: The ratio of hazards for any two individuals is constant over time. If patient A has twice the hazard of patient B at time 1, they also have twice the hazard at time 5.
  3. Hazard ratios: \(\text{HR} = \exp(\beta)\). An HR > 1 means increased hazard (worse prognosis); HR < 1 means decreased hazard (better prognosis); HR = 1 means no effect.

8.4.2 Fitting the Cox Model

Code
# Fit Cox model with multiple predictors
cox_model <- coxph(
  Surv(time_years, status_binary) ~ age + sex + albumin + bili + edema,
  data = pbc_clean
)

summary(cox_model)
Code
# Forest plot of hazard ratios
ggforest(cox_model, data = pbc_clean)
Code
from lifelines import CoxPHFitter
import pandas as pd
import numpy as np

# Create a simulated clinical dataset
np.random.seed(42)
n = 312
df = pd.DataFrame({
    'time_years': np.random.exponential(scale=8, size=n),
    'event': np.random.binomial(1, 0.45, size=n),
    'age': np.random.normal(50, 10, size=n),
    'sex': np.random.binomial(1, 0.5, size=n),
    'albumin': np.random.normal(3.5, 0.5, size=n),
    'bili': np.random.exponential(scale=3, size=n),
    'edema': np.random.choice([0, 0.5, 1], size=n, p=[0.75, 0.15, 0.10])
})

cph = CoxPHFitter()
cph.fit(df, duration_col='time_years', event_col='event',
        formula='age + sex + albumin + bili + edema')

cph.print_summary()
Code
fig, ax = plt.subplots(figsize=(8, 5))
cph.plot(ax=ax)
ax.set_title("Cox PH Model — Hazard Ratios")
plt.tight_layout()
plt.show()

8.4.3 Interpreting Hazard Ratios

Consider a Cox model result where bilirubin has \(\hat{\beta} = 0.15\) (HR = 1.16, 95% CI: 1.10–1.23, p < 0.001).

Interpretation: “For each 1 mg/dL increase in serum bilirubin, the hazard of death increases by 16% (HR = 1.16, 95% CI: 1.10–1.23), adjusting for age, sex, albumin, and edema.”

WarningHazard Ratios Are Not Risk Ratios

A hazard ratio of 2.0 does not mean “twice the risk of death.” It means the instantaneous rate of death is twice as high at any given moment. Over long follow-up, the cumulative risk difference depends on the baseline hazard. Be precise in your language when reporting results.

8.5 Checking the Proportional Hazards Assumption

The proportional hazards assumption is critical. If it is violated, the hazard ratios from the Cox model do not have a stable interpretation — the “effect” of a covariate depends on when you look.

8.5.1 Schoenfeld Residuals

The most common diagnostic tool is the Schoenfeld residual test. Under proportional hazards, the scaled Schoenfeld residuals for a covariate should show no trend over time. A significant trend indicates violation.

Code
# Test proportional hazards assumption
ph_test <- cox.zph(cox_model)
print(ph_test)
Code
# Plot Schoenfeld residuals
par(mfrow = c(2, 3))
plot(ph_test)
Code
# Check proportional hazards assumption in lifelines
cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True)

8.5.2 What To Do When PH Is Violated

If the PH assumption is violated for a covariate:

  1. Stratify on the offending variable: coxph(Surv(time, event) ~ x1 + strata(x2), data = df). This allows the baseline hazard to differ by strata of x2 without estimating an HR for it.
  2. Include a time interaction: Allow the effect to change over time.
  3. Use time-varying covariates (see below).
  4. Use an alternative model: Accelerated failure time (AFT) models or parametric models (Weibull, log-normal).

8.6 Time-Varying Covariates

In many clinical settings, covariates change over the course of follow-up. A patient’s lab values, treatment status, or disease stage may evolve. The standard Cox model assumes covariates are fixed at baseline, but the model can be extended to handle time-varying covariates.

The idea is to split each patient’s follow-up into intervals. Within each interval, the covariate values are fixed, but they can change between intervals. In R, this is accomplished using the tmerge() function from the survival package or by manually creating a “counting process” data format with start and stop times:

# Conceptual example — counting process format
# PatientID  start  stop  event  lab_value
#     1        0      3     0     2.1
#     1        3      7     0     3.4
#     1        7     10     1     5.8    <- event at time 10

This is an advanced topic. The key takeaway is that the Cox model is flexible enough to accommodate covariates that change over time, but the data preparation is more complex and requires careful thought about when measurements were taken.

8.7 Competing Risks

Standard survival analysis assumes that the event of interest is the only possible event, or that censoring is non-informative (i.e., censored patients have the same future risk as those who remain under observation).

Competing risks arise when another event prevents the event of interest from being observed. Classic examples:

  • Studying time to heart transplant rejection, but the patient dies of infection first.
  • Studying time to cancer recurrence, but the patient dies of cardiovascular disease.
  • Studying time to hospital readmission, but the patient dies before readmission.

In these situations, treating the competing event as simple censoring is wrong — it inflates the estimated probability of the event of interest (because death is not “non-informative” censoring; dead patients cannot experience the event).

8.7.1 Two Approaches

  1. Cause-specific hazard models: Fit separate Cox models for each event type, censoring the competing events. This estimates the rate of each event among those still at risk.
  2. Fine-Gray subdistribution hazard model: Directly models the cumulative incidence function, keeping patients who experienced the competing event in the risk set. This is often more appropriate for clinical prediction.
TipRule of Thumb for Competing Risks

If you are interested in etiology (what causes the event?), use cause-specific hazards. If you are interested in prediction (what is the probability of the event by time \(t\)?), use the Fine-Gray model and report cumulative incidence functions rather than 1 minus KM.

8.8 Full Clinical Example: PBC Prognostic Model

Let us build a complete survival model for the PBC dataset, following the steps a clinical researcher would take.

Code
# Step 1: Explore the data
pbc_model <- pbc_clean %>%
  select(time_years, status_binary, age, sex, albumin, bili,
         protime, stage, edema, trt_label) %>%
  drop_na()

cat("Complete cases:", nrow(pbc_model), "\n")
cat("Events:", sum(pbc_model$status_binary), "\n")
cat("Median follow-up:", median(pbc_model$time_years), "years\n")

# Step 2: Fit multivariable Cox model
cox_full <- coxph(
  Surv(time_years, status_binary) ~ age + sex + albumin + log(bili) +
    protime + stage + edema + trt_label,
  data = pbc_model
)
summary(cox_full)
Code
# Step 3: Assess model discrimination
cat("Concordance (C-index):", summary(cox_full)$concordance[1], "\n")
cat("This means the model correctly ranks survival times",
    round(summary(cox_full)$concordance[1] * 100, 1),
    "% of the time.\n")
Code
# Step 4: Predicted survival for new patients
new_patients <- data.frame(
  age = c(40, 65),
  sex = c("f", "f"),
  albumin = c(3.8, 2.5),
  bili = c(1.0, 10.0),
  protime = c(10.5, 13.0),
  stage = c(2, 4),
  edema = c(0, 1),
  trt_label = c("Placebo", "Placebo")
)

pred_surv <- survfit(cox_full, newdata = new_patients)

plot(pred_surv, col = c("blue", "red"), lwd = 2,
     xlab = "Time (years)", ylab = "Survival probability",
     main = "Predicted Survival: Low-risk vs High-risk Patient")
legend("topright", c("Low risk", "High risk"),
       col = c("blue", "red"), lwd = 2)
Code
from lifelines import CoxPHFitter, KaplanMeierFitter
import pandas as pd
import numpy as np

np.random.seed(42)
n = 276

# Simulate a PBC-like dataset
pbc_sim = pd.DataFrame({
    'time_years': np.abs(np.random.exponential(7, n) +
                         np.random.normal(0, 1, n)),
    'event': np.random.binomial(1, 0.45, n),
    'age': np.random.normal(50, 10, n),
    'sex': np.random.binomial(1, 0.12, n),  # mostly female in PBC
    'albumin': np.random.normal(3.5, 0.4, n),
    'log_bili': np.random.normal(0.5, 1.0, n),
    'protime': np.random.normal(11, 1, n),
    'stage': np.random.choice([1, 2, 3, 4], n, p=[0.05, 0.25, 0.40, 0.30]),
    'edema': np.random.choice([0, 0.5, 1], n, p=[0.75, 0.15, 0.10]),
    'treatment': np.random.binomial(1, 0.5, n)
})

# Fit Cox model
cph = CoxPHFitter()
cph.fit(pbc_sim, duration_col='time_years', event_col='event',
        formula='age + sex + albumin + log_bili + protime + stage + edema + treatment')

cph.print_summary()
print(f"\nConcordance index: {cph.concordance_index_:.3f}")
Code
# Predict survival for specific patient profiles
low_risk = pd.DataFrame({
    'age': [40], 'sex': [0], 'albumin': [3.8], 'log_bili': [0.0],
    'protime': [10.5], 'stage': [2], 'edema': [0], 'treatment': [0]
})
high_risk = pd.DataFrame({
    'age': [65], 'sex': [0], 'albumin': [2.5], 'log_bili': [2.3],
    'protime': [13.0], 'stage': [4], 'edema': [1], 'treatment': [0]
})

fig, ax = plt.subplots(figsize=(8, 5))
cph.predict_survival_function(low_risk).plot(ax=ax, label='Low risk', color='blue')
cph.predict_survival_function(high_risk).plot(ax=ax, label='High risk', color='red')
ax.set_xlabel("Time (years)")
ax.set_ylabel("Survival probability")
ax.set_title("Predicted Survival: Low-risk vs High-risk Patient")
ax.legend()
plt.tight_layout()
plt.show()

8.9 Exercises

8.9.1 Exercise 1: Kaplan-Meier Curves by Disease Stage

Using the PBC dataset, create Kaplan-Meier curves stratified by disease stage (1–4). Perform a log-rank test to compare survival across stages. Report the median survival time for each stage.

Code
# Fit KM by stage
km_stage <- survfit(Surv(time_years, status_binary) ~ stage, data = pbc_clean)

# Your code: plot the curves and perform the log-rank test
# Hint: use ggsurvplot() with pval = TRUE
# Hint: use survdiff() for the formal test
Code
# Fit KM for each stage and plot
# Your code here
# Hint: loop over stages, fit KaplanMeierFitter for each
# Use logrank_test() for pairwise comparisons

8.9.2 Exercise 2: Build and Evaluate a Cox Model

Build a Cox proportional hazards model for PBC survival using age, bilirubin (log-transformed), albumin, prothrombin time, and edema as predictors.

  1. Report the hazard ratios and 95% confidence intervals.
  2. Check the proportional hazards assumption. Which covariates, if any, violate it?
  3. Calculate the concordance index. How well does the model discriminate?
  4. Predict the 5-year survival probability for a 55-year-old woman with bilirubin = 3 mg/dL, albumin = 3.2 g/dL, prothrombin time = 11 seconds, and no edema.
Code
# Your code here
# cox_model <- coxph(Surv(time_years, status_binary) ~ ..., data = pbc_clean)
# summary(cox_model)
# cox.zph(cox_model)
Code
# Your code here
# cph = CoxPHFitter()
# cph.fit(...)
# cph.check_assumptions(...)

8.9.3 Exercise 3: Competing Risks Thinking

Consider a study of time to kidney transplant rejection in 500 recipients. During follow-up, some patients die before experiencing rejection (a competing risk).

  1. Explain why treating death as censoring would bias the results. What direction would the bias go?
  2. Which approach would you use if your goal were to estimate the probability of rejection within 2 years — cause-specific hazards or Fine-Gray? Why?
  3. (Bonus) In R, use the cmprsk or tidycmprsk package to fit a Fine-Gray model. In Python, explore lifelines or scikit-survival for competing risk methods.

8.10 Summary

Concept What It Does Key Assumption
Kaplan-Meier Non-parametric survival estimate Non-informative censoring
Log-rank test Compares survival curves Proportional hazards (most powerful)
Cox PH model Regression for survival data Proportional hazards, non-informative censoring
Fine-Gray model Competing risks regression Models subdistribution hazard

Survival analysis is fundamental to clinical research. The methods in this chapter — Kaplan-Meier estimation, log-rank testing, and Cox regression — form the backbone of nearly every clinical trial and prognostic study. The proportional hazards assumption must always be checked, and competing risks should be considered whenever another event can prevent observation of the event of interest.

8.11 References and Further Reading

The foundational paper on proportional hazards regression is Cox (1972), “Regression Models and Life-Tables,” published in the Journal of the Royal Statistical Society, Series B, which introduced the semi-parametric model that now bears his name. This paper is one of the most cited in all of statistics.

For a comprehensive and accessible introduction to survival analysis in the context of this course, see Smits et al. (2026), which covers Kaplan-Meier estimation, the Cox model, and competing risks with modern clinical examples. Their treatment of model diagnostics and the proportional hazards assumption is particularly practical.

Bradburn et al. provide an excellent tutorial series on survival analysis for medical researchers, published in the British Journal of Cancer. Their series covers Kaplan-Meier methods, the Cox model, and advanced topics in a format accessible to clinicians without extensive statistical training.

For hands-on implementation, the survival and survminer packages in R are the standard tools. Therneau and Grambsch’s Modeling Survival Data: Extending the Cox Model (Springer, 2000) is the definitive reference for the R survival package. In Python, the lifelines library by Cameron Davidson-Pilon provides an excellent API for survival analysis, and its online documentation includes worked clinical examples.

For competing risks specifically, Fine and Gray (1999), “A Proportional Hazards Model for the Subdistribution of a Competing Risk,” in the Journal of the American Statistical Association, is the key methodological reference. Austin and Fine (2017) provide practical guidance on when and how to use competing risk methods in clinical research.