Is there a limit to human longevity?

layout: false
background-image: url("figures/logo_hec_montreal_blanc_web.png")
background-size: 400px
background-position: 5% 5%

# Is there a limit to human longevity?

**Department of Mathematics and Statistics, Dalhousie University**

.light[Léo Belzile, HEC Montréal]
<br>
Based on joint work with Anthony Davison, Jutta Gampe, Holger Rootzén, Dmitrii Zholud
]

---

---

# Estimating human lifespan

The study of human longevity is full of pitfalls for the unwarry...

The problem raises several statistical problems revolving around

.box-1.medium.sp-after-half[data quality]
.box-2.medium.sp-after-half[models]
.box-3.medium.sp-after-half[extrapolation]

---

# Glossary

- .color-1[**Supercentenarian**]: person living beyond 110th birthday
- .color-1[**Semi-supercentenarian**]: dies between 105th and 110th birthdays.
- .color-1[**Lifetime**]: life-length of an individual.
- .color-1[**Lifespan**]: upper limit (if any) on distribution of lifetimes.

---

# Why study longevity?

Statistical analysis needed to assess biological theories about

.box-inv-1.sp-after[natural selection]

.box-inv-1.sp-after[mortality plateau]

.box-inv-1.sp-after[existence of<br> finite lifespan]

.box-1.medium[Lots of interest in the news!]
---

# Exponential growth of mortality

> *It is believed that exponential growth of mortality with age (Gompertz law) is followed by a period of deceleration, with slower rates of mortality increase at older ages.*

> *Recent studies found that the exponential increase of the mortality risk with age (the famous Gompertz law) continues even at extreme old ages in humans, rats, and mice, thus challenging traditional views about old-age mortality deceleration, mortality leveling-off, and late-life mortality plateaus.*

]

The Gompertz–Makeham model is extremely popular in demography and fits well the distribution of lifetimes at lower levels, until 102-105.

---

# Theory of senescence

<div class="figure" style="text-align: center">
<img src="figures/Pyrkov2021.png" alt="Figure 3 of Pyrkov et al. (2021), Nature Communications, doi:10.1038/s41467-021-23014-1." width="100%" />
<p class="caption">Figure 3 of Pyrkov et al. (2021), Nature Communications, doi:10.1038/s41467-021-23014-1.</p>
</div>

???

In Japan, in 2016, 8167 male and 57 525 female were centenarian.
- rate of 5.1 per 10K in 2015

In Canada, 6116 female and 835 male
- rate of 1.9 per 10K

Between 1983 and 2009, a total of 
- 12 supercentenarians died
- 321 semisupercentenarians died
in Quebec.

- oldest living today: Soeur André (Lucile Randon, 118 years, 236 days);
– oldest ever: Jeanne Calment (122 years, 164 days).

---

---

---

# Sampling

Information limited due to availability of historical records.

- Validation is key
   - necronyms
   - record falsification
   - mistakes in data registers

- Most databases (e.g., *Gerontology Research Group*) include self-reported records.

.box-inv-1.medium[**Opportunity samples**]

???

Shigechio Izumi was excluded from records because he beared his dead brother name, so died age 105 rather than 120.

In Japan, the first modern family registration system was established in 1872 (Jinshin-KOSEKI), amended in 1886 by the Family Registration Law, Chapter 10 of Exceptional Lifespan.

Italy: example of teenager reportedly borned in 1800s, who died age 13 but was initially recorded to have died 113.

---

# Jeanne Calment, the controversy

???

Nikolai Zak (conspiracy?) theory in Rejuvenation Research that Yvonne Calment took the place of her mother Jeanne to avoid inheritance taxes.
---

# International Database on Longevity

To draw reliable conclusions, we need .color-1[**representative samples**].

- .section-title-1[**validated supercentenarian (110+)**]  from 13 countries

- plus (partly validated) .section-title-1[** semi-supercentenarian (105-109)**] for 9 countries

- Age-ascertainement bias-free

???

- Nearly 50% of the records fail the validation check.
- Random sample validated for semi-supercentenarians in France

---

# Sampling mechanisms

Data are obtained by .section-title-1[**casting a net**] on the population of potential (semi)-supercentenarians.

- for IDL, (only) supercentenarians in a country .color-1[**who died**] between dates `$c_1$` and `$c_2$`.

- records for the candidates are then individually validated.

---

# Lexis diagram for interval truncation

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/lexisdiag-1.png" alt="Lexis diagrams showing the selection mechanism." width="82%" />
<p class="caption">Lexis diagrams showing the selection mechanism.</p>
</div>

---

# More complex truncation schemes!

Semisupercentenarians (105-109) who died in window `$(d_1, d_2)\neq (c_1, c_2)$`.

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/lexisdiag3-1.png" alt="Lexis diagrams for IDL data with semisupercentenarian and supercentenarians" width="85%" />
<p class="caption">Lexis diagrams for IDL data with semisupercentenarian and supercentenarians</p>
</div>

---

# Lexis diagram for Italian data

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/lexisdiag2-1.png" alt="Lexis diagrams for Istat data with semisupercentenarian and supercentenarians" width="45%" />
<p class="caption">Lexis diagrams for Istat data with semisupercentenarian and supercentenarians</p>
</div>

---

# Truncation can be hidden

- .section-title-1[ **Extinct cohort method** ]: Birth cohorts for which no death has been reported for X consecutive years.
- counts cross-tabulated by years of birth, age and gender.

<div class="figure" style="text-align: center">
<img src="figures/Sibuya_tabular.png" alt="Annual Vital Statistics Report of Japan (Hanamaya &amp; Sibuya, 2014)." width="56%" />
<p class="caption">Annual Vital Statistics Report of Japan (Hanamaya & Sibuya, 2014).</p>
</div>

---

# Why does it matter?

Ignoring truncation leads to  .section-title-1[**underestimation**] of the survival probability: population increase and reduction in mortality at lower age translates into larger impact for later birth cohorts.

]

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/artefacts_truncation-1.png" alt="Impact of truncation on quantile-quantile plots (left) and maximum age by birth year (right)." width="65%" />
<p class="caption">Impact of truncation on quantile-quantile plots (left) and maximum age by birth year (right).</p>
</div>

---

# Incorrect conclusions

<div class="figure" style="text-align: center">
<img src="figures/Dong2016.png" alt="Failing to account for truncation and increase in population." width="85%" />
<p class="caption">Failing to account for truncation and increase in population.</p>
</div>

---

---

---

# Survival analysis

Denote the lifetime `$T$`, a continuous random variable with distribution `$F$`, density `$f$`, lifespan `$t_{F}= \sup\{t: F(t) < 1\}$` and survivor and hazard functions

`$$\begin{align*}
 S(t) &= \Pr(T>t) =1-F(t), 
\\h(t) &= \frac{f(t)}{S(t)}, \quad t>0.
\end{align*}$$`

---

# Poisson process

- Suppose individuals independently reach age `$u_0$` at calendar time `$x$` at rate `$\nu(x)$`, and subsequently die at age `$t+u_0$` with density `$f$`.
- Events in `$\mathcal{C} = [c_1 , c_2] \times [u_0, \infty)$` follow a Poisson process of rate
`$$\begin{align*}
\lambda(c, t) = \nu(c − t)f(t), \qquad
c \in \mathbb{R}, t > 0
\end{align*}$$`
at calendar time `$c$` and excess lifetime `$t$`.

---

# Poisson process

- The lifetime density for dying in `$\mathcal{C}$` is
`$$\begin{align*}
f_{\mathcal{C}}(t) \propto f(t)w_{\mathcal{C}}(t), \quad w_{\mathcal{C}}(t) = \int_{c_1 - t}^{c_2-t} \nu(x) \mathrm{d}x, \quad t>0
\end{align*}$$`
where `$w_{\mathcal{C}}$` is decreasing, so `$f_{\mathcal{C}}$` is .color-2[**stochastically smaller**] than `$f$`.

---

# Likelihood contributions

The likelihood depends on `$\nu$`, hence consider the conditional likelihood
`$$\begin{align*}
\frac{f(t)}{F(b)-F(a)}, \quad a < t< b
\end{align*}$$`
for interval truncated data and, for left-truncated and right-censored data,
`$$\begin{align*}
\frac{h(t)^\delta S(t)}{1-F(a)}, \quad t> a,
\end{align*}$$`
where `$[a, b] = [\max\{0, c_1 − x\}, c_2 − x]$`.

---

# Models

Many models popular in demography, many with infinite endpoint.

- exponential: `$h(t) = \sigma^{-1}$` for `$\sigma>0$`.
- Gompertz–Makeham (1825, 1860): `$$h(t) = \lambda + \sigma^{-1}\exp(\beta t/\sigma), \qquad \beta, \sigma>0, \lambda \geq 0.$$`
- Logistic (Thatcher, 1999): 
`$$h(t) = \lambda + \frac{A\exp(\beta t/ \sigma)}{1+B\exp(\beta t/ \sigma)}, \qquad A>0, B \ge 0.$$`

---

# Generalized Pareto distribution

Most records include only lifetime above `$u_0$` (.color-2[**threshold exceedances**])

If a scaling function `$a_u$` exists such
that `$(X − u)/a_u$` has a non-degenerate distribution conditional on `$X > u$`, then (Pickands, 1975)
`$$\frac{\Pr\{(X-u)/a_u > t\}}{\Pr(X >u)} \to 
\begin{cases}
(1+\xi t/\sigma)_{+}^{-1/\xi}, & \xi \neq 0\\
\exp(-t/\sigma), & \xi = 0.
\end{cases}$$`
where `$c_+ = \max\{c, 0\}$` for a real number `$c$`.

The .color-2[**unique nondegenerate**] limiting distribution for exceedances of a threshold `$u$` is .section-title-2[**generalized Pareto**].

---

# Penultimate approximation

At lower levels, the behaviour of the fitted model depends on the reciprocal hazard, `$r(t) = 1/h(t)$`; under mild regularity conditions,

`$$\xi = \lim_{t \to t_{F}} r'(t)$$`
and a pre-asymptotic shape is `$\xi_u = r'(u)$`.

.small[
For example, the Gompertz model has `$\xi_u \nearrow 0$`: estimates of `$\xi$` tend to be negative.
]

???

The speed of convergence is quite fast, so we would expect the exceedances to be well approximated by an exponential distribution.

---

# Threshold stability

A key property of the generalized Pareto distribution is **threshold stability**.

- can extrapolate behaviour of `$F$` at higher levels
- useful for choosing `$u$` in applications
   - fit model at multiple threshold `$u_1 < \cdots < u_k$`.
   - check whether shape `$\xi$` agrees over range.

---

# Lack of (threshold) stability

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/thstab-1.png" alt="Threshold stability plots for France and Italy (left), and Netherlands (right)." width="60%" />
<p class="caption">Threshold stability plots for France and Italy (left), and Netherlands (right).</p>
</div>

---

# How good is the approximation?

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/qqplot_italian-1.png" alt="Quantile-quantile plots with 95% pointwise and simultaneous bands (left) and conditional cumulative hazard (right) for Istat." width="75%" />
<p class="caption">Quantile-quantile plots with 95% pointwise and simultaneous bands (left) and conditional cumulative hazard (right) for Istat.</p>
</div>

???

Bootstrap estimates obtained by conditioning on truncation time and birth dates. New observations simulated from doubly truncated distributions

---

# Accounting for interval truncation

The plotting position for `$x$`-axis of Q-Q plot for observation `$y_i$` is
$$
F_0^{-1}\left[F_0(a_i) +\left\\{ F_0(b_i)-F_0(a_i) \right\\} \frac{F_n(y_i) - F_n(a_i)}{F_n(b_i)-F_n(a_i)}\right]
$$
where
- `$F_0$` is the postulated (i.e., fitted) parametric distribution,
- `$F_0^{-1}$` is the corresponding quantile function,
- `$F_n$` is the NPMLE of the distribution function (Turnbull, 1976).

Censored observations not displayed.

---
# Flexibility is key

We fit a semiparametric hazard function `$$h(t) = \{\sigma + \xi t + g(t)\}^{-1}_{+}$$` with `$g(t) \to 0$` as `$t \to t_{F}$` with `$g(t)$` a cubic regression spline
- generalizes generalized Pareto model
- reduces to parametric model in upper tail
- equispaced knots

???

Left: figure obtained with bshazard for left-truncated right-censored data
Right: discretize data into daily bins, use cumulative hazard H(t) = sum_{z=1}^t h(z)/365 for interpretability, so survival function is exp(-H(x))
---
# Let the tail speak for itself!

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/splines_italian-1.png" alt="Nonparametric hazard (left) and semiparametric generalized Pareto (right)." width="75%" />
<p class="caption">Nonparametric hazard (left) and semiparametric generalized Pareto (right).</p>
</div>

.tiny[Semiparametric estimator suggest a wide range of plausible behaviour, including constant risk.]
---

# Take home messages

.box-inv-2.sp-after[cannot use low thresholds<br> for extrapolation.]

.box-inv-2.sp-after[goodness-of-fit diagnostics suggest<br> generalized Pareto model fits well.]

.box-inv-2.sp-after[hazard doesn't stabilize<br>until about 108 years.]

.box-inv-2.sp-after[shape estimates suggest<br>a **decrease** of the risk above.]

---
# Is there a finite lifespan?

Mathematically speaking, is `$t_F=\sup\{t: F(t)<1\} = \infty$`?

Hard to convey to the average reader:
- `$t_F=\infty$` does not imply immortality.
- `$\Pr(T > u) < \varepsilon$` does not imply finite lifespan `$t_F < u$`.

- Gompertz–Makeham has no right endpoint  and `$t_F =\infty$`.
- exponential model has a constant hazard (.color-2[**plateau of mortality**]).
- the generalized Pareto implies a lifespan of `$u-\sigma/\xi$` if `$\xi < 0$`.

---

layout: false
name: extrapolation
class: center middle section-title section-title-3

.huge.color-0[Extrapolation]

---

---

# Is the lifetime distribution bounded?

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/profile_lifetime-1.png" alt="Profile likelihood for endpoint for various countries and three thresholds." width="55%" />
<p class="caption">Profile likelihood for endpoint for various countries and three thresholds.</p>
</div>

---

# Human lifespan

No discernible differences between

- earlier and later birth cohorts,
- countries,
- men and women, except that after age 108 French men have lower survival.

Not to be confused with gender inbalance due to lower survival of men.
---

# You have no power in here!

The power of a likelihood ratio test for detecting a finite endpoint (obtained by simulating records with a generalized Pareto distribution with lifespan `$t_F$`) is high: based on France/Italy/IDL data (2016 version),

- 125 years: combined power of 97%;
- 130 years: combined power of 83%;
- 135 years: combined power of 66%.

Suggests that the human lifespan lies well beyond any lifetime yet observed.

---

# Huge uncertainty

Japanese (unvalidated) data are interval-censored and right-truncated

<div class="figure" style="text-align: center">
<img src="Dalhousie22_Belzile_files/figure-html/japanese-1.png" alt="Posterior credible intervals by threshold (left) and sampling distribution with(out) rounding (right)." width="65%" />
<p class="caption">Posterior credible intervals by threshold (left) and sampling distribution with(out) rounding (right).</p>
</div>

---

# Supercentenarians [don't] live forever...

Estimated exponential distribution above 110 years for IDL has mean 0.5 (0.46, 0.53): a coin toss.

Surviving until 130 years conditional on surviving until 110 years
- is equivalent to obtaining 20 heads in a row,
- a less than one-in-a-million chance...

Anticipated increase in number of supercentenarians make it possible to observe 130, but higher record is highly unlikely (Pearce & Raftery 2021).

---

# References

- Léo R. Belzile, Anthony C. Davison, Jutta Gampe, Holger Rootzén and Dmitrii Zholud (2022). Is there a cap on longevity? A statistical review., Annual Reviews of Statistics and its Applications, 9, 21–45, [`doi:10.1146/annurev-statistics-040120-025426`](https://doi.org/10.1146/annurev-statistics-040120-025426).
- Léo R. Belzile and Anthony C. Davison (2020). Improved inference for risk measures of univariate extremes (2022), Annals of Applied Statistics, 16(3): 1524–1549, [`doi: 10.1214/21-AOAS1555`](https://doi.org/10.1214/21-AOAS1555)
- Léo R. Belzile, Anthony C. Davison, Holger Rootzén and Dmitrii Zholud (2021). Human mortality at extreme age., Royal Society Open Science, 8, [`doi:10.1098/rsos.202097`](https://doi.org/10.1098/rsos.202097).