layout: false background-image: url("figures/logo_hec_montreal_blanc_web.png") background-size: 400px background-position: 5% 5% class: center middle main-title section-title-1 # Is there a limit to human longevity? .class-info[ **Biostatistics Seminar, McGill University** .light[Léo Belzile, HEC Montréal] <br> Based on joint work with Anthony Davison, Jutta Gampe, Holger Rootzén, Dmitrii Zholud ] --- layout: true class: title title-1 --- # Lots of interest in the news! <div class="figure" style="text-align: center"> <img src="figures/Soeur_Andre_collage.png" alt="Headlines following the death of Lucile Randon (Soeur André) on January 17th (New York Times, Radio-Canada, Le Temps)" width="100%" /> <p class="caption">Headlines following the death of Lucile Randon (Soeur André) on January 17th (New York Times, Radio-Canada, Le Temps)</p> </div> --- # Why study longevity? Statistical analyses needed to assess biological theories about .box-inv-1.sp-after-half.large[mortality plateau] .box-inv-1.sp-after-half.large[senescence] .box-inv-1.sp-after-half.large[existence of<br> finite lifespan] --- # Exponential growth of mortality .small[ > *It is believed that exponential growth of mortality with age (Gompertz law) is followed by a period of deceleration, with slower rates of mortality increase at older ages.* > *Recent studies found that the exponential increase of the mortality risk with age (the famous Gompertz law) continues even at extreme old ages in humans, rats, and mice, thus challenging traditional views about old-age mortality deceleration, mortality leveling-off, and late-life mortality plateaus.* ] .small[ Gavrilova & Gavrilov (2015), *Journals of Gerontology: Biological Sciences* ] ??? The Gompertz–Makeham model is extremely popular in demography and fits well the distribution of lifetimes at lower levels, until 102-105. --- # Theory of senescence <div class="figure" style="text-align: center"> <img src="figures/Pyrkov2021.png" alt="Figure 3 of Pyrkov et al. (2021), Nature Communications, doi:10.1038/s41467-021-23014-1." width="100%" /> <p class="caption">Figure 3 of Pyrkov et al. (2021), Nature Communications, doi:10.1038/s41467-021-23014-1.</p> </div> ??? In Japan, in 2016, 8167 male and 57 525 female were centenarian. - rate of 5.1 per 10K in 2015 In Canada, Census 2021 indentified 9 545 centenarians (6116 female, 1830 male) - rate of 1.9 per 10K in 2016 Between 1983 and 2009, a total of - 12 supercentenarians died - 321 semisupercentenarians died in Quebec. - oldest living today: Maria Branyas Morera (115 years, 148 days) – oldest ever: Jeanne Calment (122 years, 164 days). --- # Decreasing max reported age at death? <div class="figure" style="text-align: center"> <img src="figures/Dong2016.png" alt="Evidence for a limit to human lifespan by X. Dong, B. Milholland and J. Vijg (2016), Nature, doi:10.1038/nature19793." width="85%" /> <p class="caption">Evidence for a limit to human lifespan by X. Dong, B. Milholland and J. Vijg (2016), Nature, doi:10.1038/nature19793.</p> </div> --- # Estimating human lifespan The study of human longevity is full of pitfalls for the unwary... The problem raises several statistical problems revolving around .pull-left[ .box-5.medium.sp-after-half[data quality] .box-2.medium.sp-after-half[sampling] ] .pull-right[ .box-3.medium.sp-after-half[models] .box-4.medium.sp-after-half[extrapolation] ] --- layout: false name: data class: center middle section-title section-title-5 .huge[ # Data quality ] --- layout: true class: title title-5 --- # Samples Information limited due to availability of historical records. - Validation is key - necronyms - record falsification - mistakes in data registers - Most databases (e.g., *Gerontology Research Group*) include self-reported records. -- .box-5.medium[**Opportunity samples**] ??? Shigechio Izumi was excluded from records because he beared his dead brother name, so died age 105 rather than 120. In Japan, the first modern family registration system was established in 1872 (Jinshin-KOSEKI), amended in 1886 by the Family Registration Law, Chapter 10 of Exceptional Lifespan. Italy: example of teenager reportedly borned in 1800s, who died age 13 but was initially recorded to have died 113. --- # Show me your (meta)data <div class="figure" style="text-align: center"> <img src="figures/idl.png" alt="[International Database on Longevity](https://www.supercentenarians.org/en/) " width="65%" /> <p class="caption">[International Database on Longevity](https://www.supercentenarians.org/en/) </p> </div> To draw reliable conclusions, we need .color-5[**representative samples**]. --- # International Database on Longevity The IDL ([supercentenarians.org](https://supercentenarians.org)) includes - .section-title-5[**validated supercentenarian (110+)**] from 13 countries -- - plus (partly validated) .section-title-5[** semi-supercentenarian (105-109)**] for 9 countries -- - Age-ascertainement bias-free -- .box-inv-5[17 798 records, of which 1161 validated supercentenarians] ??? - Nearly 50% of the records fail the validation check. - Random sample validated for semi-supercentenarians in France --- layout: false name: data class: center middle section-title section-title-2 .huge[ # Sampling ] --- layout: true class: title title-2 --- # Sampling mechanisms Data are obtained by .section-title-2[**casting a net**] on the population of potential (semi)-supercentenarians. -- - for IDL, (only) supercentenarians in a country .color-2[**who died**] between dates `\(c_1\)` and `\(c_2\)`. -- - records for the candidates are then individually validated. --- # Lexis diagram for interval truncation <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/lexisdiag-1.png" alt="Lexis diagrams showing the selection mechanism." width="82%" /> <p class="caption">Lexis diagrams showing the selection mechanism.</p> </div> --- # Lexis diagram for Italian data <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/lexisdiag2-1.png" alt="Lexis diagrams for Istat data with semisupercentenarian and supercentenarians" width="45%" /> <p class="caption">Lexis diagrams for Istat data with semisupercentenarian and supercentenarians</p> </div> --- # More complex truncation schemes! Semisupercentenarians (105-109) who died in window `\((d_1, d_2)\neq (c_1, c_2)\)`. <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/lexisdiag3-1.png" alt="Lexis diagrams for IDL data with semisupercentenarian and supercentenarians" width="85%" /> <p class="caption">Lexis diagrams for IDL data with semisupercentenarian and supercentenarians</p> </div> --- # Truncation can be hidden - .section-title-2[ **Extinct cohort method** ]: Birth cohorts for which no death has been reported for X consecutive years. - counts cross-tabulated by years of birth, age and gender. <div class="figure" style="text-align: center"> <img src="figures/Sibuya_tabular.png" alt="Annual Vital Statistics Report of Japan (Hanamaya & Sibuya, 2014)." width="56%" /> <p class="caption">Annual Vital Statistics Report of Japan (Hanamaya & Sibuya, 2014).</p> </div> --- # Why does it matter? .small[ Ignoring truncation leads to .section-title-2[**underestimation**] of the survival probability: population increase and reduction in mortality at lower age translates into larger impact for later birth cohorts. ] <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/artefacts_truncation-1.png" alt="Impact of truncation on quantile-quantile plots (left) and maximum age by birth year (right)." width="65%" /> <p class="caption">Impact of truncation on quantile-quantile plots (left) and maximum age by birth year (right).</p> </div> --- # Accounting for interval truncation We also need to modify usual inferential tools. The plotting position for `\(x\)`-axis of Q-Q plot for observation `\(y_i\)` is $$ F_0^{-1}\left[F_0(a_i) +\left\\{ F_0(b_i)-F_0(a_i) \right\\} \frac{F_n(y_i) - F_n(a_i)}{F_n(b_i)-F_n(a_i)}\right] $$ where - `\(F_0\)` is the postulated (i.e., fitted) parametric distribution, - `\(F_0^{-1}\)` is the corresponding quantile function, - `\(F_n\)` is the NPMLE of the distribution function (Turnbull, 1976). --- # Artefacts of data collection <div class="figure" style="text-align: center"> <img src="figures/Dong2016_small.png" alt="Revisiting Dong et al. (2016)" width="65%" /> <p class="caption">Revisiting Dong et al. (2016)</p> </div><div class="figure" style="text-align: center"> <img src="figures/Dong_revisited.png" alt="Revisiting Dong et al. (2016)" width="65%" /> <p class="caption">Revisiting Dong et al. (2016)</p> </div> --- layout: false name: models class: center middle section-title section-title-3 .huge[ # Models ] --- layout: true class: title title-3 --- # Survival analysis Denote the lifetime `\(T\)`, a continuous random variable with distribution `\(F\)`, density `\(f\)`, lifespan `\(t_{F}= \sup\{t: F(t) < 1\}\)` and survivor and hazard functions `$$\begin{align*} S(t) &= \Pr(T>t) =1-F(t), \\h(t) &= \frac{f(t)}{S(t)}, \quad t>0. \end{align*}$$` --- # Poisson process - Suppose individuals independently reach age `\(u_0\)` at calendar time `\(x\)` at rate `\(\nu(x)\)`, and subsequently die at age `\(t+u_0\)` with density `\(f\)`. - Events in `\(\mathcal{C} = [c_1 , c_2] \times [u_0, \infty)\)` follow a Poisson process of rate `$$\begin{align*} \lambda(c, t) = \nu(c − t)f(t), \qquad c \in \mathbb{R}, t > 0 \end{align*}$$` at calendar time `\(c\)` and excess lifetime `\(t\)`. --- # Poisson process - The lifetime density for dying in `\(\mathcal{C}\)` is `$$\begin{align*} f_{\mathcal{C}}(t) \propto f(t)w_{\mathcal{C}}(t), \quad w_{\mathcal{C}}(t) = \int_{c_1 - t}^{c_2-t} \nu(x) \mathrm{d}x, \quad t>0 \end{align*}$$` where `\(w_{\mathcal{C}}\)` is decreasing, so `\(f_{\mathcal{C}}\)` is .color-3[**stochastically smaller**] than `\(f\)`. --- # Likelihood contributions The likelihood depends on `\(\nu\)`, hence consider the conditional likelihood `$$\begin{align*} \frac{f(t)}{F(b)-F(a)}, \quad a < t< b \end{align*}$$` for interval truncated data and, for left-truncated and right-censored data, `$$\begin{align*} \frac{h(t)^\delta S(t)}{1-F(a)}, \quad t> a, \end{align*}$$` where `\([a, b] = [\max\{0, c_1 − x\}, c_2 − x]\)`. --- # Popular parametric models Hazard of models frequently used in demography: - exponential: `\(h(t) = \sigma^{-1}\)` for `\(\sigma>0\)`. - Gompertz–Makeham (1825, 1860): `$$h(t) = \lambda + \sigma^{-1}\exp(\beta t/\sigma), \qquad \beta, \sigma>0, \lambda \geq 0.$$` - Logistic–Makeham (Perks, 1932): `$$h(t) = \lambda + \frac{\alpha\exp(\beta t/ \sigma)}{1+\alpha\gamma\exp(\beta t/ \sigma)}, \qquad \alpha>0, \beta\gamma \ge 0.$$` --- # Comparing models Authors frequently fail to account for non-standard asymptotics for tests. - The Gompertz model includes the exponential when `\(\beta \to 0\)` (boundary parameter) so likelihood ratio test has `\(\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1\)` null distribution. - The Gompertz-Makeham distribution include the exponential distribution as special case, but the parameters are not numerically identifiable when `\(\beta \to 0\)`. --- # Extreme value theory Most records include only lifetime above `\(u_0\)` (.color-3[**threshold exceedances**]) The .color-3[**unique nondegenerate**] limiting distribution for exceedances of a threshold `\(u\)` is .section-title-3.inv[**generalized Pareto**]. .tiny[*Some conditions apply] --- # Generalized Pareto distribution If a scaling function `\(a_u\)` exists such that `\((X − u)/a_u\)` has a non-degenerate distribution conditional on `\(X > u\)`, then (Pickands, 1975) `$$\lim_{u \to t_F}\frac{\Pr\{(X-u)/a_u > t\}}{\Pr(X >u)} \to \begin{cases} (1+\xi t/\sigma)_{+}^{-1/\xi}, & \xi \neq 0\\ \exp(-t/\sigma), & \xi = 0. \end{cases}$$` where `\(c_+ = \max\{c, 0\}\)`. --- # Penultimate approximation At lower levels, the behaviour of the fitted model depends on the reciprocal hazard, `\(r(t) = 1/h(t)\)`; under mild regularity conditions, `$$\xi = \lim_{t \to t_{F}} r'(t)$$` and a pre-asymptotic shape is `\(\xi_u = r'(u)\)`. -- - the Gompertz model has `\(\xi_u \nearrow 0\)`: estimates of `\(\xi\)` tend to be negative. ??? The speed of convergence is quite fast, so we would expect the exceedances to be well approximated by an exponential distribution. --- # Threshold stability A key property of the generalized Pareto distribution is **threshold stability**. - can extrapolate behaviour of `\(F\)` at higher levels - useful for choosing `\(u\)` in applications - fit model at multiple threshold `\(u_1 < \cdots < u_k\)`. - check whether shape `\(\xi\)` agrees over range. --- # Lack of (threshold) stability <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/thstab-1.png" alt="Threshold stability plots for France and Italy (left), and Netherlands (right)." width="85%" /> <p class="caption">Threshold stability plots for France and Italy (left), and Netherlands (right).</p> </div> --- # How good is the approximation? <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/qqplot_italian-1.png" alt="Quantile-quantile plots with 95% pointwise and simultaneous bands (left) and conditional cumulative hazard (right) for Istat." width="75%" /> <p class="caption">Quantile-quantile plots with 95% pointwise and simultaneous bands (left) and conditional cumulative hazard (right) for Istat.</p> </div> ??? Bootstrap estimates obtained by conditioning on truncation time and birth dates. New observations simulated from doubly truncated distributions --- # Flexibility is key We fit a semiparametric hazard function `$$h(t) = \{\sigma + \xi t + g(t)\}^{-1}_{+}$$` with `\(g(t) \to 0\)` as `\(t \to t_{F}\)` with `\(g(t)\)` a cubic regression spline - generalizes generalized Pareto model - reduces to parametric model in upper tail - equispaced knots ??? Left: figure obtained with bshazard for left-truncated right-censored data Right: discretize data into daily bins, use cumulative hazard H(t) = sum_{z=1}^t h(z)/365 for interpretability, so survival function is exp(-H(x)) --- # Let the tail speak for itself! <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/splines_italian-1.png" alt="Nonparametric hazard (left) and semiparametric generalized Pareto (right)." width="75%" /> <p class="caption">Nonparametric hazard (left) and semiparametric generalized Pareto (right).</p> </div> .tiny[Semiparametric estimator suggest a wide range of plausible behaviour, including constant risk.] --- layout: false name: extrapolation class: center middle section-title section-title-4 .huge.color-0[Extrapolation] --- layout: true class: title title-4 --- # Is there a finite lifespan? Mathematically speaking, is `\(t_F=\sup\{t: F(t)<1\} = \infty\)`? Hard to convey to the average reader: - `\(t_F=\infty\)` does not imply immortality. - `\(\Pr(T > u) < \varepsilon\)` does not imply finite lifespan `\(t_F < u\)`. -- .box-inv-4[the answer may be in the model.] - Gompertz–Makeham has no right endpoint and `\(t_F =\infty\)`. - exponential model has a constant hazard (.color-4[**plateau of mortality**]). - the generalized Pareto implies a lifespan of `\(u-\sigma/\xi\)` if `\(\xi < 0\)`. --- # Is the lifetime distribution bounded? <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/profile_lifetime-1.png" alt="Profile likelihood for endpoint for various countries and three thresholds." width="55%" /> <p class="caption">Profile likelihood for endpoint for various countries and three thresholds.</p> </div> --- # Human lifespan No discernible differences between - earlier and later birth cohorts, - countries, - men and women, except that after age 108 French men have lower survival. Not to be confused with gender inbalance due to lower survival of men. --- # You have no power in here! The power of a likelihood ratio test for detecting a finite endpoint (obtained by simulating records with a generalized Pareto distribution with lifespan `\(t_F\)`) is high: based on France/Italy/IDL data (2016 version), - 125 years: combined power of 97%; - 130 years: combined power of 83%; - 135 years: combined power of 66%. Suggests that the human lifespan lies well beyond any lifetime yet observed. --- # Huge uncertainty Japanese (unvalidated) data are interval-censored and right-truncated <div class="figure" style="text-align: center"> <img src="McGill_Biostat_2023-02-Belzile_files/figure-html/japanese-1.png" alt="Posterior credible intervals by threshold (left) and sampling distribution with(out) rounding (right)." width="65%" /> <p class="caption">Posterior credible intervals by threshold (left) and sampling distribution with(out) rounding (right).</p> </div> --- # Supercentenarians [don't] live forever... Estimated exponential distribution above 110 years for IDL has mean 0.5 (0.46, 0.53): a coin toss. -- Surviving until 130 years conditional on surviving until 110 years - is equivalent to obtaining 20 heads in a row, - a less than one-in-a-million chance... -- Anticipated increase in number of supercentenarians make it possible to observe 130, but higher record is highly unlikely (Pearce & Raftery 2021). --- layout: true class: title title-1 --- # Take home messages .box-inv-1.sp-after[cannot use low thresholds<br> for extrapolation.] -- .box-inv-1.sp-after[goodness-of-fit diagnostics suggest<br> generalized Pareto model fits well.] -- .box-inv-1.sp-after[hazard doesn't stabilize<br>until about 108 years.] -- .box-inv-1.sp-after[shape estimates suggest<br>a **decrease** of the risk above 108.] --- # References - Léo R. Belzile, Anthony C. Davison, Holger Rootzén and Dmitrii Zholud (2021). Human mortality at extreme age., Royal Society Open Science, 8, [`doi:10.1098/rsos.202097`](https://doi.org/10.1098/rsos.202097). - Léo R. Belzile, Anthony C. Davison, Jutta Gampe, Holger Rootzén and Dmitrii Zholud (2022). Is there a cap on longevity? A statistical review., Annual Reviews of Statistics and its Applications, 9, 21–45, [`doi:10.1146/annurev-statistics-040120-025426`](https://doi.org/10.1146/annurev-statistics-040120-025426). - Léo R. Belzile (2022) Heads or tails: What statistical models tell us about the probability of living beyond 110, [The Conversation](https://theconversation.com/heads-or-tails-what-statistical-models-tell-us-about-the-probability-of-living-beyond-110-189287) - Léo R. Belzile (2023+). `longevity`: An **R** Package for Modelling Excess Lifetimes