Department of Mathematics and Statistics, Dalhousie University
Léo Belzile, HEC Montréal
Based on joint work with Anthony Davison, Jutta Gampe, Holger Rootzén, Dmitrii Zholud
The study of human longevity is full of pitfalls for the unwary...
The problem raises several statistical challenges, revolving around data quality, models, and extrapolation.
Statistical analysis is needed to assess biological theories about natural selection, mortality plateaus, and the existence of a finite lifespan.
Lots of interest in the news!
It is believed that exponential growth of mortality with age (Gompertz law) is followed by a period of deceleration, with slower rates of mortality increase at older ages.
Recent studies found that the exponential increase of the mortality risk with age (the famous Gompertz law) continues even at extreme old ages in humans, rats, and mice, thus challenging traditional views about old-age mortality deceleration, mortality leveling-off, and late-life mortality plateaus.
Gavrilova & Gavrilov (2015), Journals of Gerontology: Biological Sciences
The Gompertz–Makeham model is extremely popular in demography and fits the distribution of lifetimes well at lower ages, up to about 102-105 years.
Figure 3 of Pyrkov et al. (2021), Nature Communications, doi:10.1038/s41467-021-23014-1.
In Japan in 2016, there were 8167 male and 57 525 female centenarians.
In Canada: 6116 female and 835 male centenarians.
Between 1983 and 2009, a total of 321 semisupercentenarians died in Quebec.
Oldest living today: Soeur André (Lucile Randon, 118 years, 236 days); oldest ever: Jeanne Calment (122 years, 164 days).
Data quality
Information limited due to availability of historical records.
Validation is key
Most databases (e.g., Gerontology Research Group) include self-reported records.
Opportunity samples
Shigechiyo Izumi was excluded from the records because he bore the name of his dead brother, so he died aged 105 rather than 120.
In Japan, the first modern family registration system (Jinshin-KOSEKI) was established in 1872 and amended in 1886 by the Family Registration Law (see Chapter 10 of Exceptional Lifespans).
Italy: a teenager reportedly born in the 1800s died aged 13 but was initially recorded as having died aged 113.
Nikolai Zak's (conspiracy?) theory in Rejuvenation Research: Yvonne Calment took the place of her mother Jeanne to avoid inheritance taxes.
To draw reliable conclusions, we need representative samples.
Validated supercentenarians (110+) from 13 countries,
plus (partly validated) semisupercentenarians (105-109) for 9 countries.
Free of age-ascertainment bias.
1081 validated supercentenarians
Data are obtained by casting a net on the population of potential (semi)-supercentenarians.
For the IDL, (only) supercentenarians in a country who died between dates c_1 and c_2 are included.
Records for the candidates are then individually validated.
Lexis diagrams showing the selection mechanism.
Semisupercentenarians (105-109) who died in window (d_1, d_2) \neq (c_1, c_2).
Lexis diagrams for IDL data with semisupercentenarians and supercentenarians
Lexis diagrams for Istat data with semisupercentenarians and supercentenarians
Annual Vital Statistics Report of Japan (Hanayama & Sibuya, 2014).
Ignoring truncation leads to underestimation of the survival probability: population growth and reduced mortality at younger ages translate into a larger impact for later birth cohorts.
Impact of truncation on quantile-quantile plots (left) and maximum age by birth year (right).
Failing to account for truncation and increase in population.
Models
Denote by T the lifetime, a continuous random variable with distribution function F, density f, lifespan t_{F} = \sup\{t: F(t) < 1\}, and survivor and hazard functions
\begin{align*} S(t) &= \Pr(T>t) =1-F(t), \\h(t) &= \frac{f(t)}{S(t)}, \quad t>0. \end{align*}
The full likelihood depends on \nu, hence we consider the conditional likelihood \begin{align*} \frac{f(t)}{F(b)-F(a)}, \quad a < t< b \end{align*} for interval-truncated data and, for left-truncated and right-censored data, \begin{align*} \frac{h(t)^\delta S(t)}{1-F(a)}, \quad t> a, \end{align*} where \delta = 1 for an observed death and \delta = 0 for a right-censored lifetime, and [a, b] = [\max\{0, c_1 - x\}, c_2 - x].
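The conditional likelihood for interval-truncated data can be sketched for the simplest case of an exponential model for excess lifetimes, without censoring. This is a minimal illustration under our own assumptions, not the analysis behind the slides: x is taken to be the (hypothetical) calendar time at which a person reaches the threshold, t is the excess lifetime above it, and threshold-crossing dates are simulated uniformly from 15 years before c_1 up to c_2. Only deaths inside the window [c_1, c_2] are kept; the conditional maximum likelihood estimate is then compared with the naive sample mean that ignores truncation.

```python
import math
import random


def truncation_bounds(x, c1, c2):
    """Truncation interval [a, b] = [max(0, c1 - x), c2 - x] for one record.

    Hypothetical setup: x is the calendar time (in years) at which the person
    reaches the threshold, and lifetimes are measured as excesses above it.
    """
    return max(0.0, c1 - x), c2 - x


def exp_cond_loglik(scale, t, a, b):
    """Sum of log f(t_i) - log{F(b_i) - F(a_i)} for an exponential model."""
    ll = 0.0
    for ti, ai, bi in zip(t, a, b):
        log_f = -math.log(scale) - ti / scale
        # log{exp(-a/scale) - exp(-b/scale)}, written to avoid cancellation
        log_mass = -ai / scale + math.log(-math.expm1(-(bi - ai) / scale))
        ll += log_f - log_mass
    return ll


def golden_section_max(fun, lo, hi, tol=1e-6):
    """Maximize a unimodal function of one variable on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = fun(x1), fun(x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = fun(x2)
        else:  # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = fun(x1)
    return (lo + hi) / 2.0


def simulate(n_candidates, scale, c1, c2, rng):
    """Keep only people whose death date x + t falls inside [c1, c2]."""
    t_obs, a_obs, b_obs = [], [], []
    for _ in range(n_candidates):
        x = rng.uniform(c1 - 15.0, c2)  # hypothetical threshold-crossing date
        t = rng.expovariate(1.0 / scale)  # excess lifetime above the threshold
        a, b = truncation_bounds(x, c1, c2)
        if a <= t <= b:
            t_obs.append(t)
            a_obs.append(a)
            b_obs.append(b)
    return t_obs, a_obs, b_obs


rng = random.Random(2023)
t, a, b = simulate(50_000, scale=1.5, c1=0.0, c2=10.0, rng=rng)
naive = sum(t) / len(t)  # sample mean, ignoring truncation
mle = golden_section_max(lambda s: exp_cond_loglik(s, t, a, b), 0.05, 20.0)
print(f"{len(t)} records, naive mean {naive:.3f}, conditional MLE {mle:.3f}")
```

Conditioning on each record's [a_i, b_i] removes the need to model the distribution of threshold-crossing dates, which is the point of using the conditional likelihood.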
Many models are popular in demography, many of them with an infinite endpoint.
Most records include only lifetimes above a threshold u_0 (threshold exceedances).
If a scaling function a_u exists such that, conditional on X > u, (X - u)/a_u has a non-degenerate limiting distribution as u \to t_F, then (Pickands, 1975) \frac{\Pr\{(X-u)/a_u > t\}}{\Pr(X >u)} \to \begin{cases} (1+\xi t/\sigma)_{+}^{-1/\xi}, & \xi \neq 0\\ \exp(-t/\sigma), & \xi = 0, \end{cases} where c_+ = \max\{c, 0\} for a real number c.
The unique nondegenerate limiting distribution for exceedances of a threshold u is generalized Pareto.
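A minimal sketch of the limiting generalized Pareto survival function, with the convention c_+ = \max\{c, 0\}, so that the survival is zero beyond the finite endpoint when \xi < 0. The values of \sigma and \xi are hypothetical; evaluating the function for shapes close to zero shows the \xi = 0 case as the continuous limit of the \xi \neq 0 formula.

```python
import math


def gp_survival(t, sigma, xi):
    """Limiting survival (1 + xi t / sigma)_+^(-1/xi), or exp(-t / sigma) if xi == 0."""
    if xi == 0.0:
        return math.exp(-t / sigma)
    base = 1.0 + xi * t / sigma
    if base <= 0.0:  # c_+ = max(c, 0): beyond the finite endpoint when xi < 0
        return 0.0
    return base ** (-1.0 / xi)


# Shapes near zero approach the exponential case continuously.
for xi in (-0.1, -1e-6, 0.0, 1e-6, 0.1):
    print(f"xi = {xi:+.0e}: S(2) = {gp_survival(2.0, sigma=1.5, xi=xi):.6f}")
```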
At lower levels, the behaviour of the fitted model depends on the reciprocal hazard, r(t) = 1/h(t); under mild regularity conditions,
\xi = \lim_{t \to t_{F}} r'(t) and a pre-asymptotic shape is \xi_u = r'(u).
For example, the Gompertz model has \xi_u \nearrow 0: estimates of \xi tend to be negative.
The speed of convergence is quite fast, so we would expect the exceedances to be well approximated by an exponential distribution.
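The claim about the Gompertz model can be checked numerically. For a Gompertz hazard h(t) = \alpha e^{\beta t}, the reciprocal hazard is r(t) = e^{-\beta t}/\alpha, so \xi_u = r'(u) = -\beta/h(u) is negative and increases towards 0 as u grows. The parameter values below (\alpha = 10^{-5}, \beta = 0.1 per year) are hypothetical, chosen only so that the hazard at 110 is of a plausible order; the finite difference is compared with the analytic derivative.

```python
import math

ALPHA, BETA = 1e-5, 0.1  # hypothetical Gompertz parameters (per year)


def gompertz_hazard(t):
    """Gompertz hazard h(t) = ALPHA * exp(BETA * t)."""
    return ALPHA * math.exp(BETA * t)


def reciprocal_hazard(t):
    """r(t) = 1 / h(t)."""
    return 1.0 / gompertz_hazard(t)


def preasymptotic_shape(u, eps=1e-4):
    """xi_u = r'(u), approximated by a central finite difference."""
    return (reciprocal_hazard(u + eps) - reciprocal_hazard(u - eps)) / (2.0 * eps)


for u in (100, 105, 110, 115):
    exact = -BETA / gompertz_hazard(u)  # analytic r'(u) = -BETA / h(u)
    print(u, round(preasymptotic_shape(u), 4), round(exact, 4))
```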
A key property of the generalized Pareto distribution is threshold stability.
Threshold stability plots for France and Italy (left), and Netherlands (right).
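Threshold stability means that if exceedances of u are generalized Pareto with scale \sigma and shape \xi, then exceedances of any higher threshold v are generalized Pareto with the same shape and scale \sigma + \xi(v - u); shape estimates should therefore be roughly constant across thresholds once the model fits. A quick numerical check with hypothetical parameter values:

```python
import math


def gp_survival(t, sigma, xi):
    """Generalized Pareto survival function for t >= 0."""
    if xi == 0.0:
        return math.exp(-t / sigma)
    base = 1.0 + xi * t / sigma
    return base ** (-1.0 / xi) if base > 0.0 else 0.0


def shifted_scale(sigma, xi, shift):
    """Scale for exceedances of v = u + shift when exceedances of u are GP(sigma, xi)."""
    return sigma + xi * shift


sigma, xi, shift = 1.5, -0.05, 2.0  # hypothetical values; shift = v - u in years
for t in (0.5, 1.0, 3.0):
    # Pr(X > v + t | X > v), computed from the model fitted above u
    conditional = gp_survival(shift + t, sigma, xi) / gp_survival(shift, sigma, xi)
    # GP survival with the same shape and the shifted scale
    direct = gp_survival(t, shifted_scale(sigma, xi, shift), xi)
    print(t, conditional, direct)
```

The identity follows from (\sigma + \xi s + \xi t)/(\sigma + \xi s) = 1 + \xi t/(\sigma + \xi s) with s = v - u.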
Quantile-quantile plots with 95% pointwise and simultaneous bands (left) and conditional cumulative hazard (right) for Istat.
Bootstrap estimates are obtained by conditioning on truncation times and birth dates; new observations are simulated from doubly truncated distributions.
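Simulating from a doubly truncated distribution is simple by inversion, T = F^{-1}[F(a) + U\{F(b) - F(a)\}] with U uniform on (0, 1). This sketch assumes an exponential fitted model (hypothetical scale 1.5) and hypothetical truncation intervals; each bootstrap replicate keeps every record's (a_i, b_i) and redraws only its lifetime.

```python
import math
import random


def sample_truncated_exponential(scale, a, b, rng):
    """Draw T ~ Exp(scale) conditional on a <= T <= b by inverting the CDF."""
    fa = -math.expm1(-a / scale)  # F(a) = 1 - exp(-a / scale)
    fb = -math.expm1(-b / scale)  # F(b)
    p = fa + rng.random() * (fb - fa)
    return -scale * math.log1p(-p)  # F^{-1}(p)


def bootstrap_replicate(scale, bounds, rng):
    """One parametric bootstrap sample: keep each (a_i, b_i), redraw the lifetime."""
    return [sample_truncated_exponential(scale, a, b, rng) for a, b in bounds]


rng = random.Random(1)
bounds = [(0.0, 10.0), (2.5, 12.5), (0.0, 0.8)]  # hypothetical truncation intervals
print(bootstrap_replicate(1.5, bounds, rng))
```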
The x-axis plotting position in the Q-Q plot for observation y_i is F_0^{-1}\left[F_0(a_i) +\left\{ F_0(b_i)-F_0(a_i) \right\} \frac{F_n(y_i) - F_n(a_i)}{F_n(b_i)-F_n(a_i)}\right], where F_0 is the fitted distribution and F_n a nonparametric estimate of F.
Censored observations not displayed.
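The plotting position formula can be coded directly. In this sketch F_0 and its inverse are a hypothetical fitted exponential model with scale 1.5, and F_n is supplied by the caller (a nonparametric estimator for truncated data is not implemented here). Setting F_n = F_0 is a useful sanity check, since every point then lies on the diagonal.

```python
import math

SCALE = 1.5  # hypothetical fitted exponential scale


def F0(t):
    """Fitted exponential distribution function."""
    return -math.expm1(-t / SCALE)


def F0_inv(p):
    """Quantile function of the fitted exponential."""
    return -SCALE * math.log1p(-p)


def qq_positions(y, a, b, F0, F0_inv, Fn):
    """F0^{-1}[F0(a_i) + {F0(b_i) - F0(a_i)} {Fn(y_i) - Fn(a_i)} / {Fn(b_i) - Fn(a_i)}]."""
    positions = []
    for yi, ai, bi in zip(y, a, b):
        u = (Fn(yi) - Fn(ai)) / (Fn(bi) - Fn(ai))  # empirical conditional probability
        positions.append(F0_inv(F0(ai) + (F0(bi) - F0(ai)) * u))
    return positions


y, a, b = [0.7, 1.9, 3.2], [0.0, 1.0, 0.5], [5.0, 4.0, 6.0]  # hypothetical records
print(qq_positions(y, a, b, F0, F0_inv, Fn=F0))  # approximately equal to y
```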
We fit a semiparametric hazard function h(t) = \{\sigma + \xi t + g(t)\}^{-1}_{+}, where g(t) is a cubic regression spline with g(t) \to 0 as t \to t_{F}.
Left: figure obtained with bshazard for left-truncated right-censored data. Right: data discretized into daily bins, with cumulative hazard H(t) = \sum_{z=1}^{t} h(z)/365 (t in days) for interpretability, so the survival function is \exp\{-H(t)\}.
Nonparametric hazard (left) and semiparametric generalized Pareto (right).
The semiparametric estimator suggests a wide range of plausible behaviour, including constant risk.
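The daily discretization of the cumulative hazard can be checked against a closed form in the purely parametric case g \equiv 0, where h(t) = 1/(\sigma + \xi t) and H(t) = \xi^{-1}\log(1 + \xi t/\sigma), so that \exp\{-H(t)\} is the generalized Pareto survival. The sketch assumes, as our reading of the slide, that day z is evaluated at z/365 years with h expressed per year; the parameter values are hypothetical and the spline term g is not implemented.

```python
import math


def gp_hazard(t_years, sigma, xi):
    """Hazard 1 / (sigma + xi t), infinite beyond the endpoint."""
    denom = sigma + xi * t_years
    return 1.0 / denom if denom > 0 else math.inf


def discrete_cumhaz(days, sigma, xi):
    """H(t) = sum_{z=1}^{t} h(z) / 365, with t in days and h per year."""
    return sum(gp_hazard(z / 365.0, sigma, xi) for z in range(1, days + 1)) / 365.0


def exact_cumhaz(t_years, sigma, xi):
    """Integral of h: log(1 + xi t / sigma) / xi, or t / sigma when xi == 0."""
    if xi == 0.0:
        return t_years / sigma
    return math.log1p(xi * t_years / sigma) / xi


sigma, xi = 1.5, -0.02  # hypothetical parameter values
for years in (1, 2, 5):
    H_disc = discrete_cumhaz(365 * years, sigma, xi)
    print(years, math.exp(-H_disc), math.exp(-exact_cumhaz(years, sigma, xi)))
```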
We cannot use low thresholds for extrapolation.
Goodness-of-fit diagnostics suggest that the generalized Pareto model fits well.
The hazard does not stabilize until about 108 years.
Shape estimates suggest a decrease of the risk above that age.
Mathematically speaking, is t_F=\sup\{t: F(t)<1\} = \infty?
Hard to convey to the average reader:
the answer may be in the model.
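For the generalized Pareto model fitted to exceedances of a threshold u, with scale \sigma > 0 and shape \xi, the question reduces to the sign of \xi: the survival function (1+\xi t/\sigma)_{+}^{-1/\xi} reaches zero at t = -\sigma/\xi only when \xi < 0, so \begin{align*} t_F = \begin{cases} u - \sigma/\xi, & \xi < 0,\\ \infty, & \xi \geq 0. \end{cases} \end{align*}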
Extrapolation
Profile likelihood for the endpoint for various countries and three thresholds.
No discernible differences between
Not to be confused with the gender imbalance due to the lower survival of men.
Based on the France/Italy/IDL data (2016 version), the power of a likelihood ratio test for detecting a finite endpoint, obtained by simulating records from a generalized Pareto distribution with lifespan t_F, is high.
This suggests that the human lifespan lies well beyond any lifetime yet observed.
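A profile likelihood for the endpoint can be sketched for the simplest case of untruncated, uncensored exceedances y_i above u, ignoring the truncation handled by the conditional likelihood in the Models section; this is a simplification, not the analysis behind the figure. Writing \eta = t_F - u and \sigma = -\xi\eta with \xi < 0, the score equation for \xi has the closed-form solution \hat\xi(\eta) = n^{-1}\sum_i \log(1 - y_i/\eta), and the profile tends to the exponential log-likelihood as \eta \to \infty, which yields a likelihood ratio statistic for a finite endpoint. The data are simulated from an exponential distribution (infinite endpoint) with hypothetical scale 1.5.

```python
import math
import random


def profile_loglik_endpoint(eta, y):
    """Profile GP log-likelihood at endpoint eta = t_F - u (requires eta > max(y)).

    With sigma = -xi * eta, the maximizing shape is xi_hat = S / n, where
    S = sum_i log(1 - y_i / eta); truncation is ignored (a simplification).
    """
    n = len(y)
    s = sum(math.log1p(-yi / eta) for yi in y)
    xi_hat = s / n
    sigma_hat = -xi_hat * eta
    return -n * math.log(sigma_hat) - s - n, xi_hat


def exponential_loglik(y):
    """Maximized exponential log-likelihood, the limit as eta -> infinity."""
    n, mean = len(y), sum(y) / len(y)
    return -n * math.log(mean) - n


rng = random.Random(42)
y = [rng.expovariate(1 / 1.5) for _ in range(500)]  # hypothetical exceedances
y_max = max(y)
grid = [y_max * (1 + k / 10) for k in range(1, 200)]
prof = [profile_loglik_endpoint(eta, y)[0] for eta in grid]
ell_exp = exponential_loglik(y)
lr_stat = 2 * (max(max(prof), ell_exp) - ell_exp)  # sup includes the eta = inf limit
print(f"max excess {y_max:.2f}; LR statistic for a finite endpoint {lr_stat:.3f}")
```

Because the finite-endpoint hypothesis is tested at the boundary \xi = 0, the usual chi-squared calibration of the statistic should not be taken for granted.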
Japanese (unvalidated) data are interval-censored and right-truncated
Posterior credible intervals by threshold (left) and sampling distribution with(out) rounding (right).
Under the exponential model fitted above 110 years to the IDL data, the estimated probability of surviving one more year is 0.5 (0.46, 0.53): a coin toss.
Surviving until 130 years conditional on surviving until 110 years
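Reading the coin toss as an estimated probability p of about 0.5 of surviving each additional year above 110 (interval 0.46 to 0.53), the memoryless exponential model gives the conditional probability of reaching 130 as p^{20}. The helper below is a hypothetical illustration that goes through the implied exponential mean -1/\log p rather than computing p^{20} directly.

```python
import math


def survival_from_annual_prob(p, years):
    """Pr(T > 110 + years | T > 110) under an exponential model for excess lifetimes.

    p is the probability of surviving one more year; the implied exponential
    mean is -1/log(p), and memorylessness gives p ** years.
    """
    mean = -1.0 / math.log(p)
    return math.exp(-years / mean)


for p in (0.46, 0.5, 0.53):  # point estimate and interval endpoints
    print(f"p = {p}: Pr(reach 130 | reached 110) = {survival_from_annual_prob(p, 20):.2e}")
```

At p = 0.5 the implied mean excess lifetime is 1/\log 2 \approx 1.44 years, and the conditional probability is 0.5^{20} = 1/1 048 576, roughly one in a million.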
The anticipated increase in the number of supercentenarians makes it possible to observe a 130-year-old, but a higher record is highly unlikely (Pearce & Raftery, 2021).
doi:10.1146/annurev-statistics-040120-025426
doi:10.1214/21-AOAS1555
doi:10.1098/rsos.202097