CHAPTER 7: Age standardization

Freddie Bray and Jacques Ferlay

Introduction
For every type of cancer in every population worldwide, age is a key determinant of the risk of developing the disease (Armitage and Doll, 1954). Further, the age structures of the registry catchment populations included in this volume vary considerably. For these two reasons, any global comparisons of incidence rates must be made independent of the effects of age to be meaningful. This chapter briefly discusses the summary measures used for this purpose in the preparation of Cancer Incidence in Five Continents (CI5), allowing for comparisons of incidence across populations (within Volume XI) and within the same population over time (across consecutive volumes).
In each of the previous ten volumes of CI5, we have emphasized that an age-specific examination of cancer incidence rates is an essential exploratory step in any comparative analysis, and Volume XI is no exception. Graphical displays of age-specific rates may reveal possible artefacts in the data or provide insight into the biology and etiology of cancer in the population or populations under study. The online resources available at http://ci5.iarc.fr (and the graphical facilities within) allow for a visual inspection that complements the tabulated summaries in the published volume.
To compare incidence across multiple populations or time points, we require a summary measure that absorbs the schedule of age-specific rates in different registries and across time. The crude rate does not take into account the varying age structures in the underlying populations. Therefore, as in previous volumes, two standardized measures – the age-standardized rate and the cumulative rate – are presented; this enables comparisons of cancer risk between registries, independent of the effects of age. A brief description of the properties of these summary measures is given below, together with the principal calculations involved.
Readers who wish to explore the topic in more depth should consult the chapters by Smith (1992) and Day (1992) in CI5 Volume VI. The books by Jensen et al. (1991), Dos Santos Silva (1999), and Kirkwood and Sterne (2003) include chapters that discuss in practical terms the age standardization of cancer data, while Estève et al. (1994) give a more theoretical account of the underlying methodology.

Calculating age-standardized rates
The age-standardized incidence rate is the summary rate that would have been observed, given the schedule of age-specific rates, in a population with the age composition of some reference population, often called the standard population. Calculation of the standardized rate is an example of direct standardization, whereby the observed age-specific rates are applied (directly) to a theoretical standard population. The calculation is illustrated in Table 7.1, for stomach cancer incidence among females in Yanting County, China (2008–2012), using the world standard population first introduced by Segi (1960), drawn from a pooled population of 46 countries and modified for the first volume of this series by Doll et al. (1966). The Segi–Doll world standard is used in this and all previous volumes of CI5 as a means to rapidly examine geographical and temporal variations in cancer risk.
The main criticism levelled at age-standardized rates concerns the need to select an arbitrary standard population; age-standardized rates can only be meaningfully compared if they refer to the same standard. Although it is clear that the age composition of the world standard is representative of neither the present nor the future age-specific global population, switching to a more up-to-date standard would only be worthwhile if it could provide benefits that outweighed the drawbacks of rendering the Segi–Doll standardized rates obsolete. However, it has been shown that although very different absolute values of age-standardized rates are obtained using different standards, the estimates of age-adjusted relative risk (the main purpose of such routine comparisons) are quite similar (Bray et al., 2002). Therefore, the main effect of changing the standard to a more up-to-date representation of the global population would be the inconsequential result that the value of the age-standardized rate would be closer to that of the crude rate.
Accordingly, for both theoretical and practical reasons, the age-standardized rates in this volume are calculated as previously, using the age composition of the Segi–Doll world standard, shown in Table 7.1. Age groups are indexed by the subscript $i$, $d_i$ is the number of cases in age group $i$, $y_i$ is the number of person-years at risk in age group $i$, and $w_i$ is the number of individuals in (or the weight of) age group $i$ in the world standard population. The crude rate per 100 000 per year is therefore calculated as follows (using the data presented in Table 7.1 as an example):

$$\text{crude rate} = 10^5 \times \frac{\sum_i d_i}{\sum_i y_i} = 10^5 \times \frac{1270}{1\,451\,646} = 87.49$$

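The age-standardized rate is calculated as follows:

$$\text{age-standardized rate} = 10^5 \times \frac{\sum_i w_i \, d_i / y_i}{\sum_i w_i} = 10^5 \times \frac{69.68}{100\,000} = 69.68$$
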
In this example, the age-standardized rate is 20% less than the crude rate in the same 5-year period. This is because the world standard population has proportionally fewer individuals in the older age groups than does the actual registry population, and the risk of disease (according to the age-specific rates) is higher at older ages.

Table 7.1. Calculation of age-standardized incidence rates:
Stomach cancer (ICD-10 C16), China, Yanting County, females (2008–2012)
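| Age group (years), $i$ | No. of cases, $d_i$ | Person-years at risk, $y_i$ | Age-specific incidence (per $10^5$ person-years), $10^5(d_i/y_i)$ | Standard world population, $w_i$ | Expected no. of cases in standard world population, $d_i w_i / y_i$ |
|---|---:|---:|---:|---:|---:|
| 0–4 | 0 | 84468 | 0.00 | 12 000 | 0.00 |
| 5–9 | 0 | 81670 | 0.00 | 10 000 | 0.00 |
| 10–14 | 0 | 72068 | 0.00 | 9 000 | 0.00 |
| 15–19 | 0 | 106093 | 0.00 | 9 000 | 0.00 |
| 20–24 | 0 | 146968 | 0.00 | 8 000 | 0.00 |
| 25–29 | 0 | 149108 | 0.00 | 8 000 | 0.00 |
| 30–34 | 6 | 133780 | 4.48 | 6 000 | 0.27 |
| 35–39 | 35 | 124408 | 28.13 | 6 000 | 1.69 |
| 40–44 | 58 | 102229 | 56.74 | 6 000 | 3.40 |
| 45–49 | 90 | 92643 | 97.15 | 6 000 | 5.83 |
| 50–54 | 117 | 83769 | 139.67 | 5 000 | 6.98 |
| 55–59 | 131 | 75003 | 174.66 | 4 000 | 6.99 |
| 60–64 | 175 | 66514 | 263.10 | 4 000 | 10.52 |
| 65–69 | 171 | 51164 | 334.22 | 3 000 | 10.03 |
| 70–74 | 162 | 37469 | 432.36 | 2 000 | 8.65 |
| 75–79 | 148 | 23505 | 629.65 | 1 000 | 6.30 |
| 80–84 | 95 | 13259 | 716.49 | 500 | 3.58 |
| 85+ | 82 | 7528 | 1089.27 | 500 | 5.45 |
| Total | 1270 | 1451646 | 87.49 | 100 000 | 69.68 |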
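
The arithmetic in Table 7.1 can be checked with a few lines of Python. This is a minimal sketch rather than the procedure used to produce this volume: the function names `crude_rate` and `age_standardized_rate` are illustrative, and the inputs are transcribed from the table.

```python
# Data transcribed from Table 7.1 (stomach cancer, females, Yanting County,
# 2008-2012). The 18 age groups are 0-4, 5-9, ..., 80-84, 85+.
cases = [0, 0, 0, 0, 0, 0, 6, 35, 58, 90, 117, 131, 175, 171, 162, 148, 95, 82]
person_years = [84468, 81670, 72068, 106093, 146968, 149108, 133780, 124408,
                102229, 92643, 83769, 75003, 66514, 51164, 37469, 23505,
                13259, 7528]
# Segi-Doll world standard population; the weights sum to 100 000.
world_standard = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def crude_rate(d, y, per=100_000):
    """Total cases divided by total person-years, scaled to `per`."""
    return per * sum(d) / sum(y)


def age_standardized_rate(d, y, w, per=100_000):
    """Direct standardization: weighted mean of the age-specific rates."""
    expected = sum(wi * di / yi for di, yi, wi in zip(d, y, w))
    return per * expected / sum(w)


crude = crude_rate(cases, person_years)
asr = age_standardized_rate(cases, person_years, world_standard)
print(f"crude rate:  {crude:.2f} per 100 000")  # Table 7.1 reports 87.49
print(f"ASR (world): {asr:.2f} per 100 000")    # Table 7.1 reports 69.68
```

Dividing by `sum(w)` makes the function indifferent to whether the standard is supplied as counts per 100 000 or as proportions.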

Cumulative rate and cumulative risk
The cumulative rate is the sum of the age-specific rates over each year of age from birth to a defined upper age limit. As age-specific incidence rates are usually computed for 5-year age intervals, the cumulative rate is then five times the sum of the age-specific rates calculated over the 5-year age groups, assuming the age-specific rates are the same for all ages within a 5-year age group. In the example of stomach cancer incidence in females in the China, Yanting County registry population, all the age groups have a width of 5 years, so the cumulative rate from ages 0 to 74 is calculated as follows:

$$\text{cumulative rate (0–74)} = 100 \times 5 \times \sum_{i=1}^{15} \frac{d_i}{y_i} = 100 \times 5 \times 0.015305 = 7.65\%$$

The cumulative rate is in fact not a rate at all but a dimensionless quantity most conveniently expressed as a percentage. In the example above, the cumulative rate up to the age of 74 years would be given as 7.65%. The age span over which a rate is accumulated must be specified. In this volume, the age ranges 0–64 years and 0–74 years are used to provide two representations of the lifetime risk of developing the disease. Other age ranges may be more appropriate for more specific purposes, such as investigating childhood cancers.
The cumulative rate’s companion – the cumulative risk – is defined as the probability that an individual will develop the disease in question during a certain age span, in the absence of other causes of death. If the cumulative risk in an age range is less than 10%, as is the case with most tumours, it can be approximated very well by the cumulative rate. Table 7.2 shows the correction needed to convert the cumulative rate into the cumulative risk. The precise mathematical relationship between the cumulative rate and the cumulative risk, with both expressed as proportions rather than percentages, is as follows:

cumulative risk = 1 – exp(–cumulative rate)

Continuing with our example, the cumulative risk is 7.37%; that is, the risk for a female in Yanting County of developing stomach cancer between the ages of 0 and 74 years is estimated as 7.4%. In other words, approximately 1 in 14 females in the defined region would develop stomach cancer within this age span, assuming no other causes of death.
The cumulative rate has two principal advantages over the age-standardized rate. First, it is a form of direct standardization in which the weights are simply the widths of the age groups, so the problem of choosing an arbitrary reference population is eliminated. Second, as an approximation of the cumulative risk, it has a greater intuitive appeal, and is more directly interpretable as a measurement of lifetime risk, assuming no other causes of death.
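Table 7.2. Conversion of cumulative rates, $100x$, into the corresponding cumulative risks, $100(1 - e^{-x})$
| $100x$ | 0.1 | 0.5 | 1.0 | 5.0 | 7.0 | 10.0 | 15.0 | 20.0 | 30.0 | 40.0 | 50.0 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| $100(1 - e^{-x})$ | 0.1 | 0.499 | 0.995 | 4.88 | 6.76 | 9.52 | 13.93 | 18.13 | 25.92 | 32.97 | 39.35 |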
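
As an illustration only, the cumulative rate and cumulative risk for the worked example can be reproduced as follows. The sketch assumes 5-year age groups throughout, takes the cases and person-years for ages 0–74 from Table 7.1, and uses the hypothetical function names `cumulative_rate` and `cumulative_risk`.

```python
import math

# Cases and person-years for the 15 five-year age groups 0-4, ..., 70-74,
# transcribed from Table 7.1.
cases = [0, 0, 0, 0, 0, 0, 6, 35, 58, 90, 117, 131, 175, 171, 162]
person_years = [84468, 81670, 72068, 106093, 146968, 149108, 133780, 124408,
                102229, 92643, 83769, 75003, 66514, 51164, 37469]


def cumulative_rate(d, y, width=5):
    """Age-group width times the sum of age-specific rates (a proportion)."""
    return width * sum(di / yi for di, yi in zip(d, y))


def cumulative_risk(cum_rate):
    """Convert a cumulative rate (as a proportion) into a cumulative risk."""
    return 1.0 - math.exp(-cum_rate)


rate_0_74 = cumulative_rate(cases, person_years)
risk_0_74 = cumulative_risk(rate_0_74)
print(f"cumulative rate 0-74: {100 * rate_0_74:.2f}%")  # text reports 7.65%
print(f"cumulative risk 0-74: {100 * risk_0_74:.2f}%")  # text reports 7.37%

# The same conversion applied to the cumulative rates listed in Table 7.2
# (input and output both expressed as percentages).
for pct in [0.1, 0.5, 1.0, 5.0, 7.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0]:
    print(pct, round(100 * cumulative_risk(pct / 100), 3))
```

Both functions work with proportions and convert to percentages only when printing, which keeps the exponential relationship free of scaling factors.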

Calculating the standard error
Both the age-standardized rate and the cumulative rate are weighted sums of the age-specific rates, so the standard error can be derived in both cases from the same formula. If the age-specific rate in age group $i$ is estimated from $d_i$ cases and $y_i$ person-years, then the age-standardized rate (with $w_i$ representing the standardization weights) is

$$\sum_i w_i \frac{d_i}{y_i}$$

with an estimated variance (based on the Poisson distribution) of

$$\sum_i w_i^2 \frac{d_i}{y_i^2}$$

and an estimated standard error of

$$\sqrt{\sum_i w_i^2 \frac{d_i}{y_i^2}}$$

For the age-standardized rate, the weights are given by the number of individuals in each age group per 100 000 in the standard population, so that the weighted sum is directly a rate per 100 000. For the cumulative rate, the weights are equal to the widths of the age groups. When all age groups are 5 years wide, the expression for the standard error (SE) of the cumulative rate (expressed as a percentage) reduces to:

$$\text{SE(cumulative rate)} = 100 \times 5 \times \sqrt{\sum_i \frac{d_i}{y_i^2}}$$

For the example of stomach cancer in females in the China, Yanting County registry population, the estimates (with standard errors) are 69.7 (1.98) for the rate standardized to the world population and 7.65 (0.264) for the cumulative rate (ages 0–74).
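
The standard errors quoted above can be reproduced with a single helper, because both summary measures are weighted sums of the age-specific rates. The following is a sketch under that assumption, with data transcribed from Table 7.1; the name `weighted_rate_and_se` is illustrative and not part of any CI5 software.

```python
import math

# Data transcribed from Table 7.1 (18 age groups: 0-4, ..., 80-84, 85+).
cases = [0, 0, 0, 0, 0, 0, 6, 35, 58, 90, 117, 131, 175, 171, 162, 148, 95, 82]
person_years = [84468, 81670, 72068, 106093, 146968, 149108, 133780, 124408,
                102229, 92643, 83769, 75003, 66514, 51164, 37469, 23505,
                13259, 7528]
# Standard population counts per 100 000, used directly as weights.
world_standard = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def weighted_rate_and_se(d, y, w):
    """Weighted sum of age-specific rates and its Poisson-based SE."""
    estimate = sum(wi * di / yi for di, yi, wi in zip(d, y, w))
    variance = sum(wi ** 2 * di / yi ** 2 for di, yi, wi in zip(d, y, w))
    return estimate, math.sqrt(variance)


# Age-standardized rate: the weights sum to 100 000, so the result is
# already expressed per 100 000.
asr, asr_se = weighted_rate_and_se(cases, person_years, world_standard)

# Cumulative rate for ages 0-74: weight 5 (the group width) for each of
# the first 15 age groups, then converted to a percentage.
cum, cum_se = weighted_rate_and_se(cases[:15], person_years[:15], [5] * 15)
cum_pct, cum_se_pct = 100 * cum, 100 * cum_se

print(f"ASR: {asr:.1f} (SE {asr_se:.2f})")                        # text: 69.7 (1.98)
print(f"cumulative rate: {cum_pct:.2f}% (SE {cum_se_pct:.3f})")  # text: 7.65 (0.264)
```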

Cases of unknown age
The age-standardized rate and the cumulative rate are necessarily corrected for cases of unknown age that occur within the CI5 registry datasets. The correction procedure involves multiplying the summary measure (based on cases of known age) by T/K, where T is the total number of cases of cancer of the same type in individuals of the same sex, and K is the number of cancers occurring in individuals of known age. The standard errors are also multiplied by the same correction factor (T/K). This correction relies on the assumption that the cases of unknown age are randomly distributed, and therefore have the same age distribution as the cases of known age. In other words, the correction assumes that the probability that the age of a case is unknown does not depend on the age of the case. Although this assumption probably does not hold – it is more likely that age is not recorded among older patients – it is nevertheless important that all registered cases are accounted for, so that the summary statistics are not underestimated.
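
The T/K correction amounts to a single multiplication, sketched below. The function name and the case counts are hypothetical: the chapter does not report how many cases of unknown age occurred in any registry, so T = 1280 and K = 1270 are invented purely to show the mechanics.

```python
def correct_for_unknown_age(estimate, std_error, total_cases, known_age_cases):
    """Scale a summary measure and its standard error by T/K."""
    if not 0 < known_age_cases <= total_cases:
        raise ValueError("expected 0 < K <= T")
    factor = total_cases / known_age_cases
    return estimate * factor, std_error * factor


# Hypothetical illustration: suppose 10 further cases had no recorded age,
# so that T = 1280 while K = 1270 cases contributed to the rate of 69.68.
asr_corrected, se_corrected = correct_for_unknown_age(
    69.68, 1.98, total_cases=1280, known_age_cases=1270
)
print(f"corrected ASR: {asr_corrected:.2f} (SE {se_corrected:.2f})")
```

When no cases are of unknown age, T equals K and the measure is returned unchanged.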

Availability of data for upper age groups
The precision of rates calculated using direct standardization varies according to the number and width of the age groups used. Age-standardized rates, for example, yield slightly different values depending on the extent to which population data are available for the oldest age groups (aged 75 years or more). For most registries, census data are available for the same 18 age groups shown in Table 7.1, enabling the calculation of rates for the upper age ranges of 75–79, 80–84, and 85+. However, some registries can only obtain population data for 16 or 17 age groups, and thus the oldest age groups instead correspond to individuals aged 75+ or 80+, respectively. As an example of the effect of this difference, the age-standardized rate of 69.7 for stomach cancer incidence in females in the China, Yanting County registry population changes to 69.0 when only 16 age groups are used in the calculation instead of all 18 age groups.
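
The effect of a coarser open-ended age group can be explored by pooling the oldest groups of Table 7.1 before standardizing. This is an illustrative sketch: it assumes that a registry with fewer age groups would report the same cases and person-years, simply summed into a 75+ or 80+ group, and the helper names `pool_oldest` and `asr` are hypothetical.

```python
# Data transcribed from Table 7.1 (18 age groups: 0-4, ..., 80-84, 85+).
cases = [0, 0, 0, 0, 0, 0, 6, 35, 58, 90, 117, 131, 175, 171, 162, 148, 95, 82]
person_years = [84468, 81670, 72068, 106093, 146968, 149108, 133780, 124408,
                102229, 92643, 83769, 75003, 66514, 51164, 37469, 23505,
                13259, 7528]
world_standard = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def pool_oldest(values, n_groups):
    """Keep the first n_groups - 1 entries and sum the rest into one group."""
    return values[:n_groups - 1] + [sum(values[n_groups - 1:])]


def asr(d, y, w, per=100_000):
    """Directly age-standardized rate, scaled to `per`."""
    return per * sum(wi * di / yi for di, yi, wi in zip(d, y, w)) / sum(w)


asr_by_groups = {}
for n in (18, 17, 16):  # oldest group is 85+, 80+, and 75+ respectively
    asr_by_groups[n] = asr(pool_oldest(cases, n),
                           pool_oldest(person_years, n),
                           pool_oldest(world_standard, n))
    print(f"{n} age groups: ASR = {asr_by_groups[n]:.1f}")
```

With 18 groups the sketch returns the 69.7 of Table 7.1, and with 16 groups the 69.0 quoted above.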

References

  1. Armitage P, Doll R (1954). The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer. 8(1):1–12. http://dx.doi.org/10.1038/bjc.1954.1 PMID:13172380
  2. Bray F, Guilloux A, Sankila R, Parkin DM (2002). Practical implications of imposing a new world standard population. Cancer Causes Control. 13(2):175–82. http://dx.doi.org/10.1023/A:1014344519276 PMID:11936824
  3. Day NE (1992). Cumulative rate and cumulative risk. In: Parkin DM, Muir CS, Whelan SL, Gao Y-T, Ferlay J, Powell J, editors. Cancer Incidence in Five Continents, Vol. VI. IARC Scientific Publication No. 120. Lyon: International Agency for Research on Cancer.
  4. Doll R, Payne P, Waterhouse J, editors (1966). Cancer Incidence in Five Continents: A Technical Report. Berlin: Springer-Verlag (for UICC).
  5. Dos Santos Silva I, editor (1999). Cancer Epidemiology: Principles and Methods. IARC Non-Serial Publication. Lyon: International Agency for Research on Cancer.
  6. Estève J, Benhamou E, Raymond L (1994). Statistical Methods in Cancer Research, Vol. IV. Descriptive Epidemiology. IARC Scientific Publication No. 128. Lyon: International Agency for Research on Cancer.
  7. Jensen OM, Parkin DM, MacLennan R, Muir CS, Skeet RG, editors (1991). Cancer Registration: Principles and Methods. IARC Scientific Publication No. 95. Lyon: IARCPress.
  8. Kirkwood BR, Sterne JAC (2003). Essential Medical Statistics. 2nd ed. Oxford: Blackwell Science.
  9. Segi M (1960). Cancer mortality for selected sites in 24 countries (1950–57). Sendai, Japan: Department of Public Health, Tohoku University School of Medicine.
  10. Smith PG (1992). Comparison between registries: age-standardized rates. In: Parkin DM, Muir CS, Whelan SL, Gao Y-T, Ferlay J, Powell J, editors. Cancer Incidence in Five Continents, Vol. VI. IARC Scientific Publication No. 120. Lyon: International Agency for Research on Cancer.