CHAPTER 5: Data comparability and quality

Freddie Bray and Jacques Ferlay

Background
The utility of the cancer registry is contingent on the underlying quality of its data and the quality control procedures it has in place; the primary function of CI5 is to enable the comparison of cancer incidence rates across a multitude of populations worldwide. In the evaluation of registered cases, three dimensions of quality have been assessed to ensure that the registry submissions to this volume meet a sufficiently high standard for inclusion.
Comparability is the extent to which a registry’s coding and classification procedures and definitions adhere to established international standards and guidelines. The definition of an incident case is especially important in evaluating comparability.

Completeness is the degree to which all diagnosed neoplasms within a registry’s catchment population are included in the registry database. Several methods can be used to evaluate the level of completeness of the enumeration of cases within a catchment population.

Validity (or accuracy) is the proportion of cases recorded as having a given characteristic that truly do have that attribute. Several indicators of validity relate to the precision of a registry’s source documents and the level of expertise in abstracting, coding, and recoding cases.

The preparation and evaluation of the indices of data quality for CI5 requires careful attention from the volume editors, to ensure that all accepted datasets are of sufficiently high quality to merit their inclusion in the volume. The editorial procedures used to conduct a transparent and impartial evaluation of each submitted dataset are outlined in this chapter.

Elements of the evaluation
The practical aspects and techniques of evaluating cancer registry data quality have been examined in a two-part review (Bray and Parkin, 2009; Parkin and Bray, 2009), and were briefly described – with an emphasis on low- and middle-income settings – in the recent IARC Technical Publication No. 43: Planning and Developing Population-based Cancer Registration in Low- and Middle-income Settings (Bray et al., 2014). The editorial board of CI5 Volume XI sought to comprehensively assess data quality on the basis of the indicators of comparability, completeness, and validity reported in these publications. In keeping with the approach taken in the previous volume (Forman et al., 2013), readers can examine and form their own opinions on the quality of individual datasets by reviewing the accompanying comparative tabulations and graphics in the printed book and online. These tables and figures serve as a guide to evaluating registries’ adherence to the standard definitions and recommendations, and the completeness and validity of their data.
As in previous volumes, the editors carried out an extensive process of verifying coding, identifying duplicate registrations, querying unlikely or impossible combinations of codes, and converting the data to a standard format before formal editorial consideration. At the editorial board’s meetings, the editors consulted a series of pre-assembled registry-specific tables and other documentation:

  1. a set of editorial tables (see four examples at the end of this chapter, generated using a hypothetical 2008–2012 dataset for a fictional registration area entitled Erewhon);
  2. tables of site-specific case numbers, age-specific rates, and summary rates (crude, cumulative, and age-standardized), as presented in this volume;
  3. the populations at risk by sex and age, including the source or method of estimation used (where applicable), and a comparison with the previous 5-year population data (where available), as presented in this volume; and
  4. the completed questionnaires, including responses related to the definitions used by each registry.
This review process was routinely applied to the evaluation of most of the 483 datasets submitted, but the increasing number of registries submitting data (including 102 registries from China, 73 of which were submitting for the first time) also warranted additional comparative overviews of key quality indicators across registries by region and country. As in previous volumes, asterisks are used to denote datasets for which particular consideration is required in interpreting the numerical results for some or all of the reviewed cancer sites (see the Notes on the datasets section later in this chapter).

Comparability
Determining the extent of the comparability of a cancer dataset requires consideration of the registry’s procedures, including the standards and definitions used in registration. In the preparation of this volume, the editorial team particularly focused on the following procedural aspects:

  1. the system used for classifying and coding neoplasms;
  2. the definition of incidence – what constitutes a cancer case, the definition of date of incidence, and the rules for dealing with multiple primaries (i.e. for distinguishing new cases of primary cancer from extensions, recurrences, or metastases of existing cases); and
  3. the registration of cancers detected in asymptomatic individuals.

International standards for the classification and coding of neoplasms
The registries were asked to submit their data coded according to the third edition of the International Classification of Diseases for Oncology (ICD-O-3) (Fritz et al., 2000), as well as to verify (and if necessary, to correct) their data before submission. Coding from other systems can be converted to ICD-O-3 using software such as IARCcrgTools (Ferlay et al., 2005).
ICD-O-3 provides a standardized system for coding the following aspects of disease classification:

  1. topography: the anatomical location (body site) of the tumour;
  2. morphology: the microscopic appearance and cellular origin of the tumour;
  3. behaviour: the classification of the tumour as malignant, benign, in situ, or of uncertain behaviour;
  4. grade: the extent of differentiation of the tumour; and
  5. basis of diagnosis: the method of diagnosis used.

Definition of incidence
The CI5 Volume XI call for data specifically requested the submission of data on all primary tumours, including data (if collected) on basal and squamous cell skin cancers and non-malignant tumours of the central nervous system and urinary bladder. The rules for determining incidence date and multiple primaries are briefly described below.

Incidence date: Because the period from the occurrence of the first mutation to the clinical diagnosis of cancer often spans decades, a standardized definition of cancer is needed for determining whether to register a case and establishing the precise date when the disease became incident. Incidence dates are commonly defined using a hierarchical set of rules from one of three available algorithms, published by IARC (Jensen et al., 1991), the European Network of Cancer Registries (ENCR) (Pheby et al., 1997), and the SEER Program (Johnson et al., 2007). The registries were asked to state whether one of these algorithms or other, in-house rules were applied.

Multiple primaries: Because an individual may develop more than one cancer, there must be a clear distinction between new cases (to be counted as incident cancers) and cases that are actually extensions, recurrences, or metastases of an existing cancer. There are two sets of rules commonly used by cancer registries for this purpose. The SEER rules (Johnson et al., 2007) are used mainly by cancer registries in North America, whereas the jointly developed IARC/IACR rules (IARC, 2004) tend to be used throughout the rest of the world, at least for the purpose of reporting incidence rates.
The SEER rules result in somewhat higher incidence rates because they allow for the occurrence of multiple incident cancers at the same body site, providing the new case occurs 2 months to 5 years (depending on the site) after an earlier diagnosis, whereas the IARC/IACR rules allow for only one cancer per body site during a patient’s lifetime, unless there are multiple cancers of different histological types. The SEER rules also recognize new cases at different subsites of the same organ (e.g. the colon and the skin) or on opposite sides of the body (for paired organs).
Links to these rules were provided to all contributors to this volume. The registries were asked to include all multiple primary tumours in their submitted datasets for the time period covered (2008–2012) and to state which set of rules was used to define new primary cancers, and whether these could be distinguished from subsequent primaries that occur in the same person. The sites at which varying definitions of multiple primaries are likely to have the largest effect on incidence rates are listed in Table 5.1, along with the percentage differences in incidence at these sites (among a subset of SEER registries in the USA) using the SEER definition of a second primary (Johnson et al., 2007) versus the IARC/IACR rules.
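
The effect of the two rule sets on case counts can be illustrated with a deliberately simplified sketch. It assumes that each tumour record carries a patient identifier, a body site, a histological group, and a diagnosis date (all field names are hypothetical), and it reduces each rule set to the one-sentence summary given above. The published SEER and IARC/IACR rules contain many site-specific provisions that are not modelled here, and the 365-day interval is an arbitrary illustrative value within the range of 2 months to 5 years mentioned in the text.

```python
from datetime import date


def tumour_key(t):
    """Group tumours by patient, body site, and histological group."""
    return (t["patient_id"], t["site"], t["histology_group"])


def count_iarc_iacr_like(tumours):
    """Simplified IARC/IACR-style count: one incident case per patient and
    body site in a lifetime, unless the histological group differs."""
    return len({tumour_key(t) for t in tumours})


def count_seer_like(tumours, min_gap_days):
    """Simplified SEER-style count: a same-site, same-histology tumour is also
    counted if diagnosed at least min_gap_days after the last counted one.
    (The single fixed interval is an assumption made for illustration.)"""
    last_counted = {}
    count = 0
    for t in sorted(tumours, key=lambda t: t["diagnosis_date"]):
        key = tumour_key(t)
        previous = last_counted.get(key)
        if previous is None or (t["diagnosis_date"] - previous).days >= min_gap_days:
            last_counted[key] = t["diagnosis_date"]
            count += 1
    return count


def percent_difference(count_a, count_b):
    """Percentage by which count_a exceeds count_b (count_b is the baseline)."""
    return 100.0 * (count_a - count_b) / count_b


# Hypothetical records for two patients
tumours = [
    {"patient_id": 1, "site": "colon", "histology_group": "adenocarcinoma",
     "diagnosis_date": date(2008, 3, 1)},
    {"patient_id": 1, "site": "colon", "histology_group": "adenocarcinoma",
     "diagnosis_date": date(2011, 6, 1)},
    {"patient_id": 1, "site": "colon", "histology_group": "neuroendocrine",
     "diagnosis_date": date(2012, 1, 10)},
    {"patient_id": 2, "site": "lung", "histology_group": "adenocarcinoma",
     "diagnosis_date": date(2009, 9, 15)},
]

iarc_count = count_iarc_iacr_like(tumours)
seer_count = count_seer_like(tumours, min_gap_days=365)
print(iarc_count, seer_count, round(percent_difference(seer_count, iarc_count), 1))
```

In this toy example the second colon adenocarcinoma in patient 1 is counted only under the SEER-style rule, which is the mechanism behind the higher rates described above.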

Table 5.1. The percentage difference in crude and age-standardized incidence rates (at selected body sites) within the SEER (9 registries) 2008–2012 dataset when determined using the SEER rules for multiple primary tumours versus the IARC/IACR rules (2004)
Difference in incidence rates using the SEER rules vs the IARC/IACR rules (%)
Body site                Males                 Females
                         Crude     ASR (W)     Crude     ASR (W)
Colon                    3.6       3.5         3.6       3.1
Lung                     2.0       2.0         2.4       2.5
Skin (melanoma)          8.7       8.0         5.5       4.9
Breast                   -         -           5.9       5.6
Testis                   1.8       1.8         -         -
Kidney                   3.9       3.9         1.8       1.7
All sites except skin    2.1       2.0         3.2       3.0

Registration of cancers in asymptomatic individuals
Incidental diagnosis is the incidental detection of cancer in an asymptomatic individual (e.g. upon microscopic examination of tissue that has been removed for a reason unrelated to cancer). The incidental diagnosis of cancer occurs with particular frequency as a result of screening examinations and at autopsy.

Screen-detected cancers
When a screening programme is introduced within a population, cancer incidence rates increase, because the programme identifies prevalent cancers that are detectable by the screening test but have not yet progressed to the stage where they begin to cause symptoms. After the initial rounds of screening, these prevalent cases have all been detected, so the incidence rate decreases, but usually not all the way to the pre-screening level, due to some degree of overdiagnosis. This phenomenon occurs when cases are detected that would otherwise not have been diagnosed during a person’s lifetime, either because the cancer was so slow-growing that the individual would have died of another cause before the cancer was detected, or because the cancer was non-progressive and would never actually have become invasive. These cancers are sometimes called pseudodisease. Overdiagnosis can occur as a consequence of breast cancer screening, and it is even more common in prostate cancer screening. In both cases, screening (by mammography for breast cancer or by prostate-specific antigen [PSA] testing for prostate cancer) identifies small, slow-growing, latent tumours. Although many of these tumours would never have progressed to clinically significant cancer during the patient’s lifetime, it is currently impossible to predict which of them will. Incidence rates are therefore elevated in screened populations. Mammography screening programmes typically target women within the age range of 50–74 years. Men may undergo PSA testing at any age, but it is more common among men aged more than 50 years. Almost all cancer registries include malignant tumours that are detected during screening programmes or diagnosed on the basis of histological specimens taken from asymptomatic individuals in whom there was no clinical suspicion of cancer. 
The inclusion of these cases is likely to increase incidence rates, since at least some of the malignant cells identified in these ways would never have resulted in a clinical cancer diagnosis had they otherwise remained undetected.

Autopsy-detected cancers
Most cancer registries include cases identified during necropsy examinations of subjects in whom cancer had not been diagnosed (or perhaps even suspected) during life. The extent of the resulting inflation of incidence rates depends on the prevalence of necropsy examinations within the population. The impact is greatest in countries and regions with legislation that permits autopsies to be conducted for medical, scientific, or educational purposes without consent. However, such practices have generally been declining in most countries over recent decades.

Application of international standards in low- and middle-income settings
Cancer registries operating in low- and middle-income settings may face particular challenges to following international registration standards (Bray et al., 2014). For example, a lack of coverage by pathology laboratories or difficulty accessing diagnosis records reduces the percentage of microscopically verified cases, and results in postponement of the incidence date as determined according to the ENCR recommendations, which define the incidence date as the date of first histological or cytological confirmation of malignancy.

Completeness
Completeness – the extent to which all of the incident cancers occurring in the population are included in the registry database – is a very important aspect of data quality. The incidence rates calculated from registry data most closely approximate their true values within the population when maximum completeness is achieved (through the use of comprehensive case-finding procedures). The methods used in the editorial process for this volume to evaluate overall completeness are semiquantitative, in that they provide an indication of the degree of completeness of a given registry’s database relative to those of other registries or over time. The indices of completeness evaluated during the editorial process can be grouped into four categories:

  1. historical data:
    o the stability of incidence rates (the number of new cases) over time
    o a comparison of incidence rates in different populations
    o age-specific incidence curves
    o childhood cancer incidence rates;
  2. the proportion of cases microscopically verified (MV%);
  3. the mortality-to-incidence (M:I) ratio; and
  4. death certificate methods.
Duplicate registrations of the same case should be avoided through careful attention to record linkage during the registration process. As the datasets submitted for this volume contained individual anonymous patient identification numbers, it was possible for the editors to check for duplicates (and multiple primaries) according to the IARC/IACR rules (IARC, 2004). However, it was not possible to check for duplicates within a dataset using other data items, which could only be assessed by the individual registries before submission.

Regional comparisons
For several of the quantitative indices described below, a comparison with standard values was performed. In most cases, the standard used for comparison was the values from cancer registries in the same region (or by country when the number of high-quality registries was sufficient), using the data published in the previous two volumes of CI5. Diagnostic practices (especially with respect to histology and cytology) and the accuracy of recording the underlying cause of death on death certificates vary between populations and regions, but it is reasonable to assume that the incidence rates for specific cancers will tend to be relatively similar in datasets from the same region. In total, 26 regions (or countries) were defined for the purpose of calculating the standard values used to support editorial decisions (see Table 5.2). For each regional group of registries, the data from CI5 Volumes IX and X were used to calculate the mean and variance of the site-specific age-standardized incidence rates, the MV%, and the M:I ratios. Tables listing some of the standard values of MV%, M:I ratio, and age-standardized rate per 100 000 (ASR) that were used in the CI5 editorial process (i.e. those calculated for low- or middle-income countries and regions) are provided in Annex 2 of IARC Technical Publication No. 43 (Bray et al., 2014).
Ad hoc tables were used to identify unusually high or low incidence rates in specific regions – both for all sites combined and for certain major cancers. This helped the editors to assess completeness, by allowing them to identify outliers or unusual patterns.
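
As a minimal sketch of how such regional standard values could be derived, the following computes the mean and variance of a quality index (for example, a site-specific ASR, MV%, or M:I ratio) across the registries of one region. The input values are hypothetical, and the use of an unweighted mean and the sample variance is an assumption; the text does not specify whether registries were weighted.

```python
from statistics import mean, variance


def regional_standard(values):
    """Return the mean and sample variance of an index across the registries
    of one region (unweighted; an assumption made for illustration)."""
    if len(values) < 2:
        raise ValueError("at least two registries are needed for a variance")
    return {"mean": mean(values), "variance": variance(values)}


# Hypothetical ASRs (per 100 000) for one cancer site in three registries
standard = regional_standard([10.0, 12.0, 14.0])
print(standard)
```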

Table 5.2. The regions defined for calculating standard values of age-standardized incidence rates, proportions of cases microscopically verified (MV%), and mortality-to-incidence (M:I) ratios
RegionsNumber of registries in the region
Africa
Sub-Saharan Africa11
North Africa and West Asia13
Central and South America
Central America and the Caribbean6
Brazil5
South America (excluding Brazil)14
North America
Canada12
USA13
Asia
China14
India12
Japan8
Republic of Korea8
Thailand6
Turkey4
Other countries in East Asia4
Europe
Eastern Europe9
United Kingdom and Ireland12
Northern Europe8
Italy33
Spain13
South-eastern Europe7
France11
Germany9
Switzerland9
Other countries in western Europe4
Oceania
Australia and New Zealand9
Oceania (excluding Australia and New Zealand)4

Historical data

Stability of the incidence rates (the number of new cases) over time
Changes in the completeness of registration may lead to the appearance of unexpected or implausible incidence trends within a registry’s dataset. Therefore, one of the key CI5 editorial tables (Editorial table 1) lists the number of new cases registered by major diagnosis groups per calendar year (and the corresponding percentage of the total number of cases), broken down by sex, with an accompanying bar chart that provides a visual check of the amount of variation in the total numbers of cases per year (at all sites and in both sexes) over the time period covered. At the bottom of each bar, a percentage value indicates how many cases were registered that year relative to the highest number of cases registered in any single year of the covered period. In some cases, this visual check may suggest potential problems with the registration process (or the source population data) during the registration period.
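
The percentage shown at the bottom of each bar can be sketched as follows. The yearly counts are hypothetical, and rounding to whole percentages is an assumption about presentation.

```python
def percent_of_peak_year(cases_by_year):
    """Express each year's case count as a percentage of the highest
    yearly count in the period covered."""
    peak = max(cases_by_year.values())
    return {year: round(100 * n / peak) for year, n in cases_by_year.items()}


# Hypothetical total case counts (all sites, both sexes) for 2008-2012
relative = percent_of_peak_year(
    {2008: 900, 2009: 950, 2010: 1000, 2011: 980, 2012: 700}
)
print(relative)
```

A year that falls well below the others (2012 in this invented example) is the kind of pattern that would prompt a closer look at the registration process or the source data for that year.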
A further editorial table (Editorial table 2) presents, for males and females, average annual incidence rates (per 100 000 person-years) by site and age group, as well as summary rates. This table also includes a column (with the heading CHV10) that lists the estimated annual percentage change in the incidence rates since Volume X of CI5; changes that are statistically significant are shown in bold. This rate comparison of Volumes X and XI, and other comparisons that were performed as part of the editorial process, are described in the Statistical tests section later in this chapter. Changes in incidence rates over time that are larger than expected (and that cannot be attributed to discrepancies in the estimation of person-years at risk) suggest the possibility of changes in the completeness of case ascertainment.

Comparison of incidence rates in different populations
The possibility of incomplete registration is also investigated by comparing observed incidence rates with expected values calculated using data from registries in the same region, and an editorial table is generated for this purpose (see Editorial table 3). This table presents the age-standardized incidence rates (and their standard errors) for 23 sites (and the total for all sites) in males and females, along with the ratio of the observed value to the expected value (O/E). If the observed age-standardized rate is significantly different from the expected value for the corresponding country or region, the O/E is shown in bold and flagged with a greater-than symbol (>) if the value is higher than expected or a less-than symbol (<) if the value is lower than expected. This comparison and others that were performed as part of the editorial process are described in the Statistical tests section later in this chapter. In addition to consulting this editorial table, the editors also frequently compared ad hoc tabulations from registries covering geographically or ethnically similar populations. In some cases, deviation from regional standards may be the result of specific local variations in the prevalence and distribution of risk factors, or in the presence or intensity of screening for certain cancers, but systematic discrepancies (i.e. those seen for several different sites) suggest the possibility of underregistration (or overregistration – e.g. due to the inclusion of duplicate records).
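
The O/E ratio and its flag can be illustrated with the sketch below. The significance test actually used is the one described in the Statistical tests section; the simple z-test here, which treats the expected value as fixed and uses only the standard error of the observed rate, is a stand-in assumption, as are the function name and the 1.96 threshold.

```python
def oe_ratio_with_flag(observed_asr, se_observed, expected_asr, z_crit=1.96):
    """Return the observed-to-expected ratio and a flag: '>' if the observed
    rate is significantly higher than expected, '<' if significantly lower,
    '' otherwise. The z-test is an illustrative assumption."""
    ratio = observed_asr / expected_asr
    z = (observed_asr - expected_asr) / se_observed
    if z > z_crit:
        flag = ">"
    elif z < -z_crit:
        flag = "<"
    else:
        flag = ""
    return ratio, flag


# Hypothetical ASRs (per 100 000) compared with a regional expected value of 20.0
print(oe_ratio_with_flag(30.0, 2.0, 20.0))
print(oe_ratio_with_flag(19.0, 2.0, 20.0))
print(oe_ratio_with_flag(10.0, 2.0, 20.0))
```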

Age-specific incidence curves
As part of the editorial process, age-specific incidence (per 100 000 person-years) curves for 12 sites by sex (see Editorial figure 1) were generated and examined to detect any abnormal fluctuations in the anticipated patterns, such as an unexpected drop in the rate of increase in incidence in older age groups, which may be indicative of underascertainment within these groups (although there can also be other explanations). These curves can also reveal problems with the source files used to determine the size of the populations at risk in the various age groups (see also the Population section later in this chapter).

Childhood cancer incidence rates
The incidence rates of cancer (all types combined) within children (i.e. within the age groups 0–4 years, 5–9 years, and 10–14 years) tend to exhibit much less variability than do the incidence rates of cancer in adults, although there are some well-documented geographical and ethnic differences for certain childhood cancers. The possibility of underenumeration (and duplicate registration) in this age range within the Volume XI data was investigated by comparing incidence rates within the childhood age groups with the corresponding values from Volume X. The lowest and highest deciles of incidence rates of childhood cancer in the Volume X data are shown in Table 5.3.

Table 5.3. The lowest and highest deciles of incidence rates (per 100 000) of childhood cancer in Volume X
Age group (years)    Boys                   Girls
                     Lowest     Highest     Lowest     Highest
0–4                  < 12.6     > 26.4      < 12.1     > 23.7
5–9                  < 8.9      > 17.9      < 7.0      > 13.0
10–14                < 9.0      > 17.2      < 8.2      > 16.0
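
A check of this kind can be sketched as follows, using the decile thresholds from Table 5.3. The function name, the labels it returns, and the example rates are hypothetical; a rate outside the range is a prompt for further review, not proof of a registration problem.

```python
# Lowest and highest decile thresholds (per 100 000) taken from Table 5.3
CHILDHOOD_DECILES = {
    ("boys", "0-4"): (12.6, 26.4),
    ("boys", "5-9"): (8.9, 17.9),
    ("boys", "10-14"): (9.0, 17.2),
    ("girls", "0-4"): (12.1, 23.7),
    ("girls", "5-9"): (7.0, 13.0),
    ("girls", "10-14"): (8.2, 16.0),
}


def check_childhood_rate(sex, age_group, rate):
    """Compare a childhood incidence rate with the Volume X decile thresholds."""
    lowest, highest = CHILDHOOD_DECILES[(sex, age_group)]
    if rate < lowest:
        return "below lowest decile (possible underenumeration)"
    if rate > highest:
        return "above highest decile (possible duplicate registration)"
    return "within range"


# Hypothetical rates for one registry
print(check_childhood_rate("boys", "0-4", 10.1))
print(check_childhood_rate("girls", "5-9", 9.5))
print(check_childhood_rate("girls", "10-14", 18.3))
```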

Proportion of cases microscopically verified (MV%)

The MV% is the percentage of cases that were diagnosed on the basis of microscopic verification of a tissue specimen (sometimes also called morphological verification – the two terms are synonymous). The definition of microscopically verified cases includes histologically confirmed cases, cases diagnosed on the basis of exfoliative cytology specimens, and cases of leukaemia diagnosed on the basis of haematological examination (without examination of bone marrow). The main use of MV% as an indicator of data quality is as a measure of validity (see the Validity section later in this chapter), but a very high proportion of cases diagnosed by histology, cytology, or haematology – higher than might reasonably be expected – may also suggest that a registry is over-reliant on pathology laboratories as a source of information and is failing to find cases diagnosed by other means. One of the editorial tables discussed earlier in this chapter (see Editorial table 3) also includes a column showing observed MV% values for 23 sites (and the total of all sites) in males and females. In this MV% column, any observed values that are significantly greater than or less than the expected value (an average for the corresponding country or region) are shown in bold and flagged with a greater-than symbol (>) or a less-than symbol (<), respectively. This comparison is also described in the Statistical tests section later in this chapter.

Mortality-to-incidence (M:I) ratio

The M:I ratio is an important indicator of completeness, and its use for this purpose is an example of the independent case ascertainment method of evaluating registry completeness. The M:I ratio compares the number of deaths due to a specific type of cancer over a specific period of time (obtained from a source that is independent of the registry – usually the vital statistics system) with the number of new cases of that type of cancer registered during the same period. When the quality of the mortality data is good (especially in terms of the accuracy of cause of death) and incidence and survival are in steady state, the M:I ratio is approximated by 1 minus the 5-year survival probability. Because both survival and the quality of mortality statistics are somewhat related to the level of socioeconomic development, it is important to consider a registry’s geographical location when evaluating this statistic. As part of the CI5 editorial process, the observed M:I ratios for registries’ datasets were compared to standard values from the same region, testing for significant differences (see the Statistical tests section later in this chapter). Editorial table 3 includes a column showing observed M:I ratios for 23 sites (and the total of all sites) in males and females. Within this column, any observed values that are significantly greater than or less than expected (based on the average regional values from Volume X) are shown in bold and flagged with a greater-than symbol (>) or a less-than symbol (<), respectively.
M:I ratios that are higher than expected raise suspicion of incompleteness (i.e. incident cancers missed by the registry), especially if the values are high for several different sites. However, under- or overreporting of tumours on the death certificates distorts this relationship, as does a lack of constancy in incidence and case fatality (the rate of death among incident cases) over time. For example, if incidence increases while case fatality (or survival) remains relatively constant, the M:I ratio tends to be less than (1 minus survival); conversely, if incidence decreases relative to case fatality, the M:I ratio is greater than (1 minus survival), and may even exceed a value of 1 for more lethal cancers.
The use of this method requires mortality data that are of good quality, especially in terms of the accuracy of cause of death. This method cannot be used where there is no comprehensive death registration, or when cause of death is missing or inaccurate on death certificates, which is the situation in almost all countries in Africa and many countries in Asia.
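
The relationship between the M:I ratio and survival can be made concrete with a short sketch. The counts and the survival probability are hypothetical, and the comparison is only meaningful under the steady-state and data-quality conditions stated above.

```python
def mi_ratio(deaths, new_cases):
    """Mortality-to-incidence ratio for one cancer type and period."""
    if new_cases <= 0:
        raise ValueError("the number of new cases must be positive")
    return deaths / new_cases


def approx_mi_from_survival(five_year_survival):
    """Steady-state approximation: M:I is roughly 1 minus 5-year survival."""
    return 1.0 - five_year_survival


# Hypothetical registry: 450 deaths and 1000 registered cases of one cancer,
# with a 5-year survival probability of 0.75 assumed for the population
observed = mi_ratio(deaths=450, new_cases=1000)
expected = approx_mi_from_survival(0.75)
print(observed, expected, observed > expected)
```

Here the observed ratio exceeds the value implied by survival, which would raise the question of whether incident cases are being missed, provided the mortality data are reliable and incidence and case fatality are stable.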

Death certificate methods of evaluating completeness

Access to death certificates is important to cancer registries as a means of finding cases not captured by other registration procedures. The completeness of registration may be evaluated on the basis of the proportion of incident cancers that come to the registry’s attention via death certificates. Fig. 5.1 illustrates the process of registering new cases using death certificates that mention cancer.
A cancer registry’s record linkage procedures should enable the registry to accurately determine whether a death certificate case is already in the database (i.e. previously notified by and registered from another source). A death-certificate-notified (DCN) case is any case first notified by a death certificate. For some DCN cases, the registry later receives a separate notification (without needing to initiate trace-back procedures), and the case is consequently registered from this non-death-certificate source. Because it is possible (depending on registry procedures) for a death certificate mentioning cancer to be received before other relevant notifications (e.g. a pathology report), it has been suggested that registries should establish a suitable interval between receiving a first notification by death certificate and initiating a registration on the basis of this information. If no other notifications are received after this interval, trace-back procedures are initiated. Some cases that are successfully traced back may be found not to be cancers, and are therefore not registered. The remaining cases are classified as death-certificate-initiated (DCI) cases, of which there are two types: (1) successfully traced cases found to be cancers, which are consequently registered from the appropriate source, and (2) cases for which no information source other than a death certificate mentioning cancer can be found, which are consequently registered as death-certificate-only (DCO) cases.


Fig. 5.1. The process of using death certificates (DCs) to identify new cancer cases, which may subsequently be classified as death-certificate-notified (DCN), death-certificate-initiated (DCI), or death-certificate-only (DCO) cases (see the chapter text for more details). Adapted from European Journal of Cancer, 45(5), Bray F and Parkin DM, Evaluation of data quality in the cancer registry: principles and methods. Part I: comparability, validity and timeliness, pages 747–55, Copyright 2009, with permission from Elsevier.

DCO cases represent the residuum of the trace-back process; they are the remaining DCN cases for which no other information source could be obtained through any of the registry’s trace-back procedures. By itself, therefore, the proportion of DCO cases (DCO%) is not an indicator of completeness of registration; a low DCO% may indicate efficient case finding or may result from the efficient trace-back of DCN cases. However, the proportion of DCI cases (DCI%) is always greater than or equal to the DCO% (see Editorial table 3), so an elevated DCO% is suggestive of incompleteness.
Like other indicators, the DCO% must be interpreted in the context of local circumstances. In some developing countries, the quality of death certificates may be very poor, with many deaths erroneously attributed to cancer, and registries may have difficulty tracing these notifications back to a hospital capable of confirming (or contradicting) the death certificate statement. Because death certificate methods rely on the availability of relatively high-quality (complete and accurate) certification of cause of death within the registration area, they are not readily applicable in many low- and middle-income settings, or even in many high-income settings.
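
The classification of death certificate notifications described in this section, and the resulting DCI% and DCO%, can be sketched as follows. The outcome labels, function names, and example counts are hypothetical, and the sketch assumes that both percentages are expressed relative to all registered cases.

```python
def classify_dcn_case(other_notification_received, traceback_outcome=None):
    """Classify a death-certificate-notified (DCN) case.

    traceback_outcome is consulted only when no other notification arrived
    within the waiting interval. Hypothetical values: 'cancer_confirmed',
    'not_cancer', 'no_information'.
    """
    if other_notification_received:
        return "registered from other source"  # DCN, but not DCI
    if traceback_outcome == "not_cancer":
        return "not registered"
    if traceback_outcome == "cancer_confirmed":
        return "DCI (traced)"
    if traceback_outcome == "no_information":
        return "DCO"
    raise ValueError("a trace-back outcome is required without another notification")


def dci_and_dco_percent(labels, total_registered):
    """Return (DCI%, DCO%); DCO cases are a subset of DCI cases."""
    dco = labels.count("DCO")
    dci = dco + labels.count("DCI (traced)")
    return 100.0 * dci / total_registered, 100.0 * dco / total_registered


labels = [
    classify_dcn_case(True),
    classify_dcn_case(False, "cancer_confirmed"),
    classify_dcn_case(False, "no_information"),
    classify_dcn_case(False, "not_cancer"),
]
# Hypothetical registry with 200 registered cases in total
dci_pct, dco_pct = dci_and_dco_percent(labels, total_registered=200)
print(labels, dci_pct, dco_pct)
```

Because every DCO case is also counted as a DCI case, the DCI% computed this way can never be lower than the DCO%.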

Validity

Validity (or accuracy) is defined as the proportion of cases in a dataset recorded as having a given characteristic (e.g. site or age) that truly do have that attribute. Several linked methods are used in this volume to provide numerical indices of validity, which enable (at least on an interval scale) comparisons between registries, within a single registry over time, and within a single registry with respect to specified subsets of cases (cases at certain sites, from different sources, etc.). These methods include internal consistency methods, diagnostic criteria methods (histological verification and DCO cases), and missing information analyses (e.g. primary site unspecified, age unknown).

Internal consistency
The use of the IARCcrgTools software to perform consistency checks on the submitted datasets is described in Chapter 6 of this volume. Registries were asked to verify and correct their data using this or other software tools before submission, and to ensure that the ICD-O-3 coding system was used for all relevant variables.

Microscopic verification
Typically, the accuracy of a stated diagnosis is likely to be better if the diagnosis is based on histological examination by a pathologist. However, surveys have shown that many cancer registries code diagnoses based on exfoliative cytology or haematological examination of peripheral blood in the same category as those based on histological examination (i.e. the cases are all coded as microscopically verified), making it impossible to distinguish between them in the data. Partly for this reason, the index of validity used in the editorial tables (see Editorial table 3) and in the tables showing indices of data quality in this volume is the MV% rather than the proportion of cases histologically verified (HV%).
As noted in the Completeness section above, any observed MV% values in Editorial table 3 that are significantly greater than or less than expected (compared to the regional standard) are shown in bold and flagged with a greater-than symbol (>) or a less-than symbol (<), respectively.
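
As a rough illustration of how such flags could be derived, the sketch below compares an observed MV% with an expected (regional) value using a normal approximation to the binomial distribution. This is an assumption made for illustration only: the tests actually applied by the editors are those referred to in the Statistical tests section, and the function name `flag_proportion`, its parameters, and the 1.96 threshold are hypothetical.

```python
import math


def flag_proportion(observed_count, total, expected_pct, z_crit=1.96):
    """Return '>' or '<' if the observed percentage differs from the
    expected percentage by more than z_crit standard errors, else ''.

    Assumption: a simple one-sample z-test for a proportion, which is
    NOT necessarily the test used in the editorial process.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < expected_pct < 100:
        raise ValueError("expected_pct must lie strictly between 0 and 100")

    p_hat = observed_count / total      # observed proportion, e.g. MV cases / all cases
    p0 = expected_pct / 100.0           # expected proportion (regional standard)
    se = math.sqrt(p0 * (1.0 - p0) / total)  # standard error under the expected value
    z = (p_hat - p0) / se

    if z > z_crit:
        return ">"
    if z < -z_crit:
        return "<"
    return ""


# Hypothetical counts: 1000 registered cases, expected MV% of 90.
flag_high = flag_proportion(950, 1000, 90.0)   # observed MV% = 95.0
flag_low = flag_proportion(850, 1000, 90.0)    # observed MV% = 85.0
flag_none = flag_proportion(905, 1000, 90.0)   # observed MV% = 90.5
```

With small numbers of cases the normal approximation is unreliable, so an exact binomial test would be a more cautious choice in practice.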

Death-certificate-only (DCO) cases
The proportion of cancers for which no information other than a death certificate mentioning cancer can be found – the DCO% – is another measure of validity, since the information on death certificates is generally less accurate and/or precise than information obtained from clinical or pathology records. A considerable effort has been made in the editorial process to ensure that cases reported as DCO truly are DCO cases. As stated earlier in this chapter, DCO cases represent the residuum of the trace-back process; they are the remaining DCN cases for which no information other than a death certificate mentioning cancer could be obtained through any of the registry’s trace-back procedures (see Fig. 5.1). Establishing acceptable and objective criteria for the DCO% has been a contentious issue in international comparative studies. As stated earlier, a low DCO% may simply reflect efficient trace-back of cases initially missed by the normal case-finding procedures. The DCO% is also influenced by local circumstances (including the availability and accuracy of death certificates) and the registry’s ability to successfully link records.
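
The MV% and DCO% can both be derived from the basis-of-diagnosis code recorded for each case. The following is a minimal sketch, assuming the commonly used IARC/IACR basis-of-diagnosis coding (0 = death certificate only; 5 = cytology; 6 = histology of a metastasis; 7 = histology of a primary tumour); the function name and the sample data are hypothetical, and a registry using a different coding scheme would need to adapt the code sets.

```python
# Assumed basis-of-diagnosis codes (IARC/IACR convention); adjust if the
# registry uses a different scheme.
DCO_CODES = {0}          # death certificate only
MV_CODES = {5, 6, 7}     # cytology, histology of metastasis, histology of primary


def validity_indices(basis_codes):
    """Return (MV%, DCO%) for an iterable of basis-of-diagnosis codes.

    The denominator is all registered cases, including DCO cases.
    """
    codes = list(basis_codes)
    if not codes:
        raise ValueError("no cases supplied")
    n = len(codes)
    n_mv = sum(1 for c in codes if c in MV_CODES)
    n_dco = sum(1 for c in codes if c in DCO_CODES)
    return 100.0 * n_mv / n, 100.0 * n_dco / n


# Hypothetical dataset of ten registered cases.
sample_codes = [7, 7, 5, 6, 0, 1, 2, 9, 7, 0]
mv_pct, dco_pct = validity_indices(sample_codes)
```

In this example five of the ten cases are microscopically verified and two are DCO cases, so the function returns an MV% of 50.0 and a DCO% of 20.0.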

Other and unspecified/age unknown
The proportion of registered cases with unknown values for various data items can be an indicator of data quality. Unknown values can result from problems with:
• the data collection system (or access to necessary source documents)
• the item and code values that are defined
• the application of coding rules.
The definitions used influence the proportion of unknown codes; this applies, for example, when evaluating cases with the primary site coded as “Other and unspecified” (O&U), a classification that is defined in detail in Chapter 3. Other variables for which the proportion of cases with missing values is commonly evaluated include age, ethnicity, and disease stage. A high proportion of cases with missing values generally implies poor diagnostic precision (as evidenced by the low MV% observed among O&U cases) or a failure to specify the site of the primary cancer in cases diagnosed on the basis of tissue obtained from a metastasis. The proportion of unknown values usually varies by primary site and tends to be higher among elderly patients. The proportions also vary somewhat between registries. The percentages of O&U cases and cases with unknown age, by registry, are shown in Table 5.4 at the end of this chapter.
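
The percentages described above are simple proportions of all registered cases. The sketch below shows one way they might be computed; the record layout, the field names `site_group` and `age`, and the use of the label "O&U" and of `None` for unknown age are assumptions made for illustration, not the format of the submitted datasets.

```python
def percent_unknown(records, field, unknown_values):
    """Percentage of records whose `field` takes one of `unknown_values`.

    `records` is a list of dicts; a missing key is treated as None.
    """
    if not records:
        raise ValueError("no records supplied")
    n_unknown = sum(1 for r in records if r.get(field) in unknown_values)
    return 100.0 * n_unknown / len(records)


# Hypothetical records for one registry.
cases = [
    {"site_group": "Lung", "age": 67},
    {"site_group": "O&U", "age": 81},
    {"site_group": "Breast", "age": 54},
    {"site_group": "Colon", "age": None},
    {"site_group": "O&U", "age": 90},
    {"site_group": "Stomach", "age": 72},
    {"site_group": "Prostate", "age": 69},
    {"site_group": "Liver", "age": 58},
]

ou_pct = percent_unknown(cases, "site_group", {"O&U"})
age_unknown_pct = percent_unknown(cases, "age", {None})
```

Here two of the eight cases are coded as O&U and one has unknown age, giving 25.0% and 12.5%, respectively.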

Population

It is important to remember that a 10% error in the estimation of the population at risk produces the same level of inaccuracy in the calculated incidence rate as a 10% error in enumeration of cases. However, cancer registries are generally not responsible for population estimates, and must rely on official censuses, or intercensal/postcensal estimates provided by vital statistics departments or their equivalents. The editors asked all contributing registries to provide official population estimates and state the source of the population data. This information has been summarized for each entry, along with the average annual population at risk during the period covered by the registrations.
Although the population data provided by the registries could rarely be directly verified by the editors, the shapes of the population pyramids, as well as any irregularities in the age-specific incidence curves, were used to identify potential errors in the population estimates (Editorial table 2), and if necessary, the appropriateness of the source of the information provided was queried. Potential problems with estimating the population at risk are stated in the Notes on the data section of each entry. In some cases, a high likelihood of inaccuracy in estimates of the population at risk contributed to the editorial decision to mark a registry’s contribution with an asterisk.
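
The effect of errors in the numerator and the denominator on a calculated rate, noted at the start of this section, can be checked with a small numerical sketch. All figures below are hypothetical, and a crude rate per 100 000 person-years is used for simplicity. The example also shows that the equivalence is approximate: a 10% shortfall in cases lowers the rate by exactly 10%, whereas a 10% overestimate of the population lowers it by about 9.1% and a 10% underestimate raises it by about 11.1%.

```python
def crude_rate(cases, person_years, per=100_000):
    """Crude incidence rate per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return per * cases / person_years


def relative_error(estimated, true):
    """Relative error of an estimate, as a fraction of the true value."""
    return (estimated - true) / true


# Hypothetical figures, for illustration only.
true_cases = 5_000
true_person_years = 2_000_000
true_rate = crude_rate(true_cases, true_person_years)

# 10% of cases missed, population correct.
err_cases_under = relative_error(
    crude_rate(0.9 * true_cases, true_person_years), true_rate
)
# Cases correct, population overestimated by 10%.
err_pop_over = relative_error(
    crude_rate(true_cases, 1.1 * true_person_years), true_rate
)
# Cases correct, population underestimated by 10%.
err_pop_under = relative_error(
    crude_rate(true_cases, 0.9 * true_person_years), true_rate
)
```

The same reasoning applies to age-specific and age-standardized rates, in which each age-specific denominator enters the calculation in the same way.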

Notes on the datasets: the asterisks (*)

The presence of an asterisk indicates that additional care is required in interpreting the numerical results for some or all cancer sites; readers should refer to the Notes on the data section of the corresponding registry description for the specific reasons. The principal use of the asterisks is to denote datasets that are considered by the editors to have characteristics suggesting questionable quality or completeness of information on cases or the population at risk, as well as datasets for which the editors could not evaluate the relevant indices due to deficiencies in the registration process. The criteria used in this judgement were not rigidly defined; the decision was based on an examination of all the indices described in this chapter and knowledge of the circumstances within which the registry operates. The intrinsic value of a given dataset in providing information on little-known geographical and ethnic patterns, as well as continuity with earlier data from the same registry, were also taken into consideration. For the purpose of comparability between registries, all datasets for which no official mortality data could be provided or that included no DCO registrations (due to lack of access to death certificates) were also flagged with an asterisk.

Statistical tests

Four comparisons (for which statistical tests were applied) were made as part of the editorial process for CI5 Volume XI:

  1. a comparison of each age-standardized incidence rate with the corresponding value from Volume X;
  2. a comparison of each registry’s age-standardized incidence rates for major sites with the corresponding Volume X values for registries in the same country or region;
  3. a comparison of each registry’s MV% values for major sites with the corresponding Volume X values for registries in the same country or region; and
  4. a comparison of each registry’s M:I ratios for major sites with the corresponding Volume X values for registries in the same country or region.
The results of these tests were not published but have been used to flag certain registry datasets as unusual or possibly inconsistent with previously published data, and therefore requiring further investigation. This battery of tests was first implemented in Volume VIII, and the methodological details and formulae are provided in Chapter 5 of that volume (Parkin and Plummer, 2002).

References

  1. Bray F, Parkin DM (2009). Evaluation of data quality in the cancer registry: principles and methods. Part I: comparability, validity and timeliness. Eur J Cancer. 45(5):747–55. http://dx.doi.org/10.1016/j.ejca.2008.11.032 PMID:19117750
  2. Bray F, Znaor A, Cueva P, Korir A, Swaminathan R, Ullrich A, et al. (2014). Planning and Developing Population-based Cancer Registration in Low- and Middle-income Settings. IARC Technical Publication No. 43. Lyon: International Agency for Research on Cancer.
  3. Curado MP, Edwards B, Shin HR, Storm H, Ferlay J, Heanue M, et al., editors (2007). Cancer Incidence in Five Continents, Vol. IX. IARC Scientific Publication No. 160. Lyon: International Agency for Research on Cancer.
  4. Ferlay J, Burkhard C, Whelan S, Parkin DM (2005). Check and Conversion Programs for Cancer Registries (IARC/IACR Tools for Cancer Registries). IARC Technical Report No. 42. Lyon: International Agency for Research on Cancer.
  5. Forman D, Bray F, Brewster DH, Gombe Mbalawa C, Kohler B, Piñeros M, Steliarova-Foucher E, Swaminathan R, Ferlay J, editors (2014). Cancer Incidence in Five Continents, Vol. X. IARC Scientific Publication No. 164. Lyon: International Agency for Research on Cancer.
  6. Fritz A, Percy CL, Jack A, Shanmugaratnam K, Sobin L, Parkin DM, et al., editors (2000). International Classification of Diseases for Oncology. 3rd ed. (ICD-O-3). Geneva: World Health Organization.
  7. IARC (2004). International Rules for Multiple Primary Cancers ICD-O Third Edition. Internal Report No. 2004/02. Lyon: International Agency for Research on Cancer.
  8. Jensen OM, Parkin DM, MacLennan R, Muir CS, Skeet RG, editors (1991). Cancer Registration: Principles and Methods. IARC Scientific Publication No. 95. Lyon: IARCPress.
  9. Johnson CH, Peace S, Adamo P, Fritz A, Percy-Laurry A, Edwards BK (2007). The 2007 Multiple Primary and Histology Coding Rules. Bethesda (MD): National Cancer Institute, Surveillance, Epidemiology and End Results Program. Available from: http://seer.cancer.gov/tools/mphrules/, accessed 20 November 2014.
  10. Parkin DM, Bray F (2009). Evaluation of data quality in the cancer registry: principles and methods. Part II: completeness. Eur J Cancer. 45(5):756–64. http://dx.doi.org/10.1016/j.ejca.2008.11.033 PMID:19128954
  11. Parkin DM, Plummer M (2002). Comparability and quality of data. In: Parkin DM, Whelan SL, Ferlay J, Teppo L, Thomas DB, editors. Cancer Incidence in Five Continents, Vol. VIII. IARC Scientific Publication No. 155. Lyon: International Agency for Research on Cancer; pp. 57–73.
  12. Parkin DM, Whelan SL, Ferlay J, Teppo L, Thomas DB, editors (2002). Cancer Incidence in Five Continents, Vol. VIII. IARC Scientific Publication No. 155. Lyon: International Agency for Research on Cancer.
  13. Pheby D, Martínez C, Roumagnac M, Schouten LJ (1997). Recommendations for Coding Incidence Date. Ispra, Italy: European Network of Cancer Registries. Available from: http://www.encr.eu/index.php/activities/recommendations, accessed 20 November 2014.