INNOVATION IN STATISTICAL METHODS

When IARC was established, in 1965, there was an obvious need for competence in the design and statistical analysis of laboratory experiments and of epidemiological and clinical studies of cancer, and of diseases generally. This need was hard to meet because of a shortage of qualified people, exacerbated by the demands of developing methods well suited to studies of noncommunicable diseases and of mastering the potential of electronic computers, which had only recently been introduced and were still unfamiliar.

To meet this need, work on several main topics in statistical methodology was soon initiated within IARC’s areas of research. The conditions at IARC, as at other institutions, reflected the status of information technology in the late 1960s and early 1970s. Jacques Estève, who was the head of information technology at IARC at that time, recalls: “I had to introduce the first data management system. In the first years of IARC, things had been pretty disordered. The epidemiologists were unhappy as they had great difficulty in retrieving their data; once they were entered into the computer, how to access them was a kind of practical mystery. The computer installation to support the new data management system occupied a large room, and provided much less computing power than today’s smallest laptop. Yet over a few years the data management performance was transformed for the better.” This was only the first of a series of transformations that kept IARC’s computing system on a par with the constantly evolving technology.

Students at work with mechanical calculating machines, the tools usually available in the late 1960s for statistical analyses of epidemiological data sets. Beyond the four arithmetic operations, these machines could calculate the sum of a sequence and the sum of the products of two sequences of numbers.


In 1972 WHO had installed an IBM 360, which at the time was a very powerful computer. We wrote programs in FORTRAN and prepared a set of Hollerith cards to instruct the computer how to analyse the data, followed by 600 or 1000 data cards. I would rise at 5:00 am, walk up to the Gare des Brotteaux in Lyon, get on the overnight train going from Barcelona to Copenhagen, with a stop in Geneva, where I would have breakfast in Gare Cornavin, take a bus up to the WHO headquarters, work all day feeding my cards into the machine, and then in the evening I would reboard the train to Lyon. – Norman Breslow, former IARC scientist

A UNIFIED FRAMEWORK FOR EPIDEMIOLOGICAL STUDIES OF CANCER ETIOLOGY

Epidemiological studies aimed at investigating causes of cancer were – and are – at the core of IARC’s research. The development at IARC of statistical methodology for etiological studies produced notable results, some of which have proven to be of lasting value as they still serve as key references. This applies especially to Statistical Methods in Cancer Research, by Norman Breslow and Nick Day, published in two volumes: The Analysis of Case–Control Studies in 1980 and The Design and Analysis of Cohort Studies in 1987. The book is still available on the IARC website and is, quite reasonably, characterized as a classic text in the field (see “Frontline statistical research: Norman Breslow and Nick Day”).


Several factors combined to make the book a success. First, it was timely. The title refers to cancer research in general, but in fact the book deals essentially with statistical methods for cancer epidemiology (although some of the methods, such as survival analysis, can also be applied to animal experiments). In cancer epidemiology, methodological innovations had been flourishing since the 1950s, aimed at solving specific problems of data analysis. However, the connections between the different new methods were not obvious, and their relative merits and limits of applicability were not well defined. In Breslow and Day’s book, these methods, which had been scattered among articles in statistical and epidemiological journals, were critically reviewed and related to one another in a logically coherent framework. Second, in doing so, the authors frequently used original results from their own methodological research. Third, the presentation was at a level that respected theoretical and formal rigour while being mostly accessible to readers with a limited mathematical background. Fourth, and most importantly, the statistical analyses were illustrated step-by-step by applying them to real data from epidemiological studies. This was uncommon in the methodology books then available, and even in those published later. A strong connection was maintained between the methods and the substance of epidemiological investigations, such as studies of the relationship between alcohol consumption and oesophageal cancer, hormones and endometrial cancer, or ionizing radiation and lung cancer.

Mortality rates for oesophageal cancer in the Brittany region (deaths per 100 000 population per year during 1958–1966), by canton. Rates in Brittany were markedly higher than the average rates in France. Within the region, major variations occurred among cantons. A relationship with different levels of alcohol consumption was suspected, and epidemiological studies were initiated to test this hypothesis.


A 2014 survey of books on epidemiological and statistical methods in the biomedical literature showed that Breslow and Day’s book still receives 100–200 citations per year in research contexts, as a reference for now well-established methods or for teaching purposes (see “Case–control studies”). The Breslow–Day test, first introduced in the book, is often found in research papers as a statistical test of whether risk (e.g. of lung cancer in smokers compared with non-smokers) is the same in different subgroups (e.g. men and women).
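For readers who want to see the test in action, here is a minimal sketch using the StratifiedTable class of Python’s statsmodels package, whose test_equal_odds() method performs a Breslow–Day-type test of odds-ratio homogeneity; the 2×2 counts are entirely hypothetical and serve only to show the mechanics.

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# Hypothetical 2x2 tables (rows: exposed/unexposed, columns: case/control),
# one table per subgroup (e.g. men and women); illustrative counts only.
men   = np.array([[120,  80],
                  [ 60, 140]])
women = np.array([[ 90, 110],
                  [ 45, 155]])

strata = StratifiedTable([men, women])

# Test whether the odds ratio is the same in all strata,
# i.e. the homogeneity question the Breslow-Day test addresses.
result = strata.test_equal_odds()
print("statistic:", result.statistic, "p-value:", result.pvalue)
```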

In Breslow and Day’s book, cancers were treated as occurring in a cohort of people specially assembled to study the causes of cancer. One can also consider cancers, or deaths from cancer, occurring in populations within defined geographical areas, or cancer-related deaths or recurrences occurring in groups of patients. Statistical methods for these two situations were presented in 1994 in an IARC book co-authored by Jacques Estève, Statistical Methods in Cancer Research: Descriptive Epidemiology. It details methods for analysing the data typically gathered by cancer registries, including how cancer occurrence evolves over time, how its frequency varies geographically, and how those geographical variations correlate with variations in factors such as income, air pollution, alcohol consumption, and diet. The book also covers a key topic for evaluating the effectiveness of cancer treatments: the analysis of cancer patients’ survival.
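As a small illustration of the kind of calculation that such registry-based methods build on, the sketch below computes a directly age-standardized incidence rate in Python; the age-specific counts, person-years, and standard-population weights are all invented for the example and are not taken from any registry or from the book.

```python
import numpy as np

# Hypothetical age-specific data for one registry area (illustrative only).
cases        = np.array([4, 12, 45, 130, 210])            # incident cases per age group
person_years = np.array([5e5, 4e5, 3e5, 2e5, 1e5])        # population time at risk
std_weights  = np.array([0.30, 0.28, 0.22, 0.13, 0.07])   # illustrative standard-population weights (sum to 1)

age_specific_rates = cases / person_years                 # rates per person-year
asr = np.sum(age_specific_rates * std_weights) * 1e5      # directly standardized rate per 100 000

print(f"Age-standardized rate: {asr:.1f} per 100 000 person-years")
```

Weighting each age-specific rate by a fixed standard population removes differences in age structure, which is what makes rates from different areas or time periods comparable.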

Survival analysis is also at the core of animal experiments to test whether a substance is carcinogenic. Supplement 2 of the IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, published in 1980, has a 100-page annex on statistical methods, produced collaboratively by IARC and external statisticians. This annex highlights with great clarity how findings from carcinogenicity experiments in animals should be correctly analysed (see “Analysing animal carcinogenicity experiments”). A substantial expansion, with a focus on formal statistical models of analysis, followed in 1987 with the IARC book Statistical Methods in Cancer Research: The Design and Analysis of Long-Term Animal Experiments, of which Jürgen Wahrendorf was the senior co-author. The issues developed in these publications are a further example of the value of IARC’s contributions to the analysis of cancer data and are relevant not only for cancer research but also for long-term toxicological experiments in general.
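A central step in such analyses is comparing time-to-tumour (or time-to-death) curves between treated and control animals. As a hedged sketch of one common approach, the example below runs a log-rank comparison with the Python lifelines package; the durations and event indicators are invented, and this illustrates the general technique rather than the specific analyses recommended in the IARC publications.

```python
from lifelines.statistics import logrank_test

# Hypothetical weeks on study for treated and control animals (illustrative only).
treated_weeks = [34, 41, 45, 52, 60, 78, 80, 88, 104, 104]
treated_event = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]    # 1 = tumour observed, 0 = censored (e.g. survived to end of study)
control_weeks = [55, 62, 70, 81, 90, 95, 104, 104, 104, 104]
control_event = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Log-rank test: do the two groups share the same time-to-event distribution?
result = logrank_test(treated_weeks, control_weeks,
                      event_observed_A=treated_event,
                      event_observed_B=control_event)
print("chi-square:", result.test_statistic, "p-value:", result.p_value)
```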

A NOVEL EPIDEMIOLOGICAL STUDY DESIGN

Reliable data analysis does not depend on statistical methods alone; the methods themselves depend heavily on how the data have been collected, and hence the design of a study matters as much as the analytical approach. The Gambia Hepatitis Intervention Study ranks among IARC’s key projects. Its substantive relevance for establishing the etiology of liver cancer and for testing the preventive effectiveness of the vaccine against hepatitis B is outlined in the chapter “Viruses and vaccines”. Equally important from a methodological viewpoint was its novel study design.

The Gambia Hepatitis Intervention Study originated in the mid-1980s under particular circumstances. A vaccine was available that was known to be effective against hepatitis B. The research question was whether preventing hepatitis B infection (i.e. preventing newborns from becoming carriers of the hepatitis B virus) would prevent the later occurrence of primary liver cancer. Initially, it seemed that the only ethically admissible way to answer this question would be to start administering the vaccine to all newborns in a given year and then compare (several decades later) the liver cancer occurrence in vaccinated people with that in the unvaccinated people who were born before the vaccination programme started. This is known as a “pre–post” comparison. Such an approach is fraught with potential biases because many other factors, which vary over time and have nothing to do with the vaccine, could induce a change in cancer occurrence and detection.

This pre–post design was considered ethically uncontroversial, as confirmed by the IARC Ethics Committee, which had recently been established (see “The IARC Ethics Committee”). But the design was scientifically weak, a serious handicap given the considerable investment of resources that the project would demand over a projected 40-year period. However, a major practical constraint on vaccine delivery soon emerged, and it was turned into the basis of a scientifically strong study design. It would have been logistically impossible to start administering the vaccine to all newborns in The Gambia in a given year – more than 60 000 newborns, scattered across rural areas. The only feasible procedure was to introduce the vaccine gradually over several years.

The crucial methodological innovation was to choose at random – rather than by convenience or in a systematic way – the newborns to be vaccinated each year. In practice, clusters of newborns (defined by the areas served by local vaccination teams), rather than individual newborns, were chosen at random. During the first year of the programme (1986), about 25% of all newborns were vaccinated, those from areas covered by four vaccination teams chosen at random from a total of 17 countrywide (compared with the 75% who remained unvaccinated). During the second year, 50% were vaccinated; during the third year, 75%; and finally, during the fourth year, all newborns were vaccinated. This design made possible an unbiased comparison between the randomly chosen vaccinated and unvaccinated subjects within each of the first three years of the programme. The random choice of newborns to be vaccinated was ethically unobjectionable since it was non-discriminatory and impartial.

This design, first implemented in the Gambia Hepatitis Intervention Study, is both scientifically and ethically sound and has entered standard methodology as the “stepped-wedge” trial design. The principle is that an intervention is assigned sequentially to the trial participants, either as individuals or as clusters of individuals, over several time periods. Which individuals or clusters receive the intervention in each time slot is determined at random, and by the end of the random allocation all individuals or groups will receive the intervention. This type of design has been used, and continues to be used, in a variety of studies within and outside the field of cancer research, particularly in the evaluation of the effects of vaccinations, screening, and health education programmes.
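A minimal sketch of how such a random rollout schedule might be generated is shown below, loosely mirroring the Gambian numbers (17 clusters entering the programme in four annual steps); the cluster labels, the random seed, and the exact split of 4/4/4/5 clusters per step are assumptions made for illustration, not the study’s actual allocation.

```python
import random

# Illustrative stepped-wedge allocation: 17 clusters (e.g. areas served by
# vaccination teams) begin the intervention in four annual steps.
clusters = [f"team_{i:02d}" for i in range(1, 18)]
step_sizes = [4, 4, 4, 5]          # clusters starting the intervention at each step (assumed split)

random.seed(1986)                  # reproducible example
random.shuffle(clusters)           # the random allocation

schedule, offset = {}, 0
for year, size in enumerate(step_sizes, start=1):
    for cluster in clusters[offset:offset + size]:
        schedule[cluster] = year   # year in which this cluster starts the intervention
    offset += size

for cluster in sorted(schedule):
    print(cluster, "starts vaccinating in year", schedule[cluster])
```

Until a cluster’s start year arrives, its newborns serve as concurrent, randomly chosen controls; by the final step every cluster receives the intervention, which is what makes the design ethically acceptable.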

What impressed me enormously was the Gambia hepatitis B programme and the absolute commitment of the Director at the time to ensuring that IARC could continue vaccination after the numbers needed in the vaccinated and unvaccinated groups were complete. Most researchers would say, “We will now continue with our research programme and we’re very sorry but you’ll have to go and look elsewhere for the money to continue the vaccinations”, but that was not the approach taken by IARC. I thought that was really an amazingly good thing to do. – Bruce Armstrong, former IARC Deputy Director

ANALYSING MULTICENTRE EPIDEMIOLOGICAL STUDIES – A KEY IARC ACTIVITY

Breslow and Day’s synthesis consolidated a methodological basis that could be used as a standard starting point for a great variety of specific developments. At IARC, research in statistical methods has become more specialized over the ensuing decades and is now embedded within the different types of epidemiological studies. However, some areas of work maintain a more general perspective. One example is a recent paper from IARC, “Penalized loss functions for Bayesian model comparison”. Although the title sounds highly esoteric, in fact this research addresses the very general and fundamental issue of how to choose the best model in the analysis of any data set (e.g. how best to formalize the mathematical relationship between intake of various foods and the occurrence of colon cancer).
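The flavour of the problem can be conveyed with a much simpler stand-in than the Bayesian penalized loss functions of the paper: the sketch below compares two ordinary regression models on simulated data using AIC (a goodness-of-fit term plus a complexity penalty) with Python’s statsmodels; it illustrates only the general idea of penalized model comparison and is not the method developed at IARC.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (illustrative only): the outcome depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(scale=1.0, size=n)

# Two candidate models of different complexity.
small = sm.OLS(y, sm.add_constant(x1)).fit()
large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# AIC = -2 * log-likelihood + 2 * (number of parameters); lower is better.
print("AIC, smaller model:", round(small.aic, 1))
print("AIC, larger model: ", round(large.aic, 1))
```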

A second topic of broad relevance is the analysis of data from multicentre epidemiological studies. Conducting investigations in multiple populations was inherent to the scientific rationale for the establishment of IARC, particularly because in the mid-1960s this type of study was not common in cancer research. For example, the populations may be located in different geographical areas, chosen because their lifestyle habits may differ widely. Or they may be populations of workers exposed to the same potential cancer hazard (e.g. a chemical) at various factories, chosen so that the total population is large enough to attain high sensitivity for detecting an increase in risk if it exists. Another advantage of multicentre studies is the possibility of verifying whether the results obtained within the different populations are consistent with each other. For example, finding the same inverse relationship between the intake of vegetable fibre and the frequency of colon cancer in different populations would be strong evidence in favour of a causal preventive role of vegetable fibre intake. In science, replicability, or at least consistency, of results – as is feasible in multicentre studies – is the most stringent criterion for judging causality. The methods for assessing consistency, although simple in principle, are fraught with complexities in practice (see “Combining epidemiological results from multiple populations”). Optimizing these methods is an ongoing area of research in biostatistics at IARC.
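One standard way to examine such consistency is to combine the centre-specific estimates with inverse-variance weights and to compute a heterogeneity statistic. The sketch below does this by hand in Python with invented centre-specific log relative risks; it shows one simple fixed-effect approach among several possibilities and does not describe IARC’s specific procedures.

```python
import numpy as np
from scipy import stats

# Hypothetical centre-specific log relative risks and their standard errors.
log_rr = np.array([-0.25, -0.18, -0.30, -0.05, -0.22])
se     = np.array([0.10, 0.12, 0.15, 0.11, 0.09])

w = 1.0 / se**2                               # inverse-variance weights
pooled = np.sum(w * log_rr) / np.sum(w)       # fixed-effect pooled log relative risk
pooled_se = np.sqrt(1.0 / np.sum(w))

# Cochran's Q and I-squared: how much do the centres disagree beyond chance?
Q = np.sum(w * (log_rr - pooled) ** 2)
df = len(log_rr) - 1
p_het = stats.chi2.sf(Q, df)
i_squared = max(0.0, (Q - df) / Q) * 100

print(f"Pooled relative risk: {np.exp(pooled):.2f} (SE of log RR {pooled_se:.2f})")
print(f"Heterogeneity: Q = {Q:.2f}, p = {p_het:.2f}, I^2 = {i_squared:.0f}%")
```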