History and Methodological Aspects of Analytic Studies of AD: Case–Control Studies
Case–control studies, by virtue of their generally lower costs and relatively fast completion, were conducted as the first step in the systematic investigation of the etiology of AD. This study design was most common in the 1970s and 1980s. There are surprisingly few reports in the literature summarizing the results and methodological shortcomings of these studies. The case–control study gave epidemiologists a jumping-off place, beginning with what are fondly and at the same time disparagingly termed “fishing expeditions.” Fishing expeditions are undertaken when little is known about a field: investigators examine a large number of exposures in a single case–control study. These may include risk and protective factors that already have some support from earlier studies (e.g., age and family history of memory problems), that seem biologically plausible (e.g., head trauma, smoking), or that are risk factors for other diseases (e.g., pesticides for Parkinson’s disease, diabetes for vascular disease). Examining so many exposures can produce spurious findings (Type I errors) when the number of comparisons is large and no correction for multiple comparisons is applied. In the case–control era, therefore, there was a real danger that a statistically significant finding was due to chance. One of many examples is the report from a case–control study that nose-picking was associated with later-onset AD (matched OR = 7.0 [7/1 discordant pairs, p = 0.08]) and that physical underactivity was associated with earlier-onset AD (matched OR not calculable because of a zero in the denominator [14/0 discordant pairs, p = 0.0005]) (Henderson et al., 1992). At a meeting where these findings were presented, one of our colleagues joked to one of us, “To stop Alzheimer’s, we just have to stop sitting around and picking our noses.” Although the comment was obviously facetious, it is a reminder to be cautious when interpreting results based on small numbers.
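The matched figures quoted above can be recomputed from the discordant-pair counts alone. The sketch below assumes a 1:1 matched design. It also assumes McNemar's test with a continuity correction, because the test used in the original report is not stated here, although this version does reproduce p ≈ 0.08 and p ≈ 0.0005. It also shows how quickly the chance of at least one false positive grows when, hypothetically, 20 exposures are each tested at α = 0.05.

```python
import math


def matched_pair_analysis(b, c):
    """1:1 matched case-control analysis from discordant pairs.

    b: pairs in which only the case was exposed
    c: pairs in which only the control was exposed
    Returns (matched OR, McNemar chi-square with continuity correction, p).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    # Matched OR = b / c; it is not calculable when c == 0.
    matched_or = b / c if c > 0 else None
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return matched_or, chi2, p


def familywise_error(alpha, k):
    """Chance of at least one false positive among k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


nose_or, _, nose_p = matched_pair_analysis(7, 1)            # 7/1 discordant pairs
activity_or, _, activity_p = matched_pair_analysis(14, 0)   # 14/0 discordant pairs

print(nose_or, round(nose_p, 2))            # 7.0 0.08
print(activity_or, round(activity_p, 4))    # None 0.0005

k = 20                                      # hypothetical number of exposures examined
fwer = familywise_error(0.05, k)            # about 0.64
bonferroni_threshold = 0.05 / k             # per-test threshold, 0.0025 when k = 20
```

With 20 hypothetical comparisons, the chance of at least one spurious "hit" is close to two in three. Under a Bonferroni threshold, the 7/1 result would not qualify, while the 14/0 result would.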
While the case–control studies likely produced some false-positive results, they may also have produced false-negative ones. To focus on the purest form of AD, most of these studies excluded cases meeting NINCDS-ADRDA criteria for possible AD, eliminating cases in which the relatively common vascular lesions could be contributing to disease expression. Because only probable AD cases without evidence of stroke or cerebrovascular disease were included, individuals with vascular risk factors for these outcomes were selectively removed from the case group but not from the control group. Consequently, hypertension, diabetes, and hypercholesterolemia were not identified as risk factors for AD in the case–control studies. When prospective cohort studies began, participants unselected for vascular risk factors at baseline were followed for the development of incident AD, which allowed the important role of vascular risk factors in the clinical expression of AD to be observed. In fact, prospective cohort studies sometimes reached the opposite conclusion from case–control studies. For example, case–control studies suggested that cigarette smoking, a risk factor for stroke, might be inversely related to AD, whereas cohort studies found that cigarette smoking was a risk factor (Chapter 13). Cohort studies have also permitted the examination of risk factors for other dementia subtypes, such as Lewy body dementia and frontotemporal dementia.
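A small arithmetic sketch shows how removing exposed cases, but not exposed controls, can hide a real association. All counts below are hypothetical. Hypertension is assumed to double the odds of AD, and hypertensive cases are assumed to be far more likely to be excluded for evidence of cerebrovascular disease.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Hypothetical source counts before any exclusion.
htn_cases, no_htn_cases = 80, 120        # cases with / without hypertension
htn_controls, no_htn_controls = 50, 150  # controls with / without hypertension

true_or = odds_ratio(htn_cases, no_htn_cases, htn_controls, no_htn_controls)

# Hypothetical exclusions of cases with evidence of stroke or cerebrovascular
# disease: 48 of 80 hypertensive cases, 12 of 120 normotensive cases.
# Controls are not screened in the same way.
kept_htn_cases = htn_cases - 48
kept_no_htn_cases = no_htn_cases - 12

observed_or = odds_ratio(kept_htn_cases, kept_no_htn_cases,
                         htn_controls, no_htn_controls)

print(true_or)                # 2.0
print(round(observed_or, 2))  # 0.89
```

Under these assumed numbers, a real doubling of the odds appears as no association, or even a slight inverse one.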
Most of the cases in case–control studies were identified from dementia clinics or through hospital records. Selection bias is likely among cases who were brought to medical attention by their family members. If family members were in denial, patients might not have come to medical attention until their dementia was moderate to severe. Other AD patients may never have come to medical attention at all, because they lived alone and had no close family members to observe their condition and bring them in. In addition, a demented individual identified in a clinic who had no identifiable proxy informant would usually have been excluded from case–control studies.
The use of proxy informants is necessary for dementia patients who cannot provide valid information about their own exposures. However, using proxies invites information bias. Case proxies may either overreport or underreport past exposures, depending on their own recall, their level of background reading, and their desire to pin the cause on an identifiable event, such as a head injury. Furthermore, recall bias can be aggravated by control proxies underreporting exposures. Proxies are frequently the next of kin who brings the patient in, most often a spouse or an adult child. For some types of remote information, such as information about early life and adolescence, siblings may be better informants. The proxy type was usually not matched between cases and controls, which could bias odds ratios either toward or away from the null value of 1.0.
When proxy informants must be used for cases, they should also be used for controls (Nelson, Longstreth, Koepsell, & van Belle, 1990). To evaluate the quality of the information obtained from proxy informants, some case–control studies of AD conducted validation studies of control and control–proxy informant pairs, measuring the agreement between the two sources of information. These validation studies assumed that agreement in case–case proxy informant pairs would be similar to agreement in control–control proxy informant pairs. This is not necessarily true. For example, in a case–control study that matched on informant type (we accepted only spouse informants), we rated exposure to chemicals in the cases’ and controls’ occupations in two ways: by asking the spouse, and by an assessment from an industrial hygienist who was blinded to case–control status. Figure 9.1 shows the results for one chemical exposure (Graves, A. Borenstein, unpublished). Epidemiologists usually assume that case proxies over-recall exposures because of the natural instinct to identify a cause for the disease. In this study, however, taking the industrial hygienist’s blinded assessment as the reference, the sensitivity of spouse reports of exposure was higher among case spouses (83%) than among control spouses (61.5%). The specificity of spouse reports was the same for cases and controls: when the industrial hygienist rated an occupation as not exposed to a given chemical, case and control spouses were equally likely to rate it as nonexposed. These findings suggest that under-recall by control informants may be a larger problem than over-recall by case informants.
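The effect of differential recall on the odds ratio can be sketched with the standard misclassification relation: reported prevalence = sensitivity × true prevalence + (1 − specificity) × (1 − true prevalence). The sensitivities (83% for case spouses, 61.5% for control spouses) come from the study described above. The true exposure prevalences and the shared specificity of 90% are hypothetical values chosen only for illustration.

```python
def reported_prevalence(true_prev, sensitivity, specificity):
    """Share of informants reporting exposure: true positives + false positives."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


def odds(p):
    return p / (1 - p)


def exposure_or(case_prev, control_prev):
    """Exposure odds ratio comparing cases with controls."""
    return odds(case_prev) / odds(control_prev)


case_prev, control_prev = 0.30, 0.20  # hypothetical true exposure prevalences
specificity = 0.90                    # hypothetical, equal in both groups
se_case, se_control = 0.83, 0.615     # sensitivities reported in the text

true_or = exposure_or(case_prev, control_prev)

# Differential misclassification: control spouses under-recall more often.
differential_or = exposure_or(
    reported_prevalence(case_prev, se_case, specificity),
    reported_prevalence(control_prev, se_control, specificity),
)

# For contrast: the same sensitivity (83%) in both groups.
nondifferential_or = exposure_or(
    reported_prevalence(case_prev, se_case, specificity),
    reported_prevalence(control_prev, se_case, specificity),
)

print(round(true_or, 2), round(differential_or, 2), round(nondifferential_or, 2))
```

With these assumed inputs, the true odds ratio of about 1.71 becomes about 1.84 under differential recall, pushed away from the null. With equal sensitivity in both groups it becomes about 1.44, pulled toward the null. Other inputs can change the size, and sometimes the direction, of the bias.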
Most cases used in the case–control studies were prevalent cases. This implies that some of the identified risk factors may be associated with survival with AD rather than with incident AD. The problem is compounded by the fact that most studies placed no limit in their inclusion criteria on how long a case could have had the disease. Cases of longer duration were therefore more likely to be included (this is called “length bias”). For example, if the case-fatality of AD is modified by smoking status, such that smokers with AD die sooner, smokers will be less likely to be identified as prevalent cases in case–control studies. This could result in an inverted odds ratio, suggesting that smoking is protective for AD (Graves and Mortimer, 1994).
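The inverted odds ratio can be illustrated with the steady-state approximation, prevalence odds ≈ incidence rate × mean disease duration, which assumes a stable population. Under that assumption, an odds ratio computed from prevalent cases is roughly the incidence rate ratio multiplied by the ratio of mean durations. The rates and durations below are hypothetical.

```python
def prevalent_case_or(rate_exposed, rate_unexposed, dur_exposed, dur_unexposed):
    """Approximate OR obtained from prevalent cases, using
    prevalence odds ~= incidence rate x mean duration (stable population)."""
    incidence_rate_ratio = rate_exposed / rate_unexposed
    duration_ratio = dur_exposed / dur_unexposed
    return incidence_rate_ratio * duration_ratio


# Hypothetical: smokers develop AD 1.5 times as often (15 vs 10 per 1,000
# person-years) but survive half as long with the disease (4 vs 8 years).
or_prevalent = prevalent_case_or(15, 10, 4, 8)
print(or_prevalent)  # 0.75
```

Although smoking raises incidence in this example, the prevalent-case odds ratio falls below 1.0, so smoking looks protective.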
The study base principle, that cases and controls should be members of the same defined population (Wacholder, McLaughlin, Silverman, & Mandel, 1992), was often not met in case–control studies. Sometimes cases came from a specific hospital or clinic and controls were selected from the same hospital. Other times, controls were identified from people living in the same neighborhoods as the cases. This was not because investigators were unaware of the study base principle. Rather, before population-based studies of AD were established, it was not possible to ascertain all cases in a primary base. In these instances, the desire for cases and controls to resemble each other in characteristics other than the disease took precedence over representation of the study base. Selecting controls from patients in the hospitals where the cases were diagnosed may seem appropriate, but it increases the frequency of cardiovascular and other diseases in the control sample. That makes these diseases less likely to be found to be risk factors, and may even make them appear protective for AD.
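The hospital-control problem can be sketched the same way. The prevalences of cardiovascular disease among cases, among non-demented members of the study base, and among hospital controls are all hypothetical. The only assumption that matters is that hospital patients have more cardiovascular disease than the study base.

```python
def odds(p):
    return p / (1 - p)


case_cvd = 0.30              # hypothetical prevalence of CVD among AD cases
study_base_cvd = 0.20        # hypothetical prevalence among non-demented people in the base
hospital_control_cvd = 0.35  # hypothetical, inflated because controls are hospital patients

or_study_base = odds(case_cvd) / odds(study_base_cvd)
or_hospital = odds(case_cvd) / odds(hospital_control_cvd)

print(round(or_study_base, 2), round(or_hospital, 2))  # 1.71 0.8
```

The same cases yield a positive association against study-base controls but an apparently protective one against hospital controls.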
After the first set of case–control studies was published, a reanalysis of their pooled raw data (a pooled reanalysis) was conducted (van Duijn, Stijnen, & Hofman, 1991). This type of reanalysis increases the power to detect associations that are not evident in individual studies, particularly for uncommon exposures. However, it can also amplify the effects of uncontrolled bias or confounding. This is not a problem in the traditional use of meta-analysis for randomized clinical trials (RCTs). If subtle relative risks (say, for the sake of argument, RR = 1.2) are not statistically significant because the individual trials are not sufficiently large, pooling the trials can boost statistical power. Because participants in RCTs are randomized to treatment, confounding is minimized or eliminated. If the pooled RR = 1.2 then becomes statistically significant, the only remaining question is whether the treatment effect is clinically significant (Clayton, 1991). The pooled reanalyses of case–control studies conducted in 1990 helped advance knowledge about risk factors for AD by pointing to variables that were important to investigate in cohort studies. These included family history of dementia, smoking, depression, early and advanced maternal age, alcohol consumption, and occupational exposures to chemicals such as lead and solvents.
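The power gain from pooling can be illustrated with a fixed-effect, inverse-variance meta-analysis of log relative risks. This is a swapped-in illustration: it pools summary estimates rather than raw data. The five trials, each with RR = 1.2 and a standard error of 0.15 on the log scale, are hypothetical.

```python
import math


def pool_fixed_effect(log_estimates, std_errors):
    """Inverse-variance weighted mean of log estimates and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal (Wald) test of estimate / se."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


log_rr = [math.log(1.2)] * 5  # five hypothetical trials, each with RR = 1.2
ses = [0.15] * 5              # hypothetical standard error of each log RR

single_p = two_sided_p(log_rr[0], ses[0])
pooled_log_rr, pooled_se = pool_fixed_effect(log_rr, ses)
pooled_p = two_sided_p(pooled_log_rr, pooled_se)

print(round(single_p, 2), round(math.exp(pooled_log_rr), 2), round(pooled_p, 3))
```

Each hypothetical trial alone gives p ≈ 0.22. The pooled estimate is still RR = 1.2, but now p ≈ 0.007. Pooling narrows the confidence interval, but it does nothing about bias. If each study's RR of 1.2 reflected the same uncontrolled bias or confounding rather than a true effect, the pooled result would be just as "significant." That is the concern raised above for pooled case–control data.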
What is a Case-Control Study?
A case-control study is a retrospective study that looks back in time to estimate the association between a specific exposure (e.g. secondhand tobacco smoke) and an outcome (e.g. cancer). A control group of people who do not have the disease, or who did not experience the event, is used for comparison. The goal is to figure out the relationship between risk factors and the disease or outcome by estimating an odds ratio: how much higher (or lower) the odds of exposure are among cases than among controls. When the disease is rare, this odds ratio approximates the relative risk.
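A little arithmetic shows why a case-control odds ratio can be read as a relative risk only for rare outcomes. The risks below are hypothetical. In both scenarios the exposure doubles the risk (relative risk = 2), but the odds ratio stays close to 2 only when the outcome is rare.

```python
def relative_risk(risk_exposed, risk_unexposed):
    return risk_exposed / risk_unexposed


def odds_ratio(risk_exposed, risk_unexposed):
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


scenarios = {
    "rare outcome": (0.01, 0.005),   # hypothetical risks: 1% vs 0.5%
    "common outcome": (0.40, 0.20),  # hypothetical risks: 40% vs 20%
}

for name, (r1, r0) in scenarios.items():
    print(name, round(relative_risk(r1, r0), 2), round(odds_ratio(r1, r0), 2))
```

For the rare outcome the odds ratio is about 2.01. For the common outcome it is about 2.67, overstating the doubling of risk.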
Case-control studies have four main steps:
- The study begins by enrolling people who already have a certain disease or outcome.
- A control group of similar size is sampled, preferably from a population identical in every way except that its members don’t have the disease or condition being studied. Controls should not be selected on the basis of their exposure status.
- Cases and controls are asked about their past exposure to risk factors.
- Finally, an odds ratio is calculated, comparing the odds of exposure among cases with the odds of exposure among controls.
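The final step can be sketched as follows. The 2×2 counts are hypothetical, and the 95% confidence interval uses Woolf's log-based method, one common choice among several.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log) confidence interval from a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
or_, lower, upper = odds_ratio_with_ci(40, 60, 20, 80)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

Here the odds ratio is about 2.67 with an interval of roughly 1.42 to 5.02. Because the interval excludes 1.0, the association would be called statistically significant at the 5% level.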
The odds ratio is used to judge whether a particular exposure (like eating processed meat) is a risk factor for a particular outcome (like colon cancer). (Image: Michigan.gov)
The two types of case-control studies are:
- Non-matched case-control study: this is the simplest form. Find a person with the disease and enroll them in the study. Then enroll a control and determine their exposure status.
- Matched case-control study: Find a person with the disease and enroll them in the study. Then match that person with a control on one or more characteristics (e.g. sex, age, weight). This can eliminate or minimize confounding by the matched characteristics. However, it generally lengthens the study: the more characteristics that are matched, the longer it takes to find suitable controls.
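Individual matching can be sketched as a greedy search. For each case, take the unused control of the same sex whose age is closest, within a caliper. The record layout, the 2-year caliper, and the greedy rule are all assumptions for illustration. Real studies may match on other variables or use optimal rather than greedy matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str   # hypothetical study ID
    sex: str
    age: int


def match_controls(cases, pool, age_caliper=2):
    """Greedy 1:1 matching on sex (exact) and age (within +/- age_caliper years).

    Each control is used at most once. Returns (pairs, unmatched_cases).
    """
    used = set()
    pairs, unmatched = [], []
    for case in cases:
        eligible = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= age_caliper
        ]
        if not eligible:
            unmatched.append(case)
            continue
        # Closest age wins; min() keeps the first control in pool order on ties.
        best = min(eligible, key=lambda c: abs(c.age - case.age))
        used.add(best.pid)
        pairs.append((case, best))
    return pairs, unmatched


cases = [Person("A1", "F", 74), Person("A2", "M", 81), Person("A3", "F", 90)]
pool = [Person("C1", "F", 75), Person("C2", "M", 79),
        Person("C3", "F", 73), Person("C4", "M", 85)]

pairs, unmatched = match_controls(cases, pool)
for case, control in pairs:
    print(case.pid, "->", control.pid)
print("unmatched:", [c.pid for c in unmatched])
```

Case A3 goes unmatched because the hypothetical pool contains no woman within two years of age 90. Each added matching variable or tighter caliper makes such failures more common, which is one reason matched studies tend to take longer.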
Advantages and Disadvantages
A case-control study is often the best choice for rare conditions or diseases. Suppose 10 people in Duval County, Florida, had a particularly rare disease. Random sampling for a cohort study would involve large numbers of people and might not pick up any of the diseased people at all. With a case-control study, all 10 people who have the disease can be identified (assuming they are in a medical database) and enrolled in the study. Random sampling could then be used on the non-diseased population to form the control group.
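The Duval County example can be put in numbers. The county population is taken as an assumed round figure of one million, and the cohort sample size of 1,000 is hypothetical. The first function gives the hypergeometric probability that a random sample contains none of the 10 cases. The second draws controls at random from the non-diseased, as the case-control design would.

```python
import random


def prob_sample_misses_all_cases(population, n_cases, sample_size):
    """Probability that a simple random sample drawn without replacement
    contains none of the cases (hypergeometric probability of zero)."""
    p = 1.0
    for i in range(n_cases):
        p *= (population - sample_size - i) / (population - i)
    return p


def sample_controls(non_diseased_ids, k, seed=None):
    """Draw k controls at random, without replacement, from a sequence of
    non-diseased IDs (a list or range)."""
    return random.Random(seed).sample(non_diseased_ids, k)


population = 1_000_000  # assumed round figure, not an exact county population
n_cases = 10
cohort_sample = 1_000   # hypothetical random sample for a cohort study

p_miss = prob_sample_misses_all_cases(population, n_cases, cohort_sample)
print(round(p_miss, 3))

# Case-control alternative: enroll all 10 cases (hypothetical IDs 0-9), then
# sample 10 controls from the non-diseased (hypothetical IDs 10-999,999).
controls = sample_controls(range(n_cases, population), k=10, seed=1)
```

Under these assumptions, a random sample of 1,000 residents has about a 99% chance of containing no cases at all, while the case-control design enrolls all 10.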
Advantages:
- Short-term study that doesn’t require waiting for events to happen, as they have already occurred.
- Multiple risk factors can be studied at the same time.
- Quickly establishes associations between risk factors and disease. This can be especially useful in disease outbreaks, as likely causes can be identified with small sample sizes.
- Stronger than cross-sectional studies for establishing causation.
Disadvantages:
- Suitable control groups can be difficult to find.
- Results can easily be tainted by recall bias, where people with the disease or condition are more likely to remember past details than people who don’t have it.
- Weaker than a cohort study for establishing causation.
- Results are usually not generalizable.
Examples from Real Life
- One study of non-Hodgkin lymphoma found a connection between the disease and inflammatory disorders such as Sjögren’s syndrome, celiac disease and rheumatoid arthritis.
- Another study investigated how increased consumption of fruits and vegetables may protect against cervical intraepithelial neoplasia.
- The INTERHEART study looked at exposure to secondhand tobacco smoke and increased risk of myocardial infarction.