Methodological notes


General concepts in biostatistics and clinical epidemiology: Observational studies with cross-sectional and ecological designs


Abstract

Observational studies evaluate variables of interest in a sample or a population, without intervening in them. They can be descriptive if they focus on the description of variables, or analytical when comparison between groups is made to establish associations through statistical inference. Cross-sectional studies and ecological—also called correlational—studies are two observational methodological designs. Cross-sectional studies collect the data of the exposure variable and the outcome at the same time, to describe characteristics of the sample or to study associations. Ecological studies describe and analyze correlations among different variables, and the unit of analysis is aggregated data from multiple individuals. In both types of studies, associations of interest for biomedical research can be established, but no causal relationships should be inferred. This is the second of a methodological series of articles on general concepts in biostatistics and clinical epidemiology developed by the Chair of Scientific Research Methodology at the School of Medicine, University of Valparaíso, Chile. In this review, we address general theoretical concepts about cross-sectional and ecological studies, including applications, measures of association, advantages, disadvantages, and reporting guidelines. Finally, we discuss some concepts about observational designs relevant to undergraduate and graduate students of health sciences.

Main messages

  • Cross-sectional designs collect study variables simultaneously, and the unit of analysis is the individual. They are useful for determining prevalence and make it possible to establish associations among variables quickly.
  • Ecological studies analyze correlations among variables whose unit of analysis is grouped data. They are usually easy to conduct and allow the study of large populations.
  • These observational studies cannot establish causal inferences but do permit establishing statistical relationships of great importance for biomedical research and public health.

Introduction

An essential classification in clinical epidemiology is based on the criterion of observation versus experimentation, that is, whether researchers observe measured variables or apply an intervention to study participants. In the first case, we refer to observational studies, in which data of interest are collected (through interviews, measuring instruments, and laboratory tests, among others) and then analyzed descriptively and/or analytically, but without intervening on the exposure variable. In the second case, researchers manipulate the exposure variable, which involves subjecting participants to a controlled intervention to study the change in an estimator of interest (the outcome or response variable). This is, in a sense, a clinical experiment, which in clinical epidemiology is called a clinical trial. Today, observational studies play an essential role in various aspects of health science research and even provide answers when clinical trials are ethically questionable or difficult to perform.

This review is the second release of a methodological series of six narrative reviews about general topics in biostatistics and clinical epidemiology. Each article will cover one of six topics based on content from publications available in the main databases of scientific literature and specialized reference texts. The series is oriented toward undergraduate and graduate students and is developed by the Chair of Scientific Research Methodology at the School of Medicine, University of Valparaíso, Chile. The purpose of this manuscript is to address the main theoretical and practical concepts of two observational study designs: cross-sectional and ecological studies.

Descriptive studies versus analytical studies

Another classification in the taxonomy of methodological designs defines studies as descriptive and/or analytical. Studies are descriptive if their objective is merely to describe the frequency distribution of the variables, without seeking conclusions about associations[1], and analytical if they incorporate some level of inferential statistical analysis with the purpose of establishing associations from the data. Descriptive studies constitute a large part of published research and have contributed to the understanding of the semiology and natural history of diseases, the frequency of certain phenomena in the population, the study of infrequent conditions, and the establishment of interventions, giving rise to new hypotheses. Descriptive studies include case reports and case series, in which infrequent conditions are presented at the level of diagnosis, treatment, and/or prognosis[2]. These are often the first source of evidence regarding emerging conditions, such as the clinical observation of blindness in newborns that led to the association with high concentrations of oxygen in incubators, or the observation of hepatocellular adenoma in young women that led to recognizing its relationship with exposure to high doses of contraceptive drugs[1]. In case reports or case series, a descriptive analysis of the reported data is presented[3]. Various authors place cross-sectional studies (studies in individuals) and ecological studies (studies in populations) within the category of descriptive studies. However, both designs can have an analytical orientation, in which hypothesis tests are applied using at least two groups of participants (comparison groups) to obtain statistical inference; therefore, they can also be classified as analytical studies[3],[4],[5].

Cross-sectional studies

The central element of cross-sectional studies is that both the variable considered an exposure (variable X: independent, explanatory, predictive, or factor) and the outcome variable (variable Y: dependent, explained, predicted, or response) are measured simultaneously; that is, measurement takes place at a single moment in time. Because there is no follow-up over time, it cannot be ensured that the exposure preceded the outcome. In cross-sectional studies, a representative sample of a larger population can be studied, or an entire population can be analyzed, as in a census. In both situations, the frequency or prevalence of a condition of interest can be determined, which is why these are also known as “prevalence studies.” The condition of interest could be a pathology, a characteristic, or a factor conceptualized in the literature as prognostic, such as a protective factor or a risk factor, among others. However, the association between two variables of interest can also be studied, giving the design an analytical orientation[3],[5]. Example 1 presents a cross-sectional study[6].

Example 1. A study sought to determine the prevalence of asthma in children and to analyze its association with being a passive smoker and being exposed to vehicular traffic (both risk factors) and with the intake of dehydrated fruit (a possible protective factor). The researchers found that the prevalence of asthma increased with the number of smokers with whom the children lived, but it was not associated with living near a main avenue or with the consumption of dehydrated fruit. Thus, this cross-sectional study has both a descriptive component (an estimate of prevalence) and an analytical component (the study of associations between the variables).

Measures of association
Although the associations in the previous example could be established using advanced statistical methods, risk cannot be determined directly, as this is reserved for studies with a longitudinal temporal approach[7]; it is thus a matter of methodological design and not of statistical analysis. Therefore, the appropriate measures of association for cross-sectional studies are the odds ratio (OR) and the prevalence ratio (PR). The odds ratio compares the odds of presenting the condition (the probability of presenting it divided by the probability of not presenting it) among exposed individuals with the same odds among non-exposed individuals; values above 1 indicate an excess, and values below 1 a reduction, in the odds among the exposed. The interpretation of the prevalence ratio is simpler and more intuitive, since it indicates how many times more likely exposed individuals are to present the condition than non-exposed individuals[8],[9],[10]. Because the two measures correspond to different concepts, interpreting the odds ratio as a prevalence ratio is a conceptual error, and one frequently observed in published research.
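As a minimal sketch of the difference between the two measures, the following Python snippet computes both from the same 2×2 table. The function names and the counts are hypothetical, chosen for illustration only; they are not taken from the cited asthma study.

```python
def prevalence_ratio(a, b, c, d):
    """Prevalence in the exposed divided by prevalence in the non-exposed.

    a: exposed with the condition      b: exposed without the condition
    c: non-exposed with the condition  d: non-exposed without the condition
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the condition in the exposed divided by odds in the non-exposed."""
    return (a / b) / (c / d)


# Hypothetical counts (illustration only, not real data)
a, b, c, d = 40, 60, 20, 80

pr = prevalence_ratio(a, b, c, d)  # (40/100) / (20/100)
or_ = odds_ratio(a, b, c, d)       # (40/60) / (20/80)

print(f"PR = {pr:.2f}, OR = {or_:.2f}")
```

With these assumed counts the condition is common (40% and 20%), and the odds ratio is noticeably larger than the prevalence ratio, so reading the former as if it were the latter would overstate the association.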

A particular type of cross-sectional study is the diagnostic test study, in which the ability of a test (the index test) to discriminate between the presence and absence of a disease is evaluated[11]. This is usually done by comparing the results of the index test with those of a reference standard (also known as the gold standard or truth criterion) in healthy individuals and in those with the condition, so that the test can later be applied to people suspected of having the disease[12]. These studies evaluate the operational characteristics of the index test, such as its sensitivity, specificity, predictive values, and likelihood ratios[13]. Example 2 presents a diagnostic test study, whose design corresponds to a cross-sectional study[14].

Example 2. A cross-sectional study analyzed the diagnostic utility of a rapid antigen test (index test) for the diagnosis of acute tonsillitis in children between 2 and 14 years of age. This test was compared with pharyngeal culture, considered the reference standard. A sensitivity of 86.5% and a specificity of 91.5% were found, indicating that the test is useful for the diagnosis of the pathology in this context.
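The operational characteristics mentioned above can all be derived from the cross-tabulation of the index test against the reference standard. The sketch below assumes hypothetical counts (they are not the data of the cited study) and a hypothetical function name, and uses the standard textbook definitions of each measure.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Operational characteristics of an index test against a reference standard.

    tp, fn: reference-positive subjects with a positive / negative index test
    fp, tn: reference-negative subjects with a positive / negative index test
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Hypothetical counts (illustration only, not real data)
result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity, and the likelihood ratios are computed within the reference-positive and reference-negative groups, whereas the predictive values also depend on how frequent the disease is in the sample studied.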

Advantages and disadvantages
Cross-sectional studies are usually quick to execute. Because they do not involve follow-up over time, loss to follow-up is not a problem and economic costs are lower, which allows associations to be established quickly[1]. The main disadvantage is the issue of temporality: since it is not clear that the exposure variable (cause) precedes the outcome variable (effect), it is not possible to establish a causal relationship[1],[15]; thus, results must be interpreted prudently and in context. Likewise, this design is not very useful for infrequent pathologies or for those whose prevalence changes rapidly, as in the case of infectious diseases[5].

Ecological or correlational studies

Ecological or correlational studies share the central characteristic of cross-sectional studies regarding temporality: both explanatory and explained variables are collected simultaneously. They are known as "ecological" because investigations of this type use geographical areas to define the units of analysis. Indeed, their particularity lies in the unit of analysis: grouped data are analyzed (ecological units), corresponding to estimators determined from summaries of individual data; thus, they are studies based on populations[16]. These studies examine the frequency of a condition in a population and its correlation (hence the name "correlational" studies) with one or more exposure variables that are also measured in aggregate[5]. For example, an ecological study[17] analyzed the inequality in the distribution of otolaryngologists in Latin American countries, concluding that in all countries specialists were more frequently found in socio-geographically advantaged areas and capital cities, demonstrating high inequality in distribution; the authors emphasize the importance of implementing policies that improve access to this medical discipline.

Their advantages include the mapping of diseases and their risk factors, large-scale comparisons, and the study of public health strategies[16],[18]. Likewise, ecological studies have contributed significantly to the analysis of occupational exposures to harmful agents, as in the case of the association between exposure to asbestos and the occurrence of mesothelioma[18],[19].

Although the main type of ecological study is the geographical one, in which a condition of interest is compared between geographic regions, it is also possible to monitor a population over time to evaluate its changes, as in longitudinal ecological studies. These are particularly sensitive to biases, such as those associated with the method of disease determination, since examinations and diagnostic criteria tend to improve over time. Another type of ecological study is the study of migrant populations, which is used to distinguish genetic factors from environmental factors based on geographical and cultural variation. Nonetheless, it should be taken into account that the migrant population may not be representative of the population of origin and that health may be affected by the migration process itself. Example 3 shows an ecological study in migrant populations[20],[21].

Example 3. In a study by Ødegaard published in 1932, titled "Emigration and insanity," the rate of hospitalization (ecological unit) for schizophrenia was observed to be higher among Norwegians who had emigrated to the United States than among their compatriots residing in Norway, which opened the debate about the role that environmental factors play in the psychopathology of psychosis. However, the results should be interpreted with caution for the reasons discussed.

Measures of association
The measure of association in these studies is a correlation coefficient (hence the name "correlational studies"), which indicates the degree of linear association between two variables that are conceptualized as exposure and outcome[1]. Multivariate statistical regression methods can also be used to study variables associated with the dependent variable, analyze confounding variables, and construct predictive models for the response variable[22].
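As an illustration of how such a coefficient is obtained from aggregated data, the following sketch computes Pearson's correlation coefficient with one observation per ecological unit. The choice of Pearson's coefficient, the function name, and the regional values are assumptions made for this example; the source does not specify which coefficient a given study should use.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical aggregated data: one value per region (the ecological unit)
exposure_by_region = [2.0, 4.0, 6.0, 8.0, 10.0]  # e.g. mean exposure level
rate_by_region = [1.0, 3.0, 2.0, 5.0, 4.0]       # e.g. cases per 1,000 people

r = pearson_r(exposure_by_region, rate_by_region)
print(f"r = {r:.2f}")
```

Each data point here summarizes a whole region, so the coefficient describes how the regional summaries vary together, not how exposure and outcome are related within any individual.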

Advantages and disadvantages
In general, ecological studies are easy to conduct, since the data have usually already been collected in the statistics of public institutions or in open-access registries such as national surveys[23]. This also avoids the bioethical complexity of studying humans directly and reduces the economic cost[1]. In addition, they facilitate the study of large populations.

The primary disadvantage associated with inference from ecological studies is the loss of information that may occur when data are aggregated, which does not permit identifying associations at an individual level[16]. Because data are analyzed in aggregate form, the relationship between exposure and outcome cannot be empirically determined at the individual level. Inferring causal mechanisms at an individual level from aggregate statistics of the group to which an individual belongs (for example, the hospitalization rate of a country) is an error known as the ecological fallacy, ecological bias, or fallacy of division[1],[18]. For example, one study[24] found a strong, statistically significant linear correlation between per capita chocolate consumption and the number of Nobel prizes per 10 million people in the 23 countries studied (r = 0.791, p < 0.0001); however, this does not ensure that award-winners consumed large amounts of chocolate. Another disadvantage, typical of studies in which the variables of interest are measured at the same time, is temporal ambiguity, since it is not possible to define which phenomenon occurred first. Finally, statistical analysis of these designs could be hindered by multicollinearity, a phenomenon in which the predictive (independent) variables of a multivariate model are correlated with one another, which could reduce the apparent relevance of the variables of greatest interest[25].
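The ecological fallacy can be made concrete with a small constructed data set. In the sketch below, all values, region names, and the helper function are hypothetical and were chosen so that the correlation between regional means and the correlation between individuals within each region point in opposite directions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical individual-level data: (exposure, outcome) pairs per region
regions = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Ecological analysis: one (mean exposure, mean outcome) point per region
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in regions.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in regions.values()]
r_ecological = pearson_r(mean_x, mean_y)

# Individual-level analysis within each region
r_within = {
    name: pearson_r([x for x, _ in rows], [y for _, y in rows])
    for name, rows in regions.items()
}

print(f"between regions: r = {r_ecological:.2f}")
for name, r in r_within.items():
    print(f"within region {name}: r = {r:.2f}")
```

In this constructed case the regional means are perfectly positively correlated, while within every region individuals with higher exposure have lower outcome values, so a conclusion about individuals drawn from the aggregate correlation alone would be wrong.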

Reporting guidelines

In 2007, an international collaboration of epidemiologists, methodologists, statisticians, researchers, and journal editors published the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline (http://www.strobe-statement.org)[26], based on the experience with the CONSORT guideline, which guides the reporting of randomized controlled trials[27]. Its purpose is to promote the clear and transparent reporting of research; it is not a quality assessment tool. STROBE focuses on the three most widespread observational methodological designs: cross-sectional studies, case-control studies, and cohort studies. It includes twenty-two items grouped into six domains: title and abstract, introduction, methods, results, discussion, and additional information[27],[28]. Although the use of reporting guidelines has been emphasized internationally, the use of STROBE is not homogeneous in the published literature[29],[30]. There is currently no similar initiative for ecological studies.

Preventing and controlling confounding

A fundamental challenge for observational studies is the prevention and control of potential biases that may threaten their internal validity, especially confounding. Confounding can occur, for example, when the groups compared differ in baseline characteristics (such as biodemographic characteristics), such that there are intergroup differences in addition to the variable of interest[31]. Many observational studies use data that were originally collected for purposes other than the research objectives, for example, national surveys and hospital statistics, among others; this represents another source of confounding. To respond to these concerns at the level of design in a cross-sectional study, strategies such as rigorous eligibility criteria or restriction can be used (for example, strictly selecting subjects who present the characteristic to be “neutralized,” or selecting those in whom it is absent)[32]. At the level of statistical analysis, a stratified analysis can be employed, in which individuals are analyzed within strata defined by a confounding variable, such as age or sex. Multivariate statistical regression models, as mentioned above, can also be used to identify the variables that act as confounders when the model is adjusted[33]. Ways of controlling confounding at the level of data analysis will be elaborated further in the next article in this series.
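A stratified analysis can be sketched as follows. The example compares the crude odds ratio with stratum-specific odds ratios and with a pooled estimate obtained by the Mantel-Haenszel method, a standard technique introduced here for illustration and not named in the text above. The age strata, the counts, and the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: exposed with the condition      b: exposed without the condition
    c: non-exposed with the condition  d: non-exposed without the condition
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts per stratum of the confounder (illustration only)
strata = {
    "younger": (10, 90, 40, 360),
    "older": (160, 240, 40, 60),
}

# Crude table: collapse the strata by adding the counts cell by cell
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))

crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
adjusted_or = mantel_haenszel_or(list(strata.values()))

print(f"crude OR = {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR in {name} stratum = {value:.2f}")
print(f"Mantel-Haenszel OR = {adjusted_or:.2f}")
```

In these assumed data the exposure and the condition are both more frequent in the older stratum, so the crude odds ratio suggests an association that disappears within each stratum and in the pooled estimate, which is the pattern expected when age acts as a confounder.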

Final considerations

Although they are usually known as prevalence studies, a name that suggests a primarily descriptive purpose, cross-sectional studies often lead to the study of associations when a comparison group is available. If the primary objective is to determine the prevalence of a condition, the appropriate design is a cross-sectional study. However, sampling must be random; non-probabilistic sampling only permits the study of frequency. In the study cited in Example 1, random sampling was carried out in different schools in the United Kingdom to determine the prevalence of asthma in children[6]. The study of prevalence should not be confused with that of incidence. Incidence (the frequency of new outcomes in a given period) is determined in cohort studies (observational designs whose temporal axis is longitudinal, regardless of whether data are collected prospectively or retrospectively).

Some authors have pointed out that due to phenomena that have a great influence on the results, such as the ecological fallacy, ecological studies should only be undertaken when it is not possible to perform an analysis of the individual data[31]. However, due to the advantages and opportunities mentioned, they are often the first step, especially for public health objectives, such as an analysis of the geographic distribution of specialists in otolaryngology[17] or environmental factors in psychosis[20].

Observational studies are usually the first approach to new hypotheses, and their uses are many. They may help to identify statistical hypotheses that can later be studied through hypothesis testing, giving rise to associations. Cross-sectional and ecological studies, due to their temporality, do not allow causal hypotheses to be established. They must be conducted rigorously, considering that they are vulnerable to multiple biases, especially confounding, which can be prevented at the level of design, and controlled during the statistical analysis. As a whole, observational studies offer the possibility for new ways of looking at things (Figure 1).

Figure 1. Summary diagram of cross-sectional studies and ecological studies.
Source: designed by the authors.