5.4 Directed acyclic graphs (DAGs) and causal diagrams
7 min read • August 14, 2024
Directed acyclic graphs (DAGs) are powerful tools in epidemiology for visualizing causal relationships between variables. They help researchers identify potential sources of bias and guide statistical analysis, making complex causal structures easier to understand and communicate.
In the context of causation and causal inference, DAGs provide a framework for representing assumptions about how variables interact. By mapping out these relationships, researchers can better plan their studies, choose appropriate methods, and interpret results with greater clarity and confidence.
Directed Acyclic Graphs for Causal Inference
Definition and Purpose
Directed acyclic graphs (DAGs) are graphical representations of causal relationships between variables, where nodes represent variables and directed edges represent causal relationships
DAGs are used in causal inference to visually represent and communicate assumptions about the causal structure of a system, helping to identify potential sources of bias and guide the selection of appropriate statistical methods
DAGs are acyclic, meaning that there are no feedback loops or cycles in the graph, and the edges are directed, indicating the direction of the causal relationship between variables (e.g., smoking → lung cancer)
DAGs can be used to represent both observed and unobserved variables, as well as the relationships between them, allowing for a more comprehensive understanding of the causal structure (e.g., including latent variables such as genetic predisposition)
By explicitly representing the causal assumptions underlying a research question, DAGs can help researchers to identify potential confounding variables, mediators, and colliders, and to develop appropriate strategies for addressing them in statistical analyses (e.g., adjusting for confounders, conducting mediation analysis)
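As a minimal sketch of the definition above, a DAG can be stored as a mapping from each cause to its direct effects, and the "acyclic" requirement can be checked with a topological sort (Kahn's algorithm). The variable names and the `is_acyclic` helper are illustrative assumptions, and the "stress" node in the cyclic counterexample is hypothetical.

```python
from collections import deque

# Hypothetical example: each key is a cause, each value lists its direct effects.
dag = {
    "genetic predisposition": ["lung cancer"],  # unobserved (latent) variable
    "smoking": ["lung cancer"],
    "lung cancer": [],
}

def is_acyclic(graph):
    """Return True if the directed graph contains no cycle (Kahn's algorithm)."""
    nodes = set(graph) | {c for effects in graph.values() for c in effects}
    indegree = {n: 0 for n in nodes}
    for effects in graph.values():
        for c in effects:
            indegree[c] += 1
    # Start from nodes with no incoming arrows and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for c in graph.get(node, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # If a cycle exists, its nodes never reach indegree 0 and are never visited.
    return visited == len(nodes)

# A feedback loop (hypothetical) violates the acyclic requirement.
cyclic = {"smoking": ["stress"], "stress": ["smoking"]}

print(is_acyclic(dag), is_acyclic(cyclic))
```

A graph that fails this check cannot be treated as a DAG; feedback between variables is usually handled by splitting a variable into time-indexed nodes.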
Role in Causal Inference
DAGs provide a framework for formally representing and communicating causal assumptions, promoting transparency and facilitating discussion among researchers
By visually depicting the causal relationships between variables, DAGs can help researchers identify potential sources of bias, such as confounding, mediation, or collider bias, and develop strategies to address them
DAGs can guide the selection of appropriate statistical methods for estimating causal effects, such as adjustment for confounders, mediation analysis, or the use of instrumental variables (e.g., propensity score matching, structural equation modeling)
The use of DAGs encourages researchers to explicitly consider and justify their causal assumptions, promoting a more rigorous and transparent approach to causal inference
DAGs can be used to assess the identifiability of causal effects and to determine the minimal sufficient adjustment set for estimating causal effects from observational data (e.g., using the backdoor criterion)
Components of a DAG
Nodes and Edges
Nodes in a DAG represent variables, which can be either observed (measured) or unobserved (latent) (e.g., age, sex, blood pressure, socioeconomic status)
Directed edges in a DAG represent causal relationships between variables, with the arrow pointing from the cause to the effect (e.g., age → blood pressure)
The absence of a directed edge between two nodes indicates that there is no direct causal relationship between the corresponding variables
A path in a DAG is a sequence of edges connecting two nodes, which can be either directed (following the direction of the arrows) or undirected (ignoring the direction of the arrows)
Familial Relationships
A node is a parent of another node if there is a directed edge from the former to the latter, and a child if there is a directed edge from the latter to the former (e.g., age is a parent of blood pressure, blood pressure is a child of age)
A node is an ancestor of another node if there is a directed path from the former to the latter, and a descendant if there is a directed path from the latter to the former (e.g., age is an ancestor of cardiovascular disease, cardiovascular disease is a descendant of age)
These familial relationships between nodes in a DAG are essential for understanding the causal structure and identifying potential sources of bias
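The familial relationships above can be computed directly from the edge structure. The sketch below uses the age, blood pressure, and cardiovascular disease example from the text; the function names are illustrative, not taken from any particular library.

```python
# Example DAG from the text: age -> blood pressure -> cardiovascular disease.
dag = {
    "age": ["blood pressure"],
    "blood pressure": ["cardiovascular disease"],
    "cardiovascular disease": [],
}

def parents(graph, node):
    """Nodes with a directed edge pointing into `node`."""
    return {p for p, effects in graph.items() if node in effects}

def children(graph, node):
    """Nodes that `node` points to directly."""
    return set(graph.get(node, []))

def descendants(graph, node):
    """Nodes reachable from `node` by following arrows forward."""
    found, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph.get(current, []))
    return found

def ancestors(graph, node):
    """Nodes from which `node` can be reached by following arrows forward."""
    return {n for n in graph if node in descendants(graph, n)}

print(parents(dag, "blood pressure"))
print(ancestors(dag, "cardiovascular disease"))
```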
Causal Paths and Associations
A causal path in a DAG is a directed path from one node to another, representing a direct or indirect causal relationship between the corresponding variables (e.g., age → blood pressure → cardiovascular disease)
An association between two variables in a DAG can arise from a causal path, from confounding by a shared cause, or from collider bias introduced by conditioning on a common effect
Understanding the types of paths and associations in a DAG is crucial for identifying potential sources of bias and selecting appropriate statistical methods for causal inference
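To make the distinction between causal and non-causal paths concrete, the sketch below enumerates every path between two nodes while ignoring arrow direction, then labels a path as causal only if every arrow points forward. The three edges combine examples used elsewhere in the text (age as a common cause of smoking and lung cancer); the helper names are assumptions for illustration.

```python
edges = [("age", "smoking"), ("age", "lung cancer"), ("smoking", "lung cancer")]

def all_paths(edges, start, end):
    """List every simple path from start to end, ignoring arrow direction.
    Each step is stored as (from_node, arrow, to_node)."""
    neighbours = {}
    for cause, effect in edges:
        neighbours.setdefault(cause, []).append(("->", effect))
        neighbours.setdefault(effect, []).append(("<-", cause))
    paths = []

    def walk(node, visited, steps):
        if node == end:
            paths.append(steps)
            return
        for arrow, nxt in neighbours.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, steps + [(node, arrow, nxt)])

    walk(start, {start}, [])
    return paths

def is_causal(path):
    """A causal path follows the arrows at every step."""
    return all(arrow == "->" for _, arrow, _ in path)

def render(path):
    return path[0][0] + "".join(f" {arrow} {nxt}" for _, arrow, nxt in path)

for p in all_paths(edges, "smoking", "lung cancer"):
    print(render(p), "| causal" if is_causal(p) else "| non-causal")
```

In this small graph, the path through age is non-causal: it connects smoking and lung cancer only because both share a cause.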
Confounding, Mediation, and Collider Bias in DAGs
Confounding
Confounding occurs when a variable influences both the exposure and the outcome, creating a spurious association between them. In a DAG, confounding is represented by a common cause of both the exposure and the outcome (e.g., age influencing both smoking and lung cancer)
Confounding can lead to biased estimates of causal effects if not properly addressed in the analysis
DAGs can help identify potential confounders by examining the paths between the exposure and outcome variables and looking for common causes
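A small simulation can illustrate how a common cause produces a spurious association. This is a sketch under stated assumptions: all variables are continuous and standard normal noise is used, the coefficients are hypothetical, and the exposure has no effect on the outcome by construction. Adjustment is done by partial regression (residualizing both variables on the confounder).

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    b = slope(z, y)
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]

def adjusted_slope(x, y, z):
    """Slope of y on x after removing the linear influence of z from both."""
    return slope(residuals(x, z), residuals(y, z))

rng = random.Random(0)
n = 20000
confounder = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical structure: confounder -> exposure, confounder -> outcome,
# and NO arrow from exposure to outcome.
exposure = [c + rng.gauss(0, 1) for c in confounder]
outcome = [c + rng.gauss(0, 1) for c in confounder]

crude = slope(exposure, outcome)
adjusted = adjusted_slope(exposure, outcome, confounder)
print(f"crude slope: {crude:.2f}, adjusted slope: {adjusted:.2f}")
```

Under these assumptions the crude slope should land near 0.5 even though the true effect is zero, while the adjusted slope should be close to zero.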
Mediation
Mediation occurs when the effect of an exposure on an outcome is partially or fully transmitted through an intermediate variable (the mediator). In a DAG, mediation is represented by a directed path from the exposure to the outcome via the mediator (e.g., smoking → lung inflammation → lung cancer)
Mediation analysis can be used to decompose the total effect of an exposure on an outcome into direct and indirect effects
DAGs can help identify potential mediators and guide the selection of appropriate methods for mediation analysis
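The decomposition into direct and indirect effects can be sketched with a simulated linear system. The coefficients below are hypothetical, and the "difference method" used here (total minus direct) is only one approach; it assumes linear relationships, no exposure-mediator interaction, and no unmeasured mediator-outcome confounding.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    b = slope(z, y)
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(1)
n = 20000
# Hypothetical coefficients: exposure -> mediator (0.5),
# mediator -> outcome (0.8), exposure -> outcome directly (0.3).
exposure = [rng.gauss(0, 1) for _ in range(n)]
mediator = [0.5 * x + rng.gauss(0, 1) for x in exposure]
outcome = [0.3 * x + 0.8 * m + rng.gauss(0, 1) for x, m in zip(exposure, mediator)]

total = slope(exposure, outcome)                      # all directed paths
direct = slope(residuals(exposure, mediator),         # holding the mediator fixed
               residuals(outcome, mediator))
indirect = total - direct                             # difference method
print(f"total={total:.2f} direct={direct:.2f} indirect={indirect:.2f}")
```

With these inputs the total effect should be near 0.3 + 0.5 × 0.8 = 0.7, of which roughly 0.4 travels through the mediator.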
Collider Bias
Collider bias occurs when conditioning on a common effect of two variables (the collider) induces a spurious association between them. In a DAG, a collider is represented by two arrows pointing into the same node (e.g., smoking → lung cancer ← asbestos exposure)
Conditioning on a collider can create a non-causal association between its causes, leading to biased estimates of causal effects
DAGs can help identify potential colliders and guide decisions about conditioning or stratification in the analysis
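The following simulation is a sketch of collider bias under simplifying assumptions: smoking and asbestos exposure are generated as independent continuous scores (hypothetical stand-ins for the real exposures), and both raise a continuous lung cancer risk score. Conditioning on the collider is approximated by partial regression.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    b = slope(z, y)
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(2)
n = 20000
# smoking -> lung cancer <- asbestos; the two causes are independent.
smoking = [rng.gauss(0, 1) for _ in range(n)]
asbestos = [rng.gauss(0, 1) for _ in range(n)]
lung_cancer = [s + a + rng.gauss(0, 1) for s, a in zip(smoking, asbestos)]

unconditional = slope(smoking, asbestos)
# Conditioning on the collider opens the path between its two causes.
conditional = slope(residuals(smoking, lung_cancer),
                    residuals(asbestos, lung_cancer))
print(f"unconditional={unconditional:.2f} conditional={conditional:.2f}")
```

In this setup the unconditional slope should be close to zero, while the slope after conditioning on the collider should be clearly negative (about -0.5): among people with the same risk score, more smoking implies less asbestos exposure.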
Identifying and Addressing Bias
DAGs can be used to identify potential confounders, mediators, and colliders by examining the paths between variables and the direction of the arrows
By explicitly representing these causal relationships, DAGs can guide the selection of appropriate statistical methods for estimating causal effects, such as adjustment for confounders, mediation analysis, or the use of instrumental variables
Researchers can use DAGs to assess the potential impact of unmeasured confounding or measurement error on their estimates of causal effects
DAGs can also help researchers identify situations where causal effects are not identifiable from observational data, prompting the need for alternative study designs or additional assumptions
DAGs for Epidemiological Research
Constructing DAGs
Begin by identifying the key variables relevant to the research question, including the exposure, outcome, and potential confounders, mediators, and effect modifiers (e.g., in a study of the effect of physical activity on cardiovascular disease, relevant variables might include age, sex, diet, and smoking status)
Represent each variable as a node in the DAG, and draw directed edges between nodes to represent the hypothesized causal relationships based on prior knowledge and subject matter expertise
Consider the temporal ordering of variables when drawing edges, as causes must precede their effects
Include both measured and unmeasured variables in the DAG, as omitting important variables can lead to biased estimates of causal effects
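The construction steps above can be sketched as a short script. The edges and the time index assigned to each variable are hypothetical choices for the physical activity example, not claims about the true causal structure; the point is only that each hypothesized arrow can be checked against the assumed temporal ordering.

```python
# Hypothetical time index for each variable (lower = earlier in time).
time_order = {
    "age": 0,
    "sex": 0,
    "diet": 1,
    "smoking status": 1,
    "physical activity": 2,
    "cardiovascular disease": 3,
}

# Hypothesized causal edges as (cause, effect) pairs, based on assumed prior knowledge.
edges = [
    ("age", "physical activity"),
    ("age", "cardiovascular disease"),
    ("sex", "physical activity"),
    ("sex", "cardiovascular disease"),
    ("diet", "cardiovascular disease"),
    ("smoking status", "physical activity"),
    ("smoking status", "cardiovascular disease"),
    ("physical activity", "cardiovascular disease"),
]

def temporal_violations(edges, time_order):
    """Return edges whose cause does not strictly precede its effect."""
    return [(c, e) for c, e in edges if time_order[c] >= time_order[e]]

print(temporal_violations(edges, time_order))
```

If every edge runs from an earlier to a strictly later time index, the graph cannot contain a cycle, so this check also guards the acyclic requirement.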
Refining and Analyzing DAGs
Examine the DAG for potential sources of bias, such as confounding, mediation, or collider bias, by tracing the paths between variables and considering the direction of the arrows
If necessary, refine the DAG by adding or removing variables or edges to better represent the causal structure and address potential sources of bias (e.g., adding a previously omitted confounder or removing an edge based on new evidence)
Use the final DAG to guide the selection of appropriate statistical methods for estimating causal effects, such as adjustment for confounders, mediation analysis, or the use of instrumental variables (e.g., using the backdoor criterion to identify minimal sufficient adjustment sets)
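As a sketch of how the backdoor criterion can be applied mechanically, the code below checks whether a candidate adjustment set (a) contains no descendant of the exposure and (b) blocks every path that enters the exposure through an incoming arrow. It uses brute-force path enumeration, which is only practical for small graphs; the edges combine examples from the text, and all function names are illustrative assumptions.

```python
from itertools import combinations

edges = [
    ("age", "smoking"),
    ("age", "lung cancer"),
    ("smoking", "lung inflammation"),
    ("lung inflammation", "lung cancer"),
]

def descendants(edges, node):
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def paths(edges, start, end):
    """All simple paths between start and end, ignoring arrow direction."""
    adjacent = {}
    for cause, effect in edges:
        adjacent.setdefault(cause, set()).add(effect)
        adjacent.setdefault(effect, set()).add(cause)
    result = []

    def walk(path):
        if path[-1] == end:
            result.append(path)
            return
        for nxt in sorted(adjacent.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return result

def is_blocked(edges, path, adjusted):
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider:
            # A collider blocks the path unless it or a descendant is adjusted for.
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True
        elif node in adjusted:
            # Adjusting for a non-collider (chain or fork) blocks the path.
            return True
    return False

def satisfies_backdoor(edges, exposure, outcome, adjusted):
    adjusted = set(adjusted)
    if adjusted & descendants(edges, exposure):
        return False
    edge_set = set(edges)
    backdoor = [p for p in paths(edges, exposure, outcome)
                if (p[1], exposure) in edge_set]  # path starts with an arrow INTO exposure
    return all(is_blocked(edges, p, adjusted) for p in backdoor)

def minimal_adjustment_sets(edges, exposure, outcome, candidates):
    found = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if any(set(s) <= set(subset) for s in found):
                continue  # a smaller sufficient set already exists
            if satisfies_backdoor(edges, exposure, outcome, subset):
                found.append(subset)
    return found

print(minimal_adjustment_sets(edges, "smoking", "lung cancer",
                              ["age", "lung inflammation"]))
```

In this example, adjusting for age alone satisfies the criterion, while adding lung inflammation does not, because it is a descendant of the exposure (a mediator).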
Interpretation and Limitations
Interpret the results of the statistical analysis in light of the causal assumptions represented in the DAG, and consider the limitations and potential alternative explanations for the findings
Acknowledge that DAGs are based on the researcher's assumptions and subject matter knowledge, and that alternative DAGs may be plausible
Consider the potential impact of unmeasured confounding, measurement error, or violations of causal assumptions on the validity of the estimates
Use sensitivity analyses to assess the robustness of the findings to alternative causal assumptions or potential sources of bias (e.g., simulating the impact of unmeasured confounding)
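One simple form of sensitivity analysis is to simulate an unmeasured confounder of varying strength and observe how far the unadjusted estimate drifts from a known true effect. The sketch below assumes a linear model with a hypothetical true effect of 0.5 and a single unmeasured confounder that affects exposure and outcome with equal strength; real analyses would need assumptions tailored to the study.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def naive_slope(strength, true_effect=0.5, n=20000, seed=0):
    """Estimate the exposure effect while ignoring an unmeasured confounder u."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unmeasured confounder
        x = strength * u + rng.gauss(0, 1)       # u -> exposure
        y = true_effect * x + strength * u + rng.gauss(0, 1)  # u -> outcome
        xs.append(x)
        ys.append(y)
    return slope(xs, ys)

# Bias of the naive estimate for increasing confounder strength.
biases = {s: naive_slope(s) - 0.5 for s in (0.0, 0.5, 1.0)}
for s, b in biases.items():
    print(f"confounder strength {s}: bias {b:+.2f}")
```

Under these assumptions the bias grows with confounder strength, roughly 0, 0.2, and 0.5 for the three settings, which gives a sense of how strong an unmeasured confounder would need to be to explain away an observed estimate.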
Examples in Epidemiological Research
DAGs have been used in epidemiological studies to investigate causal relationships between various exposures and health outcomes, such as the effect of air pollution on respiratory health, the impact of social determinants on health inequalities, and the causal pathways linking diet and physical activity to obesity and chronic diseases
In a study of the effect of maternal smoking on birth outcomes, a DAG could be used to represent the causal relationships between maternal smoking, birth weight, gestational age, and potential confounders such as maternal age, socioeconomic status, and prenatal care
A DAG could be used to guide the analysis of the causal effect of a public health intervention, such as a smoking cessation program, on health outcomes, by representing the causal pathways through which the intervention may influence behavior and health, as well as potential effect modifiers and sources of bias