After the last adventure in statistics, here is another exploration: today we will be looking at DAGs.
DAGs are a systematic way of finding the link between correlation and causation. Suppose we have a phenomenon X and a phenomenon Y, and we want to see how X influences Y. If X directly influences Y, denoted by X → Y, and nothing else is going on, then everything is simple. Normally, however, there are also other variables in play; in the last post we saw an example of this. These other variables can mess up the link between correlation and causation.
To help us find causation in the correlation we can use DAGs, or Directed Acyclic Graphs. A graph here means vertices connected by lines. Because these graphs are directed, the lines have a direction, so they are arrows. Finally, because the graph is acyclic, following the arrows never takes us round in a circle. This was maybe all a bit technical, so let's look at an example.
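To make the definition a bit more concrete, here is a minimal Python sketch (my own illustration, not code from the book) that stores the example graph discussed next, X ← A → Y together with X → Y, as a dictionary of arrows. The helper names `is_acyclic`, `paths_between` and `describe` are hypothetical. `is_acyclic` uses Kahn's algorithm (keep removing vertices with no incoming arrows; if everything gets removed there is no circle), and `paths_between` lists every path between two vertices while ignoring arrow direction, so we can see which paths start with an arrow pointing into X (the so-called backdoor paths).

```python
from collections import deque

# Example DAG: A -> X, A -> Y, X -> Y.
# Each key maps a vertex to the vertices its arrows point to.
dag = {
    "A": ["X", "Y"],
    "X": ["Y"],
    "Y": [],
}


def is_acyclic(graph):
    """Kahn's algorithm: repeatedly remove vertices with no incoming arrows.
    If every vertex can be removed, following the arrows never loops."""
    indegree = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for t in graph[v]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return removed == len(graph)


def paths_between(graph, start, end):
    """All simple paths from start to end, ignoring arrow direction.
    Each step is stored as (from, to, arrow) with arrow '->' or '<-'."""
    edges = [(u, v) for u, targets in graph.items() for v in targets]
    found = []

    def walk(node, visited, steps):
        if node == end:
            found.append(list(steps))
            return
        for u, v in edges:
            if u == node and v not in visited:
                nxt, arrow = v, "->"   # walk along the arrow
            elif v == node and u not in visited:
                nxt, arrow = u, "<-"   # walk against the arrow
            else:
                continue
            steps.append((node, nxt, arrow))
            walk(nxt, visited | {nxt}, steps)
            steps.pop()

    walk(start, {start}, [])
    return found


def describe(path):
    """Render a path such as 'X <- A -> Y'."""
    text = path[0][0]
    for _, nxt, arrow in path:
        text += f" {arrow} {nxt}"
    return text


print(is_acyclic(dag))
for p in paths_between(dag, "X", "Y"):
    kind = "backdoor path" if p[0][2] == "<-" else "path leaving X"
    print(describe(p), "|", kind)
```

A graph with a loop such as `{"X": ["Y"], "Y": ["X"]}` is rejected by `is_acyclic`, since no vertex ever reaches zero incoming arrows.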
We want to see how X influences Y (in a statistics book X is typically called the predictor variable and Y the outcome variable). There is a direct link X → Y, but there is also another path between X and Y that runs through A:
X ← A → Y
This other path is problematic because A influences both X and Y, so if we directly measure the association between X and Y we don't know how much of it is due to A.
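A small simulation can show the problem. The sketch below is a hypothetical setup of my own: A is standard normal, X = A + noise, and Y = 0.5·X + 2·A + noise, so the assumed causal effect of X on Y is 0.5. Regressing Y on X alone (an ordinary least-squares slope, computed by hand with the standard library) mixes in the influence of A through the path X ← A → Y.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
n = 100_000
TRUE_EFFECT = 0.5  # assumed causal effect of X on Y in this toy model

a = [rng.gauss(0.0, 1.0) for _ in range(n)]                # A
x = [ai + rng.gauss(0.0, 1.0) for ai in a]                 # A -> X
y = [TRUE_EFFECT * xi + 2.0 * ai + rng.gauss(0.0, 1.0)     # X -> Y and A -> Y
     for xi, ai in zip(x, a)]

naive = slope(x, y)
print(f"naive slope of Y on X: {naive:.2f} (true effect {TRUE_EFFECT})")
```

In this model the naive slope should land near 1.5 rather than 0.5: it equals 0.5 + 2·Cov(X, A)/Var(X) = 0.5 + 2·(1/2), so the extra 1.0 comes entirely from the path through A.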
So how can we remove this influence? The trick is to trace how the arrows run along each path and then condition, in the right way, on a variable in that path. Conditioning means that, in a sense, I am going to fix that variable. For example, if a variable is gender, then conditioning on it means that I look at males and females separately. In our example we can condition on A, which closes the path X ← A → Y and lets us obtain the causal relation between X and Y.
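Conditioning by "looking at the groups separately" can be sketched in the same way. Here A is a hypothetical binary variable (two groups, like the gender example), with X = 2·A + noise and Y = 0.5·X + 3·A + noise. The pooled slope is distorted by A, while the slope computed within each group of A (stratification, one simple way of conditioning) should come out close to the assumed effect of 0.5.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)
n = 100_000
TRUE_EFFECT = 0.5  # assumed causal effect of X on Y in this toy model

rows = []
for _ in range(n):
    a = 1 if rng.random() < 0.5 else 0                       # binary A (two groups)
    x = 2.0 * a + rng.gauss(0.0, 1.0)                        # A -> X
    y = TRUE_EFFECT * x + 3.0 * a + rng.gauss(0.0, 1.0)      # X -> Y and A -> Y
    rows.append((a, x, y))

# Ignoring A: the path X <- A -> Y is open.
naive = slope([r[1] for r in rows], [r[2] for r in rows])

# Conditioning on A: fit the slope separately inside each group.
within = {}
for group in (0, 1):
    xs = [x for a, x, _ in rows if a == group]
    ys = [y for a, _, y in rows if a == group]
    within[group] = slope(xs, ys)

print(f"pooled slope: {naive:.2f}")
for group, s in within.items():
    print(f"slope within A = {group}: {s:.2f}")
```

With these made-up numbers the pooled slope should be near 1.25, since 0.5 + 3·Cov(X, A)/Var(X) = 0.5 + 3·0.5/2, while both within-group slopes should sit near 0.5. Once A is held fixed, the only remaining connection between X and Y is the arrow X → Y.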
In real statistical problems there are many issues that make discovering causality more difficult. For example, certain variables might not be observable: we know that a certain variable is present and how it relates to the others, but we cannot measure it, or the data are so bad that they do not constitute a proper measurement. Even in these cases you might be in luck, in the sense that you can condition on another variable to close the path, as in the example.
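As an illustration of that last point, consider a hypothetical graph X ← A → B → Y together with X → Y, where A cannot be measured but B can. The path X ← A → B → Y runs through B, so conditioning on B closes it even though A never enters the analysis. The sketch below assumes made-up coefficients and a binary B derived from a noisy version of A; it simulates A but never stores it, to mimic A being unobservable.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(3)
n = 100_000
TRUE_EFFECT = 0.5  # assumed causal effect of X on Y in this toy model

rows = []
for _ in range(n):
    a = rng.gauss(0.0, 1.0)                              # A: present, but never recorded
    b = 1 if a + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0    # A -> B (B is measured)
    x = a + rng.gauss(0.0, 1.0)                          # A -> X
    y = TRUE_EFFECT * x + 3.0 * b + rng.gauss(0.0, 1.0)  # X -> Y and B -> Y
    rows.append((b, x, y))                               # only B, X, Y are kept

naive = slope([r[1] for r in rows], [r[2] for r in rows])

# Condition on the observed B instead of the unobserved A.
within = {}
for group in (0, 1):
    xs = [x for b, x, _ in rows if b == group]
    ys = [y for b, _, y in rows if b == group]
    within[group] = slope(xs, ys)

print(f"pooled slope: {naive:.2f}")
for group, s in within.items():
    print(f"slope within B = {group}: {s:.2f}")
```

The pooled slope is pushed well above 0.5 by the open path through A and B, while the slopes within each value of B should be close to 0.5. This works only because, in this assumed graph, every route from A to Y passes through B.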
References: This post is based on Chapter 6 of McElreath's Statistical Rethinking, which is a must-read. There is also a lovely series of lectures by him on the book.
Cat tax
Good luck with that in this political climate :P
I wonder if there's ever an 'X directly causes Y' relation. Seems like some other conditions must be there for anything to happen.
It is true: there could always be hidden variables that, because you didn't account for them, mess up your DAG and consequently the causality D:
thanks a lot for bringing this
!1UP
Thank you for the support :D