So today I was told that assuming correlation and causation are linked is a massive logical fallacy unbecoming of scientists. But is it really?
How does research actually work when we're looking into complex matters such as homosexuality and a potential genetic basis?
The first step is (whether you like it or not) to find a correlation between variables, so we start by gathering a nice big group of people. The bigger the group is, the better, since a larger sample makes it less likely that any correlation we find is just a chance result.
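To make the sample-size point concrete, here is a minimal sketch (not taken from any real study) that generates two completely unrelated random variables and measures how strong a correlation turns up by pure chance. The function names, the trial count and the sample sizes are all made up for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def typical_chance_correlation(n, trials=200, seed=0):
    """Average |r| between two independent variables for samples of size n.

    The variables are unrelated by construction, so any correlation
    measured here is pure noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        total += abs(pearson(xs, ys))
    return total / trials

# Hypothetical sample sizes, chosen only to show the trend.
for n in (10, 100, 1000):
    print(n, round(typical_chance_correlation(n), 3))
```

The average chance correlation shrinks as the group grows, which is why a correlation found in a big group is harder to write off as a fluke.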
So we start with a hypothesis (what a layman would call a 'theory'; a theory in the scientific sense comes much later) that homosexuality is hereditary, then we see if the evidence can support this idea.
Then we start looking into these people: how do the chances of being gay among a subject's brothers and sisters compare with the chances in the general population? There we find that homosexual men are more likely to have homosexual brothers, so we see a correlation.
Linking correlation to causation is where a bit of thinking comes in. The brother can be gay due to the shared genetics, but they were also raised in the same house – so is it nature or nurture?
Well, let's look at people who share genes with this homosexual individual but weren't raised in the same house.
So we find that their uncles or cousins also have a higher incidence of homosexuality, even though they were raised in different households. This pulls the possibility of causation away from the environment one is raised in and toward a genetic basis.
But finding that – regardless of nurture – there is an increased incidence of homosexuality is not enough to merit saying that it is indeed genetic. This is when we start to look at the actual genes, and their epigenetics (that is the way in which our body regulates these genes), for any possible answers.
If we find genes (or epigenetic factors, such as methylation of a gene's promoter) present in the test group at frequencies that are higher (or lower, it works both ways) than random chance could plausibly explain, given how common they are in the general population, then we say that the correlation between these genes and homosexuality is 'statistically significant'. Now while that doesn't guarantee causation, it does mean that there is something to the whole 'gay gene' idea, since the pattern is consistently non-random.
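As a rough illustration of what 'higher than could be expected by random chance' means, here is a minimal sketch of an exact two-sided binomial test. It is an assumption-laden toy, not the method of any particular study: the counts and the population frequency below are invented, and real genetic association studies use more elaborate models and correct for testing many genes at once.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k carriers among n people if each carries with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_binomial_p(k, n, p0):
    """Exact two-sided p-value for seeing k carriers among n.

    Sums the probability of every outcome that is at least as unlikely
    as the observed one, assuming the test group is no different from a
    population in which the carrier frequency is p0.
    """
    observed = binom_pmf(k, n, p0)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p0)
        # Small tolerance so floating-point ties count as "as unlikely".
        if prob <= observed * (1 + 1e-9):
            total += prob
    return min(1.0, total)

# Hypothetical numbers: the variant is carried by 20% of the general
# population, and 58 of the 200 people in the test group carry it.
p_value = two_sided_binomial_p(58, 200, 0.20)
print(p_value)
```

With these invented numbers the p-value comes out well below the conventional 0.05 threshold, so the excess would be called statistically significant. Had only 40 of the 200 carried the variant, which is exactly what chance alone predicts, the p-value would sit at essentially 1.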
Only after all this has been done can we start to theorise that homosexuality has a genetic component, and then the challenges start! The paper is submitted to a journal in the relevant field (genetics in this example) and then peer reviewed by a panel of researchers who work in the field the paper covers. If they are satisfied with the paper, it gets published.
When this research is published in a journal it gets looked at by many biochemists and statisticians (the readers of said journal) who can then deliver an educated critique on the assumptions, methods, conclusions and such of the research paper. This allows for public (within the scientific community) discussion of the work and for any errors or misunderstandings to be ironed out.
That's how we decide what's real and what isn't: through slow, tedious weeks and months spent looking at every possible factor. The process is a bit more complex than what I've presented here, but I think it covers all the major points of how a study of this nature would work.
Hard math and statistical methods are used to connect correlation and causation, not assumptions.
So don't tell scientists how to do their job when you don't even understand how the method works. They've got this under control.