Saturday, December 26, 2020

Genes and microbes

The body’s assortment of microorganisms depends on what we eat, drugs we take, the stress we are subjected to and the environment we interact with (eg, the infamous SARS-CoV-2 virus). Yet genes also have their say.

In a study of 977 twins in the UK, the most heritable taxonomic group of bacteria was found to be Christensenellaceae (Goodrich et al, 2014). These bacteria are present in higher amounts in genetically lean individuals. They encourage the growth of other microbes connected to body weight and energy conservation, such as methanogenic Archaea.

A study of over 1500 healthy individuals in Canada (Turpin et al, 2016) associated another abundant bacterium, Faecalibacterium, with the immune system gene CNTN6 (rs1394174), and linked several other genes and bacteria of minor clinical importance (rs59846192 of DMRTB1 with Lachnospira, rs28473221 of SALL3 with Eubacterium, and rs62171178 nearest UBR3 with Rikenellaceae).

A new paper posted this month on bioRxiv reports the results of a larger genome-wide association study performed on 7,738 individuals from the northern Netherlands.

The authors investigated 5.5 million common genetic variants using linear mixed models on hundreds of bacterial groups and pathways. Potential confounders such as medication usage, anthropometric data and stool characteristics were carefully considered along with dietary information. 

The strongest associations were identified in intronic regions of genes, in particular between rs182549 in an intronic region of the MCM6 gene and Bifidobacteria. This SNP was found to be responsible for lactose intolerance in European populations. Bifidobacteria are connected to the same condition: they are the most researched and most effective probiotic against lactose intolerance.

Another interesting microbe, Collinsella, and its family Coriobacteriaceae, associated with rheumatoid arthritis, cholesterol metabolism and leaky gut, were linked to several SNPs regulating genes responsible for the blood group antigens. Blood type does matter.

Genetic factors might influence our preferences for vegetables, fruit, starchy foods, meat, fish, dairy and snacks. The Dutch study confirmed an earlier finding that rs642387, a genetic variation near genes influencing brain function, is linked to the microbial family Rikenellaceae. The paper found that these bacteria, when present in large numbers in the gut, led to decreased consumption of salt. The authors also showed that an increase in the bacterial pathway of histidine degradation led to increased intake of processed meat.

The paper provides a wealth of information and comes with a lot of supplementary material.


Goodrich JK, Waters JL, Poole AC, Sutter JL, Koren O, Blekhman R, Beaumont M, Van Treuren W, Knight R, Bell JT, Spector TD. Human genetics shape the gut microbiome. Cell. 2014 Nov 6;159(4):789-99. 

Turpin W, Espin-Garcia O, Xu W, Silverberg MS, Kevans D, Smith MI, Guttman DS, Griffiths A, Panaccione R, Otley A, Xu L. Association of host genome with intestinal microbial composition in a large healthy cohort. Nature genetics. 2016 Nov;48(11):1413.

Lopera-Maya EA, Kurilshikov A, van der Graaf A, Hu S, Andreu-Sánchez S, Chen L, Vila AV, Gacesa R, Sinha T, Collij V, Klaassen MA. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. bioRxiv. 2020 Jan 1.

Monday, March 2, 2020

Sorry I did not quite get that, try again

The holy grail of AI is to fully understand human language in all its nuances. To do that, it should be able to assess, extract and evaluate information from textual data. Where are we now in 2020?

Wednesday, October 30, 2019

Correlations and Interpretations

reproduced from Aurametrix blog

We all know that “correlation does not imply causation.” Correlation means X and Y change together, while causation means X makes Y happen. But, as there's a grain of truth in every joke, seemingly unrelated factors could always be related on some level.

In the case of ice cream sales correlated with drowning deaths or homicides, the connection is the weather. While soup sales increase during the cold winter months, ice cream sales go up when the temperatures rise. On a nice sunny day more people go for a swim or enjoy the outdoors, where there is a wider selection of victims for predators. The most important factor contributing to summer fires is also heat.

The heat wave is an example of a hidden or unseen variable, also known as a confounding variable.

A clinical study might lack controls, such as the use of a placebo. Besides, not all possible sources of error can be controlled. But we can reduce the amount of error if we identify the possible confounding variables and clearly understand their implications.

How to identify a confounder? By checking whether a potential confounding factor is associated with both the outcome and the exposure, and then comparing the association between exposure and outcome before and after adjusting for it.

Here is an example. Let's assume we studied 100 MEBO patients who experience symptoms after stress and other triggers and 100 patients in remission who tolerate MEBO triggers. Of these, 48 participants experienced stress (30 in the active state, 18 in remission) and 152 participants were exposed to low-to-moderate amounts of alcohol (82 of them were in remission). The unadjusted odds ratio (OR) of experiencing symptoms after stress was 1.95, which means that the odds of a flareup were almost twice as high after stress as after consumption of alcohol.

Table 1

Number of flareups after triggers
Trigger    Flareup, no. (%)    No flareup, no. (%)    Total
Stress     30 (63)             18 (37)                48
Alcohol    70 (46)             82 (54)                152

Unadjusted odds ratio = (30 × 82) / (70 × 18) = 1.95
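
The arithmetic behind Table 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `odds_ratio` and its argument names are made up for illustration and do not come from any library.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio for a 2 x 2 table (hypothetical helper)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Table 1: stress is treated as the exposure, alcohol as the comparison group.
# Cases are patients with a flareup, controls are patients without one.
crude_or = odds_ratio(30, 18, 70, 82)
print(round(crude_or, 2))  # 1.95
```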

Now we need to find out whether a specific factor, say Bacteria-X, is related to response to stress and/or alcohol.  By looking at Table 2, we see that 50% of the patients with flareups and 20% of the patients in remission have detectable levels of this bacteria in their gut.

Table 2

Distribution of flareup cases and controls by Bacteria-X status

Have Bacteria-X    Experiencing flareups, no. (%)    In remission, no. (%)
Yes                50 (50)                           20 (20)
No                 50 (50)                           80 (80)

It seems that, with a ratio of 2.5 (50% vs 20%), Bacteria-X is related to flareups. Now we need to understand whether this bacterium is also related to the type of trigger (stress vs alcohol). Table 3 shows the relation between Bacteria-X and type of trigger for the 200 patients. Of the 70 patients with Bacteria-X, 35 (50%) were exposed to high stress, and of the 130 patients without Bacteria-X, 13 (10%) experienced high stress. Thus, with a ratio of 5.0, patients with Bacteria-X were clearly more likely than patients without it to experience stress. At this point, it seems that Bacteria-X is related to both the outcome (flareups) and the exposure (type of trigger).

Table 3

Relation between type of MEBO trigger and presence of Bacteria-X
Bacteria-X present    Total    High stress, no. (%)    Low stress (alcohol), no. (%)
No                    130      13 (10)                 117 (90)
Yes                   70       35 (50)                 35 (50)

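
The two checks made so far (is the candidate factor related to the outcome, and is it related to the exposure?) can be written down as a short Python sketch. The variable names are invented for illustration, and the counts are the ones quoted in the example: 50% and 20% of 100 patients each, plus the stress counts of Table 3.

```python
# Bacteria-X versus the outcome (flareup or remission), 100 patients per group.
with_x_flareup, flareups = 50, 100
with_x_remission, remissions = 20, 100

# Bacteria-X versus the exposure (high stress), counts from Table 3.
stress_with_x, total_with_x = 35, 70
stress_without_x, total_without_x = 13, 130

# Ratio of proportions; a value far from 1 suggests an association.
outcome_ratio = (with_x_flareup / flareups) / (with_x_remission / remissions)
exposure_ratio = (stress_with_x / total_with_x) / (stress_without_x / total_without_x)

print(round(outcome_ratio, 1), round(exposure_ratio, 1))  # 2.5 5.0
```

Both ratios are well above 1, which is what makes Bacteria-X a candidate confounder rather than a proven one; the stratified analysis is still needed.
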
Second, we need to calculate the adjusted OR and compare it with the unadjusted OR. We first stratify the study population into patients with and without Bacteria-X. Within each stratum, a contingency (2 × 2) table is created and the OR is calculated (Table 4). When we calculate the OR separately for patients with and without Bacteria-X, we find that the OR is 1 in each stratum, indicating a lack of association between flareup and type of trigger. We can conclude that the unadjusted OR of 1.95 in Table 1 was owing to the unbalanced distribution of those with Bacteria-X in their gut microbiome between cases and controls. Thus, in this example, Bacteria-X was a confounder, and the association between stress and MEBO symptom flareups was spurious.

Table 4

Calculation of odds ratio after stratifying by presence of bacteria X
Bacteria-X; stress    Flareup, no. (%)    No flareup, no. (%)    Total    Adjusted odds ratio
Absent
 High                 5 (38)              8 (62)                 13       (5 × 72) / (45 × 8) = 360/360 = 1.0
 Low                  45 (38)             72 (62)                117
Present
 High                 25 (71)             10 (29)                35       (25 × 10) / (25 × 10) = 250/250 = 1.0
 Low                  25 (71)             10 (29)                35
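
The stratified calculation can be reproduced with the sketch below. The helper `odds_ratio` and the dictionary layout are assumptions made for illustration. The last step, a Mantel-Haenszel pooled estimate, is a standard way to summarize strata that is not used in the example above; it is included only to show that pooling the strata gives the same answer here.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a * d) / (b * c)

# Each stratum: (stress + flareup, stress + no flareup,
#                alcohol + flareup, alcohol + no flareup), from Table 4.
strata = {
    "Bacteria-X absent": (5, 8, 45, 72),
    "Bacteria-X present": (25, 10, 25, 10),
}

# Adding the strata cell by cell gives back the crude Table 1.
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude)

# Odds ratio within each stratum.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled odds ratio (an addition, not part of the original example).
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print(round(crude_or, 2))  # 1.95
print(stratum_ors)         # {'Bacteria-X absent': 1.0, 'Bacteria-X present': 1.0}
print(round(mh_or, 2))     # 1.0
```

The crude OR of 1.95 drops to 1.0 once the data are split by Bacteria-X status, which is the pattern expected when the factor is a confounder.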

Confounding can be dealt with at the stage of study design (before collecting the data) or at the stage of data analysis (after collecting the data). The commonly used methods to control for confounding factors and improve internal validity are randomization, restriction, matching, stratification, multivariable regression analysis and propensity score analysis.

Notes about odds ratio

The odds ratio is a measure of association between an exposure and an outcome; it approximates the relative risk when the outcome is rare. The odds ratio is calculated using the number of case-patients who did or did not have exposure to a factor and the number of controls who did or did not have the exposure. The odds ratio tells us how much higher the odds of exposure are among case-patients than among controls.

Suppose 200 persons attended a buffet dinner and 55 attendees became ill after it. We asked everyone to describe what they ate (this is called a case-control study, since we compared those with the outcome of interest with those who did not have the outcome). 53 of 54 case-patients and 33 of 40 controls mentioned lettuce in their report. The odds of getting sick after eating lettuce were O1 = 53/33 and the odds of getting sick without eating lettuce were O2 = 1/7, hence the odds ratio for lettuce was about 11.2 (O1/O2).

a = number of persons exposed and with disease: 53
b = number of persons exposed but without disease: 33
c = number of persons unexposed but with disease: 1
d = number of persons unexposed and without disease: 7
a+c = total number of persons with disease (case-patients): 54
b+d = total number of persons without disease (controls): 40

(a/b) / (c/d) = (a × d) / (b × c) = (53 × 7) / (33 × 1) = 11.2
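
As a quick check, the short Python sketch below computes the lettuce odds ratio in three ways: from the odds of disease among exposed and unexposed persons, from the odds of exposure among case-patients and controls, and from the cross product. The variable names are chosen for illustration only.

```python
# Counts from the buffet example.
a, b, c, d = 53, 33, 1, 7

# Odds of disease among those who ate lettuce versus those who did not.
or_disease_odds = (a / b) / (c / d)

# Odds of exposure among case-patients versus controls.
or_exposure_odds = (a / c) / (b / d)

# Cross-product form.
or_cross_product = (a * d) / (b * c)

print(round(or_disease_odds, 1), round(or_exposure_odds, 1), round(or_cross_product, 1))
# 11.2 11.2 11.2
```

All three expressions are algebraically the same quantity, which is why the odds ratio can be read either as odds of disease by exposure or as odds of exposure by disease status.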

Here are real life scenarios from our microbiome study:
Age - a truly confounding variable?


Jager KJ, Zoccali C, Macleod A, Dekker FW. Confounding: what it is and how to deal with it. Kidney international. 2008 Feb 1;73(3):256-60.

Rasmussen SH, Ludeke S, Hjelmborg JV. A major limitation of the direction of causation model: Non-shared environmental confounding. Twin Research and Human Genetics. 2019 Feb;22(1):14-26.

Jing S. A Study on Causal Discovery Considering Confounders.
