The TreeWAS method of Cortes et al.

In my first year as a PhD student, I spent some time going through this paper by Adrian Cortes et al., which introduces the TreeWAS method. TreeWAS is a Bayesian method for genetic association testing of binary phenotypes that are organised in a tree structure, such as diseases in biobank datasets, which are often encoded according to the hierarchy of the International Classification of Diseases (ICD). One advantage of the method is that it explicitly models the hierarchical relationships between diseases through a graphical model, and so can exploit possible pleiotropy to increase the power to detect associations, especially for very granular phenotypes. The authors have made the results of applying the method to the UK Biobank available on a dedicated website, and a follow-up article has now also been published in Nature Genetics.

It took me a while to fully (?) understand the statistical model and the inference algorithms used to fit it, so I wrote down some notes for future reference. I'm sharing them here in the hope that they may be useful to someone. (Note that they haven't been reviewed by any of the authors and so may contain errors. Do let me know if you find one!)

You can find the notes as a PDF here and, if you're curious, take a look at the LaTeX source code on my GitHub.