Resemblance Between Microbiomes: How Distance Measures Shape Variation in the Gill Microbiome of Atlantic Salmon

Authors

Saber, Elle

Abstract

This thesis presents a dual investigation into the analysis of microbiome data. The first component involves the collection and analysis of a novel dataset from the Salmon Enterprises of Tasmania Pty Ltd (SALTAS) breeding program. The second line of investigation focuses on a comprehensive characterisation of distance measures used to quantify similarities between microbiomes. Current approaches in microbiome analysis treat the data as "compositional" because it conveys relative abundance information, and thus advocate a log-ratio transformation. However, the prevalence of zeros in 16S rRNA sequencing data violates core assumptions of compositional data analysis. For this reason, throughout the thesis, microbiome data is referred to as "seemingly compositional". Understanding the variation and heritability of the salmon gill microbiome is an open question in aquaculture. Exploratory analysis of the salmon gill microbiome data suggests that there may be a small heritable component; however, it is dwarfed by the effect of the environment. It becomes clear during the exploratory chapters how the choice of transformation or distance impacts both the variance we observe and our ability to partition the variation. Inspired by the question of heritability, which is premised on the idea that closely related individuals should exhibit more similar traits than unrelated individuals, this research investigates what it means for two microbiomes to be similar. The second half of the thesis is devoted to systematically examining how the choice of distance impacts biological interpretation when applied to seemingly compositional data with zeros. Specifically, I compare Aitchison, Euclidean, Jensen-Shannon, and Hellinger distances.
Using a range of methods for comparison, including a novel visualisation framework, simulation, and analytical results for the eigenvalues of the expected distance matrix, I show that the widely advocated and compositionally valid Aitchison distance can invert the signal and noise in the data, leading to biologically counterintuitive conclusions. In contrast, distances not traditionally considered "compositionally valid", such as Jensen-Shannon and Hellinger distances, demonstrate a better balance between statistical rigour and biological relevance. Notably, the Jensen-Shannon distance emerges as a compromise between the extremes of Aitchison and Euclidean distances, exhibiting a surprising similarity to Hellinger distance. This research introduces alternative ways of understanding and measuring dissimilarities in multivariate phenotypes. It addresses both the practical analysis of a novel microbiome dataset and the theoretical characterisation of distance measures on "seemingly compositional" data. While rooted in quantitative genetics and the study of the salmon gill microbiome, the methodologies and insights extend to broader applications in multi-omics and ecological studies. This contribution enhances the interpretation of complex biological data and supports efforts to improve health and disease resistance in breeding programs through informed strategies.
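
As an illustration only, the four distances named in the abstract can be sketched with their standard textbook definitions. This is not the thesis code: the function names, the example counts, and in particular the pseudocount used to make the Aitchison distance computable when zeros are present are assumptions introduced here, and the thesis may handle zeros differently.

```python
import numpy as np

def closure(counts):
    """Convert a vector of counts to proportions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def euclidean(p, q):
    """Euclidean distance between two proportion vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

def hellinger(p, q):
    """Hellinger distance; bounded by 1, well defined when entries are zero."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def jensen_shannon(p, q):
    """Jensen-Shannon distance (square root of the base-2 divergence)."""
    m = 0.5 * (p + q)

    def kl(a, b):
        # Convention 0 * log(0) = 0: skip entries where a is zero.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(divergence, 0.0)))

def aitchison(x, y, pseudocount=0.5):
    """Euclidean distance between centred log-ratio (clr) transforms.

    The log of zero is undefined, so a pseudocount is added to the raw
    counts first. ASSUMPTION: the value 0.5 is a hypothetical choice
    made for this sketch, not one taken from the thesis.
    """
    log_x = np.log(np.asarray(x, dtype=float) + pseudocount)
    log_y = np.log(np.asarray(y, dtype=float) + pseudocount)
    clr_x = log_x - log_x.mean()
    clr_y = log_y - log_y.mean()
    return float(np.sqrt(np.sum((clr_x - clr_y) ** 2)))

# Hypothetical taxon counts for two samples, each containing a zero.
sample_a = np.array([120, 30, 0, 50])
sample_b = np.array([100, 0, 10, 90])
p, q = closure(sample_a), closure(sample_b)

for name, value in [
    ("Euclidean", euclidean(p, q)),
    ("Hellinger", hellinger(p, q)),
    ("Jensen-Shannon", jensen_shannon(p, q)),
    ("Aitchison (pseudocount 0.5)", aitchison(sample_a, sample_b)),
]:
    print(f"{name}: {value:.3f}")
```

Under these definitions, the Euclidean, Hellinger, and Jensen-Shannon distances are computed directly on proportions that contain zeros, whereas the Aitchison distance depends on the zero-replacement step, so its value changes with the chosen pseudocount.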
