Introduction
The human microbiome is the totality of the microbes (microbiota), their genetic elements (metagenome) and the interactions they have with surrounding environments throughout the human body.1 In contrast to the human genome, the human microbiome is highly variable: it displays substantial intra-individual variation across body sites (gut, skin, lung, vagina, oral cavity, etc), inter-individual variation at the same body site and intra-individual variation over time in longitudinal studies.2
The human microbiome plays a key role in human health and disease. A preponderance of studies have implicated the microbiome in the pathogenesis of many human diseases, such as obesity, diabetes, alcoholic liver disease, vaginosis and even cancers.1 3 The genotypic effect on the microbiome may explain the missing link between genetics and disease, since a disease-susceptibility genotype may affect the disease outcome by altering the microbiome composition.4 5 Therefore, identifying the factors that influence the microbiome composition and characterising their relationship with biological or clinical outcomes can help clarify the underlying disease mechanisms and open the possibility of modulating the microbiome composition for therapeutic purposes.
Fuelled by the technological advancement of next-generation sequencing, the human microbiome can be interrogated using high-throughput sequencing. One strategy amplifies and sequences the bacterial 16S ribosomal RNA gene from the samples. Similar sequences are then clustered into operational taxonomic units (OTUs). By comparing OTUs with reference databases, we identify the species present in the samples and obtain the OTU abundance profiles. The OTU abundance profiles form a matrix whose (i, j)-th element is the number of sequence reads that represent the j-th OTU (or species, roughly speaking) in the i-th subject. This count matrix forms the foundation for statistical analyses.6 The notable features of OTU abundance data are high dimensionality (p>>n, that is, far more OTUs than subjects) and skewed counts with a preponderance of zeros. One line of research aims to advance statistical tools that directly tackle these data features in order to find the individual OTUs responsible for certain diseases of interest.7 8 Another emerging paradigm, however, studies the impact of the overall microbiome composition, summarised by diversity metrics such as alpha-diversity and beta-diversity,6 which we introduce and focus on in this paper.
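To make the structure of the OTU count matrix concrete, the following is a minimal sketch in Python (NumPy only). The toy counts are invented for illustration, and the function names zero_fraction, shannon_index and bray_curtis are hypothetical helpers rather than part of any package. The Shannon index and the Bray-Curtis dissimilarity are used here only as common examples of an alpha-diversity and a beta-diversity metric, respectively; other choices are equally possible, and the sketch applies Bray-Curtis to raw counts without any normalisation for sequencing depth.

```python
import numpy as np

# Hypothetical toy OTU count matrix: rows are subjects (i), columns are OTUs (j).
# Entry (i, j) is the number of reads assigned to OTU j in subject i.
# All values are invented for illustration only.
counts = np.array([
    [10, 0, 0, 5, 0],
    [0, 3, 0, 0, 12],
    [7, 0, 1, 0, 0],
])


def zero_fraction(matrix):
    """Proportion of zero entries, a simple measure of sparsity."""
    return float(np.mean(matrix == 0))


def shannon_index(row):
    """Shannon index of one subject (one example of an alpha-diversity metric)."""
    total = row.sum()
    if total == 0:
        return 0.0  # convention chosen here for a subject with no reads
    p = row[row > 0] / total  # relative abundances of the observed OTUs
    return float(-(p * np.log(p)).sum())


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two subjects, computed on raw counts
    (one example of a beta-diversity metric). Assumes u and v are not both empty."""
    return float(np.abs(u - v).sum() / (u + v).sum())


if __name__ == "__main__":
    print("fraction of zeros:", zero_fraction(counts))
    print("Shannon index per subject:", [shannon_index(r) for r in counts])
    print("Bray-Curtis, subjects 1 vs 2:", bray_curtis(counts[0], counts[1]))
```

Alpha-diversity yields one number per subject (within-sample diversity), whereas beta-diversity yields one number per pair of subjects (between-sample dissimilarity), so both reduce the high-dimensional, zero-inflated count matrix to low-dimensional summaries.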