Scientific article

Alignment-independent bilinear multivariate modelling (AIBIMM) for global analyses of 16S rRNA gene phylogeny

Rudi, Knut; Zimonja, Monika; Næs, Tormod


Journal: International Journal of Systematic and Evolutionary Microbiology, vol. 56, p. 10–1, 2006

International standard numbers:
Print: 1466-5026
Electronic: 1466-5034

Open Access: none

Alignment-independent phylogenetic methods have interesting properties for global phylogenetic reconstructions, particularly with respect to speed and accuracy. Here, we present a novel multimer-based alignment-independent bilinear mathematical modelling (AIBIMM) approach for global 16S rRNA gene phylogenetic analyses. In AIBIMM, jackknife cross-validated principal component analyses (PCA) are used to explain the variance in nucleotide n-mer frequency data. We compared AIBIMM with alignment-based distance, maximum-parsimony and maximum-likelihood phylogenetic methods, analysing taxa belonging to the Proteobacteria (n=82), Actinobacteria (n=30) and Archaea (n=7). These analyses indicated an attraction between the Actinobacteria and Archaea for the traditional methods, with the two taxa Acidimicrobium and Rubrobacter at the root of the tree. AIBIMM, on the other hand, showed that the Actinobacteria was tightly clustered, with Acidimicrobium and Rubrobacter within a distinct subgroup of the Actinobacteria. The application of AIBIMM was further evaluated, analysing full-length 16S rRNA gene sequences for 2818 taxa representing the prokaryotic domains. We obtained a highly structured description of the prokaryote diversity. Sample-to-model (Si) distances were also determined for taxa included in our work. We determined Si distances for models of the six major subgroups of taxa detected in the global analyses, in addition to nested subgroups within the Alphaproteobacteria. The Si-distance evaluation showed a very good separation of the taxa within the models from those outside. We conclude that AIBIMM represents a novel phylogenetic framework suitable for accommodating the current exponential growth of 16S rRNA gene sequences in the public domain.
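
The core idea described in the abstract (n-mer frequencies, PCA, and a sample-to-model distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice of n=3, the number of components, and the toy sequences are all assumptions made for illustration. The jackknife cross-validation step is omitted, and the residual norm used here is only one plausible reading of a sample-to-model distance, since the abstract does not define the Si distance.

```python
from itertools import product

import numpy as np


def nmer_frequencies(seq, n=3):
    """Return relative frequencies of all 4**n DNA n-mers in seq (hypothetical helper)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=n))}
    counts = np.zeros(len(index))
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])  # n-mers with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def fit_pca(X, n_components=2):
    """Plain PCA via SVD of the mean-centred data (no cross-validation)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components]          # rows are principal directions
    scores = (X - mean) @ loadings.T      # sample coordinates in the model
    return mean, loadings, scores


def residual_distance(x, mean, loadings):
    """Norm of the part of x that the PCA model does not explain (assumed metric)."""
    centered = x - mean
    residual = centered - (centered @ loadings.T) @ loadings
    return float(np.linalg.norm(residual))


# Made-up toy sequences, not real 16S rRNA gene data.
toy_sequences = [
    "ACGTACGTTAGCCGATAGCTAGGA",
    "ACGTACGATAGCCGATAGGTAGGA",
    "GGGCCCGGGCGCGCCCGGGACGCC",
    "GGGCCCGCGCGCGCCTGGGACGCC",
]
X = np.array([nmer_frequencies(s, n=3) for s in toy_sequences])
mean, loadings, scores = fit_pca(X, n_components=2)

# Distance of a new (also made-up) sequence to the fitted model.
new_profile = nmer_frequencies("TTTTAAAATTTTAAAATTTTAAAA", n=3)
distance = residual_distance(new_profile, mean, loadings)
```

Because the n-mer profile has a fixed length of 4**n regardless of sequence length, no alignment is needed before the sequences are compared, which is the property the abstract highlights.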