Phylogenetic Model Selection via Machine Learning

Date

2024

Authors

Dong, Yanghe

Abstract

Phylogenetic inference, which reconstructs evolutionary trees from DNA or amino acid sequences, is crucial for understanding the evolutionary histories of species on Earth. Model selection is a fundamental step in this process, determining the best-fit model for the data. However, classic maximum likelihood-based methods for model selection are computationally intensive. This study introduces a machine learning-based framework for amino acid model selection, consisting of three components: protFinder for selecting the best-fit substitution model, RHASFinder for identifying the appropriate rate heterogeneity model, and protFFinder for determining the use of empirical pre-estimated frequencies. Our framework is an order of magnitude faster than the widely used ModelFinder, while maintaining comparable accuracy.

Description

Deposited by the author 27.10.24

Keywords

phylogenetics, amino acid, model selection, rate heterogeneity, neural network

Type

Thesis (Masters)
