http://opendata.unex.es/recurso/ciencia-tecnologia/investigacion/tesis/Tesis/2016-48

One of the most outstanding research topics in the field of bioinformatics is the reconstruction of evolutionary relationships among species. By studying the molecular features of living organisms, phylogenetic inference procedures seek to provide hypotheses about the evolutionary events which led to the current biodiversity in nature. The result of a phylogenetic analysis is given by a tree-shaped data structure known as phylogenetic tree, whose topology describes the evolutionary history of the input species by defining ancestor-descendant relationships. The biological quality of the inferred phylogenies is usually evaluated by means of optimality criteria, which allow the modelling of phylogenetic inference as an optimization problem.In this context, several key problems must be addressed. Firstly, the reconstruction of optimal solutions implies the processing of a search space whose size grows exponentially with the number of species under study. Additional difficulties are given by the fact that evaluation procedures require complex computations whose number grows linearly with the length of the input molecular sequences. A more controversial problem lies on the choice of the preferred optimality criterion, as it represents one of the most troublesome sources of conflict in phylogenetics. Situations where different optimality criteria give support to conflicting evolutionary histories for a given dataset can be solved by proposing a compromise view of phylogenetics based on multiobjective optimization. This PhD Thesis aims to apply bioinspired computing to perform real phylogenetic analyses according to the maximum parsimony and maximum likelihood criteria. Such goal requires a multiobjective formulation of the problem which considers the optimization of both objective functions simultaneously, in such a way that the output of the procedure will be given not by a single solution, but a set of multiobjective phylogenetic trees which represent good compromises between divergent criteria. The implications of performing multiobjective phylogenetic analyses have not been fully evaluated in the literature, so this research seeks to carry out comprehensive comparative studies among different multiobjective algorithms, in order to decide which algorithmic designs maximize the quality of the inferred trees from a multiobjective and biological point of view.Due to the significant complexity of problem, this Thesis is especially focused on the exploitation of modern parallel architectures to conduct real phylogenetic analyses. More specifically, this research undertakes the introduction of parallel computing techniques to the applied multiobjective designs, with the aim of taking advantage of the computing capabilities offered by different hardware setups: shared-memory architectures, multicore clusters, and heterogeneous CPU+GPU systems. In summary, this PhD Thesis is focused on the application of parallel and bioinspired computing to tackle the phylogenetic inference problem. The key goals of this research include the definition of a multiobjective formulation of phylogenetics to address conflicts in real biological analyses, the study and assessment of a variety of bioinspired multiobjective designs, and their efficient parallelization on different hardware architectures. Through the comparison with other state-of-the-art phylogenetic tools, we give account of the relevant parallel, multiobjective, and biological performance attained by the proposed multiobjective designs.*Bibliography (samples)[1] P. Lemey, M. Salemi, A.-M. Vandamme, The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing, Cambridge Univ. Press, Cambridge, 2009.[2] W. Fitch, Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology, Systematic Zoology 20 (4) (1972) 406-416.[3] J. Felsenstein, Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach, Journal of Molecular Evolution 17 (6) (1981) 368-376.[4] D. A. Bader, A. Stamatakis, C. W. Tseng, Computational Grand Challenges in Assembling the Tree of Life: Problems and Solutions, in: Advances in Computers, Vol. 68, Elsevier, 2006, pp. 127-176.[5] G. B. Fogel, Evolutionary Computation for the Inference of Natural Evolutionary Histories, IEEE Connections 3 (1) (2005) 11-14.[6] J. R. Macey, Plethodontid salamander mitochondrial genomics: A parsimony evaluation of character conflict and implications for historical biogeography, Cladistics 21 (2) (2005) 194-202.[7] J. J. Wiens, M. R. Servedio, Phylogenetic analysis and intraspecific variation: performance of parsimony, likelihood, and distance methods. Systematic Biology 47 (2) 228-253.[8] C. Coello, C. Dhaenens, L. Jourdan, Advances in Multi-Objective Nature Inspired Computing, Springer Verlag, Berlin / Heidelberg, 2010.[9] W. Gropp, W. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd ed., The MIT Press, Cambridge, MA, USA, 1999.[10] B. Chapman, G. Jost, R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, The MIT Press, Cambridge, MA, USA, 2007.[11] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6 (2) (2002) 182-197.[12] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, in: Proceedings of Evolutionary Methods for Design, Optimization, and Control with Applications to Industrial Problems (EUROGEN02), 2002, pp. 95-100.[13] E. Zitzler, S. Künzli, Indicator-Based Selection in Multiobjective Search, in: Parallel Problem Solving From Nature VIII, Vol. 3242 of LNCS, Springer Verlag, 2004, pp. 832-842.[14] D. Karaboga, B. Basturk, A Powerful and Efficient Algorithm for Numerical Function Optimization: Artificial Bee Colony (ABC) Algorithm, Journal of Global Optimization 39 (3) (2007) 459-171.[15] X. S. Yang, Firefly algorithm, stochastic test functions and design optimization, International Journal of Bio-Inspired Computation 2(2) (2010) 78-84.[16] X. S. Yang, X. He, Bat algorithm: literature review and applications, International Journal of Bio-Inspired Computation 5(3) (2013) 141-149.[17] H. Matsuda, Construction of Phylogenetic Trees from Amino acid Sequences using a Genetic Algorithm, in: Genome Informatics Workshop, Universal Academy Press, 1995, pp. 19-28.[18] P. O. Lewis, A Genetic Algorithm for Maximum-Likelihood Phylogeny Inference Using Nucleotide Sequence Data, Molecular Biology and Evolution 15 (3) (1998) 277-283.[19] A. Skourikhine, Phylogenetic tree reconstruction using self-adaptive genetic algorithm, in: Proceedings of the 1st IEEE International Symposium on Bioinformatics and Biomedical Engineering, 2000, pp. 129-134.[20] T. H. Reijmers, R. Wehrens, F. D. Daeyaert, P. J. Lewi, L. M. C Buydens, Using genetic algorithms for the construction of phylogenetic trees: application to G-protein coupled receptor sequences, Biosystems 49 (1) (1999) 31-43.[21] A. Moilanen, Searching for most parsimonious trees with simulated evolutionary optimization, Cladistics 15 (1) (1999) 39-50.[22] C. B. Congdon, K. J. Septor, Phylogenetic Trees using Evolutionary Search: Initial Progress in Extending Gaphyl to Work with Genetic Data, in: CEC 2003, IEEE Computer Society, 2003, pp. 320-326.[23] J. E. Gallardo, C. Cotta, A. J. Fernández, Reconstructing phylogenies with memetic algorithms and branch-and-bound, in: Analysis of Biological Data: A Soft Computing Approach, World Scientific, 2007, pp. 59-84.[24] D. Zwickl, Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets Under the Maximum Likelihood Criterion, Ph.D Thesis, Univ. Texas at Austin, 2006.[25] R. Helaers, M. C. Milinkovitch, MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics, BMC Bioinformatics 11 (1) (2010) 379-389.[26] P. A. Goloboff, J. S. Farris, K. C. Nixon, TNT, a free program for phylogenetic analysis, Cladistics 24 (5) (2008) 774-786.[27] A. Stamatakis, RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies, Bioinformatics 30 (9) (2014) 1312-1313.[28] C. Ceron, J. Dopazo, E. L. Zapata, J. M. Carazo, O. Trelles, Parallel implementation for DNAml program on message-passing architectures, Parallel Computing 24 (5-6) (1998) 701-716.[29] H. Schmidt, K. Strimmer, M. Vingron, A. Haeseler, Tree-puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing, Bioinformatics 18 (3) (2002) 502-504.[30] Q. Snell, M. Whiting, M. Clement, D. McLaughlin, Parallel phylogenetic inference, in: Proceedings of the 2000 ACM/IEEE conference on Supercomputing, IEEE Computer Society, 2001, Article 35.[31] K. Katoh, K. Kuma, T. Miyata, Genetic algorithm-based maximum likelihood analysis for molecular phylogeny, Journal of Molecular Evolution 53 (4-5) (2001) 477-484.[32] M. J. Brauer, M. T. Holder, L. A. Dries, D. J. Zwickl, P. O. Lewis, D. M. Hillis, Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference, Molecular Biology and Evolution 19 (10) (2002) 1717-1726.[33] B. Q. Minh, L. S. Vinh, A. von Haeseler, H. A. Schmidt, pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies, Bioinformatics 21 (19) (2005) 3794-3796.[34] F. Ronquist, et al., MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space, Systematic Biology 61 (3) (2012) 539-542.[35] F. Pratas, P. Trancoso, L. Sousa, A. Stamatakis, G. Shi, V. Kindratenko, Fine-grain parallelism using multi-core, Cell/BE, and GPU systems, Parallel Computing 38 (8) (2012) 365-390.[36] L. Poladian, L. Jermiin, Multi-Objective Evolutionary Algorithms and Phylogenetic Inference with Multiple Data Sets, Soft Computing 10 (4) (2006) 359-368.[37] G. P. Coelho, A. E. A. Silva, F. J. V. Zuben, An Immune-Inspired Multi-Objective Approach to the Reconstruction of Phylogenetic Trees, Neural Computing and Applications 19 (8) (2010) 1103-1132.[38] W. Cancino, A. C. B. Delbem, A Multi-Criterion Evolutionary Approach Applied to Phylogenetic Reconstruction, in: New Achievements in Evol. Comp., InTech, 2010, pp. 135-156.[39] W. Cancino, L. Jourdan, E. G. Talbi, A. C. B. Delbem, Parallel Multi-Objective Approaches for Inferring Phylogenies, in: Proc. of EVOBIO 2010, Vol. 6023 of LNCS, Springer Verlag, 2010, pp. 26-37.[40] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, V. G. D. Fonseca, Performance Assessment of Multiobjective Optimizers: An Analysis and Review, IEEE Transactions on Evolutionary Computation 7 (2) (2003), 117-132.[41] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures. 5th edition, Chapman & Hall/CRC, New York, NY, USA, 2011.[42] H. Shimodaira, M. Hasegawa, CONSEL: for assessing the confidence of phylogenetic tree selection, Bioinformatics 17 (12) (2001) 1246-1247.

Literals

  • ou:tribunal
    • Pires Seabara De Sousa, Leonel Augusto (Vocal)
    • Rojas Ruiz, Ignacio (Vocal)
    • Granado Criado, José María (Secretario)
    • Ortega Lopera, Julio (Presidente)
    • Senar Rosell, Miquel Àngel (Vocal)
  • dcterms:director
    • Vega Rodríguez, Miguel Ángel (Director)
  • dcterms:description
    • One of the most outstanding research topics in the field of bioinformatics is the reconstruction of evolutionary relationships among species. By studying the molecular features of living organisms, phylogenetic inference procedures seek to provide hypotheses about the evolutionary events which led to the current biodiversity in nature. The result of a phylogenetic analysis is given by a tree-shaped data structure known as phylogenetic tree, whose topology describes the evolutionary history of the input species by defining ancestor-descendant relationships. The biological quality of the inferred phylogenies is usually evaluated by means of optimality criteria, which allow the modelling of phylogenetic inference as an optimization problem.In this context, several key problems must be addressed. Firstly, the reconstruction of optimal solutions implies the processing of a search space whose size grows exponentially with the number of species under study. Additional difficulties are given by the fact that evaluation procedures require complex computations whose number grows linearly with the length of the input molecular sequences. A more controversial problem lies on the choice of the preferred optimality criterion, as it represents one of the most troublesome sources of conflict in phylogenetics. Situations where different optimality criteria give support to conflicting evolutionary histories for a given dataset can be solved by proposing a compromise view of phylogenetics based on multiobjective optimization. This PhD Thesis aims to apply bioinspired computing to perform real phylogenetic analyses according to the maximum parsimony and maximum likelihood criteria. Such goal requires a multiobjective formulation of the problem which considers the optimization of both objective functions simultaneously, in such a way that the output of the procedure will be given not by a single solution, but a set of multiobjective phylogenetic trees which represent good compromises between divergent criteria. The implications of performing multiobjective phylogenetic analyses have not been fully evaluated in the literature, so this research seeks to carry out comprehensive comparative studies among different multiobjective algorithms, in order to decide which algorithmic designs maximize the quality of the inferred trees from a multiobjective and biological point of view.Due to the significant complexity of problem, this Thesis is especially focused on the exploitation of modern parallel architectures to conduct real phylogenetic analyses. More specifically, this research undertakes the introduction of parallel computing techniques to the applied multiobjective designs, with the aim of taking advantage of the computing capabilities offered by different hardware setups: shared-memory architectures, multicore clusters, and heterogeneous CPU+GPU systems. In summary, this PhD Thesis is focused on the application of parallel and bioinspired computing to tackle the phylogenetic inference problem. The key goals of this research include the definition of a multiobjective formulation of phylogenetics to address conflicts in real biological analyses, the study and assessment of a variety of bioinspired multiobjective designs, and their efficient parallelization on different hardware architectures. Through the comparison with other state-of-the-art phylogenetic tools, we give account of the relevant parallel, multiobjective, and biological performance attained by the proposed multiobjective designs.*Bibliography (samples)[1] P. Lemey, M. Salemi, A.-M. Vandamme, The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing, Cambridge Univ. Press, Cambridge, 2009.[2] W. Fitch, Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology, Systematic Zoology 20 (4) (1972) 406-416.[3] J. Felsenstein, Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach, Journal of Molecular Evolution 17 (6) (1981) 368-376.[4] D. A. Bader, A. Stamatakis, C. W. Tseng, Computational Grand Challenges in Assembling the Tree of Life: Problems and Solutions, in: Advances in Computers, Vol. 68, Elsevier, 2006, pp. 127-176.[5] G. B. Fogel, Evolutionary Computation for the Inference of Natural Evolutionary Histories, IEEE Connections 3 (1) (2005) 11-14.[6] J. R. Macey, Plethodontid salamander mitochondrial genomics: A parsimony evaluation of character conflict and implications for historical biogeography, Cladistics 21 (2) (2005) 194-202.[7] J. J. Wiens, M. R. Servedio, Phylogenetic analysis and intraspecific variation: performance of parsimony, likelihood, and distance methods. Systematic Biology 47 (2) 228-253.[8] C. Coello, C. Dhaenens, L. Jourdan, Advances in Multi-Objective Nature Inspired Computing, Springer Verlag, Berlin / Heidelberg, 2010.[9] W. Gropp, W. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd ed., The MIT Press, Cambridge, MA, USA, 1999.[10] B. Chapman, G. Jost, R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, The MIT Press, Cambridge, MA, USA, 2007.[11] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6 (2) (2002) 182-197.[12] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, in: Proceedings of Evolutionary Methods for Design, Optimization, and Control with Applications to Industrial Problems (EUROGEN02), 2002, pp. 95-100.[13] E. Zitzler, S. Künzli, Indicator-Based Selection in Multiobjective Search, in: Parallel Problem Solving From Nature VIII, Vol. 3242 of LNCS, Springer Verlag, 2004, pp. 832-842.[14] D. Karaboga, B. Basturk, A Powerful and Efficient Algorithm for Numerical Function Optimization: Artificial Bee Colony (ABC) Algorithm, Journal of Global Optimization 39 (3) (2007) 459-171.[15] X. S. Yang, Firefly algorithm, stochastic test functions and design optimization, International Journal of Bio-Inspired Computation 2(2) (2010) 78-84.[16] X. S. Yang, X. He, Bat algorithm: literature review and applications, International Journal of Bio-Inspired Computation 5(3) (2013) 141-149.[17] H. Matsuda, Construction of Phylogenetic Trees from Amino acid Sequences using a Genetic Algorithm, in: Genome Informatics Workshop, Universal Academy Press, 1995, pp. 19-28.[18] P. O. Lewis, A Genetic Algorithm for Maximum-Likelihood Phylogeny Inference Using Nucleotide Sequence Data, Molecular Biology and Evolution 15 (3) (1998) 277-283.[19] A. Skourikhine, Phylogenetic tree reconstruction using self-adaptive genetic algorithm, in: Proceedings of the 1st IEEE International Symposium on Bioinformatics and Biomedical Engineering, 2000, pp. 129-134.[20] T. H. Reijmers, R. Wehrens, F. D. Daeyaert, P. J. Lewi, L. M. C Buydens, Using genetic algorithms for the construction of phylogenetic trees: application to G-protein coupled receptor sequences, Biosystems 49 (1) (1999) 31-43.[21] A. Moilanen, Searching for most parsimonious trees with simulated evolutionary optimization, Cladistics 15 (1) (1999) 39-50.[22] C. B. Congdon, K. J. Septor, Phylogenetic Trees using Evolutionary Search: Initial Progress in Extending Gaphyl to Work with Genetic Data, in: CEC 2003, IEEE Computer Society, 2003, pp. 320-326.[23] J. E. Gallardo, C. Cotta, A. J. Fernández, Reconstructing phylogenies with memetic algorithms and branch-and-bound, in: Analysis of Biological Data: A Soft Computing Approach, World Scientific, 2007, pp. 59-84.[24] D. Zwickl, Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets Under the Maximum Likelihood Criterion, Ph.D Thesis, Univ. Texas at Austin, 2006.[25] R. Helaers, M. C. Milinkovitch, MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics, BMC Bioinformatics 11 (1) (2010) 379-389.[26] P. A. Goloboff, J. S. Farris, K. C. Nixon, TNT, a free program for phylogenetic analysis, Cladistics 24 (5) (2008) 774-786.[27] A. Stamatakis, RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies, Bioinformatics 30 (9) (2014) 1312-1313.[28] C. Ceron, J. Dopazo, E. L. Zapata, J. M. Carazo, O. Trelles, Parallel implementation for DNAml program on message-passing architectures, Parallel Computing 24 (5-6) (1998) 701-716.[29] H. Schmidt, K. Strimmer, M. Vingron, A. Haeseler, Tree-puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing, Bioinformatics 18 (3) (2002) 502-504.[30] Q. Snell, M. Whiting, M. Clement, D. McLaughlin, Parallel phylogenetic inference, in: Proceedings of the 2000 ACM/IEEE conference on Supercomputing, IEEE Computer Society, 2001, Article 35.[31] K. Katoh, K. Kuma, T. Miyata, Genetic algorithm-based maximum likelihood analysis for molecular phylogeny, Journal of Molecular Evolution 53 (4-5) (2001) 477-484.[32] M. J. Brauer, M. T. Holder, L. A. Dries, D. J. Zwickl, P. O. Lewis, D. M. Hillis, Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference, Molecular Biology and Evolution 19 (10) (2002) 1717-1726.[33] B. Q. Minh, L. S. Vinh, A. von Haeseler, H. A. Schmidt, pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies, Bioinformatics 21 (19) (2005) 3794-3796.[34] F. Ronquist, et al., MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space, Systematic Biology 61 (3) (2012) 539-542.[35] F. Pratas, P. Trancoso, L. Sousa, A. Stamatakis, G. Shi, V. Kindratenko, Fine-grain parallelism using multi-core, Cell/BE, and GPU systems, Parallel Computing 38 (8) (2012) 365-390.[36] L. Poladian, L. Jermiin, Multi-Objective Evolutionary Algorithms and Phylogenetic Inference with Multiple Data Sets, Soft Computing 10 (4) (2006) 359-368.[37] G. P. Coelho, A. E. A. Silva, F. J. V. Zuben, An Immune-Inspired Multi-Objective Approach to the Reconstruction of Phylogenetic Trees, Neural Computing and Applications 19 (8) (2010) 1103-1132.[38] W. Cancino, A. C. B. Delbem, A Multi-Criterion Evolutionary Approach Applied to Phylogenetic Reconstruction, in: New Achievements in Evol. Comp., InTech, 2010, pp. 135-156.[39] W. Cancino, L. Jourdan, E. G. Talbi, A. C. B. Delbem, Parallel Multi-Objective Approaches for Inferring Phylogenies, in: Proc. of EVOBIO 2010, Vol. 6023 of LNCS, Springer Verlag, 2010, pp. 26-37.[40] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, V. G. D. Fonseca, Performance Assessment of Multiobjective Optimizers: An Analysis and Review, IEEE Transactions on Evolutionary Computation 7 (2) (2003), 117-132.[41] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures. 5th edition, Chapman & Hall/CRC, New York, NY, USA, 2011.[42] H. Shimodaira, M. Hasegawa, CONSEL: for assessing the confidence of phylogenetic tree selection, Bioinformatics 17 (12) (2001) 1246-1247.
  • dcterms:creator
    • Santander Jiménez, Sergio
  • dcterms:identifier
    • 2016-48
  • ou:programaDoctorado
    • Tecnologías Informáticas Y Comunicaciones (Tinc)
  • dcterms:subject
    • Informatica
    • Genetica De Poblaciones
    • Inteligencia Artificial
    • Heuristica
  • ou:tesisDehesa
  • dcterms:title
    • Análisis E Inferencia Multiobjetivo De Hipótesis Filogenéticas Mediante Computación Paralela Y Bioinspirada - Multiobjective Analysis And Inference Of Phylogenetic Hypotheses By Means Of Parallel And Bioinspired Computing
  • vcard:url

Typed Literals

Recognized prefixes