A Multi-Objective approach to the optimization of Multiple Sequence Alignment (MSA)
Abstract
Multiple Sequence Alignment (MSA) is one of the main topics in the bioinformatics domain, consists in finding an optimal alignment for three or more biological sequences with the number maximum of conserved zones or totally aligned columns. Different scores to assess the quality of the alignments have been proposed, so the problem can be formulated and resolved as a Multi-Objective Optimization Problem (MOP). For this reason we have carried out a perfomanced study resolving the MSA problem under a multi-objective approach, considering two popular metrics as objectives to be optimized: The weighted Sum-Of-Pairs with affine gap penalties (wSOP) and the Totally Aligned Columns (TC), with three algorithms from the state-of- the-art of Multi-Objective Optimization: NSGAII, SPEA2 and MOCell. Our experiments reveals that the classic metaheuristic NSGA-II provides the best overall performance resolving some problems provided by the benchmark BAliBASE (v3.0), under a multi-objective and biological approach.
Downloads
References
J. Pei, “Multiple protein sequence alignment, ”Current Opinion in Structural Biology, vol. 18, no. 3, pp. 382 – 386, 2008,nucleic acids / Sequences and topology. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0959440X08000407
S. Needleman and C. Wunsch, “A general method applicable to the search for similarities in the aminoacid sequence of two proteins, ”Journal of Molecular Biology, vol.48, no.3, pp.443–453, 1970. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0022283670900574
I. Elias, “Settling the intractability of multiple alignment,”Journal of Computational Biology, vol. 13, no. 7, pp. 1323 – 1339, 2016.
C. Kemena, J. Taly, J. Kleinjung, and C. Notredame, “Strike: evaluation of protein msas using a single 3d structure,”Bioinformatics, vol. 27,no. 24, pp. 3385–3391, 2011.
W. Soto and D. Becerra, “A multi-objective evolutionary algorithm for improving multiple sequence alignments,” in Advances in Bioinformatics and Computational Biology, ser. Lecture Notes in Computer Science,S. Campos, Ed. Springer International Publishing, 2014, vol. 8826, pp.73–82.
B. Blackburneand S. Whelan, “Measuring the distance between multiple sequence alignments, ”Bioinformatics,vol.28, no.4, pp.495–502,2012.[Online]. Available: http://bioinformatics.oxfordjournals.org/content/28/4/495.abstract
K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,”IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strengthpareto evolutionary algorithm,” inEUROGEN 2001. Evolutionary Met-hods for Design, Optimization and Control with Applications to Indus-trial Problems, K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou,and T. Fogarty, Eds., Athens, Greece, 2002, pp. 95–100.
A. Nebro, J. Durillo, F. Luna, B. Dorronsoro, and E. Alba, “Designissues in a multiobjective cellular genetic algorithm,” inEvolutionaryMulti-Criterion Optimization. 4th International Conference, EMO 2007,ser. Lecture Notes in Computer Science, S. Obayashi, K. Deb, C. Poloni,T. Hiroyasu, and T. Murata, Eds., vol. 4403. Springer, 2007, pp. 126–140.
F. Ortuño, O. Valenzuela, F. Rojas, H. Pomares, J. Florido, J. Urquiza,and I. Rojas, “Optimizing multiple sequence alignments using a geneticalgorithm based on three objectives: structural information, non-gapspercentage and totally conserved columns.”Bioinformatics (Oxford,England), vol. 29, no. 17, pp. 2112–21, Sep. 2013. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/23793754
F. J. M. da Silva, J. M. S. Pérez, J. A. G. Pulido, andM. a. V. Rodríguez, “Parallel Niche Pareto AlineaGA–an evolutionary multiobjective approach on multiple sequence alignment.”Journal of integrative bioinformatics, vol. 8, no. 3, p. 174, 2011. [Online].Available: http://www.ncbi.nlm.nih.gov/pubmed/21926437
M. Kaya, A. Sarhan, and R. Abdullah, “Multiple sequencealignmentwithaffinegapbyusingmulti-objectivegeneticalgorithm.”Computer methods and programs in biomedicine, vol. 114, no. 1, pp. 38–49, Apr. 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/24534604
F. P. M. Abbasi, L. Paquete, “Local search for multiobjective multiplesequence alignment,” inBioinformatics and Biomedical Engineering, ser. Lecture Notes in Computer Science, F. Ortuño and I. Rojas, Eds.Springer International Publishing, 2015, vol. 9044, pp. 175–182.
M. Dayhoff, R. Schwartz, and B. B.C. Orcutt, “A model of evolutionarychange in proteins,”In Atlas of Protein Sequences and Structure, vol. 5,pp. 345–352, 1978.
S. Henikoff and J. Henikoff, “Aminoacid substitution matrices fromprotein blocks,”Proceedings of the National Academy of Sciences,vol. 89, no. 22, pp. 10 915–10 919, 1992.
A. Nebro, J. Durillo, F. Luna, B. Dorronsoro, and E. Alba, “Mocell: Acellular genetic algorithm for multiobjective optimization,”International Journal of Intelligent Systems, vol. 24, no. 7, pp. 723 – 725, 2009.
L. Bradstreet, The hypervolume indicator for multi-objective optimisation: calculation and use. University of Western Australia, 2011.
E. Zitzler, L. Thiele, M. L. anb C.M. Fonseca, and V. da Fonseca,“Performance assessment ofmultiobjective optimizers: An analysis andreview,”IEEE Trans. on Evolutionary Computation, vol. 7, no. 2, pp.117 – 132, 2003.
G. Raghava, S. M. Searle, P. C. Audley, J. D. Barber, and G. J. Barton,“Oxbench: A benchmark for evaluation of protein multiple sequencealignment accuracy,”BMC Bioinformatics, vol. 4, no. 1, pp. 1–23,2003. [Online]. Available: http://dx.doi.org/10.1186/1471-2105-4-47
P. I. W. de Bakker, A. Bateman, D. F. Burke, R. N. Miguel ,K. Mizuguchi, J. Shi, H. Shirai, and T. L. Blundell, “Homstrad: adding sequence information to structure-based alignments of homologous protein families,” Bioinformatics, vol.17, no.8, pp.748–749, 2001. [Online]. Available: http://bioinformatics.oxfordjournals.org/content/17/8/748.abstract
R. Edgar, “Muscle: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol.32, no.5, pp.1792–1797, 2004.[Online]. Available: http://nar.oxfordjournals.org/content/32/5/1792.abstract
J. Thompson, P. Koehl, and O. Poch, “Balibase 3.0: latest developmentsof the multiple sequence alignment benchmark,”Proteins, vol. 61, pp.127–136, 2005.
H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig,I. Shindyalov, and P. Bourne, “The protein data bank,”Nucleic AcidsResearch, vol. 28, no. 1, pp. 235–242, 2000. [Online]. Available:http://nar.oxfordjournals.org/content/28/1/235
A. Nebro, J. J. Durillo, and M. Vergne, “Redesigning the jmetalmulti-objective optimization framework,” inProceedings of theCompanion Publication of the 2015 Annual Conference on Geneticand Evolutionary Computation, ser. GECCO Companion ’15.NewYork, NY, USA: ACM, 2015, pp. 1093–1100. [Online]. Available:http://doi.acm.org/10.1145/2739482.2768462
This article is published by LAJC under a Creative Commons Attribution-Non-Commercial-Share-Alike 4.0 International License. This means that non-exclusive copyright is transferred to the National Polytechnic School. The Author (s) give their consent to the Editorial Committee to publish the article in the issue that best suits the interests of this Journal. Find out more in our Copyright Notice.
Disclaimer
LAJC in no event shall be liable for any direct, indirect, incidental, punitive, or consequential copyright infringement claims related to articles that have been submitted for evaluation, or published in any issue of this journal. Find out more in our Disclaimer Notice.