A Multi-Objective approach to the optimization of Multiple Sequence Alignment (MSA)

  • Cristian Zambrano-Vega Universidad Técnica Estatal de Quevedo
  • Miriam Cárdenas-Zea Universidad Técnica Estatal de Quevedo
  • Ricardo Aguirre-Pérez Universidad Técnica Estatal de Quevedo

Abstract

Multiple Sequence Alignment (MSA) is one of the main topics in the bioinformatics domain, consists in finding an optimal alignment for three or more biological sequences with the number maximum of conserved zones or totally aligned columns. Different scores to assess the quality of the alignments have been proposed, so the problem can be formulated and resolved as a Multi-Objective Optimization Problem (MOP). For this reason we have carried out a perfomanced study resolving the MSA problem under a multi-objective approach, considering two popular metrics as objectives to be optimized: The weighted Sum-Of-Pairs with affine gap penalties (wSOP) and the Totally Aligned Columns (TC), with three algorithms from the state-of- the-art of Multi-Objective Optimization: NSGAII, SPEA2 and MOCell. Our experiments reveals that the classic metaheuristic NSGA-II provides the best overall performance resolving some problems provided by the benchmark BAliBASE (v3.0), under a multi-objective and biological approach.

DOI

Downloads

Download data is not yet available.

Author Biography

Cristian Zambrano-Vega, Universidad Técnica Estatal de Quevedo



References

J. Pei, “Multiple protein sequence alignment, ”Current Opinion in Structural Biology, vol. 18, no. 3, pp. 382 – 386, 2008,nucleic acids / Sequences and topology. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0959440X08000407

S. Needleman and C. Wunsch, “A general method applicable to the search for similarities in the aminoacid sequence of two proteins, ”Journal of Molecular Biology, vol.48, no.3, pp.443–453, 1970. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0022283670900574

I. Elias, “Settling the intractability of multiple alignment,”Journal of Computational Biology, vol. 13, no. 7, pp. 1323 – 1339, 2016.

C. Kemena, J. Taly, J. Kleinjung, and C. Notredame, “Strike: evaluation of protein msas using a single 3d structure,”Bioinformatics, vol. 27,no. 24, pp. 3385–3391, 2011.

W. Soto and D. Becerra, “A multi-objective evolutionary algorithm for improving multiple sequence alignments,” in Advances in Bioinformatics and Computational Biology, ser. Lecture Notes in Computer Science,S. Campos, Ed. Springer International Publishing, 2014, vol. 8826, pp.73–82.

B. Blackburneand S. Whelan, “Measuring the distance between multiple sequence alignments, ”Bioinformatics,vol.28, no.4, pp.495–502,2012.[Online]. Available: http://bioinformatics.oxfordjournals.org/content/28/4/495.abstract

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,”IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.

E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strengthpareto evolutionary algorithm,” inEUROGEN 2001. Evolutionary Met-hods for Design, Optimization and Control with Applications to Indus-trial Problems, K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou,and T. Fogarty, Eds., Athens, Greece, 2002, pp. 95–100.

A. Nebro, J. Durillo, F. Luna, B. Dorronsoro, and E. Alba, “Designissues in a multiobjective cellular genetic algorithm,” inEvolutionaryMulti-Criterion Optimization. 4th International Conference, EMO 2007,ser. Lecture Notes in Computer Science, S. Obayashi, K. Deb, C. Poloni,T. Hiroyasu, and T. Murata, Eds., vol. 4403. Springer, 2007, pp. 126–140.

F. Ortuño, O. Valenzuela, F. Rojas, H. Pomares, J. Florido, J. Urquiza,and I. Rojas, “Optimizing multiple sequence alignments using a geneticalgorithm based on three objectives: structural information, non-gapspercentage and totally conserved columns.”Bioinformatics (Oxford,England), vol. 29, no. 17, pp. 2112–21, Sep. 2013. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/23793754

F. J. M. da Silva, J. M. S. Pérez, J. A. G. Pulido, andM. a. V. Rodríguez, “Parallel Niche Pareto AlineaGA–an evolutionary multiobjective approach on multiple sequence alignment.”Journal of integrative bioinformatics, vol. 8, no. 3, p. 174, 2011. [Online].Available: http://www.ncbi.nlm.nih.gov/pubmed/21926437

M. Kaya, A. Sarhan, and R. Abdullah, “Multiple sequencealignmentwithaffinegapbyusingmulti-objectivegeneticalgorithm.”Computer methods and programs in biomedicine, vol. 114, no. 1, pp. 38–49, Apr. 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/24534604

F. P. M. Abbasi, L. Paquete, “Local search for multiobjective multiplesequence alignment,” inBioinformatics and Biomedical Engineering, ser. Lecture Notes in Computer Science, F. Ortuño and I. Rojas, Eds.Springer International Publishing, 2015, vol. 9044, pp. 175–182.

M. Dayhoff, R. Schwartz, and B. B.C. Orcutt, “A model of evolutionarychange in proteins,”In Atlas of Protein Sequences and Structure, vol. 5,pp. 345–352, 1978.

S. Henikoff and J. Henikoff, “Aminoacid substitution matrices fromprotein blocks,”Proceedings of the National Academy of Sciences,vol. 89, no. 22, pp. 10 915–10 919, 1992.

A. Nebro, J. Durillo, F. Luna, B. Dorronsoro, and E. Alba, “Mocell: Acellular genetic algorithm for multiobjective optimization,”International Journal of Intelligent Systems, vol. 24, no. 7, pp. 723 – 725, 2009.

L. Bradstreet, The hypervolume indicator for multi-objective optimisation: calculation and use. University of Western Australia, 2011.

E. Zitzler, L. Thiele, M. L. anb C.M. Fonseca, and V. da Fonseca,“Performance assessment ofmultiobjective optimizers: An analysis andreview,”IEEE Trans. on Evolutionary Computation, vol. 7, no. 2, pp.117 – 132, 2003.

G. Raghava, S. M. Searle, P. C. Audley, J. D. Barber, and G. J. Barton,“Oxbench: A benchmark for evaluation of protein multiple sequencealignment accuracy,”BMC Bioinformatics, vol. 4, no. 1, pp. 1–23,2003. [Online]. Available: http://dx.doi.org/10.1186/1471-2105-4-47

P. I. W. de Bakker, A. Bateman, D. F. Burke, R. N. Miguel ,K. Mizuguchi, J. Shi, H. Shirai, and T. L. Blundell, “Homstrad: adding sequence information to structure-based alignments of homologous protein families,” Bioinformatics, vol.17, no.8, pp.748–749, 2001. [Online]. Available: http://bioinformatics.oxfordjournals.org/content/17/8/748.abstract

R. Edgar, “Muscle: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol.32, no.5, pp.1792–1797, 2004.[Online]. Available: http://nar.oxfordjournals.org/content/32/5/1792.abstract

J. Thompson, P. Koehl, and O. Poch, “Balibase 3.0: latest developmentsof the multiple sequence alignment benchmark,”Proteins, vol. 61, pp.127–136, 2005.

H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig,I. Shindyalov, and P. Bourne, “The protein data bank,”Nucleic AcidsResearch, vol. 28, no. 1, pp. 235–242, 2000. [Online]. Available:http://nar.oxfordjournals.org/content/28/1/235

A. Nebro, J. J. Durillo, and M. Vergne, “Redesigning the jmetalmulti-objective optimization framework,” inProceedings of theCompanion Publication of the 2015 Annual Conference on Geneticand Evolutionary Computation, ser. GECCO Companion ’15.NewYork, NY, USA: ACM, 2015, pp. 1093–1100. [Online]. Available:http://doi.acm.org/10.1145/2739482.2768462

Published
2016-05-20
How to Cite
[1]
C. Zambrano-Vega, M. Cárdenas-Zea, and R. Aguirre-Pérez, “A Multi-Objective approach to the optimization of Multiple Sequence Alignment (MSA)”, LAJC, vol. 3, no. 1, pp. 43 - 51, May 2016.
Section
Research Articles for the Regular Issue

Most read articles by the same author(s)