Modeling the Performance of MapReduce Applications for the Cloud
Abstract
In recent years, Cloud Computing has become a key technology that makes it possible to run applications without deploying a physical infrastructure. The challenge with deploying distributed applications in Cloud Computing environments is that the virtual machine infrastructure must be planned in a time- and cost-effective way.
This work summarizes a previous work presented by the authors as a Master’s thesis. Its goal is to show that the execution time of a distributed MapReduce application running in a Cloud Computing environment can be predicted using a mathematical model based on theoretical specifications. This prediction helps users of the Cloud Computing environment plan their deployments, i.e., quantify the number of virtual machines and their characteristics. The application execution time was measured while varying the parameters stated in the mathematical model, and a linear regression technique was then used to obtain a model of the execution time, which was applied to predict the execution time of MapReduce applications. Experiments were conducted in several configurations and showed a clear relation with the theoretical model, revealing that the model is in fact able to predict the execution time of MapReduce applications. The developed model is generic, meaning that it uses theoretical abstractions for the computing capacity of the environment and the computing cost of the MapReduce application.
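The measure-then-regress approach described above can be illustrated with a minimal sketch. The abstract does not state the form of the model, so the example assumes a hypothetical one in which execution time grows linearly with the input size per virtual machine. The function names (`fit_line`, `predict_time`, `min_vms_for_deadline`) and every number in `runs` are invented for illustration and are not results from the paper.

```python
# Illustrative sketch only. Assumed model (not taken from the paper):
#   time = intercept + slope * (input_gb / num_vms)
# All measurements below are made-up placeholder values.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_time(intercept, slope, input_gb, num_vms):
    """Predicted execution time in seconds under the assumed model."""
    return intercept + slope * (input_gb / num_vms)


def min_vms_for_deadline(intercept, slope, input_gb, deadline_s, max_vms=64):
    """Smallest VM count whose predicted time meets the deadline, or None."""
    for num_vms in range(1, max_vms + 1):
        if predict_time(intercept, slope, input_gb, num_vms) <= deadline_s:
            return num_vms
    return None


# Hypothetical runs: (input size in GB, number of VMs, measured seconds).
runs = [
    (10, 2, 160.0),
    (10, 4, 85.0),
    (20, 4, 160.0),
    (20, 8, 85.0),
    (40, 4, 310.0),
]

# The regressor is the input size handled per VM.
xs = [gb / vms for gb, vms, _ in runs]
ys = [seconds for _, _, seconds in runs]
intercept, slope = fit_line(xs, ys)

# Use the fitted model for capacity planning: 40 GB within 200 seconds.
needed = min_vms_for_deadline(intercept, slope, input_gb=40, deadline_s=200)
print(f"intercept={intercept:.2f} s, slope={slope:.2f} s per GB/VM, VMs needed={needed}")
```

In practice the regressors would be whichever parameters the theoretical model names, and the fit would be checked against held-out runs before being used to size a deployment.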
This article is published by LAJC under a Creative Commons Attribution-Non-Commercial-Share-Alike 4.0 International License. This means that non-exclusive copyright is transferred to the National Polytechnic School. The author(s) give their consent to the Editorial Committee to publish the article in the issue that best suits the interests of this journal.