
Title: HaaS in Environmental Computing: Hadoop-as-a-Service for Big Data Mining in Environmental Computing Applications
This article addresses the importance of HaaS (Hadoop-as-a-Service) in cloud technologies, with specific reference to its usefulness in big data mining for environmental computing applications. The term environmental computing refers to computational analysis within environmental science and management, encompassing a myriad of techniques, especially in data mining and machine learning. As is well known, classical MapReduce has been adapted within many applications for big data storage and information retrieval. Hadoop-based tools such as Hive and Mahout are broadly accessible over the cloud and can be helpful in data warehousing and data mining over big data in various domains. In this article, we explore HaaS technologies, mainly based on Apache's Hive and Mahout, for applications in environmental computing, considering publicly available data on the Web. We dwell upon interesting applications such as automated text classification for energy management, recommender systems for ecofriendly products, and decision support in urban planning. We briefly explain the classical paradigms of MapReduce, Hadoop, and Hive; further delve into data mining and machine learning over the MapReduce framework; and explore techniques such as Naïve Bayes and Random Forests using Apache Mahout with respect to the targeted applications. Hence, the paradigm of Hadoop-as-a-Service, popularly referred to as HaaS, is emphasized here for its benefits in a domain-specific context. The studies in environmental computing presented in this article can be useful in other domains as well, considering similar applications. This article can thus be interesting to professionals in web technologies, cloud computing, environmental management, as well as AI and data science in general.
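The classical MapReduce paradigm mentioned above (map, shuffle/sort, reduce) can be illustrated with a minimal word-count sketch. This is a plain-Python simulation of the dataflow, not actual Hadoop; the sample documents are hypothetical.

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["solar energy saves energy", "wind energy"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["energy"])  # → 3
```

On a real cluster, Hadoop distributes the map and reduce tasks across nodes and handles the shuffle over the network; the per-phase logic remains the same shape.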
Award ID(s): 2018575
PAR ID: 10570591
Publisher / Repository: ACM
Journal Name: ACM SIGWEB Newsletter
Volume: 2024
Issue: Autumn
ISSN: 1931-1745
Page Range / eLocation ID: 1 to 18
Sponsoring Org: National Science Foundation
More Like This
  1. There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community. 
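The ABAC idea described above, mediating access by matching user and resource attributes against policies, can be sketched minimally. The policy structure, attribute names, and default-deny rule here are illustrative assumptions, not Apache Ranger's actual policy model or API.

```python
# Minimal attribute-based access control (ABAC) decision sketch.
POLICIES = [
    # Each policy: required user attributes, required resource attributes, decision.
    {"user": {"role": "analyst", "clearance": "phi"},
     "resource": {"sensitivity": "phi"},
     "decision": "allow"},
    {"user": {"role": "analyst"},
     "resource": {"sensitivity": "public"},
     "decision": "allow"},
]

def matches(required, actual):
    # Every required attribute must be present with the same value.
    return all(actual.get(k) == v for k, v in required.items())

def decide(user_attrs, resource_attrs):
    # Default-deny: allow only if some policy matches both attribute sets.
    for p in POLICIES:
        if matches(p["user"], user_attrs) and matches(p["resource"], resource_attrs):
            return p["decision"]
    return "deny"

print(decide({"role": "analyst", "clearance": "phi"}, {"sensitivity": "phi"}))  # allow
print(decide({"role": "intern"}, {"sensitivity": "phi"}))                       # deny
```

A production deployment would evaluate such policies centrally (e.g., in Ranger) for every Hadoop/Spark data access rather than in application code.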
  2. Popular big data frameworks, ranging from Hadoop MapReduce to Spark, rely on garbage-collected languages, such as Java and Scala. Big data applications are especially sensitive to the effectiveness of garbage collection (i.e., GC), because they usually process a large volume of data objects that lead to heavy GC overhead. Lacking in-depth understanding of GC performance has impeded performance improvement in big data applications. In this paper, we conduct the first comprehensive evaluation on three popular garbage collectors, i.e., Parallel, CMS, and G1, using four representative Spark applications. By thoroughly investigating the correlation between these big data applications’ memory usage patterns and the collectors’ GC patterns, we obtain many findings about GC inefficiencies. We further propose empirical guidelines for application developers, and insightful optimization strategies for designing big-data-friendly garbage collectors. 
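The core idea above, correlating an application's allocation of many short-lived data objects with the collector's activity, can be sketched in Python. Python's cyclic collector is of course not the JVM's Parallel, CMS, or G1, so this is only a rough analogue of GC-pressure profiling.

```python
import gc

# Count how many cyclic-GC passes fire while allocating many short-lived
# container objects -- a rough analogue of profiling GC pressure in a
# big data job that creates per-record temporaries.
gc_passes = {"n": 0}

def on_gc(phase, info):
    # gc.callbacks invokes this at the start and stop of every collection.
    if phase == "stop":
        gc_passes["n"] += 1

gc.callbacks.append(on_gc)
try:
    for _ in range(100_000):
        _ = {"record": list(range(5))}  # short-lived per-record objects
    gc.collect()  # force at least one full collection before reading the count
finally:
    gc.callbacks.remove(on_gc)

print("collections observed:", gc_passes["n"])
```

On the JVM, the analogous data would come from GC logs (e.g., `-Xlog:gc`) correlated against the application's phase-by-phase memory usage.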
  3. Data-intensive scalable computing (DISC) systems such as Google’s MapReduce, Apache Hadoop, and Apache Spark are being leveraged to process massive quantities of data in the cloud. Modern DISC applications pose new challenges for exhaustive, automatic testing because, unlike SQL queries, they consist of dataflow operators in which complex user-defined functions (UDFs) are prevalent. We design a new white-box testing approach, called BigTest, to reason about the internal semantics of UDFs in tandem with the equivalence classes created by each dataflow and relational operator. Our evaluation shows that, despite ultra-large input data sizes, the test coverage of real-world DISC applications is often significantly skewed and inadequate, leaving 34% of Joint Dataflow and UDF (JDU) paths untested. BigTest shows the potential to minimize data size for local testing by five to eight orders of magnitude (10^5 to 10^8) while revealing 2X more manually injected faults than the previous approach. Our experiment shows that only a few data records (on the order of tens) are actually required to achieve the same JDU coverage as the entire production data. The reduction in test data also provides a CPU time saving of 194X on average, demonstrating that interactive and fast local testing is feasible for big data analytics, obviating the need to test applications on huge production data. 
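The underlying idea of the approach above, keeping one representative record per execution path of a UDF instead of the full production dataset, can be shown with a toy sketch. The UDF, its path labels, and the sample data are hypothetical; BigTest's actual symbolic reasoning over dataflow operators is far more involved.

```python
# Toy illustration of path-based test-data minimization.
def udf_path(record):
    # Label the branch a (hypothetical) UDF would take for this record.
    if record < 0:
        return "negative"
    elif record == 0:
        return "zero"
    return "positive"

def minimize(dataset):
    # Keep one representative per equivalence class (per UDF path).
    reps = {}
    for rec in dataset:
        reps.setdefault(udf_path(rec), rec)
    return list(reps.values())

production = [5, 9, -3, 0, 12, -8, 7]
tiny = minimize(production)
print(len(tiny))  # → 3 records cover the same paths as all 7
```

Generalizing this to joint dataflow-and-UDF (JDU) paths means intersecting the equivalence classes induced by each operator along the pipeline.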
  4. The volume, variety, and velocity of different data, e.g., simulation data, observation data, and social media data, are growing ever faster, posing grand challenges for data discovery. An increasing trend in data discovery is to mine hidden relationships among users and metadata from web usage logs to support the data discovery process. Web usage log mining is the process of reconstructing sessions from raw logs and finding interesting patterns or implicit linkages. The mining results play an important role in improving the quality of search-related components, e.g., ranking, query suggestion, and recommendation. While research has been done in the data discovery domain, collecting and analyzing logs efficiently remains a challenge because (1) the volume of web usage logs continues to grow as long as users access the data; (2) the dynamic volume of logs requires on-demand computing resources for mining tasks; and (3) the mining process is compute-intensive and time-intensive. To speed up the mining process, we propose a cloud-based log-mining framework using Apache Spark and Elasticsearch. In addition, a data partition paradigm, logPartitioner, is designed to solve the data imbalance problem in data parallelism. As a proof of concept, oceanographic data search and access logs are chosen to validate the performance of the proposed parallel log-mining framework. 
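The session-reconstruction step described above, grouping consecutive requests by the same user into a session whenever the gap stays under a timeout, can be sketched as follows. The 30-minute timeout, field layout, and sample log are illustrative assumptions; in the framework above, the raw entries would flow through Spark and Elasticsearch.

```python
from datetime import datetime, timedelta

def sessionize(entries, timeout=timedelta(minutes=30)):
    # entries: (user, timestamp) pairs; a new session starts when the gap
    # since the user's previous request exceeds the timeout.
    current = {}    # user -> list of timestamps in the open session
    sessions = []   # all (user, timestamps) sessions reconstructed so far
    for user, ts in sorted(entries, key=lambda e: (e[0], e[1])):
        sess = current.get(user)
        if sess is None or ts - sess[-1] > timeout:
            sess = []
            sessions.append((user, sess))
            current[user] = sess
        sess.append(ts)
    return sessions

t = datetime(2024, 1, 1, 9, 0)
log = [("u1", t), ("u1", t + timedelta(minutes=5)),
       ("u1", t + timedelta(hours=2)), ("u2", t)]
print(len(sessionize(log)))  # → 3 sessions
```

In a parallel setting, partitioning the log by user (as a logPartitioner-style scheme would) keeps each user's entries on one worker so sessions never straddle partitions.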
  5. The overconsumption of energy in recent times has motivated many studies. Some of these explore the application of web technologies and machine learning models, aiming to increase energy efficiency and reduce the carbon footprint. This paper reviews three areas that overlap between the web and energy usage in the commercial sector: IoT (Internet of Things), cloud computing, and opinion mining. The paper elaborates on problems in terms of their causes, influences, and potential solutions, as found in multiple studies across these areas, and identifies potential gaps with scope for further research. In a rapidly digitizing and automated world, these three areas can contribute much towards reducing energy consumption and making the commercial sector more energy efficient. IoT and smart manufacturing can greatly assist effective production and more energy-efficient technologies. Cloud computing, with reference to its impact on green IT (information technology), is a major contributor to mitigating the carbon footprint and reducing energy costs. Opinion mining is significant for the part it plays in understanding the feelings, requirements, and demands of energy consumers and related stakeholders, helping create more suitable policies and navigate towards more energy-efficient strategies. This paper offers a comprehensive analysis of the literature in these areas to assess the current status and explore future research possibilities across them and the related multidisciplinary avenues. 