

Title: An Attribute-Based Access Control Model for Secure Big Data Processing in Hadoop Ecosystem
Apache Hadoop is a predominant software framework for distributed compute and storage, capable of handling huge amounts of data, usually referred to as Big Data. This data, collected from different enterprises and government agencies, often includes private and sensitive information that must be secured from unauthorized access. This paper proposes extensions to the current authorization capabilities offered by Hadoop core and other ecosystem projects, specifically Apache Ranger and Apache Sentry. We present a fine-grained attribute-based access control model, referred to as HeABAC, catering to the security and privacy needs of the multi-tenant Hadoop ecosystem. The paper reviews the current multi-layered access control model used primarily in Hadoop core (2.x), Apache Ranger (version 0.6) and Sentry (version 1.7.0), as well as a previously proposed RBAC extension (OT-RBAC). It then presents a formal attribute-based access control model for the Hadoop ecosystem, including the novel concept of cross Hadoop services trust. It further highlights different trust scenarios, presents an implementation approach for HeABAC using Apache Ranger, and discusses the administration requirements of the HeABAC operational model. Several comprehensive, real-world use cases are also discussed to illustrate the application and enforcement of the proposed HeABAC model in the Hadoop ecosystem.
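To make the abstract's two central ideas concrete (attribute-based decisions and trust between Hadoop services), here is a minimal Python sketch of how an authorization check might combine them. It is an illustration under stated assumptions, not HeABAC's formal definition or its Apache Ranger implementation: the names `Request`, `Authorizer`, and `same_project`, the attribute names, and especially the trust semantics (a trusting service also accepts the policies defined by the services it trusts) are hypothetical simplifications; the paper's formal model gives the actual definitions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Attribute name -> set of values; sets handle multi-valued attributes uniformly.
Attrs = Dict[str, FrozenSet[str]]


@dataclass
class Request:
    user: Attrs      # attributes of the requesting user (hypothetical names)
    obj: Attrs       # attributes/tags of the requested data object
    service: str     # Hadoop service that holds the object, e.g. "hive"
    operation: str   # requested operation, e.g. "read"


# A policy is modeled as a boolean predicate over the whole request.
Policy = Callable[[Request], bool]


@dataclass
class Authorizer:
    # (service, operation) -> policies that service defines for that operation
    policies: Dict[Tuple[str, str], List[Policy]] = field(default_factory=dict)
    # (trustor, trustee) pairs. ASSUMPTION for illustration: a trustor service
    # also accepts the policies defined by each service it trusts.
    trust: Set[Tuple[str, str]] = field(default_factory=set)

    def add_policy(self, service: str, operation: str, policy: Policy) -> None:
        self.policies.setdefault((service, operation), []).append(policy)

    def authorize(self, req: Request) -> bool:
        # Policy sources: the owning service plus every service it trusts.
        sources = {req.service} | {
            trustee for (trustor, trustee) in self.trust if trustor == req.service
        }
        # Deny by default: allow only if some applicable policy is satisfied.
        return any(
            policy(req)
            for source in sources
            for policy in self.policies.get((source, req.operation), [])
        )


def same_project(req: Request) -> bool:
    """Hypothetical policy: user and object share at least one project."""
    empty: FrozenSet[str] = frozenset()
    return bool(req.user.get("project", empty) & req.obj.get("project", empty))


if __name__ == "__main__":
    az = Authorizer()
    az.add_policy("hdfs", "read", same_project)
    req = Request(
        user={"project": frozenset({"genomics"})},
        obj={"project": frozenset({"genomics"})},
        service="hive",
        operation="read",
    )
    print(az.authorize(req))  # False: hive defines no read policy, trusts nobody
    az.trust.add(("hive", "hdfs"))
    print(az.authorize(req))  # True: hive now accepts hdfs's read policy
```

The check is deny-by-default: when no applicable policy evaluates to true, `authorize` returns `False`, a common default in access control systems. Representing policies as predicates over user and object attributes is what makes the decision attribute-based rather than role-based.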
Award ID(s):
1736209 1538418 1423481 1111925
NSF-PAR ID:
10072092
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ABAC’18: 3rd ACM Workshop on Attribute-Based Access Control, March 19–21, 2018, Tempe, AZ,
Page Range / eLocation ID:
13 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community.
  2. In today's mobile-first, cloud-enabled world, where simulation-enabled training is designed for use anywhere and from multiple different types of devices, new paradigms are needed to control access to sensitive data. Large, distributed data sets sourced from a wide variety of sensors require advanced approaches to authorization and access control (AC). Motivated by large-scale, publicized data breaches and data privacy laws, data protection policies and fine-grained AC mechanisms are imperative in data-intensive simulation systems. Although the public may suffer security incident fatigue, there are significant impacts to corporations and government organizations in the form of settlement fees and senior executive dismissal. This paper presents an analysis of the challenges of controlling access to big data sets. Implementation guidelines are provided based upon new attribute-based access control (ABAC) standards. Best practices start with AC for the security of large data sets processed by models and simulations (M&S). The currently widely supported eXtensible Access Control Markup Language (XACML) is the predominant framework for big data ABAC. The more recently developed Next Generation Access Control (NGAC) standard addresses additional areas in securing distributed, multi-owner big data sets. We present a comparison and evaluation of standards and technologies for different simulation data protection requirements. A concrete example is included to illustrate the differences. The example scenario is based upon synthetically generated, very sensitive health care data combined with less sensitive data. This model data set is accessed by representative groups with a range of trust, from highly trusted roles to general users. The AC security challenges and approaches to mitigate risk are discussed.
  3. Smart homes are interconnected homes in which a wide variety of digital devices with limited resources communicate with multiple users and among themselves using multiple protocols. The deployment of resource-limited devices and the use of a wide range of technologies expand the attack surface and position the smart home as a target for many potential security threats. Access control is among the top security challenges in smart home IoT. Several access control models have been developed or adapted for IoT in general, with a few specifically designed for the smart home IoT domain. Most of these models are built on the role-based access control (RBAC) model or the attribute-based access control (ABAC) model. However, some researchers have recently shown the need for a hybrid model combining ABAC and RBAC, thereby incorporating the benefits of both models to better meet IoT access control challenges in general and smart home requirements in particular. In this paper, we used two approaches to develop two different hybrid models for smart home IoT. We followed a role-centric approach and an attribute-centric approach to develop HyBAC RC and HyBAC AC, respectively. We formally define these models and illustrate their features through a use case scenario demonstration. We further provide a proof-of-concept implementation for each model in the Amazon Web Services (AWS) IoT platform. Finally, we conduct a theoretical comparison between the two models proposed in this paper and the EGRBAC model (an RBAC model for smart home IoT) and the HABAC model (an ABAC model for smart home IoT), which were previously developed to meet smart homes’ challenges.
  5. Lankes, R. David (Ed.)
    Resilience is often treated as a single-dimension system attribute, or its various dimensions are studied separately without considering multi-dimensionality. The increasing frequency of catastrophic natural or man-made disasters affecting rural areas demands holistic assessments of community vulnerability. Disproportionate effects of disasters on minority, low-income, hard-to-reach, and vulnerable populations demand a community-oriented planning approach to address the “resilience divide.” Rural areas have many advantages, but low population density, coupled with dispersed infrastructure and community support networks, makes these areas more vulnerable to natural disasters. This paper will present three key lessons from our current work on public librarians’ roles in disaster resiliency: 1) rural communities are composed of diverse sub-communities, each of which experiences and responds to traumatic events differently, depending on micro-geographic and demographic drivers; 2) public libraries are central to rural life, providing a range of informational, educational, social, and personal services, especially in remote areas that lack reliable access to community resources during disasters; and 3) rural citizens tend to be very self-reliant and are committed to strengthening and sustaining community resiliency with local human capital and resources. Public libraries and their librarian leaders are often a “crown jewel” of rural areas’ community infrastructure, and this paper will present a community-based design and assessment process for resiliency hubs located in and operated through rural public libraries. The core technical and social science research questions explored in the proposed paper are: 1) Who were the key beneficiaries and what did they need? 2) What was the process of designing a resiliency hub? 3) What did library resiliency hubs provide and how can they be sustained?
This resiliency hub study details the co-production of solutions through an inclusive collaboration among researchers, librarians, and community members to address the cascading impacts of natural disasters. The novel co-design process detailed in the paper reflects 1) an in-depth understanding of the complex interactions among libraries, residents, governments, and other agencies, gained by collecting sociotechnical hurricane-related data for Calhoun County, Florida, USA, a region devastated by Hurricane Michael (2018) and hard-hit by Covid-19; 2) data analyzed with newly developed fusion algorithms that incorporate multiple communities; and 3) resiliency hubs co-designed and sited in public libraries. This research leverages a unique opportunity to co-develop integrated library-centered policies and technologies and to establish a new paradigm for developing disaster resiliency in rural settings. Public libraries serve a diverse population who will directly benefit from practical support tailored to their needs. The project will inform efficient plans to ensure that high-need groups are not isolated in disasters. The knowledge and insight gained from disseminating the study’s results will not only improve our understanding of emergency response operations but will also contribute to the development of new disaster-related policies and plans for public libraries, with broader application to rural communities in many settings.