skip to main content


Title: Cybersecurity in Big Data Era: From Securing Big Data to Data-Driven Security
"Knowledge is power" is an old adage that has been found to be true in today's information age. Knowledge is derived from having access to information. The ability to gather information from large volumes of data has become an issue of relative importance. Big Data Analytics (BDA) is the term coined by researchers to describe the art of processing, storing and gathering large amounts of data for future examination. Data is being produced at an alarming rate. The rapid growth of the Internet, Internet of Things (IoT) and other technological advances are the main culprits behind this sustained growth. The data generated is a reflection of the environment it is produced out of, thus we can use the data we get out of systems to figure out the inner workings of that system. This has become an important feature in cybersecurity where the goal is to protect assets. Furthermore, the growing value of data has made big data a high value target. In this paper, we explore recent research works in cybersecurity in relation to big data. We highlight how big data is protected and how big data can also be used as a tool for cybersecurity. We summarize recent works in the form of tables and have presented trends, open research challenges and problems. With this paper, readers can have a more thorough understanding of cybersecurity in the big data era, as well as research trends and open challenges in this active research area.  more » « less
Award ID(s):
1828811
NSF-PAR ID:
10094358
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Services Computing
ISSN:
2372-0204
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Data breaches have become a formidable challenge for business operations in the twenty-first century. The emergence of big data in the ever-growing digital economy has created the necessity to secure critical organizational information. The lack of cybersecurity awareness exposes organizations to potential cyber threats. Thus, this research aims to identify the various dimensions of cybersecurity awareness capabilities. Drawing on the dynamic capabilities framework, the findings of the study show personnel (knowledge, attitude and learning), management (training, culture and strategic orientation) and infrastructure capabilities (technology and data governance) as thematic dimensions to tackle cybersecurity awareness challenges. 
    more » « less
  2. null (Ed.)
    The implementation of Internet of Things (IoT) devices in medical environments, has introduced a growing list of security vulnerabilities and threats. The lack of an extensible big data resource that captures medical device vulnerabilities limits the use of Artificial Intelligence (AI) based cyber defense systems in capturing, detecting, and preventing known and future attacks. We describe a system that generates a repository of Cyber Threat Intelligence (CTI) about various medical devices and their known vulnerabilities from sources such as manufacturer and ICS-CERT vulnerability alerts. We augment the intelligence repository with data sources such as Wikidata and public medical databases. The combined resources are integrated with threat intelligence in our Cybersecurity Knowledge Graph (CKG) from previous research. The augmented graph embeddings are useful in querying relevant information and can help in various AI assisted cybersecurity tasks. Given the integration of multiple resources, we found the augmented CKG produced higher quality graph representations. The augmented CKG produced a 31% increase in the Mean Average Precision (MAP) value, computed over an information retrieval task. 
    more » « less
  3. This work introduces Wearable deep learning (WearableDL) that is a unifying conceptual architecture inspired by the human nervous system, offering the convergence of deep learning (DL), Internet-of-things (IoT), and wearable technologies (WT) as follows: (1) the brain, the core of the central nervous system, represents deep learning for cloud computing and big data processing. (2) The spinal cord (a part of CNS connected to the brain) represents Internet-of-things for fog computing and big data flow/transfer. (3) Peripheral sensory and motor nerves (components of the peripheral nervous system (PNS)) represent wearable technologies as edge devices for big data collection. In recent times, wearable IoT devices have enabled the streaming of big data from smart wearables (e.g., smartphones, smartwatches, smart clothings, and personalized gadgets) to the cloud servers. Now, the ultimate challenges are (1) how to analyze the collected wearable big data without any background information and also without any labels representing the underlying activity; and (2) how to recognize the spatial/temporal patterns in this unstructured big data for helping end-users in decision making process, e.g., medical diagnosis, rehabilitation efficiency, and/or sports performance. Deep learning (DL) has recently gained popularity due to its ability to (1) scale to the big data size (scalability); (2) learn the feature engineering by itself (no manual feature extraction or hand-crafted features) in an end-to-end fashion; and (3) offer accuracy or precision in learning raw unlabeled/labeled (unsupervised/supervised) data. In order to understand the current state-of-the-art, we systematically reviewed over 100 similar and recently published scientific works on the development of DL approaches for wearable and person-centered technologies. The review supports and strengthens the proposed bioinspired architecture of WearableDL. This article eventually develops an outlook and provides insightful suggestions for WearableDL and its application in the field of big data analytics. 
    more » « less
  4. A cursory look at the Internet protocol stack shows error checking capability almost at every layer, and yet, a slowly growing set of results show that a surprising fraction of big data transfers over TCP/IP are failing. As we have dug into this problem, we have come to realize that nobody is paying much attention to the causes of transmission errors in the Internet. Rather, they have typically resorted to file-level retransmissions. Given the exponential growth in data sizes, this approach is not sustainable. Furthermore, while there has been considerable progress in understanding error codes and how to choose or create error codes that offer sturdy error protection, the Internet has not made use of this new science. We propose a set of new ideas that look at paths forward to reduce error rates and better protect big data. We also propose a new file transfer protocol that efficiently handles errors and minimizes retransmissions. 
    more » « less
  5. A cursory look at the Internet protocol stack shows error checking capability almost at every layer, and yet, a slowly growing set of results show that a surprising fraction of big data transfers over TCP/IP are failing. As we have dug into this problem, we have come to realize that nobody is paying much attention to the causes of transmission errors in the Internet. Rather, they have typically resorted to file-level retransmissions. Given the exponential growth in data sizes, this approach is not sustainable. Furthermore, while there has been considerable progress in understanding error codes and how to choose or create error codes that offer sturdy error protection, the Internet has not made use of this new science. We propose a set of new ideas that look at paths forward to reduce error rates and better protect big data. We also propose a new file transfer protocol that efficiently handles errors and minimizes retransmissions. 
    more » « less