skip to main content


Title: Generating Connected Synthetic Electronic Health Records and Social Media Data for Modeling and Simulation
Research and experimentation using big data sets, specifically large sets of electronic health records (EHR) and social media data, is demonstrating the potential to understand the spread of diseases and a variety of other issues. Applications of advanced algorithms, machine learning, and artificial intelligence indicate a potential for rapidly advancing improvements in public health. For example, several reports indicate that social media data can be used to predict disease outbreak and spread (Brown, 2015). Since real-world EHR data has complicated security and privacy issues preventing it from being widely used by researchers, there is a real need to synthetically generate EHR data that is realistic and representative. Current EHR generators, such as Syntheaä (Walonoski et al., 2018) only simulate and generate pure medical-related data. However, adding patients’ social media data with their simulated EHR data would make combined data more comprehensive and realistic for healthcare research. This paper presents a patients’ social media data generator that extends an EHR data generator. By adding coherent social media data to EHR data, a variety of issues can be examined for emerging interests, such as where a contagious patient may have been and others with whom they may have been in contact. Social media data, specifically Twitter data, is generated with phrases indicating the onset of symptoms corresponding to the synthetically generated EHR reports of simulated patients. This enables creation of an open data set that is scalable up to a big-data size, and is not subject to the security, privacy concerns, and restrictions of real healthcare data sets. This capability is important to the modeling and simulation community, such as scientists and epidemiologists who are developing algorithms to analyze the spread of diseases. It enables testing a variety of analytics without revealing real-world private patient information.  more » « less
Award ID(s):
1915780
NSF-PAR ID:
10291773
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Interservice/Industry Training, Simulation and Education Conference (I/ITSEC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community.

     
    more » « less
  2. Patient-generated health data (PGHD), created and captured from patients via wearable devices and mobile apps, are proliferating outside of clinical settings. Examples include sleep tracking, fitness trackers, continuous glucose monitors, and RFID-enabled implants, with many additional biometric or health surveillance applications in development or envisioned. These data are included in growing stockpiles of personal health data being mined for insight via big data analytics and artificial intelligence/deep learning technologies. Governing these data resources to facilitate patient care and health research while preserving individual privacy and autonomy will be challenging, as PGHD are the least regulated domains of digitalized personal health data (U.S. Department of Health and Human Services, 2018). When patients themselves collect digitalized PGHD using “apps” provided by technology firms, these data fall outside of conventional health data regulation, such as HIPAA. Instead, PGHD are maintained primarily on the information technology infrastructure of vendors, and data are governed under the IT firm’s own privacy policies and within the firm’s intellectual property rights. Dominant narratives position these highly personal data as valuable resources to transform healthcare, stimulate innovation in medical research, and engage individuals in their health and healthcare. However, ensuring privacy, security, and equity of benefits from PGHD will be challenging. PGHD can be aggregated and, despite putative “deidentification,” be linked with other health, economic, and social data for predictive analytics. As large tech companies enter the healthcare sector (e.g., Google Health is partnering with Ascension Health to analyze the PHI of millions of people across 21 U.S. states), the lack of harmonization between regulatory regimes may render existing safeguards to preserve patient privacy and control over their PHI ineffective. While healthcare providers are bound to adhere to health privacy laws, Big Tech comes under more relaxed regulatory regimes that will facilitate monetizing PGHD. We explore three existing data protection regimes relevant to PGHD in the United States that are currently in tension with one another: federal and state health-sector laws, data use and reuse for research and innovation, and industry self-regulation by large tech companies We then identify three types of structures (organizational, regulatory, technological/algorithmic), which synergistically could help enact needed regulatory oversight while limiting the friction and economic costs of regulation. This analysis provides a starting point for further discussions and negotiations among stakeholders and regulators to do so. 
    more » « less
  3. Patient-generated health data (PGHD), created and captured from patients via wearable devices and mobile apps, are proliferating outside of clinical settings. Examples include sleep tracking, fitness trackers, continuous glucose monitors, and RFID-enabled implants, with many additional biometric or health surveillance applications in development or envisioned. These data are included in growing stockpiles of personal health data being mined for insight via big data analytics and artificial intelligence/deep learning technologies. Governing these data resources to facilitate patient care and health research while preserving individual privacy and autonomy will be challenging, as PGHD are the least regulated domains of digitalized personal health data (U.S. Department of Health and Human Services, 2018). When patients themselves collect digitalized PGHD using “apps” provided by technology firms, these data fall outside of conventional health data regulation, such as HIPAA. Instead, PGHD are maintained primarily on the information technology infrastructure of vendors, and data are governed under the IT firm’s own privacy policies and within the firm’s intellectual property rights. Dominant narratives position these highly personal data as valuable resources to transform healthcare, stimulate innovation in medical research, and engage individuals in their health and healthcare. However, ensuring privacy, security, and equity of benefits from PGHD will be challenging. PGHD can be aggregated and, despite putative “deidentification,” be linked with other health, economic, and social data for predictive analytics. As large tech companies enter the healthcare sector (e.g., Google Health is partnering with Ascension Health to analyze the PHI of millions of people across 21 U.S. states), the lack of harmonization between regulatory regimes may render existing safeguards to preserve patient privacy and control over their PHI ineffective. While healthcare providers are bound to adhere to health privacy laws, Big Tech comes under more relaxed regulatory regimes that will facilitate monetizing PGHD. We explore three existing data protection regimes relevant to PGHD in the United States that are currently in tension with one another: federal and state health-sector laws, data use and reuse for research and innovation, and industry self-regulation by large tech companies We then identify three types of structures (organizational, regulatory, technological/algorithmic), which synergistically could help enact needed regulatory oversight while limiting the friction and economic costs of regulation. This analysis provides a starting point for further discussions and negotiations among stakeholders and regulators to do so. 
    more » « less
  4. This article presents a novel hardware-assisted distributed ledger-based solution for simultaneous device and data security in smart healthcare. This article presents a novel architecture that integrates PUF, blockchain, and Tangle for Security-by-Design (SbD) of healthcare cyber–physical systems (H-CPSs). Healthcare systems around the world have undergone massive technological transformation and have seen growing adoption with the advancement of Internet-of-Medical Things (IoMT). The technological transformation of healthcare systems to telemedicine, e-health, connected health, and remote health is being made possible with the sophisticated integration of IoMT with machine learning, big data, artificial intelligence (AI), and other technologies. As healthcare systems are becoming more accessible and advanced, security and privacy have become pivotal for the smooth integration and functioning of various systems in H-CPSs. In this work, we present a novel approach that integrates PUF with IOTA Tangle and blockchain and works by storing the PUF keys of a patient’s Body Area Network (BAN) inside blockchain to access, store, and share globally. Each patient has a network of smart wearables and a gateway to obtain the physiological sensor data securely. To facilitate communication among various stakeholders in healthcare systems, IOTA Tangle’s Masked Authentication Messaging (MAM) communication protocol has been used, which securely enables patients to communicate, share, and store data on Tangle. The MAM channel works in the restricted mode in the proposed architecture, which can be accessed using the patient’s gateway PUF key. Furthermore, the successful verification of PUF enables patients to securely send and share physiological sensor data from various wearable and implantable medical devices embedded with PUF. Finally, healthcare system entities like physicians, hospital admin networks, and remote monitoring systems can securely establish communication with patients using MAM and retrieve the patient’s BAN PUF keys from the blockchain securely. Our experimental analysis shows that the proposed approach successfully integrates three security primitives, PUF, blockchain, and Tangle, providing decentralized access control and security in H-CPS with minimal energy requirements, data storage, and response time. 
    more » « less
  5. Patient health records(PHRs) are crucial and sensitive as they contain essential information and are frequently shared among healthcare entities. This information must remain correct, up to date, private and accessible only to the authorized entities. Moreover, access must also be assured during health emergency crises such as the recent outbreak, which represents the greatest test of the flexibility and the efficiency of PHR sharing among healthcare providers, which ended up an immense interruption to the healthcare industry. Moreover, the right to privacy is the most fundamental right for a patient. Hence, the patient health records in the healthcare sector have faced issues with privacy breaches, insider outside attacks, and unauthorized access to crucial patients’ records. As a result, it pushes more patients to demand more control, security, and a smoother experience when they want to access their health records. Furthermore, the lack of interoperability among the healthcare system and providers and the added weight of cyber-attacks on an already overwhelmed system have called for an immediate solution. In this work, we developed a secured blockchain framework that safeguards patients’ full control over their health data which can be stored in their private IPFS and later shared with an authorized provider. Furthermore, the system ensures privacy and security while handling patient data, which can only be shared with the patients. The proposed Security and privacy analysis show promising results in providing time savings, enhanced confidentiality, and less disruption in patient-provider interactions. 
    more » « less