Title: All Eyes On Me: Inside Third Party Trackers' Exfiltration of PHI from Healthcare Providers' Online Systems
In the United States, sensitive health information is protected under the Health Insurance Portability and Accountability Act (HIPAA). This act limits the disclosure of Protected Health Information (PHI) without the patient’s consent or knowledge. However, as medical care becomes web-integrated, many providers have chosen to use third-party web trackers for measurement and marketing purposes. This presents a security concern: third-party JavaScript requested by an online healthcare system can read the website’s contents, which makes it difficult to ensure that PHI is not leaked, whether unintentionally or maliciously. In this paper, we investigate health information breaches in online medical records, focusing on 459 online patient portals and 4 telehealth websites. We find that 14% of patient portals include Google Analytics, which reveals (at a minimum) the fact that the user visited the health provider's website. In addition, 5 portals and 4 telehealth websites contained JavaScript-based services that disclosed PHI, including medications and lab results, to third parties. The most significant PHI breaches were attributable to Google and Facebook trackers. In the latter case, an estimated 4.5 million site visitors per month were potentially exposed to leaks of personal information (names, phone numbers) and medical information (test results, medications). We notified healthcare providers of the PHI breaches and found that only 15.7% took action to correct the leaks. Healthcare operators lacked the technical expertise to identify PHI breaches caused by third-party trackers. After we notified Epic, a healthcare portal vendor, of the PHI leaks, we received a prompt response and observed extensive mitigation across providers. This suggests that vendor notification is an effective intervention against PHI disclosures.
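One way to check the concern described above from the outside is to inspect the requests a page sends. The risk is that third-party scripts on a portal page forward page contents to tracker domains. The sketch below is not the authors' measurement pipeline. It assumes you already have a list of outgoing request URLs, for example captured from a browser session. It also assumes a set of PHI strings known to appear on the page, such as a test account's medication or lab test name. The tracker domain list is a small, hypothetical sample. Only URL query parameters are inspected, so data sent in POST bodies or in hashed or otherwise encoded form would be missed.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical, illustrative sample of third-party tracker base domains.
TRACKER_DOMAINS = {"google-analytics.com", "doubleclick.net", "facebook.com", "facebook.net"}


def is_tracker(host: str) -> bool:
    """True if host equals a tracker domain or is a subdomain of one."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)


def find_phi_leaks(request_urls, phi_values):
    """Return (url, parameter, phi_value) for every tracker request whose
    decoded query-string parameters contain a known PHI string
    (case-insensitive substring match)."""
    leaks = []
    for url in request_urls:
        parts = urlsplit(url)
        if not is_tracker(parts.hostname or ""):
            continue  # first-party or unknown host: not checked here
        # parse_qsl percent-decodes names and values ("+" becomes a space).
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            lowered = value.lower()
            for phi in phi_values:
                if phi.lower() in lowered:
                    leaks.append((url, name, phi))
    return leaks


if __name__ == "__main__":
    # Hypothetical captured requests; the first embeds the page URL,
    # which in turn names a lab test.
    captured = [
        "https://www.facebook.com/tr?id=123&ev=PageView"
        "&dl=https%3A%2F%2Fportal.example.org%2Fresults%3Ftest%3DHbA1c",
        "https://portal.example.org/api/labs?id=42",
    ]
    for leak in find_phi_leaks(captured, ["HbA1c"]):
        print(leak)
```

Matching on strings the auditor already knows keeps false positives low. The trade-off is that this only works when the auditor controls the account whose data is being checked.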
Award ID(s):
1903612
NSF-PAR ID:
10380003
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 21st Workshop on Privacy in the Electronic Society
Page Range / eLocation ID:
197 to 211
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Patient-generated health data (PGHD), created and captured from patients via wearable devices and mobile apps, are proliferating outside of clinical settings. Examples include sleep tracking, fitness trackers, continuous glucose monitors, and RFID-enabled implants, with many additional biometric or health surveillance applications in development or envisioned. These data are included in growing stockpiles of personal health data being mined for insight via big data analytics and artificial intelligence/deep learning technologies. Governing these data resources to facilitate patient care and health research while preserving individual privacy and autonomy will be challenging, because PGHD are the least regulated domain of digitalized personal health data (U.S. Department of Health and Human Services, 2018). When patients themselves collect digitalized PGHD using “apps” provided by technology firms, these data fall outside of conventional health data regulation, such as HIPAA. Instead, PGHD are maintained primarily on the information technology infrastructure of vendors. The data are governed by the IT firm’s own privacy policies and within the firm’s intellectual property rights. Dominant narratives position these highly personal data as valuable resources to transform healthcare, stimulate innovation in medical research, and engage individuals in their health and healthcare. However, ensuring privacy, security, and equity of benefits from PGHD will be challenging. PGHD can be aggregated and, despite putative “deidentification,” be linked with other health, economic, and social data for predictive analytics. Large tech companies are entering the healthcare sector (e.g., Google Health is partnering with Ascension Health to analyze the PHI of millions of people across 21 U.S. states). As they do, the lack of harmonization between regulatory regimes may render ineffective the existing safeguards for patient privacy and for patients' control over their PHI.
While healthcare providers are bound to adhere to health privacy laws, Big Tech falls under more relaxed regulatory regimes that will facilitate monetizing PGHD. We explore three existing data protection regimes relevant to PGHD in the United States that are currently in tension with one another: federal and state health-sector laws, data use and reuse for research and innovation, and industry self-regulation by large tech companies. We then identify three types of structures (organizational, regulatory, technological/algorithmic) that could work synergistically to enact needed regulatory oversight while limiting the friction and economic costs of regulation. This analysis provides a starting point for further discussions and negotiations among stakeholders and regulators toward that end.
  3. Hospitalization of patients with chronic diseases poses a significant burden on the healthcare system. Frequent hospitalization can be partially attributed to the failure of healthcare providers to engage effectively with their patients. Recently, patient portals have become popular as information technology (IT) platforms that provide patients with online access to their medical records and help them engage effectively with healthcare providers. Despite the popularity of these portals, there is a paucity of research on the impact of patient–provider engagement on patients’ health outcomes. Drawing on the theory of effective use, we examine the association between portal use and the incidence of subsequent patient hospitalizations. Our analysis is based on a unique, longitudinal dataset of patients’ portal use spanning 12 years at a large academic medical center in North Texas. Our results indicate that portal use is associated with improvements in patient health outcomes along multiple dimensions, including the frequency of hospital and emergency visits, readmission risk, and length of stay. This is one of the first studies to conduct a large-scale, longitudinal analysis of a health IT system and its effect on individual-level health outcomes. Our results highlight the need for technologies that can improve patient–provider engagement and overall health outcomes for chronic disease management.
  4. Monetizing websites and web apps through online advertising is widespread in the web ecosystem, creating a billion-dollar market. This has led to the emergence of a vast network of tertiary ad providers and ad syndication to serve this growing market. Today, the online advertising ecosystem forces publishers to integrate ads from these third-party domains. On the one hand, this raises several privacy and security concerns that have been actively studied in recent years. On the other hand, the ability of today's browsers to load dynamic web pages with complex animations and JavaScript has also transformed online advertising, which can significantly affect webpage performance. Performance is a critical metric to optimize, since it ultimately affects user satisfaction. Unfortunately, few studies have examined the performance impact of online advertising, which we argue is as important as privacy and security. In this paper, we present an in-depth, first-of-its-kind performance evaluation of web ads. Unlike prior efforts that rely primarily on ad blockers, we perform a fine-grained analysis of the browser's page-loading process to demystify the performance cost of web ads. We aim to characterize the cost of every component of an ad, so that the publisher, ad syndicate, and advertiser can improve the ad's performance with detailed guidance. For this purpose, we develop a tool, adPerf, for the Chrome browser. It classifies page-loading workloads into ad-related and main-content work at the granularity of browser activities. Our evaluations show that online advertising accounts for more than 15% of the browser's page-loading workload, and approximately 88% of that is spent on JavaScript. On smartphones, this additional cost of ads is 7% lower, since mobile pages include fewer and better-optimized ads. We also track the sources and delivery chain of web ads and analyze performance with respect to the origin of the ad contents.
We observe that two well-known third-party ad domains contribute 35% of the ads' performance cost. Surprisingly, top news websites implicitly include unknown third-party ads, which in some cases account for more than 37% of the ads' performance cost.
  5. The migration to electronic health records (EHR) in the healthcare industry has raised security and privacy issues. One such issue, a concern for healthcare providers, insurance companies, and pharmacies, is patient health information (PHI) leaks. PHI leaks can violate the privacy laws that protect individuals’ identifiable health information, potentially resulting in a healthcare crisis. This study explores the issue of PHI leaks from an access control viewpoint. We use access control policies and PHI leak scenarios derived from semi-structured interviews with four healthcare practitioners. Through the lens of activity theory, we articulate the design of an access control model for detecting and mitigating PHI leaks. Finally, we build a prototype as a proof of concept.
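The last entry above describes an access control model for detecting and mitigating PHI leaks. The sketch below is a rough, hypothetical illustration of the general idea only; it is not the model from that study, which is grounded in activity theory and practitioner interviews. It shows a default-deny, role-based check that also flags otherwise permitted accesses that route PHI to an external recipient. The roles, resource categories, `AccessRequest` type, and policy table are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    role: str       # hypothetical role, e.g. "physician"
    resource: str   # hypothetical PHI category, e.g. "lab_results"
    action: str     # e.g. "read", "update", "export"
    recipient: str  # "internal" or "external"


# Hypothetical policy table: (role, resource) -> permitted actions.
POLICY = {
    ("physician", "lab_results"): {"read", "update"},
    ("physician", "medications"): {"read", "update"},
    ("billing_clerk", "billing_codes"): {"read"},
}


def evaluate(req: AccessRequest) -> str:
    """Return 'deny', 'flag', or 'allow'.

    Default-deny: any (role, resource) pair absent from POLICY gets no
    permitted actions. A permitted action whose recipient is not internal
    is flagged for review as a potential PHI leak.
    """
    permitted = POLICY.get((req.role, req.resource), set())
    if req.action not in permitted:
        return "deny"
    if req.recipient != "internal":
        return "flag"
    return "allow"


if __name__ == "__main__":
    print(evaluate(AccessRequest("physician", "medications", "read", "external")))
```

Keeping "flag" separate from "deny" reflects the point that a leak can occur through an access that is itself authorized.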