Identifying privacy-sensitive data leaks by mobile applications has been a topic of great research interest for the past decade. Technically, such data flows are not “leaks” if they are disclosed in a privacy policy. To address this limitation in automated analysis, recent work has combined program analysis of applications with analysis of privacy policies to determine the flow-to-policy consistency, and hence violations thereof. However, this prior work has a fundamental weakness: it does not differentiate the entity (e.g., first-party vs. third-party) receiving the privacy-sensitive data. In this paper, we propose POLICHECK, which formalizes and implements an entity-sensitive flow-to-policy consistency model. We use POLICHECK to study 13,796 applications and their privacy policies and find that up to 42.4% of applications either incorrectly disclose or omit disclosing their privacy-sensitive data flows. Our results also demonstrate the significance of considering entities: without considering entity, prior approaches would falsely classify up to 38.4% of applications as having privacy-sensitive data flows consistent with their privacy policies. These false classifications include data flows to third-parties that are omitted (e.g., the policy states only the first-party collects the data type), incorrect (e.g., the policy states the third-party does not collect the data type), and ambiguous (e.g., the policy has conflicting statements about the data type collection). By defining a novel automated, entity-sensitive flow-to-policy consistency analysis, POLICHECK provides the highest-precision method to date to determine if applications properly disclose their privacy-sensitive behaviors.
This content will become publicly available on December 1, 2026
A large language model-based tool for identifying relationships to industry in research on the carcinogenicity of benzene, cobalt, and aspartame
Industry-funded research poses a threat to the validity of scientific inference on carcinogenic hazards. Scientists require tools to better identify and characterize industry sponsored research across bodies of evidence to reduce the possible influence of industry bias in evidence synthesis reviews. We applied a novel large language model (LLM)-based tool named InfluenceMapper to demonstrate and evaluate its performance in identifying relationships to industry in research on the carcinogenicity of benzene, cobalt, and aspartame. Methods: All epidemiological, animal cancer, and mechanistic studies included in systematic reviews on the carcinogenicity of the three agents by the IARC Monographs programme. Selected agents were recently evaluated by the Monographs and are of commercial interest by major industries. InfluenceMapper extracted disclosed entities in study publications and classified up to 40 possible disclosed relationship types between each entity and the study and between each entity and author. A human classified entities as ‘industry or industry-funded’ and assessed relationships with industry for potential conflicts of interest. Positive predictive values described the extent of true positive relationships identified by InfluenceMapper compared to human assessment. Results: Analyses included 2,046 studies for all three agents. We identified 320 disclosed industry or industry-funded entities from InfluenceMapper output that were involved in 770 distinct study-entity and author-entity relationships. For each agent, between 4 and 8% of studies disclosed funding by industry and 1–4% of studies had at least one author who disclosed receiving industry funding directly. Industry trade associations for all three agents funded 22 studies published in 16 journals over a 37-year span. Aside from funding, the most prevalent disclosed relationships with industry were receiving data, holding employment, paid consulting, and providing expert testimony. Positive predictive values were excellent (≥ 98%) for study-entity relationships but declined for relationships with individual authors. Conclusions: LLM-based tools can significantly expedite and bolster the detection of disclosed conflicts of interest from industry sponsored research in cancer prevention. Possible use cases include facilitating the assessment of bias from industry studies in evidence synthesis reviews and alerting scientists to the influence of industry on scientific inference. Persistent challenges in ascertaining conflicts of interest underscore the urgent need for standardized, transparent, and enforceable disclosures in biomedical journals.
- Award ID(s):
- 2147334
- PAR ID:
- 10649622
- Publisher / Repository:
- Springer
- Date Published:
- Journal Name:
- Environmental Health
- Volume:
- 24
- Issue:
- 1
- ISSN:
- 1476-069X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
BACKGROUND Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production challenges the conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. METHODS We created an automated pipeline using Generative Pretrained Transformer 4 (GPT-4) to provide comments on scientific papers. We evaluated the quality of GPT-4’s feedback through two large-scale studies. We first quantitatively compared GPT-4’s generated feedback with human peer reviewers’ feedback in general scientific papers from 15 Nature family journals (3096 papers in total) and the International Conference on Learning Representations (ICLR) machine learning conference (1709 papers). To specifically assess GPT-4’s performance on biomedical papers, we also analyzed a subset of 425 health sciences papers from the Nature portfolio and a random sample of 666 submissions to eLife. Additionally, we conducted a prospective user study with 308 researchers from 110 institutions in the fields of artificial intelligence and computational biology to understand how researchers perceive feedback generated by our system on their own papers. RESULTS The overlap in the points raised by GPT-4 and by human reviewers (average overlap of 30.85% for Nature journals and 39.23% for ICLR) is comparable with the overlap between two human reviewers (average overlap of 28.58% for Nature journals and 35.25% for ICLR). Results on eLife and a subset of health sciences papers as categorized by the Nature portfolio show similar patterns. In our prospective user study, more than half (57.4%) of the users found GPT-4–generated feedback helpful/very helpful, and 82.4% found it more beneficial than feedback from at least some human reviewers. We also identify several limitations of large language model (LLM)–generated feedback. CONCLUSIONS Through both retrospective and prospective evaluation, we find substantial overlap between LLM and human feedback as well as positive user perceptions regarding the usefulness of LLM feedback. Although human expert review should continue to be the foundation of the scientific process, LLM feedback could benefit researchers, especially when timely expert feedback is not available and in earlier stages of manuscript preparation. (Funded by the Chan–Zuckerberg Initiative and the Stanford Interdisciplinary Graduate Fellowship.)
-
Abstract Ecologists—especially those new to the field—are tasked with finding relevant literature matching their research interests and deciding upon a suitable venue for the publication of their work. To provide a roadmap for early career researchers to identify journals aligned with their interests, we analyzed major research themes found across the top 30 ecology journals and three high‐impact multi‐disciplinary journals (Nature, PNAS, and Science), utilizing an automated content analysis (ACA) of 84,841 article abstracts, titles, and author keywords published over the last four decades. Journals clustered into 10 distinct groups based on 46 research themes identified by ACA. We examined the frequency of ecological themes in each of these journal groups to identify the journals most associated with each theme. We found three themes (anthropogenic impacts, disease, and traits) that occurred at a high frequency in the high‐impact multi‐disciplinary journal group containing Nature, PNAS, and Science. Themes that increased in frequency over the last four decades, such as climate change, traits, anthropogenic, and cellular biology, were found more often in journals with higher impact factors, indicating that emerging research themes in ecology will likely become of interest to a broader readership over time. Our study provides a thematic review as a potential roadmap for junior ecologists to browse and publish journal articles.
-
Abstract Healthcare industry players make payments to medical providers for non-research expenses. While these payments may pose conflicts of interest, their relationship with overall healthcare costs remains largely unknown. In this study, we linked Open Payments data on providers’ industry payments with Medicare data on healthcare costs. We investigated 374,766 providers’ industry payments and healthcare costs. We demonstrate that providers receiving higher amounts of industry payments tend to bill higher drug and medical costs. Specifically, we find that a 10% increase in industry payments is associated with 1.3% higher medical and 1.8% higher drug costs. For a typical provider, for example, a 10% or $25 increase in annual industry payments would be associated with approximately $1,100 higher medical costs and $100 higher drug costs. Furthermore, the association between payments and healthcare costs varies markedly across states and correlates with political leaning, being stronger in more conservative states.
-
Abstract Efforts to reach net zero targets by the second half of the century will have profound materials supply implications. The anticipated scale and speed of the energy transition in both transportation and energy storage raises the question of whether we risk running out of the essential critical materials needed to enable this transition. Early projections suggest that disruptions are likely to occur in the short term for select critical materials, but at the same time these shortages provide a powerful incentive for the market to respond in a variety of ways before supply-level stress becomes dire. In April 2023, the MRS Focus on Sustainability subcommittee sponsored a panel discussion on the role of innovation in materials science and engineering in supporting supply chains for clean energy technologies. Drawing on examples from the panel discussion, this perspective examines the myth of materials scarcity, explains the compelling need for innovation in materials in helping supply chains dynamically adapt over time, and illustrates how the Materials Research Society is facilitating engagement with industry to support materials innovation, now and in the future. Highlights: In this commentary, we examine the myth of materials scarcity, explain the compelling need for innovation in materials in helping supply chains dynamically adapt over time, and show how the materials research community can effectively engage with industry, policymakers, and funding agencies to drive the needed innovation in critical areas. Discussion: Demand for certain materials used in clean energy technologies is forecasted to increase by multiples of current production over the next decades. This has drawn attention to supply chain risks and has created a myth that we will “run out” of certain materials during the energy transition. The reality is that markets have multiple mechanisms to adapt over the long term, and near-term shortages or expectations of shortages provide a powerful incentive for action. In this commentary, we highlight different ways materials innovation can help solve these issues in the near term and long term, and how the materials research community can effectively engage with industry and policymakers.
