skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Opening a conversation on responsible environmental data science in the age of large language models
Abstract The general public and scientific community alike are abuzz over the release of ChatGPT and GPT-4. Among many concerns being raised about the emergence and widespread use of tools based on large language models (LLMs) is the potential for them to propagate biases and inequities. We hope to open a conversation within the environmental data science community to encourage the circumspect and responsible use of LLMs. Here, we pose a series of questions aimed at fostering discussion and initiating a larger dialogue. To improve literacy on these tools, we provide background information on the LLMs that underpin tools like ChatGPT. We identify key areas in research and teaching in environmental data science where these tools may be applied, and discuss limitations to their use and points of concern. We also discuss ethical considerations surrounding the use of LLMs to ensure that as environmental data scientists, researchers, and instructors, we can make well-considered and informed choices about engagement with these tools. Our goal is to spark forward-looking discussion and research on how as a community we can responsibly integrate generative AI technologies into our work.  more » « less
Award ID(s):
2125921
PAR ID:
10548869
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Cambridge University Press (CUP)
Date Published:
Journal Name:
Environmental Data Science
Volume:
3
ISSN:
2634-4602
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Newcomers onboarding to Open Source Software (OSS) projects face many challenges. Large Language Models (LLMs), like ChatGPT, have emerged as potential resources for answering questions and providing guidance, with many developers now turning to ChatGPT over traditional Q&A sites like Stack Overflow. Nonetheless, LLMs may carry biases in presenting information, which can be especially impactful for newcomers whose problem-solving styles may not be broadly represented. This raises important questions about the accessibility of AI-driven support for newcomers to OSS projects. This vision paper outlines the potential of adapting AI responses to various problem-solving styles to avoid privileging a particular subgroup. We discuss the potential of AI persona-based prompt engineering as a strategy for interacting with AI. This study invites further research to refine AI-based tools to better support contributions to OSS projects. 
    more » « less
  2. Abstract Synthesis research in ecology and environmental science improves understanding, advances theory, identifies research priorities, and supports management strategies by linking data, ideas, and tools. Accelerating environmental challenges increases the need to focus synthesis science on the most pressing questions. To leverage input from the broader research community, we convened a virtual workshop with participants from many countries and disciplines to examine how and where synthesis can address key questions and themes in ecology and environmental science in the coming decade. Seven priority research topics emerged: (1) diversity, equity, inclusion, and justice (DEIJ), (2) human and natural systems, (3) actionable and use‐inspired science, (4) scale, (5) generality, (6) complexity and resilience, and (7) predictability. Additionally, two issues regarding the general practice of synthesis emerged: the need for increased participant diversity and inclusive research practices; and increased and improved data flow, access, and skill‐building. These topics and practices provide a strategic vision for future synthesis in ecology and environmental science. 
    more » « less
  3. Abstract It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operation and maintenance. In this overview, the history of and foundational thinking around NEON are discussed. A framework of open science is described with a discussion of how NEON can be situated as part of a larger data constellation—across existing networks and different suites of ecological measurements and sensors. Next, a synthesis of early NEON science, based on >100 existing publications, funded proposal efforts, and emergent science at the very first NEON Science Summit (hosted by Earth Lab at the University of Colorado Boulder in October 2019) is provided. Key questions that the ecology community will address with NEON data in the next 10 yr are outlined, from understanding drivers of biodiversity across spatial and temporal scales to defining complex feedback mechanisms in human–environmental systems. Last, the essential elements needed to engage and support a diverse and inclusive NEON user community are highlighted: training resources and tools that are openly available, funding for broad community engagement initiatives, and a mechanism to share and advertise those opportunities. NEON users require both the skills to work with NEON data and the ecological or environmental science domain knowledge to understand and interpret them. This paper synthesizes early directions in the community’s use of NEON data, and opportunities for the next 10 yr of NEON operations in emergent science themes, open science best practices, education and training, and community building. 
    more » « less
  4. The advanced capabilities of Large Language Models (LLMs) have made them invaluable across various applications, from conversational agents and content creation to data analysis, research, and innovation. However, their effectiveness and accessibility also render them susceptible to abuse for generating malicious content, including phishing attacks. This study explores the potential of using four popular commercially available LLMs, i.e., ChatGPT (GPT 3.5 Turbo), GPT 4, Claude, and Bard, to generate functional phishing attacks using a series of malicious prompts. We discover that these LLMs can generate both phishing websites and emails that can convincingly imitate well-known brands and also deploy a range of evasive tactics that are used to elude detection mechanisms employed by anti-phishing systems. These attacks can be generated using unmodified or "vanilla" versions of these LLMs without requiring any prior adversarial exploits such as jailbreaking. We evaluate the performance of the LLMs towards generating these attacks and find that they can also be utilized to create malicious prompts that, in turn, can be fed back to the model to generate phishing scams - thus massively reducing the prompt-engineering effort required by attackers to scale these threats. As a countermeasure, we build a BERT-based automated detection tool that can be used for the early detection of malicious prompts to prevent LLMs from generating phishing content. Our model is transferable across all four commercial LLMs, attaining an average accuracy of 96% for phishing website prompts and 94% for phishing email prompts. We also disclose the vulnerabilities to the concerned LLMs, with Google acknowledging it as a severe issue. Our detection model is available for use at Hugging Face, as well as a ChatGPT Actions plugin. 
    more » « less
  5. Olney, AM; Chounta, IA; Liu, Z; Santos, OC; Bittencourt, II (Ed.)
    An advantage of Large Language Models (LLMs) is their contextualization capability – providing different responses based on student inputs like solution strategy or prior discussion, to potentially better engage students than standard feedback. We present a design and evaluation of a proof-of-concept LLM application to offer students dynamic and contextualized feedback. Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT, which offers students contextualized reflection triggers during a collaborative query optimization task in database design. We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context. We discuss in depth the exploration of the design space of the triggers and their correspondence with the learning objectives as well as the impact on student learning in a pilot study with 34 students. 
    more » « less