Downstream of Cape Hatteras, the vigorously meandering Gulf Stream forms anticyclonic warm core rings (WCRs) that carry warm Gulf Stream and Sargasso Sea waters into the cooler, fresher Slope Sea, and forms cyclonic cold core rings (CCRs) that carry Slope Sea waters into the Sargasso Sea. The Northwest Atlantic shelf and open ocean off the U.S. East Coast have experienced dramatic changes in ocean circulation and water properties in recent years, with significant consequences for marine ecosystems and coastal communities. Some of these changes may be related to a reported regime shift in the number of WCRs formed annually, with a doubling of WCRs shed after 2000. Since the regime shift was detected using a regional eddy‐tracking product, primarily based on sea surface temperatures and relies on analyst skill, we examine three global eddy‐tracking products as an automated and potentially more objective way to detect changes in Gulf Stream rings. Currently, global products rely on altimeter‐measured sea surface height (SSH), with WCRs registering as sea surface highs and CCRs as lows. To identify eddies, these products use either SSH contours or a Lagrangian approach, with particles seeded in satellite‐based surface geostrophic velocity fields. This study confirms the three global products are not well suited for statistical analysis of Gulf Stream rings and suggests that automated WCR identification and tracking comes at the price of accurate identification and tracking. Furthermore, a shift to a higher energy state is detected in the Northwest Atlantic, which coincides with the regime shift in WCRs.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Free, publicly-accessible full text available October 1, 2025 -
Racist structures in STEM education must be interrogated and disrupted to foster equity and social change (McGee, 2020; Rankin et al, 2021). To that end, we use a qualitative case study method to explore the institutional logics of equity, inclusivity, and excellence enacted by chairs, faculty, and staff within a network of computer science departments at Hispanic-Serving Institutions. Drawing on surveys, interviews, and participation observation of 24 computer science departments, we examine ways that institutional agents disrupted the dominant narratives of exclusivity and meritocracy within the discipline by enacting and sustaining inclusive culture and values.more » « lessFree, publicly-accessible full text available April 12, 2025
-
Abstract Previous research has shown that female and Hispanic students who are underrepresented in science, technology, engineering and mathematics (STEM) face more educational barriers than their non-Hispanic, male peers. However, little research has been conducted on the effects of intersectional identities in the STEM space. In an effort to bridge the gap in underrepresented students' experience, the PSEG Institute for Sustainability Studies organizes a paid, interdisciplinary, team-based, experiential learning and internship program called the Green Teams that occurs during 10 weeks of the summer. The Green Teams Program strives to provide undergraduate students from all backgrounds–academically, economically, and demographically–an opportunity to develop their abilities in STEM fields and prepare them to enter the professional world. Based upon a survey given post-internship, self-reported learning gains for all students were analyzed to determine if the program had a significantly greater impact on students who are from groups traditionally underrepresented in STEM in their STEM-related learning gains and their confidence in STEM disciplines. Through t-tests, a Principal Component Analysis (PCA), and a 2-way factorial Analysis of Variance (ANOVA), Hispanic and female participants were found to report significantly higher learning gains than their counterparts in multiple STEM areas from increased tolerance for obstacles to gains in self confidence. The results of the study suggest Hispanic and female students benefit from paid work experiences in STEM with diverse peers and intentional, supportive mentoring. This research on the Green Teams Program provides insight into how this approach positively impacts STEM education of individuals from traditionally underrepresented groups in STEM. The findings may help to further guide the development of the Green Teams Program and the adoption of paid, interdisciplinary, team-based, experiential learning and internship experiences in additional academic STEM settings.
Free, publicly-accessible full text available July 15, 2025 -
The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the target distribution and demonstrate proof-of-concepts on text summarization and program synthesis tasks. For code generation, ILF improves a Codegen-Mono 6.1B model’s pass@1 rate from 22% to 36% on the MBPP benchmark, outperforming both fine-tuning on MBPP and on human- written repaired programs. For summarization, we show that ILF can be combined with learning from human preferences to improve a GPT-3 model’s summarization performance to be comparable to human quality, outperforming fine-tuning on human-written summaries. Overall, our results suggest that ILF is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM’s performance on a variety of tasks.more » « lessFree, publicly-accessible full text available February 28, 2025
-
Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto optimal and simple approach among those we explored: conditional training, or learning distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.more » « less
-
Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.more » « lessFree, publicly-accessible full text available February 28, 2025
-
Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). It is tempting to interpret these CoT explanations as the LLM’s process for solving a task. This level of transparency into LLMs’ predictions would yield significant safety benefits. However, we find that CoT explanations can systematically misrepresent the true reason for a model’s prediction. We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs—e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always “(A)”—which models systematically fail to mention in their explanations. When we bias models toward incorrect answers, they frequently generate CoT explanations rationalizing those answers. This causes accuracy to drop by as much as 36% on a suite of 13 tasks from BIG-Bench Hard, when testing with GPT-3.5 from OpenAI and Claude 1.0 from Anthropic. On a social-bias task, model explanations justify giving answers in line with stereotypes without mentioning the influence of these social biases. Our findings indicate that CoT explanations can be plausible yet misleading, which risks increasing our trust in LLMs without guaranteeing their safety. Building more transparent and explainable systems will require either improving CoT faithfulness through targeted efforts or abandoning CoT in favor of alternative methods.more » « less
-
Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.more » « less
-
Abstract The Northwest Atlantic, which has exhibited evidence of accelerated warming compared to the global ocean, also experienced several notable marine heatwaves (MHWs) over the last decade. We analyze spatiotemporal patterns of surface and subsurface temperature structure across the Northwest Atlantic continental shelf and slope to assess the influences of atmospheric and oceanic processes on ocean temperatures. Here we focus on MHWs from 2015/16 and examine their physical drivers using observational and reanalysis products. We find that a combination of jet stream latitudinal position and ocean advection, mainly due to warm core rings shed by the Gulf Stream, plays a role in MHW development. While both atmospheric and oceanic drivers can lead to MHWs they have different temperature signatures with each affecting the vertical structure differently and horizontal spatial patterns of a MHW. Northwest Atlantic MHWs have significant socio-economic impacts and affect commercially important species such as squid and lobster.