Abstract: Understanding propagation of scintillation light is critical for maximizing the discovery potential of next-generation liquid xenon detectors that use dual-phase time projection chamber technology. This work describes a detailed optical simulation of the DARWIN detector implemented using Chroma, a GPU-based photon tracking framework. To evaluate the framework and to explore ways of maximizing efficiency and minimizing the time of light collection, we simulate several variations of the conventional detector design. Results of these selected studies are presented. More generally, we conclude that the approach used in this work allows one to investigate alternative designs faster and in more detail than using conventional Geant4 optical simulations, making it an attractive tool to guide the development of the ultimate liquid xenon observatory.
                            CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
                        
                    
    
            Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of such LLM simulations. Moreover, there is growing concern that these LLM simulations are flattened caricatures of the personas that they aim to simulate, failing to capture the multidimensionality of people and perpetuating stereotypes. To bridge these gaps, we present CoMPosT, a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. We use this framework to measure open-ended LLM simulations’ susceptibility to caricature, defined via two criteria: individuation and exaggeration. We evaluate the level of caricature in scenarios from existing work on LLM simulations. We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature. 
- Award ID(s): 2247357
- PAR ID: 10506660
- Publisher / Repository: Association for Computational Linguistics
- Date Published:
- Journal Name: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Page Range / eLocation ID: 10853 to 10875
- Format(s): Medium: X
- Location: Singapore
- Sponsoring Org: National Science Foundation
More Like this
- Practitioners frequently take multiple samples from large language models (LLMs) to explore the distribution of completions induced by a given prompt. While individual samples can give high-quality results for given tasks, collectively there are no guarantees of the distribution over these samples induced by the generating LLM. In this paper, we empirically evaluate LLMs’ capabilities as distribution samplers. We identify core concepts and metrics underlying LLM-based sampling, including different sampling methodologies and prompting strategies. Using a set of controlled domains we evaluate the error and variance of the distributions induced by the LLM. We find that LLMs struggle to induce reasonable distributions over generated elements, suggesting that practitioners should more carefully consider the semantics and methodologies of sampling from LLMs.
- Automated assessment of open responses in K–12 science education poses significant challenges due to the multimodal nature of student work, which often integrates textual explanations, drawings, and handwritten elements. Traditional evaluation methods that focus solely on textual analysis fail to capture the full breadth of student reasoning and are susceptible to biases such as handwriting neatness or answer length. In this paper, we propose a novel LLM-augmented multimodal evaluation framework that addresses these limitations through a comprehensive, bias-corrected grading system. Our approach leverages LLMs to generate causal knowledge graphs that encapsulate the essential conceptual relationships in student responses, comparing these graphs with those derived automatically from the rubrics and submissions. Experimental results demonstrate that our framework improves grading accuracy and consistency over deep supervised learning and few-shot LLM baselines.
- Coastal landscape change represents aggregated sediment transport gradients from spatially and temporally variable marine and aeolian forces. Numerous tools exist that independently simulate subaqueous and subaerial coastal profile change in response to these physical forces on a range of time scales. In this capacity, coastal foredunes have been treated primarily as wind-driven features. However, there are several marine controls on coastal foredune growth, such as sediment supply and moisture effects on aeolian processes. To improve understanding of interactions across the land-sea interface, here the development of the new Windsurf-coupled numerical modeling framework is presented. Windsurf couples standalone subaqueous and subaerial coastal change models to simulate the co-evolution of the coastal zone in response to both marine and aeolian processes. Windsurf is applied to a progradational, dissipative coastal system in Washington, USA, demonstrating the ability of the model framework to simulate sediment exchanges between the nearshore, beach, and dune for a one-year period. Windsurf simulations generally reproduce observed cycles of seasonal beach progradation and retreat, as well as dune growth, with reasonable skill. Exploratory model simulations are used to further explore the implications of environmental forcing variability on annual-scale coastal profile evolution. The findings of this work support the hypothesis that there are both direct and indirect oceanographic and meteorological controls on coastal foredune progradation, with this new modeling tool providing a new means of exploring complex morphodynamic feedback mechanisms.
- Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and other enterprises plan to substantially grow their datacenter capacity to support these new workloads. A key bottleneck resource in datacenters is power, which LLMs are quickly saturating due to their rapidly increasing model sizes. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the training and inference power consumption patterns. Based on our analysis, we claim that the average and peak power utilization in LLM inference clusters should not be very high. Our deductions align with data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to build a reliable and robust power management framework. We leverage the insights from our characterization to identify opportunities for better power management. As a detailed use case, we propose a new framework called POLCA, which enables power oversubscription in LLM inference clouds. POLCA is robust, reliable, and readily deployable. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in existing clusters with minimal performance loss.
 An official website of the United States government