- High-quality benchmarks are essential for evaluating the reasoning and retrieval capabilities of large language models (LLMs). However, curated datasets are not a permanent solution for this purpose, because they are prone to data leakage, which leads to inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique and factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset nor based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and the corpus size to disentangle reasoning and retrieval capabilities, respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data leakage-resistant framework for the disentangled evaluation of reasoning, retrieval, and tool-use abilities.
  Free, publicly-accessible full text available July 16, 2026
- Developing prompt-based methods with Large Language Models (LLMs) requires making numerous decisions, which give rise to a combinatorial search problem over hyper-parameters. Evaluating every combination exhaustively can be time-consuming and costly. In this paper, we propose an adaptive approach to explore this space. We exploit the fact that often only a few samples are needed to identify clearly superior or inferior settings, and that many evaluation tests are highly correlated. We use multi-armed bandits to sequentially identify the next (method, validation sample) pair to evaluate, and low-rank matrix factorization to fill in the missing evaluations. We carefully assess the efficacy of our approach on several competitive benchmark problems and show that it can identify the top-performing method using only 5-15% of the typical resources, resulting in 85-95% LLM cost savings. Our code is available at https://github.com/kilian-group/banditeval.
  Free, publicly-accessible full text available June 11, 2026
- Simulations of nuclear magnetic resonance (NMR) experiments can be an important tool for extracting information about molecular structure and optimizing experimental protocols, but they are often intractable on classical computers for large molecules such as proteins and for protocols such as zero-field NMR. We demonstrate the first quantum simulation of an NMR spectrum, computing the zero-field spectrum of the methyl group of acetonitrile using four qubits of a trapped-ion quantum computer. We reduce the sampling cost of the quantum simulation by an order of magnitude using compressed sensing techniques. We show how the intrinsic decoherence of NMR systems may enable the zero-field simulation of classically hard molecules on relatively near-term quantum hardware, and we discuss how the experimentally demonstrated quantum algorithm can be used to efficiently simulate scientifically and technologically relevant solid-state NMR experiments on more mature devices. Our work opens up a practical application of quantum computation.
- The information content of crystalline materials becomes astronomical when collective electronic behavior and its fluctuations are taken into account. In the past decade, improvements in source brightness and detector technology at modern X-ray facilities have allowed a dramatically larger fraction of this information to be captured. The primary challenge is now to understand and discover scientific principles from big datasets when a comprehensive analysis is beyond human reach. We report the development of an unsupervised machine learning approach, X-ray diffraction (XRD) temperature clustering (X-TEC), that can automatically extract charge density wave order parameters and detect intra-unit-cell ordering and its fluctuations from a series of high-volume X-ray diffraction measurements taken at multiple temperatures. We benchmark X-TEC with diffraction data on a quasi-skutterudite family of materials, $(\mathrm{Ca}_x\mathrm{Sr}_{1-x})_3\mathrm{Rh}_4\mathrm{Sn}_{13}$, where a quantum critical point is observed as a function of Ca concentration. We apply X-TEC to XRD data on the pyrochlore metal $\mathrm{Cd}_2\mathrm{Re}_2\mathrm{O}_7$ to investigate its two much-debated structural phase transitions and uncover the Goldstone mode accompanying them. We demonstrate how unprecedented atomic-scale knowledge can be gained when human researchers connect the X-TEC results to physical principles. Specifically, from the selection rules revealed by X-TEC we extract that the Cd and Re displacements are approximately equal in amplitude but out of phase. This discovery reveals a previously unknown involvement of $5d^2$ Re, supporting the idea of an electronic origin of the structural order. Our approach can radically transform XRD experiments by allowing in operando data analysis and enabling researchers to refine experiments by discovering interesting regions of phase space on the fly.
- Variational approaches are among the most powerful techniques to approximately solve quantum many-body problems. These encompass both variational states based on tensor or neural networks and parameterized quantum circuits in variational quantum eigensolvers. However, self-consistent evaluation of the quality of variational wavefunctions is a notoriously hard task. Using a recently developed Hamiltonian reconstruction method, we propose a multi-faceted approach to evaluating the quality of neural-network-based wavefunctions. Specifically, we consider convolutional neural network (CNN) and restricted Boltzmann machine (RBM) states trained on a square-lattice spin-1/2 $J_1\!-\!J_2$ Heisenberg model. We find that the reconstructed Hamiltonians are typically less frustrated and have easy-axis anisotropy near the high-frustration point. In addition, the reconstructed Hamiltonians suppress quantum fluctuations in the large-$J_2$ limit. Our results highlight the critical importance of the wavefunction’s symmetry. Moreover, the multi-faceted insight from the Hamiltonian reconstruction reveals that a variational wavefunction can fail to capture the true ground state through suppression of quantum fluctuations.
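The PhantomWiki abstract describes generating a fresh, factually consistent corpus plus question-answer pairs for every evaluation, with question difficulty and corpus size as separate knobs. The Python sketch below is a toy illustration of that idea under our own assumptions, not PhantomWiki's pipeline: the syllable names, the single "parent" relation, and the helpers `generate_universe`, `ancestor`, and `make_question` are all hypothetical. Each seed yields a new fictional universe, `n_people` sets the corpus size, and `hops` sets how many documents must be chained to answer a question.

```python
import random

# Hypothetical syllable inventory for invented names (not PhantomWiki's).
SYLLABLES = ["ka", "lo", "mi", "ra", "ten", "vu", "zel", "dor", "sa", "ni"]


def make_name(rng, used):
    """Draw a fresh three-syllable name that has not been used yet."""
    while True:
        name = "".join(rng.choice(SYLLABLES) for _ in range(3)).capitalize()
        if name not in used:
            used.add(name)
            return name


def generate_universe(n_people, seed):
    """Build a random family forest and one short article per person."""
    rng = random.Random(seed)
    used = set()
    people = [make_name(rng, used) for _ in range(n_people)]
    # A person may get a parent only among earlier people, so no cycles arise.
    parent = {}
    for k, person in enumerate(people):
        if k > 0 and rng.random() < 0.8:
            parent[person] = people[rng.randrange(k)]
    corpus = {}
    for person in people:
        children = sorted(c for c, p in parent.items() if p == person)
        sentences = [f"{person} is a fictional person."]
        if person in parent:
            sentences.append(f"The parent of {person} is {parent[person]}.")
        if children:
            sentences.append(f"The children of {person} are {', '.join(children)}.")
        corpus[person] = " ".join(sentences)
    return people, parent, corpus


def ancestor(person, parent, hops):
    """Follow `hops` parent links; return None if the chain breaks."""
    for _ in range(hops):
        if person not in parent:
            return None
        person = parent[person]
    return person


def make_question(people, parent, hops, rng):
    """Return (question, answer) whose answer needs `hops` documents, or None."""
    candidates = []
    for person in people:
        answer = ancestor(person, parent, hops)
        if answer is not None:
            candidates.append((person, answer))
    if not candidates:
        return None
    start, answer = rng.choice(candidates)
    return "Who is " + "the parent of " * hops + start + "?", answer
```

Because each universe is generated from a seed at evaluation time, its answers are computed from the same graph that produced the documents, so questions stay consistent with the corpus; raising `hops` and raising `n_people` stress reasoning and retrieval separately.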
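The prompt-evaluation abstract combines two ingredients: a multi-armed bandit that chooses the next (method, validation sample) pair, and low-rank matrix factorization that imputes evaluations not yet run. The NumPy sketch below shows one way these can fit together; it is not the banditeval code. Assumptions: a UCB-style score with a hypothetical exploration constant `c`, a rank-`rank` alternating least squares completion with a small ridge term `reg`, a shared set of `n_anchor` warm-start samples, and a precomputed `scores` matrix that stands in for calling an LLM and grading its answer.

```python
import numpy as np


def complete_low_rank(values, mask, rank, n_iters=30, reg=1e-4, seed=0):
    """Fill in unobserved entries with a rank-`rank` fit via alternating least squares."""
    rng = np.random.default_rng(seed)
    m, n = values.shape
    U = rng.normal(size=(m, rank))
    V = rng.normal(size=(n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):  # refit each method's factor on its observed samples
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ values[i, mask[i]])
        for j in range(n):  # refit each sample's factor on the methods that saw it
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ values[mask[:, j], j])
    return U @ V.T


def bandit_eval(scores, budget, rank=1, n_anchor=3, c=0.5, seed=0):
    """Choose (method, sample) pairs with a UCB rule; return (best method, evaluations used).

    `scores` acts as a hypothetical oracle: an entry is only read once it is "evaluated".
    """
    rng = np.random.default_rng(seed)
    n_methods, n_samples = scores.shape
    observed = np.zeros(scores.shape, dtype=bool)
    # Warm start: every method is evaluated on the same few anchor samples.
    observed[:, rng.choice(n_samples, size=n_anchor, replace=False)] = True
    while observed.sum() < budget:
        est = complete_low_rank(np.where(observed, scores, 0.0), observed, rank)
        counts = observed.sum(axis=1)
        ucb = est.mean(axis=1) + c * np.sqrt(np.log(observed.sum()) / counts)
        ucb[counts == n_samples] = -np.inf  # nothing left to evaluate for this method
        i = int(np.argmax(ucb))
        j = rng.choice(np.flatnonzero(~observed[i]))
        observed[i, j] = True
    est = complete_low_rank(np.where(observed, scores, 0.0), observed, rank)
    return int(np.argmax(est.mean(axis=1))), int(observed.sum())


# Synthetic, exactly rank-1 score matrix: 8 hypothetical methods x 40 samples.
quality = np.array([0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.95])
difficulty = np.random.default_rng(1).uniform(0.5, 1.0, size=40)
scores = np.outer(quality, difficulty)
best, used = bandit_eval(scores, budget=80)
print(best, used)  # method 7, after 80 of the 320 possible evaluations
```

Sharing the anchor samples across all methods keeps the observed entries connected, so a rank-1 fit can propagate information between methods. Real benchmark scores are noisy and only approximately low rank, so the budget that suffices in practice is an empirical question.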
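For a sense of scale, the four-spin system in the NMR abstract is still small enough to simulate classically by brute force, because its Hilbert space has only $2^4 = 16$ dimensions. That dimension grows as $2^n$ with the number of spins $n$, which is why larger molecules become intractable. The NumPy sketch below is such a classical reference, not the trapped-ion algorithm or the compressed-sensing step. It assumes a 13C-labeled methyl group (one 13C and three 1H spins, matching four qubits), an isotropic C-H coupling of 136 Hz chosen only for illustration, no H-H or other couplings, and detection of the z magnetization weighted by the 13C and 1H gyromagnetic ratios.

```python
import numpy as np
from functools import reduce

# Spin-1/2 operators in units of hbar: I = sigma / 2.
SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2
ID = np.eye(2, dtype=complex)
N_SPINS = 4  # assumed: spin 0 is 13C, spins 1-3 are the methyl protons


def embed(op, site):
    """Tensor a single-spin operator into the 4-spin Hilbert space."""
    return reduce(np.kron, [op if k == site else ID for k in range(N_SPINS)])


def zero_field_lines(j_ch_hz, gamma_c=10.7084, gamma_h=42.577, tol=1e-6):
    """Return the distinct zero-field transition frequencies (Hz) of a 13CH3 group."""
    dim = 2 ** N_SPINS
    # At zero field only scalar C-H couplings remain; H-H couplings are omitted here.
    ham = np.zeros((dim, dim), dtype=complex)
    for k in range(1, N_SPINS):
        for op in (SX, SY, SZ):
            ham += j_ch_hz * embed(op, 0) @ embed(op, k)
    # Detected signal: z magnetization weighted by gyromagnetic ratios (MHz/T).
    mz = gamma_c * embed(SZ, 0) + gamma_h * sum(embed(SZ, k) for k in range(1, N_SPINS))
    energies, vecs = np.linalg.eigh(ham)
    mz_eig = vecs.conj().T @ mz @ vecs  # observable in the energy eigenbasis
    lines = set()
    for a in range(dim):
        for b in range(a + 1, dim):
            freq = abs(energies[a] - energies[b])
            # Keep nonzero frequencies connected by a nonzero matrix element.
            if freq > tol and abs(mz_eig[a, b]) > tol:
                lines.add(round(float(freq), 6))
    return sorted(lines)


print(zero_field_lines(136.0))
```

For this spin system the sketch finds lines at $J$ and $2J$. They vanish if the two gyromagnetic ratios are set equal, because the detected magnetization is then the total $F_z$, which commutes with the scalar-coupling Hamiltonian; the contrast between 13C and 1H is what makes a zero-field J-spectrum observable.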
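The core step in the X-TEC abstract is grouping reciprocal-space points by how their diffracted intensity evolves with temperature. The pure-Python sketch below illustrates that step on synthetic data. It standardizes each intensity-versus-temperature trajectory so that only its shape matters and then clusters the shapes. Plain k-means is swapped in here as a simple stand-in and may differ from the clustering model X-TEC actually uses; the transition temperature `T_C`, the trajectory shapes, and the noise level are hypothetical.

```python
import math
import random


def rescale(traj):
    """Standardize one intensity-vs-temperature trajectory (zero mean, unit std)."""
    mean = sum(traj) / len(traj)
    std = math.sqrt(sum((x - mean) ** 2 for x in traj) / len(traj)) or 1.0
    return [(x - mean) / std for x in traj]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic data: 20 order-parameter-like and 20 diffuse-like trajectories.
rng = random.Random(0)
temps = list(range(10, 310, 10))  # 30 temperatures in K
T_C = 150.0  # hypothetical transition temperature
ordered = [[max(0.0, 1 - t / T_C) + rng.gauss(0, 0.01) for t in temps] for _ in range(20)]
diffuse = [[0.5 + 0.002 * t + rng.gauss(0, 0.01) for t in temps] for _ in range(20)]
labels = kmeans([rescale(tr) for tr in ordered + diffuse], k=2)
print(labels)  # first 20 labels are 0, last 20 are 1
```

Trajectories that switch on below `T_C` end up in one cluster and smoothly rising, diffuse-like ones in the other. In a real dataset each trajectory could come from one voxel of the measured diffraction volume, so the cluster labels map back to regions of reciprocal space.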
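The Hamiltonian reconstruction in the last abstract can be illustrated with the covariance-matrix approach: given a state $|\psi\rangle$ and candidate operators $O_k$, the unit coupling vector $c$ that minimizes $\mathrm{Var}_\psi(\sum_k c_k O_k)$ defines the Hamiltonian for which $|\psi\rangle$ is closest to an eigenstate, and a zero minimum means an exact eigenstate. We assume this scheme for illustration; the paper's method may differ in detail. The NumPy sketch uses a 4-site open XXZ chain with anisotropy `DELTA` as a toy stand-in for the square-lattice $J_1\!-\!J_2$ model, and its exact ground state in place of a trained CNN or RBM wavefunction.

```python
import numpy as np
from functools import reduce

SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2
ID = np.eye(2, dtype=complex)


def site_op(op, site, n):
    """Single-site operator embedded in an n-spin chain."""
    return reduce(np.kron, [op if k == site else ID for k in range(n)])


def bond_sum(ops, n):
    """Sum of op_i op_{i+1} over open-chain nearest-neighbour bonds and the given ops."""
    total = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n - 1):
        for op in ops:
            total += site_op(op, i, n) @ site_op(op, i + 1, n)
    return total


def reconstruct(psi, operators):
    """Find real couplings c (|c| = 1) minimizing Var_psi(sum_k c_k O_k).

    Returns the smallest covariance eigenvalue and its eigenvector.
    """
    means = [np.vdot(psi, op @ psi).real for op in operators]
    m = len(operators)
    cov = np.empty((m, m))
    for a in range(m):
        for b in range(m):
            # Real part equals the symmetrized correlator <{O_a, O_b}>/2 for Hermitian O.
            cov[a, b] = np.vdot(psi, operators[a] @ operators[b] @ psi).real - means[a] * means[b]
    variances, couplings = np.linalg.eigh(cov)
    return variances[0], couplings[:, 0]


# Toy check: recover the anisotropy of a 4-site open XXZ chain from its ground state.
N, DELTA = 4, 0.5
O_XY = bond_sum([SX, SY], N)
O_ZZ = bond_sum([SZ], N)
_, states = np.linalg.eigh(O_XY + DELTA * O_ZZ)
psi = states[:, 0]
min_var, c = reconstruct(psi, [O_XY, O_ZZ])
print(min_var, c[1] / c[0])  # variance ~0 and ratio ~DELTA
```

With the exact ground state, the recovered ratio $c_{zz}/c_{xy}$ matches `DELTA`. Feeding in an approximate variational state instead would generally give a nonzero minimum variance and shifted couplings, which is the kind of signal the abstract reads as, for example, easy-axis anisotropy.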
 An official website of the United States government