NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Selective Inference with Distributed Data

Liu, S; Panigrahi, S (January 2025, Journal of Machine Learning Research)
Loh, Po-Ling (Ed.)
When data are distributed across multiple sites or machines rather than centralized in one location, researchers face the challenge of extracting meaningful information without directly sharing individual data points. While there are many distributed methods for point estimation using sparse regression, few options are available for estimating uncertainties or conducting hypothesis tests based on the estimated sparsity. In this paper, we introduce a procedure for performing selective inference with distributed data. We consider a scenario where each local machine solves a lasso problem and communicates the selected predictors to a central machine. The central machine then aggregates these selected predictors to form a generalized linear model (GLM). Our goal is to provide valid inference for the selected GLM while reusing data that have been used in the model selection process. Our proposed procedure only requires low-dimensional summary statistics from local machines, thus keeping communication costs low and preserving the privacy of individual data sets. Furthermore, this procedure can be applied in scenarios where model selection is repeatedly conducted on randomly subsampled data sets, addressing the p-value lottery problem linked with model selection. We demonstrate the effectiveness of our approach through simulations and an analysis of a medical data set on ICU admissions.
more » « less
Free, publicly-accessible full text available January 1, 2026
ALMAGAL: IV. Morphological comparison of molecular and thermal dust emission using the histogram of oriented gradients method

https://doi.org/10.1051/0004-6361/202452700

Mininni, C; Molinari, S; Soler, J D; Sánchez-Monge, Á; Coletta, A; Benedettini, M; Traficante, A; Schisano, E; Elia, D; Pezzuto, S; et al (July 2025, Astronomy & Astrophysics)

Context. The study of molecular line emission is crucial to unveil the kinematics and the physical conditions of gas in star-forming regions. We use data from the ALMAGAL survey, which provides an unprecedentedly large statistical sample of high-mass star-forming clumps that helps us to remove bias and reduce noise (e.g., due to source peculiarities, selection, or environmental effects) to determine how well individual molecular species trace continuum emission. Aims. Our aim is to quantify whether individual molecular transitions can be used reliably to derive the physical properties of the bulk of the H₂gas, by considering morphological correlations in their overall integrated molecular line emission with the cold dust. We selected transitions of H₂CO, CH₃OH, DCN, HC₃N, CH₃CN, CH₃OCHO, SO, and SiO and compared them with the 1.38 mm dust continuum emission at different spatial scales in the ALMAGAL sample. We included two transitions of H₂CO to understand whether the validity of the results depends on the excitation condition of the selected transition of a molecular species. The ALMAGAL project observed more than 1000 candidate high-mass star-forming clumps in ALMA band 6 at a spatial resolution down to 1000 au. We analyzed a total of 1013 targets that cover all evolutionary stages of the high-mass star formation process and different conditions of clump fragmentation. Methods. For the first time, we used the method called histogram of oriented gradients (HOG) as implemented in the toolastroHOGon a large statistical sample to compare the morphology of integrated line emission with maps of the 1.38 mm dust continuum emission. For each clump, we defined two masks: the first mask covered the extended more diffuse continuum emission, and the second smaller mask that only contained the compact sources. We selected these two masks to study whether and how the correlation among the selected molecules changes with the spatial scale of the emission, from extended more diffuse gas in the clumps to denser gas in compact fragments (cores). Moreover, we calculated the Spearman correlation coefficient and compared it with our astroHOG results. Results. Only H₂CO, CH₃OH, and SO of the molecular species we analyzed show emission on spatial scales that are comparable with the diffuse 1.38 mm dust continuum emission. However, according the HOG method, the median correlation of the emission of each of these species with the continuum is only ~24–29%. In comparison with the dusty dense fragments, these molecular species still have low correlation values that are below 45% on average. The weak morphological correlation suggests that these molecular lines likely trace the clump medium or outer layers around dense fragments on average (in some cases, this might be due to optical depth effects) or also trace the inner parts of outflows at this scale. On the other hand DCN, HC₃N, CH₃CN₃and CH₃OCHO are well correlated with the dense dust fragments at above 60%. The lowest correlation is seen with SiO for the extended continuum emission and for compact sources. Moreover, unlike other outflow tracers, in a large fraction of the sources, SiO does not cover the area of the extended continuum emission well. This and the results of the astroHOG analysis reveal that SiO and SO do not trace the same gas, in contrast to what was previously thought. From the comparison of the results of the HOG method and the Spearman correlation coefficient, the HOG method gives much more reliable results than the intensity-based coefficient when the level of similarity of the emission morphology is estimated.
more » « less
Free, publicly-accessible full text available July 1, 2026
Dataflow Accelerator Architecture for Autonomous Machine Computing

Liu, S; Zhu, Y; Yu, B; Gaudiot, J_L; Gao, G (October 2024, ACM)

Commercial autonomous machines is a thriving sector, one that is likely the next ubiquitous computing platform, after Personal Computers (PC), cloud computing, and mobile computing. Nevertheless, a suitable computing substrate for autonomous machines is missing, and many companies are forced to develop ad hoc computing solutions that are neither principled nor extensible. By analyzing the demands of autonomous machine computing, this article proposes Dataflow Accelerator Architecture (DAA), a modern instantiation of the classic dataflow principle, that matches the characteristics of autonomous machine software.
more » « less
Full Text Available
Vision-language model-driven scene understanding and robotic object manipulation

Liu, S; Zhang, J; Gao, RX; Wang, V; Wang, L (August 2024, IEEE Xplore)

Humans often use natural language instructions to control and interact with robots for task execution. This poses a big challenge to robots that need to not only parse and understand human instructions but also realise semantic understanding of an unknown environment and its constituent elements. To address this challenge, this study presents a vision-language model (VLM)-driven approach to scene understanding of an unknown environment to enable robotic object manipulation. Given language instructions, a pretrained vision-language model built on open-sourced Llama2-chat (7B) as the language model backbone is adopted for image description and scene understanding, which translates visual information into text descriptions of the scene. Next, a zero-shot-based approach to fine-grained visual grounding and object detection is developed to extract and localise objects of interest from the scene task. Upon 3D reconstruction and pose estimate establishment of the object, a code-writing large language model (LLM) is adopted to generate high-level control codes and link language instructions with robot actions for downstream tasks. The performance of the developed approach is experimentally validated through table-top object manipulation by a robot.
more » « less
Full Text Available
ALMAGAL: III. Compact source catalog: Fragmentation statistics and physical evolution of the core population

https://doi.org/10.1051/0004-6361/202452706

Coletta, A; Molinari, S; Schisano, E; Traficante, A; Elia, D; Benedettini, M; Mininni, C; Soler, J D; Sánchez-Monge, Á; Schilke, P; et al (April 2025, Astronomy & Astrophysics)

The physical mechanisms behind the fragmentation of high-mass dense clumps into compact star-forming cores and the properties of these cores are fundamental topics that are heavily investigated in current astrophysical research. The ALMAGAL survey provides the opportunity to study this process at an unprecedented level of detail and statistical significance, featuring high-angular resolution 1.38 mm ALMA observations of 1013 massive dense clumps at various Galactic locations. These clumps cover a wide range of distances (~2–8 kpc), masses (~10²–10⁴M_⊙), surface densities (0.1–10 g cm⁻²), and evolutionary stages (luminosity over mass ratio indicator of ~0.05 <L/M <450L_⊙/M_⊙). Here, we present the catalog of compact sources obtained with theCuTExalgorithm from continuum images of the full ALMAGAL clump sample combining ACA-7 m and 12 m ALMA arrays, reaching a uniform high median spatial resolution of ~1400 au (down to ~800 au). We characterize and discuss the revealed fragmentation properties and the photometric and estimated physical parameters of the core population. The ALMAGAL compact source catalog includes 6348 cores detected in 844 clumps (83% of the total), with a number of cores per clump between 1 and 49 (median of 5). The estimated core diameters are mostly within ~800–3000 au (median of 1700 au). We assigned core temperatures based on theL/Mof the hosting clump, and obtained core masses from 0.002 to 345M_⊙(complete above 0.23 M_⊙), exhibiting a good correlation with the core radii (M ∝ R^2.6). We evaluated the variation in the core mass function (CMF) with evolution as traced by the clumpL/M, finding a clear, robust shift and change in slope among CMFs within subsamples at different stages. This finding suggests that the CMF shape is not constant throughout the star formation process, but rather it builds (and flattens) with evolution, with higher core masses reached at later stages. We found that all cores within a clump grow in mass on average with evolution, while a population of possibly newly formed lower-mass cores is present throughout. The number of cores increases with the core masses, at least until the most massive core reaches ~10M_⊙. More generally, our results favor a clump-fed scenario for high-mass star formation, in which cores form as low-mass seeds, and then gain mass while further fragmentation occurs in the clump.
more » « less
Free, publicly-accessible full text available April 1, 2026
Cross sections for the ⁵⁴ Fe(n, n′) ⁵⁴ Fe and ⁵⁴ Fe(n, p′) ⁵⁴ Mn reactions deduced from the detection of de-excitation γ rays

https://doi.org/10.1051/epjconf/202532905001

Hicks, S F; Pecha, R L; Howard, T J; French, A J; Santonil, Z C; Vanhoy, J R; Ramirez, A_P D; Peters, E E; Liu, S H; Prados-Estevez, F M; et al (January 2025, EPJ Web of Conferences)
Jentschel, M (Ed.)
γ-ray production cross sections have been deduced for reactions with incident neutrons having energies from 1.5 - 4.7 MeV. Similar measurements were made on a natural Ti sample to establish an absolute normalization. The resulting γ-ray production cross sections are compared to TENDL and TALYS calculations, as well as data from previous measurements. The models are found to describe the production cross sections for mostγrays observed from⁵⁴Mn and⁵⁴Fe rather well.
more » « less
Free, publicly-accessible full text available January 1, 2026
Advancing Dynamic Sparse Training by Exploring Optimization Opportunities

Ji, J; Li, G; Yin, L; Qin, M; Yuan, G; Guo, L; Liu, S; Ma, X (July 2024, Proceedings of Machine Learning Research)

Full Text Available
Integrating Large Language Model for Natural Language-Based Instruction toward Robust Human-Robot Collaboration

Gao, F; Xia, L; Zhang, J; Liu, S; Wang, L; Gao, R (May 2024, Elsevier)

Human-Robot Collaboration (HRC) aims to create environments where robots can understand workspace dynamics and actively assist humans in operations, with the human intention recognition being fundamental to efficient and safe task fulfillment. Language-based control and communication is a natural and convenient way to convey human intentions. However, traditional language models require instructions to be articulated following a rigid, predefined syntax, which can be unnatural, inefficient, and prone to errors. This paper investigates the reasoning abilities that emerged from the recent advancement of Large Language Models (LLMs) to overcome these limitations, allowing for human instructions to be used to enhance human-robot communication. For this purpose, a generic GPT 3.5 model has been fine-tuned to interpret and translate varied human instructions into essential attributes, such as task relevancy and tools and/or parts required for the task. These attributes are then fused with perceived on-going robot action to generate a sequence of relevant actions. The developed technique is evaluated in a case study where robots initially misinterpreted human actions and picked up wrong tools and parts for assembly. It is shown that the fine-tuned LLM can effectively identify corrective actions across a diverse range of instructional human inputs, thereby enhancing the robustness of human-robot collaborative assembly for smart manufacturing.
more » « less
Full Text Available
AVA: Towards Autonomous Visualization Agents through Visual Perception‐Driven Decision‐Making

https://doi.org/10.1111/cgf.15093

Liu, S; Miao, H; Li, Z; Olson, M; Pascucci, V; Bremer, P‐T (June 2024, Computer Graphics Forum)

Abstract With recent advances in multi‐modal foundation models, the previously text‐only large language models (LLM) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Compared to existing work on LLM‐based visualization works that generate and control visualization with textual input and output only, the proposed approach explores the utilization of the visual processing ability of multi‐modal LLMs to develop Autonomous Visualization Agents (AVAs) that can evaluate the generated visualization and iterate on the result to accomplish user‐defined objectives defined through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. Our preliminary exploration and proof‐of‐concept agents suggest that this approach can be widely applicable whenever the choices of appropriate visualization parameters require the interpretation of previous visual output. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high‐level visualization goals, which pave the way for developing expert‐level visualization agents in the future.
more » « less
Full Text Available
Neural Abstractive Summarization for Long Text and Multiple Tables

Liu, S; Cao, J; Deng, Z; Zhao, W; Yang, R; Wen, Z; Yu, PS (June 2024, IEEE transactions on knowledge and data engineering)

Full Text Available

« Prev Next »

Search for: All records