Search for: All records

Creators/Authors contains: "Xie"

  1. Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy way to enrich a lymphocyte sample with cells relevant to the disease response, because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies of T cell repertoires.
    Free, publicly-accessible full text available December 31, 2026
  2. Free, publicly-accessible full text available January 1, 2027
  3. Free, publicly-accessible full text available December 1, 2026
  4. Free, publicly-accessible full text available October 19, 2026
  5. Free, publicly-accessible full text available September 26, 2026
  6. Engineering design has been widely implemented in K-12 curricula to cultivate the future workforce. In this study, seventh-grade students (N = 38) participated in the Solarizing Your School curriculum, an action-oriented program in which they engaged in engineering design processes to tackle a real-world problem related to renewable energy adoption. The study sought to explore how students balanced constraints and criteria in engineering design. Over a five-day period, the students developed plans for adopting solar energy on their school campus and simulated the plans on a technology-enhanced epistemic tool, Aladdin (https://intofuture.org/aladdin.html). Data were collected through design artifacts, log data from design processes, and surveys about their learning experience. Three distinct patterns of balancing design criteria and constraints emerged: designing for practice, for performance, and for irrelevant goals. The group that designed for practice, giving priority to criteria and constraints, recorded a higher level of design performance. The study underscores the benefits of integrating action-oriented learning opportunities via engineering design processes in science education.
    Free, publicly-accessible full text available September 12, 2026
  7. Free, publicly-accessible full text available October 1, 2026
  8. D-optimal experimental design is a classical statistical problem in which one chooses a collection of data vectors, from some available large pool, in order to maximize a measure of predictive quality. In the classical formulation, the only constraint is on the cardinality of the collection, that is, the number of vectors chosen. We study a more general budget-constrained variant in which vectors have heterogeneous costs, and develop four new algorithms (two deterministic and two randomized) with approximation guarantees. Our methods handle heterogeneous costs using a novel exchange rule that interchanges packs of data vectors whose total costs are similar (up to some controlled amount of rounding error). The algorithms outperform the only existing method for this problem from both theoretical and empirical standpoints. Funding: The first and third authors gratefully acknowledge support from the National Science Foundation (NSF) Division of Civil, Mechanical and Manufacturing Innovation [Grant CMMI-2112828]. The second author gratefully acknowledges support from the NSF Division of Computing and Communication Foundations [Grant CCF-2246417] and Office of Naval Research [Grant N00014-24-1-2066]. 
    Free, publicly-accessible full text available October 7, 2026
  9. Crop production is among the most extensive human activities on the planet – with critical importance for global food security, land use, environmental burden, and climate. Yet despite the key role that croplands play in global land use and Earth systems, there remains little understanding of how spatial patterns of global crop cultivation have recently evolved and which crops have contributed most to these changes. Here we construct a new data library of subnational crop-specific irrigated and rainfed harvested area statistics and combine it with global gridded land cover products to develop a global gridded (5-arcminute) irrigated and rainfed cropped area (MIRCA-OS) dataset for the years 2000 to 2015 for 23 crop classes. These global data products support critical insights into the spatially detailed patterns of irrigated and rainfed cropland change since the start of the century and provide an improved foundation for a wide array of global assessments spanning agriculture, water resource management, land use change, climate impact, and sustainable development. 
    Free, publicly-accessible full text available December 1, 2026
  10. Let $(k_n)_{n \in \mathbb{N}}$ be a sequence of positive integers growing to infinity at a sublinear rate, that is, $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$. Given a sequence of $n$-dimensional random vectors $\{Y^{(n)}\}_{n \in \mathbb{N}}$ belonging to a certain class, which includes uniform distributions on suitably scaled $\ell_p^n$-balls or $\ell_p^n$-spheres, $p \ge 2$, and product distributions with sub-Gaussian marginals, we study the large deviations behavior of the corresponding sequence of $k_n$-dimensional orthogonal projections. For almost every sequence of projection matrices, we establish a large deviation principle (LDP) for the corresponding sequence of projections, with a fairly explicit rate function that does not depend on the sequence of projection matrices. As corollaries, we also obtain quenched LDPs for sequences of $\ell_2$-norms and $\ell_\infty$-norms of the coordinates of the projections. Past work on LDPs for projections with growing dimension has mainly focused on the annealed setting, where one also averages over the random projection matrix, chosen from the Haar measure, in which case the coordinates of the projection are exchangeable. The quenched setting lacks such symmetry properties, and gives rise to significant new challenges in the setting of growing projection dimension. Along the way, we establish new Gaussian approximation results on the Stiefel manifold that may be of independent interest. Such LDPs are of relevance in asymptotic convex geometry, statistical physics and high-dimensional statistics.
    Free, publicly-accessible full text available September 1, 2026
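
To make the budget-constrained D-optimal design problem in record 8 concrete, here is a minimal sketch of the objective together with a plain cost-aware greedy heuristic. It is not one of the four exchange-based algorithms that abstract describes and carries no approximation guarantee. The function names, the small ridge term (added so the log-determinant is defined before enough vectors are chosen), and the choice to select each vector at most once are all assumptions made for illustration.

```python
import math


def logdet(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
                total += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return total


def information_matrix(vectors, chosen, ridge=1e-6):
    """Return ridge * I plus the sum of outer products x x^T over chosen vectors."""
    d = len(vectors[0])
    m = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for idx in chosen:
        x = vectors[idx]
        for i in range(d):
            for j in range(d):
                m[i][j] += x[i] * x[j]
    return m


def greedy_budgeted_d_design(vectors, costs, budget, ridge=1e-6):
    """Hypothetical baseline: repeatedly add the affordable vector with the
    largest log-det gain per unit cost (assumes strictly positive costs)."""
    chosen, spent = [], 0.0
    current = logdet(information_matrix(vectors, chosen, ridge))
    while True:
        best, best_ratio, best_value = None, 0.0, current
        for idx in range(len(vectors)):
            # Skip vectors already selected or no longer affordable.
            if idx in chosen or spent + costs[idx] > budget:
                continue
            value = logdet(information_matrix(vectors, chosen + [idx], ridge))
            ratio = (value - current) / costs[idx]
            if ratio > best_ratio:
                best, best_ratio, best_value = idx, ratio, value
        if best is None:
            return chosen, spent, current
        chosen.append(best)
        spent += costs[best]
        current = best_value


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    prices = [1.0, 1.0, 5.0]
    print(greedy_budgeted_d_design(pool, prices, budget=2.0))
```

In the classical cardinality-constrained formulation every cost equals 1 and the budget is the number of vectors allowed; heterogeneous costs are what make the variant in record 8 harder, since swapping one vector for another can change the total spend.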