
Search for: All records

Creators/Authors contains: "Zhao, Wenlong"


  1. When training node embedding models to represent large directed graphs (digraphs), it is impossible to observe all entries of the adjacency matrix during training. As a consequence, most methods employ sampling. For very large digraphs, however, this means many (most) entries may be unobserved during training. In general, observing every entry would be necessary to uniquely identify a graph; however, if the graph is known to have a certain property, some entries can be omitted. For example, only half the entries are required for a symmetric graph. In this work, we develop a novel framework to identify a subset of entries required to uniquely distinguish a graph among all transitively-closed DAGs. We give an explicit algorithm to compute the provably minimal set of entries and demonstrate empirically that node embedding models can be trained with greater efficiency and performance, provided the energy function has an appropriate inductive bias. We achieve robust performance on synthetic hierarchies and a larger real-world taxonomy, observing improved convergence rates in a resource-constrained setting while reducing the set of training examples by as much as 99%.
    Free, publicly-accessible full text available September 25, 2025
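The savings above hinge on the fact that a transitively-closed DAG is pinned down by far fewer adjacency entries than the full matrix: the edges of its transitive reduction already imply every other positive entry under closure. As a rough illustration of that idea (not the paper's algorithm or its provably minimal set), the sketch below computes a transitive reduction in pure Python for a toy hierarchy:

```python
def transitive_closure(n, edges):
    """Floyd-Warshall-style boolean reachability over nodes 0..n-1."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def transitive_reduction(n, edges):
    """Keep edge (u, v) only if no intermediate w already gives u -> w -> v."""
    reach = transitive_closure(n, edges)
    return {(u, v) for u, v in edges
            if not any(reach[u][w] and reach[w][v]
                       for w in range(n) if w not in (u, v))}

# A transitively-closed chain 0 -> 1 -> 2 -> 3: all 6 implied edges present.
closed_edges = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
reduced = transitive_reduction(4, closed_edges)
print(sorted(reduced))  # [(0, 1), (1, 2), (2, 3)] -- 3 of 6 positive entries suffice
```

For a closed chain of n nodes, the closure has n(n-1)/2 positive entries while the reduction keeps only n-1, which hints at how reductions approaching 99% of training examples can arise on deep hierarchies.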
  2. Vlachos, Andreas; Augenstein, Isabelle (Eds.)
    Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreference and have been collected via complex, lengthy guidelines curated for linguistic experts. These concerns have sparked a growing interest among researchers in curating a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable-quality annotations were already achievable (90% agreement between crowd and expert annotations) even without extensive training. A careful analysis of the remaining disagreements identifies linguistic cases (e.g., generic pronouns, appositives) that our annotators unanimously agree upon but that lack unified treatment in existing datasets. We propose that the research community revisit these phenomena when curating future unified annotation guidelines.
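The 90% crowd-expert agreement could be measured several ways, and the abstract does not specify the metric. As a purely hypothetical illustration, one simple choice is pairwise-link agreement: the fraction of mention pairs that both annotations place in the same (or different) coreference clusters. The clusters and mentions below are invented:

```python
from itertools import combinations

def link_agreement(ann_a, ann_b):
    """Fraction of mention pairs labeled the same way by both annotations.

    Each annotation is a partition of mentions into coreference clusters;
    a pair "agrees" if both annotations co-cluster it, or both separate it.
    """
    def same_cluster(ann, m1, m2):
        return any(m1 in c and m2 in c for c in ann)
    mentions = sorted({m for c in ann_a for m in c})
    pairs = list(combinations(mentions, 2))
    agree = sum(same_cluster(ann_a, m1, m2) == same_cluster(ann_b, m1, m2)
                for m1, m2 in pairs)
    return agree / len(pairs)

expert = [{"Mary", "she"}, {"the dog", "it"}]
crowd = [{"Mary", "she", "it"}, {"the dog"}]
print(link_agreement(expert, crowd))  # 0.5 -- they disagree on half the pairs
```

Standard coreference evaluation instead uses metrics such as MUC, B-cubed, or CEAF; the sketch above is only meant to make "agreement between annotations" concrete.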
  3. Abstract The exodus of flying animals from their roosting locations is often visible as expanding ring‐shaped patterns in weather radar data. The NEXRAD network, for example, archives more than 25 years of data across 143 contiguous US radar stations, providing opportunities to study roosting locations and times and the ecosystems of birds and bats. However, access to this information is limited by the cost of manually annotating millions of radar scans. We develop and deploy an AI‐assisted system to annotate roosts in radar data. We build datasets with roost annotations to support the training and evaluation of automated detection models. Roosts are detected, tracked, and incorporated into our web‐based interface for human screening to produce research‐grade annotations. We deploy the system to collect swallow and martin roost information from 12 radar stations around the Great Lakes spanning 21 years. After verifying the practical value of the system, we propose to improve the detector by incorporating both spatial and temporal channels from volumetric radar scans. The deployment on Great Lakes radar scans enables accelerated annotation of 15 628 roost signatures in 612 786 radar scans with 183.6 human screening hours, or 1.08 s per radar scan. We estimate that the deployed system reduces human annotation time by ~7×. The temporal detector model improves the average precision at intersection‐over‐union threshold 0.5 (AP at IoU = 0.50) by 8 percentage points over the previous model (48%→56%), further reducing human screening time by 2.3× in its pilot deployment. These data contain critical information about phenology and population trends of swallows and martins, aerial insectivore species experiencing acute declines, and have enabled novel research.
We present error analyses, lay the groundwork for continent‐scale historical investigation about these species, and provide a starting point for automating the detection of other family‐specific phenomena in radar data, such as bat roosts and mayfly hatches. 
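The throughput figures reported in this abstract are internally consistent; the short check below reproduces the per-scan screening rate directly from the abstract's own numbers (the fully-manual estimate is only what the reported ~7× reduction would imply):

```python
# Deployment figures reported in the abstract.
scans = 612_786
roost_signatures = 15_628
screening_hours = 183.6

# Human screening time per radar scan.
sec_per_scan = screening_hours * 3600 / scans
print(round(sec_per_scan, 2))  # 1.08, matching the reported rate

# Hours of screening implied without the system, given the ~7x reduction.
manual_hours_est = screening_hours * 7
print(round(manual_hours_est))  # ~1285 hours
```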
  4. Abstract In this study, we combined a machine learning pipeline and human supervision to identify and label swallow and martin roost locations on data captured from 2000 to 2020 by 12 Weather Surveillance Radars in the Great Lakes region of the US. We employed radar theory to extract the number of birds in each roost detected by our technique. With these data, we set out to investigate whether roosts formed consistently in the same geographic area over two decades and whether consistency was also predictive of roost size. We used a clustering algorithm to group individual roost locations into 104 high‐density regions and extracted the number of years when each of these regions was used by birds to roost. In addition, we calculated the overall population size and analyzed the daily roost size distributions. Our results support the hypothesis that more persistent roosts are also gathering more birds, but we found that on average, most individuals congregate in roosts of smaller size. Given the concentrations and consistency of roosting of swallows and martins in specific areas throughout the Great Lakes, future changes in these patterns should be monitored because they may have important ecosystem and conservation implications. 
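The abstract groups individual roost locations into 104 high-density regions with a clustering algorithm whose details are not given here. As an illustrative stand-in (not the study's method), a single-linkage grouping via union-find merges detections that fall within a chosen radius of one another; the coordinates below are invented:

```python
def cluster_points(points, radius):
    """Single-linkage clustering: union-find over all pairs within `radius`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())

# Hypothetical roost detections (km offsets): two dense regions and an outlier.
detections = [(0.0, 0.0), (0.5, 0.3), (0.2, 0.6),
              (10.0, 10.0), (10.4, 9.8),
              (50.0, 50.0)]
regions = cluster_points(detections, radius=1.0)
print(len(regions))  # 3 regions
```

Counting how many distinct years contribute detections to each region would then give the persistence measure the study relates to roost size.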