
Search for: All records

Creators/Authors contains: "Zhao, Wenlong"


  1. When training node embedding models to represent large directed graphs (digraphs), it is impossible to observe all entries of the adjacency matrix during training. As a consequence, most methods employ sampling. For very large digraphs, however, this means many (most) entries may be unobserved during training. In general, observing every entry would be necessary to uniquely identify a graph; however, if the graph is known to have a certain property, some entries can be omitted. For example, only half the entries are required for a symmetric graph. In this work, we develop a novel framework to identify a subset of entries required to uniquely distinguish a graph among all transitively-closed DAGs. We give an explicit algorithm to compute the provably minimal set of entries and demonstrate empirically that node embedding models can be trained with greater efficiency and performance, provided the energy function has an appropriate inductive bias. We achieve robust performance on synthetic hierarchies and a larger real-world taxonomy, observing improved convergence rates in a resource-constrained setting while reducing the set of training examples by as much as 99%.
    Free, publicly-accessible full text available September 25, 2025
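The savings above hinge on the fact that a transitively-closed DAG is pinned down by far fewer adjacency entries than the full matrix: the edges of its transitive reduction already imply every other positive entry under closure. As a rough illustration of that idea (not the paper's algorithm or its provably minimal set), the sketch below computes a transitive reduction in pure Python for a toy hierarchy:

```python
def transitive_closure(n, edges):
    """Floyd-Warshall-style boolean reachability over nodes 0..n-1."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def transitive_reduction(n, edges):
    """Keep edge (u, v) only if no intermediate w already gives u -> w -> v."""
    reach = transitive_closure(n, edges)
    return {(u, v) for u, v in edges
            if not any(reach[u][w] and reach[w][v]
                       for w in range(n) if w not in (u, v))}

# A transitively-closed chain 0 -> 1 -> 2 -> 3: all 6 implied edges present.
closed_edges = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
reduced = transitive_reduction(4, closed_edges)
print(sorted(reduced))  # [(0, 1), (1, 2), (2, 3)] -- 3 of 6 positive entries suffice
```

For a closed chain of n nodes, the closure has n(n-1)/2 positive entries while the reduction keeps only n-1, which hints at how reductions approaching 99% of training examples can arise on deep hierarchies.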
  2. Vlachos, Andreas; Augenstein, Isabelle (Eds.)
    Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreference and have been collected via complex, lengthy guidelines curated for linguistic experts. These concerns have sparked a growing interest among researchers in curating a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable-quality annotations were already achievable (90% agreement between crowd and expert annotations) even without extensive training. A careful analysis of the remaining disagreements identifies linguistic cases (e.g., generic pronouns, appositives) that our annotators unanimously agree upon but that lack unified treatment in existing datasets. We propose that the research community revisit these phenomena when curating future unified annotation guidelines.
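The 90% crowd-expert agreement could be measured several ways, and the abstract does not specify the metric. As a purely hypothetical illustration, one simple choice is pairwise-link agreement: the fraction of mention pairs that both annotations place in the same (or different) coreference clusters. The clusters and mentions below are invented:

```python
from itertools import combinations

def link_agreement(ann_a, ann_b):
    """Fraction of mention pairs labeled the same way by both annotations.

    Each annotation is a partition of mentions into coreference clusters;
    a pair "agrees" if both annotations co-cluster it, or both separate it.
    """
    def same_cluster(ann, m1, m2):
        return any(m1 in c and m2 in c for c in ann)
    mentions = sorted({m for c in ann_a for m in c})
    pairs = list(combinations(mentions, 2))
    agree = sum(same_cluster(ann_a, m1, m2) == same_cluster(ann_b, m1, m2)
                for m1, m2 in pairs)
    return agree / len(pairs)

expert = [{"Mary", "she"}, {"the dog", "it"}]
crowd = [{"Mary", "she", "it"}, {"the dog"}]
print(link_agreement(expert, crowd))  # 0.5 -- they disagree on half the pairs
```

Standard coreference evaluation instead uses metrics such as MUC, B-cubed, or CEAF; the sketch above is only meant to make "agreement between annotations" concrete.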
  3. Abstract The exodus of flying animals from their roosting locations is often visible as expanding ring‐shaped patterns in weather radar data. The NEXRAD network, for example, archives more than 25 years of data across 143 contiguous US radar stations, providing opportunities to study roosting locations and times and the ecosystems of birds and bats. However, access to this information is limited by the cost of manually annotating millions of radar scans. We develop and deploy an AI‐assisted system to annotate roosts in radar data. We build datasets with roost annotations to support the training and evaluation of automated detection models. Roosts are detected, tracked, and incorporated into our web‐based interface for human screening to produce research‐grade annotations. We deploy the system to collect swallow and martin roost information from 12 radar stations around the Great Lakes spanning 21 years. After verifying the practical value of the system, we propose to improve the detector by incorporating both spatial and temporal channels from volumetric radar scans. The deployment on Great Lakes radar scans enables accelerated annotation of 15 628 roost signatures in 612 786 radar scans with 183.6 human screening hours, or 1.08 s per radar scan. We estimate that the deployed system reduces human annotation time by ~7×. The temporal detector model improves the average precision at intersection‐over‐union threshold 0.5 (AP at IoU = 0.50) by 8 percentage points over the previous model (48%→56%), further reducing human screening time by 2.3× in its pilot deployment. These data contain critical information about phenology and population trends of swallows and martins, aerial insectivore species experiencing acute declines, and have enabled novel research.
We present error analyses, lay the groundwork for continent‐scale historical investigation about these species, and provide a starting point for automating the detection of other family‐specific phenomena in radar data, such as bat roosts and mayfly hatches. 
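The throughput figures reported in this abstract are internally consistent; the short check below reproduces the per-scan screening rate directly from the abstract's own numbers (the fully-manual estimate is only what the reported ~7× reduction would imply):

```python
# Deployment figures reported in the abstract.
scans = 612_786
roost_signatures = 15_628
screening_hours = 183.6

# Human screening time per radar scan.
sec_per_scan = screening_hours * 3600 / scans
print(round(sec_per_scan, 2))  # 1.08, matching the reported rate

# Hours of screening implied without the system, given the ~7x reduction.
manual_hours_est = screening_hours * 7
print(round(manual_hours_est))  # ~1285 hours
```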
  4. Abstract In this study, we combined a machine learning pipeline and human supervision to identify and label swallow and martin roost locations on data captured from 2000 to 2020 by 12 Weather Surveillance Radars in the Great Lakes region of the US. We employed radar theory to extract the number of birds in each roost detected by our technique. With these data, we set out to investigate whether roosts formed consistently in the same geographic area over two decades and whether consistency was also predictive of roost size. We used a clustering algorithm to group individual roost locations into 104 high‐density regions and extracted the number of years when each of these regions was used by birds to roost. In addition, we calculated the overall population size and analyzed the daily roost size distributions. Our results support the hypothesis that more persistent roosts are also gathering more birds, but we found that on average, most individuals congregate in roosts of smaller size. Given the concentrations and consistency of roosting of swallows and martins in specific areas throughout the Great Lakes, future changes in these patterns should be monitored because they may have important ecosystem and conservation implications. 
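The abstract groups individual roost locations into 104 high-density regions with a clustering algorithm whose details are not given here. As an illustrative stand-in (not the study's method), a single-linkage grouping via union-find merges detections that fall within a chosen radius of one another; the coordinates below are invented:

```python
def cluster_points(points, radius):
    """Single-linkage clustering: union-find over all pairs within `radius`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())

# Hypothetical roost detections (km offsets): two dense regions and an outlier.
detections = [(0.0, 0.0), (0.5, 0.3), (0.2, 0.6),
              (10.0, 10.0), (10.4, 9.8),
              (50.0, 50.0)]
regions = cluster_points(detections, radius=1.0)
print(len(regions))  # 3 regions
```

Counting how many distinct years contribute detections to each region would then give the persistence measure the study relates to roost size.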