The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on continuous Euclidean spaces, have struggled with the discrete combinatorial formulation of the edit distance that models evolution and with the hierarchical relationship that characterises real-world datasets. We present Neural Distance Embeddings (NeuroSEED), a general framework to embed sequences in geometric vector spaces, and illustrate the effectiveness of the hyperbolic space, which captures the hierarchical structure and provides an average 38% reduction in embedding RMSE against the best competing geometry. The capacity of the framework and the significance of these improvements are then demonstrated by devising supervised and unsupervised NeuroSEED approaches to multiple core tasks in bioinformatics. Benchmarked against common baselines, the proposed approaches display significant accuracy and/or runtime improvements on real-world datasets. For example, in hierarchical clustering, the proposed pretrained and from-scratch methods match the quality of competing baselines with 30x and 15x runtime reductions, respectively.
Free, publicly accessible full text available December 6, 2022
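The quantities named in the NeuroSEED abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only "hyperbolic space", so the Poincaré ball model used here is an assumption, and the function names `edit_distance`, `poincare_distance`, and `rmse` are hypothetical. The sketch computes the edit distance between two sequences, the hyperbolic distance between two embedding vectors, and the RMSE between predicted and true distances. Learning the embedding itself (the neural encoder) is out of scope.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def poincare_distance(u, v) -> float:
    """Distance in the Poincare ball model (assumed model of hyperbolic space)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def rmse(predicted, target) -> float:
    """Root mean squared error between predicted and true distances."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)
```

In a setup of this kind, an encoder would map each sequence to a point inside the unit ball, and `rmse` would compare the `poincare_distance` values between embedded pairs with the (suitably scaled) `edit_distance` values of the original sequences.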
Using data from nesting beach monitoring and satellite telemetry to improve estimates of marine turtle clutch frequency and population abundance
Population abundance data are often used to define species' conservation status. Abundance of marine turtles is typically estimated using nesting beach monitoring data such as nest counts and clutch frequency (CF, i.e., the number of nests female turtles lay within a nesting season). However, studies have shown that CF determined solely from nesting beach monitoring data can be underestimated, leading to inaccurate abundance estimates. To obtain reliable estimates of CF for hawksbill turtles in northeastern Brazil (6.273356° S, 35.036271° W), the region with the highest nesting density in the South Atlantic, data from beach monitoring and satellite telemetry were combined from 2014 to 2019. Beach monitoring data indicated the date of the first nesting event, while state-space modeling of satellite telemetry data indicated the departure date of turtles, allowing calculation of residence length at the breeding site and CF estimates based on internesting intervals. Females were estimated to nest up to six times within the nesting season, with CF estimates between 4.5 and 4.8 clutches per female. CF estimates were used to determine the number of nesting females at the study site using two approaches: one that includes transient turtles and one that excludes them. Our approach and findings highlight that transients heavily influence CF…