skip to main content


Search for: All records

Creators/Authors contains: "Kasprowicz, Adrienne"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Motivation

    Timetrees depict evolutionary relationships between species and the geological times of their divergence. Hundreds of research articles containing timetrees are published in scientific journals every year. The TimeTree (TT) project has been manually locating, curating and synthesizing timetrees from these articles for almost two decades into a TimeTree of Life, delivered through a unique, user-friendly web interface (timetree.org). The manual process of finding articles containing timetrees is becoming increasingly expensive and time-consuming. So, we have explored the effectiveness of text-mining approaches and developed optimizations to find research articles containing timetrees automatically.

    Results

    We have developed an optimized machine learning system to determine if a research article contains an evolutionary timetree appropriate for inclusion in the TT resource. We found that BERT classification fine-tuned on whole-text articles achieved an F1 score of 0.67, which we increased to 0.88 by text-mining article excerpts surrounding the mentioning of figures. The new method is implemented in the TimeTreeFinder (TTF) tool, which automatically processes millions of articles to discover timetree-containing articles. We estimate that the TTF tool would produce twice as many timetree-containing articles as those discovered manually, whose inclusion in the TT database would potentially double the knowledge accessible to a wider community. Manual inspection showed that the precision on out-of-distribution recently published articles is 87%. This automation will speed up the collection and curation of timetrees with much lower human and time costs.

    Availability and implementation

    https://github.com/marija-stanojevic/time-tree-classification.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract We present the fifth edition of the TimeTree of Life resource (TToL5), a product of the timetree of life project that aims to synthesize published molecular timetrees and make evolutionary knowledge easily accessible to all. Using the TToL5 web portal, users can retrieve published studies and divergence times between species, the timeline of a species’ evolution beginning with the origin of life, and the timetree for a given evolutionary group at the desired taxonomic rank. TToL5 contains divergence time information on 137,306 species, 41% more than the previous edition. The TToL5 web interface is now Americans with Disabilities Act-compliant and mobile-friendly, a result of comprehensive source code refactoring. TToL5 also offers programmatic access to species divergence times and timelines through an application programming interface, which is accessible at timetree.temple.edu/api. TToL5 is publicly available at timetree.org. 
    more » « less