Title: Plant science corpus
The plant science corpus consists of the titles and abstracts of plant science articles in PubMed published before 2021, plus a small number of 2021 records included because of record modifications. The columns are:
Index: integer index serving as identifier
PMID: PubMed identifier
Date: publication date
Journal: journal where the article was published
Title: title of the article
Abstract: abstract of the article
Corpus: title and abstract combined
Text classification score: plant science record prediction model score
Preprocessed corpus: corpus after lower-casing, stop word removal, removal of non-alphanumeric and non-white space characters, and lemmatisation
Topic: index of topics after topic modeling
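The derivation of the "Corpus" and "Preprocessed corpus" columns described above can be illustrated with a minimal sketch. This is not the dataset authors' code: the stop-word list, the `naive_lemma` helper (a crude stand-in for a real lemmatiser), the ASCII-only character filter, and the order of the steps are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny stop-word list; the record does not say which list was used.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def naive_lemma(token: str) -> str:
    """Crude stand-in for a lemmatiser: strips a trailing plural/verbal 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "is", "us")):
        return token[:-1]
    return token


def preprocess(title: str, abstract: str) -> str:
    """Build the combined corpus text, then apply the four cleaning steps."""
    corpus = f"{title} {abstract}"            # "Corpus": title and abstract combined
    text = corpus.lower()                     # lower-casing
    # Replace anything that is not an ASCII letter, digit, or white space
    # (assumption: the real pipeline may treat non-ASCII letters differently).
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop word removal
    return " ".join(naive_lemma(t) for t in tokens)            # stand-in lemmatisation


if __name__ == "__main__":
    print(preprocess("Roots of the Plants", "Auxin regulates root growth in Arabidopsis."))
```

In this sketch, punctuation is stripped before stop words are filtered so that tokens such as "the," still match the stop-word list; the actual pipeline may order the steps differently and would normally use a dictionary-based lemmatiser.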
Award ID(s):
2107215
PAR ID:
10475805
Author(s) / Creator(s):
Publisher / Repository:
Zenodo
Date Published:
Format(s):
Medium: X
Location:
Michigan State University
Sponsoring Org:
National Science Foundation
More Like this
  1. The objective of this article was to review existing research to assess the evidence for predictive processing (PP) in sign language, the conditions under which it occurs, and the effects of language mastery (sign language as a first language, sign language as a second language, bimodal bilingualism) on the neural bases of PP. This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. We searched peer-reviewed electronic databases (SCOPUS, Web of Science, PubMed, ScienceDirect, and EBSCO host) and gray literature (dissertations in ProQuest). We also searched the reference lists of records selected for the review and forward citations to identify all relevant publications. We searched for records based on five criteria (original work, peer-reviewed, published in English, research topic related to PP or neural entrainment, and human sign language processing). To reduce the risk of bias, the remaining two authors with expertise in sign language processing and a variety of research methods reviewed the results. Disagreements were resolved through extensive discussion. In the final review, 7 records were included, of which 5 were published articles and 2 were dissertations. The reviewed records provide evidence for PP in signing populations, although the underlying mechanism in the visual modality is not clear. The reviewed studies addressed the motor simulation proposals, neural basis of PP, as well as the development of PP. All studies used dynamic sign stimuli. Most of the studies focused on semantic prediction. The question of the mechanism for the interaction between one’s sign language competence (L1 vs. L2 vs. bimodal bilingual) and PP in the manual-visual modality remains unclear, primarily due to the scarcity of participants with varying degrees of language dominance. 
There is a paucity of evidence for PP in sign languages, especially for frequency-based, phonetic (articulatory), and syntactic prediction. However, studies published to date indicate that Deaf native/native-like L1 signers predict linguistic information during sign language processing, suggesting that PP is an amodal property of language processing. Systematic Review Registration [ https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021238911 ], identifier [CRD42021238911]. 
  2. Topic modeling, a method for extracting the underlying themes from a collection of documents, is an increasingly important component of the design of intelligent systems enabling the sense-making of highly dynamic and diverse streams of text data related but not limited to scientific discovery. Traditional methods such as Dynamic Topic Modeling (DTM) do not lend themselves well to direct parallelization because of dependencies from one time step to another. In this paper, we introduce and empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents. Our approach is based on data decomposition in which the data is partitioned into segments, followed by topic modeling on the individual segments. The resulting local models are then combined into a global solution using clustering. The decomposition and resulting parallelization lead to very fast runtime even on very large datasets. Our approach furthermore provides insight into how the composition of topics changes over time and can also be applied using other data partitioning strategies over any discrete features of the data, such as geographic features or classes of users. In this paper, CLDA is applied successfully to seventeen years of NIPS conference papers (2,484 documents and 3,280,697 words), seventeen years of computer science journal abstracts (533,588 documents and 46,446,184 words), and to forty years of the PubMed corpus (4,025,976 documents and 386,847,695 words). On the PubMed corpus, we demonstrate the versatility of CLDA by segmenting the data by both time and by journal. Our runtime on this corpus demonstrates an ability to function on very large scale datasets.
  3. Parkinson’s disease (PD) is a neurological disorder with complicated and disabling motor and non-motor symptoms. The complexity of PD pathology is amplified due to its dependency on patient diaries and the neurologist’s subjective assessment of clinical scales. A significant amount of recent research has explored new cost-effective and subjective assessment methods pertaining to PD symptoms to address this challenge. This article analyzes the application areas and use of mobile and wearable technology in PD research using the PRISMA methodology. Based on the published papers, we identify four significant fields of research: diagnosis, prognosis and monitoring, predicting response to treatment, and rehabilitation. Between January 2008 and December 2021, 31,718 articles were published in four databases: PubMed Central, Science Direct, IEEE Xplore, and MDPI. After removing unrelated articles, duplicate entries, non-English publications, and other articles that did not fulfill the selection criteria, we manually investigated 1559 articles in this review. Most of the articles (45%) were published during a recent four-year stretch (2018–2021), and 19% of the articles were published in 2021 alone. This trend reflects the research community’s growing interest in assessing PD with wearable devices, particularly in the last four years of the period under study. We conclude that there is substantial and steady growth in the use of mobile technology in PD contexts. We share our automated script and the detailed results with the public, making the review reproducible for future publications.
  4. This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).
  5. Information about individual publications associated with grants funded by NSF to support SES research from 2000-2015 (see "SES grants, 2000-2015"). For grants with ten or fewer publications, we included information about all available publications in this dataset. For grants with more than ten publications, we randomly selected ten to include in this dataset. CSV file with 13 columns and names in header row: "Grant ID" is the ID from the Dimensions platform (string); "Grant Number" is the NSF Award number (integer); "Publication Title" is the title of the paper (text); "Publication Year" is the year in which the paper was published (year); "Authors" is a list or abbreviated list of the authors of the paper (text); "Journal" is the name of the scientific journal or outlet in which the paper is published (text); "Interdis Rubric 1" is a metric representing the dataset authors' assessment of the level of interdisciplinarity represented by the paper (integer: “1” indicated social and natural science interdisciplinarity where both social and environmental conditions are measured or explored and/or author affiliations included departments across these disciplines; “2” indicated general interdisciplinarity between two or more different fields (that may both be within natural or social science); and “3” indicated single-disciplinarity); "Citations" is the count of citations the paper had received as of the date listed in "date for cite count", as reported in Google Scholar (integer); "date for cite count" is the date on which citation count for the paper was obtained (ddBBByy); "Abstract" is the text of the abstract of the paper, where available (text); "Notes" are any notes added by the authors of the dataset (text).