
Search for: All records

Creators/Authors contains: "Ding, Ying"


  1. Abstract

    Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Typically, researchers who study mentorship use article co-authorship and doctoral dissertation datasets. However, available datasets of this type focus on narrow selections of fields and miss out on early career and non-publication-related interactions. Here, we describe Mentorship, a crowdsourced dataset of 743,176 mentorship relationships among 738,989 scientists primarily in biosciences that avoids these shortcomings. Our dataset enriches the Academic Family Tree project by adding publication data from the Microsoft Academic Graph and “semantic” representations of research using deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimations of these factors. We perform extensive validations of the profile–publication matching, semantic content, and demographic inferences, which mostly cover neuroscience and biomedical sciences. We anticipate this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists’ career outcomes.

  2. Free, publicly-accessible full text available November 28, 2023
  3. Abstract

    Age-related macular degeneration (AMD) is the principal cause of blindness in developed countries, and its prevalence is projected to increase to 288 million people in 2040. Therefore, automated grading and prediction methods can be highly beneficial for identifying subjects susceptible to late AMD and enabling clinicians to start preventive actions for them. Clinically, AMD severity is quantified from Color Fundus Photographs (CFP) of the retina, and many machine-learning-based methods have been proposed for grading AMD severity. However, few models have been developed to predict the longitudinal progression status, i.e., predicting future late-AMD risk based on the current CFP, which is more clinically interesting. In this paper, we propose a new deep-learning-based classification model (LONGL-Net) that can simultaneously grade the current CFP and predict the longitudinal outcome, i.e., whether the subject will be in late AMD at the future time-point. We design a new temporal-correlation-structure-guided Generative Adversarial Network model that learns the interrelations of temporal changes in CFPs at consecutive time-points and provides interpretability for the classifier's decisions by forecasting AMD symptoms in the future CFPs. We used about 30,000 CFP images from 4,628 participants in the Age-Related Eye Disease Study. Our classifier achieved an average AUC of 0.905 (95% CI: 0.886–0.922) and an average accuracy of 0.762 (95% CI: 0.733–0.792) on the 3-class classification problem of simultaneously grading the current time-point's AMD condition and predicting late-AMD progression of subjects at the future time-point. We further validated our model on the UK Biobank dataset, where it showed an average accuracy of 0.905 and a sensitivity of 0.797 in grading 300 CFP images.
  4. This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we perform entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2,500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) around the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
  5. The association between two event times is of scientific importance in various fields. Due to population heterogeneity, it is desirable to examine the degree to which local association depends on different characteristics of the population. Here we adopt a novel quantile-based local association measure and propose a conditional quantile association regression model to allow covariate effects on the local association of two survival times. Estimating equations for the quantile association coefficients are constructed based on the relationship between this quantile association measure and the conditional copula. Asymptotic properties of the resulting estimators are rigorously derived, and induced smoothing is used to obtain the covariance matrix. Through simulations we demonstrate the good practical performance of the proposed inference procedures. An application to age-related macular degeneration (AMD) data reveals interesting varying effects of the baseline AMD severity score on the local association between two AMD progression times.