In 2020, the White House released the “Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset,” wherein artificial intelligence experts are asked to collect data and develop text mining techniques that can help the science community answer high-priority scientific questions related to COVID-19. The Allen Institute for AI and collaborators announced the availability of a rapidly growing open dataset of publications, the COVID-19 Open Research Dataset (CORD-19). As the pace of research accelerates, biomedical scientists struggle to stay current. To expedite their investigations, scientists leverage hypothesis generation systems, which can automatically inspect published papers to discover novel implicit connections. We present automated general purpose hypothesis generation systems AGATHA-C and AGATHA-GP for COVID-19 research. The systems are based on the graph mining and transformer models. The systems are massively validated using retrospective information rediscovery and proactive analysis involving human-in-the-loop expert analysis. Both systems achieve high-quality predictions across domains in fast computational time and are released to the broad scientific community to accelerate biomedical research. In addition, by performing the domain expert curated study, we show that the systems are able to discover ongoing research findings such as the relationship between COVID-19 and oxytocin hormone.All code, details, and pre-trained models are available at https://github.com/IlyaTyagin/AGATHA-C-GP.
more »
« less
AGATHA: Automatic Graph Mining And Transformer based Hypothesis Generation Approach
Medical research is risky and expensive. Drug discovery requires researchers to efficiently winnow thousands of potential targets to a small candidate set. However, scientists spend significant time and money long before seeing the intermediate results that ultimately determine this smaller set. Hypothesis generation systems address this challenge by mining the wealth of publicly available scientific information to predict plausible research directions.We present AGATHA, a deep-learning hypothesis generation system that learns a data-driven ranking criteria to recommend new biomedical connections. We massively validate our system with a temporal holdout wherein we predict connections first introduced after 2015 using data published beforehand. We additionally explore biomedical sub-domains, and demonstrate AGATHA’s predictive capacity across the twenty most popular relationship types. Furthermore, we perform an ablation study to examine the aspects of our semantic network that most contribute to recommendation quality. Overall, AGATHA achieves best-in-class recommendation quality when compared to other hypothesis generation systems built to predict across all available biomedical literature. Reproducibility: All code, experimental data, and pre-trained models are available online: sybrandt.com/2020/agatha.
more »
« less
- Award ID(s):
- 1725573
- PAR ID:
- 10205661
- Date Published:
- Journal Name:
- Association for Computing Machinery
- Page Range / eLocation ID:
- 2757 to 2764
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Social recommendation task aims to predict users' preferences over items with the incorporation of social connections among users, so as to alleviate the sparse issue of collaborative filtering. While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not been well addressed yet: (i) The majority of models only consider users’ social connections, while ignoring the inter-dependent knowledge across items; (ii) Most of existing solutions are designed for singular type of user-item interactions, making them infeasible to capture the interaction heterogeneity; (iii) The dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques. To tackle the above challenges, this work proposes a Knowledge-aware Coupled Graph Neural Network (KCGN) that jointly injects the inter-dependent knowledge across items and users into the recommendation framework. KCGN enables the high-order user- and item-wise relation encoding by exploiting the mutual information for global graph structure awareness. Additionally, we further augment KCGN with the capability of capturing dynamic multi-typed user-item interactive patterns. Experimental studies on real-world datasets show the effectiveness of our method against many strong baselines in a variety of settings. Source codes are available at: https://github.com/xhcdream/KCGN.more » « less
-
Despite the benefits of personalizing items and information tailored to users’ needs, it has been found that recommender systems tend to introduce biases that favor popular items or certain categories of items and dominant user groups. In this study, we aim to characterize the systematic errors of a recommendation system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that distinguishes the sources of prediction errors into a set of key measures that quantify the various types of system-induced effects, at both the individual and collective levels. Based on our measuring framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate the possibility of mitigating system-induced effects by oversampling underrepresented groups and individuals, which was found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.more » « less
-
Recommender systems are often biased toward popular items. In other words, few items are frequently recommended while the majority of items do not get proportionate attention. That leads to low coverage of items in recommendation lists across users (i.e. low aggregate diversity) and unfair distribution of recommended items. In this paper, we introduce FairMatch, a general graph-based algorithm that works as a post-processing approach after recommendation generation for improving aggregate diversity. The algorithm iteratively finds items that are rarely recommended yet are high-quality and add them to the users' final recommendation lists. This is done by solving the maximum flow problem on the recommendation bipartite graph. While we focus on aggregate diversity and fair distribution of recommended items, the algorithm can be adapted to other recommendation scenarios using different underlying definitions of fairness. A comprehensive set of experiments on two datasets and comparison with state-of-the-art baselines show that FairMatch, while significantly improving aggregate diversity, provides comparable recommendation accuracy.more » « less
-
As one of the most pervasive applications of machine learning, recommender systems are playing an important role on assisting human decision making. The satisfaction of users and the interests of platforms are closely related to the quality of the generated recommendation results. However, as a highly data-driven system, recommender system could be affected by data or algorithmic bias and thus generate unfair results, which could weaken the reliance of the systems. As a result, it is crucial to address the potential unfairness problems in recommendation settings. Recently, there has been growing attention on fairness considerations in recommender systems with more and more literature on approaches to promote fairness in recommendation. However, the studies are rather fragmented and lack a systematic organization, thus making it difficult to penetrate for new researchers to the domain. This motivates us to provide a systematic survey of existing works on fairness in recommendation. This survey focuses on the foundations for fairness in recommendation literature. It first presents a brief introduction about fairness in basic machine learning tasks such as classification and ranking in order to provide a general overview of fairness research, as well as introduce the more complex situations and challenges that need to be considered when studying fairness in recommender systems. After that, the survey will introduce fairness in recommendation with a focus on the taxonomies of current fairness definitions, the typical techniques for improving fairness, as well as the datasets for fairness studies in recommendation. The survey also talks about the challenges and opportunities in fairness research with the hope of promoting the fair recommendation research area and beyond.more » « less
An official website of the United States government

