NSF PAR Search | NSF Public Access Repository

PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark

https://doi.org/10.1145/3511808.3557675

Sheng, Jiasheng; Gero, Zelalem; Ho, Joyce C. (October 2022, CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management)

With the ever-increasing abundance of biomedical articles, improving the accuracy of keyword search results becomes crucial for ensuring reproducible research. However, keyword extraction for biomedical articles is hard due to the existence of obscure keywords and the lack of a comprehensive benchmark. PubMedAKE is an author-assigned keyword extraction dataset that contains the title, abstract, and keywords of over 843,269 articles from the PubMed open access subset database. This dataset, publicly available on Zenodo, is the largest keyword extraction benchmark with sufficient samples to train neural networks. Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.
Full Text Available
SR-CoMbEr: Heterogeneous Network Embedding Using Community Multi-view Enhanced Graph Convolutional Network for Automating Systematic Reviews

https://doi.org/10.1007/978-3-031-28244-7_35

Lee, Eric W.; Ho, Joyce C. (April 2023, Advances in Information Retrieval: 45th European Conference on Information Retrieval)

Systematic reviews (SRs) are a crucial component of evidence-based clinical practice. Unfortunately, SRs are labor-intensive and unscalable with the exponential growth in literature. Automating evidence synthesis using machine learning models has been proposed but solely focuses on the text and ignores additional features like citation information. Recent work demonstrated that citation embeddings can outperform the text itself, suggesting that better network representation may expedite SRs. Yet, how to utilize the rich information in heterogeneous information networks (HIN) for network embeddings is understudied. Existing HIN models fail to produce a high-quality embedding compared to simply running state-of-the-art homogeneous network models. To address existing HIN model limitations, we propose SR-CoMbEr, a community-based multi-view graph convolutional network for learning better embeddings for evidence synthesis. Our model automatically discovers article communities to learn robust embeddings that simultaneously encapsulate the rich semantics in HINs. We demonstrate the effectiveness of our model to automate 15 SRs.
Full Text Available
Echo of Neighbors: Privacy Amplification for Personalized Private Federated Learning with Shuffle Model

Liu, Yixuan; Zhao, Suyun; Xiong, Li; Liu, Yuhan; Chen, Hong (April 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Full Text Available
Neighborhood-Regularized Self-Training for Learning with Few Labels

https://doi.org/10.1609/aaai.v37i9.26260

Xu, Ran; Yu, Yue; Cui, Hejie; Kan, Xuan; Zhu, Yanqiao; Ho, Joyce; Zhang, Chao; Yang, Carl (January 2023, Thirty-Seventh AAAI Conference on Artificial Intelligence)

Full Text Available
PubMed-OA-Extraction-dataset

https://doi.org/10.5281/zenodo.6330817

Sheng, Jiasheng (January 2022, Zenodo)

This is the train-test-validation dataset for the PubMed open-access article keyphrase extraction task. The small_* files contain all articles that have 5 to 25 extractive keyphrases (keyphrases of the article that appear in its abstract).
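The selection rule described above can be illustrated with a minimal sketch. The function names and the case-insensitive substring matching are assumptions made for illustration; the dataset description does not state how matches were actually computed.

```python
def extractive_keyphrases(abstract, keyphrases):
    """Return the keyphrases that literally occur in the abstract.

    Assumption: matching is a case-insensitive substring test.
    """
    text = abstract.lower()
    return [kp for kp in keyphrases if kp.lower() in text]


def has_enough_extractive_keyphrases(abstract, keyphrases, low=5, high=25):
    """Check whether an article has between `low` and `high` extractive keyphrases.

    The 5 and 25 bounds come from the dataset description; the function
    itself is hypothetical.
    """
    count = len(extractive_keyphrases(abstract, keyphrases))
    return low <= count <= high
```

Under these assumptions, an article with six author keyphrases of which only four occur in the abstract would be excluded from the small_* files.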
GDA-AM: On the Effectiveness of Solving Minimax Optimization Via Anderson Mixing

He, Huan; Zhao, Shifan; Xi, Yuanzhe; Ho, Joyce (January 2022, International Conference on Learning Representations)

Full Text Available
Counterfactual and Factual Reasoning over Hypergraphs for Interpretable Clinical Predictions on EHR

Xu, Ran; Yu, Yue; Zhang, Chao; Ali, Mohammed K; Ho, Joyce C; Yang, Carl (January 2022, Proceedings of the 2nd Machine Learning for Health symposium)

Electronic Health Record modeling is crucial for digital medicine. However, existing models ignore higher-order interactions among medical codes and their causal relations towards downstream clinical predictions. To address such limitations, we propose a novel framework, CACHE, to provide effective and insightful clinical predictions based on hypergraph representation learning and counterfactual and factual reasoning techniques. Experiments on two real EHR datasets show the superior performance of CACHE. Case studies with a domain expert illustrate the capability of CACHE to generate clinically meaningful interpretations of correct predictions.
Full Text Available
Communication Efficient Tensor Factorization for Decentralized Healthcare Networks

https://doi.org/10.1109/ICDM51629.2021.00147

Ma, Jing; Zhang, Qiuchen; Lou, Jian; Xiong, Li; Bhavani, Sivasubramanium; Ho, Joyce C. (December 2021, 2021 IEEE International Conference on Data Mining (ICDM))

Full Text Available
Projected federated averaging with heterogeneous differential privacy

https://doi.org/10.14778/3503585.3503592

Liu, Junxu; Lou, Jian; Xiong, Li; Liu, Jinfei; Meng, Xiaofeng (December 2021, Proceedings of the VLDB Endowment)

Federated Learning (FL) is a promising framework for multiple clients to learn a joint model without directly sharing the data. In addition to high utility of the joint model, rigorous privacy protection of the data and communication efficiency are important design goals. Many existing efforts achieve rigorous privacy by ensuring differential privacy for intermediate model parameters; however, they assume a uniform privacy parameter for all the clients. In practice, different clients may have different privacy requirements due to varying policies or preferences. In this paper, we focus on explicitly modeling and leveraging the heterogeneous privacy requirements of different clients and study how to optimize utility for the joint model while minimizing communication cost. As differentially private perturbations affect the model utility, a natural idea is to make better use of information submitted by the clients with higher privacy budgets (referred to as "public" clients, and the opposite as "private" clients). The challenge is how to use such information without biasing the joint model. We propose Projected Federated Averaging (PFA), which extracts the top singular subspace of the model updates submitted by "public" clients and utilizes them to project the model updates of "private" clients before aggregating them. We then propose communication-efficient PFA+, which allows "private" clients to upload projected model updates instead of original ones. Our experiments verify the utility boost of both algorithms compared to the baseline methods, whereby PFA+ achieves over 99% uplink communication reduction for "private" clients.
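The projection step described in this abstract can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract only, not the authors' implementation: the function names, the choice of `k`, and the unweighted mean used for aggregation are all assumptions, and the differential privacy noise itself is omitted.

```python
import numpy as np


def top_singular_subspace(public_updates, k):
    """Return a (d, k) orthonormal basis for the top-k right singular
    subspace of the public clients' updates (one flattened update per row)."""
    _, _, vt = np.linalg.svd(public_updates, full_matrices=False)
    return vt[:k].T


def projected_average(public_updates, private_updates, k):
    """Project private updates onto the public subspace, then average.

    Assumption: a plain unweighted mean over all clients; the paper may
    weight the two groups differently.
    """
    basis = top_singular_subspace(public_updates, k)
    projected = private_updates @ basis @ basis.T
    return np.vstack([public_updates, projected]).mean(axis=0)
```

In this reading, a "private" client that knows the basis could upload only the `k` coefficients `update @ basis` rather than the full `d`-dimensional update, which is one way to understand the uplink saving the abstract reports for PFA+.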
Full Text Available
Temporal Network Embedding via Tensor Factorization

https://doi.org/10.1145/3459637.3482200

Ma, Jing; Zhang, Qiuchen; Lou, Jian; Xiong, Li; Ho, Joyce C. (October 2021, Proceedings of the 30th ACM International Conference on Information & Knowledge Management)

Full Text Available
