Search for: All records

Award ID contains: 1633370

Total Resources: 31

Resource Type
Conference Paper: 22
Conference Proceeding: 0
Dataset: 0
Journal Article: 9
Workshop Report: 0

Availability
Full Text / Resource Available: 31
Citation Only: 0

When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification

https://doi.org/10.1186/s12911-022-01829-2

Li, Xuedong ; Yuan, Walter ; Peng, Dezhong ; Mei, Qiaozhu ; Wang, Yue (April 2022, BMC Medical Informatics and Decision Making)

Abstract
Background
Natural language processing (NLP) tasks in the health domain often deal with a limited amount of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at a scale that is unlikely to be obtainable in the health domain. In this work, we study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP.
Method
We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the number of training examples per disease from small to large, and measured the classification performance in macro-averaged $F_{1}$ score.
Results
On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. But under extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features.
Conclusion
As long as the number of training documents is not extremely small, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks like disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features should be considered.

Systematic Analysis of Fine-Grained Mobility Prediction With On-Device Contextual Data

https://doi.org/10.1109/TMC.2020.3015921

Li, Huoran ; Lin, Fuqi ; Lu, Xuan ; Xu, Chenren ; Huang, Gang ; Zhang, Jun ; Mei, Qiaozhu ; Liu, Xuanzhe (March 2022, IEEE Transactions on Mobile Computing)

Emojis predict dropouts of remote workers: An empirical study of emoji usage on GitHub

https://doi.org/10.1371/journal.pone.0261262

Lu, Xuan ; Ai, Wei ; Chen, Zhenpeng ; Cao, Yanbin ; Mei, Qiaozhu (January 2022, PLOS ONE)
Danforth, Christopher M. (Ed.)
Emotions at work have long been identified as critical signals of work motivations, status, and attitudes, and as predictors of various work-related outcomes. As more and more employees work remotely, these emotional signals of workers become harder to observe through daily, face-to-face communications. The use of online platforms to communicate and collaborate at work provides an alternative channel to monitor the emotions of workers. This paper studies how emojis, as non-verbal cues in online communications, can be used for such purposes and how the emotional signals in emoji usage can be used to predict future behavior of workers. In particular, we present how the developers on GitHub use emojis in their work-related activities. We show that developers have diverse patterns of emoji usage, which can be related to their working status including activity levels, types of work, types of communications, time management, and other behavioral patterns. Developers who use emojis in their posts are significantly less likely to drop out of the online work platform. Surprisingly, using emoji usage alone as features, standard machine learning models can predict future dropouts of developers with satisfactory accuracy. Features related to the general use and the emotions of emojis appear to be important factors, while they do not rule out paths through other purposes of emoji use.
Fast Learning of MNL Model from General Partial Rankings with Application to Network Formation Modeling

https://doi.org/10.1145/3488560.3498506

Ma, Jiaqi ; Zhang, Xingjian ; Mei, Qiaozhu (January 2022, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining)

Adversarial Attack on Graph Neural Networks as An Influence Maximization Problem

https://doi.org/10.1145/3488560.3498497

Ma, Jiaqi ; Deng, Junwei ; Mei, Qiaozhu (January 2022, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining)

Adapting Pre-trained Language Models to Low-Resource Text Simplification: The Path Matters

Garbacea, Cristina ; Mei, Qiaozhu (January 2022, The 1st Conference on Lifelong Learning Agents)

How Much of the Chemical Space Has Been Explored? Selecting the Right Exploration Measure for Drug Discovery

Xie, Yutong ; Xu, Ziqiao ; Ma, Jiaqi ; Mei, Qiaozhu (January 2022, ICML 2022 2nd AI for Science Workshop)

Graph Learning Indexer: A Contributor-Friendly and Metadata-Rich Platform for Graph Learning Benchmarks

Ma, Jiaqi ; Zhang, Xingjian ; Fan, Hezheng ; Huang, Jin ; Li, Tianyue ; Li, Ting-Wei ; Tu, Yiwen ; Zhu, Chenshu ; Mei, Qiaozhu (January 2022, Proceedings of the First Learning on Graphs Conference (LoG 2022))

Establishing open and general benchmarks has been a critical driving force behind the success of modern machine learning techniques. As machine learning is being applied to broader domains and tasks, there is a need to establish richer and more diverse benchmarks to better reflect the reality of the application scenarios. Graph learning is an emerging field of machine learning that urgently needs more and better benchmarks. To accommodate the need, we introduce Graph Learning Indexer (GLI), a benchmark curation platform for graph learning. In comparison to existing graph learning benchmark libraries, GLI highlights two novel design objectives. First, GLI is designed to incentivize dataset contributors. In particular, we incorporate various measures to minimize the effort of contributing and maintaining a dataset, increase the usability of the contributed dataset, as well as encourage attributions to different contributors of the dataset. Second, GLI is designed to curate a knowledge base, instead of a plain collection, of benchmark datasets. We use multiple sources of meta information to augment the benchmark datasets with rich characteristics, so that they can be easily selected and used in downstream research or development. The source code of GLI is available at https://github.com/Graph-Learning-Benchmarks/gli.
Operating Systems for Resource-adaptive Intelligent Software: Challenges and Opportunities

https://doi.org/10.1145/3425866

Liu, Xuanzhe ; Wang, Shangguang ; Ma, Yun ; Zhang, Ying ; Mei, Qiaozhu ; Liu, Yunxin ; Huang, Gang (March 2021, ACM Transactions on Internet Technology)

The past decades witnessed the fast and wide deployment of the Internet. The Internet has bred a ubiquitous computing environment that spans the cloud, edge, mobile devices, and IoT. Software running over such a ubiquitous computing environment is eating the world. A recently emerging trend of Internet-based software systems is “resource adaptive,” i.e., software systems should be robust and intelligent enough to adapt to the changes of heterogeneous resources, both physical and logical, provided by their running environment. To keep pace with such a trend, we argue that some considerations should be taken into account for future operating system design and implementation. From the structural perspective, rather than the “monolithic OS” that manages the aggregated resources on a single machine, the OS should be dynamically composed over the distributed resources and flexibly adapt to the resource and environment changes. Meanwhile, the OS should leverage advanced machine/deep learning techniques to derive configurations and policies and automatically learn to tune itself and schedule resources. This article presents our recent thinking on a new OS abstraction, namely, ServiceOS, for future resource-adaptive intelligent software systems. The idea of ServiceOS is inspired by the delivery model of “Software-as-a-Service” that is supported by the Service-Oriented Architecture (SOA). The key principle of ServiceOS is based on resource disaggregation, resource provisioning as a service, and learning-based resource scheduling and allocation. The major goal of this article is not to provide an immediately deployable OS. Instead, we aim to summarize the challenges and potentially promising opportunities and try to provide some practical implications for researchers and practitioners.
Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification

https://doi.org/10.18653/v1/2021.acl-long.88

Garbacea, Cristina ; Guo, Mengtian ; Carton, Samuel ; Mei, Qiaozhu (January 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing)
