Title: An Empirical Study of Pre-trained Transformers for Arabic Information Extraction
Multilingual pre-trained Transformers, such as mBERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020a), have been shown to enable effective cross-lingual zero-shot transfer. However, their performance on Arabic information extraction (IE) tasks is not very well studied. In this paper, we pre-train a customized bilingual BERT, dubbed GigaBERT, that is designed specifically for Arabic NLP and English-to-Arabic zero-shot transfer learning. We study GigaBERT's effectiveness on zero-shot transfer across four IE tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. Our best model significantly outperforms mBERT, XLM-RoBERTa, and AraBERT (Antoun et al., 2020) in both the supervised and zero-shot transfer settings. We have made our pre-trained models publicly available at: https://github.com/lanwuwei/GigaBERT.
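As a rough illustration of how such a checkpoint would be used for the paper's zero-shot transfer setting, here is a minimal token-classification setup with Hugging Face Transformers. The checkpoint name below is an assumption inferred from the project's GitHub page; consult the repository for the actual released model names.

```python
# Minimal sketch: load a GigaBERT checkpoint for NER token classification.
# Recipe from the paper's setting: fine-tune on English NER data, then
# evaluate directly on Arabic test data without any Arabic supervision.
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "lanwuwei/GigaBERT-v4-Arabic-and-English"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME,
    num_labels=9,  # e.g., CoNLL-style BIO tags: B/I for PER/ORG/LOC/MISC plus O
)

inputs = tokenizer("Barack Obama visited Cairo.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, seq_len, num_labels)
```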
Award ID(s):
2052498
PAR ID:
10243512
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Page Range / eLocation ID:
4727 to 4734
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
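The core mechanism can be sketched from the description above: place a sentinel at the end of the prompt, let a span-corruption model like mT5 fill it, append the output, and repeat. This is an illustrative reconstruction, not the authors' implementation; the prompt, step count, and decoding settings are placeholders.

```python
# Sketch of sequential prompting of a bidirectional (span-corruption) model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")

prompt = (
    "Translate English to German.\n"
    "English: The house is small. German: Das Haus ist klein.\n"  # few-shot example
    "English: The book is on the table. German:"
)

generated = ""
for _ in range(8):  # fixed number of extension steps for this sketch
    # Ask the model to fill a sentinel appended to the running text.
    text = prompt + generated + " <extra_id_0>"
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(input_ids, max_new_tokens=8)
    chunk = tokenizer.decode(out[0], skip_special_tokens=True)
    if not chunk.strip():
        break
    generated += " " + chunk.strip()

print(generated)
```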
  2. Transfer learning using ImageNet pre-trained models has been the de facto approach in a wide range of computer vision tasks. However, fine-tuning still requires task-specific training data. In this paper, we propose N3 (Neural Networks from Natural Language), a new paradigm of synthesizing task-specific neural networks from language descriptions and a generic pre-trained model. N3 leverages language descriptions to generate parameter adaptations as well as a new task-specific classification layer for a pre-trained neural network, effectively "fine-tuning" the network for a new task using only language descriptions as input. To the best of our knowledge, N3 is the first method to synthesize entire neural networks from natural language. Experimental results show that N3 can outperform previous natural-language based zero-shot learning methods across four different zero-shot image classification benchmarks. We also demonstrate a simple method to help identify keywords in language descriptions leveraged by N3 when synthesizing model parameters.
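A toy version of the core idea, generating a classification head from a language description, might look like the following. All module choices here (ResNet-18 backbone, BERT text encoder, a single linear layer as the hypernetwork) are assumptions for illustration, not the paper's architecture.

```python
# Sketch: a text encoder turns a class description into one row of
# classifier weights for a frozen pre-trained image backbone.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoTokenizer, AutoModel

backbone = resnet18(weights="IMAGENET1K_V1")
feat_dim = backbone.fc.in_features  # 512
backbone.fc = nn.Identity()         # expose features instead of logits

text_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
text_enc = AutoModel.from_pretrained("bert-base-uncased")

# Hypernetwork: maps a description embedding to weights plus a bias term.
hyper = nn.Linear(text_enc.config.hidden_size, feat_dim + 1)

class_descriptions = ["a photo of a red apple", "a photo of a yellow banana"]
rows = []
for desc in class_descriptions:
    ids = text_tok(desc, return_tensors="pt")
    emb = text_enc(**ids).last_hidden_state[:, 0]  # [CLS] embedding
    rows.append(hyper(emb))
params = torch.cat(rows)                  # (num_classes, feat_dim + 1)
W, b = params[:, :-1], params[:, -1]

image = torch.randn(1, 3, 224, 224)       # stand-in for a real image
logits = backbone(image) @ W.t() + b      # synthesized task-specific classifier
print(logits.shape)                       # (1, num_classes)
```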
  3. Automated scoring is a current hot topic in creativity research. However, most research has focused on the English language and popular verbal creative thinking tasks, such as the alternate uses task. Therefore, in this study, we present a large language model approach for automated scoring of a scientific creative thinking task that assesses divergent ideation in experimental tasks in the German language. Participants are required to generate alternative explanations for an empirical observation. This work analyzed a total of 13,423 unique responses. To predict human ratings of originality, we used XLM-RoBERTa (Cross-lingual Language Model-RoBERTa), a large, multilingual model. The prediction model was trained on 9,400 responses. Results showed a strong correlation between model predictions and human ratings in a held-out test set (n = 2,682; r = 0.80; 95% CI [0.79, 0.81]). These promising findings underscore the potential of large language models for automated scoring of scientific creative thinking in the German language. We encourage researchers to further investigate automated scoring of other domain-specific creative thinking tasks.
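The scoring pipeline described here amounts to standard regression fine-tuning of a multilingual encoder. A minimal sketch, assuming an xlm-roberta-large checkpoint and a single continuous output (the example response is a placeholder, not study data):

```python
# Sketch: XLM-RoBERTa as a single-output regressor predicting human
# originality ratings for German free-text responses. With num_labels=1,
# Transformers applies an MSE loss during fine-tuning when labels are given.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large",
    num_labels=1,  # single continuous output -> regression
)

response = "Vielleicht hat die Temperatur das Ergebnis beeinflusst."
inputs = tokenizer(response, return_tensors="pt")
predicted_originality = model(**inputs).logits.squeeze().item()
print(predicted_originality)
```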
  4. Peng, Hanchuan (Ed.)
    Motivation: Deep learning models have achieved remarkable success in a wide range of natural-world tasks, such as vision, language, and speech recognition. These accomplishments are largely attributed to the availability of open-source large-scale datasets. More importantly, representations learned by pre-trained foundation models exhibit a surprising degree of transferability to downstream tasks, enabling efficient learning even with limited training examples. However, the application of such natural-domain models to the domain of tiny Cryo-Electron Tomography (Cryo-ET) images has been a relatively unexplored frontier. This research is motivated by the intuition that 3D Cryo-ET voxel data can be conceptually viewed as a sequence of progressively evolving video frames. Results: Leveraging the above insight, we propose a novel approach that involves the utilization of 3D models pre-trained on large-scale video datasets to enhance Cryo-ET subtomogram classification. Our experiments, conducted on both simulated and real Cryo-ET datasets, reveal compelling results. The use of video initialization not only demonstrates improvements in classification accuracy but also substantially reduces training costs. Further analyses provide additional evidence of the value of video initialization in enhancing subtomogram feature extraction. Additionally, we observe that video initialization yields similar positive effects when applied to medical 3D classification tasks, underscoring the potential of cross-domain knowledge transfer from video-based models to advance the state-of-the-art in a wide range of biological and medical data types. Availability and implementation: https://github.com/xulabs/aitom.
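The transfer recipe can be sketched with torchvision's Kinetics-pretrained R3D-18 as an assumed stand-in for the video models used in the paper; the class count and input size are placeholders.

```python
# Sketch: initialize a 3D CNN from video pre-training, then replace its
# head for subtomogram classes. A Cryo-ET volume is fed in as a "video":
# the depth axis acts as frames, and the single density channel is
# repeated to the 3 input channels the video model expects.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_SUBTOMOGRAM_CLASSES = 10  # placeholder class count

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_SUBTOMOGRAM_CLASSES)

volume = torch.randn(1, 1, 32, 32, 32).repeat(1, 3, 1, 1, 1)  # (N, C, D, H, W)
logits = model(volume)
print(logits.shape)  # (1, NUM_SUBTOMOGRAM_CLASSES)
```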
  5. Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC. Motivated by the close connection between ReC and CLIP’s contrastive pre-training objective, the first component of ReCLIP is a region-scoring method that isolates object proposals via cropping and blurring, and passes them to CLIP. However, through controlled experiments on a synthetic dataset, we find that CLIP is largely incapable of performing spatial reasoning off-the-shelf. We reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg, and on RefGTA (video game imagery), ReCLIP’s relative improvement over supervised ReC models trained on real images is 8%. 
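The region-scoring component described above might be sketched as follows, assuming proposal boxes are already available and using the open-source OpenAI CLIP package; ReCLIP's actual cropping and blurring scheme differs in its details.

```python
# Sketch: blur everything outside each proposal, embed the resulting images
# and the referring expression with CLIP, and pick the highest-scoring box.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image, ImageFilter

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = Image.open("scene.jpg")                          # placeholder image
proposals = [(10, 20, 120, 160), (200, 40, 330, 210)]    # assumed (x1,y1,x2,y2) boxes
expression = "the man in the red jacket"

candidates = []
for box in proposals:
    blurred = image.filter(ImageFilter.GaussianBlur(radius=8))
    blurred.paste(image.crop(box), (box[0], box[1]))     # keep the proposal sharp
    candidates.append(preprocess(blurred))

with torch.no_grad():
    img_feats = model.encode_image(torch.stack(candidates).to(device))
    txt_feats = model.encode_text(clip.tokenize([expression]).to(device))
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    scores = (img_feats @ txt_feats.T).squeeze(1)

print("predicted region:", proposals[scores.argmax().item()])
```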