skip to main content

Search for: All records

Creators/Authors contains: "Jiang, Haoming"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Modern data acquisition routinely produce massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets. These data often ex- hibit complicated short-term and long-term temporal dependencies. However, most of the ex- isting recurrent neural network-based point process models fail to capture such dependencies, and yield unreliable prediction performance. To address this issue, we propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long- term dependencies and meanwhile enjoys computational efficiency. Numerical experiments on various datasets show that THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin. Moreover, THP is quite general and can incorpo- rate additional structural knowledge. We provide a concrete example, where THP achieves im- proved prediction performance for learning multiple point processes when incorporating their relational information.
  2. Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning of- ten causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance. The pro- posed framework contains two important in- gredients: 1. Smoothness-inducing regulariza- tion, which effectively manages the complex- ity of the model; 2. Bregman proximal point optimization, which is an instance of trust- region methods and can prevent aggressive up- dating. Our experiments show that the pro- posed framework achieves new state-of-the-art performance on a number of NLP tasks includ- ing GLUE, SNLI, SciTail and ANLI. More- over, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE.
  3. We study the open-domain named entity recognition (NER) prob- lem under distant supervision. The distant supervision, though does not require large amounts of manual annotations, yields highly in- complete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework – BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algo- rithm: In the first stage, we adapt the pre-trained language model to the NER tasks using the distant labels, which can significantly improve the recall and precision; In the second stage, we drop the distant labels, and propose a self-training approach to further improve the model performance. Thorough experiments on 5 bench- mark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released in