Search for: All records

Creators/Authors contains: "Huang, Jiaxin"

« Prev Next »

Total Resources

10

Resource Type
Conference Paper

9

Conference Proceeding

0

Dataset

0

Journal Article

1

Workshop Report

0

Availability
Full Text / Resource Available

10

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

FineSum: Target-Oriented, Fine-Grained Opinion Summarization

https://doi.org/10.1145/3539597.3570397

Ge, Suyu ; Huang, Jiaxin ; Meng, Yu ; Han, Jiawei ( February 2023 , ACM)
Proc. 2023 ACM Int. Conf. on Web Search and Data Mining (Ed.)
Target-oriented opinion summarization is to profile a target by extracting user opinions from multiple related documents. Instead of simply mining opinion ratings on a target (e.g., a restaurant) or on multiple aspects (e.g., food, service) of a target, it is desirable to go deeper, to mine opinion on fine-grained sub-aspects (e.g., fish). However, it is expensive to obtain high-quality annotations at such fine-grained scale. This leads to our proposal of a new framework, FineSum, which advances the frontier of opinion analysis in three aspects: (1) minimal supervision, where no document-summary pairs are provided, only aspect names and a few aspect/sentiment keywords are available; (2) fine-grained opinion analysis, where sentiment analysis drills down to a specific subject or characteristic within each general aspect; and (3) phrase-based summarization, where short phrases are taken as basic units for summarization, and semantically coherent phrases are gathered to improve the consistency and comprehensiveness of summary. Given a large corpus with no annotation, FineSum first automatically identifies potential spans of opinion phrases, and further reduces the noise in identification results using aspect and sentiment classifiers. It then constructs multiple fine-grained opinion clusters under each aspect and sentiment. Each cluster expresses uniform opinions towards certain sub-aspects (e.g., “fish” in “food” aspect) or characteristics (e.g., “Mexican” in “food” aspect). To accomplish this, we train a spherical word embedding space to explicitly represent different aspects and sentiments. We then distill the knowledge from embedding to a contextualized phrase classifier, and perform clustering using the contextualized opinion-aware phrase embedding. Both automatic evaluations on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
more » « less
Full Text Available
Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation

https://doi.org/10.1145/3534678.3539443

Huang, Jiaxin ; Meng, Yu ; Han, Jiawei ( August 2022 , KDD'22:The 28th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining, August 14-18, 2021)

We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type. Recently, prompt-based tuning has demonstrated superior performance to standard fine-tuning in few-shot scenarios by formulating the entity type classification task as a “fill-in-the-blank” problem. This allows effective utilization of the strong language modeling capability of Pre-trained Language Models (PLMs). Despite the success of current prompt-based tuning approaches, two major challenges remain: (1) the verbalizer in prompts is either manually designed or constructed from external knowledge bases, without considering the target corpus and label hierarchy information, and (2) current approaches mainly utilize the representation power of PLMs, but have not explored their generation power acquired through extensive general-domain pre-training. In this work, we propose a novel framework for fewshot FET consisting of two modules: (1) an entity type label interpretation module automatically learns to relate type labels to the vocabulary by jointly leveraging few-shot instances and the label hierarchy, and (2) a type-based contextualized instance generator produces new instances based on given instances to enlarge the training set for better generalization. On three benchmark datasets, our model outperforms existing methods by significant margins.
more » « less
Full Text Available
On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining

https://doi.org/10.1145/3447548.3470810

Meng, Yu ; Huang, Jiaxin ; Zhang, Yu ; Han, Jiawei ( August 2021 , KDD'21:The 27th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining, August 14-18, 2021)
null (Ed.)
Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related applications. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch. In this tutorial, we introduce recent advances in pre-trained text embeddings and language models, as well as their applications to a wide range of text mining tasks. Specifically, we first overview a set of recently developed self-supervised and weakly-supervised text embedding methods and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several new methods based on pre-trained text embeddings and language models for various text mining applications such as topic discovery and text classification. We focus on methods that are weakly-supervised, domain-independent, language-agnostic, effective and scalable for mining and discovering structured knowledge from large-scale text corpora. Finally, we demonstrate with real world datasets how pre-trained text representations help mitigate the human annotation burden and facilitate automatic, accurate and efficient text analyses.
more » « less
Full Text Available
Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis

https://doi.org/10.1145/3394486.3406483

Meng, Yu ; Huang, Jiaxin ; Han, Jiawei ( August 2020 , KDD:20 The 26th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining)
null (Ed.)
Full Text Available
CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring

https://doi.org/10.1145/3394486.3403244

Huang, Jiaxin ; Xie, Yiqing ; Meng, Yu ; Zhang, Yunyi ; Han, Jiawei ( July 2020 , KDD:20 The 26th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining)
null (Ed.)
Full Text Available
Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

https://doi.org/10.1145/3394486.3403242

Meng, Yu ; Zhang, Yunyi ; Huang, Jiaxin ; Zhang, Yu ; Zhang, Chao ; Han, Jiawei ( July 2020 , KDD:20 The 26th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining)
null (Ed.)
Full Text Available
Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts

https://doi.org/10.3389/fdata.2020.00009

Meng, Yu ; Huang, Jiaxin ; Wang, Guangyuan ; Wang, Zihan ; Zhang, Chao ; Han, Jiawei ( March 2020 , Frontiers in Big Data)
null (Ed.)
Full Text Available
Discriminative Topic Mining via Category-Name Guided Text Embedding

https://doi.org/10.1145/3366423.3380278

Meng, Yu ; Huang, Jiaxin ; Wang, Guangyuan ; Wang, Zihan ; Zhang, Chao ; Zhang, Yu ; Han, Jiawei ( April 2020 , WWW '20: The Web Conference 2020)

Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, which often generate topics that do not fit users’ particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps a user understand clearly and distinctively the topics he/she is most interested in, but also benefits directly keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines highquality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
more » « less
Full Text Available
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

https://doi.org/10.18653/v1/2020.emnlp-main.568

Huang, Jiaxin ; Meng, Yu ; Guo, Fang ; Ji, Heng ; Han, Jiawei ( January 2020 , EMNLP'20: 2020 Conf. on Empirical Methods in Natural Language Processing, Nov. 2020)
null (Ed.)
Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner. It has in general two sub-tasks: (i) extracting aspects from each review, and (ii) classifying aspect-based reviews by sentiment polarity. In this pa-per, we propose a weakly-supervised approach for aspect-based sentiment analysis, which uses only a few keywords describing each aspect/sentiment without using any labeled examples. Existing methods are either designed only for one of the sub-tasks, neglecting the benefit of coupling both, or are based on topic models that may contain overlapping concepts. We propose to first learn sentiment, aspectjoint topic embeddings in the word embedding space by imposing regularizations to encourage topic distinctiveness, and then use neural models to generalize the word-level discriminative information by pre-training the classifiers with embedding-based predictions and self-training them on unlabeled data. Our comprehensive performance analysis shows that our method generates quality joint topics and outperforms the baselines significantly (7.4%and 5.1% F1-score gain on average for aspect and sentiment classification respectively) on benchmark datasets.
more » « less
Full Text Available
Text Classification Using Label Names Only: A Language Model Self-Training Approach

https://doi.org/10.18653/v1/2020.emnlp-main.724

Meng, Yu ; Zhang, Yunyi ; Huang, Jiaxin ; Xiong, Chenyan ; Ji, Heng ; Zhang, Chao ; Han, Jiawei ( January 2020 , EMNLP'20: 2020 Conf. on Empirical Methods in Natural Language Processing, Nov. 2020)
null (Ed.)
Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Hu-mans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on un-labeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% ac-curacy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name1.
more » « less
Full Text Available