The automated construction of topic taxonomies can benefit numerous
applications, including web search, recommendation, and
knowledge discovery. One of the major advantages of automatic
taxonomy construction is the ability to capture corpus-specific
information and adapt to different scenarios. To better reflect the
characteristics of a corpus, we take the meta-data of documents into
consideration and view the corpus as a text-rich network. In this
paper, we propose NetTaxo, a novel automatic topic taxonomy construction
framework, which goes beyond the existing paradigm and
allows text data to collaborate with network structure. Specifically,
we learn term embeddings from both text and network as contexts.
Network motifs are adopted to capture appropriate network
contexts. We conduct an instance-level selection for motifs, which
further refines term embedding according to the granularity and
semantics of each taxonomy node. Clustering is then applied to obtain
sub-topics under a taxonomy node. Extensive experiments on
two real-world datasets demonstrate the superiority of our method
over the state-of-the-art, and further verify the effectiveness and
importance of instance-level motif selection.
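
As a rough illustration of using both text and network as contexts, the sketch below builds count-based context vectors for terms from a toy corpus. It is a simplified stand-in, not NetTaxo's learned embeddings: the document fields, the `venue_motif` helper, and the use of raw co-occurrence counts with cosine similarity are all assumptions made for this example.

```python
from collections import Counter
import math

# Toy text-rich network: each document has terms plus metadata (hypothetical fields).
docs = [
    {"terms": ["clustering", "embedding"], "venue": "KDD"},
    {"terms": ["taxonomy"], "venue": "KDD"},
    {"terms": ["parsing"], "venue": "ACL"},
]

def venue_motif(doc):
    """Hypothetical motif: link every term in a document to the document's venue."""
    return ["venue=" + doc["venue"]]

def context_counts(docs, motifs=()):
    """Count text contexts (co-occurring terms) and network contexts (motif instances)."""
    contexts = {}
    for doc in docs:
        network_ctx = [c for motif in motifs for c in motif(doc)]
        for term in doc["terms"]:
            bag = contexts.setdefault(term, Counter())
            bag.update(t for t in doc["terms"] if t != term)  # text contexts
            bag.update(network_ctx)                           # network contexts
    return contexts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

text_only = context_counts(docs)
with_network = context_counts(docs, motifs=[venue_motif])

# "clustering" and "taxonomy" never co-occur in text, but they share a venue.
sim_text = cosine(text_only["clustering"], text_only["taxonomy"])
sim_network = cosine(with_network["clustering"], with_network["taxonomy"])
```

In this toy corpus the two terms are unrelated under text contexts alone and become similar once the venue context is added, which is the kind of signal a network motif is meant to contribute.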
HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion
Taxonomies are of great value to many knowledge-rich applications.
Because manual taxonomy curation requires enormous human
effort, automatic taxonomy construction is in great demand. However,
most existing automatic taxonomy construction methods can
only build hypernymy taxonomies wherein each edge is limited
to expressing the “is-a” relation. Such a restriction limits their applicability
to more diverse real-world tasks where parent-child pairs
may carry different relations. In this paper, we aim to construct
a task-guided taxonomy from a domain-specific corpus, and allow
users to input a “seed” taxonomy, serving as the task guidance. We
propose an expansion-based taxonomy construction framework,
namely HiExpan, which automatically generates a key term list from
the corpus and iteratively grows the seed taxonomy. Specifically,
HiExpan views all children under each taxonomy node as forming a
coherent set and builds the taxonomy by recursively expanding all
these sets. Furthermore, HiExpan incorporates a weakly-supervised
relation extraction module to extract the initial children of a newly expanded
node and adjusts the taxonomy tree by optimizing its
global structure. Our experiments on three real datasets from different
domains demonstrate the effectiveness of HiExpan for building
task-guided taxonomies.
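
The recursive set-expansion idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HiExpan's implementation: the toy vectors, the `expand_children` and `expand_taxonomy` names, and the `min_score` threshold are hypothetical, and the relation-extraction and global-optimization steps are omitted, so newly added nodes stay childless here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_children(children, candidates, vectors, top_k=1, min_score=0.8):
    """Add up to top_k candidates that are, on average, most similar to the current children."""
    scored = []
    for term in candidates:
        if term in children:
            continue
        score = sum(cosine(vectors[term], vectors[c]) for c in children) / len(children)
        if score >= min_score:
            scored.append((score, term))
    scored.sort(reverse=True)
    return children + [term for _, term in scored[:top_k]]

def expand_taxonomy(tree, candidates, vectors):
    """Treat the children of each node as a coherent set and expand every set recursively."""
    if not tree:
        return {}  # no sibling set to guide expansion (assumption: leaves are left as-is)
    children = expand_children(list(tree), candidates, vectors)
    return {c: expand_taxonomy(tree.get(c, {}), candidates, vectors) for c in children}

# Hypothetical toy embeddings: countries, states, and cities occupy different directions.
vectors = {
    "US": (1.0, 0.1, 0.0), "China": (0.9, 0.2, 0.0), "Canada": (0.95, 0.15, 0.0),
    "California": (0.1, 1.0, 0.0), "Illinois": (0.2, 0.9, 0.0), "Texas": (0.15, 0.95, 0.0),
    "Beijing": (0.0, 0.1, 1.0), "Shanghai": (0.0, 0.2, 0.9),
}
seed = {"US": {"California": {}, "Illinois": {}}, "China": {"Beijing": {}}}
expanded = expand_taxonomy(seed, ["Canada", "Texas", "Shanghai"], vectors)
```

With these toy vectors, each candidate term joins the sibling set it most resembles, so the user-supplied seed determines which relation each level of the tree expresses.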
- PAR ID: 10079172
- Date Published:
- Journal Name: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018
- Volume: 2018
- Issue: 1
- Page Range / eLocation ID: 2180 to 2189
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering. Existing taxonomy expansion or completion methods assume that new concepts have been accurately extracted and their embedding vectors learned from the text corpus. However, one critical and fundamental challenge in fixing the incompleteness of taxonomies is the incompleteness of the extracted concepts, especially for those whose names have multiple words and consequently low frequency in the corpus. To resolve the limitations of extraction-based methods, we propose GenTaxo to enhance taxonomy completion by identifying positions in existing taxonomies that need new concepts and then generating appropriate concept names. Instead of relying on the corpus for concept embeddings, GenTaxo learns the contextual embeddings from their surrounding graph-based and language-based relational information, and leverages the corpus for pre-training a concept name generator. Experimental results demonstrate that GenTaxo improves the completeness of taxonomies over existing methods.
-
Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
-
Network embedding aims at transferring node proximity in networks into distributed vectors, which can be leveraged in various downstream applications. Recent research has shown that nodes in a network can often be organized in latent hierarchical structures, but without a particular underlying taxonomy, the learned node embedding is less useful and less interpretable. In this work, we aim to improve network embedding by modeling the conditional node proximity in networks indicated by node labels residing in real taxonomies. At the same time, we also aim to model the hierarchical label proximity in the given taxonomies, which is too coarse when judged solely from the hierarchical topologies. To this end, we propose TAXOGAN to co-embed network nodes and hierarchical labels through a hierarchical network generation process. In particular, TAXOGAN models the child labels and network nodes of each parent label in an individual embedding space while learning to transfer network proximity among the spaces of hierarchical labels through stacked network generators and embedding encoders. To enable robust and efficient model inference, we further develop a hierarchical adversarial training process. Comprehensive experiments and case studies on four real-world datasets of networks with hierarchical labels demonstrate the utility of TAXOGAN in improving network embedding on traditional tasks of node classification and link prediction, as well as novel tasks like conditional proximity search and fine-grained taxonomy layout.
-
Taxonomies serve many applications with a structural representation of knowledge. To incorporate emerging concepts into existing taxonomies, the task of taxonomy completion aims to find suitable positions for emerging query concepts. Previous work captured homogeneous token-level interactions inside a concatenation of the query concept term and definition using pre-trained language models. However, they ignored the token-level interactions between the term and definition of the query concepts and their related concepts. In this work, we propose to capture heterogeneous token-level interactions between the different textual components of concepts that have different types of relations. We design a relation-aware mutual attention module (RAMA) to learn such interactions for taxonomy completion. Experimental results demonstrate that our new taxonomy completion framework based on RAMA achieves state-of-the-art performance on six taxonomy datasets.