Title: Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery
Graphs and texts are two key modalities in data mining. In many cases, the data presents a mixture of the two modalities and the information is often complementary: in e-commerce data, the product-user graph and product descriptions capture different aspects of product features; in scientific literature, the citation graph, author metadata, and the paper content all contribute to modeling the paper impact.
Award ID(s):
2118329
PAR ID:
10543138
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400703713
Page Range / eLocation ID:
1122 to 1125
Format(s):
Medium: X
Location:
Merida, Mexico
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding the impact of engineering design on product competitions is imperative for product designers to better address customer needs and develop more competitive products. In this paper, we propose a dynamic network-based approach for modeling and analyzing the evolution of product competitions using multi-year buyer survey data. The product co-consideration network, formed based on the likelihood of two products being co-considered from survey data, is treated as a proxy of products’ competition relations in a market. The separable temporal exponential random graph model (STERGM) is employed as the dynamic network modeling technique to model the evolution of the network as two separate processes: link formation and link dissolution. We use China’s automotive market as a case study to illustrate the implementation of the proposed approach and the benefits of dynamic network models compared to the static network modeling approach based on an exponential random graph model (ERGM). The results show that since STERGM takes preexisting competition relations into account, it provides a pathway to gain insights into why a product may maintain or lose its competitiveness over time. These driving factors include both product attributes (e.g., fuel consumption) and current market structures (e.g., the centralization effect). With the proposed dynamic network-based approach, the insights gained from this paper can help designers better interpret the temporal changes of product competition relations to support product design decisions.
  2. Graph neural networks, a powerful deep learning tool to model graph-structured data, have demonstrated remarkable performance on numerous graph learning tasks. To address the data noise and data scarcity issues in deep graph learning, research on graph data augmentation has intensified recently. However, conventional data augmentation methods can hardly handle graph-structured data, which is defined in non-Euclidean space with multi-modality. In this survey, we formally formulate the problem of graph data augmentation and further review the representative techniques and their applications in different deep graph learning problems. Specifically, we first propose a taxonomy for graph data augmentation techniques and then provide a structured review by categorizing the related work based on the augmented information modalities. Moreover, we summarize the applications of graph data augmentation in two representative problems in data-centric deep graph learning: (1) reliable graph learning, which focuses on enhancing the utility of the input graph as well as the model capacity via graph data augmentation; and (2) low-resource graph learning, which aims to enlarge the labeled training data scale through graph data augmentation. For each problem, we also provide a hierarchical problem taxonomy and review the existing literature related to graph data augmentation. Finally, we point out promising research directions and the challenges in future research.
  3. Learning the human-mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Toward ubiquitous and understandable HMI learning, this paper considers both spoken language (e.g., human textual annotations) and unspoken language (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) in terms of information modalities from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as the named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. In order to fuse both unspoken and spoken languages, we have designed a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit the downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI into a system prototype, and performed extensive studies on three real-world HMI datasets (two on car driving and the third one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.
  4. One primary focus in multimodal feature extraction is to find the representations of individual modalities that are maximally correlated. As a well-known measure of dependence, the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation becomes an appealing objective because of its operational meaning and desirable properties. However, the strict whitening constraints formalized in the HGR maximal correlation limit its application. To address this problem, this paper proposes Soft-HGR, a novel framework to extract informative features from multiple data modalities. Specifically, our framework avoids the “hard” whitening constraints, while simultaneously preserving the same feature geometry as in the HGR maximal correlation. The objective of Soft-HGR is straightforward, only involving two inner products, which guarantees the efficiency and stability in optimization. We further generalize the framework to handle more than two modalities and missing modalities. When labels are partially available, we enhance the discriminative power of the feature representations by making a semi-supervised adaptation. Empirical evaluation implies that our approach learns more informative feature mappings and is more efficient to optimize.
  5. Human-centric situational awareness and visualization are needed to analyze big data efficiently. One of the challenges is to create an algorithm that analyzes the given data without the help of other data analysis tools. This research effort aims to identify how graphical objects (such as data-shapes) developed in accordance with an analyst's mental model can enhance the analyst's situation awareness. Our approach for improved big data visualization is two-fold, focusing on both visualization and interaction. This paper presents the developed data and graph technique based on a force-directed model graph in 3D. It is developed using the Unity 3D game engine. Pilot testing was done with different data sets to check the efficiency of the system in immersive and non-immersive environments. The application is able to handle the data successfully for the given data sets in data visualization. The graph can currently render around 200 to 300 linked nodes in real time.