Title: Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction
Open-domain keyphrase extraction (KPE) on the Web is a fundamental yet complex NLP task with a wide range of practical applications within the field of Information Retrieval. Unlike other document types, web pages are designed for easy navigation and information finding. Effective designs encode, within the layout and formatting, signals that point to where the important information can be found. In this work, we propose a modeling approach that leverages these multi-modal signals to aid in the KPE task. In particular, we leverage both lexical and visual features (e.g., size, font, position) at the micro-level to enable effective strategy induction, and meta-level features that describe pages at a macro-level to aid in strategy selection. Our evaluation demonstrates that a combination of effective strategy induction and strategy selection within this approach for the KPE task outperforms state-of-the-art models. A qualitative post-hoc analysis illustrates how these features function within the model.
Award ID(s):
1917955 1822831
PAR ID:
10295169
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction
Page Range / eLocation ID:
1790 to 1800
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In February 2021, Google Search added a new interface feature to support the evaluation of web domains, known as the “About this result” feature. A prominent part of this feature is a snippet of text pulled automatically from Wikipedia, if a Wikipedia page for the web domain exists. While conducting large-scale audits of Google Search, we discovered that fewer than 40% of web domains shown in Google Search results have a Wikipedia page. Then, we retrieved their Wikidata entries and looked at the extent to which they incorporate features related to W3C credibility signals. The lack of information for many signals points to avenues for expanding Wikidata coverage.
  2. The rapid evolution of Graph Neural Networks (GNNs) has led to a growing number of new architectures as well as novel applications. However, current research focuses on proposing and evaluating specific architectural designs of GNNs, such as GCN, GIN, or GAT, as opposed to studying the more general design space of GNNs that consists of a Cartesian product of different design dimensions, such as the number of layers or the type of the aggregation function. Additionally, GNN designs are often specialized to a single task, yet few efforts have been made to understand how to quickly find the best GNN design for a novel task or a novel dataset. Here we define and systematically study the architectural design space for GNNs which consists of 315,000 different designs over 32 different predictive tasks. Our approach features three key innovations: (1) A general GNN design space; (2) a GNN task space with a similarity metric, so that for a given novel task/dataset, we can quickly identify/transfer the best performing architecture; (3) an efficient and effective design space evaluation method which allows insights to be distilled from a huge number of model-task combinations. Our key results include: (1) A comprehensive set of guidelines for designing well-performing GNNs; (2) while best GNN designs for different tasks vary significantly, the GNN task space allows for transferring the best designs across different tasks; (3) models discovered using our design space achieve state-of-the-art performance. Overall, our work offers a principled and scalable approach to transition from studying individual GNN designs for specific tasks, to systematically studying the GNN design space and the task space. Finally, we release GraphGym, a powerful platform for exploring different GNN designs and tasks. GraphGym features modularized GNN implementation, standardized GNN evaluation, and reproducible and scalable experiment management.
  3. In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based, Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach. 
  4. Wind turbine control via concurrent yaw misalignment and axial induction control has demonstrated potential for improving wind farm power output and mitigating structural loads. However, the complex aerodynamic interplay between these two effects requires deeper investigation. This study presents a modified blade element momentum (BEM) model that matches rotor-averaged quantities to an actuator disk model of yawed rotor induction, enabling analysis of joint yaw-induction control using realistic turbine control inputs. The BEM approach reveals that common torque control strategies such as K−Ω^2 exhibit sub-optimal performance under yawed conditions. Notably, the power-yaw and thrust-yaw sensitivities vary significantly depending on the chosen control strategy, contrary to common modeling assumptions. In the context of wind farm control, employing induction control which minimizes the thrust coefficient proves most effective at reducing wake strength for a given power output across all yaw angles. Results indicate that while yaw control deflects wakes effectively, induction control more directly influences wake velocity magnitude, underscoring their complementary effects. This study advances a fundamental understanding of turbine aerodynamic responses in yawed operation and sets the stage for modeling joint yaw and induction control in wind farms. 
  5. Through human-aided dispersal over the last ~ 10,000 years, house mice (Mus musculus) have recently colonized diverse habitats across the globe, promoting the emergence of new traits that confer adaptive advantages in distinct environments. Despite their status as the premier mammalian model system, the impact of this demographic and selective history on the global patterning of disease-relevant trait variation in wild mouse populations is poorly understood. Here, we leveraged 154 whole-genome sequences from diverse wild house mouse populations to survey the geographic organization of functional variation and systematically identify signals of positive selection. We show that a significant proportion of wild mouse variation is private to single populations, including numerous predicted functional alleles. In addition, we report strong signals of positive selection at many genes associated with both complex and Mendelian diseases in humans. Notably, we detect a significant excess of selection signals at disease-associated genes relative to null expectations, pointing to the important role of adaptation in shaping the landscape of functional variation in wild mouse populations. We also uncover strong signals of selection at multiple genes involved in starch digestion, including Mgam and Amy1. We speculate that the successful emergence of the human-mouse commensalism may have been facilitated, in part, by dietary adaptations at these loci. Finally, our work uncovers multiple cryptic structural variants that manifest as putative signals of positive selection, highlighting an important and under-appreciated source of false-positive signals in genome-wide selection scans. Overall, our findings highlight the role of adaptation in shaping wild mouse genetic variation at human disease-associated genes. 
     Our work also highlights the biomedical relevance of wild mouse genetic diversity and underscores the potential for targeted sampling of mice from specific populations as a strategy for developing effective new mouse models of both rare and common human diseases.