Title: Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search
Modern web search engines exploit users' search history to personalize search results, with the goal of improving service utility on a per-user basis. But it is this very dimension that creates the risk of privacy infringement and raises serious public concerns. In this work, we propose a client-centered, intent-aware query obfuscation solution for protecting user privacy in a personalized web search scenario. In our solution, each user query is submitted with l additional cover queries and corresponding clicks, which act as decoys to mask the user's genuine search intent from a search engine. The cover queries are sequentially sampled from a set of hierarchically organized language models to ensure the coherency of fake search intents in a cover search task. Our approach emphasizes the plausibility of the generated cover queries, not only with respect to the current genuine query but also to previous queries in the same task, to make it harder for a search engine to identify a user's true intent. We also develop two new metrics from an information-theoretic perspective to evaluate the effectiveness of the privacy protection provided. Comprehensive experimental comparisons with state-of-the-art query obfuscation techniques are performed on the public AOL search log, and the promising results substantiate the effectiveness of our solution.
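The obfuscation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic names, the toy unigram models, and the helpers `start_cover_task`, `sample_query`, and `obfuscate` are all hypothetical, and a flat dictionary of unigram models stands in for the hierarchically organized language models. The sketch only shows how l cover topics could be fixed for a search task and how each genuine query could then be submitted alongside one sampled cover query per topic.

```python
import random

# Hypothetical toy topics, each with a unigram language model (word -> probability).
# The paper's models are hierarchical and estimated from data; these are placeholders.
TOPIC_MODELS = {
    "health/nutrition": {"vitamin": 0.4, "diet": 0.3, "protein": 0.3},
    "travel/flights": {"cheap": 0.3, "flights": 0.4, "airline": 0.3},
    "sports/soccer": {"league": 0.4, "score": 0.3, "transfer": 0.3},
    "tech/phones": {"battery": 0.3, "android": 0.4, "review": 0.3},
}


def start_cover_task(genuine_topic, l, rng):
    """Pick l cover topics, distinct from the genuine one, to reuse for a whole task."""
    candidates = [t for t in TOPIC_MODELS if t != genuine_topic]
    return rng.sample(candidates, l)


def sample_query(model, length, rng):
    """Draw `length` words independently from a unigram model."""
    words = list(model)
    weights = [model[w] for w in words]
    return " ".join(rng.choices(words, weights=weights, k=length))


def obfuscate(genuine_query, cover_topics, rng):
    """Return the genuine query shuffled together with one cover query per cover topic."""
    length = len(genuine_query.split())  # match the genuine query's length
    batch = [sample_query(TOPIC_MODELS[t], length, rng) for t in cover_topics]
    batch.append(genuine_query)
    rng.shuffle(batch)  # hide the genuine query's position in the batch
    return batch


rng = random.Random(0)
cover_topics = start_cover_task("health/nutrition", l=2, rng=rng)
for query in ["low carb diet", "protein per day"]:
    print(obfuscate(query, cover_topics, rng))
```

Reusing the same cover topics across every query in a task is one simple way to keep each fake intent coherent over time, which is the property the abstract stresses; the sketch makes no attempt to reproduce the paper's sampling procedure, its clicks, or its evaluation metrics.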
Award ID(s):
1553568 1760523
NSF-PAR ID:
10066043
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
SIGIR '18 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
Page Range / eLocation ID:
285 to 294
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop Hide-n-Seek, an intent-aware privacy protection plugin for personalized web search. In addition to users' genuine search queries, Hide-n-Seek submits k cover queries and corresponding clicks to an external search engine to disguise a user's search intent grounded and reinforced in a search session by mimicking the true query sequence. The cover queries are synthesized and randomly sampled from a topic hierarchy, where each node represents a coherent search topic estimated by both n-gram and neural language models constructed over crawled web documents. Hide-n-Seek also personalizes the returned search results by re-ranking them based on the genuine user profile developed and maintained on the client side. With a variety of graphical user interfaces, we present the topic-based query obfuscation mechanism to the end users for them to digest how their search privacy is protected. 
  2. We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search. It consists of two major components, a document ranker and a query recommender. The document ranker combines the current query with session information and compares the combined representation with the document representation to rank the documents. The query recommender tracks users’ query reformulation sequence, considering all previous in-session queries, using a sequence-to-sequence approach. As both tasks are driven by the users’ underlying search intent, we perform joint learning of these two components through session recurrence, which encodes search context and intent. Extensive comparisons against state-of-the-art document ranking and query suggestion algorithms are performed on the public AOL search log, and the promising results endorse the effectiveness of the joint learning framework.
  3. We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search. It consists of two major components, a document ranker and a query recommender. The document ranker combines the current query with session information and compares the combined representation with the document representation to rank the documents. The query recommender tracks users’ query reformulation sequence, considering all previous in-session queries, using a sequence-to-sequence approach. As both tasks are driven by the users’ underlying search intent, we perform joint learning of these two components through session recurrence, which encodes search context and intent. Extensive comparisons against state-of-the-art document ranking and query suggestion algorithms are performed on the public AOL search log, and the promising results endorse the effectiveness of the joint learning framework.
  4. One longstanding complication with Earth data discovery involves understanding a user’s search intent from the input query. Most geospatial data portals use keyword-based matching to search data. Little attention has been paid to the spatial and temporal information in a query or to understanding the query with an ontology. No research in the geospatial domain has investigated user queries in a systematic way. Here, we propose a query understanding framework and apply it to fill this gap by better interpreting a user’s search intent for Earth data search engines and by adopting knowledge mined from metadata and user query logs. The proposed query understanding tool contains four components: spatial and temporal parsing; concept recognition; Named Entity Recognition (NER); and semantic query expansion. Spatial and temporal parsing detects the spatial bounding box and temporal range from a query. Concept recognition isolates clauses from free text and provides the search engine with phrases instead of a list of words. Named Entity Recognition detects entities in the query, which tells the search engine to query the detected entities. The semantic query expansion module expands the original query by adding synonyms and acronyms, discovered from Web usage data and metadata, to phrases in the query. The four modules interact to parse a user’s query from multiple perspectives, with the goal of understanding the user’s search intent for data. As a proof of concept, the framework is applied to oceanographic data discovery. It is demonstrated that the proposed framework accurately captures a user’s intent.
  5. Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers' preferences in order to elicit purchases. Starting from the same query, customers may purchase different products due to their personal taste or needs. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer's purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users' hidden intents, which in turn serve as the basis for re-ranking subsequent result pages for the query. To address the word mismatch problem between queries and items, we propose an end-to-end context-aware embedding model that can capture long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance than long-term context or no context. Our results also show that our proposed model is more effective than word-based context-aware models.