skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Doan, Khoa"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. null (Ed.)
  2. Searching for documents with semantically similar content is a fundamental problem in the information retrieval domain with various challenges, primarily, in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error and bit balance/uncorrelation, are not effectively learned with existing methods. This is because of the requirement to either tune additional hyper-parameters or optimize these heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model which presents a novel representation learning framework that captures structured representation of text documents in the learned hash function. Also, adversarial training provides an alternative direction to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder.We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of the related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections. 
    more » « less
  3. Digital advertising is performed in multiple ways, for e.g., contextual, display-based and search-based advertising. Across these avenues, the primary goal of the advertiser is to maximize the return on investment. To realize this, the advertiser often aims to target the advertisements towards a targeted set of audience as this set has a high likelihood to respond positively towards the advertisements. One such form of tailored and personalized, targeted advertising is known as look-alike modeling, where the advertiser provides a set of seed users and expects the machine learning model to identify a new set of users such that the newly identified set is similar to the seed-set with respect to the online purchasing activity. Existing look-alike modeling techniques (i.e., similarity-based and regression-based) suffer from serious limitations due to the implicit constraints induced during modeling. In addition, the high-dimensional and sparse nature of the advertising data increases the complexity. To overcome these limitations, in this paper, we propose a novel Adversarial Factorization Autoencoder that can efficiently learn a binary mapping from sparse, high-dimensional data to a binary address space through the use of an adversarial training procedure. We demonstrate the effectiveness of our proposed approach on a dataset obtained from a real-world setting and also systematically compare the performance of our proposed approach with existing look-alike modeling baselines. 
    more » « less