Title: Multi-objective Privacy-preserving Text Representation Learning
Private information can be explicitly contained in the text or implicit. For example, demographic information about the author of a text can be predicted with above-chance accuracy from linguistic cues in the text itself. Regardless of its explicitness, some of the private information correlates with the output labels and can therefore be learned by a neural network. In such a case, there is a trade-off between the utility of the representation (measured by the accuracy of the classification network) and its privacy. This problem is inherently multi-objective because the two objectives may conflict, necessitating a trade-off. Thus, we explicitly cast this problem as multi-objective optimization (MOO) with the overall objective of finding a Pareto-stationary solution. We therefore propose a multiple-gradient descent algorithm (MGDA) that enables the efficient application of the Frank-Wolfe algorithm [10] with line search. Experimental results on sentiment analysis and part-of-speech (POS) tagging show that MGDA produces higher-performing models than recent proxy-objective approaches and performs as well as single-objective baselines.
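For the two-objective case, the min-norm subproblem at the heart of MGDA-style methods has a closed-form solution. The following is a minimal sketch of that step; the function name, the clipping formula, and the toy gradients are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mgda_two_task_weight(g1, g2):
    """Min-norm convex combination of two task gradients (closed form
    for the two-objective MGDA subproblem). Returns gamma such that
    d = gamma * g1 + (1 - gamma) * g2 has minimum norm."""
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:  # gradients coincide; any weighting works
        return 0.5
    gamma = float(np.dot(g2 - g1, g2)) / denom
    return float(np.clip(gamma, 0.0, 1.0))

# Toy example: a utility gradient and a privacy gradient on shared parameters
g_utility = np.array([1.0, 0.0])
g_privacy = np.array([0.0, 1.0])
gamma = mgda_two_task_weight(g_utility, g_privacy)
d = gamma * g_utility + (1 - gamma) * g_privacy  # common descent direction
```

Stepping against `d` decreases both objectives whenever the min-norm point is nonzero, which is what makes the combined direction a Pareto-improving update.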
Award ID(s):
1946391
PAR ID:
10321974
Journal Name:
Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Sponsoring Org:
National Science Foundation
More Like this
  1. Evaluating two‐terminal network reliability is a classical problem with numerous applications. Because this problem is #P‐Complete, practical studies involving large systems commonly resort to approximating or estimating system reliability rather than evaluating it exactly. Researchers have characterized signatures, such as the destruction spectrum and survival signature, which summarize the system's structure and give rise to procedures for evaluating or approximating network reliability. These procedures are advantageous if the signature can be computed efficiently; however, computing the signature is challenging for complex systems. With this motivation, we consider the use of Monte Carlo (MC) simulation to estimate the survival signature of a two‐terminal network in which there are two classes of i.i.d. components. In this setting, we prove that each MC replication to estimate the signature of a multi‐class system entails solving a multi‐objective maximum capacity path problem. For the case of two classes of components, we adapt a Dijkstra's‐like bi‐objective shortest path algorithm from the literature for the purpose of solving the resulting bi‐objective maximum capacity path problem. We perform computational experiments to compare our method's efficiency against intuitive benchmark approaches. Our computational results demonstrate that the bi‐objective optimization approach consistently outperforms the benchmark approaches, thereby enabling a larger number of MC replications and improved accuracy of the reliability estimation. Furthermore, the efficiency gains versus benchmark approaches appear to become more significant as the network increases in size. 
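The single-objective building block of the bi-objective method above is a maximum capacity (widest) path query, solvable with a Dijkstra-like label-setting scheme. A minimal sketch follows; the graph and capacities are made up, and the bi-objective version would instead keep sets of nondominated capacity pairs per node.

```python
import heapq

def max_capacity_path(adj, s, t):
    """Maximum capacity (bottleneck) path value from s to t.
    adj: {node: [(neighbor, edge_capacity), ...]}.
    Dijkstra-like: always expand the node with the largest
    bottleneck capacity found so far (max-heap via negation)."""
    best = {s: float("inf")}
    heap = [(-float("inf"), s)]
    while heap:
        neg_cap, u = heapq.heappop(heap)
        cap = -neg_cap
        if u == t:
            return cap
        if cap < best.get(u, -float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            new_cap = min(cap, c)  # path capacity = weakest edge
            if new_cap > best.get(v, -float("inf")):
                best[v] = new_cap
                heapq.heappush(heap, (-new_cap, v))
    return -float("inf")  # t unreachable

# Two s-t routes with bottlenecks 3 and min(5, 4) = 4; the widest path is 4
adj = {"s": [("a", 3), ("b", 5)], "a": [("t", 9)], "b": [("t", 4)]}
widest = max_capacity_path(adj, "s", "t")  # → 4
```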
  2. This paper focuses on a newly challenging setting in hard-label adversarial attacks on text data that takes budget information into account. Although existing approaches can successfully generate adversarial examples in the hard-label setting, they follow the ideal assumption that the victim model does not restrict the number of queries. In real-world applications, however, the query budget is usually tight. Moreover, existing hard-label adversarial attack techniques use a genetic algorithm to optimize discrete text data by maintaining a number of adversarial candidates during optimization, which can lead to low-quality adversarial examples in the tight-budget setting. To solve this problem, we propose a new method named TextHoaxer that formulates the budgeted hard-label adversarial attack task on text data as gradient-based optimization of a perturbation matrix in the continuous word embedding space. Compared with genetic-algorithm-based optimization, our solution uses only a single initialized adversarial example as the adversarial candidate, which significantly reduces the number of queries. The optimization is guided by a new objective function consisting of three terms: a semantic similarity term, a pair-wise perturbation constraint, and a sparsity constraint. The semantic similarity term and the pair-wise perturbation constraint ensure high semantic similarity of adversarial examples at both the overall text level and the individual word level, while the sparsity constraint explicitly restricts the number of perturbed words, which also helps enhance the quality of the generated text.
We conduct extensive experiments on eight text datasets against three representative natural language models, and experimental results show that TextHoaxer can generate high-quality adversarial examples with higher semantic similarity and lower perturbation rate under the tight-budget setting. 
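A three-term objective of this kind can be sketched as a function of the perturbation matrix. The terms and coefficients below are illustrative surrogates, not TextHoaxer's actual loss (which is driven by estimated gradients from hard-label queries rather than analytic ones).

```python
import numpy as np

def texthoaxer_style_loss(P, lam_pair=0.1, lam_sparse=0.05):
    """Toy surrogate objective on a perturbation matrix P (rows = words).
    Illustrative stand-ins for the three terms described in the abstract:
      - semantic: overall embedding shift (text-level similarity proxy)
      - pairwise: largest single-word shift (word-level constraint proxy)
      - sparsity: L1 norm, encouraging few perturbed words
    """
    semantic = float(np.sum(P * P))
    pairwise = float(np.max(np.sum(P * P, axis=1)))
    sparsity = float(np.sum(np.abs(P)))
    return semantic + lam_pair * pairwise + lam_sparse * sparsity

P = np.zeros((4, 3))
P[2, 1] = 0.5  # perturb a single word along one embedding dimension
loss = texthoaxer_style_loss(P)  # ≈ 0.3 for this toy perturbation
```

Minimizing such a composite loss trades off staying close to the original text against perturbing as few words as possible.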
  3. High performance computing (HPC) is undergoing significant changes. Emerging HPC applications comprise both compute- and data-intensive workloads. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers. BBSched formulates scheduling as a multi-objective optimization (MOO) problem and rapidly solves it using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential trade-offs among various resources and thereby obtain better utilization of all resources. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
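The "multiple solutions" a multi-objective scheduler returns are the nondominated set over the competing resource objectives. A minimal sketch of that Pareto-filtering step follows; the candidate schedules and their two scores are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives are to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only nondominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Candidate schedules scored on (CPU slowdown, burst-buffer slowdown)
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
front = pareto_front(candidates)  # (2.5, 2.5) is dominated by (2.0, 2.0)
```

A system manager would then pick one point from `front` according to which resource currently matters more.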
  4. Task offloading in edge computing infrastructure remains a challenge for dynamic and complex environments such as the Industrial Internet-of-Things. The hardware resource constraints of edge servers must be explicitly considered to ensure that system resources are not overloaded. Many works have studied task offloading while focusing primarily on ensuring system resilience. However, in the face of deep learning-based services, model performance with respect to loss/accuracy must also be considered. Deep learning services with different implementations may provide varying amounts of loss/accuracy while also being more complex to run inference on. Communication latency can be reduced to improve overall Quality-of-Service by employing compression techniques; however, such techniques can also have the side effect of reducing the accuracy of a deep learning-based service. This work therefore studies a joint optimization problem for task offloading decisions in 3-tier edge computing platforms, where task offloading decisions are made in tandem with compression decisions. The objective is to optimally offload requests with compression such that the latency-accuracy trade-off is not greatly compromised. We cast this problem as a mixed integer nonlinear program. Due to its nonlinear nature, we then decompose it into separate subproblems for offloading and compression. An efficient algorithm is proposed to solve the problem. Empirically, we show that our algorithm attains roughly a 0.958-approximation of the optimal solution provided by a block coordinate descent method that solves the two sub-problems back-to-back.
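The block coordinate descent baseline alternates between the two decision blocks, holding one fixed while optimizing the other. The toy cost tables, the weighting `alpha`, and the decision ranges below are all hypothetical; this sketches the alternating structure, not the paper's actual model.

```python
def block_coordinate_descent(latency, accuracy_loss, alpha=0.5, iters=10):
    """Toy alternating optimization over two discrete decision blocks:
    an offloading tier (0=device, 1=edge, 2=cloud) and a compression
    level (0..2). Cost tables and alpha are hypothetical."""
    off, comp = 0, 0

    def cost(o, c):
        # Weighted sum of latency and accuracy loss (to be minimized)
        return alpha * latency[o][c] + (1 - alpha) * accuracy_loss[c]

    for _ in range(iters):
        off = min(range(3), key=lambda o: cost(o, comp))   # fix compression
        comp = min(range(3), key=lambda c: cost(off, c))   # fix offloading
    return off, comp, cost(off, comp)

latency = [[9, 7, 6], [5, 4, 3], [6, 4, 2]]  # latency[tier][compression]
acc_loss = [0, 2, 5]                         # accuracy lost per compression level
off, comp, c = block_coordinate_descent(latency, acc_loss)
```

Each block update can only lower the cost, so the alternation converges to a blockwise optimum, though not necessarily the global one for the original mixed-integer program.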
  5. Convolutional Neural Networks (CNNs) are becoming deeper and deeper. It is challenging to deploy these networks directly to embedded devices because such devices may have different computational capacities. When deploying CNNs, the trade-off between two objectives, accuracy and inference speed, should be considered. NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm with good performance, and the network architecture has a significant influence on both accuracy and inference time. In this paper, we propose a convolutional neural network optimization method that uses a modified NSGA-II algorithm to optimize the network architecture. NSGA-II is employed to generate the Pareto front set for a specific convolutional neural network, which can serve as a guideline for deploying the network on embedded devices. The modified NSGA-II algorithm also speeds up the training process. Experimental results show that the modified NSGA-II algorithm achieves results similar to the original NSGA-II algorithm on our specific task while saving 46.20% of the original training time.
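The sorting step that gives NSGA-II its name ranks candidates into successive Pareto fronts. A simple O(n²)-per-front sketch follows; the example objective vectors (error rate, inference time) are made up, and the original algorithm uses more efficient book-keeping.

```python
def nondominated_sort(points):
    """Rank objective vectors into Pareto fronts (minimization), the
    core sorting step of NSGA-II. Returns lists of indices, front 0 first."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b

    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# (1 - accuracy, inference time in ms) for four candidate architectures
objs = [(0.10, 30.0), (0.08, 45.0), (0.12, 50.0), (0.09, 40.0)]
fronts = nondominated_sort(objs)  # architecture 2 is dominated, so it falls to front 1
```

Front 0 is the estimated Pareto set that would be reported as the deployment guideline: each member is the fastest architecture at its accuracy level.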