In this paper, we propose Task-Adversarial co-Generative Nets (TAGN) for learning from multiple tasks. TAGN addresses two fundamental issues of multi-task learning, domain shift and limited labeled data, in a principled way. To this end, TAGN first learns task-invariant feature representations to bridge the domain shift among tasks. Based on these task-invariant features, TAGN then generates plausible examples for each task to tackle the data-scarcity issue. TAGN leverages multiple game players that gradually improve the quality of the co-generation of features and examples through an adversarial strategy: it simultaneously learns the marginal distribution of task-invariant features across different tasks and the joint distribution of examples and labels for each task. The theoretical study shows the desired results: at the equilibrium point of the multi-player game, the feature extractor exactly produces the task-invariant features for different tasks, while the generator and the classifier together perfectly replicate the joint distribution for each task. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed approach.
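As a rough illustration of the task-adversarial idea (the generative half of the co-generation is omitted here), the following PyTorch sketch trains a shared feature extractor and a label classifier against a task discriminator. The layer sizes, the random stand-in batches, and the single-discriminator setup are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

n_tasks, feat_dim, n_classes = 3, 32, 2

extractor = nn.Sequential(nn.Linear(10, feat_dim), nn.ReLU())  # shared across tasks
discriminator = nn.Linear(feat_dim, n_tasks)  # guesses which task a feature came from
classifier = nn.Linear(feat_dim, n_classes)   # predicts the label

ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(extractor.parameters()) + list(classifier.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for _ in range(100):
    x = torch.randn(16, 10)                  # stand-in batch (real data would be per task)
    y = torch.randint(0, n_classes, (16,))   # labels
    task = torch.randint(0, n_tasks, (16,))  # which task each example belongs to

    # Player 1: the discriminator learns to identify the source task from features.
    opt_d.zero_grad()
    ce(discriminator(extractor(x).detach()), task).backward()
    opt_d.step()

    # Players 2 and 3: extractor + classifier predict labels while fooling the
    # discriminator, pushing the shared features toward task-invariance.
    opt.zero_grad()
    z = extractor(x)
    (ce(classifier(z), y) - ce(discriminator(z), task)).backward()
    opt.step()
```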
Automated Optimal Online Civil Issue Classification using Multiple Feature Sets
In this paper, the automatic classification of non-emergency civil issues in crowdsourcing systems is addressed in the case where multiple feature sets are available. We recognize that multiple feature sets can contain useful complementary information regarding the type of an issue, leading to a more accurate decision. However, using all features in these sets may delay the decision. Since we are interested in reaching an accurate decision in a timely manner, an optimal way of selecting features from multiple feature sets is needed. To this end, we propose a novel approach that sequentially reviews the available features and feature sets to decide whether the feature review process should continue in the current set or move to the next one. In the end, when all feature sets have been reviewed, the issue is classified using all available information. The proposed approach is shown to be guaranteed to review the fewest features across all feature sets before reaching a decision, while the optimal decision rule is shown to minimize the average Bayes risk. Evaluation on real-world SeeClickFix data demonstrates the ability to classify issues by reviewing 99.5% fewer features than the state of the art without sacrificing accuracy.
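A minimal sketch of the sequential-review idea follows: each feature contributes a log-likelihood-ratio update, and review jumps to the next feature set once the accumulated evidence clears a threshold. The Gaussian feature likelihoods and the fixed threshold are illustrative assumptions, not the paper's optimal stopping and decision rules.

```python
import random

def llr(f, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log-likelihood ratio of class 1 vs class 0 for one Gaussian feature."""
    return ((f - mu0) ** 2 - (f - mu1) ** 2) / (2 * sigma ** 2)

def classify(feature_sets, threshold=3.0):
    evidence, reviewed = 0.0, 0
    for fset in feature_sets:
        for f in fset:
            evidence += llr(f)
            reviewed += 1
            if abs(evidence) >= threshold:  # confident enough: move to the next set
                break
    # all sets reviewed: classify with everything accumulated so far
    return (1 if evidence > 0 else 0), reviewed

random.seed(0)
sets = [[random.gauss(1, 1) for _ in range(50)] for _ in range(3)]  # class-1 data
print(classify(sets))  # decides after a handful of the 150 available features
```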
- Award ID(s): 1737443
- PAR ID: 10196231
- Date Published:
- Journal Name: 2019 53rd Asilomar Conference on Signals, Systems, and Computers
- Page Range / eLocation ID: 1591 to 1595
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Of importance when selecting a voting method is whether, on a regular basis, its outcomes accurately capture the intent of voters. A surprise is that very few procedures do this. Another desired feature is for a decision approach to assist groups in reaching a consensus (Sect. 5). As described, these goals are satisfied only with the Borda count. Addressing these objectives requires understanding what can go wrong, what causes voting difficulties, and how bad they can be. To avoid technicalities, all of this is illustrated with examples accompanied by references for readers seeking a complete analysis. As shown (Sects. 1–3), most problems reflect a loss of vital information. Understanding this feature helps show that the typical description of Arrow’s Theorem, “with three or more alternatives, no voting method is fair,” is not accurate (Sect. 2).
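For concreteness, here is a minimal Borda count in Python (a standard construction, not code from the article): with m candidates, the candidate in position i of a ballot earns m - 1 - i points, so the full ranking information each voter supplies is retained.

```python
from collections import Counter

def borda(ballots):
    m = len(ballots[0])      # number of candidates
    scores = Counter()
    for ranking in ballots:  # each ballot is a complete ranking
        for pos, candidate in enumerate(ranking):
            scores[candidate] += m - 1 - pos
    return scores.most_common()

ballots = [["A", "B", "C"], ["A", "B", "C"], ["B", "C", "A"],
           ["B", "C", "A"], ["C", "B", "A"]]
print(borda(ballots))  # [('B', 7), ('A', 4), ('C', 4)]
```

Note that plurality voting on these five ballots would tie A and B at two first-place votes each, discarding the information that B is ranked above A on three of the five ballots; the Borda scores keep it.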
- Wren, Jonathan (Ed.) Motivation: In the training of predictive models using high-dimensional genomic data, multiple studies’ worth of data are often combined to increase sample size and improve generalizability. A drawback of this approach is that there may be different sets of features measured in each study due to variations in expression measurement platform or technology. It is often common practice to work only with the intersection of features measured in common across all studies, which results in the blind discarding of potentially useful feature information that is measured in individual or subsets of studies. Results: We characterize the loss in predictive performance incurred by using only the intersection of feature information available across all studies when training predictors using gene expression data from microarray and sequencing datasets. We study the properties of linear and polynomial regression for imputing discarded features and demonstrate improvements in the external performance of prediction functions through simulation and in gene expression data collected on breast cancer patients. To improve this process, we propose a pairwise strategy that applies any imputation algorithm to two studies at a time and averages imputed features across pairs. We demonstrate that the pairwise strategy is preferable to first merging all datasets together and imputing any resulting missing features. Finally, we provide insights on which subsets of intersected and study-specific features should be used so that missing-feature imputation best promotes cross-study replicability. Availability and implementation: The code is available at https://github.com/YujieWuu/Pairwise_imputation. Supplementary information: Supplementary information is available at Bioinformatics online.
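A sketch of the pairwise strategy under simple assumptions (synthetic data, plain linear regression standing in for "any imputation algorithm", and a target study paired with each donor study in turn): impute the target's missing features once per pair, then average across pairs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Three donor studies: 20 shared features plus 5 study-specific features each.
shared = [rng.normal(size=(100, 20)) for _ in range(3)]
extra = [s @ rng.normal(size=(20, 5)) + 0.1 * rng.normal(size=(100, 5)) for s in shared]
# The target study measured only the shared features.
target_shared = rng.normal(size=(50, 20))

# One imputation model per (target, donor) pair; average predictions across pairs.
imputed = np.mean(
    [LinearRegression().fit(shared[k], extra[k]).predict(target_shared) for k in range(3)],
    axis=0,
)
print(imputed.shape)  # (50, 5): the averaged pairwise-imputed features
```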
- Civic engagement platforms such as SeeClickFix and FixMyStreet have revolutionized the way citizens interact with local governments to report and resolve urban issues. However, recognizing which urban issues are important to the community in an accurate and timely manner is essential for authorities to prioritize important issues, allocate resources, and maintain citizens' satisfaction with local governments. To this end, a novel formulation based on optimal stopping theory is devised to infer the importance of urban issues from ambiguous textual, time, and location information. The goal is to optimize recognition accuracy while minimizing the time to reach a decision. The optimal classification and stopping rules are derived. Furthermore, a near-real-time urban issue report processing method to infer the importance of incoming issues is proposed. The effectiveness of the proposed method is illustrated on a real-world dataset from SeeClickFix, where a significant reduction in time-to-decision without sacrificing accuracy is observed.
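The accuracy/delay trade-off can be sketched with a simple, myopic stopping rule: update the posterior that an issue is important as reports arrive, and stop once the expected misclassification cost falls below the accumulated delay cost. The prior, report likelihoods, and costs below are illustrative assumptions, not the derived optimal rules.

```python
import random

def decide(reports, p_signal=0.7, prior=0.3, cost_error=10.0, cost_delay=0.05):
    p = prior  # posterior probability the issue is important
    for t, r in enumerate(reports, 1):
        # Bayes update on one binary signal (a report supporting importance or not)
        like1 = p_signal if r else 1 - p_signal
        like0 = (1 - p_signal) if r else p_signal
        p = p * like1 / (p * like1 + (1 - p) * like0)
        # stop once the misclassification risk is cheaper than the delay so far
        if cost_error * min(p, 1 - p) <= cost_delay * t:
            return ("important" if p > 0.5 else "routine"), t
    return ("important" if p > 0.5 else "routine"), len(reports)

random.seed(1)
stream = [random.random() < 0.7 for _ in range(100)]  # signals for a truly important issue
print(decide(stream))  # stops well before all 100 reports arrive
```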
- A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called "pairwise issue expansion", which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a "pairwise pricing equilibrium" in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decisions markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions.
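The expansion step itself is easy to state in code. A minimal sketch (the preference encoding is an assumption, and solving the resulting Fisher market is omitted): one good per (issue, disagreeing pair of agents).

```python
from itertools import combinations

def expand(preferences):
    """preferences[agent][issue] in {0, 1}; returns the expanded Fisher-market goods."""
    n_agents = len(preferences)
    n_issues = len(preferences[0])
    goods = []
    for issue in range(n_issues):
        for a, b in combinations(range(n_agents), 2):
            if preferences[a][issue] != preferences[b][issue]:
                goods.append((issue, a, b))  # one good per pair disagreeing on this issue
    return goods

prefs = [[1, 0], [0, 0], [1, 1]]  # 3 agents, 2 binary issues
print(expand(prefs))  # [(0, 0, 1), (0, 1, 2), (1, 0, 2), (1, 1, 2)]
```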