- Journal Name: Advances in neural information processing systems
- Sponsoring Org: National Science Foundation
More Like this
We present an algorithm, STRSAGA, for efficiently maintaining a machine learning model over data points that arrive over time, quickly updating the model as new training data is observed. We present a competitive analysis comparing the suboptimality of the model maintained by STRSAGA with that of an offline algorithm that is given the entire data beforehand, and analyze the risk-competitiveness of STRSAGA under different arrival patterns. Our theoretical and experimental results show that the risk of STRSAGA is comparable to that of offline algorithms on a variety of input arrival patterns, and its experimental performance is significantly better than prior algorithms suited for streaming data, such as SGD and SSVRG.
Efficient allocation of tasks to workers is a central problem in crowdsourcing. In this paper, we consider a special setting inspired by spatial crowdsourcing platforms where both workers and tasks arrive dynamically. Additionally, we assume all tasks are heterogeneous and each worker-task assignment brings a known reward. The natural challenge lies in how to incorporate the uncertainty in the arrivals from both workers and tasks into our online allocation policy such that the total expected reward is maximized. To formulate this, we assume the arrival patterns of worker “types” and task “types” can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step, a single worker and task are sampled (i.e., “arrive”) from two respective distributions independently, and that this sampling process repeats identically and independently for the entire T online time-steps. Our model, called Online Task Assignment with Two-Sided Arrival (OTA-TSA), is a significant generalization of the classical online task assignment where the set of tasks is assumed to be available offline. For the general version of OTA-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OTA-TSA where the …
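The two-sided arrival process described in this abstract (one worker type and one task type drawn independently per time-step, repeated i.i.d. over T steps) can be illustrated with a minimal sketch. The function name `sample_arrivals`, the dictionary representation of the two distributions, and the example type names are assumptions made for illustration only; this simulates the arrival model, not the paper's allocation algorithm.

```python
import random


def sample_arrivals(worker_probs, task_probs, horizon, seed=0):
    """Simulate `horizon` time-steps of two-sided arrivals.

    worker_probs / task_probs map a type name to its arrival probability
    (hypothetical representation). At each step one worker type and one
    task type are drawn independently from their own distributions.
    """
    rng = random.Random(seed)
    worker_types = list(worker_probs)
    task_types = list(task_probs)
    worker_weights = [worker_probs[t] for t in worker_types]
    task_weights = [task_probs[t] for t in task_types]

    arrivals = []
    for _ in range(horizon):
        worker = rng.choices(worker_types, weights=worker_weights)[0]
        task = rng.choices(task_types, weights=task_weights)[0]
        arrivals.append((worker, task))
    return arrivals


if __name__ == "__main__":
    # Illustrative distributions; the type names and probabilities are made up.
    workers = {"w_near": 0.7, "w_far": 0.3}
    tasks = {"t_small": 0.5, "t_large": 0.5}
    for step, (w, t) in enumerate(sample_arrivals(workers, tasks, horizon=5), 1):
        print(step, w, t)
```

An online policy would consume these pairs one step at a time, deciding on assignments without seeing future arrivals.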
Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type “conflict” with each other in some sense due to the diversity requirement for each online buyer. The conflicts inherent among certain offline agents raise significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 16). Our model can capture …
With the popularity of the Internet, traditional offline resource allocation has evolved into a new form, called online resource allocation. It features the online arrivals of agents in the system and the real-time decision-making requirement upon the arrival of each online agent. Both offline and online resource allocation have wide applications in various real-world matching markets ranging from ridesharing to crowdsourcing. There are some emerging applications such as rebalancing in bike sharing and trip-vehicle dispatching in ridesharing, which involve a two-stage resource allocation process. The process consists of an offline phase and another sequential online phase, and both phases compete for the same set of resources. In this paper, we propose a unified model which incorporates both offline and online resource allocation into a single framework. Our model assumes non-uniform and known arrival distributions for online agents in the second online phase, which can be learned from historical data. We propose a parameterized linear programming (LP)-based algorithm, which is shown to be at most a constant factor of 1/4 from the optimal. Experimental results on the real dataset show that our LP-based approaches outperform the LP-agnostic heuristics in terms of robustness and effectiveness.
The University of Washington Ice–Liquid Discriminator (UWILD) improves single-particle phase classifications of hydrometeors within Southern Ocean clouds using machine learning
Abstract. Mixed-phase Southern Ocean clouds are challenging to simulate, and their representation in climate models is an important control on climate sensitivity. In particular, the amount of supercooled water and frozen mass that they contain in the present climate is a predictor of their planetary feedback in a warming climate. The recent Southern Ocean Clouds, Radiation, Aerosol Transport Experimental Study (SOCRATES) vastly increased the amount of in situ data available from mixed-phase Southern Ocean clouds useful for model evaluation. Bulk measurements distinguishing liquid and ice water content are not available from SOCRATES, so single-particle phase classifications from the Two-Dimensional Stereo (2D-S) probe are invaluable for quantifying mixed-phase cloud properties. Motivated by the presence of large biases in existing phase discrimination algorithms, we develop a novel technique for single-particle phase classification of binary 2D-S images using a random forest algorithm, which we refer to as the University of Washington Ice–Liquid Discriminator (UWILD). UWILD uses 14 parameters computed from binary image data, as well as particle inter-arrival time, to predict phase. We use liquid-only and ice-dominated time periods within the SOCRATES dataset as training and testing data. This novel approach to model training avoids major pitfalls associated with using manually labeled data, including reduced model generalizability and high labor costs. We find that UWILD is well calibrated and has an overall accuracy …