We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, under the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be highly unsatisfactory. To address this "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two losses: one minimizing the average ML prediction error for better robustness, and the other minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can even be beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., the tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.
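To make the two-term training objective concrete, here is a minimal sketch in PyTorch, assuming hypothetical names (`cost_fn`, the weight `lam`, and the precomputed calibrated actions) that are not taken from the paper; the exact losses and calibrator used by EC-L2O are not reproduced here.

```python
import torch

def ec_l2o_loss(ml_pred, target, calibrated_actions, cost_fn, lam=0.5):
    """Hedged sketch of an EC-L2O-style objective: a weighted sum of
    (i) the average ML prediction error, for robustness, and
    (ii) the average cost after the differentiable expert calibrator."""
    pred_loss = torch.mean((ml_pred - target) ** 2)      # prediction-error term
    cost_loss = torch.mean(cost_fn(calibrated_actions))  # post-calibration cost term
    return lam * pred_loss + (1.0 - lam) * cost_loss     # lam trades off the two
```

Because the calibrator is differentiable, the second term lets gradients from the downstream cost flow back into the ML optimizer during training.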
Learning-Assisted Algorithm Unrolling for Online Optimization with Budget Constraints
Online optimization with multiple budget constraints is challenging since the online decisions over a short time horizon are coupled together by strict inventory constraints. Existing manually designed algorithms cannot achieve satisfactory average performance in this setting because they often need a large number of time steps to converge and/or may violate the inventory constraints. In this paper, we propose a new machine learning (ML) assisted unrolling approach, called LAAU (Learning-Assisted Algorithm Unrolling), which unrolls the agent's online decision pipeline and leverages an ML model for updating the Lagrangian multiplier online. For efficient training via backpropagation, we derive gradients of the decision pipeline over time. We also provide average cost bounds for two cases: when training data is available offline and when it is collected online. Finally, we present numerical results to highlight that LAAU can outperform the existing baselines.
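A minimal sketch of the unrolling idea follows, with assumed signatures (`decide`, and the state fed to the updater) that are illustrative rather than taken from the paper: a small network maps the current state to a nonnegative Lagrangian multiplier, the decision pipeline is unrolled over the horizon, and the accumulated cost is differentiable end to end.

```python
import torch
import torch.nn as nn

class LambdaUpdater(nn.Module):
    """Hypothetical ML module: maps the current state (context plus
    remaining budget) to an updated, nonnegative Lagrangian multiplier."""
    def __init__(self, state_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep the multiplier >= 0
        )

    def forward(self, state):
        return self.net(state)

def unrolled_cost(updater, contexts, budget, decide):
    """Unroll the decision pipeline over the horizon. `decide(x, lam)` is
    an assumed differentiable subroutine returning (resource usage, cost)."""
    remaining = torch.as_tensor(budget, dtype=torch.float32)
    total_cost = torch.tensor(0.0)
    for x in contexts:  # one iteration per time slot
        lam = updater(torch.cat([x, remaining.reshape(1)]))
        usage, cost = decide(x, lam)
        remaining = torch.clamp(remaining - usage, min=0.0)  # inventory constraint
        total_cost = total_cost + cost
    return total_cost  # backpropagate through the whole unrolled pipeline
```

Training then amounts to minimizing `unrolled_cost` over trajectories, which is how gradients of the decision pipeline over time come into play.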
- Award ID(s): 1910208
- PAR ID: 10466187
- Date Published:
- Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
- Volume: 37
- Issue: 9
- ISSN: 2159-5399
- Page Range / eLocation ID: 10771 to 10779
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Current state-of-the-art systems for hybrid memory management are enriched with machine intelligence. To enable the practical use of machine learning (ML), system-level page schedulers focus the ML model training on a small subset of the applications' memory footprint. At the same time, they use existing lightweight historical information to predict the access behavior of the majority of the pages. To maximize application performance improvements, the pages selected for ML-based management are identified with elaborate page selection methods. These methods involve the calculation of detailed performance estimates depending on the configuration of the hybrid memory platform. This paper explores opportunities to reduce such operational overheads of ML-based hybrid memory page schedulers via visualization techniques that depict memory access patterns and reveal spatial and temporal correlations among the selected pages that current methods fail to leverage. We propose an initial version of a visualization pipeline for prioritizing pages for machine learning that is independent of the hybrid memory configuration. Our approach selects pages whose ML-based management delivers, on average, performance levels within 5% of current solutions, while reducing the page selection time by 75×. We discuss future directions and make a case that visualization and computer vision methods can unlock new insights and reduce the operational complexity of emerging systems solutions.
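To illustrate the kind of visualization such a pipeline could build, the following sketch (my own example, with an assumed input format of `(timestamp, page_id)` access events) rasterizes a page-access trace into a page-by-time heatmap, an image in which spatial and temporal correlations among pages become visible to computer vision methods.

```python
import numpy as np

def access_heatmap(trace, num_pages, num_bins=256):
    """Hedged sketch: turn a list of (timestamp, page_id) access events
    into a normalized page x time heatmap for visual inspection or as
    input to a computer vision model."""
    img = np.zeros((num_pages, num_bins), dtype=np.float32)
    t_max = max(t for t, _ in trace) + 1e-9  # avoid division by zero
    for t, page in trace:
        img[page, min(int(t / t_max * num_bins), num_bins - 1)] += 1.0
    return img / max(img.max(), 1.0)  # scale intensities into [0, 1]
```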
- This paper develops competitive bidding strategies for an online linear optimization problem with inventory management constraints in both cost minimization and profit maximization settings. In the minimization problem, a decision maker must satisfy its time-varying demand by either purchasing units of an asset from the market or producing them from a local inventory with limited capacity. In the maximization problem, a decision maker has a time-varying supply of an asset that may be sold to the market or stored in the inventory to be sold later. In both settings, the market price is unknown in each timeslot and the decision maker can submit a finite number of bids to buy/sell the asset. Once all bids have been submitted, the market price clears and the amount bought/sold is determined based on the clearing price and the submitted bids. From this setup, the decision maker must minimize/maximize its cost/profit in the market while also devising a bidding strategy in the face of an unknown clearing price. We propose DEMBID and SUPBID, two competitive bidding strategies for these online linear optimization problems with inventory management constraints, for the minimization and maximization settings respectively. We then analyze the competitive ratios of the proposed algorithms and show that the performance of our algorithms approaches the best possible competitive ratio as the maximum number of bids increases. As a case study, we use energy data traces from Akamai data centers, renewable outputs from NREL, and energy prices from NYISO to show the effectiveness of our bidding strategies in the context of energy storage management for a large energy customer participating in a real-time electricity market.
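As a rough illustration of multi-bid strategies of this kind, the sketch below places k buy bids at geometrically spaced price levels between assumed price bounds `p_min` and `p_max`, a standard threshold-style device in competitive online trading; the actual DEMBID/SUPBID constructions may differ.

```python
import numpy as np

def bid_ladder(k, p_min, p_max, quantity):
    """Hedged sketch: k buy bids at geometrically spaced price levels,
    each for an equal slice of the desired quantity."""
    prices = p_min * (p_max / p_min) ** (np.arange(1, k + 1) / k)
    return [(p, quantity / k) for p in prices]  # (bid price, bid size) pairs

def cleared_quantity(bids, clearing_price):
    """In a buy auction, every bid at or above the clearing price executes."""
    return sum(q for p, q in bids if p >= clearing_price)
```

Intuitively, denser ladders hedge more finely against the unknown clearing price, which matches the result that performance approaches the best possible competitive ratio as the number of bids grows.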
- Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these applications post-deployment is difficult due to the lack of real-time feedback (i.e., labels) for predictions and silent failures that could occur at any component of the ML pipeline (e.g., data distribution shift or anomalous features). We propose a new type of data management system that offers end-to-end observability, or visibility into complex system behavior, for deployed ML pipelines through assisted (1) detection, (2) diagnosis, and (3) reaction to ML-related bugs. We describe new research challenges and suggest preliminary solution ideas in all three aspects. Finally, we introduce an example architecture for a "bolt-on" ML observability system, or one that wraps around existing tools in the stack.
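The detection step can be illustrated with a small label-free drift monitor; the sketch below (my own example, not the paper's architecture) compares each live feature column against its training-time reference with a two-sample Kolmogorov-Smirnov test.

```python
from scipy.stats import ks_2samp

def detect_feature_drift(reference, live, alpha=0.01):
    """Hedged sketch of label-free detection: flag features whose live
    distribution differs significantly from the training-time reference,
    surfacing silent failures such as data distribution shift."""
    flagged = {}
    for name, ref_col in reference.items():
        stat, p_value = ks_2samp(ref_col, live[name])
        if p_value < alpha:
            flagged[name] = (stat, p_value)  # hand off to diagnosis/reaction
    return flagged
```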
- Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or, at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare the standard approaches to executing this workload today: task parallelism and data parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communication and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas in a system we call Kingpin, built on top of existing ML tools and the flexible massively parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
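To convey the core of the grouped-learning idea, here is a minimal single-process sketch (my own, with assumed per-group PyTorch models and optimizers; the actual GAP scheme parallelizes this across Ray workers): each data shard is read once and routed to its group's model, so I/O is shared across the many models instead of repeated per group.

```python
import torch
import torch.nn.functional as F

def grouped_learning_step(models, optimizers, shard):
    """Hedged sketch: one pass over a shard containing examples from many
    groups. Gradients accumulate per group's model before a single step,
    avoiding a separate scan of the data for every model."""
    for opt in optimizers.values():
        opt.zero_grad()
    for group, (x, y) in shard.items():  # read each group's slice once
        loss = F.mse_loss(models[group](x), y)
        loss.backward()  # accumulates into that group's parameters only
    for opt in optimizers.values():
        opt.step()
```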