

Title: A Collaborative Learning Based Approach for Parameter Configuration of Cellular Networks
Cellular network performance depends heavily on the configuration of its network parameters. Current practice of parameter configuration relies largely on expert experience, which is often suboptimal, time-consuming, and error-prone. It is therefore desirable to automate this process via learning-based approaches to improve both accuracy and efficiency. However, such approaches must address several challenges in real operational networks: the lack of diverse historical data, a limited experiment budget set by network operators, and highly complex and unknown network performance functions. To address those challenges, we propose a collaborative learning approach that leverages data from different cells to boost learning efficiency and improve network performance. Specifically, we formulate the problem as a transferable contextual bandit problem, and prove that transfer learning can significantly reduce the regret bound. Based on this theoretical result, we further develop a practical algorithm that decomposes a cell's policy into a common homogeneous policy learned using all cells' data and a cell-specific policy that captures each individual cell's heterogeneous behavior. We evaluate our proposed algorithm via a simulator constructed using real network data and demonstrate faster convergence compared to baselines. More importantly, a live field test was also conducted on a real metropolitan cellular network consisting of 1,700+ cells, optimizing five parameters over two weeks. Our proposed algorithm shows a significant performance improvement of 20%.
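As a rough illustration of the policy decomposition described above (not the paper's actual algorithm), the sketch below models each cell's reward as a common linear component plus a small cell-specific deviation, fits the common component by pooling all cells' data, and then fits each cell's residuals to recover its specific correction. All dimensions, weights, noise levels, and the ridge-regression choice are made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): each cell's expected reward is a
# shared linear term plus a small cell-specific deviation (heterogeneity).
n_cells, dim = 5, 3
w_common = rng.normal(size=dim)
w_cell = 0.1 * rng.normal(size=(n_cells, dim))

def reward(cell, x):
    return x @ (w_common + w_cell[cell]) + 0.01 * rng.normal()

# Pool all cells' data to fit the common policy (ridge regression), then fit
# each cell's residuals to obtain its cell-specific correction.
X = rng.normal(size=(400, dim))
cells = rng.integers(0, n_cells, size=400)
y = np.array([reward(c, x) for c, x in zip(cells, X)])

lam = 1.0
w_hat_common = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)

w_hat_cell = np.zeros((n_cells, dim))
for c in range(n_cells):
    Xc = X[cells == c]
    res = y[cells == c] - Xc @ w_hat_common
    w_hat_cell[c] = np.linalg.solve(Xc.T @ Xc + lam * np.eye(dim), Xc.T @ res)

# Transferred estimate for cell 0: common part plus its own correction.
err = np.linalg.norm((w_hat_common + w_hat_cell[0]) - (w_common + w_cell[0]))
print(err < 0.5)
```

The intuition mirrors the abstract: the pooled fit converges quickly because it uses every cell's samples, while the per-cell correction only needs to learn a small residual.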
Award ID(s):
1718901
NSF-PAR ID:
10097232
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Infocom
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The COVID-19 pandemic has posed grand challenges to policy makers, raising major social conflicts between public health and economic resilience. Policies such as closure or reopen of businesses are made based on scientific projections of infection risks obtained from infection dynamics models. While most parameters in infection dynamics models can be set using domain knowledge of COVID-19, a key parameter - human mobility - is often challenging to estimate due to complex social contexts and limited training data under escalating COVID-19 conditions. To address these challenges, we formulate the problem as a spatio-temporal data generation problem and propose COVID-GAN, a spatio-temporal Conditional Generative Adversarial Network, to estimate mobility (e.g., changes in POI visits) under various real-world conditions (e.g., COVID-19 severity, local policy interventions) integrated from multiple data sources. We also introduce a domain-constraint correction layer in the generator of COVID-GAN to reduce the difficulty of learning. Experiments using urban mobility data derived from cell phone records and census data show that COVID-GAN can well approximate real-world human mobility responses, and that the proposed domain-constraint based correction can greatly improve solution quality. 
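The domain-constraint correction layer mentioned in the abstract above can be thought of as a projection of the generator's raw output onto a feasible region. The sketch below is a minimal stand-alone version, assuming hypothetical constraints (visit changes cannot push counts below zero, and a policy may cap them); it is an illustration of the idea, not COVID-GAN's actual layer.

```python
import numpy as np

def correction_layer(raw, lower=0.0, upper=None):
    """Project raw generator output onto a feasible region implied by
    domain knowledge (hypothetical constraints: POI visit counts cannot be
    negative, and a lockdown-style policy may impose an upper cap)."""
    out = np.maximum(raw, lower)
    if upper is not None:
        out = np.minimum(out, upper)
    return out

raw = np.array([-3.0, 12.0, 150.0])       # raw generator output for 3 POIs
corrected = correction_layer(raw, upper=100.0)
print(corrected)                          # constraint-satisfying output
```

Restricting the output space this way shrinks what the generator has to learn, which is the stated motivation for the layer.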
  2. A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key idea is that by obtaining guidance from - not imitating - the offline data, LOGO orients its policy toward the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, and provide a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state. Further, we demonstrate the value of our approach via implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
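The "guidance from - not imitating -" idea above can be sketched on a toy one-step task: the policy follows a standard policy-gradient update from (noisy) online reward, plus a guidance term that pulls it toward the demonstrator's action distribution with a coefficient that decays to zero. Everything here (rewards, demonstrator, learning rate, decay schedule) is an invented toy, not LOGO's actual update rule.

```python
import numpy as np

rng = np.random.default_rng(1)
true_reward = np.array([0.2, 0.5, 1.0])   # action 2 is optimal
demo = np.array([0.1, 0.7, 0.2])          # sub-optimal demonstrator favors action 1

logits = np.zeros(3)
for t in range(500):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    a = rng.choice(3, p=p)
    r = true_reward[a] + 0.1 * rng.normal()   # noisy online reward
    grad = -p * r                             # REINFORCE: r * (e_a - p)
    grad[a] += r
    guidance = demo - p                       # pull toward the demo policy
    delta = max(0.0, 1.0 - t / 200)           # guidance coefficient decays to 0
    logits += 0.2 * (grad + delta * guidance)
```

Early on the decaying guidance steers exploration toward the demonstrator's behavior; once it vanishes, the pure policy-gradient term lets the policy surpass the demonstrator.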
  3. Cellular network configuration is critical for network performance. Current practice is labor-intensive, error-prone, and far from optimal. To automate efficient cellular network configuration, in this work, we propose an online-learning-based joint-optimization approach that addresses a few specific challenges: limited data availability, convoluted sample data, highly complex optimization due to interactions among neighboring cells, and the need to adapt to network dynamics. In our approach, to learn an appropriate utility function for a cell, we develop a neural-network-based model that addresses the convoluted sample data issue and achieves good accuracy based on data aggregation. Based on the utility function learned, we formulate a global network configuration optimization problem. To solve this high-dimensional nonconcave maximization problem, we design a Gibbs-sampling-based algorithm that converges to an optimal solution when a technical parameter is small enough. Furthermore, we design an online scheme that updates the learned utility function and solves the corresponding maximization problem efficiently to adapt to network dynamics. To illustrate the idea, we use the case study of pilot power configuration. Numerical results illustrate the effectiveness of the proposed approach.
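A Gibbs-sampling-based maximizer of the kind described above resamples one cell's configuration at a time from a distribution proportional to exp(utility / T); as the temperature-like "technical parameter" T shrinks, the sampler concentrates on the global maximizer. The sketch below uses a made-up two-cell utility with an interference-style coupling term, not the paper's learned utility function.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy joint utility for two neighboring cells with three pilot-power levels
# each: cell 0 prefers high power, cell 1 low, and both at max power is
# penalized (interference coupling). Values are invented for illustration.
levels = [0, 1, 2]
def utility(cfg):
    return cfg[0] - cfg[1] - 2.0 * (cfg[0] == 2 and cfg[1] == 2)

T = 0.05                # the "technical parameter": small T sharpens sampling
cfg = [0, 0]
for _ in range(1000):
    for i in range(2):  # one Gibbs sweep: resample each cell's level in turn
        scores = np.array([utility(cfg[:i] + [v] + cfg[i+1:]) for v in levels])
        p = np.exp((scores - scores.max()) / T)
        p /= p.sum()
        cfg[i] = int(rng.choice(levels, p=p))

print(cfg)              # concentrates on the maximizer of utility
```

With T = 0.05 the conditional distributions are nearly deterministic, so the chain settles on the utility-maximizing configuration almost immediately; larger T trades optimality for exploration.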
  4. In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^(1-α/2) + √(γ_T T)), where α ∈ (0,1) is the user-sampling parameter one can tune. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
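The bias-reduction role of user sampling described above can be seen in a tiny numeric experiment: when each user perceives the reward with a personal offset, the aggregate estimate from a small sample of users is far from the population average, and sampling more users per phase shrinks that error. The population model and sample sizes below are arbitrary assumptions, not DPBE's analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population: each user perceives an action's reward with a
# personal offset, so feedback from a small user sample is biased relative
# to the population average the learner actually cares about.
offsets = rng.normal(0.0, 1.0, size=100_000)
true_reward = 0.7

def sampled_estimate(n_users):
    idx = rng.integers(0, offsets.size, size=n_users)
    return true_reward + offsets[idx].mean()

# Averaging over more sampled users per phase shrinks the error, which is
# what lets a phase-then-batch scheme safely eliminate suboptimal actions.
err_small = np.mean([abs(sampled_estimate(10) - true_reward) for _ in range(50)])
err_large = np.mean([abs(sampled_estimate(5000) - true_reward) for _ in range(50)])
print(err_small > err_large)
```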