Title: A Unified Prediction Framework for Signal Maps: Not All Measurements are Created Equal
Signal maps are essential for the planning and operation of cellular networks. However, the measurements needed to create such maps are expensive, are often biased, do not always reflect the performance metrics of interest, and pose privacy risks. In this paper, we develop a unified framework for predicting cellular performance maps from limited available measurements. Our framework builds on a state-of-the-art random-forest predictor, but can use any other base predictor. We propose and combine three mechanisms that deal with the fact that not all measurements are equally important for a particular prediction task. First, we design quality-of-service functions (Q), including signal strength (RSRP) but also other metrics of interest to operators, such as number of bars, coverage (improving recall by 76%-92%), and call drop probability (reducing error by as much as 32%). By implicitly altering the loss function employed in learning, quality functions can also improve prediction for RSRP itself where it matters (e.g., MSE reduction of up to 27% in the low signal strength regime, where high accuracy is critical). Second, we introduce weight functions (W) to specify the relative importance of prediction at different locations and in other parts of the feature space. We propose re-weighting based on importance sampling to obtain unbiased estimators when the sampling and target distributions are different. This yields improvements of up to 20% for targets based on spatially uniform loss or on losses based on user population density. Third, we apply the Data Shapley framework for the first time in this context, to assign values (ϕ) to individual measurement points that capture the importance of their contribution to the prediction task. This can improve prediction (e.g., from 64% to 94% in recall for coverage loss) by removing points with negative values and storing only the remaining data points (as little as 30% of the data), which has the side benefit of helping privacy.
We evaluate our methods and demonstrate significant improvement in prediction performance, using several real-world datasets.
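The importance-sampling re-weighting mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cell names, probabilities, per-point losses, and function names below are all hypothetical, chosen only to show how weights of the form w(x) = p_target(x) / p_sample(x) correct a loss estimate when measurements are collected non-uniformly.

```python
# Minimal sketch of importance-sampling re-weighting (illustrative only).
# All names and numbers are assumptions, not taken from the paper.

def importance_weights(cells, p_sample, p_target):
    """Weight each measurement by p_target(cell) / p_sample(cell)."""
    return [p_target[c] / p_sample[c] for c in cells]

def weighted_mean_loss(losses, weights):
    """Importance-weighted estimate of the expected loss under the target."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

# Hypothetical setup: two map cells; cell "A" is heavily oversampled.
p_sample = {"A": 0.8, "B": 0.2}   # where measurements were actually collected
p_target = {"A": 0.5, "B": 0.5}   # spatially uniform target distribution

# Ten measurements in the sampling proportions, with a per-point loss.
cells = ["A"] * 8 + ["B"] * 2
losses = [1.0] * 8 + [4.0] * 2    # prediction is worse in the sparse cell

naive = sum(losses) / len(losses)
weights = importance_weights(cells, p_sample, p_target)
weighted = weighted_mean_loss(losses, weights)

print("naive mean loss:", naive)
print("importance-weighted mean loss:", weighted)
```

In this toy setup the unweighted mean understates the error under a spatially uniform target, because the poorly predicted cell is undersampled; the weighted estimate recovers the target-distribution average (0.5 × 1.0 + 0.5 × 4.0). In practice the same weights could be supplied as per-sample weights when training a base predictor, though how the paper does this is not specified in the abstract.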
Award ID(s):
1956393 1900654 1939237 1901488
PAR ID:
10431376
Journal Name:
IEEE Transactions on Mobile Computing
ISSN:
1536-1233
Page Range / eLocation ID:
1 to 18
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Signal strength maps are of great importance to cellular providers for network planning and operation; however, they are expensive to obtain and possibly limited or inaccurate in some locations. In this paper, we develop a prediction framework based on random forests to improve signal strength maps from limited measurements. First, we propose a random forests (RFs)-based predictor, with a rich set of features including location as well as time, cell ID, device hardware, and other features. We show that our RFs-based predictor can significantly improve the tradeoff between prediction error and number of measurements needed compared to state-of-the-art data-driven predictors, i.e., requiring 80% fewer measurements for the same prediction accuracy, or reducing the relative error by 17% for the same number of measurements. Second, we leverage two types of real-world LTE RSRP datasets to evaluate the performance of different prediction methods: (i) a small but dense Campus dataset, collected on a university campus, and (ii) several large but sparser NYC and LA datasets, provided by a mobile data analytics company.
  2. Efficient soil sampling is essential for effective soil management and research on soil health. Traditional site selection methods are labor-intensive and fail to capture soil variability comprehensively. This study introduces a deep learning-based tool that automates soil sampling site selection using spectral images. The proposed framework consists of two key components: an extractor and a predictor. The extractor, based on a convolutional neural network (CNN), derives features from spectral images, while the predictor employs self-attention mechanisms to assess feature importance and generate prediction maps. The model is designed to process multiple spectral images and address the class imbalance in soil segmentation. The model was trained on a soil dataset from 20 fields in eastern South Dakota, collected via drone-mounted LiDAR with high-precision GPS. Evaluation on a test set achieved a mean intersection over union (mIoU) of 69.46% and a mean Dice coefficient (mDc) of 80.35%, demonstrating strong segmentation performance. The results highlight the model's effectiveness in automating soil sampling site selection, providing an advanced tool for producers and soil scientists. Compared to existing state-of-the-art methods, the proposed approach improves accuracy and efficiency, optimizing soil sampling processes and enhancing soil research.
  3. A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswanathan et al., FOCS 2008]. Past research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners, as is the case for learning one-dimensional threshold functions [Bun et al., FOCS 2015, Alon et al., STOC 2019]. We explore prediction as an alternative to learning. A predictor answers a stream of classification queries instead of outputting a hypothesis. Earlier work has considered a private prediction model with a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and in a manner that cannot rely solely on the training set. We introduce private everlasting prediction, taking into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions over infinite domains, for which (traditional) private learning is known to be impossible.
  4. We consider the problem of predicting cellular network performance (signal maps) from measurements collected by several mobile devices. We formulate the problem within the online federated learning framework: (i) federated learning (FL) enables users to collaboratively train a model, while keeping their training data on their devices; (ii) measurements are collected as users move around over time and are used for local training in an online fashion. We consider an honest-but-curious server, who observes the updates from target users participating in FL and infers their location using a deep leakage from gradients (DLG) type of attack, originally developed to reconstruct training data of DNN image classifiers. We make the key observation that a DLG attack, applied to our setting, infers the average location of a batch of local data, and can thus be used to reconstruct the target users' trajectory at a coarse granularity. We build on this observation to protect location privacy, in our setting, by revisiting and designing mechanisms within the federated learning framework including: tuning the FL parameters for averaging, curating local batches so as to mislead the DLG attacker, and aggregating across multiple users with different trajectories. We evaluate the performance of our algorithms through both analysis and simulation based on real-world mobile datasets, and we show that they achieve a good privacy-utility tradeoff. 
  5. In-context learning (ICL), the ability of large language models to perform novel tasks by conditioning on a prompt with a few task examples, requires these examples to be informative about the test instance. The standard approach of independently ranking and selecting the most similar examples selects redundant examples while omitting important information. In this work, we show that BERTScore-Recall (BSR) selects better examples that demonstrate more of the salient aspects, e.g. reasoning patterns, of the test input. We further extend BSR and many standard metrics to easily optimizable set-level metrics, giving still better coverage of those salient aspects. On 15 datasets spanning 6 tasks and with 7 diverse LLMs, we show that (1) BSR is the superior metric for in-context example selection across the board, and (2) for compositional tasks, set selection using Set-BSR outperforms independent ranking by up to 17 points on average and, despite being training-free, surpasses methods that leverage task or LLM-specific training. 