

Title: Weatherman: Exposing weather-based privacy threats in big energy data
Smart energy meters record electricity consumption and generation at fine-grained intervals, and are among the most widely deployed sensors in the world. Energy data embeds detailed information about a building's energy efficiency, as well as the behavior of its occupants, which academia and industry are actively working to extract. In many cases, either inadvertently or by design, these third parties have access only to anonymous energy data without an associated location. The location of energy data is highly useful and highly sensitive information: it can provide important contextual information to improve big data analytics or interpret their results, but it can also enable third parties to link private behavior derived from energy data with a particular location. In this paper, we present Weatherman, which leverages a suite of analytics techniques to localize the source of anonymous energy data. Our key insight is that energy consumption data, as well as wind and solar generation data, largely correlates with weather, e.g., temperature, wind speed, and cloud cover, and that every location on Earth has a distinct weather signature that uniquely identifies it. Weatherman represents a serious privacy threat, but also a potentially useful tool for researchers working with anonymous smart meter data. We evaluate Weatherman's potential in both areas by localizing data from over one hundred smart meters using a weather database that includes data from over 35,000 locations. Our results show that Weatherman localizes coarse (one-hour resolution) energy consumption, wind, and solar data to within 16.68 km, 9.84 km, and 5.12 km, respectively, on average. This is more accurate, despite using much coarser-resolution data, than prior work that localizes only anonymous solar data using solar signatures.
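The key insight above, matching an anonymous energy series against per-location weather series and picking the best match, can be illustrated with a minimal sketch. This is not Weatherman's actual method (the abstract describes it only as a suite of analytics techniques); it assumes hourly series that are already time-aligned, scores each candidate location by absolute Pearson correlation, and uses hypothetical names (`pearson`, `localize`, the site IDs). Absolute correlation is an assumption chosen because solar output tends to fall as cloud cover rises while wind output rises with wind speed; it would not capture U-shaped consumption-temperature relationships caused by heating and cooling.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no weather signature
    return cov / math.sqrt(vx * vy)


def localize(
    energy: Sequence[float], weather_by_site: Dict[str, Sequence[float]]
) -> Tuple[Optional[str], float]:
    """Return (site_id, score) of the candidate whose weather best tracks the energy data.

    Hypothetical scoring rule: absolute Pearson correlation, so both positive
    (wind vs. wind speed) and negative (solar vs. cloud cover) links count.
    """
    best_site: Optional[str] = None
    best_score = -math.inf
    for site, weather in weather_by_site.items():
        score = abs(pearson(energy, weather))
        if score > best_score:
            best_site, best_score = site, score
    return best_site, best_score


if __name__ == "__main__":
    # Hypothetical hourly solar output and cloud-cover (%) series at three sites.
    solar = [0.0, 5.0, 8.0, 2.0, 0.0, 6.0]
    sites = {
        "site_a": [20.0, 80.0, 10.0, 50.0, 60.0, 30.0],
        "site_b": [100.0 - 10.0 * s for s in solar],  # cloud cover mirrors output
        "site_c": [1.0] * 6,
    }
    print(localize(solar, sites)[0])  # prints "site_b"
```

A real localizer would search tens of thousands of candidate locations and combine several weather variables; this sketch keeps a single variable per site so the matching step stays visible.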
Award ID(s):
1505422 1405826 1645952 1253063 1534080
NSF-PAR ID:
10062543
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE International Conference on Big Data
Page Range / eLocation ID:
1079 to 1086
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Solar energy capacity is continuing to increase. The key challenge with integrating solar into buildings and the electric grid is its high power generation variability, which is a function of many factors, including a site's location, time, weather, and numerous physical attributes. There has been significant prior work on solar performance modeling and forecasting that infers a site's current and future solar generation based on these factors. Accurate solar performance models and forecasts are also a prerequisite for conducting a wide range of building and grid energy-efficiency research. Unfortunately, much of the prior work is not accessible to researchers, either because it has not been released as open source, is time-consuming to re-implement, or requires access to proprietary data sources. To address the problem, we present Solar-TK, a data-driven toolkit for solar performance modeling and forecasting that is simple, extensible, and publicly accessible. Solar-TK's simple approach models and forecasts a site's solar output given only its location and a small amount of historical generation data. Solar-TK's extensible design includes a small collection of independent modules that connect together to implement basic modeling and forecasting, while also enabling users to implement new energy analytics. We plan to release Solar-TK as open source to enable research that requires realistic solar models and forecasts, and to serve as a baseline for comparing new solar modeling and forecasting techniques. We compare Solar-TK's simple approach with PVlib and show that it yields comparable accuracy. We present three case studies showing how Solar-TK can advance energy-efficiency research. 
  2. Mobile tracking has long been a privacy problem, where the geographic data and timestamps gathered by mobile network operators (MNOs) are used to track the locations and movements of mobile subscribers. Additionally, selling the geolocation information of subscribers has become a lucrative business. Many mobile carriers have violated user privacy agreements by selling users’ location history to third parties without user consent, exacerbating privacy issues related to mobile tracking and profiling. This paper presents AAKA, an anonymous authentication and key agreement scheme designed to protect against mobile tracking by honest-but-curious MNOs. AAKA leverages anonymous credentials and introduces a novel mobile authentication protocol that allows legitimate subscribers to access the network anonymously, without revealing their unique (real) IDs. It ensures the integrity of user credentials, preventing forgery, and ensures that connections made by the same user at different times cannot be linked. While the MNO alone cannot identify or profile a user, AAKA enables identification of a user under legal intervention, such as when the MNOs collaborate with an authorized law enforcement agency. Our design is compatible with the latest cellular architecture and SIM standardized by 3GPP, meeting 3GPP’s fundamental security requirements for User Equipment (UE) authentication and key agreement processes. A comprehensive security analysis demonstrates the scheme’s effectiveness. The evaluation shows that the scheme is practical, with credential presentation generation taking ~52 ms on a constrained host device equipped with a standard cellular SIM. 
    Accurate solar performance models, which infer solar power output in real time based on current environmental conditions, are an important prerequisite for many advanced energy analytics. Recent work has developed sophisticated data-driven techniques that generate customized models for complex rooftop solar sites by combining well-known physical models with both system and public weather station data. However, inferring solar generation from public weather station data has two drawbacks: not all solar sites are near a public weather station, and public weather data generally quantifies cloud cover (the most significant weather metric that affects solar) using highly coarse and imprecise measurements. In this paper, we develop and evaluate solar performance models that use satellite-based estimates of downward shortwave (solar) radiation (DSR) at the Earth's surface, which NOAA began publicly releasing after the launch of the GOES-R geostationary satellites in 2017. Unlike public weather data, DSR estimates are available for every 0.5 km² area. As we show, the accuracy of solar performance modeling using satellite data and public weather station data depends on the cloud conditions, with DSR-based modeling being more accurate under clear skies and station-based modeling being more accurate under overcast skies. Surprisingly, our results show that, overall, pure satellite-based modeling yields accuracy similar to pure station-based modeling, although the relationship is a function of conditions and the local climate. We also show that a hybrid approach that combines the best of both can modestly improve accuracy. 
    Spurious power consumption data reported from compromised meters controlled by organized adversaries in the Advanced Metering Infrastructure (AMI) may have drastic consequences for a smart grid’s operations. While existing research on data falsification in smart grids mostly defends against isolated electricity theft, we introduce a taxonomy of various data falsification attack types that arise when smart meters are compromised by organized or strategic rivals. To counter these attacks, we first propose coarse-grained and fine-grained anomaly-based security event detection techniques that use indicators such as deviation and directional change in the time series of the proposed anomaly detection metrics to indicate (i) the occurrence of an attack, (ii) the type of attack, and (iii) the attack strategy used, collectively known as the attack context. Leveraging the attack context information, we propose three attack response metrics tailored to the inferred attack context: (a) an unbiased mean indicating a robust location parameter; (b) a median absolute deviation indicating a robust scale parameter; and (c) an attack probability time ratio metric indicating the active time horizon of attacks. Subsequently, we propose a trust scoring model based on Kullback-Leibler (KL) divergence that embeds the appropriate unbiased mean, the median absolute deviation, and the attack probability ratio metric at runtime to produce trust scores for each smart meter. These trust scores help distinguish compromised smart meters from non-compromised ones. Embedding the attack context into the trust scoring model enables accurate and rapid classification of compromised meters, even under large fractions of compromised meters, and generalizes across various attack strategies and margins of false data. 
Experimental results on real datasets collected from two different AMI micro-grids show that our proposed framework has a high true positive detection rate, while the average false alarm and missed detection rates are well below 10% for most attack combinations. Finally, we also establish fundamental theoretical limits of the proposed method, which will help assess its applicability to other domains. 
  5. The growing integration of distributed energy resources (DERs) in distribution grids raises various reliability issues due to DERs' uncertain and complex behaviors. With large-scale DER penetration in distribution grids, traditional outage detection methods, which rely on customer reports and smart meters' “last gasp” signals, will perform poorly, because renewable generators, storage, and the mesh structure of urban distribution grids can continue supplying power after line outages. To address these challenges, we propose a data-driven outage monitoring approach based on stochastic time series analysis with a theoretical guarantee. Specifically, we prove via power flow analysis that the dependency of time-series voltage measurements exhibits significant statistical changes after line outages. This makes the theory of optimal change-point detection suitable for identifying line outages. However, existing change-point detection methods require the post-outage voltage distribution, which is unknown in distribution systems. Therefore, we design a maximum likelihood estimator to learn the distribution parameters directly from voltage data. We prove that detection based on the estimated parameters also achieves optimal performance, making it extremely useful for fast identification of distribution grid outages. Furthermore, since smart meters have been widely installed in distribution grids while advanced infrastructure (e.g., PMUs) is not yet widely available, our approach requires only voltage magnitudes for quick outage identification. Simulation results show highly accurate outage identification in eight distribution grids with 17 configurations, with and without DERs, using smart meter data. 
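For the trust-scoring model in item 4, the sketch below shows one way robust statistics and KL divergence could combine into a per-meter score. It is a simplified illustration, not the authors' model: it assumes Gaussian readings, uses the fleet median and the scaled median absolute deviation as the reference distribution (standing in for the abstract's unbiased mean), omits the attack probability time ratio entirely, and maps divergence to a score with the hypothetical rule 1 / (1 + KL). All function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

MAD_TO_SIGMA = 1.4826  # scales MAD to a standard-deviation estimate for Gaussian data
EPS = 1e-9  # floor on scale estimates so the KL formula never divides by zero


def robust_location_scale(values: Sequence[float]) -> Tuple[float, float]:
    """Median (robust location) and scaled median absolute deviation (robust scale)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, max(MAD_TO_SIGMA * mad, EPS)


def gaussian_kl(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ) in nats; s1 and s2 must be positive."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5


def trust_score(meter: Sequence[float], fleet: Sequence[float]) -> float:
    """Hypothetical trust in (0, 1]: lower when a meter's readings diverge from the fleet."""
    mu_ref, s_ref = robust_location_scale(fleet)  # robust to a few falsified meters
    mu_m = statistics.fmean(meter)
    s_m = max(statistics.stdev(meter), EPS)  # needs at least two readings
    kl = gaussian_kl(mu_m, s_m, mu_ref, s_ref)  # KL >= 0, so the score is <= 1
    return 1.0 / (1.0 + kl)
```

The median and MAD are used for the reference because, unlike the mean and standard deviation, they barely move when a minority of meters report inflated values, so falsified data does not drag the baseline toward itself.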
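For the outage-detection approach in item 5, the sketch below shows change-point detection with MLE-estimated parameters in its simplest form. It substitutes a Gaussian mean-shift model for the paper's voltage-dependency model: the pre-change mean and variance are MLE estimates from an assumed outage-free training window, the unknown post-change mean is estimated by MLE for each candidate change point, and the candidate with the largest generalized log-likelihood ratio is reported if it exceeds a threshold. The function names and the threshold value are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gaussian_mle(xs: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (mean, variance) of a Gaussian sample."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance divides by n, not n - 1
    return mu, var


def detect_mean_shift(
    series: Sequence[float], n_train: int, threshold: float
) -> Optional[Tuple[int, float]]:
    """Return (change_index, glr) of the most likely mean shift, or None if below threshold.

    series[:n_train] is assumed to be pre-change (no outage) data.
    """
    if not 1 <= n_train < len(series):
        raise ValueError("need 1 <= n_train < len(series)")
    mu0, var0 = gaussian_mle(series[:n_train])
    var0 = max(var0, 1e-12)  # guard against a perfectly flat training window
    best: Optional[Tuple[int, float]] = None
    for k in range(n_train, len(series)):
        tail = series[k:]
        mu1 = sum(tail) / len(tail)  # MLE of the unknown post-change mean
        # Log-likelihood ratio of N(mu1, var0) vs. N(mu0, var0) over the tail;
        # with mu1 equal to the tail mean it reduces to len(tail) * (mu1 - mu0)^2 / (2 * var0).
        glr = len(tail) * (mu1 - mu0) ** 2 / (2.0 * var0)
        if best is None or glr > best[1]:
            best = (k, glr)
    if best is None or best[1] < threshold:
        return None
    return best
```

For example, a voltage-magnitude series that hovers around 1.00 p.u. for ten samples and then drops to about 0.95 p.u. would be flagged at index 10 with `n_train=6` and `threshold=20.0`. The loop is quadratic in series length; prefix sums would make it linear, but the direct form keeps the likelihood-ratio reasoning easy to follow.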