Data-driven applications rely on the correctness of their data to function properly and effectively. Errors in data can be incredibly costly and disruptive, leading to loss of revenue, incorrect conclusions, and misguided policy decisions. While data cleaning tools can purge datasets of many errors before the data is used, applications and users interacting with the data can introduce new errors. Subsequent valid updates can obscure these errors and propagate them through the dataset, causing more discrepancies. Even when some of these discrepancies are discovered, they are often corrected superficially, on a case-by-case basis, which further obscures the true underlying cause and makes the remaining errors harder to detect.
In this paper, we propose QFix, a framework that derives explanations and repairs for discrepancies in relational data by analyzing the effect of queries that operated on the data and identifying potential mistakes in those queries. QFix is flexible, handling scenarios where only a subset of the true discrepancies is known, and robust to different types of update workloads. We make four important contributions: (a) we formalize the problem of diagnosing the causes of data errors based on the queries that operated on and introduced errors to a dataset; (b) we develop exact methods for deriving diagnoses and fixes for identified errors using state-of-the-art tools; (c) we present several optimization techniques that improve our basic approach without compromising accuracy; and (d) we leverage a tradeoff between accuracy and performance to scale diagnosis to large datasets and query logs, while achieving near-optimal results. We demonstrate the effectiveness of QFix through extensive evaluation over benchmark and synthetic data.
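The problem setting described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Update` query form, the tax-rate table, the candidate thresholds, and the brute-force search are hypothetical and are not the exact methods of QFix. The sketch also assumes the complaint set is complete (every wrong row is reported), which is stricter than the partial-complaint scenarios the abstract mentions.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]


@dataclass(frozen=True)
class Update:
    """Hypothetical query form: UPDATE t SET set_col = set_val WHERE where_col >= threshold."""
    set_col: str
    set_val: int
    where_col: str
    threshold: int


def replay(rows: List[Row], log: List[Update]) -> List[Row]:
    """Apply every query in the log, in order, to a copy of the initial rows."""
    state = [dict(r) for r in rows]
    for q in log:
        for r in state:
            if r[q.where_col] >= q.threshold:
                r[q.set_col] = q.set_val
    return state


def complaints(observed: List[Row], truth: List[Row]) -> List[Tuple[int, Row]]:
    """Rows whose observed value differs from the correct one, paired with the correct row."""
    return [(i, t) for i, (o, t) in enumerate(zip(observed, truth)) if o != t]


def diagnose(rows: List[Row], log: List[Update], observed: List[Row],
             complaint_set: List[Tuple[int, Row]],
             candidates: List[int]) -> Optional[Tuple[int, int]]:
    """Brute-force stand-in for a solver: find one query and one replacement
    threshold such that replaying the log fixes every complaint and leaves
    all other rows as observed (assumes the complaint set is complete)."""
    wanted = [dict(r) for r in observed]
    for i, correct_row in complaint_set:
        wanted[i] = correct_row
    for qi, q in enumerate(log):
        for c in candidates:
            trial = log[:qi] + [replace(q, threshold=c)] + log[qi + 1:]
            if replay(rows, trial) == wanted:
                return qi, c
    return None


# Hypothetical data: the intended threshold was 87500, but 85700 was executed.
rows = [{"income": 9500, "rate": 10},
        {"income": 86000, "rate": 10},
        {"income": 90000, "rate": 10}]
executed_log = [Update("rate", 30, "income", 85700)]
observed = replay(rows, executed_log)
truth = replay(rows, [Update("rate", 30, "income", 87500)])
complaint_set = complaints(observed, truth)
diagnosis = diagnose(rows, executed_log, observed, complaint_set,
                     [85700, 87500, 95000])
print(diagnosis)
```

In this toy case the search points at query 0 and the threshold 87500. Requiring non-complaint rows to stay as observed is what rules out the candidate 95000, which would resolve the complaint but wrongly change the third row.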
Diagnosing snow accumulation errors in a rain-snow transitional environment with snow board observations: Diagnosing Snow Accumulation Errors
- NSF-PAR ID: 10027152
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Hydrological Processes
- Volume: 31
- Issue: 2
- ISSN: 0885-6087
- Page Range / eLocation ID: 349 to 363
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract. Here we present Antarctic snow accumulation variability at the regional scale over the past 1000 years. A total of 79 ice core snow accumulation records were gathered and assigned to seven geographical regions, separating the high-accumulation coastal zones below 2000 m of elevation from the dry central Antarctic Plateau. The regional composites of annual snow accumulation were evaluated against modelled surface mass balance (SMB) from RACMO2.3p2 and precipitation from ERA-Interim reanalysis. With the exception of the Weddell Sea coast, the low-elevation composites capture the regional precipitation and SMB variability as defined by the models. The central Antarctic sites lack coherency and either do not represent regional precipitation or indicate the model's inability to capture relevant precipitation processes in the cold, dry central plateau. Our results show that SMB for the total Antarctic Ice Sheet (including ice shelves) has increased at a rate of 7 ± 0.13 Gt decade⁻¹ since 1800 AD, representing a net reduction in sea level of ∼ 0.02 mm decade⁻¹ since 1800 and ∼ 0.04 mm decade⁻¹ since 1900 AD. The largest contribution is from the Antarctic Peninsula (∼ 75 %), where the annual average SMB during the most recent decade (2001–2010) is 123 ± 44 Gt yr⁻¹ higher than the annual average during the first decade of the 19th century. Only four ice core records cover the full 1000 years, and they suggest a decrease in snow accumulation during this period. However, our study emphasizes the importance of low-elevation coastal zones, which have been under-represented in previous investigations of temporal snow accumulation.
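The link between the quoted SMB rate of 7 Gt per decade and a sea-level change of about 0.02 mm per decade can be checked with a back-of-the-envelope conversion. This is a minimal sketch, not the authors' calculation: the ocean area of 3.618e14 m² and the water density of 1000 kg m⁻³ are assumed round values, and the function name is made up for this example.

```python
OCEAN_AREA_M2 = 3.618e14        # assumed global ocean surface area
WATER_DENSITY_KG_M3 = 1000.0    # assumed density of the added or removed water


def gt_to_sea_level_mm(mass_gt: float) -> float:
    """Convert a water mass in gigatonnes to an equivalent global-mean
    sea-level change in millimetres, spread evenly over the ocean area."""
    volume_m3 = mass_gt * 1e12 / WATER_DENSITY_KG_M3  # 1 Gt = 1e12 kg
    return volume_m3 / OCEAN_AREA_M2 * 1000.0         # metres to millimetres


# Mass gained by the ice sheet is removed from the ocean, hence a reduction.
print(round(gt_to_sea_level_mm(7.0), 2))
```

Under these assumptions roughly 362 Gt corresponds to 1 mm of sea level, so 7 Gt per decade rounds to the 0.02 mm per decade stated in the abstract.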
Abstract. Data from the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) expedition allowed us to investigate the temporal dynamics of snowfall, snow accumulation and erosion in great detail for almost the whole accumulation season (November 2019 to May 2020). We computed cumulative snow water equivalent (SWE) over the sea ice based on snow depth and density retrievals from a SnowMicroPen and approximately weekly measured snow depths along fixed transect paths. We compared the SWE derived from the snow cover with precipitation sensors installed during MOSAiC. The data were also compared with ERA5 reanalysis snowfall rates for the drift track. We found an accumulated snow mass of 38 mm SWE between the end of October 2019 and end of April 2020. The initial SWE over first-year ice relative to second-year ice increased from 50 % to 90 % by the end of the investigation period. Further, we found that the Vaisala Present Weather Detector 22, an optical precipitation sensor installed on a railing on the top deck of research vessel Polarstern, was least affected by blowing snow and showed good agreement with SWE retrievals along the transect. In contrast, the OTT Pluvio2 pluviometer and the OTT Parsivel2 laser disdrometer were largely affected by wind and blowing snow, leading to overestimated precipitation rates. This overestimation is largely reduced when drifting snow periods are excluded from the comparison. ERA5 reveals good timing of the snowfall events and good agreement with ground measurements, with a tendency to overestimate. Retrieved snowfall from the ship-based Ka-band ARM zenith radar shows good agreement with SWE of the snow cover and differences comparable to those of ERA5. Based on the results, we suggest the Ka-band radar-derived snowfall as an upper limit and the present weather detector on RV Polarstern as a lower limit of a cumulative snowfall range. Based on these findings, we suggest a cumulative snowfall of 72 to 107 mm and a precipitation mass loss of the snow cover due to erosion and sublimation of between 47 % and 68 % for the time period between 31 October 2019 and 26 April 2020. Extending this period beyond available snow cover measurements, we suggest a cumulative snowfall of 98–114 mm.
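Deriving SWE from snow depth and density, as mentioned in the abstract above, follows the standard relation SWE = depth × snow density / water density. The sketch below only illustrates that relation; the function name and the sample depth and density are hypothetical and are not MOSAiC measurements.

```python
WATER_DENSITY_KG_M3 = 1000.0  # assumed reference density of liquid water


def swe_mm(depth_m: float, snow_density_kg_m3: float) -> float:
    """Snow water equivalent in millimetres for a snow layer of the given
    depth (metres) and bulk density (kg per cubic metre)."""
    water_depth_m = depth_m * snow_density_kg_m3 / WATER_DENSITY_KG_M3
    return water_depth_m * 1000.0  # metres to millimetres


# Hypothetical layer: 0.20 m of snow at 300 kg per cubic metre.
print(swe_mm(0.20, 300.0))
```

Because 1 kg of water per square metre is 1 mm of water depth, SWE in millimetres equals the snow mass per unit area in kg m⁻².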
Abstract. Over the last century, the increase in snow accumulation has partly mitigated the total dynamic Antarctic Ice Sheet mass loss. However, the mechanisms behind this increase are poorly understood. Here we analyze the Antarctic Ice Sheet atmospheric moisture budget based on climate reanalysis and model simulations to reveal that the interannual variability of regional snow accumulation is controlled by both the large-scale atmospheric circulation and short-lived synoptic-scale events (i.e. storm systems). Yet, when considering the entire continent at the multi-decadal scale, only the synoptic-scale events can explain the recent and expected future snow accumulation increase. In a warmer climate, these synoptic-scale events transport air that can hold more humidity because of the higher temperatures, leading to more precipitation on the continent. Our findings highlight that the multi-decadal and interannual snow accumulation variability is governed by different processes, and that we thus cannot rely directly on the mechanisms driving interannual variations to predict long-term changes in snow accumulation in the future.