NSF PAR Search | NSF Public Access Repository

Code changes are often reviewed before they are deployed. Popular source control systems aid code review by presenting textual differences between old and new versions of the code, leaving developers with the difficult task of determining whether the differences actually produced the desired behavior. Fortunately, we can mine such information from code repositories. We propose aiding code review with inter-version semantic differential analysis. During review of a new commit, a developer is presented with summaries of both code differences and behavioral differences, which are expressed as diffs of likely invariants extracted by running the system's test cases. As a result, developers can more easily determine that the code changes produced the desired effect. We created an invariant-mining tool chain, GETTY, to support our concept of semantically-assisted code review. To validate our approach, 1) we applied GETTY to the commits of 6 popular open source projects, 2) we assessed the performance and cost of running GETTY in different configurations, and 3) we performed a comparative user study with 18 developers. Our results demonstrate that semantically-assisted code review is feasible, effective, and that real programmers can leverage it to improve the quality of their reviews.

Evaluating and improving the reliability of gas-phase sensor system calibrations across new locations for ambient measurements and personal exposure monitoring

https://doi.org/10.5194/amt-12-4211-2019

Vikram, Sharad; Collier-Oxandale, Ashley; Ostertag, Michael H.; Menarini, Massimiliano; Chermak, Camron; Dasgupta, Sanjoy; Rosing, Tajana; Hannigan, Michael; Griswold, William G. (January 2019, Atmospheric Measurement Techniques)

Abstract. Advances in ambient environmental monitoring technologies are enabling concerned communities and citizens to collect data to better understand their local environment and potential exposures. These mobile, low-cost tools make it possible to collect data with increased temporal and spatial resolution, providing data on a large scale with unprecedented levels of detail. This type of data has the potential to empower people to make personal decisions about their exposure and support the development of local strategies for reducing pollution and improving health outcomes. However, calibration of these low-cost instruments has been a challenge. Often, a sensor package is calibrated via field calibration. This involves colocating the sensor package with a high-quality reference instrument for an extended period and then applying machine learning or other model fitting technique such as multiple linear regression to develop a calibration model for converting raw sensor signals to pollutant concentrations. Although this method helps to correct for the effects of ambient conditions (e.g., temperature) and cross sensitivities with nontarget pollutants, there is a growing body of evidence that calibration models can overfit to a given location or set of environmental conditions on account of the incidental correlation between pollutant levels and environmental conditions, including diurnal cycles. As a result, a sensor package trained at a field site may provide less reliable data when moved, or transferred, to a different location. This is a potential concern for applications seeking to perform monitoring away from regulatory monitoring sites, such as personal mobile monitoring or high-resolution monitoring of a neighborhood. We performed experiments confirming that transferability is indeed a problem and show that it can be improved by collecting data from multiple regulatory sites and building a calibration model that leverages data from a more diverse data set. We deployed three sensor packages to each of three sites with reference monitors (nine packages total) and then rotated the sensor packages through the sites over time. Two sites were in San Diego, CA, with a third outside of Bakersfield, CA, offering varying environmental conditions, general air quality composition, and pollutant concentrations. When compared to prior single-site calibration, the multisite approach exhibits better model transferability for a range of modeling approaches. Our experiments also reveal that random forest is especially prone to overfitting and confirm prior results that transfer is a significant source of both bias and standard error. Linear regression, on the other hand, although it exhibits relatively high error, does not degrade much in transfer. Bias dominated in our experiments, suggesting that transferability might be easily increased by detecting and correcting for bias. Also, given that many monitoring applications involve the deployment of many sensor packages based on the same sensing technology, there is an opportunity to leverage the availability of multiple sensors at multiple sites during calibration to lower the cost of training and better tolerate transfer. We contribute a new neural network architecture model termed split-NN that splits the model into two stages, in which the first stage corrects for sensor-to-sensor variation and the second stage uses the combined data of all the sensors to build a model for a single sensor package. The split-NN modeling approach outperforms multiple linear regression, traditional two- and four-layer neural networks, and random forest models. Depending on the training configuration, compared to random forest the split-NN method reduced error 0 %–11 % for NO2 and 6 %–13 % for O3.

Full Text Available

Search for: All records