

Title: Cross-Validation: What Does It Estimate and How Well Does It Do It?
Award ID(s): 1837931
PAR ID: 10509473
Author(s) / Creator(s): ; ;
Publisher / Repository: American Statistical Association
Date Published:
Journal Name: Journal of the American Statistical Association
ISSN: 0162-1459
Page Range / eLocation ID: 1 to 12
Format(s): Medium: X
Sponsoring Org: National Science Foundation
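The listed paper asks what quantity cross-validation actually estimates. As a point of reference, here is a minimal sketch of K-fold cross-validation for squared-error loss in plain Python; the mean-only learner is a hypothetical stand-in for a real model, not anything from the paper:

```python
import random

def k_fold_cv(xs, ys, k, fit, predict):
    """Estimate prediction error by K-fold cross-validation.

    Each fold serves once as the held-out set while the model is
    fit on the remaining points; the returned value is the mean
    squared error over all held-out predictions.
    """
    indices = list(range(len(xs)))
    random.Random(0).shuffle(indices)          # fixed seed: reproducible split
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    total, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            total += (predict(model, xs[i]) - ys[i]) ** 2
            count += 1
    return total / count

# Hypothetical learner: predict the training-set mean, ignoring x.
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda model, x: model

xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_err = k_fold_cv(xs, ys, k=3, fit=fit_mean, predict=predict_mean)
```

Each point is held out exactly once, so the estimate averages over every observation rather than a single train/test split.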
More Like this
  1.
    Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event, with limited attention to the evolution of the code brought to or created during a hackathon. Aim: We aim to understand the evolution of hackathon-related code: specifically, how much hackathon teams rely on pre-existing code and how much new code they develop during a hackathon. Moreover, we aim to understand if and where that code gets reused, and what factors affect reuse. Method: We collected information about 22,183 hackathon projects from DEVPOST (a hackathon database) and obtained the related code (blobs), authors, and project characteristics from the WORLD OF CODE. We determined whether code blobs in hackathon projects were created before, during, or after an event by identifying each blob's original creation date and author, and also checked whether the original author was a hackathon project member. We tracked code reuse by first identifying all commits containing blobs created during an event and then determining all projects that contain those commits. Results: While only approximately 9.14% of the code blobs are created during hackathons, this amount is still significant considering the time and membership constraints of such events. Approximately a third of these code blobs get reused in other projects. The number of associated technologies and the number of participants in a project increase the probability of reuse. Conclusion: Our study demonstrates to what extent pre-existing code is used and new code is created during a hackathon, and how much of it is reused elsewhere afterwards. Our findings help to better understand code reuse as a phenomenon and the role of hackathons in this context, and can serve as a starting point for further studies in this area.
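The before/during/after split described in the Method above could be sketched as follows; the function signature and field names are illustrative assumptions, not the actual WORLD OF CODE schema:

```python
from datetime import date

def classify_blob(created, author, event_start, event_end, team):
    """Classify a code blob relative to a hackathon's time window.

    Mirrors the study's split: blobs created before the event are
    pre-existing; blobs created within the window count as hackathon
    code (flagged when the original author was not a team member);
    later blobs postdate the event.
    """
    if created < event_start:
        return "pre-existing"
    if created <= event_end:
        return "during" if author in team else "during (external author)"
    return "after"

# Hypothetical event: a three-day hackathon with a two-person team.
team = {"alice", "bob"}
start, end = date(2021, 5, 1), date(2021, 5, 3)
labels = [
    classify_blob(date(2021, 4, 20), "alice", start, end, team),
    classify_blob(date(2021, 5, 2), "carol", start, end, team),
    classify_blob(date(2021, 6, 1), "bob", start, end, team),
]
```

Only blobs labeled as created during the event would then be traced forward through commits to measure reuse in other projects.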
    Abstract: There is a long-standing discrepancy between the observed Galactic classical nova rate of ∼10 yr⁻¹ and the rate of ∼30–50 yr⁻¹ predicted from Galactic models. One explanation for this discrepancy is that many novae are hidden by interstellar extinction, but the degree to which dust can obscure novae is poorly constrained. We use newly available all-sky three-dimensional dust maps to compare the brightness and spatial distribution of known novae to those predicted from relatively simple models in which novae trace Galactic stellar mass. We find that only half (53%) of the novae are expected to be easily detectable (g ≲ 15) with current all-sky optical surveys such as the All-Sky Automated Survey for Supernovae (ASAS-SN). This fraction is much lower than previously estimated, showing that dust substantially affects nova detection in the optical. By comparing complementary survey results from the ASAS-SN, OGLE-IV, and Palomar Gattini IR surveys using our modeling, we find a tentative Galactic nova rate of ∼30 yr⁻¹, though this could be as high as ∼40 yr⁻¹ depending on the assumed distribution of novae within the Galaxy. These preliminary estimates will be improved in future work through more sophisticated modeling of nova detection in ASAS-SN and other surveys.
  3. Quantization is often cited as a technique for reducing model size and accelerating deep learning. However, past literature suggests that the effect of quantization on latency varies significantly across different settings, in some cases even increasing inference time rather than reducing it. To address this discrepancy, we conduct a series of systematic experiments on the Chameleon testbed to investigate the impact of three key variables on the effect of post-training quantization: the machine learning framework, the compute hardware, and the model itself. Our experiments demonstrate that each of these has a substantial impact on the overall inference time of a quantized model. Furthermore, we make experiment materials and artifacts publicly available so that others can validate our findings on the same hardware using Chameleon, and we share open educational resources on this topic that may be adopted in formal and informal education settings. 
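As background for the quantization experiments above, the simplest form of post-training quantization (symmetric, per-tensor int8) can be sketched in a few lines of plain Python; this is an illustrative scheme, not the specific method or framework used in the study:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8.

    A single scale factor maps the range [-max|w|, +max|w|] onto
    [-127, 127]; storage drops from 32 to 8 bits per weight.
    """
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]

weights = [0.4, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-off error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real frameworks layer per-channel scales, zero points for asymmetric ranges, and calibration data on top of this, and whether the smaller integer arithmetic actually runs faster depends on kernel support in the framework and hardware, which is exactly the variability the experiments investigate.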