This essay describes essential considerations and select methods in computational text analysis for use in the study of history. We explore specific approaches for understanding conceptual change over time in a large corpus of documents. By way of example, using a corpus of 27,977 articles collected on the microbiome, this paper examines: 1) the general microbiome discourse from 2001 to 2010; 2) the usage and sense of the word “human” from 2001 to 2010; and 3) shifts in the microbiome discourse from 2001 to 2010.
Interpolating Population Distributions using Public-Use Data: An Application to Income Segregation using American Community Survey Data
- Award ID(s):
- 1853096
- PAR ID:
- 10483753
- Publisher / Repository:
- Taylor & Francis
- Date Published:
- Journal Name:
- Journal of the American Statistical Association
- Volume:
- 118
- Issue:
- 541
- ISSN:
- 0162-1459
- Page Range / eLocation ID:
- 84 to 96
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Foundation Models using Self-Improving Data Augmentation
Optical multilayer thin film structures are widely used in many photonic applications, including filters, absorbers, photovoltaics, and display devices. Key to enabling these applications is inverse design, which seeks to identify a suitable structure that satisfies the desired optical responses. Recently, the foundation-model-based OptoGPT was proposed and has shown great potential to solve a wide range of inverse design problems. However, OptoGPT fails to design certain types of optical responses that are important to practical applications. The main reason is that the training data are randomly sampled, so these design targets are likely absent from training, which leads to an out-of-distribution issue. In this work, we propose a self-improving data augmentation technique that leverages neural networks’ extrapolation ability. Using this method, we show significant improvement in various application design tasks with minimal fine-tuning. The approach can potentially be generalized to other inverse scientific foundation models.
-
Institute of Industrial and Systems Engineers (Ed.)
-
Çakırtaş, M.; Ozdemir, M.K. (Ed.)
-
Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, in which most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of the model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performance in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task.