Abstract Text classification is a widely studied problem with broad applications. In many real-world problems, the number of texts available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on input texts, without using any human-provided labels. Training a model with an SSL task can prevent it from overfitting to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
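A minimal sketch of how a classification loss and a self-supervised loss might be combined into one training objective. The abstract only says the two tasks are performed simultaneously; the weighted sum, the weight `lam`, and the choice of masked-token prediction as the SSL task are assumptions made for illustration, not details confirmed by the source.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(class_logits, class_label, token_logits, masked_token_id, lam=0.1):
    """Supervised loss plus a weighted self-supervised loss.

    `lam` is a hypothetical trade-off weight; the SSL term here assumes a
    masked-token prediction task defined on the input text alone.
    """
    supervised = cross_entropy(class_logits, class_label)
    self_supervised = cross_entropy(token_logits, masked_token_id)
    return supervised + lam * self_supervised


# Example: 2 classes, a 4-token vocabulary, uniform (all-zero) logits.
loss = ssl_reg_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], 1, lam=0.5)
print(round(loss, 4))
```

In this form the SSL term acts as a regularizer: it depends only on the input text, so it keeps shaping the shared representation even when few class labels are available.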
                    
                            
                            Using Machine Learning to Aid in Data Classification: Classifying Occupation Compatibility with Highly Automated Vehicles
                        
                    
    
            Data classification is central to human factors research, and manual data classification is tedious and error prone. Supervised learning enables analysts to train an algorithm by manually classifying a few cases and then have that algorithm classify many cases. However, algorithms often fail to leverage human insight. To address this, we augment supervised learning with unsupervised learning and data visualization. Unsupervised learning highlights potential classification errors, explains the underlying classification, and identifies additional cases that merit manual classification. We illustrate this using the Occupational Information Network database to classify occupations as having tasks that might be performed in an automated vehicle. 
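One way unsupervised structure can highlight potential classification errors is sketched below, under stated assumptions: cluster assignments are taken as given (from any clustering method), and a manually labeled case is flagged when its label disagrees with the majority label in its cluster. The function name and the majority-vote rule are hypothetical illustrations, not the procedure used in the article.

```python
from collections import Counter, defaultdict


def flag_suspect_labels(cluster_ids, labels):
    """Return indices of labeled cases that disagree with their cluster's majority label.

    `cluster_ids[i]` is the cluster of case i; `labels[i]` is its manual
    label, or None if the case has not been classified yet.
    """
    by_cluster = defaultdict(list)
    for idx, (cluster, label) in enumerate(zip(cluster_ids, labels)):
        if label is not None:  # unlabeled cases cannot be wrong
            by_cluster[cluster].append((idx, label))

    suspects = []
    for members in by_cluster.values():
        majority, _ = Counter(label for _, label in members).most_common(1)[0]
        suspects.extend(idx for idx, label in members if label != majority)
    return sorted(suspects)


# Case 2 sits in a cluster whose other labeled members say "yes".
clusters = [0, 0, 0, 1, 1, 1]
manual = ["yes", "yes", "no", "no", "no", None]
print(flag_suspect_labels(clusters, manual))  # [2]
```

Flagged cases are candidates for a second look by the analyst, and unlabeled cases in mixed clusters are natural candidates for additional manual classification.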
        
    
- Award ID(s): 1839484
- PAR ID: 10546753
- Publisher / Repository: SAGE Publications
- Date Published:
- Journal Name: Ergonomics in Design: The Quarterly of Human Factors Applications
- Volume: 29
- Issue: 2
- ISSN: 1064-8046
- Format(s): Medium: X Size: p. 4-12
- Size(s): p. 4-12
- Sponsoring Org: National Science Foundation
More Like this
- 
            In Activities of Daily Living (ADL) research, which has gained prominence due to the burgeoning aging population, the challenge of acquiring sufficient ground truth data for model training is a significant bottleneck. This obstacle necessitates a pivot towards unsupervised representation learning methodologies, which do not require many labeled datasets. Existing research examined the tradeoff between a fully supervised model and an unsupervised pre-trained model and found that the unsupervised version performed better in most cases. However, that investigation did not use sufficiently large Human Activity Recognition (HAR) datasets; both datasets it used had only 3 dimensions. This poster extends the investigation by employing a large multivariate time series HAR dataset and experimenting with different combinations of critical training parameters, such as batch size and learning rate, to observe the performance tradeoff. Our findings reveal that the pre-trained model is comparable to fully supervised classification on a larger multivariate time series HAR dataset. This discovery underscores the potential of unsupervised representation learning in ADL extractions and highlights the importance of model configuration in optimizing performance.
- 
            Abstract Background: Cryo-electron microscopy (Cryo-EM) is widely used in the determination of the three-dimensional (3D) structures of macromolecules. Particle picking from 2D micrographs remains a challenging early step in the Cryo-EM pipeline due to the diversity of particle shapes and the extremely low signal-to-noise ratio of micrographs. Because of these issues, significant human intervention is often required to generate a high-quality set of particles for input to the downstream structure determination steps. Results: Here we propose a fully automated approach (DeepCryoPicker) for single particle picking based on deep learning. It first uses automated unsupervised learning to generate particle training datasets. Then it trains a deep neural network to classify particles automatically. Results indicate that DeepCryoPicker compares favorably with semi-automated methods such as DeepEM, DeepPicker, and RELION, with the significant advantage of not requiring human intervention. Conclusions: Our framework, which combines supervised deep learning classification with automated unsupervised clustering for generating training data, provides an effective approach to picking particles in cryo-EM images automatically and accurately.
- 
            We present a novel fine-tuning algorithm in a deep hybrid architecture for semi-supervised text classification. During each increment of the online learning process, the fine-tuning algorithm serves as a top-down mechanism for pseudo-jointly modifying model parameters following a bottom-up generative learning pass. The resulting model, trained under what we call the Bottom-Up-Top-Down learning algorithm, is shown to outperform a variety of competitive models and baselines trained across a wide range of splits between supervised and unsupervised training data.
- 
            Outlier detection is critical in the real world. Because many outlier detection techniques exist and often return different results for the same data set, users must determine which of these techniques is best suited for their task and tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for the cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD, which uses existing unsupervised detection techniques to automatically produce high quality outliers without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set and extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" to classify outliers, with accuracy comparable to supervised outlier classification models trained with ground truth labels, without having access to the much-needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches by 12 to 97 points (out of 100) in F-1 score.
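The idea of extracting signals from several unsupervised detectors can be illustrated with a small sketch. This is not AutoOD's algorithm: the two detectors (a z-score rule and an interquartile-range rule), their thresholds, and the unanimous-agreement rule for producing pseudo-labels are all assumptions chosen for illustration. Points on which the detectors disagree are left unlabeled rather than guessed.

```python
import statistics


def zscore_flags(values, threshold=2.0):
    """Flag points more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v < q1 - k * spread or v > q3 + k * spread for v in values]


def consensus_pseudo_labels(values, detectors):
    """1 if every detector flags the point, 0 if none does, None if they disagree."""
    votes = [detector(values) for detector in detectors]
    labels = []
    for per_point in zip(*votes):
        if all(per_point):
            labels.append(1)
        elif not any(per_point):
            labels.append(0)
        else:
            labels.append(None)
    return labels


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(consensus_pseudo_labels(values, [zscore_flags, iqr_flags]))
```

The confident pseudo-labels (0 and 1) could then serve as training data for a supervised classifier, which is the general pattern the abstract describes.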
