Person Re-IDentification (P-RID), as an instance-level recognition problem, remains challenging in the computer vision community. Many P-RID works aim to learn faithful and discriminative features/metrics from offline training data and apply them directly to unseen online testing data. However, their performance is largely limited by the severe data-shift issue between training and testing data. We therefore propose an online joint multi-metric adaptation model that adapts offline-learned P-RID models to the online data by learning a series of metrics, one for each sharing-subset. Each sharing-subset is produced by the proposed frequent sharing-subset mining module and contains a group of testing samples that share strong visual similarity relationships with each other. Unlike existing online P-RID methods, our model simultaneously takes both the sample-specific discriminant and the set-based visual similarity among testing samples into consideration, so that the adapted metrics can jointly refine the discriminant of all given testing samples via a multi-kernel late-fusion framework. Our model is applicable to any offline-learned P-RID baseline for online boosting; the resulting performance improvement is verified by extensive experiments on several widely used P-RID benchmarks (CUHK03, Market1501, DukeMTMC-reID and MSMT17) with state-of-the-art P-RID baselines, and is further supported by in-depth theoretical analyses.
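The sharing-subset idea can be loosely illustrated as grouping testing samples that are linked by strong pairwise similarity. The cosine metric, the threshold `tau`, and the connected-components grouping below are our illustrative assumptions, not the paper's frequent sharing-subset mining algorithm:

```python
import numpy as np

def mine_sharing_subsets(features, tau=0.8):
    """Group testing samples into sharing-subsets: connected components
    of the graph whose edges link pairs with cosine similarity > tau.
    (Illustrative stand-in for frequent sharing-subset mining.)"""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    n = len(features)
    parent = list(range(n))

    def find(i):
        # Union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > tau:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two tight feature clusters -> two sharing-subsets
X = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]])
subsets = mine_sharing_subsets(X, tau=0.9)
```

In the actual model, a separate metric would then be adapted per subset and fused; here the sketch only shows how visually similar testing samples end up processed jointly rather than in isolation.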
Optimal ratio for data splitting
Abstract It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article, we show that the optimal training/testing splitting ratio is √p : 1, where p is the number of parameters in a linear regression model that explains the data well.
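The article's result, a √p : 1 train/test split for a model with p parameters, translates directly into a training fraction of √p/(√p + 1). A minimal sketch (the function name is ours):

```python
import math

def train_fraction(p):
    """Training share of the data under a sqrt(p):1 train/test split,
    where p is the number of parameters in the linear model."""
    r = math.sqrt(p)
    return r / (r + 1.0)

# e.g. a model with 9 parameters -> sqrt(9):1 = 3:1 -> 75% of data for training
```

For a single-parameter model this recovers the familiar 50/50 split, while larger models push more data toward training.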
- PAR ID:
- 10445061
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Statistical Analysis and Data Mining: The ASA Data Science Journal
- Volume:
- 15
- Issue:
- 4
- ISSN:
- 1932-1864
- Page Range / eLocation ID:
- p. 531-538
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
ABSTRACT In this paper, we introduce a novel data augmentation methodology based on Conditional Progressive Generative Adversarial Networks (CPGAN) to generate diverse black hole (BH) images, accounting for variations in spin and electron temperature prescriptions. These generated images are valuable resources for training deep learning algorithms to accurately estimate black hole parameters from observational data. Our model can generate BH images for any spin value within the range of [−1, 1], given an electron temperature distribution. To validate the effectiveness of our approach, we employ a convolutional neural network to predict the BH spin using both the GRMHD images and the images generated by our proposed model. Our results demonstrate a significant performance improvement when training is conducted with the augmented data set while testing is performed using GRMHD simulated data, as indicated by the high R2 score. Consequently, we propose that GANs can be employed as cost-effective models for black hole image generation and reliably augment training data sets for other parametrization algorithms.
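The R2 score used above to compare spin predictions is the standard coefficient of determination; the snippet below (with made-up spin values) shows how it is computed, and is not code from the paper:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Spin labels in [-1, 1] and hypothetical CNN predictions
spins = [-0.9, -0.5, 0.0, 0.5, 0.9]
preds = [-0.8, -0.4, 0.1, 0.4, 0.8]
```

A score near 1 indicates the predictions explain almost all the variance in the true spins; a perfect predictor scores exactly 1.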
Abstract The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72–74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range, and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
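The 'middle exclusion' effect can be demonstrated on synthetic data: dropping examples near the decision boundary before evaluation inflates accuracy without the model getting any better. The setup below is a toy illustration, not the lncATLAS pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Latent localization score in [0, 1]; true label = score > 0.5.
# A noisy "model" observes score + noise and thresholds it.
score = rng.uniform(0.0, 1.0, size=n)
label = score > 0.5
pred = (score + rng.normal(0.0, 0.25, size=n)) > 0.5

acc_all = np.mean(pred == label)          # accuracy on unfiltered data

# Middle exclusion: keep only examples with extreme scores
keep = (score < 0.2) | (score > 0.8)
acc_extreme = np.mean(pred[keep] == label[keep])
```

Because borderline examples are near coin-flips for any model, `acc_extreme` is substantially higher than `acc_all` even though the underlying predictor is unchanged, which is exactly the inflation the abstract describes.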
Abstract Average lifetime, or mean time to failure (MTTF), of a product is an important metric of product reliability. Current methods of evaluating MTTF are mainly statistics- or data-based. They require lifetime testing on a number of products to obtain lifetime samples, which are then used to estimate MTTF. Lifetime testing, however, is expensive in terms of both time and cost. Its efficiency is also low because it cannot be effectively incorporated into the early design stage, where many physics-based models are available. We propose to predict MTTF in the design stage by means of physics-based models. The advantage is that the design can be continually improved by changing design variables until reliability measures, including MTTF, are satisfied. Since physics-based models are usually computationally demanding, we face a problem with both big data (on the model input side) and small data (on the model output side). We develop an adaptive supervised training method based on Gaussian process regression that can quickly predict MTTF while minimizing the number of calls to the physics-based models. The effectiveness of the method is demonstrated by two examples.
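The adaptive loop can be sketched as: fit a Gaussian process surrogate to a few expensive model evaluations, then repeatedly evaluate the design point where the posterior variance is largest. The toy "physics-based" lifetime function, the RBF kernel settings, and the stopping budget below are our illustrative assumptions, not the authors' method:

```python
import numpy as np

def lifetime_model(x):
    """Stand-in for an expensive physics-based lifetime simulation."""
    return 2.0 + np.sin(3.0 * x) + 0.5 * x

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and variance at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    sol = np.linalg.solve(K, Ks)
    mean = sol.T @ y
    var = np.diag(Kss - Ks.T @ sol)
    return mean, np.maximum(var, 0.0)

# Adaptive sampling: start with the endpoints, then add the most
# uncertain candidate point; each addition costs one expensive call.
grid = np.linspace(0.0, 1.0, 101)
X = np.array([0.0, 1.0])
y = lifetime_model(X)
for _ in range(6):
    mean, var = gp_posterior(X, y, grid)
    x_new = grid[np.argmax(var)]
    X = np.append(X, x_new)
    y = np.append(y, lifetime_model(x_new))

mean, var = gp_posterior(X, y, grid)
mttf_estimate = mean.mean()  # surrogate-based average over the design space
```

After only 8 simulator calls the surrogate's average over the grid is close to the true mean lifetime (~2.91 for this toy function), which is the efficiency argument the abstract makes.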
The investigation of brain health development is paramount, as a healthy brain underpins cognitive and physical well-being and mitigates cognitive decline, neurodegenerative diseases, and mental health disorders. This study leverages the UK Biobank dataset containing static functional network connectivity (sFNC) data derived from resting-state functional magnetic resonance imaging (rs-fMRI) and assessment data. We introduce a novel approach to forecasting a brain health index (BHI) by deploying three distinct models, each capitalizing on different modalities for training and testing. The first model exclusively employs psychological assessment measures, while the second model harnesses both neuroimaging and assessment data for training but relies solely on assessment data during testing. The third model encompasses a holistic strategy, utilizing neuroimaging and assessment data for both the training and testing phases. The proposed models employ a two-step approach for calculating the BHI. In the first step, the input data is subjected to dimensionality reduction using principal component analysis (PCA) to identify critical patterns and extract relevant features. The resultant concatenated feature vector is then used as input to a variational autoencoder (VAE). This network generates a low-dimensional representation of the input data used for calculating BHI in new subjects without requiring imaging data. The results suggest that incorporating neuroimaging data into the BHI model, even when predicting from assessments alone, enhances its ability to accurately evaluate brain health. The VAE model exemplifies this improvement by reconstructing the sFNC matrix more accurately than the assessment data. Moreover, these BHI models also enable us to identify distinct behavioral and neural patterns. Hence, this approach lays the foundation for larger-scale efforts to monitor and enhance brain health, aiming to build resilient brain systems.
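The first step of the pipeline, PCA-based dimensionality reduction of the feature vector, can be sketched with plain numpy. The dimensions are made up (1378 corresponds to the upper triangle of a hypothetical 53×53 connectivity matrix), and the subsequent VAE stage is omitted:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                          # center features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # scores in PC space

# e.g. 100 subjects, 1378 sFNC features (upper triangle of a 53x53 matrix)
rng = np.random.default_rng(1)
features = rng.normal(size=(100, 1378))
reduced = pca_reduce(features, n_components=20)
```

The reduced scores (here 20 per subject) would then be concatenated with assessment-derived features and fed to the VAE; components come out ordered by explained variance, so the first column always captures at least as much variance as the second.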
