Net Promoter Score (NPS) is an important business metric in which customers are surveyed and asked to rate how likely they are to recommend the company's products and/or services. In many applications, customers respond on an 11-point ordinal scale from 0 to 10. To compute the score, the responses are reformulated into a labelled three-class scale (0-6: Detractor, 7-8: Passive, 9-10: Promoter) [1]. Many companies that adopt Net Promoter Score as a core management metric integrate the measurement into all phases of the business and seek every opportunity to assess performance in terms of customers' likelihood to promote the company. In addition to a variety of survey opportunities, the ability to assign a promoter rating to free-text comments from surveys, social media, and blogs may provide an additional valuable source of business insight. Even on a three-point scale, Net Promoter is an ordinal classification problem. A number of successful ordinal classification algorithms have been developed [2]; however, none of the top-performing classifiers employs deep learning, which limits their use in applications such as text or image classification. Any appropriate strategy must exploit the ordering information of the classes without imposing a strong continuity or fixed-spacing assumption on them. In this paper, we use a novel deep learning methodology, OHPLnet (Ordinal Hyperplane Loss Network), that is specifically designed for data with ordinal classes [3]. The algorithm is used to generate predictions for the eleven classes, which may then be used in the standard Net Promoter Score calculation.
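As a concrete illustration of the score construction described above (this is the standard NPS arithmetic, not code from the paper, and the sample ratings are made up), the sketch below buckets 0-10 ratings into the three labelled classes and computes the score as the percentage of Promoters minus the percentage of Detractors.

```python
# Minimal sketch of the standard Net Promoter Score calculation from 0-10 ratings.
def net_promoter_score(ratings):
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)    # 9-10: Promoter
    detractors = sum(1 for r in ratings if r <= 6)   # 0-6: Detractor (7-8: Passive)
    return 100.0 * (promoters - detractors) / n

print(net_promoter_score([10, 9, 9, 8, 6, 3, 10, 7]))  # 4 promoters, 2 detractors -> 25.0
```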
Ordinal Hyperplane Loss
The problem of ordinal classification occurs in a large and growing number of areas. Some of the most common sources and applications of ordinal data include rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity, facial age estimation, etc. The problem of predicting ordinal classes is typically addressed either by performing n-1 binary classifications for n ordinal classes or by treating the ordinal classes as continuous values for regression. However, the first strategy does not fully utilize the ordering information of the classes, and the second imposes a strong continuity assumption on them. In this paper, we propose a novel loss function called Ordinal Hyperplane Loss (OHPL) that is specifically designed for data with ordinal classes. OHPL is a significant advance in predicting ordinal class data because it enables deep learning techniques to be applied to the ordinal classification problem on both structured and unstructured data. By minimizing OHPL, a deep neural network learns to map data to an optimal space in which the distance between points and their class centroids is minimized while a nontrivial ordinal relationship among the classes is maintained. Experimental results show that a deep neural network trained with OHPL not only outperforms state-of-the-art alternatives in classification accuracy but also scales well to large ordinal classification problems.
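The precise OHPL formulation is given in the referenced paper; the sketch below is only one illustrative interpretation, under the assumptions that the network emits a one-dimensional embedding per example and that the loss combines a within-class centroid-distance term with a margin penalty that keeps consecutive class centroids in increasing order. The function name, margin, and fallback for empty classes are hypothetical choices, not the paper's.

```python
import torch

def ohpl_like_loss(embeddings, labels, num_classes, margin=1.0):
    """embeddings: (N, 1) learned projections; labels: (N,) ordinal classes 0..K-1."""
    centroids = []
    for k in range(num_classes):
        mask = labels == k
        # Fall back to zero if a class is absent from the mini-batch (sketch only).
        centroids.append(embeddings[mask].mean() if mask.any() else embeddings.new_zeros(()))
    centroids = torch.stack(centroids)                      # (K,)

    # 1) Pull each point toward the centroid of its own class.
    within = ((embeddings.squeeze(-1) - centroids[labels]) ** 2).mean()

    # 2) Keep consecutive centroids ordered and separated by at least `margin`.
    gaps = centroids[1:] - centroids[:-1]
    ordering = torch.relu(margin - gaps).sum()

    return within + ordering

# Example: random projections for a 5-class ordinal problem.
emb = torch.randn(32, 1, requires_grad=True)
lab = torch.randint(0, 5, (32,))
ohpl_like_loss(emb, lab, num_classes=5).backward()
```

In such a setup, the network would be trained by minimizing this quantity on mini-batches, with the final class prediction obtained by assigning each projection to the nearest centroid in the learned ordering.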
- Award ID(s): 1853191
- PAR ID: 10157405
- Date Published:
- Journal Name: 2018 IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID: 2337 to 2344
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Hyperspectral imagery (HSI) has emerged as a highly successful sensing modality for a variety of applications ranging from urban mapping to environmental monitoring and precision agriculture. Despite the efforts of the scientific community, developing reliable algorithms for HSI classification remains a challenging problem, especially for high-resolution HSI data, where larger intraclass variability is often combined with scarce ground truth data and class imbalance. In recent years, deep neural networks have emerged as a promising strategy for HSI classification, having shown remarkable potential for efficiently learning joint spectral-spatial features via backpropagation. In this paper, we propose a deep learning strategy for HSI classification that combines different convolutional neural networks specifically designed to learn joint spatial-spectral features over multiple scales. Our method achieves an overall classification accuracy of 66.73% on the 2018 IEEE GRSS hyperspectral dataset, a high-resolution dataset that includes 20 urban land-cover and land-use classes.
- Markopoulos, Panos P.; Ouyang, Bing (Eds.) We consider the problem of unsupervised (blind) evaluation and assessment of the quality of data used for deep neural network (DNN) RF signal classification. When neural networks train on noisy or mislabeled data, they often (over-)fit to the noisy measurements and faulty labels, which leads to significant performance degradation. DNNs are also vulnerable to adversarial attacks, which can considerably reduce their classification performance with extremely small perturbations of their input. In this paper, we consider a new method based on L1-norm principal-component analysis (PCA) to improve the quality of labeled wireless data sets that are used for training a convolutional neural network (CNN) and a deep residual network (ResNet) for RF signal classification. Experiments with data generated for eleven classes of digital and analog modulated signals show that L1-norm tensor conformity curation of the data identifies and removes from the training set inappropriate class instances that appear due to mislabeling and universal black-box adversarial attacks, and drastically improves/restores the classification accuracy of the identified deep neural network architectures.
- While cross entropy (CE) is the most commonly used loss function to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Which of them is best to use remains a mystery, because multiple factors appear to affect the answer, such as the properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the global optimal solution of the CE and mean-square-error (MSE) losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend these results and show through global solution and landscape analyses that a broad family of loss functions, including the commonly used label smoothing (LS) and focal loss (FL), exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. In particular, based on the unconstrained feature model assumption, we provide either a global landscape analysis for the LS loss or a local landscape analysis for the FL loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions, either in the global scope for the LS loss or in the local scope for the FL loss near the optimal solution. The experiments further show that Neural Collapse features obtained from all relevant losses (i.e., CE, LS, FL, MSE) lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence.
- Few-shot machine learning attempts to predict outputs given only a very small number of training examples. The key idea behind most few-shot learning approaches is to pre-train the model with a large number of instances from a different but related class of data for which many training instances are available. Few-shot learning has been most successfully demonstrated for classification problems using Siamese deep learning neural networks; it has been applied less extensively to time-series forecasting. Few-shot forecasting is the task of predicting future values of a time series even when only a small set of historic time series is available. It has applications in domains where a long history of data is not available. This work describes deep neural network architectures for few-shot forecasting. All of the architectures use a Siamese twin network approach to learn a difference function between pairs of time series, rather than directly forecasting from historical data as in traditional forecasting models. The networks are built using long short-term memory (LSTM) units. During forecasting, a model is able to forecast time-series types that were never seen in the training data by using the few available instances of the new time-series type as reference inputs. The proposed architectures are evaluated on vehicular traffic data collected in California from the Caltrans Performance Measurement System (PeMS). The models were trained with traffic flow data collected at specific locations and then evaluated by predicting traffic at different locations over different time horizons (0 to 12 hours). Mean Absolute Error (MAE) was used as the evaluation metric and also as the loss function for training. The proposed architectures show lower prediction error than a baseline nearest-neighbor forecast model, and the prediction error increases at longer time horizons.
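One plausible reading of the pairwise approach in the last item is sketched below. It is not the authors' exact architecture: the layer sizes, the 12-step horizon, and the idea of predicting the query's future as the reference's known future plus a learned correction are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SiameseLSTMForecaster(nn.Module):
    """Shared LSTM encoder applied to a reference and a query series (Siamese twins);
    a small head maps the pair of encodings to a correction of the reference's future."""
    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.diff_head = nn.Linear(2 * hidden, horizon)

    def encode(self, series):                                # series: (N, T, 1)
        _, (h, _) = self.encoder(series)
        return h[-1]                                         # (N, hidden)

    def forward(self, reference, query, reference_future):   # reference_future: (N, horizon)
        pair = torch.cat([self.encode(reference), self.encode(query)], dim=-1)
        return reference_future + self.diff_head(pair)       # predicted query future

# Usage on random stand-in data: 4 pairs of 48-step series, 12-step horizon.
model = SiameseLSTMForecaster()
pred = model(torch.randn(4, 48, 1), torch.randn(4, 48, 1), torch.randn(4, 12))  # (4, 12)
```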