Abstract Although the connectivity offered by industrial internet of things (IIoT) enables enhanced operational capabilities, the exposure of systems to significant cybersecurity risks poses critical challenges. Recently, machine learning (ML) algorithms such as feature-based support vector machines and logistic regression, together with end-to-end deep neural networks, have been implemented to detect intrusions, including command injection, denial of service, reconnaissance, and backdoor attacks, by capturing anomalous patterns. However, ML algorithms not only fall short in agile identification of intrusion with few samples, but also fail in adapting to new data or environments. This paper introduces hyperdimensional computing (HDC) as a new cognitive computing paradigm that mimics brain functionality to detect intrusions in IIoT systems. HDC encodes real-time data into a high-dimensional representation, allowing for ultra-efficient learning and analysis with limited samples and a few passes. Additionally, we incorporate the concept of regenerating brain cells into hyperdimensional computing to further improve learning capability and reduce the required memory. Experimental results on the WUSTL-IIOT-2021 dataset show that HDC detects intrusion with the accuracy of 92.6%, which is superior to multi-layer perceptron (40.2%), support vector machine (72.9%), logistic regression (84.2%), and Gaussian process classification (89.1%) while requires only 300 data and 5 iterations for training.
more »
« less
Machine learning approaches in non-contact autofluorescence spectrum classification
Manual surgical resection of soft tissue sarcoma tissue can involve many challenges, including the critical need for precise determination of tumor boundary with normal tissue and limitations of current surgical instrumentation, in addition to standard risks of infection or tissue healing difficulty. Substantial research has been conducted in the biomedical sensing landscape for development of non-human contact sensing devices. One such point-of-care platform, previously devised by our group, utilizes autofluorescence-based spectroscopic signatures to highlight important physiological differences in tumorous and healthy tissue. The following study builds on this work, implementing classification algorithms, including Artificial Neural Network, Support Vector Machine, Logistic Regression, and K-Nearest Neighbors, to diagnose freshly resected murine tissue as sarcoma or healthy. Classification accuracies of over 93% are achieved with Logistic Regression, and Area Under the Curve scores over 94% are achieved with Support Vector Machines, delineating a clear way to automate photonic diagnosis of ambiguous tissue in assistance of surgeons. These interpretable algorithms can also be linked to important physiological diagnostic indicators, unlike the black-box ANN architecture. This is the first known study to use machine learning to interpret data from a non-contact autofluorescence sensing device on sarcoma tissue, and has direct applications in rapid intraoperative sensing.
more »
« less
- Award ID(s):
- 2125528
- PAR ID:
- 10651545
- Editor(s):
- Rudzicz, Frank
- Publisher / Repository:
- The Public Library of Science
- Date Published:
- Journal Name:
- PLOS Digital Health
- Volume:
- 3
- Issue:
- 10
- ISSN:
- 2767-3170
- Page Range / eLocation ID:
- e0000602
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Previous work using logistic regression suggests that cognitive control‐related frontoparietal activation in early psychosis can predict symptomatic improvement after 1 year of coordinated specialty care with 66% accuracy. Here, we evaluated the ability of six machine learning (ML) algorithms and deep learning (DL) to predict “Improver” status (>20% improvement on Brief Psychiatric Rating Scale [BPRS] total score at 1‐year follow‐up vs. baseline) and continuous change in BPRS score using the same functional magnetic resonance imaging‐based features (frontoparietal activations during the AX‐continuous performance task) in the same sample (individuals with either schizophrenia (n =65, 49M/16F, mean age 20.8 years) or Type I bipolar disorder (n= 17, 9M/8F, mean age 21.6 years)). 138 healthy controls were included as a reference group. “Shallow” ML methods included Naive Bayes, support vector machine, K Star, AdaBoost, J48 decision tree, and random forest. DL included an explainable artificial intelligence (XAI) procedure for understanding results. The best overall performances (70% accuracy for the binary outcome and root mean square error = 9.47 for the continuous outcome) were achieved using DL. XAI revealed left DLPFC activation was the strongest feature used to make binary classification decisions, with a classification activation threshold (adjusted beta = .017) intermediate to the healthy control mean (adjusted beta = .15, 95% CI = −0.02 to 0.31) and patient mean (adjusted beta = −.13, 95% CI = −0.37 to 0.11). Our results suggest DL is more powerful than shallow ML methods for predicting symptomatic improvement. The left DLPFC may be a functional target for future biomarker development as its activation was particularly important for predicting improvement.more » « less
-
Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methodstoestimateresourcefavorability,buttheseanalyses relied uponsomeexpert decisions.Whileexpert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support- vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Despite the fact that the 2008 assessment results employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities. Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered), and while there are known geothermal systems (i.e., positive labels), all other sites have an unknown status (i.e., they are unlabeled), instead of receiving a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm and then evaluated using F1 scores.more » « less
-
Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methodstoestimateresourcefavorability,buttheseanalyses relied uponsomeexpert decisions.Whileexpert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support- vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Despite the fact that the 2008 assessment results employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities. Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered), and while there are known geothermal systems (i.e., positive labels), all other sites have an unknown status (i.e., they are unlabeled), instead of receiving a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm and then evaluated using F1 scores.more » « less
-
Binary classification is a fundamental machine learning task defined as correctly assigning new objects to one of two groups based on a set of training objects. Driven by the practical importance of binary classification, numerous machine learning techniques have been developed and refined over the last three decades. Among the most popular techniques are artificial neural networks, decision trees, ensemble methods, logistic regression, and support vector machines. We present here machine learning and pattern recognition algorithms that, unlike the commonly used techniques, are based on combinatorial optimization and make use of information on pairwise relations between the objects of the data set, whether training objects or not. These algorithms solve the respective problems optimally and efficiently, in contrast to the primarily heuristic approaches currently used for intractable problem models in pattern recognition and machine learning. The algorithms described solve efficiently the classification problem as a network flow problem on a graph. The technical tools used in the algorithm are the parametric cut procedure and a process called sparse computation that computes only the pairwise similarities that are “relevant.” Sparse computation enables the scalability of any algorithm that uses pairwise similarities. We present evidence on the effectiveness of the approaches, measured in terms of accuracy and running time, in pattern recognition, image segmentation, and general data mining.more » « less
An official website of the United States government

