Grapevine rootstocks are gaining importance in viticulture as a strategy to combat abiotic challenges and to enhance scion physiology. Direct leaf-level physiological parameters such as net assimilation rate, stomatal conductance to water vapor, quantum yield of PSII, and transpiration can illuminate the rootstock effect on scion physiology. However, these measurements are time-consuming and limited to leaf-level analysis. This study used different rootstocks to investigate the potential application of aerial hyperspectral imagery in the estimation of canopy-level measurements. A statistical framework was developed as an ensemble stacked regression (REGST) that aggregated five individual machine learning algorithms, namely least absolute shrinkage and selection operator (Lasso), partial least squares regression (PLSR), ridge regression (RR), elastic net (ENET), and principal component regression (PCR), to optimize high-throughput assessment of vine physiology. In addition, a convolutional neural network (CNN) was integrated into the existing REGST, forming a hybrid CNN-REGST model with the aim of capturing patterns from the hyperspectral signal. The individual base models exhibited variable prediction accuracies. In most cases, RR demonstrated the lowest test root mean squared error (RMSE). REGST outperformed the individual machine learning algorithms, increasing R² by 0.03 to 0.1. CNN-REGST and REGST performed similarly in estimating the four traits. Overall, these models explained approximately 55–67% of the variation in the ground-truth data. This study suggests that hyperspectral features integrated with powerful AI approaches show great potential for tracing functional traits in grapevines.
Improving Prediction of Peroxide Value of Edible Oils Using Regularized Regression Models
We present four prediction techniques, combined with multiple data pre-processing methods, applied to a wide range of oil types and oil peroxide values (PV), with natural aging used for peroxide creation. Samples were assayed for PV using a standard starch titration method, AOCS Method Cd 8-53, which served as the verified reference method for PV determination. Near-infrared (NIR) spectra were collected from each sample at two optical pathlengths (OPLs), 2 and 24 mm, then fused into a third distinct set. All three sets were used to calculate partial least squares (PLS) regression, ridge regression, LASSO regression, and elastic net regression models. While no individual regression model was established as the best, global models for each regression type and pre-processing method show good agreement between all regression types when performed in their optimal scenarios. Furthermore, boxcar averaging with a small spectral window size improves prediction accuracy for edible oil PVs. The best-performing models for each regression type are: PLS regression, 25-point boxcar window, fused OPL spectral information, RMSEP = 2.50; ridge regression, 5-point boxcar window, 24 mm OPL, RMSEP = 2.20; LASSO, raw spectral information, 24 mm OPL, RMSEP = 1.80; and elastic net, 10-point boxcar window, 24 mm OPL, RMSEP = 1.91. The results show promising advancements in the development of a full global model for PV determination of edible oils.
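The pre-processing and scoring steps mentioned above can be sketched in a few lines of plain Python. The abstract does not state how the boxcar window or the fusion was implemented, so two assumptions are made here: the boxcar average is taken over non-overlapping windows (a sliding variant is also common), and fusion is simple concatenation of the two pathlength spectra. The function names are hypothetical. RMSEP is the usual root mean squared error over the prediction set.

```python
import math


def boxcar_average(spectrum, window):
    """Average consecutive, non-overlapping groups of `window` points (assumed variant)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for start in range(0, len(spectrum), window):
        chunk = spectrum[start:start + window]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def fuse_spectra(spectrum_short_opl, spectrum_long_opl):
    """Fuse two pathlength spectra by concatenation (an assumption, not the paper's stated method)."""
    return list(spectrum_short_opl) + list(spectrum_long_opl)


def rmsep(reference, predicted):
    """Root mean squared error of prediction against reference values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values only; real spectra have hundreds or thousands of points.
smoothed = boxcar_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], window=2)
fused = fuse_spectra([0.1, 0.2], [0.3, 0.4, 0.5])
error = rmsep([10.0, 20.0], [13.0, 24.0])
```

A larger window reduces noise but also blurs narrow absorption features, which is one plausible reason the best window size differs between regression types.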
- Award ID(s): 2003839
- PAR ID: 10390188
- Date Published:
- Journal Name: Molecules
- Volume: 26
- Issue: 23
- ISSN: 1420-3049
- Page Range / eLocation ID: 7281
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $$K$$-nearest-neighbours fused lasso, involves computing the $$K$$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $$K$$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $$\epsilon$$-graph rather than a $$K$$-nearest-neighbours graph and contrast it with the $$K$$-nearest-neighbours fused lasso.
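For orientation, the two-step procedure described above can be written as a single optimization problem. The notation is assumed here and does not appear in the summary: $$y_i$$ are the responses at the $$n$$ design points, $$E_K$$ is the edge set of the $$K$$-nearest-neighbours graph, and $$\lambda \ge 0$$ is a tuning parameter. In its standard form, the graph fused lasso estimate is

$$\hat{\theta} = \underset{\theta \in \mathbb{R}^n}{\arg\min} \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2 + \lambda \sum_{(i,j) \in E_K} |\theta_i - \theta_j|$$

The penalty charges for differences between fitted values at neighbouring points, so the estimate is piecewise constant over the graph. Replacing $$E_K$$ with the edge set of an $$\epsilon$$-graph gives the alternative estimator mentioned at the end of the summary.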
-
Abstract A potential method to determine whether two varieties of edible oils can be differentiated by Fourier transform infrared (FTIR) spectroscopy is proposed using digitally generated data of adulterated edible oils from an infrared (IR) spectral library. The first step is the evaluation of digitally blended data sets. Specifically, IR spectra of adulterated edible oils are computed from digitally blending experimental data of the IR spectra of an edible oil and the corresponding adulterant using the appropriate mixing coefficients to achieve the desired level of adulteration. To determine whether two edible oils can be differentiated by FTIR spectroscopy, pure IR spectra of the two edible oils are compared with IR spectra of two edible oils digitally mixed using a genetic algorithm for pattern recognition to solve a ternary classification problem. If the IR spectra of the two edible oils and their binary mixtures are differentiable from principal component plots of the spectral data, then differences between the IR spectra of these two edible oils are of sufficient magnitude to ensure that a reliable classification by FTIR spectroscopy can be obtained. Using this approach, the feasibility of authenticating edible oils such as extra virgin olive oil (EVOO) directly from library spectra is demonstrated. For this study, both digital and experimental data are combined to generate training and validation data sets to assess detection limits in FTIR spectroscopy for the adulterants.
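The digital blending step can be illustrated with a minimal sketch. It assumes that blending is a point-by-point linear combination of two spectra sampled on the same wavenumber grid, weighted by the adulteration fraction. The abstract only says "appropriate mixing coefficients", so the linear form and the function name are assumptions.

```python
def blend_spectra(oil, adulterant, fraction):
    """Linearly mix two spectra; `fraction` is the assumed share of adulterant (0 to 1)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if len(oil) != len(adulterant):
        raise ValueError("spectra must share the same wavenumber grid")
    return [(1.0 - fraction) * a + fraction * b for a, b in zip(oil, adulterant)]


# Toy absorbance values at two wavenumbers, mixed 50:50.
blended = blend_spectra([1.0, 2.0], [3.0, 4.0], fraction=0.5)
```

Sweeping `fraction` over a grid of adulteration levels would generate the kind of digitally blended data set the abstract describes.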
-
Streamflow prediction plays a vital role in water resources planning in order to understand the dramatic change of climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located in the lower part of the Upper Colorado River Basin (UCRB) of the US. Firstly, we used only historic streamflow data to predict 24 months ahead. Secondly, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. Later, we analyzed the effects of considering different span window sizes on the quality of predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The predictions of the multivariate time-series models were found to be more accurate, with RMSE less than 0.84 mm per month, R-squared more than 0.8, and MAPE less than 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB as features increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
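One common way to hand a time series to a regressor such as a random forest is to slice it into fixed-length input windows paired with a future target. The sketch below shows that framing together with the MAPE metric quoted above. The abstract does not describe the actual feature construction, so the windowing scheme and both function names are assumptions.

```python
def make_windows(series, window, horizon):
    """Pair each run of `window` past values with the value `horizon` steps after it."""
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


# Toy monthly flows: two past months predict the following month.
X_rows, y_rows = make_windows([1, 2, 3, 4, 5], window=2, horizon=1)
score = mape([100.0, 200.0], [110.0, 180.0])
```

In this framing, changing `window` corresponds to varying how much history the model sees for each prediction.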
-
Abstract Four statistical selection methods for inferring transcription factor (TF)–target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two methods were also developed for inferring pathway gene regulatory networks (GRNs) by combining a Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize parameter selection processes, resulting in an equally effective but much faster algorithm than the commonly used convex optimization solver. Synthetic data generated in a general setting was used to test the four TF–TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings was used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF–TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF–TG identification methods fill the gap left by the lack of a method for genome-wide TG prediction of a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
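To make the Huber-ENET combination concrete, here is a minimal sketch of a textbook accelerated proximal gradient scheme (FISTA-style) in NumPy. It is not the authors' improved APGD algorithm. The objective scaling, the parameter names `lam`, `alpha`, and `delta`, and the synthetic data are all assumptions. The objective minimized is the mean Huber loss of the residuals plus `lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)`.

```python
import numpy as np


def huber_grad(residual, delta):
    """Derivative of the Huber loss with respect to the residual (residual clipped at delta)."""
    return np.clip(residual, -delta, delta)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def huber_enet_apgd(X, y, lam=0.05, alpha=0.5, delta=1.0, n_iter=500):
    """Generic accelerated proximal gradient for Huber loss with an elastic net penalty."""
    n, p = X.shape
    # Upper bound on the Lipschitz constant of the smooth part (Huber loss + ridge term).
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam * (1.0 - alpha)
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    z = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        # Gradient of the smooth part evaluated at the extrapolated point z.
        grad = -X.T @ huber_grad(y - X @ z, delta) / n + lam * (1.0 - alpha) * z
        # Proximal step handles the non-smooth L1 part.
        beta_new = soft_threshold(z - step * grad, step * lam * alpha)
        # Nesterov momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta


# Synthetic example: two active predictors, small noise, a few gross outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:2] = [2.0, -1.0]
y = X @ true_beta + rng.normal(scale=0.1, size=200)
y[:10] += 20.0  # outliers that a squared-error loss would chase
beta_hat = huber_enet_apgd(X, y)
```

Setting `alpha=1.0` reduces the penalty to Lasso, and a very large `delta` makes the Huber loss behave like squared error, so the same routine covers the four loss and penalty pairings in this simplified form.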

