

This content will become publicly available on September 1, 2026

Title: Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents
The preprocessing of infrared spectra can significantly improve predictive accuracy for protein, carbohydrate, lipid, and other nutritional components, yet selecting the optimal preprocessing is typically empirical, tedious, and dataset-specific. This study introduces a Bayesian optimization-based framework for the automated selection of optimal spectral preprocessing pipelines within a chemometric modeling context. The framework was applied to mid-infrared spectra of milk to predict fat, protein, lactose, and total solids. A total of 385 averaged spectra corresponding to 198 unique samples were split 70/30 (training/test) using a group-aware Kennard-Stone algorithm, resulting in 269 averaged spectra (135 unique samples) for training and 116 spectra (58 unique samples) for testing. Six regression models (Elastic Net, Gradient Boosting Machines (GBM), Partial Least Squares (PLS), RidgeCV Regression, LassoLarsCV, and Support Vector Regression (SVR)) were evaluated under three preprocessing conditions: (1) no preprocessing, (2) literature-derived custom preprocessing (e.g., MSC, SNV, and first and second derivatives), and (3) optimized preprocessing via the proposed Bayesian framework. Optimized preprocessing consistently outperformed the other two conditions, with RidgeCV achieving the best performance for all components except lactose, where PLS slightly outperformed it. Improvements in predictive accuracy, particularly in terms of RMSEP, were observed across all milk components. The best RMSEP results were achieved for protein (RMSEP = 0.054, R² = 0.981) and lactose (RMSEP = 0.026, R² = 0.917), followed by fat (RMSEP = 0.139, R² = 0.926) and total solids (RMSEP = 0.154, R² = 0.960). Literature-based pipelines demonstrated inconsistent effectiveness, highlighting the limitations of transferring preprocessing methods between datasets.
The Bayesian optimization approach identified relatively simple yet highly effective preprocessing pipelines, typically involving few steps. By eliminating manual trial and error, this data-driven strategy offers a robust and generalizable solution that streamlines spectral modeling in dairy analysis and can be readily applied to other types of spectroscopic data across various domains.
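The group-aware split mentioned in the abstract can be illustrated with a minimal Python sketch. This record does not describe the authors' implementation, so everything here is an assumption for illustration: the function name `group_kennard_stone`, the use of Euclidean distance, and the choice to run Kennard-Stone on the mean spectrum of each sample so that replicate spectra of one sample never land on both sides of the split.

```python
import math
from collections import defaultdict


def group_kennard_stone(spectra, groups, train_fraction=0.7):
    """Hypothetical group-aware Kennard-Stone split.

    spectra: list of equal-length lists of floats (one per spectrum)
    groups:  list of sample labels, one per spectrum
    Returns (train_idx, test_idx); each group stays on one side.
    Assumes at least two distinct groups.
    """
    # Collect the spectrum indices that belong to each sample.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    labels = list(members)
    n_features = len(spectra[0])

    # Represent each sample by the mean of its replicate spectra.
    centroids = [
        [sum(spectra[i][j] for i in members[g]) / len(members[g])
         for j in range(n_features)]
        for g in labels
    ]

    def dist(a, b):
        return math.dist(centroids[a], centroids[b])

    n = len(labels)
    n_train = max(2, round(train_fraction * n))

    # Kennard-Stone seed: the two most distant samples.
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)

    # Distance from each candidate to its nearest already-selected sample.
    nearest = {c: min(dist(c, s) for s in selected) for c in remaining}

    # Repeatedly add the candidate farthest from the selected set.
    while len(selected) < n_train and remaining:
        nxt = max(remaining, key=lambda c: nearest[c])
        selected.append(nxt)
        remaining.remove(nxt)
        del nearest[nxt]
        for c in remaining:
            nearest[c] = min(nearest[c], dist(c, nxt))

    train_groups = {labels[k] for k in selected}
    train_idx = [i for i, g in enumerate(groups) if g in train_groups]
    test_idx = [i for i, g in enumerate(groups) if g not in train_groups]
    return train_idx, test_idx
```

Because the fraction is applied to samples rather than to spectra, the share of spectra in the training set need not be exactly 70% when samples have different numbers of replicates.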
Award ID(s):
2345069
PAR ID:
10649822
Author(s) / Creator(s):
; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Foods
Volume:
14
Issue:
17
ISSN:
2304-8158
Page Range / eLocation ID:
2996
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Objectives: This study explored differing levels of macronutrients in breast milk in relation to maternal anemia and hemoglobin. Methods: Archived milk specimens and data from a cross‐sectional sample of 208 breastfeeding mothers in northern Kenya, originally collected in 2006, were analyzed; data included milk fat, maternal hemoglobin concentration, and anemia status (anemia defined as hemoglobin <12 g/dL). Total protein and lactose were measured and energy was calculated. To explore the association between milk outcomes (fat, protein, lactose, and energy) and anemia, regression models were constructed with and without adjustment for maternal age, parity, and time (days) postpartum. The same models were constructed using hemoglobin as a continuous predictor in lieu of dichotomous anemia to explore the role of hemoglobin levels and anemia severity in predicting milk outcomes. Results: The group comparison indicated significantly higher milk protein and lower milk fat for anemic mothers relative to nonanemic counterparts. After adjustment for maternal age, parity, and time postpartum, maternal anemia was associated with significantly higher milk protein (P = 0.001) and significantly lower milk fat (P = 0.025). Hemoglobin had a significant inverse relationship with milk protein (P = 0.017) and a marginally significant positive relationship with milk fat (P = 0.060) after adjusting for the maternal variables. Neither anemia nor hemoglobin was significant in predicting lactose or milk energy. Conclusions: Maternal anemia and hemoglobin concentration may be associated with complex changes in milk macronutrients. Future research should clarify the impact of maternal anemia on a range of breast milk components while accounting for other maternal characteristics.
  2. Protein content variation in milk can impact the quality and consistency of dairy products, necessitating access to in-line, real-time monitoring. Here, we present a chemometric approach for the qualitative and quantitative monitoring of β-lactoglobulin and α-lactalbumin using mid-infrared (MIR) spectroscopy. In this study, we employed Hotelling T² and Q-residuals for outlier detection, automated preprocessing using nippy, conducted wavenumber selection with genetic algorithms, and evaluated four chemometric models (partial least squares, support vector regression (SVR), ridge, and logistic regression) to accurately predict the concentrations of β-lactoglobulin and α-lactalbumin in milk. For the quantitative analysis of these two whey proteins, SVR performed best at predicting protein concentration from 197 MIR spectra originating from 42 Cornell University samples of preserved pasteurized modified milk. The R² values obtained for β-lactoglobulin and α-lactalbumin using leave-one-out cross-validation (LOOCV) are 92.8% and 92.7%, respectively, which is the highest correlation reported to date. Our approach combined preprocessing automation, genetic algorithm-based wavenumber selection, and Optuna-based tuning of the hyperparameters of the chemometric models, resulting in the best chemometric analysis of MIR data to quantitate β-lactoglobulin and α-lactalbumin to date.
  3. Abstract. Background: Maternal anemia has adverse consequences for the mother‐infant dyad. To evaluate whether and how milk nutrient content may change in ways that could “buffer” infants against the conditions underlying maternal anemia, this study assessed associations between milk macronutrients and maternal iron‐deficiency anemia (IDA), non‐iron‐deficiency anemia (NIDA), and inflammation. Methods: A secondary analysis of cross‐sectional data and milk from northern Kenya was conducted (n = 204). The combination of hemoglobin and transferrin receptor defined IDA/NIDA. Elevated serum C‐reactive protein defined acute inflammation. The effects of IDA, NIDA, and inflammation on milk macronutrients were evaluated in regression models. Results: IDA (β = 0.077, p = .022) and NIDA (β = 0.083, p = .100) predicted higher total protein (ln). IDA (β = −0.293, p = .002), NIDA (β = −0.313, p = .047), and inflammation (β = −0.269, p = .007) each predicted lower fat (ln); however, anemia accompanying inflammation predicted higher fat (β = 0.655, p = .007 for IDA and β = 0.468, p = .092 for NIDA). NIDA predicted higher lactose (β = 1.020, p = .003). Conclusions: Milk macronutrient content both increases and decreases in the presence of maternal anemia and inflammation, suggesting a more complicated and dynamic change than simple impairment of nutrient delivery during maternal stress. Maternal fat delivery to milk may be impaired under anemia. Mothers may buffer infant nutrition against adverse conditions or poor maternal health by elevating milk protein (mothers with IDA/NIDA), lactose (mothers with NIDA), or fat (mothers with anemia and inflammation). This study demonstrates the foundational importance of maternal micronutrient health and inflammation or infection for advancing the ecological understanding of human milk nutrient variation.
  4. We present four unique prediction techniques, combined with multiple data pre-processing methods, utilizing a wide range of both oil types and oil peroxide values (PV) as well as incorporating natural aging for peroxide creation. Sample PVs were assayed using a standard starch titration method, AOCS Method Cd 8-53, which served as the verified reference method for PV determination. Near-infrared (NIR) spectra were collected from each sample at two optical pathlengths (OPLs), 2 and 24 mm, then fused into a third distinct set. All three sets were used to build partial least squares (PLS) regression, ridge regression, LASSO regression, and elastic net regression models. While no individual regression model was established as the best, global models for each regression type and pre-processing method show good agreement between all regression types when performed in their optimal scenarios. Furthermore, boxcar averaging with a small spectral window improves prediction accuracy for edible oil PVs. The best-performing models for each regression type are: PLS regression, 25-point boxcar window, fused OPL spectral information, RMSEP = 2.50; ridge regression, 5-point boxcar window, 24 mm OPL, RMSEP = 2.20; LASSO regression, raw spectral information, 24 mm OPL, RMSEP = 1.80; and elastic net, 10-point boxcar window, 24 mm OPL, RMSEP = 1.91. The results show promising advancements in the development of a full global model for PV determination of edible oils.
  5. Synopsis: The ability to provision offspring with milk is a significant adaptive feature of mammals that allows for considerable maternal regulation of offspring beyond gestation, as milk provides complete nutrition for developing neonates. For mothers, lactation is a period of marked increases in energetic and nutritive demands to support milk synthesis; because of this considerable increase in demand imposed on multiple physiological systems, lactation is particularly susceptible to the effects of chronic stress. Here, we present work that explores the impact of chronic stress during lactation on maternal lactation performance (i.e., milk quality and quantity) and the expression of key milk synthesis genes in mammary tissue using a Sprague–Dawley rat model. We induced chronic stress using a well-established, ethologically relevant novel male intruder paradigm for 10 consecutive days during the postpartum period. We hypothesized that the increased energetic burden of mounting a chronic stress response during lactation would decrease lactation performance. Specifically, we predicted that chronic exposure to this social stressor would decrease either milk quality (i.e., composition of proximate components and energy density) or quantity. We also predicted that changes in proximate composition (i.e., lipid, lactose, and protein concentrations) would be associated with changes in gene expression levels of milk synthesis genes. Our results supported our hypothesis that chronic stress impairs lactation performance. Relative to the controls, chronically stressed rats had lower milk yields. We also found that milk quality was decreased; milk from chronically stressed mothers had lower lipid concentration and lower energy density, though protein and lactose concentrations were not different between treatment groups. Although there was a change in proximate composition, chronic stress did not impact mammary gland expression of key milk synthesis genes.
Together, this work demonstrates that exposure to a chronic stressor impacts lactation performance, which in turn has the potential to impact offspring development via maternal effects. 