

Search for: All records

Award ID contains: 2021871


  1. Abstract

    MF-LOGP, a new method for determining single-component octanol–water partition coefficients (LogP), is presented; it uses molecular formula as the only input. Octanol–water partition coefficients are useful in many applications, ranging from environmental fate to drug delivery. Currently, partition coefficients are either experimentally measured or predicted as a function of structural fragments, topological descriptors, or thermodynamic properties known or calculated from precise molecular structures. The MF-LOGP method presented here differs from classical methods in that it does not require any structural information and uses molecular formula as the sole model input. MF-LOGP is therefore useful in situations where the structure is unknown or where a low-dimensional, easily automatable, and computationally inexpensive calculation is required. MF-LOGP is a random forest algorithm that is trained and tested on 15,377 data points, using 10 features derived from the molecular formula to make LogP predictions. Using an independent validation set of 2713 data points, MF-LOGP was found to have an average RMSE = 0.77 ± 0.007, MAE = 0.52 ± 0.003, and R² = 0.83 ± 0.003. This performance fell within the spectrum of performances reported in the published literature for conventional higher-dimensional models (RMSE = 0.42–1.54, MAE = 0.09–1.07, and R² = 0.32–0.95). Compared with existing models, MF-LOGP requires a maximum of ten features and no structural information, thereby providing a practical and yet predictive tool. The development of MF-LOGP lays the groundwork for developing more physical prediction models leveraging big data analytical methods or complex multicomponent mixtures.
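    As a rough illustration of what "features derived from the molecular formula" and the reported error metrics could look like in code, the sketch below parses a flat formula into element counts and computes RMSE, MAE, and R². The abstract does not list the ten features MF-LOGP uses, so the element list in `featurize` is a hypothetical stand-in, and the function names are assumptions made for this example only.

```python
import math
import re
from collections import Counter


def element_counts(formula):
    """Count atoms per element in a flat formula such as 'C8H10N4O2'.

    Handles only simple formulas (no parentheses, charges, or isotopes).
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def featurize(formula,
              elements=("C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I")):
    """Map a formula to a fixed-length vector of element counts.

    The ten elements are an assumption for illustration; they are not
    taken from the MF-LOGP paper.
    """
    counts = element_counts(formula)
    return [counts.get(e, 0) for e in elements]


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(featurize("C8H10N4O2"))  # [8, 10, 4, 2, 0, 0, 0, 0, 0, 0]
```

    A vector like this could be passed to any regression model; the random forest itself is omitted here to keep the sketch dependency-free.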

  2. Multi-label classification (MLC), which assigns multiple labels to each instance, is crucial to domains from computer vision to text mining. Conventional methods for MLC require huge amounts of labeled data to capture complex dependencies between labels. However, such labeled datasets are expensive, or even impossible, to acquire. Worse yet, these pre-trained MLC models can only be used for the particular label set covered in the training data. Despite this severe limitation, few methods exist for expanding the set of labels predicted by pre-trained models. Instead, we acquire vast amounts of new labeled data and retrain a new model from scratch. Here, we propose combining the knowledge from multiple pre-trained models (teachers) to train a new student model that covers the union of the labels predicted by this set of teachers. This student supports a broader label set than any one of its teachers without using labeled data. We call this new problem knowledge amalgamation for multi-label classification. Our new method, Adaptive KNowledge Transfer (ANT), trains a student by learning from each teacher’s partial knowledge of label dependencies to infer the global dependencies between all labels across the teachers. We show that ANT succeeds in unifying label dependencies among teachers, outperforming five state-of-the-art methods on eight real-world datasets. 
    Free, publicly-accessible full text available June 27, 2024
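    To make the problem setting in record 2 concrete, the sketch below builds soft targets over the union of the teachers' label sets for a single instance by averaging teacher probabilities wherever label sets overlap. This is a naive baseline written for illustration; it is not the ANT method, which the abstract says adapts to each teacher's partial knowledge of label dependencies. The function name and the dictionary input format are assumptions.

```python
def union_soft_targets(teacher_outputs):
    """Combine per-teacher label probabilities into targets over all labels.

    teacher_outputs: list of dicts mapping label -> probability for one
    instance; each teacher covers only its own label subset.
    Labels seen by several teachers receive the mean probability.
    """
    sums, counts = {}, {}
    for output in teacher_outputs:
        for label, prob in output.items():
            sums[label] = sums.get(label, 0.0) + prob
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sorted(sums)}


teachers = [
    {"cat": 0.9, "dog": 0.2},   # teacher A knows {cat, dog}
    {"dog": 0.4, "car": 0.7},   # teacher B knows {dog, car}
]
targets = union_soft_targets(teachers)
print(sorted(targets))  # ['car', 'cat', 'dog']
```

    A student trained against such targets would cover all three labels without any labeled data, which is the setting the abstract describes; plain averaging ignores cross-teacher label dependencies, the gap ANT is designed to close.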
  3. Outlier detection is critical in real-world applications. Because the many existing outlier detection techniques often return different results for the same data set, users must determine which technique is best suited to their task and tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for the cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD, which uses existing unsupervised detection techniques to automatically produce high-quality outliers without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set and extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" to classify outliers, with accuracy comparable to supervised outlier classification models trained with ground truth labels, without having access to the much-needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches by 12 to 97 points (out of 100) in F-1 score.
    Free, publicly-accessible full text available May 26, 2024
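    One simple way to picture "extracting useful signals from combined detection results" in record 3 is to keep only the points on which every detector agrees and treat those as pseudo-labels for a downstream classifier. The sketch below does just that. It is a simplified assumption about how such signals might be derived, not AutoOD's actual procedure, and the function name is hypothetical.

```python
def reliable_pseudo_labels(votes):
    """Keep points on which all detectors agree.

    votes: list of per-detector lists, where 1 = outlier and 0 = inlier,
    all of the same length (one entry per data point).
    Returns a dict mapping point index -> agreed label; points with
    mixed votes are left unlabeled.
    """
    n_points = len(votes[0])
    labels = {}
    for i in range(n_points):
        column = [detector[i] for detector in votes]
        if all(v == 1 for v in column):
            labels[i] = 1
        elif all(v == 0 for v in column):
            labels[i] = 0
    return labels


votes = [
    [1, 0, 0, 1],  # detector 1
    [1, 0, 1, 1],  # detector 2
    [1, 0, 0, 0],  # detector 3
]
print(reliable_pseudo_labels(votes))  # {0: 1, 1: 0}
```

    The agreed-upon points could then serve as training data for a supervised classifier that labels the remaining, contested points.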
  4. Whiteman, N (Ed.)
    Probiotic yeasts are emerging as preventative and therapeutic solutions for disease. Often ingested via cultured foods and beverages, they can survive the harsh conditions of the gastrointestinal tract and adhere to it, where they provide nutrients and inhibit pathogens like Candida albicans. Yet, little is known of the genomic determinants of these beneficial traits. To this end, we have sequenced 2 food-derived probiotic yeast isolates that mitigate fungal infections. We find that the first strain, KTP, is a strain of Saccharomyces cerevisiae within a small clade that lacks any apparent ancestry from common European/wine S. cerevisiae strains. Significantly, we show that S. cerevisiae KTP genes involved in general stress, pH tolerance, and adherence are markedly different from S. cerevisiae S288C but are similar to the commercial probiotic yeast species S. boulardii. This suggests that even though S. cerevisiae KTP and S. boulardii are from different clades, they may achieve a probiotic effect through similar genetic mechanisms. We find that the second strain, ApC, is a strain of Issatchenkia occidentalis, one of the few of this family of yeasts to be sequenced. Because of the dissimilarity of its genome structure and gene organization, we infer that I. occidentalis ApC likely achieves a probiotic effect through a different mechanism than the Saccharomyces strains. Therefore, this work establishes a strong genetic link among probiotic Saccharomycetes, advances the genomics of Issatchenkia yeasts, and indicates that probiotic activity is not monophyletic and that complementary mixtures of probiotics could enhance health benefits beyond a single species.
    Free, publicly-accessible full text available April 27, 2024
  5. Betz, Markus ; Elezzabi, Abdulhakem Y. (Ed.)
  6. Betz, Markus ; Elezzabi, Abdulhakem Y. (Ed.)
  7. Human context recognition (HCR) using sensor data is a crucial task in Context-Aware (CA) applications in domains such as healthcare and security. Supervised machine learning HCR models are trained using smartphone HCR datasets that are scripted or gathered in-the-wild. Scripted datasets are most accurate because of their consistent visit patterns. Supervised machine learning HCR models perform well on scripted datasets but poorly on realistic data. In-the-wild datasets are more realistic, but cause HCR models to perform worse due to data imbalance, missing or incorrect labels, and a wide variety of phone placements and device types. Lab-to-field approaches learn a robust data representation from a scripted, high-fidelity dataset, which is then used for enhancing performance on a noisy, in-the-wild dataset with similar labels. This research introduces Triplet-based Domain Adaptation for Context REcognition (Triple-DARE), a lab-to-field neural network method that combines three unique loss functions to enhance intra-class compactness and inter-class separation within the embedding space of multi-labeled datasets: (1) domain alignment loss in order to learn domain-invariant embeddings; (2) classification loss to preserve task-discriminative features; and (3) joint fusion triplet loss. Rigorous evaluations showed that Triple-DARE achieved 6.3% and 4.5% higher F1-score and classification, respectively, than state-of-the-art HCR baselines and outperformed non-adaptive HCR models by 44.6% and 10.7%, respectively. 
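    The abstract in record 7 does not define its "joint fusion triplet loss", so the sketch below shows only the standard triplet margin loss that such losses build on: it pulls an anchor embedding toward a same-class example and pushes it away from a different-class example. The Euclidean distance, the margin value, and the function names are assumptions for illustration, not details from Triple-DARE.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Well-separated triplet: positive is close, negative is far, so no penalty.
print(triplet_loss([0, 0], [0, 1], [3, 4]))  # 0.0
# Violating triplet: positive is far, negative is close, so a large penalty.
print(triplet_loss([0, 0], [3, 4], [0, 1]))  # 5.0
```

    Minimizing this quantity over many triplets is what encourages the intra-class compactness and inter-class separation that the abstract mentions.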
  8. We use transient optical absorption and time-resolved terahertz (THz) spectroscopy to investigate photoexcitations in Ti₃C₂, Mo₂Ti₂C₃, and Nb₂C. Measurements reveal pronounced plasmonic effects. Monitoring them provides insights into thermal relaxation processes and low thermal conductivity.

     