Title: Optimized Bags of Artificial Neural Networks Can Predict the Viability of Organisms Exposed to Nanoparticles
Prediction of organismal viability upon exposure to a nanoparticle in varying environments, as fully specified at the molecular scale, has emerged as a useful figure of merit in the design of engineered nanoparticles. We build on our earlier finding that a bag of artificial neural networks (ANNs) can provide such a prediction when such machines are trained with a relatively small data set (with ca. 200 examples). Therein, viabilities were predicted by consensus using the weighted means of the predictions from the bags. Here, we confirm the accuracy and precision of the prediction of nanoparticle viabilities using an optimized bag of ANNs over sets of data examples that had not previously been used in the training and validation process. We also introduce the viability strip, rather than a single value, as the prediction and construct it from the viability probability distribution of an ensemble of ANNs compatible with the data set. Specifically, the ensemble consists of the ANNs arising from subsets of the data set corresponding to different splittings between training and validation, and the different bags (k-folds). A (k−1)/k machine uses a single partition (or bag) of k − 1 ANNs each trained on 1/k of the data to obtain a consensus prediction, and a k-bag machine quorum samples the k possible (k−1)/k machines available for a given partition. We find that with increasing k in the k-bag or (k−1)/k machines, the viability strips become more normally distributed and their predictions become more precise. Benchmark comparisons between ensembles of 4-bag machines and 3/4 fraction machines suggest that the 3/4 fraction machine has similar accuracy while overcoming some of the challenges arising from divergent ANNs in the 4-bag machines.
Award ID(s):
2001611
PAR ID:
10499724
Author(s) / Creator(s):
; ;
Publisher / Repository:
American Chemical Society
Date Published:
Journal Name:
The Journal of Physical Chemistry A
ISSN:
1089-5639
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a novel methodology for anomaly detection in time-series data. The method uses persistence diagrams and bottleneck distances to identify anomalies. Specifically, we generate multiple predictors by randomly bagging the data (reference bags), then for each data point replacing the data point with a randomly chosen point in each bag (modified bags). The predictors are then the set of bottleneck distances for the reference/modified bag pairs. We prove the stability of the predictors as the number of bags increases. We apply our methodology to traffic data and measure the performance for identifying known incidents.
  2. Bouyer, Patricia; Srinivasan, Srikanth (Ed.)
    In recent years the framework of learning from label proportions (LLP) has been gaining importance in machine learning. In this setting, the training examples are aggregated into subsets or bags and only the average label per bag is available for learning an example-level predictor. This generalizes traditional PAC learning which is the special case of unit-sized bags. The computational learning aspects of LLP were studied in recent works [R. Saket, 2021; R. Saket, 2022] which showed algorithms and hardness for learning halfspaces in the LLP setting. In this work we focus on the intractability of LLP learning Boolean functions. Our first result shows that given a collection of bags of size at most 2 which are consistent with an OR function, it is NP-hard to find a CNF of constantly many clauses which satisfies any constant-fraction of the bags. This is in contrast with the work of [R. Saket, 2021] which gave a (2/5)-approximation for learning ORs using a halfspace. Thus, our result provides a separation between constant clause CNFs and halfspaces as hypotheses for LLP learning ORs. Next, we prove the hardness of satisfying more than 1/2 + o(1) fraction of such bags using a t-DNF (i.e. DNF where each term has ≤ t literals) for any constant t. In usual PAC learning such a hardness was known [S. Khot and R. Saket, 2008] only for learning noisy ORs. We also study the learnability of parities and show that it is NP-hard to satisfy more than (q/2^{q-1} + o(1))-fraction of q-sized bags which are consistent with a parity using a parity, while a random parity based algorithm achieves a (1/2^{q-2})-approximation. 
  3. In batch steganography, the sender communicates a secret message by hiding it in a bag of cover objects. The adversary performs the so-called pooled steganalysis in that she inspects the entire bag to detect the presence of secrets. This is typically realized by using a detector trained to detect secrets within a single object, applying it to all objects in the bag, and feeding the detector outputs to a pooling function to obtain the final detection statistic. This paper deals with the problem of building the pooler while keeping in mind that the Warden will need to be able to detect steganography in variable size bags carrying variable payload. We propose a flexible machine learning solution to this challenge in the form of a Transformer Encoder Pooler, which is easily trained to be agnostic to the bag size and payload and offers a better detection accuracy than previously proposed poolers.
  4. Monte Carlo simulations for photon transport are commonly used to predict the spectral response, including reflectance, absorptance, and transmittance in nanoparticle laden media, while the computational cost could be high. In this study, we demonstrate a general purpose fully connected neural network approach, trained with Monte Carlo simulations, to accurately predict the spectral response while dramatically accelerating the computational speed. Monte Carlo simulations are first used to generate a training set with a wide range of optical properties covering dielectrics, semiconductors, and metals. Each input is normalized, with the scattering and absorption coefficients normalized on a logarithmic scale to accelerate the training process and reduce error. A deep neural network with ReLU activation is trained on this dataset with the optical properties and medium thickness as the inputs, and diffuse reflectance, absorptance, and transmittance as the outputs. The neural network is validated on a validation set with randomized optical properties, as well as nanoparticle medium examples including barium sulfate, aluminum, and silicon. The error in the spectral response predictions is within 1% which is sufficient for many applications, while the speedup is 1–3 orders of magnitude. This machine learning accelerated approach can allow for high throughput screening, optimization, or real-time monitoring of nanoparticle media's spectral response.
  5. Fadlelmola, Faisal Mohamed (Ed.)
    Sickle cell disease, a genetic disorder affecting a sizeable global demographic, manifests in sickle red blood cells (sRBCs) with altered shape and biomechanics. sRBCs show heightened adhesive interactions with inflamed endothelium, triggering painful vascular occlusion events. Numerous studies employ microfluidic-assay-based monitoring tools to quantify characteristics of adhered sRBCs from high resolution channel images. The current image analysis workflow relies on detailed morphological characterization and cell counting by a specially trained worker. This is time and labor intensive, and prone to user bias artifacts. Here we establish a morphology based classification scheme to identify two naturally arising sRBC subpopulations, deformable and non-deformable sRBCs, utilizing novel visual markers that link to underlying cell biomechanical properties and hold promise for clinically relevant insights. We then set up a standardized, reproducible, and fully automated image analysis workflow designed to carry out this classification. This relies on a two part deep neural network architecture that works in tandem for segmentation of channel images and classification of adhered cells into subtypes. Network training utilized an extensive data set of images generated by the SCD BioChip, a microfluidic assay which injects clinical whole blood samples into protein-functionalized microchannels, mimicking physiological conditions in the microvasculature. Here we carried out the assay with the sub-endothelial protein laminin. The machine learning approach segmented the resulting channel images with 99.1±0.3% mean IoU on the validation set across 5 k-folds, classified detected sRBCs with 96.0±0.3% mean accuracy on the validation set across 5 k-folds, and matched trained personnel in overall characterization of whole channel images with R² = 0.992, 0.987 and 0.834 for total, deformable and non-deformable sRBC counts respectively. Average analysis time per channel image was also improved by two orders of magnitude (∼2 minutes vs ∼2-3 hours) over manual characterization. Finally, the network results show an order of magnitude less variance in counts on repeat trials than humans. This kind of standardization is a prerequisite for the viability of any diagnostic technology, making our system suitable for affordable and high throughput disease monitoring.