Surface-enhanced Raman spectroscopy (SERS) is an attractive method for biochemical sensing due to its potential for single-molecule sensitivity and the prospect of DNA composition analysis. In this manuscript, we leverage the metal-specific chemical enhancement effect to detect differences in the SERS spectra of 200-base-long single-stranded DNA (ssDNA) molecules adsorbed on gold or silver nanorod substrates, and then develop and train linear regression and neural network models to predict ssDNA composition. Our results indicate that hosting a given adsorbed molecule on substrates of different metals leads to distinct SERS spectra, which allows us to probe metal-molecule interactions under distinct chemical enhancement regimes. Leveraging this difference by combining spectra from different metals as the input to principal component analysis (PCA) and neural network (NN) models significantly lowers the detection errors compared with manual feature selection, as well as compared with using data from a single metal. Furthermore, we show that the NN model provides superior performance in the presence of complex noise and data dispersion factors that affect SERS signals collected from metal substrates fabricated on different days.
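The idea of combining spectra from two metals can be illustrated with a minimal sketch. It assumes that each sample contributes one gold and one silver spectrum on the same wavenumber grid, that the two are simply concatenated into one feature vector before PCA, and that a linear regression on the leading principal-component scores predicts a composition value. The band positions, noise level, and sample counts are hypothetical synthetic choices; this is not the authors' pipeline.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return column means and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, axes):
    """Project samples onto the retained principal axes (PC scores)."""
    return (X - mean) @ axes.T

def fit_linear(Z, y):
    """Ordinary least squares with an intercept column."""
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(Z, coef):
    return np.hstack([Z, np.ones((Z.shape[0], 1))]) @ coef

# Hypothetical synthetic data: band intensities depend on a composition value y.
rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 60, 200
y = rng.uniform(0.0, 1.0, n_samples)
k = np.arange(n_wavenumbers)
band_au = np.exp(-0.5 * ((k - 70) / 5.0) ** 2)   # hypothetical gold-specific band
band_ag = np.exp(-0.5 * ((k - 130) / 5.0) ** 2)  # hypothetical silver-specific band
spectra_au = np.outer(y, band_au) + 0.05 * rng.standard_normal((n_samples, n_wavenumbers))
spectra_ag = np.outer(1 - y, band_ag) + 0.05 * rng.standard_normal((n_samples, n_wavenumbers))

# Combine metals by concatenating both spectra of each sample.
X = np.hstack([spectra_au, spectra_ag])

train, test = slice(0, 45), slice(45, None)
mean, axes = pca_fit(X[train], n_components=3)
coef = fit_linear(pca_transform(X[train], mean, axes), y[train])
y_pred = predict_linear(pca_transform(X[test], mean, axes), coef)
rmse = float(np.sqrt(np.mean((y_pred - y[test]) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

Fitting the PCA axes on the training rows only keeps the held-out samples from leaking into the projection.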
SERS-based ssDNA composition analysis with inhomogeneous peak broadening and reservoir computing
Surface-enhanced Raman spectroscopy employed in conjunction with post-processing machine learning methods is a promising technique for effective data analysis, enhancing the molecular and chemical composition analysis of information-rich DNA molecules. In this work, we report room-temperature inhomogeneous broadening as a function of increasing adenine concentration and use this feature to develop one- and two-dimensional chemical composition classification models of 200-base-long single-stranded DNA sequences. We then develop a reservoir computing scheme for classifying the chemical composition of the same molecules and demonstrate enhanced performance that does not rely on manual feature identification.
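Reservoir computing trains only a linear readout on the states of a fixed random dynamical system, so no spectral features have to be picked by hand. The sketch below is a generic echo state network, assuming each spectrum is fed to the reservoir one wavenumber at a time and the time-averaged reservoir state serves as the feature vector. The reservoir size, leak rate, spectral radius, ridge penalty, and the synthetic narrow-versus-broadened bands are all hypothetical choices, not the scheme reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed random reservoir (echo state network); only the readout is trained.
n_reservoir = 100
W_in = rng.uniform(-0.5, 0.5, n_reservoir)        # input weights for a scalar input
W = rng.standard_normal((n_reservoir, n_reservoir))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius 0.9

def reservoir_features(spectrum, leak=0.3):
    """Drive the reservoir with the spectrum point by point and return
    the time-averaged reservoir state as the feature vector."""
    x = np.zeros(n_reservoir)
    states = []
    for u in spectrum:
        x = (1 - leak) * x + leak * np.tanh(W_in * u + W @ x)
        states.append(x)
    return np.mean(states, axis=0)

def make_spectrum(width):
    """Hypothetical single band; a larger width mimics inhomogeneous broadening."""
    k = np.arange(120)
    return np.exp(-0.5 * ((k - 60) / width) ** 2) + 0.02 * rng.standard_normal(k.size)

# Two hypothetical classes: narrow band (class 0) vs broadened band (class 1).
labels = np.array([0, 1] * 40)
spectra = [make_spectrum(4.0 if c == 0 else 8.0) for c in labels]
F = np.array([reservoir_features(s) for s in spectra])
F = np.hstack([F, np.ones((len(F), 1))])  # bias column
Y = np.eye(2)[labels]                     # one-hot targets

# Ridge-regression readout trained on the first 60 samples.
train, test = slice(0, 60), slice(60, None)
ridge = 1e-3
A = F[train]
W_out = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y[train])
predicted = np.argmax(F[test] @ W_out, axis=1)
accuracy = float(np.mean(predicted == labels[test]))
print(f"test accuracy: {accuracy:.2f}")
```

A ridge penalty is used because the readout has more reservoir features (101) than training samples (60).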
- PAR ID: 10594729
- Publisher / Repository: American Institute of Physics
- Date Published:
- Journal Name: Applied Physics Letters
- Volume: 120
- Issue: 2
- ISSN: 0003-6951
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
- Surface-enhanced Raman scattering (SERS) results in a tremendous increase in the Raman scattering cross section of molecules adsorbed on plasmonic metals and is influenced by numerous physicochemical factors, such as the geometry and optical properties of the metal surface, the orientation of chemisorbed molecules, and the chemical environment. While SERS holds promise for single-molecule sensitivity and optical sensing of DNA sequences, a more detailed understanding of the rich physicochemical interplay between these factors is needed to enhance the predictive power of existing and future SERS-based DNA sensing platforms. In this work, we report experimental results indicating that the SERS spectra of adsorbed single-stranded DNA (ssDNA) isomers depend on the order in which individual bases appear in the 3-base-long ssDNA, owing to intramolecular interactions between DNA bases. Furthermore, we experimentally demonstrate that the effect holds under more general conditions, namely when the molecules do not experience chemical enhancement due to the resonant charge-transfer effect, and also under standard Raman scattering without electromagnetic or chemical enhancement. Our numerical simulations qualitatively support the experimental findings and indicate that base permutation modifies both the Raman and the chemically enhanced Raman spectra.
- DNA mechanical properties play a critical role in every aspect of DNA-dependent biological processes. Recently, a high-throughput assay named loop-seq was developed to quantify the intrinsic bendability of a massive number of DNA fragments simultaneously. Using the loop-seq data, we develop a software tool, DNAcycP, based on a deep-learning approach for intrinsic DNA cyclizability prediction. We demonstrate that DNAcycP predicts intrinsic DNA cyclizability with high fidelity compared with the experimental data. Using an independent dataset from in vitro selection for enrichment of loopable sequences, we further verified that the predicted cyclizability score, termed the C-score, distinguishes well between DNA fragments with different loopability. We applied DNAcycP to multiple species and compared the C-scores with available high-resolution chemical nucleosome maps. Our analyses showed that both the yeast and mouse genomes share a conserved feature of high DNA bendability spanning nucleosome dyads. Additionally, we extended our analysis to transcription factor binding sites and, surprisingly, found that cyclizability is substantially elevated at CTCF binding sites in the mouse genome. We further demonstrate that this distinct mechanical property is conserved across mammalian species and is inherent to the CTCF-binding DNA motif.
- A quasi‐one‐dimensional organic semiconductor, hepta(p‐phenylene vinylene) (HPV), was incorporated into a DNA tensegrity triangle motif using a covalent strategy. 3D arrays were self‐assembled from an HPV‐DNA pseudo‐rhombohedron edge by rational design and characterized by X‐ray diffraction. Templated by the DNA motif, HPV molecules exist as single‐molecule fluorescence emitters at a concentration of 8 mM within the crystal lattice. The anisotropic fluorescence emission from HPV‐DNA crystals indicates that the HPV molecules are well aligned in the macroscopic 3D DNA lattices. This strategy can be used to develop sophisticated DNA-based nanodevices and functional materials by addressing functional components with molecular accuracy.
- Every feature, of any data type, has a categorical nature that is revealed through its histogram. For any pair of features, a contingency table framed by their two histograms yields directional and mutual associations based on rescaled conditional Shannon entropies. The heatmap of the mutual association matrix of all features becomes a roadmap showing which features are highly associated with which other features. We develop a data analysis paradigm, called categorical exploratory data analysis (CEDA), with this heatmap as its foundation. CEDA is shown to provide new resolutions for two topics: multiclass classification (MCC) with a single categorical response variable, and response manifold analytics (RMA) with multiple response variables. In both topics, we compute visible and explainable information content with multiscale and heterogeneous deterministic and stochastic structures. MCC involves all feature-group-specific mixing geometries of labeled high-dimensional point clouds. For each identified feature group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. A chain of complementary feature groups then offers a collection of mixing-geometry pattern categories viewed from multiple perspectives. RMA studies a system’s regulating principles via multidimensional manifolds jointly constituted by the targeted response features and selected major covariate features. This manifold is marked with categorical localities reflecting major effects, and diverse minor effects are checked and identified across all localities for heterogeneity. In both MCC and RMA, the data's information content is computed, with predictive inferences obtained as by-products. We illustrate CEDA using the Iris data and demonstrate its applications on data taken from the PITCHf/x database.
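The entropy-based association in the CEDA abstract can be sketched in a few lines. The abstract does not give the exact rescaling, so this sketch assumes that the directional association from X to Y is 1 - H(Y|X)/H(Y), that the mutual association is the mean of the two directions, and that histograms use equal-width bins; all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in nats) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def conditional_entropy(x, y):
    """H(Y | X) from the contingency table of two categorical sequences."""
    table = Counter(zip(x, y))
    n = len(x)
    h = 0.0
    for xv, nx in Counter(x).items():
        # Row of the contingency table for category xv of X.
        row = [c for (a, _), c in table.items() if a == xv]
        h += nx / n * entropy(row)
    return h

def directional_association(x, y):
    """Hypothetical rescaling: share of H(Y) explained by X, in [0, 1]."""
    hy = entropy(list(Counter(y).values()))
    return 0.0 if hy == 0 else 1.0 - conditional_entropy(x, y) / hy

def mutual_association(x, y):
    """Hypothetical symmetric score: mean of the two directional associations."""
    return 0.5 * (directional_association(x, y) + directional_association(y, x))

def to_bins(values, n_bins):
    """Histogram-style categorization of a numeric feature into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

xs = [0.1, 0.4, 1.2, 1.5, 2.3, 2.9]
ys = ["a", "a", "b", "b", "c", "c"]
bx = to_bins(xs, 3)
print(bx)                               # [0, 0, 1, 1, 2, 2]
print(directional_association(bx, ys))  # 1.0
```

Here the binned feature fully determines the label, so the directional association is 1; for independent features such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` it drops to 0.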
