Title: High Fidelity Fingerprint Generation: Quality, Uniqueness, And Privacy
In this work, we utilize progressive growth-based Generative Adversarial Networks (GANs) to develop the Clarkson Fingerprint Generator (CFG). We demonstrate that the CFG is capable of generating realistic, high-fidelity, 512×512-pixel, full, plain-impression fingerprints. Our results suggest that the fingerprints generated by the CFG are unique, diverse, and resemble the training dataset in terms of minutiae configuration and quality, while not revealing the underlying identities of the training data. We make the pre-trained CFG model and the synthetically generated dataset publicly available at https://github.com/keivanB/Clarkson_Finger_Gen.
Award ID(s):
1650503
NSF-PAR ID:
10318826
Journal Name:
2021 IEEE International Conference on Image Processing (ICIP)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Limited data availability is a challenging problem in the latent fingerprint domain. Synthetically generated fingerprints are vital for training data-hungry neural network-based algorithms. Conventional methods distort clean fingerprints to generate synthetic latent fingerprints. We propose a simple and effective approach using style transfer and image blending to synthesize realistic latent fingerprints. Our evaluation criteria and experiments demonstrate that the generated synthetic latent fingerprints preserve the identity information from the input contact-based fingerprints while possessing characteristics similar to those of real latent fingerprints. Additionally, we show that the generated fingerprints exhibit several qualities and styles, suggesting that the proposed method can generate multiple samples from a single fingerprint.
  2. The purpose of this study was to train and validate a deep learning approach to detect microscale streetscape features related to pedestrian physical activity. This work innovates by combining computer vision techniques with Google Street View (GSV) images to overcome impediments to conducting audits (e.g., time, safety, and expert labor cost). The EfficientNetB5 architecture was used to build deep learning models for eight microscale features guided by the Microscale Audit of Pedestrian Streetscapes Mini tool: sidewalks, sidewalk buffers, curb cuts, zebra and line crosswalks, walk signals, bike symbols, and streetlights. We used a train–correct loop, whereby models were trained on a training dataset, evaluated using a separate validation dataset, and trained further until acceptable performance metrics were achieved. Further, we used trained models to audit participant (N = 512) neighborhoods in the WalkIT Arizona trial. Correlations were explored between microscale features and GIS-measured and participant-reported neighborhood macroscale walkability. Classifier precision, recall, and overall accuracy were all above 84%. Total microscale was associated with overall macroscale walkability (r = 0.30, p < 0.001). Positive associations were found between model-detected and self-reported sidewalks (r = 0.41, p < 0.001) and sidewalk buffers (r = 0.26, p < 0.001). The computer vision model results suggest an alternative to trained human raters, allowing for audits of hundreds or thousands of neighborhoods for population surveillance or hypothesis testing.
  3. This dataset contains sequence information, three-dimensional structures (from AlphaFold2 model), and substrate classification labels for 358 short-chain dehydrogenase/reductases (SDRs) and 953 S-adenosylmethionine dependent methyltransferases (SAM-MTases).

    The amino acid sequences of these enzymes were obtained from the UniProt Knowledgebase (https://www.uniprot.org). The sets of proteins were obtained by querying with the InterPro protein family/domain identifiers corresponding to each family: IPR002347 (SDRs) and IPR029063 (SAM-MTases). The query results were filtered by UniProt annotation score, keeping only entries with a score above 4 out of 5, and deduplicated by exact sequence match.

    The structures were predicted with the publicly available AlphaFold2 protein structure predictor (J. Jumper et al., Nature, 2021, 596, 583) using the ColabFold notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.1-premultimer/batch/AlphaFold2_batch.ipynb, M. Mirdita, S. Ovchinnikov, M. Steinegger, Nature Meth., 2022, 19, 679, https://github.com/sokrypton/ColabFold). The model settings used were msa_model = MMSeq2(Uniref+Environmental), num_models = 1, use_amber = False, use_templates = True, do_not_overwrite_results = True. The resulting PDB structures are included as ZIP archives.

    The classification labels were obtained from the substrate and product annotations of the enzyme UniProtKB records. Two approaches were used: substrate clustering based on molecular fingerprints and manual substrate type classification. For the substrate clustering, Morgan fingerprints with radius = 3 were generated using RDKit (https://rdkit.org) for all enzymatic substrates and products with known structures (excluding cofactors). The fingerprints were projected onto two-dimensional space using the UMAP algorithm (L. McInnes, J. Healy, 2018, arXiv 1802.03426) with the Jaccard metric and then clustered using k-means. This procedure generated 9 clusters for SDR substrates and 13 clusters for SAM-MTases. The SMILES representations of the substrates are listed in the SDR_substrates_to_cluster_map_2DIMUMAP.csv and SAM_substrates_to_13clusters_map_2DIMUMAP.csv files.


    The following manually defined classification tasks are included for SDRs: NADP/NAD cofactor classification; phenol substrate, sterol substrate, coenzyme A (CoA) substrate. For SAM-MTases, the manually defined classification tasks are: biopolymer (protein/RNA/DNA) vs. small molecule substrate, phenol substrates, sterol substrates, nitrogen heterocycle substrates. The SMARTS strings used to define the substrate classes are listed in substructure_search_SMARTS.docx.
  4. Cosmological simulations of galaxy formation are limited by finite computational resources. We draw from the ongoing rapid advances in artificial intelligence (AI; specifically deep learning) to address this problem. Neural networks have been developed to learn from high-resolution (HR) image data and then make accurate superresolution (SR) versions of different low-resolution (LR) images. We apply such techniques to LR cosmological N-body simulations, generating SR versions. Specifically, we are able to enhance the simulation resolution by generating 512 times more particles and predicting their displacements from the initial positions. Therefore, our results can be viewed as simulation realizations themselves, rather than projections, e.g., to their density fields. Furthermore, the generation process is stochastic, enabling us to sample the small-scale modes conditioning on the large-scale environment. Our model learns from only 16 pairs of small-volume LR-HR simulations and is then able to generate SR simulations that successfully reproduce the HR matter power spectrum to percent level up to $16\,h^{-1}\,\mathrm{Mpc}$ and the HR halo mass function to within 10% down to $10^{11}\,M_\odot$. We successfully deploy the model in a box 1,000 times larger than the training simulation box, showing that high-resolution mock surveys can be generated rapidly. We conclude that AI assistance has the potential to revolutionize modeling of small-scale galaxy-formation physics in large cosmological volumes.
  5. This paper presents a new approach for predicting thermodynamic properties of perovskites that harnesses deep learning and crystal structure fingerprinting based on Hirshfeld surface analysis. It is demonstrated that convolutional neural network methods capture critical features embedded in two-dimensional Hirshfeld surface fingerprints that enable a quantitative assessment of the formation energy of perovskites. Building on our recent work on lattice parameter prediction from Hirshfeld surface calculations, we show how transfer learning can be used to speed up the training of the neural network, allowing multiple properties to be trained using the same feature extraction layers. We also predict formation energies for various perovskite polymorphs, and our predictions are found to give generally improved performance over a well-established graph network method, but with the methods better suited to different types of datasets. Analysis of the structure types within the dataset reveals the Hirshfeld surface-based method to excel for the less symmetric and similar structures, while the graph network performs better for very symmetric and similar structures. 
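
The substrate-clustering step described in item 3 above compares molecular fingerprints using the Jaccard metric. The following is a minimal sketch of that distance only, assuming each fingerprint is represented as the set of its "on" bit indices. The function names and the toy fingerprints are hypothetical and are not taken from the dataset; the dataset's own fingerprints were generated with RDKit and projected with UMAP, neither of which is used here.

```python
from typing import FrozenSet, List, Sequence


def jaccard_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard distance between two fingerprints given as sets of on-bit indices.

    Defined as 1 - |a & b| / |a | b|. Two empty fingerprints are treated
    as identical (distance 0.0), which is a convention chosen for this sketch.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(fps: Sequence[FrozenSet[int]]) -> List[List[float]]:
    """Full symmetric matrix of Jaccard distances for a list of fingerprints."""
    n = len(fps)
    return [[jaccard_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy fingerprints (on-bit indices), for illustration only.
toy_fingerprints = [
    frozenset({1, 5, 9, 12}),   # "substrate A"
    frozenset({1, 5, 9, 40}),   # "substrate B": shares 3 of 5 distinct bits with A
    frozenset({7, 22, 31}),     # "substrate C": shares no bits with A or B
]

distance_matrix = pairwise_distances(toy_fingerprints)

if __name__ == "__main__":
    for row in distance_matrix:
        print(["%.2f" % d for d in row])
```

A matrix like this could, in principle, be handed to a dimensionality-reduction or clustering routine that accepts precomputed distances; whether the dataset authors did so in this way is not stated in the record.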