Title: Fingerprint Identification of Generative Models Using a MultiFormer Ensemble Approach
In the ever-changing realm of medical image processing, ImageCLEF brought a new dimension with the Identifying GAN Fingerprint task, catering to the advancement of visual media analysis. This year, the organizers presented the task of detecting training-image fingerprints to control the quality of synthetic images for the second time (as Task 1) and introduced the task of detecting generative-model fingerprints for the first time (as Task 2). Both tasks aim to discern these fingerprints from images, covering both the real training images and the generative models. The data comprised 3D CT images of lung tuberculosis patients, with the development set featuring a mix of real and generated images alongside a separate test set. Our team 'CSMorgan' contributed several approaches, leveraging multiformer networks (combining features extracted with BLIP2 and DINOv2), additive and mode thresholding techniques, and late fusion methodologies, bolstered by morphological operations. In Task 1, our best performance was attained by a late fusion-based reranking strategy, achieving an F1 score of 0.51, with the additive average thresholding approach closely following at 0.504. In Task 2, our multiformer model achieved an Adjusted Rand Index (ARI) score of 0.90, and a fine-tuned variant of the multiformer yielded 0.8137. These outcomes underscore the efficacy of the multiformer-based approach in accurately discerning both real-image and generative-model fingerprints.
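The working note itself contains the implementation details; as a rough illustration of the fingerprint-grouping idea described above, the following Python sketch fuses pre-extracted BLIP2 and DINOv2 embeddings, groups images with a mean-similarity threshold, and scores the grouping with the Adjusted Rand Index. The embedding arrays, the thresholding rule, and the greedy grouping loop are illustrative assumptions, not the authors' code.

# Minimal sketch (not the authors' code): group images by generative-model
# fingerprint from pre-extracted BLIP-2 and DINOv2 embeddings, then score the
# grouping with the Adjusted Rand Index. Shapes and thresholds are placeholders.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Stand-ins for features a BLIP-2 and a DINOv2 backbone would produce per image.
blip2_feats = rng.normal(size=(100, 768))    # hypothetical BLIP-2 embeddings
dinov2_feats = rng.normal(size=(100, 1024))  # hypothetical DINOv2 embeddings

# "Multiformer" fusion: concatenate the two embedding spaces per image.
fused = np.concatenate([blip2_feats, dinov2_feats], axis=1)

# Additive-average thresholding (assumed form): treat a pair of images as sharing
# a fingerprint when their cosine similarity exceeds the mean pairwise similarity.
sim = cosine_similarity(fused)
adjacency = sim > sim.mean()

# Greedy grouping of images connected in the thresholded similarity graph.
labels = -np.ones(len(fused), dtype=int)
cluster = 0
for i in range(len(fused)):
    if labels[i] == -1:
        labels[adjacency[i]] = cluster
        labels[i] = cluster
        cluster += 1

# Task 2 is scored with the Adjusted Rand Index against the true model identities.
true_model_ids = rng.integers(0, 5, size=len(fused))  # placeholder ground truth
print("ARI:", adjusted_rand_score(true_model_ids, labels))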
Award ID(s):
2131207
PAR ID:
10561091
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Faggioli, G; Ferro, N; Galuščáková, P; Herrera, A
Publisher / Repository:
https://ceur-ws.org/Vol-3497/
Date Published:
Journal Name:
CEUR workshop proceedings
ISSN:
1613-0073
Format(s):
Medium: X
Location:
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024) Grenoble, France, 9-12 September, 2024, https://ceur-ws.org/Vol-3740/paper-147.pdf
Sponsoring Org:
National Science Foundation
More Like this
  1. Faggioli, G; Ferro, N; Galuščáková, P; de Herrera, A (Ed.)
    This working note documents the participation of CS_Morgan in the ImageCLEFmedical 2024 Caption subtasks, focusing on Caption Prediction and Concept Detection challenges. The primary objectives included training, validating, and testing multimodal Artificial Intelligence (AI) models intended to automate the process of generating captions and identifying multi-concepts of radiology images. The dataset used is a subset of the Radiology Objects in COntext version 2 (ROCOv2) dataset and contains image-caption pairs and corresponding Unified Medical Language System (UMLS) concepts. To address the caption prediction challenge, different variants of the Large Language and Vision Assistant (LLaVA) models were experimented with, tailoring them for the medical domain. Additionally, a lightweight Large Multimodal Model (LMM), and MoonDream2, a small Vision Language Model (VLM), were explored. The former is the instruct variant of the Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS (IDEFICS) 9B obtained through quantization. Besides LMMs, conventional encoder-decoder models like Vision Generative Pre-trained Transformer 2 (visionGPT2) and Convolutional Neural Network-Transformer (CNN-Transformer) architectures were considered. Consequently, this enabled 10 submissions for the caption prediction task, with the first submission of LLaVA 1.6 on the Mistral 7B weights securing the 2nd position among the participants. This model was adapted using 40.1M parameters and achieved the best performance on the test data across the performance metrics of BERTScore (0.628059), ROUGE (0.250801), BLEU-1 (0.209298), BLEURT (0.317385), METEOR (0.092682), CIDEr (0.245029), and RefCLIPScore (0.815534). For the concept detection task, our single submission based on the ConvMixer architecture—a hybrid approach leveraging CNN and Transformer advantages—ranked 9th with an F1-score of 0.107645. Overall, the evaluations on the test data for the caption prediction task submissions suggest that LMMs, quantized LMMs, and small VLMs, when adapted and selectively fine-tuned using fewer parameters, have ample potential for understanding medical concepts present in images. 
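The entry above ranks caption submissions primarily by BERTScore alongside ROUGE and other metrics. As a minimal, hedged sketch of how such caption metrics can be computed (not the official evaluation script), the Hugging Face evaluate library can be used as follows; the captions shown are invented placeholders.

# Minimal sketch: score predicted captions against references with BERTScore and
# ROUGE via the Hugging Face `evaluate` library. Captions are placeholders.
import evaluate

predictions = ["chest x-ray showing left lower lobe consolidation"]                 # placeholder model output
references = ["frontal chest radiograph with consolidation in the left lower lobe"]  # placeholder reference

bertscore = evaluate.load("bertscore")
rouge = evaluate.load("rouge")

bs = bertscore.compute(predictions=predictions, references=references, lang="en")
rg = rouge.compute(predictions=predictions, references=references)

print("BERTScore F1:", sum(bs["f1"]) / len(bs["f1"]))
print("ROUGE-1:", rg["rouge1"])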
  2. In this paper, we introduce a novel data augmentation methodology based on Conditional Progressive Generative Adversarial Networks (CPGAN) to generate diverse black hole (BH) images, accounting for variations in spin and electron temperature prescriptions. These generated images are valuable resources for training deep learning algorithms to accurately estimate black hole parameters from observational data. Our model can generate BH images for any spin value within the range of [−1, 1], given an electron temperature distribution. To validate the effectiveness of our approach, we employ a convolutional neural network to predict the BH spin using both the GRMHD images and the images generated by our proposed model. Our results demonstrate a significant performance improvement when training is conducted with the augmented data set while testing is performed using GRMHD simulated data, as indicated by the high R2 score. Consequently, we propose that GANs can be employed as cost-effective models for black hole image generation and reliably augment training data sets for other parametrization algorithms.
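The entry above validates CPGAN-augmented training by regressing black-hole spin with a convolutional network and reporting the R2 score. A minimal sketch of that evaluation step is shown below; the network, image size, and synthetic data are placeholders, not the paper's model.

# Minimal sketch (assumptions, not the paper's model): a small CNN that regresses
# black-hole spin in [-1, 1] from an image, evaluated with the R^2 score.
import torch
import torch.nn as nn
from sklearn.metrics import r2_score

class SpinRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Tanh keeps the predicted spin inside the physical range [-1, 1].
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Tanh())

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

model = SpinRegressor()
images = torch.randn(8, 1, 64, 64)          # stand-in for GRMHD or GAN-generated images
true_spin = torch.empty(8).uniform_(-1, 1)  # placeholder spin labels

with torch.no_grad():
    pred_spin = model(images)

print("R2:", r2_score(true_spin.numpy(), pred_spin.numpy()))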
  3. Deep learning models have demonstrated significant advantages over traditional algorithms in image processing tasks like object detection. However, a large amount of data is needed to train such deep networks, which limits their application to tasks such as biometric recognition that require more training samples for each class (i.e., each individual). Researchers developing such complex systems rely on real biometric data, which raises privacy concerns and is restricted by the availability of extensive, varied datasets. This paper proposes a generative adversarial network (GAN)-based solution to produce training data (palm images) for improved biometric (palmprint-based) recognition systems. We investigate the performance of the most recent StyleGAN models in generating a thorough contactless palm image dataset for application in biometric research. Trained on the publicly available H-PolyU and IIDT palmprint databases, the StyleGAN models generated a total of 4839 images. SIFT (Scale-Invariant Feature Transform) was used to assess uniqueness by extracting features at different scales and orientations, yielding a similarity score of 16.12% for the most recent StyleGAN3-based model. For the regions of interest (ROIs) in both the palm and finger, the average similarity score was 17.85%. The proposed model achieved a Fréchet Inception Distance (FID) of 16.1, indicating strong performance. These results demonstrate that StyleGAN is effective in producing unique synthetic biometric images.
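The entry above quantifies the uniqueness of generated palm images with SIFT feature matching and reports percentage similarity scores. The following sketch shows one plausible way to compute such a score with OpenCV; the file paths, ratio-test threshold, and normalization are assumptions rather than the authors' protocol.

# Minimal sketch (an assumption about the evaluation, not the authors' script):
# estimate how similar a synthetic palm image is to a real one by matching SIFT
# keypoints and reporting the fraction of good matches. Paths are placeholders.
import cv2

real = cv2.imread("real_palm.png", cv2.IMREAD_GRAYSCALE)           # placeholder path
synthetic = cv2.imread("stylegan_palm.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(real, None)
kp2, des2 = sift.detectAndCompute(synthetic, None)

# Lowe's ratio test over 2-nearest-neighbour matches.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# One plausible "similarity score": good matches relative to detected keypoints.
similarity = 100.0 * len(good) / max(min(len(kp1), len(kp2)), 1)
print(f"SIFT similarity: {similarity:.2f}%")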
  4. Aliannejadi, M; Faggioli, G; Ferro, N; Vlachos, M. (Ed.)
    This work discusses the participation of CS_Morgan in the Concept Detection and Caption Prediction tasks of the ImageCLEFmedical 2023 Caption benchmark evaluation campaign. The goal of these tasks is to automatically identify relevant concepts and their locations in images, as well as to generate coherent captions for the images. The dataset used is a subset of the extended Radiology Objects in COntext (ROCO) dataset. Our implementation approach involved pre-trained Convolutional Neural Networks (CNNs), Vision Transformer (ViT), and Text-to-Text Transfer Transformer (T5) architectures, leveraged to handle the different aspects of the tasks, namely concept detection and caption generation. In the Concept Detection task, the objective was to classify the multiple concepts associated with each image. We utilized several deep learning architectures with sigmoid activation to enable multilabel classification using the Keras framework. We submitted a total of five (5) runs for this task, and the best run achieved an F1 score of 0.4834, indicating its effectiveness in detecting relevant concepts in the images. For the Caption Prediction task, we submitted eight (8) runs, combining the ViT and T5 models to generate captions for the images. This task is ranked by BERTScore, and our best run achieved a score of 0.5819 by generating captions with a fine-tuned T5 model from keywords produced by the pretrained ViT encoder.
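The entry above describes multilabel concept detection with sigmoid-activated classifiers in Keras. A minimal sketch of that setup is given below; the backbone, number of concepts, and decision threshold are placeholders, not the submitted configuration.

# Minimal sketch (assumed shapes and backbone, not the submitted system): a Keras
# image classifier with a sigmoid output layer for multilabel concept detection.
import numpy as np
import tensorflow as tf
from tensorflow import keras

NUM_CONCEPTS = 100  # placeholder; the real UMLS label space is much larger

backbone = keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(224, 224, 3), pooling="avg"
)
model = keras.Sequential([
    backbone,
    keras.layers.Dense(NUM_CONCEPTS, activation="sigmoid"),  # one probability per concept
])

# Binary cross-entropy treats each concept as an independent yes/no decision.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[keras.metrics.AUC(multi_label=True)],
)

# At inference, thresholding the sigmoid outputs yields the predicted concept set.
probs = model.predict(np.zeros((1, 224, 224, 3), dtype="float32"))
predicted_concepts = (probs[0] > 0.5).nonzero()[0]
print(predicted_concepts)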
  5. Purpose: Magnetic Resonance Imaging (MRI) enables non‐invasive assessment of brain abnormalities during early life development. Permanent magnet scanners operating in the neonatal intensive care unit (NICU) facilitate MRI of sick infants, but have long scan times due to lower signal‐to‐noise ratios (SNR) and limited receive coils. This work accelerates in‐NICU MRI with diffusion probabilistic generative models by developing a training pipeline accounting for these challenges. Methods: We establish a novel training dataset of clinical, 1 Tesla neonatal MR images in collaboration with Aspect Imaging and Sha'are Zedek Medical Center. We propose a pipeline to handle the low quantity and SNR of our real‐world dataset by (1) modifying existing network architectures to support varying resolutions; (2) training a single model on all data with learned class embedding vectors; (3) applying self‐supervised denoising before training; and (4) reconstructing by averaging posterior samples. Retrospective under‐sampling experiments, accounting for signal decay, evaluated each component of our proposed methodology. A clinical reader study with practicing pediatric neuroradiologists evaluated our proposed images reconstructed from under‐sampled data. Results: Combining all data, denoising pre‐training, and averaging posterior samples yields quantitative improvements in reconstruction. The generative model decouples the learned prior from the measurement model and functions at two acceleration rates without re‐training. The reader study suggests that proposed images reconstructed from under‐sampled data are adequate for clinical use. Conclusion: Diffusion probabilistic generative models applied with the proposed pipeline to handle challenging real‐world datasets could reduce the scan time of in‐NICU neonatal MRI.
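The entry above reconstructs under-sampled neonatal MRI by, among other steps, averaging posterior samples from a diffusion model. The following sketch isolates only that averaging step, with the diffusion sampler abstracted into a placeholder function; it is an assumption-laden illustration, not the paper's pipeline.

# Minimal sketch of the posterior-averaging step only. `sample_posterior` is a
# placeholder standing in for a trained diffusion model conditioned on
# under-sampled measurements; here it just adds noise so the script runs.
import torch

def sample_posterior(measurements: torch.Tensor) -> torch.Tensor:
    """Placeholder for drawing one posterior sample from a diffusion model."""
    return measurements + 0.05 * torch.randn_like(measurements)

undersampled_recon = torch.rand(1, 256, 256)  # stand-in for a zero-filled reconstruction

# Drawing several posterior samples and averaging them reduces sample variance,
# which is the reconstruction strategy referred to in item (4) of the abstract.
num_samples = 8
samples = torch.stack([sample_posterior(undersampled_recon) for _ in range(num_samples)])
reconstruction = samples.mean(dim=0)
print(reconstruction.shape)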