-
Aliannejadi, M.; Faggioli, G.; Ferro, N.; Vlachos, M. (Eds.) This work describes the participation of CS_Morgan in the Concept Detection and Caption Prediction tasks of the ImageCLEFmedical 2023 Caption benchmark evaluation campaign. The goal of these tasks is to automatically identify relevant concepts in medical images and to generate coherent captions for them. The dataset used is a subset of the extended Radiology Objects in Context (ROCO) dataset. Our implementation employed pre-trained Convolutional Neural Networks (CNNs), a Vision Transformer (ViT), and a Text-to-Text Transfer Transformer (T5), with different models handling the different aspects of the tasks: concept detection and caption generation. In the Concept Detection task, the objective was to classify the multiple concepts associated with each image. We utilized several deep learning architectures with sigmoid activation to enable multilabel classification in the Keras framework. We submitted a total of five (5) runs for this task, and the best run achieved an F1 score of 0.4834. For the Caption Prediction task, we submitted eight (8) runs, combining the ViT and T5 models to generate captions for the images. This task is ranked by BERTScore, and our best run achieved a score of 0.5819 by generating captions with the fine-tuned T5 model from keywords produced by the pre-trained ViT encoder.
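As an illustration of the multilabel setup described above (not the team's actual code), sigmoid outputs are thresholded independently per concept, and runs are scored by a sample-wise F1; a minimal sketch with assumed function names and a 0.5 threshold:

```python
def predict_concepts(probs, concept_ids, threshold=0.5):
    """Keep every concept whose sigmoid probability meets the threshold.

    Each output unit is thresholded independently, so an image may
    receive zero, one, or many concept labels (multilabel, not softmax).
    """
    return {c for c, p in zip(concept_ids, probs) if p >= threshold}


def multilabel_f1(true_sets, pred_sets):
    """Mean sample-wise F1: each element is the set of concepts for one image."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        if not t and not p:          # both empty: perfect agreement
            scores.append(1.0)
            continue
        tp = len(t & p)              # concepts both predicted and true
        if tp == 0:
            scores.append(0.0)
            continue
        precision = tp / len(p)
        recall = tp / len(t)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)
```

For example, predicting `{"C1", "C3"}` against ground truth `{"C1", "C2"}` gives precision and recall of 0.5 each, hence F1 = 0.5 for that image.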
-
Aliannejadi, M.; Faggioli, G.; Ferro, N.; Vlachos, M. (Eds.) Computer vision plays a key role in managing, processing, analyzing, and interpreting multimedia data across diverse applications. The visual interestingness of multimedia content is crucial for many practical applications, such as search and recommendation, yet determining the interestingness of a particular piece of media and selecting the highest-value item are difficult tasks because interestingness is heavily subjective, involving content analysis, the viewer's perspective, content classification, and scoring. This work presents the approaches of the CS_Morgan team in the media interestingness prediction task of the ImageCLEFfusion 2023 benchmark evaluation. We experimented with two ensemble methods: a dense architecture and an ensemble gradient boosting scaled (EGBS) architecture. For the dense architecture, several hyperparameter tunings were performed, and the output scores of all inducers after the dense layers were combined using the min-max rule. The gradient boosting estimator builds an additive model in a forward stage-wise fashion, which allows optimization of the loss function; at every step of the EGBS architecture, a regression tree is fitted to the negative gradient of the loss function. We achieved our best result, a MAP@10 score of 0.1287, with the EGBS ensemble.
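The stage-wise scheme described above (a regression tree fitted to the negative gradient of the loss at each step) can be sketched in miniature with one-feature regression stumps and squared loss, for which the negative gradient is simply the residual. This is a generic gradient-boosting illustration with assumed names, not the team's EGBS implementation:

```python
import statistics


def fit_stump(xs, residuals):
    """Depth-1 regression tree: threshold split minimising squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue  # degenerate split
        lmean, rmean = statistics.mean(left), statistics.mean(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    if best is None:  # no valid split: predict a constant
        m = statistics.mean(residuals)
        return lambda x: m
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean


def gradient_boost(xs, ys, n_stages=50, lr=0.1):
    """Forward stage-wise additive model under squared loss.

    At each stage the residuals (the negative gradient of squared loss)
    are recomputed, a stump is fitted to them, and its scaled prediction
    is added to the ensemble.
    """
    base = statistics.mean(ys)           # initial constant model
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)
```

The small learning rate shrinks each stage's contribution, so many weak stumps are accumulated rather than one strong fit; this is the standard design choice that makes boosted ensembles robust.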