Deep learning models have demonstrated significant advantages over traditional algorithms in image processing tasks like object detection. However, a large amount of data are needed to train such deep networks, which limits their application to tasks such as biometric recognition that require more training samples for each class (i.e., each individual). Researchers developing such complex systems rely on real biometric data, which raises privacy concerns and is restricted by the availability of extensive, varied datasets. This paper proposes a generative adversarial network (GAN)-based solution to produce training data (palm images) for improved biometric (palmprint-based) recognition systems. We investigate the performance of the most recent StyleGAN models in generating a thorough contactless palm image dataset for application in biometric research. Training on publicly available H-PolyU and IIDT palmprint databases, a total of 4839 images were generated using StyleGAN models. SIFT (Scale-Invariant Feature Transform) was used to find uniqueness and features at different sizes and angles, which showed a similarity score of 16.12% with the most recent StyleGAN3-based model. For the regions of interest (ROIs) in both the palm and finger, the average similarity scores were 17.85%. We present the Frechet Inception Distance (FID) of the proposed model, which achieved a 16.1 score, demonstrating significant performance. These results demonstrated StyleGAN as effective in producing unique synthetic biometric images.
more »
« less
Large Scale Data Collection of Tattoo-Based Biometric Data from Social-Media Websites
The use of tattoos as a soft biometric is increasing in popularity among law enforcement communities. There is great need for large scale, publicly available tattoo datasets that can be used to standardize efforts to develop tattoo-based biometric systems. In this work, we introduce a large tattoo dataset (WVUMediaTatt) collected from a social-media website. Additionally, we provide the source links to the images so that anyone can re-generate this dataset. Our WVU-MediaTatt database contains tattoo sample images from over 1,000 subjects, with two tattoo image samples per subject. To the best of our knowledge, this dataset is significantly bigger than any current released publicly available tattoo dataset, including the recently released NIST Tatt-C dataset. The use of social media in deep learning, data mining, and biometrics has traditionally been a controversial issue in terms of data security and protection of privacy. In this work, we first conduct a full discussion on the issues associated with data collection from social media sources for the use of biometric system development, and provide a framework for data collection. In this study, within the process of creating a new large scale tattoo dataset, we consider the issues and make attempts protect the subject’s privacy and information, while ensuring that subjects remain in control of their data in this study and the use of the data adheres to the guidelines proposed by the Heath Care Compliance Association (HCCA) and the U.S. Department of Health & Human Services.
more »
« less
- PAR ID:
- 10053528
- Date Published:
- Journal Name:
- IEEE EISIC
- Page Range / eLocation ID:
- 135 to 138
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Illicit drug trafficking via social media sites such as Instagram have become a severe problem, thus drawing a great deal of attention from law enforcement and public health agencies. How to identify illicit drug dealers from social media data has remained a technical challenge for the following reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). Nearly 4,000 user accounts, of which more than 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.more » « less
-
The deep neural networks used in modern computer vision systems require enormous image datasets to train them. These carefully-curated datasets typically have a million or more images, across a thousand or more distinct categories. The process of creating and curating such a dataset is a monumental undertaking, demanding extensive effort and labelling expense and necessitating careful navigation of technical and social issues such as label accuracy, copyright ownership, and content bias.What if we had a way to harness the power of large image datasets but with few or none of the major issues and concerns currently faced? This paper extends the recent work of Kataoka et al. [15], proposing an improved pre-training dataset based on dynamically-generated fractal images. Challenging issues with large-scale image datasets become points of elegance for fractal pre-training: perfect label accuracy at zero cost; no need to store/transmit large image archives; no privacy/demographic bias/concerns of inappropriate content, as no humans are pictured; limitless supply and diversity of images; and the images are free/open-source. Perhaps surprisingly, avoiding these difficulties imposes only a small penalty in performance. Leveraging a newly-proposed pre-training task—multi-instance prediction—our experiments demonstrate that fine-tuning a network pre-trained using fractals attains 92.7-98.1% of the accuracy of an ImageNet pre-trained network. Our code is publicly available. 1more » « less
-
Computer vision is a data hungry field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including public data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data-not only from websites, but also from public settings and public records-make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers current data practices present to providing an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects. We also offer suggestions for an improved ethics of traceability to enable both awareness and control for individual subjects in dataset curation practices.more » « less
-
Many research communities routinely conduct activities that fall outside the bounds of traditional human subjects research, yet still frequently rely on the determinations of institutional review boards (IRBs) or similar regulatory bodies to scope ethical decision-making. Presented as a U.S. university-based fictional memo describing a post-hoc IRB review of a research study about social media and public health, this design fiction draws inspiration from current debates and uncertainties in the HCI and social computing communities around issues such as the use of public data, privacy, open science, and unintended consequences, in order to highlight the limitations of regulatory bodies as arbiters of ethics and the importance of forward-thinking ethical considerations from researchers and research communities.more » « less
An official website of the United States government

