skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: An Ensemble-Based Machine Learning for Predicting Fraud of Credit Card Transactions
Recently, using credit cards has been considered one of the essential things of our life due to its pros of being easy to use and flexible to pay. The critical impact of the increment of using credit cards is the occurrence of fraudulent transactions, which allow the illegal user to get money and free goods via unauthorized usage. Artificial Intelligence (AI) and Machine Learning (ML) have become effective techniques used in different applications to ensure cybersecurity. This paper proposes our fraud detection system called Man-Ensemble CCFD using an ensemble-learning model with two stages of classification and detection. Stage one, called ML-CCFD, utilizes ten machine learning (ML) algorithms to classify credit card transactions to class 1 as a fraudulent transaction or class 0 as a legitimate transaction. As a result, we compared their classification reports together, precisely precision, recall (sensitivity), and f1-score. Then, we selected the most accurate ML algorithms based on their classification performance and prediction accuracy. The second stage, known Ensemble-learning CCFD, is an ensemble model that applies the Man-Ensemble method on the most effective ML algorithms from stage one. The output of the second stage is to get the final prediction instead of using common types of ensemble learning, such as voting, stacking, boosting, and others. Our framework’s results showed the effectiveness and efficiency of our fraud detection system compared to using ML algorithms individually due to their weakness issues, such as errors, overfitting, bias, prediction accuracy, and even their robustness level.  more » « less
Award ID(s):
2022448
PAR ID:
10341852
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of 2022 Computing Conference, Lecture Notes in Networks and Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Online merchants face difficulties in using existing card fraud detection algorithms, so in this paper we propose a novel proactive fraud detection model using what we call invariant diversity to reveal patterns among attributes of the devices (computers or smartphones) that are used in conducting the transactions. The model generates a regression function from a diversity index of various attribute combinations, and use it to detect anomalies inherent in certain fraudulent transactions. This approach allows for proactive fraud detection using a relatively small number of unsupervised transactions and is resistant to fraudsters' device obfuscation attempt. We tested our system successfully on real online merchant transactions and it managed to find several instances of previously undetected fraudulent transactions. 
    more » « less
  2. Online card transaction fraud is one of the major threats to the bottom line of E-commerce merchants. In this paper, we propose a novel method for online merchants to utilize disposable (“one-time use”) domain names to detect client IP spoofing by collecting client's DNS information during an E-commerce transaction, which in turn can help with transaction fraud detection. By inserting a dynamically generated unique hostname on the E-commerce transaction webpage, a client will issue an identifiable DNS query to the customized authoritative DNS server maintained by the online Merchant. In this way, the online Merchant is able to collect DNS configuration of the client and match it with the client's corresponding transaction in order to verify the consistency of the client's IP address. Any discrepancy can reveal proxy usage, which fraudsters commonly use to spoof their true origins. We have deployed our preliminary prototype system on a real online merchant and successfully collected clients DNS queries correlated with their web transactions; then we show some real instances of successful fraud detection using this method. We also address some concerns regarding the use of disposable domains. 
    more » « less
  3. Machine Learning (ML) analyzes, and processes data and discover patterns. In cybersecurity, it effectively analyzes big data from existing cybersecurity attacks and develop proactive strategies to detect current and future cybersecurity attacks. Both ML and cybersecurity are important subjects in computing curriculum, but using ML for cybersecurity is not commonly explored. This paper designs and presents a case study-based portable labware experience built on Google's CoLaboratory (CoLab) for a ML cybersecurity application to provide students with hands-on labs accessing from anywhere and anytime, reducing or eliminating tedious installations and configurations. This approach allows students to focus on learning essential concepts and gaining valuable experience through hands-on problem solving skills. Our preliminary results and student evaluations are reported for a case-based hands-on regression labware in cyber fraud prediction using credit card fraud as an example. 
    more » « less
  4. Furht, Borko; Khoshgoftaar, Taghi (Ed.)
    Acquiring labeled datasets often incurs substantial costs primarily due to the requirement of expert human intervention to produce accurate and reliable class labels. In the modern data landscape, an overwhelming proportion of newly generated data is unlabeled. This paradigm is especially evident in domains such as fraud detection and datasets for credit card fraud detection. These types of data have their own difficulties associated with being highly class imbalanced, which poses its own challenges to machine learning and classification. Our research addresses these challenges by extensively evaluating a novel methodology for synthesizing class labels for highly imbalanced credit card fraud data. The methodology uses an autoencoder as its underlying learner to effectively learn from dataset features to produce an error metric for use in creating new binary class labels. The methodology aims to automatically produce new labels with minimal expert input. These class labels are then used to train supervised classifiers for fraud detection. Our empirical results show that the synthesized labels are of high enough quality to produce classifiers that significantly outperform a baseline learner comparison when using area under the precision-recall curve (AUPRC). We also present results of varying levels of positive-labeled instances and their effect on classifier performance. Results show that AUPRC performance improves as more instances are labeled positive and belong to the minority class. Our methodology thereby effectively addresses the concerns of high class imbalance in machine learning by creating new and effective class labels. 
    more » « less
  5. In the field of visualization, understanding users’ analytical reasoning is important for evaluating the effectiveness of visualization applications. Several studies have been conducted to capture and analyze user interactions to comprehend this reasoning process. However, few have successfully linked these interactions to users’ reasoning processes. This paper introduces an approach that addresses the limitation by correlating semantic user interactions with analysis decisions using an interactive wire transaction analysis system and a visual state transition matrix, both designed as visual analytics applications. The system enables interactive analysis for evaluating financial fraud in wire transactions. It also allows mapping captured user interactions and analytical decisions back onto the visualization to reveal their decision differences. The visual state transition matrix further aids in understanding users’ analytical flows, revealing their decision-making processes. Classification machine learning algorithms are applied to evaluate the effectiveness of our approach in understanding users’ analytical reasoning process by connecting the captured semantic user interactions to their decisions (i.e., suspicious, not suspicious, and inconclusive) on wire transactions. With the algorithms, an average of 72% accuracy is determined to classify the semantic user interactions. For classifying individual decisions, the average accuracy is 70%. Notably, the accuracy for classifying ‘inconclusive’ decisions is 83%. Overall, the proposed approach improves the understanding of users’ analytical decisions and provides a robust method for evaluating user interactions in visualization tools. 
    more » « less