A hybrid model based on variational autoencoders and PCA for credit card fraud detection

Bibliographic details
Main author:	Caicedo García, Kevin Omar (author)
Format:	masterThesis
Published:	2025
Subjects:	Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class imbalance
Available online:	https://repositorio.yachaytech.edu.ec/handle/123456789/1043

Description
Summary:	Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component Analysis (PCA) for dimensionality reduction, the Synthetic Minority Over-Sampling Technique (SMOTE) for class balancing, and a Support Vector Machine (SVM) for classification. The VAE reduces the original 30 features to an 8-dimensional latent space complemented by the reconstruction loss, while PCA further reduces it to five principal components, preserving 90.18% of the variance. Subsequently, SMOTE is applied in the latent space, followed by 50% stratified sampling to reduce SVM training time. The SVM hyperparameters were tuned via grid search, selecting C = 0.1, γ = 'auto', and an RBF kernel. Evaluated on a real-world dataset with 284,807 transactions, the model achieved an AUC-ROC of 0.950, an F1-score of 0.13, and a recall of 0.87 for the fraud class, identifying 85 out of 98 frauds in the test set, but with 1,133 false positives. Visualizations such as confusion matrices and ROC curves highlight high recall but low precision (0.07). Compared to recent approaches, this model offers a scalable and effective solution for imbalanced data.
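
The fraud-class figures quoted in the summary can be cross-checked from the reported counts alone. The sketch below is not taken from the thesis; it assumes the standard definitions of precision, recall, and F1, and the function name `fraud_metrics` is hypothetical, introduced only for illustration.

```python
def fraud_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive (fraud) class.

    Hypothetical helper using the standard textbook definitions;
    it is not code from the thesis.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts as stated in the summary: 85 of 98 frauds detected, 1,133 false positives.
tp = 85          # frauds correctly flagged
fn = 98 - tp     # frauds missed
fp = 1133        # legitimate transactions flagged as fraud

precision, recall, f1 = fraud_metrics(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these definitions the counts give a precision of about 0.07, a recall of about 0.87, and an F1-score of about 0.13, which agrees with the values in the summary: the low F1 follows directly from the large number of false positives relative to true positives.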
