A hybrid model based on variational autoencoders and PCA for credit card fraud detection

Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component An...

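Bibliographic details
Main author: Caicedo García, Kevin Omar (author)
Format: masterThesis
Published: 2025
Subjects: Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class Imbalance
Online access: https://repositorio.yachaytech.edu.ec/handle/123456789/1043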
_version_ 1862900802443542528
author Caicedo García, Kevin Omar
author_facet Caicedo García, Kevin Omar
author_role author
collection Repositorio Universidad Yachay Tech
dc.contributor.none.fl_str_mv Pineda, Israel
dc.creator.none.fl_str_mv Caicedo García, Kevin Omar
dc.date.none.fl_str_mv 2025
2026-01-29T16:12:13Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://repositorio.yachaytech.edu.ec/handle/123456789/1043
dc.language.none.fl_str_mv en
dc.publisher.none.fl_str_mv Universidad de Investigación de Tecnología Experimental Yachay
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv reponame:Repositorio Universidad Yachay Tech
instname:Universidad Yachay Tech
instacron:Yachay
dc.subject.none.fl_str_mv Detección de Fraudes
Autoencodificador Variacional
Desbalance de Clases
Fraud detection
Variational autoencoder
Class Imbalance
dc.title.none.fl_str_mv A hybrid model based on variational autoencoders and PCA for credit card fraud detection
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
description Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component Analysis (PCA) for dimensionality reduction, the Synthetic Minority Over-sampling Technique (SMOTE) for class balancing, and a Support Vector Machine (SVM) for classification. The VAE reduces the original 30 features to an 8-dimensional latent space, complemented by the reconstruction loss, while PCA further reduces it to five principal components, preserving 90.18% of the variance. Subsequently, SMOTE is applied in the latent space, followed by 50% stratified sampling to reduce SVM training time. The SVM hyperparameters were tuned via grid search, selecting C = 0.1, γ = 'auto', and an RBF kernel. Evaluated on a real dataset with 284,807 transactions, the model achieved an AUC-ROC of 0.950, an F1-score of 0.13, and a recall of 0.87 for the fraud class, identifying 85 out of 98 frauds in the test set, but with 1,133 false positives. Visualizations such as confusion matrices and ROC curves highlight high recall but low precision (0.07). Compared to recent approaches, this model offers a scalable and effective solution for imbalanced data.
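The precision, recall, and F1-score quoted in the description can be cross-checked from the counts it reports (85 of 98 frauds detected, 1,133 false positives). The following is a minimal sketch using only those three numbers; the function name `fraud_metrics` and its argument names are illustrative and do not come from the thesis.

```python
def fraud_metrics(tp: int, fn: int, fp: int) -> dict:
    """Compute precision, recall, and F1 for the positive (fraud) class.

    tp: frauds correctly flagged, fn: frauds missed,
    fp: legitimate transactions wrongly flagged.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Counts taken from the description: 85 of 98 frauds found, 1,133 false positives.
metrics = fraud_metrics(tp=85, fn=98 - 85, fp=1133)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the three values agree with the figures in the description, which shows that the low F1-score follows directly from the large number of false positives relative to the 85 detected frauds.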
eu_rights_str_mv openAccess
format masterThesis
id Yachay_d8f3a3dcff0eb868b0f04fb485a57cdf
instacron_str Yachay
institution Yachay
instname_str Universidad Yachay Tech
language_invalid_str_mv en
network_acronym_str Yachay
network_name_str Repositorio Universidad Yachay Tech
oai_identifier_str oai:repositorio.yachaytech.edu.ec:123456789/1043
publishDate 2025
publisher.none.fl_str_mv Universidad de Investigación de Tecnología Experimental Yachay
reponame_str Repositorio Universidad Yachay Tech
repository.mail.fl_str_mv .
repository.name.fl_str_mv Repositorio Universidad Yachay Tech - Universidad Yachay Tech
repository_id_str 10284
spelling A hybrid model based on variational autoencoders and PCA for credit card fraud detection; Caicedo García, Kevin Omar; Magíster en Inteligencia Artificial (Master's in Artificial Intelligence); Universidad de Investigación de Tecnología Experimental Yachay; Pineda, Israel; 2025; 2026-01-29T16:12:13Z; oai:repositorio.yachaytech.edu.ec:123456789/1043; Institucional; Universidad pública; https://www.yachaytech.edu.ec/; https://repositorio.yachaytech.edu.ec/oai; Ecuador; opendoar:10284; 2026-01-30T08:00:28Z
spellingShingle A hybrid model based on variational autoencoders and PCA for credit card fraud detection
Caicedo García, Kevin Omar
Detección de Fraudes
Autoencodificador Variacional
Desbalance de Clases
Fraud detection
Variational autoencoder
Class Imbalance
status_str publishedVersion
title A hybrid model based on variational autoencoders and PCA for credit card fraud detection
title_full A hybrid model based on variational autoencoders and PCA for credit card fraud detection
title_fullStr A hybrid model based on variational autoencoders and PCA for credit card fraud detection
title_full_unstemmed A hybrid model based on variational autoencoders and PCA for credit card fraud detection
title_short A hybrid model based on variational autoencoders and PCA for credit card fraud detection
title_sort A hybrid model based on variational autoencoders and PCA for credit card fraud detection
topic Detección de Fraudes
Autoencodificador Variacional
Desbalance de Clases
Fraud detection
Variational autoencoder
Class Imbalance
url https://repositorio.yachaytech.edu.ec/handle/123456789/1043