A hybrid model based on variational autoencoders and PCA for credit card fraud detection
Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component An...
| Main author: | Caicedo García, Kevin Omar |
|---|---|
| Format: | masterThesis |
| Published: | 2025 |
| Subjects: | Fraud detection; Variational autoencoder; Class imbalance |
| Online link: | https://repositorio.yachaytech.edu.ec/handle/123456789/1043 |
| _version_ | 1862900802443542528 |
|---|---|
| author | Caicedo García, Kevin Omar |
| author_facet | Caicedo García, Kevin Omar |
| author_role | author |
| collection | Repositorio Universidad Yachay Tech |
| dc.contributor.none.fl_str_mv | Pineda, Israel |
| dc.creator.none.fl_str_mv | Caicedo García, Kevin Omar |
| dc.date.none.fl_str_mv | 2025; 2026-01-29T16:12:13Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | https://repositorio.yachaytech.edu.ec/handle/123456789/1043 |
| dc.language.none.fl_str_mv | en |
| dc.publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.source.none.fl_str_mv | reponame:Repositorio Universidad Yachay Tech; instname:Universidad Yachay Tech; instacron:Yachay |
| dc.subject.none.fl_str_mv | Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class Imbalance |
| dc.title.none.fl_str_mv | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis |
| description | Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component Analysis (PCA) for dimensionality reduction, the Synthetic Minority Over-Sampling Technique (SMOTE) for class balancing, and a Support Vector Machine (SVM) for classification. The VAE compresses the original 30 features into an 8-dimensional latent space, complemented by the reconstruction loss, while PCA further reduces this representation to five principal components that preserve 90.18% of the variance. SMOTE is then applied in the latent space, followed by 50% stratified sampling to reduce SVM training time. The SVM hyperparameters were tuned via grid search, selecting C = 0.1, γ = 'auto', and an RBF kernel. Evaluated on a real dataset with 284,807 transactions, the model achieved an AUC-ROC of 0.950, an F1-score of 0.13, and a recall of 0.87 for the fraud class, identifying 85 out of 98 frauds in the test set, but with 1,133 false positives. Visualizations such as confusion matrices and ROC curves highlight the high recall but low precision (0.07). Compared to recent approaches, this model offers a scalable and effective solution for imbalanced data. |
| eu_rights_str_mv | openAccess |
| format | masterThesis |
| id | Yachay_d8f3a3dcff0eb868b0f04fb485a57cdf |
| instacron_str | Yachay |
| institution | Yachay |
| instname_str | Universidad Yachay Tech |
| language_invalid_str_mv | en |
| network_acronym_str | Yachay |
| network_name_str | Repositorio Universidad Yachay Tech |
| oai_identifier_str | oai:repositorio.yachaytech.edu.ec:123456789/1043 |
| publishDate | 2025 |
| publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
| reponame_str | Repositorio Universidad Yachay Tech |
| repository.mail.fl_str_mv | . |
| repository.name.fl_str_mv | Repositorio Universidad Yachay Tech - Universidad Yachay Tech |
| repository_id_str | 10284 |
| spelling | A hybrid model based on variational autoencoders and PCA for credit card fraud detection; Caicedo García, Kevin Omar; Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class Imbalance; Credit card fraud detection poses a significant challenge due to extreme data imbalance, where fraudulent transactions represent less than 0.2% of the total. This study proposes a hybrid approach that combines a Variational Autoencoder (VAE) for unsupervised feature learning, Principal Component Analysis (PCA) for dimensionality reduction, the Synthetic Minority Over-Sampling Technique (SMOTE) for class balancing, and a Support Vector Machine (SVM) for classification. The VAE compresses the original 30 features into an 8-dimensional latent space, complemented by the reconstruction loss, while PCA further reduces this representation to five principal components that preserve 90.18% of the variance. SMOTE is then applied in the latent space, followed by 50% stratified sampling to reduce SVM training time. The SVM hyperparameters were tuned via grid search, selecting C = 0.1, γ = 'auto', and an RBF (Radial Basis Function) kernel. Evaluated on a real dataset with 284,807 transactions, the model achieved an AUC-ROC of 0.950, an F1-score of 0.13, and a recall of 0.87 for the fraud class, identifying 85 out of 98 frauds in the test set, but with 1,133 false positives. Visualizations such as confusion matrices and ROC curves highlight the high recall but low precision (0.07). Compared to recent approaches, this model offers a scalable and effective solution for imbalanced data; Master's in Artificial Intelligence (Magíster en Inteligencia Artificial); Universidad de Investigación de Tecnología Experimental Yachay; Pineda, Israel; 2026-01-29T16:12:13Z; 2025; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; https://repositorio.yachaytech.edu.ec/handle/123456789/1043; en; info:eu-repo/semantics/openAccess; reponame:Repositorio Universidad Yachay Tech; instname:Universidad Yachay Tech; instacron:Yachay; 2026-01-30T08:00:28Z; oai:repositorio.yachaytech.edu.ec:123456789/1043; Institutional; https://repositorio.yachaytech.edu.ec/; Public university; https://www.yachaytech.edu.ec/; https://repositorio.yachaytech.edu.ec/oai; Ecuador; ...; opendoar:10284; 2026-01-30T08:00:28; Repositorio Universidad Yachay Tech - Universidad Yachay Tech; false |
| spellingShingle | A hybrid model based on variational autoencoders and PCA for credit card fraud detection; Caicedo García, Kevin Omar; Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class Imbalance |
| status_str | publishedVersion |
| title | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| title_full | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| title_fullStr | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| title_full_unstemmed | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| title_short | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| title_sort | A hybrid model based on variational autoencoders and PCA for credit card fraud detection |
| topic | Detección de Fraudes; Autoencodificador Variacional; Desbalance de Clases; Fraud detection; Variational autoencoder; Class Imbalance |
| url | https://repositorio.yachaytech.edu.ec/handle/123456789/1043 |
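
The fraud-class figures quoted in the description (precision 0.07, recall 0.87, F1-score 0.13) can be cross-checked from the raw counts it reports: 85 of 98 frauds detected in the test set and 1,133 false positives. The following is a minimal sketch of that arithmetic, not code from the thesis; the function name `fraud_metrics` and its parameter names are hypothetical, chosen for illustration only.

```python
def fraud_metrics(true_positives: int, actual_frauds: int, false_positives: int) -> dict:
    """Derive fraud-class precision, recall and F1 from raw counts.

    Hypothetical helper for illustration; the counts come from the
    record's description, not from the thesis code.
    """
    # Frauds in the test set that the model failed to flag.
    false_negatives = actual_frauds - true_positives
    # Share of flagged transactions that were really fraudulent.
    precision = true_positives / (true_positives + false_positives)
    # Share of real frauds that were flagged.
    recall = true_positives / (true_positives + false_negatives)
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "false_negatives": false_negatives,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts as reported in the description: 85 of 98 frauds found, 1,133 false positives.
metrics = fraud_metrics(true_positives=85, actual_frauds=98, false_positives=1133)
for name, value in metrics.items():
    print(f"{name}: {round(value, 2)}")
```

Rounded to two decimals, the computed precision, recall and F1 agree with the 0.07, 0.87 and 0.13 stated in the description. The low F1-score follows directly from the 1,133 false positives, which outnumber the 85 correctly flagged frauds.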