Cost-Sensitive Learning for Imbalanced Bad Debt Datasets in Healthcare Industry

Research using computational intelligence methods to improve bad debt recovery is imperative given the rapid increase in the cost of healthcare in the U.S. This study explores the effectiveness of using cost-sensitive learning methods to classify the unknown cases in imbalanced bad debt datasets an...

Bibliographic details
Main author: Shi, D. (author)
Format: article
Published: 2015
Subjects:
Online access: http://dspace.utpl.edu.ec/handle/123456789/18864
_version_ 1858364503269310464
author Shi, D.
author_facet Shi, D.
author_role author
collection Repositorio Universidad Técnica Particular de Loja
dc.creator.none.fl_str_mv Shi, D.
dc.date.none.fl_str_mv 2015-10-01
2017-06-16T22:02:30Z
2017-06-16T22:02:30Z
dc.identifier.none.fl_str_mv 10.1109/APCASE.2015.13
9.78E+17
10.1109/APCASE.2015.13
http://dspace.utpl.edu.ec/handle/123456789/18864
dc.publisher.none.fl_str_mv Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv reponame:Repositorio Universidad Técnica Particular de Loja
instname:Universidad Técnica Particular de Loja
instacron:UTPL
dc.subject.none.fl_str_mv bad debt recovery
cost-sensitive
imbalanced
semi-supervised learning
dc.title.none.fl_str_mv Cost-Sensitive Learning for Imbalanced Bad Debt Datasets in Healthcare Industry
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description Research using computational intelligence methods to improve bad debt recovery is imperative given the rapid increase in the cost of healthcare in the U.S. This study explores the effectiveness of cost-sensitive learning methods for classifying unknown cases in imbalanced bad debt datasets and compares the results with those of two other methods often used to process imbalanced datasets: undersampling and oversampling. The study also analyzes the role of a semi-supervised learning algorithm in different circumstances. The results show that although oversampling yields the best predictive accuracy on balanced testing datasets, it is impractical because classes are imbalanced in real healthcare situations. Models built with undersampling classify the minority class in imbalanced datasets with high accuracy, but they tend to worsen the overall classification accuracy of the majority class. Cost-sensitive learning methods can improve the classification accuracy of the minority class in imbalanced datasets while achieving fairly good overall accuracy and majority-class accuracy. The results and analysis in this study show that cost-sensitive learning methods provide a potentially viable approach to classifying unknown cases in imbalanced bad debt datasets. Finally, more practical predictive results are obtained by using the models to predict the unlabeled cases. Although oversampling and cost-sensitive learning combined with semi-supervised learning can improve the overall and majority-class accuracy, the minority-class accuracy remains relatively low. Semi-supervised learning algorithms need to be improved to adapt to imbalanced bad debt datasets.
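The trade-off the description mentions between minority-class, majority-class, and overall accuracy can be illustrated with a minimal sketch. It is not the study's method: the labels, scores, and misclassification costs below are invented for illustration, and the function names are hypothetical. The sketch uses the standard minimum-expected-cost rule for binary classification (predict the positive class when its estimated probability is at least cost_fp / (cost_fp + cost_fn), assuming correct decisions cost nothing) and reports per-class and overall accuracy.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost when correct
    decisions cost nothing: predict positive if p >= this value."""
    return cost_fp / (cost_fp + cost_fn)


def predict(probs, threshold):
    """Turn estimated positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


def per_class_accuracy(y_true, y_pred):
    """Accuracy for each class label, plus the overall accuracy."""
    result = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        result[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    result["overall"] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return result


# Hypothetical data: class 1 is the minority (2 of 10 cases).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.45, 0.30, 0.60]

# Cost-insensitive baseline: the usual 0.5 cut-off.
default = per_class_accuracy(y_true, predict(probs, 0.5))

# Assumed costs: missing a minority case is 3x as costly as a false alarm.
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=3.0)
weighted = per_class_accuracy(y_true, predict(probs, threshold))

print(default)
print(weighted)
```

On this toy data, lowering the threshold raises minority-class accuracy and lowers majority-class and overall accuracy; how the balance falls on real bad debt data depends on the costs chosen and on the model producing the probabilities.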
eu_rights_str_mv openAccess
format article
id UTPL_c73378c9e7e73beb4e58edecea3fae95
identifier_str_mv 10.1109/APCASE.2015.13
9.78E+17
instacron_str UTPL
institution UTPL
instname_str Universidad Técnica Particular de Loja
network_acronym_str UTPL
network_name_str Repositorio Universidad Técnica Particular de Loja
oai_identifier_str oai:dspace.utpl.edu.ec:123456789/18864
publishDate 2015
publisher.none.fl_str_mv Proceedings - 2015 Asia-Pacific Conference on Computer-Aided System Engineering, APCASE 2015
reponame_str Repositorio Universidad Técnica Particular de Loja
repository.mail.fl_str_mv .
repository.name.fl_str_mv Repositorio Universidad Técnica Particular de Loja - Universidad Técnica Particular de Loja
repository_id_str 1227
status_str publishedVersion
title Cost-Sensitive Learning for Imbalanced Bad Debt Datasets in Healthcare Industry
topic bad debt recovery
cost-sensitive
imbalanced
semi-supervised learning
url http://dspace.utpl.edu.ec/handle/123456789/18864