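Minería de datos para determinar los factores más influyentes en la ocurrencia de siniestros de tránsito en Ecuador en el año 2020 (Data mining to determine the most influential factors in the occurrence of traffic accidents in Ecuador in 2020).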

Bibliographic details
Main author: Torres Quezada, Yulissa Stefania (author)
Format: bachelorThesis
Language: Spanish (spa)
Published: 2022
Subjects:
Online access: https://dspace.unl.edu.ec/jspui/handle/123456789/24502
Physical description
Summary: Traffic accidents are a public health problem at both the national and the regional level. They cause loss of human life, and their number is rising worldwide every day. It is therefore worthwhile to study which factors lead to traffic accidents. The objective of this thesis is to apply data mining to determine the most influential factors in the occurrence of traffic accidents in Ecuador in 2020. The work followed five phases of the Knowledge Discovery in Databases (KDD) methodology:

1. information search;
2. data collection;
3. database cleaning;
4. application of data mining techniques;
5. interpretation and presentation of results.

Guidelines were established for the information search. These led to the dataset compiled by the National Transit Agency (ANT), which is available on its official website. The dataset gathers police reports that were designed and approved by each of the control entities, following technical parameters set by the ANT.

The dataset was cleaned with OpenRefine and RStudio, and the most useful and relevant variables for the study were evaluated and selected. The data mining algorithms were applied with SPSS Statistics and Weka. Seven predictive techniques were used: CHAID, Exhaustive CHAID, CRT, Multilayer Perceptron, Radial Basis Function, Naive Bayes and BayesNet. The algorithms were evaluated by comparing their results on two performance metrics: the percentage of correctly classified instances and accuracy.
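The summary compares the algorithms on two measures: the percentage of correctly classified instances and a second "accuracy" figure, which is not defined further here. The following is a minimal Python sketch, not the author's evaluation procedure. It shows how a correctly-classified percentage, and one common companion metric (class-weighted precision), can be computed from actual and predicted labels. The function names and the labels are hypothetical illustrations, not ANT data or thesis results.

```python
from collections import Counter


def correctly_classified_pct(actual, predicted):
    """Percentage of instances whose predicted class equals the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def weighted_precision(actual, predicted):
    """Per-class precision, averaged with weights equal to each class's share of the actual labels.

    A class that is never predicted contributes a precision of 0 (an assumption;
    tools differ in how they treat this case).
    """
    support = Counter(actual)
    total = len(actual)
    score = 0.0
    for cls, n_cls in support.items():
        predicted_as_cls = sum(p == cls for p in predicted)
        true_pos = sum(a == cls and p == cls for a, p in zip(actual, predicted))
        prec = true_pos / predicted_as_cls if predicted_as_cls else 0.0
        score += prec * n_cls / total
    return score


# Hypothetical labels for illustration only; these are not ANT records.
actual = ["human", "human", "road", "vehicle", "human", "road"]
predicted = ["human", "road", "road", "human", "human", "human"]

print(f"{correctly_classified_pct(actual, predicted):.2f}%")  # 50.00%
print(f"{weighted_precision(actual, predicted):.4f}")  # 0.4167
```

The two numbers can diverge noticeably, as they do here. A model can be right on most instances of a dominant class and still have low precision on the classes it over-predicts.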
Exhaustive CHAID obtained the best results, with 58.38% of instances correctly classified and 44.60% accuracy. It was used to identify the most important patterns in the data and to evaluate possible associations between the collected variables. Finally, the human factor was found to be the most influential factor, with a probability of occurrence of 69.64%.
Keywords: Data mining, KDD methodology, Decision Trees, Neural Networks, Bayesian Networks, Traffic Accidents in Ecuador.
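CHAID-family trees, including the Exhaustive CHAID variant that performed best here, choose each split by testing every candidate predictor against the target with a chi-square test of independence. The following is a simplified Python sketch of that core step only, not the SPSS implementation used in the thesis. Real CHAID also merges predictor categories and compares Bonferroni-adjusted p-values, and Exhaustive CHAID searches those merges more thoroughly.

This sketch ranks predictors by the raw Pearson chi-square statistic. That ranking matches a p-value ranking only when the predictors have equal degrees of freedom, as in this example. The column names `zone`, `weekend` and `cause` and the records are hypothetical.

```python
from collections import Counter


def chi_square(records, predictor, target):
    """Pearson chi-square statistic and degrees of freedom for predictor vs. target."""
    rows = Counter(r[predictor] for r in records)
    cols = Counter(r[target] for r in records)
    cells = Counter((r[predictor], r[target]) for r in records)
    n = len(records)
    stat = 0.0
    for rv, r_count in rows.items():
        for cv, c_count in cols.items():
            expected = r_count * c_count / n
            observed = cells[(rv, cv)]  # Counter returns 0 for unseen combinations
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


def best_split(records, predictors, target):
    """Return the predictor with the largest chi-square statistic (simplified CHAID step)."""
    scores = {p: chi_square(records, p, target) for p in predictors}
    return max(scores, key=lambda p: scores[p][0]), scores


# Hypothetical accident records for illustration only; not ANT data.
records = [
    {"zone": "urban", "weekend": "yes", "cause": "human"},
    {"zone": "urban", "weekend": "no", "cause": "human"},
    {"zone": "urban", "weekend": "yes", "cause": "human"},
    {"zone": "rural", "weekend": "no", "cause": "road"},
    {"zone": "rural", "weekend": "yes", "cause": "road"},
    {"zone": "rural", "weekend": "no", "cause": "road"},
]

best, scores = best_split(records, ["zone", "weekend"], "cause")
print(best)  # zone
```

Here `zone` separates the two causes perfectly (chi-square 6.0), while `weekend` is only weakly associated (chi-square of about 0.67). A CHAID-style tree would therefore split on `zone` first.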