Comparison of Data Mining Algorithms for Diabetes Classification (Komparasi Algoritma Data Mining untuk Klasifikasi Penyakit Diabetes)

Ivandari; Much. Rifqi Maulana; Muhammad Faizal Kurniawan; M. Adib Al Karomi

doi:10.47065/bulletincsr.v3i5.280

Authors

Ivandari, STMIK Widya Pratama, Pekalongan, Indonesia
Much. Rifqi Maulana, STMIK Widya Pratama, Pekalongan, Indonesia
Muhammad Faizal Kurniawan, STMIK Widya Pratama, Pekalongan, Indonesia
M. Adib Al Karomi, STMIK Widya Pratama, Pekalongan, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v3i5.280

Keywords:

Decision Tree; Naïve Bayes; K-NN

Abstract

Diabetes is one of the deadliest non-communicable diseases in humans. According to the World Health Organization (WHO), diabetes killed at least 2 million people in 2019. Patient conditions are recorded at every phase of the disease to support research, and one of the most recent such records is the early stage diabetes risk prediction dataset, collected at a diabetes hospital in Bangladesh and released through the UCI repository in late 2020. Classification is a data mining technique that searches data for patterns or models in order to gain new knowledge. Widely used classification algorithms that have proven able to handle large data include K-NN, Naïve Bayes, and Decision Tree. This study compares the three algorithms on the early stage diabetes risk prediction dataset. The decision tree is the best of the three, with an accuracy of 95.96%, followed by K-NN at 92.5% and Naïve Bayes at 86.92%.
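The kind of comparison described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not state the tool, the validation protocol, or any hyperparameters, so the 10-fold cross-validation, `k=3`, Hamming distance, Laplace smoothing, and ID3-style splitting are all assumptions, as is every function name. The data is a small synthetic stand-in, not the UCI dataset, so the printed accuracies are not comparable to the figures reported in the abstract.

```python
import math
import random
from collections import Counter

# Toy stand-in data (NOT the UCI dataset): six binary symptom flags per row,
# labelled "Positive" when at least two of the first three flags are set.
random.seed(0)
N_FEATURES = 6
data = []
for _ in range(200):
    x = tuple(random.randint(0, 1) for _ in range(N_FEATURES))
    data.append((x, "Positive" if sum(x[:3]) >= 2 else "Negative"))

def fit_knn(train, k=3):
    """K-NN with Hamming distance and majority vote among the k nearest rows."""
    def predict(x):
        nearest = sorted(train, key=lambda r: sum(a != b for a, b in zip(r[0], x)))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def fit_naive_bayes(train):
    """Naive Bayes for binary features with Laplace (add-one) smoothing."""
    class_counts = Counter(label for _, label in train)
    def predict(x):
        scores = {}
        for c, n_c in class_counts.items():
            score = math.log(n_c / len(train))  # log prior
            for j, v in enumerate(x):
                hits = sum(1 for f, label in train if label == c and f[j] == v)
                score += math.log((hits + 1) / (n_c + 2))  # smoothed log likelihood
            scores[c] = score
        return max(scores, key=scores.get)
    return predict

def entropy(rows):
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def build_tree(rows, features):
    """ID3-style tree: split on the feature with the lowest weighted entropy."""
    counts = Counter(label for _, label in rows)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1 or not features:
        return majority  # leaf: pure node or no features left
    def split_entropy(j):
        parts = [[r for r in rows if r[0][j] == v] for v in (0, 1)]
        return sum(len(p) / len(rows) * entropy(p) for p in parts if p)
    best = min(features, key=split_entropy)
    rest = [f for f in features if f != best]
    branches = {}
    for v in (0, 1):
        subset = [r for r in rows if r[0][best] == v]
        branches[v] = build_tree(subset, rest) if subset else majority
    return (best, branches)

def fit_tree(train):
    tree = build_tree(train, list(range(N_FEATURES)))
    def predict(x):
        node = tree
        while isinstance(node, tuple):  # internal node: (feature index, branches)
            node = node[1][x[node[0]]]
        return node
    return predict

def cross_val_accuracy(fit, rows, folds=10):
    """Fraction of rows classified correctly across `folds` held-out folds."""
    correct = 0
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(rows)

results = {
    "K-NN": cross_val_accuracy(fit_knn, data),
    "Naive Bayes": cross_val_accuracy(fit_naive_bayes, data),
    "Decision Tree": cross_val_accuracy(fit_tree, data),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2%}")
```

Each `fit_*` function returns a `predict` closure, so `cross_val_accuracy` can score all three classifiers through the same interface. Accuracy here is simply the number of correctly classified held-out rows divided by the total number of rows.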
