Komparasi Algoritma Data Mining untuk Klasifikasi Penyakit Diabetes (Comparison of Data Mining Algorithms for Diabetes Classification)
DOI: https://doi.org/10.47065/bulletincsr.v3i5.280
Keywords: Decision Tree; Naïve Bayes; K-NN
Abstract
Diabetes is a deadly non-communicable disease. According to the World Health Organization (WHO), diabetes killed at least 2 million people in 2019. Each phase and condition of diabetes patients is widely recorded to support research. One of the most recent records is the early stage diabetes risk prediction dataset, which was provided by a diabetes hospital in Bangladesh and released through the UCI repository in late 2020. Classification in data mining extracts patterns or models from data to gain new knowledge. Widely used classification algorithms that have been shown to handle large data include K-NN, Naïve Bayes, and Decision Tree. This study compares the three algorithms on the early stage diabetes risk prediction dataset. The decision tree is the best of the three for this dataset, with an accuracy of 95.96%, followed by K-NN at 92.5%. Naïve Bayes reaches only 86.92%.
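The kind of comparison summarized in the abstract can be illustrated with a small sketch. The abstract does not state which tool, parameters, or validation scheme the authors used, so everything here is an assumption: the rows are invented toy data (three yes/no symptom columns), the function names are hypothetical, K-NN uses Hamming distance with k = 3, Naïve Bayes uses Laplace smoothing, and accuracy is estimated with a simple 5-fold split. A decision tree is left out for brevity. The numbers it prints have no relation to the accuracies reported above.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Hamming distance)."""
    dists = sorted(
        (sum(a != b for a, b in zip(row, x)), i) for i, row in enumerate(train_X)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def nb_fit(train_X, train_y):
    """Collect the counts that a categorical Naive Bayes model needs."""
    class_counts = Counter(train_y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(train_X, train_y):
        for j, v in enumerate(row):
            value_counts[(j, label)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen


def nb_predict(model, x):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for j, v in enumerate(x):
            num = value_counts[(j, label)][v] + 1  # add-one smoothing
            den = n + len(values_seen[j])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(X, y, predict_fn, folds=5):
    """Fraction of rows classified correctly when each fold is held out once.

    predict_fn(train_X, train_y, x) must return a predicted label.
    """
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [r for i, r in enumerate(X) if i not in test_idx]
        train_y = [l for i, l in enumerate(y) if i not in test_idx]
        for i in test_idx:
            if predict_fn(train_X, train_y, X[i]) == y[i]:
                correct += 1
    return correct / len(X)


# Invented toy rows (NOT the real dataset): 1 = symptom present, 0 = absent.
X = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0),
]
y = ["Positive"] * 5 + ["Negative"] * 5

knn_acc = cross_val_accuracy(X, y, knn_predict)
nb_acc = cross_val_accuracy(X, y, lambda tX, ty, x: nb_predict(nb_fit(tX, ty), x))
print(f"K-NN accuracy: {knn_acc:.2%}")
print(f"Naive Bayes accuracy: {nb_acc:.2%}")
```

Evaluating every algorithm through the same `cross_val_accuracy` function keeps the comparison fair, because each classifier sees identical train and test splits.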
Copyright (c) 2023 Ivandari, Much. Rifqi Maulana, Muhammad Faizal Kurniawan, M. Adib Al Karomi

This work is licensed under a Creative Commons Attribution 4.0 International License.