Comparison of Data Mining Algorithms for Diabetes Classification (Komparasi Algoritma Data Mining untuk Klasifikasi Penyakit Diabetes)


Authors

  • Ivandari STMIK Widya Pratama, Pekalongan, Indonesia
  • Much. Rifqi Maulana STMIK Widya Pratama, Pekalongan, Indonesia
  • Muhammad Faizal Kurniawan STMIK Widya Pratama, Pekalongan, Indonesia
  • M. Adib Al Karomi STMIK Widya Pratama, Pekalongan, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v3i5.280

Keywords:

Decision Tree; Naïve Bayes; K-NN

Abstract

Diabetes is a deadly non-communicable disease. According to the World Health Organization (WHO), it caused at least 2 million deaths in 2019. Patient records covering each phase and condition of the disease are collected extensively to support research. One of the most recent is the early stage diabetes risk prediction dataset, which was collected at a diabetes hospital in Bangladesh and released through the UCI Machine Learning Repository in late 2020. Classification is a data mining task that extracts patterns or models from data in order to gain new knowledge. Widely used classification algorithms that have proven able to handle large datasets include K-NN, Naïve Bayes, and Decision Tree. This study compares these three algorithms on the early stage diabetes risk prediction dataset. The decision tree performs best, with an accuracy of 95.96%, followed by K-NN at 92.5%, while Naïve Bayes reaches only 86.92%.
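The comparison described in the abstract can be illustrated with a minimal, standard-library Python sketch. Everything in it is an assumption made for illustration: the abstract does not state the tooling, the value of k, the tree induction method, or the validation protocol, so the sketch uses an ID3-style tree, categorical Naïve Bayes with add-one smoothing, K-NN with Hamming distance and k = 3, and 5-fold cross-validation. The data (`toy_rows`, `toy_labels`) is synthetic and is not the UCI dataset, so the printed accuracies have no relation to the figures reported above.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product


def majority(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def fit_knn(rows, labels, k=3):
    """k-NN using Hamming distance (number of differing attributes)."""
    def predict(row):
        ranked = sorted(range(len(rows)),
                        key=lambda i: sum(a != b for a, b in zip(rows[i], row)))
        return majority([labels[i] for i in ranked[:k]])
    return predict


def fit_naive_bayes(rows, labels):
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    seen = defaultdict(set)              # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            seen[j].add(value)

    def log_posterior(row, c):
        score = math.log(class_counts[c] / len(labels))
        for j, value in enumerate(row):
            score += math.log((value_counts[(c, j)][value] + 1)
                              / (class_counts[c] + len(seen[j])))
        return score

    return lambda row: max(class_counts, key=lambda c: log_posterior(row, c))


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def fit_tree(rows, labels):
    """ID3-style tree: split on the attribute with the highest information gain."""
    def group(idx, f):
        groups = defaultdict(list)
        for i in idx:
            groups[rows[i][f]].append(i)
        return groups

    def remainder(idx, f):  # weighted entropy left after splitting on attribute f
        return sum(len(g) / len(idx) * entropy([labels[i] for i in g])
                   for g in group(idx, f).values())

    def grow(idx, feats):
        labs = [labels[i] for i in idx]
        if len(set(labs)) == 1 or not feats:
            return majority(labs)  # leaf: a plain class label
        best = min(feats, key=lambda f: remainder(idx, f))  # min remainder = max gain
        rest = [f for f in feats if f != best]
        return (best, majority(labs),
                {v: grow(g, rest) for v, g in group(idx, best).items()})

    root = grow(list(range(len(rows))), list(range(len(rows[0]))))

    def predict(row):
        node = root
        while isinstance(node, tuple):  # internal node: (attribute, default, children)
            attribute, default, children = node
            node = children.get(row[attribute], default)
        return node
    return predict


def cross_val_accuracy(fit, rows, labels, folds=5, seed=0):
    """Mean accuracy over shuffled, non-stratified folds."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    scores = []
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        scores.append(sum(predict(rows[i]) == labels[i] for i in test) / len(test))
    return sum(scores) / folds


# Synthetic toy data, NOT the UCI dataset: four Yes/No attributes, and the label
# is "Positive" when at least two of the first three attributes are "Yes".
toy_rows = list(product(("Yes", "No"), repeat=4)) * 10
toy_labels = ["Positive" if row[:3].count("Yes") >= 2 else "Negative"
              for row in toy_rows]

for name, fit in [("Decision Tree", fit_tree),
                  ("Naive Bayes", fit_naive_bayes),
                  ("K-NN", fit_knn)]:
    print(f"{name}: {cross_val_accuracy(fit, toy_rows, toy_labels):.3f}")
```

Each `fit_*` function returns a `predict(row)` callable, so all three classifiers pass through the same `cross_val_accuracy` harness. Running the sketch on the real dataset would additionally require loading the UCI file and discretizing any numeric attribute, since all three implementations here assume categorical values.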





ARTICLE HISTORY

Published: 2023-08-31


How to Cite

Ivandari, Much. Rifqi Maulana, Muhammad Faizal Kurniawan, & Al Karomi, M. A. (2023). Komparasi Algoritma Data Mining untuk Klasifikasi Penyakit Diabetes. Bulletin of Computer Science Research, 3(5), 343-350. https://doi.org/10.47065/bulletincsr.v3i5.280
