An End-to-End Python Based Data Science Framework for Customer Transaction Big Data Analytics
DOI:
https://doi.org/10.47065/jimat.v6i1.964
Keywords:
Data Science; Python; Big Data; Customer Transactions; Clustering; Insight Analytics
Abstract
This study addresses the underuse of customer transaction big data by applying a data science approach implemented in the Python programming language. Many organizations accumulate large volumes of transaction data, yet these data often fail to generate strategic value because no systematic analytical model is in place. The central question examined is how customer transaction big data can be processed and analyzed to extract meaningful insights that support data-driven business decision making. As a solution, the study applies a Python-based data science model that integrates data preprocessing, exploratory data analysis (EDA), and machine learning techniques to uncover patterns of customer behavior.
The model is developed with Python and its data science ecosystem: pandas and NumPy for data manipulation, matplotlib for data visualization, and scikit-learn for machine learning. Customer transaction data pass through several analytical stages, beginning with data cleaning and transformation, followed by the construction of behavioral variables using the Recency, Frequency, and Monetary (RFM) framework. A clustering model based on the K-Means algorithm is then applied to segment customers according to their transaction characteristics.
The results show that the proposed data science model is effective in extracting insights from customer transaction big data. The clustering process identifies distinct customer segments with different levels of activity and value contribution, revealing three main customer groups: low-contribution customers, potential customers, and high-value customers.
These results demonstrate that the implementation of data science using Python can transform raw transaction data into actionable knowledge that supports more targeted marketing strategies, improved customer retention, and enhanced strategic decision making.
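The pipeline summarized in the abstract (cleaning, RFM construction, K-Means segmentation) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the column names `customer_id`, `date`, and `amount`, the helper functions, the cleaning rules, the use of `StandardScaler` before clustering, and the synthetic toy transactions are all assumptions introduced here. Only the choice of three clusters follows the three customer groups reported in the abstract.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def make_toy_transactions():
    """Build a small synthetic transaction table (illustrative data only)."""
    snapshot = pd.Timestamp("2025-01-01")
    # Hypothetical profiles: (purchases, days since last purchase, amount each)
    profiles = [(1, 300, 20.0), (4, 90, 60.0), (12, 5, 200.0)]
    rows = []
    customer_id = 0
    for n_purchases, last_gap, amount in profiles:
        for k in range(3):  # three customers per profile
            customer_id += 1
            for j in range(n_purchases):
                rows.append({
                    "customer_id": customer_id,
                    "date": snapshot - pd.Timedelta(days=last_gap + k + 7 * j),
                    "amount": amount + k,
                })
    return pd.DataFrame(rows), snapshot


def build_rfm(transactions, snapshot_date):
    """Aggregate transactions into one Recency/Frequency/Monetary row per customer."""
    # Assumed cleaning rules: drop incomplete rows and non-positive amounts.
    clean = transactions.dropna(subset=["customer_id", "date", "amount"])
    clean = clean[clean["amount"] > 0]
    grouped = clean.groupby("customer_id")
    return pd.DataFrame({
        "recency": (snapshot_date - grouped["date"].max()).dt.days,
        "frequency": grouped["date"].count(),
        "monetary": grouped["amount"].sum(),
    })


def segment_customers(rfm, n_clusters=3, seed=0):
    """Standardize the RFM columns and assign a K-Means cluster label."""
    features = rfm[["recency", "frequency", "monetary"]]
    # Scaling is an assumption: it keeps monetary values from dominating distances.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    segmented = rfm.copy()
    segmented["cluster"] = model.fit_predict(scaled)
    return segmented


transactions, snapshot = make_toy_transactions()
rfm = build_rfm(transactions, snapshot)
segments = segment_customers(rfm)
# Per-cluster averages help decide which label (low, potential, high) fits each cluster.
print(segments.groupby("cluster")[["recency", "frequency", "monetary"]].mean())
```

K-Means cluster numbers are arbitrary, so names such as "high-value" would have to be assigned afterwards by inspecting the per-cluster averages rather than read from the label itself.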
Copyright (c) 2026 Mayang Modelina Cynthia, Muhammad Iqbal

This work is licensed under a Creative Commons Attribution 4.0 International License.