Visual Analysis of Q-Learning and SARSA Agent Behavior on the Cliff Walking Problem with Explainable Reinforcement Learning
DOI: https://doi.org/10.47065/bulletincsr.v6i2.984

Keywords: Explainable Reinforcement Learning; Q-Learning; SARSA; Real-Time Visualization; Cliff Walking

Abstract
Reinforcement Learning (RL) has achieved remarkable success in complex sequential decision tasks. However, modern RL models often lack explainability, creating a serious "black box" problem, especially in high-stakes domains. This study proposes a Pygame-based real-time visualization architecture for RL and demonstrates its benefits in a Cliff Walking case study using the Q-Learning and SARSA algorithms. Key contributions include: (1) a real-time visualization architecture that decouples training logic from graphics rendering and supports rendering at more than 60 FPS, (2) interpretive visualization techniques including diverging heatmaps, dynamic policy arrows, and Ghost Policies, and (3) a comprehensive empirical study clarifying the distinct characteristics of both algorithms. Experimental results clearly show that Q-Learning selects an efficient but risky path, consistent with its optimistic off-policy nature, while SARSA converges on a safer path, reflecting its on-policy nature that accounts for the risk incurred during exploration. Quantitatively, Q-Learning achieved the optimal 13-step path while accumulating 10,642 cliff falls, whereas SARSA converged to a safe 23-step path with a significantly higher collision frequency (232,844 times) in order to avoid the extreme penalties of the cliff zone.
Copyright (c) 2026 Firas Atqiya, Muhammad Rizqi Sholahuddin

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).