Visual Analysis of Q-Learning and SARSA Agent Behavior on the Cliff Walking Problem with Explainable Reinforcement Learning


Authors

  • Firas Atqiya, Universitas Padjadjaran, Sumedang, Indonesia
  • Muhammad Rizqi Sholahuddin, Politeknik Negeri Bandung, Bandung, Indonesia

DOI:

https://doi.org/10.47065/bulletincsr.v6i2.984

Keywords:

Explainable Reinforcement Learning; Q-Learning; SARSA; Real-Time Visualization; Cliff Walking

Abstract

Reinforcement Learning (RL) has achieved remarkable success in complex sequential decision tasks. However, modern RL models often lack explainability, creating a serious "black box" problem, especially in high-stakes domains. This study proposes a Pygame-based real-time visualization architecture for RL and demonstrates its benefits in a Cliff Walking case study using the Q-Learning and SARSA algorithms. Key contributions include: (1) a real-time visualization architecture that decouples training logic from graphics rendering and supports rendering at more than 60 FPS, (2) interpretive visualization techniques including diverging heatmaps, dynamic policy arrows, and Ghost Policies, and (3) a comprehensive empirical study clarifying the distinct characteristics of both algorithms. Experimental results clearly show that Q-Learning selects an efficient but risky path aligned with its optimistic off-policy nature, while SARSA converges on a safer path reflecting its on-policy nature, which accounts for the risk incurred during exploration. Quantitatively, Q-Learning achieved an optimal 13-step path with a cumulative 10,642 cliff falls, whereas SARSA converged to a safe 23-step path with a significantly higher collision frequency (232,844 times) while avoiding the extreme penalties of the cliff zone.
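The paper's own implementation is not reproduced here, but the off-policy vs. on-policy distinction the abstract contrasts can be sketched on a minimal tabular Cliff Walking grid. Everything below (grid layout, hyperparameters, function names) is an illustrative assumption in the spirit of Sutton and Barto's classic setup, not the authors' code: the only difference between the two agents is the bootstrap target — SARSA uses the action it will actually take, Q-Learning uses the greedy maximum.

```python
import numpy as np

# Sutton & Barto-style cliff walking grid (an assumed layout, not the paper's code):
# 4 rows x 12 columns, start bottom-left, goal bottom-right, cliff in between.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    """Apply action a; stepping onto the cliff costs -100 and resets to start."""
    r, c = state
    dr, dc = ACTIONS[a]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 1 <= c <= 10:            # fell off the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL    # every ordinary step costs -1

def eps_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(4))
    return int(np.argmax(Q[s]))

def train(on_policy, episodes=1000, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    """Tabular SARSA (on_policy=True) or Q-Learning (on_policy=False)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, 4))
    for _ in range(episodes):
        s = START
        a = eps_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2, eps, rng)
            if on_policy:
                # SARSA target: bootstrap on the action actually taken next
                target = r + gamma * Q[s2][a2] * (not done)
            else:
                # Q-Learning target: bootstrap on the greedy action
                target = r + gamma * Q[s2].max() * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q

def greedy_path_len(Q, cap=100):
    """Length of the purely greedy rollout from start to goal (capped)."""
    s, n = START, 0
    while s != GOAL and n < cap:
        s, _, _ = step(s, int(np.argmax(Q[s])))
        n += 1
    return n
```

Under this sketch, Q-Learning's greedy policy typically recovers the 13-step path hugging the cliff edge, while SARSA, because its updates price in the occasional exploratory step into the cliff, settles on a longer but safer route — the qualitative pattern the study's visualizations make observable in real time.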







ARTICLE HISTORY

Published: 2026-02-11


How to Cite

Atqiya, F., & Sholahuddin, M. R. (2026). Analisis Visual Perilaku Agen Q-Learning dan SARSA pada Cliff Walking Problem dengan Explainable Reinforcement Learning. Bulletin of Computer Science Research, 6(2), 633–642. https://doi.org/10.47065/bulletincsr.v6i2.984
