Reinforcement Learning Snake Game
Live gameplay demonstration of the trained Q-Learning agent navigating the board.
About the Project
A reinforcement learning agent that learns to play Snake using tabular Q-Learning in a custom Python environment. The agent improves purely through interaction with the game, with no scripted strategy, and learns a policy that balances reaching food against avoiding collisions.
Highlights
- Custom Snake environment built for RL-style training and evaluation
- Tabular Q-Learning with epsilon-greedy exploration and decay
- Compact state encoding (8-bit binary, 256 discrete states) to keep learning feasible and fast
- Reward design to speed up learning and reduce random wandering early on
- Reproducible experiments with seeded runs and logged results
- Visual outputs: training curves and gameplay GIFs
How it works
At each step, the agent observes a compact representation of the board (food direction + immediate collision risks), selects an action, receives a reward signal, and updates its Q-table via the Bellman update:
Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)]
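The update above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual code: the function name `q_update`, the four-action layout, and the handling of terminal transitions (no bootstrapping after a collision, a common convention) are assumptions; α, γ, and the 256-state table size come from this README.

```python
import numpy as np

N_STATES = 256   # from the 8-bit state encoding
N_ACTIONS = 4    # assumption: up, down, left, right
ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount factor


def q_update(q_table: np.ndarray, state: int, action: int,
             reward: float, next_state: int, done: bool) -> None:
    """Apply one tabular Q-Learning update in place."""
    # Assumption: a terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else float(np.max(q_table[next_state]))
    td_target = reward + GAMMA * best_next
    q_table[state, action] += ALPHA * (td_target - q_table[state, action])


q_table = np.zeros((N_STATES, N_ACTIONS))
q_update(q_table, state=3, action=1, reward=10.0, next_state=7, done=False)
```

Starting from an all-zero table, the single call above moves `q_table[3, 1]` one tenth of the way toward the target of 10.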
Key hyperparameters: learning rate α = 0.1, discount factor γ = 0.9, exploration rate ε decayed by a factor of 0.9995 per episode (from 0.2 down to 0.05)
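One plausible reading of the exploration schedule is a multiplicative per-episode decay with a floor. The sketch below assumes that reading; the names `epsilon_at` and `select_action` are hypothetical and not taken from the project.

```python
import numpy as np

EPS_START = 0.2    # initial exploration rate
EPS_MIN = 0.05     # floor for the exploration rate
EPS_DECAY = 0.9995 # multiplicative decay applied once per episode


def epsilon_at(episode: int) -> float:
    """Exploration rate after `episode` decay steps, clamped at the floor."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** episode)


def select_action(q_row: np.ndarray, epsilon: float,
                  rng: np.random.Generator) -> int:
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore: uniform random action
    return int(np.argmax(q_row))              # exploit: best known action


rng = np.random.default_rng(0)  # seeded for reproducibility
action = select_action(np.array([0.0, 2.0, 1.0, 0.0]), epsilon_at(0), rng)
```

Under this reading, ε reaches the 0.05 floor after roughly 2,770 episodes, since 0.2 × 0.9995^n = 0.05 gives n = ln(0.25) / ln(0.9995) ≈ 2,772.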
Performance: best average score of 33.20 and highest single score of 62 (24.2% board coverage) after ~15,700 training episodes.
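The compact observation described in this section (food direction plus immediate collision risks) could be packed into 8 bits roughly as follows. The exact bit layout is not given here, so this sketch assumes four danger bits and four food-direction bits in absolute directions; `encode_state` and its parameters are hypothetical names.

```python
from typing import Set, Tuple

Point = Tuple[int, int]

# Assumption: absolute directions as (dx, dy) with y growing downward,
# ordered up, down, left, right.
DIRECTIONS: Tuple[Point, ...] = ((0, -1), (0, 1), (-1, 0), (1, 0))


def encode_state(head: Point, food: Point, body: Set[Point],
                 width: int, height: int) -> int:
    """Pack 4 danger bits and 4 food-direction bits into an int in [0, 255]."""
    hx, hy = head
    fx, fy = food

    bits = []
    # Danger bits: would one step in each direction hit a wall or the body?
    for dx, dy in DIRECTIONS:
        nx, ny = hx + dx, hy + dy
        out_of_bounds = not (0 <= nx < width and 0 <= ny < height)
        bits.append(out_of_bounds or (nx, ny) in body)

    # Food bits: is the food above, below, left of, or right of the head?
    bits.extend([fy < hy, fy > hy, fx < hx, fx > hx])

    # Fold the 8 booleans into one integer, first bit most significant.
    state = 0
    for bit in bits:
        state = (state << 1) | int(bit)
    return state


# Head in the top-left corner, food to the right, no body segments nearby.
state = encode_state(head=(0, 0), food=(3, 0), body=set(), width=16, height=16)
```

The returned integer can index a Q-table row directly, which is what keeps the table at 256 rows regardless of board size.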
What I learned
- MDPs and value functions, exploration vs exploitation trade-offs
- State design to prevent state-space explosion (8-bit encoding reduces 2^32 potential states to just 256)
- Reward shaping for faster convergence. Sparse rewards slow learning, so I designed a multi-tier reward: eating food (+10 + 0.5 × length), moving toward food (+1.1), survival bonus (+0.1 per step), collision (-10)
- Writing maintainable RL code with type hints, modular structure, and model persistence
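As an illustration of the multi-tier reward listed above, here is a minimal sketch. The reward values come from this README; how the tiers combine is not stated, so the sketch assumes they are mutually exclusive outcomes checked in priority order, and it assumes Manhattan distance for "moving toward food". The names `shaped_reward` and `manhattan` are hypothetical.

```python
from typing import Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Grid distance between two cells (assumed distance measure)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shaped_reward(collided: bool, ate_food: bool, old_distance: int,
                  new_distance: int, snake_length: int) -> float:
    """Return the reward for one step, assuming exclusive tiers."""
    if collided:
        return -10.0                        # collision penalty
    if ate_food:
        return 10.0 + 0.5 * snake_length    # food bonus grows with length
    if new_distance < old_distance:
        return 1.1                          # moved closer to the food
    return 0.1                              # survived the step


# Example: the head moves from (2, 2) to (3, 2) with food at (5, 2).
food = (5, 2)
reward = shaped_reward(
    collided=False,
    ate_food=False,
    old_distance=manhattan((2, 2), food),
    new_distance=manhattan((3, 2), food),
    snake_length=3,
)
```

If the survival bonus is instead meant to be added on top of the other tiers, only the return values change; the branching stays the same.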
Next steps
- Compare against DQN-based agents (Double/Dueling DQN) for scalability
- Test with larger boards and richer state representations
- Curriculum learning to gradually increase difficulty
Tech
Python • NumPy • Reinforcement Learning • Q-Learning