
Train your AIs on Game Boy Games

Turn any game into a reinforcement learning environment with PyBoyEnv

Project

PyBoyEnv was a project I started at 18 to get hands-on with reinforcement learning and Gym environments. It aimed to turn any Game Boy game into a reinforcement learning environment, offering a unique way to explore how AI agents learn and adapt. This project was a big leap into the world of artificial intelligence for me, combining a personal challenge with my love for tech.

Reinforcement Learning Schema

Tech stack

Python choice

Python is favored in data science for its simplicity and for powerful libraries like NumPy and pandas, which streamline data tasks. Its readable syntax makes it ideal for beginners and experienced programmers alike, easing the work of data analysis and machine learning.
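
For instance, a few lines of NumPy and pandas are enough to tabulate and summarize results (the episode rewards below are made-up illustration data):

import numpy as np
import pandas as pd

rewards = np.array([12, -5, 30, 8, 25])   # Made-up episode rewards

df = pd.DataFrame({"episode": np.arange(1, 6), "reward": rewards})
print(df.describe())            # Summary statistics in one call
print(df["reward"].cumsum())    # Cumulative reward across episodes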

Reinforcement learning

Reinforcement learning commonly relies on Gym, a toolkit introduced by OpenAI that provides environments where agents learn by trial and error. These environments cover diverse scenarios for training algorithms, making them essential for testing and improving decision-making in a controlled setting. A typical interaction loop looks like this:

import gym

class RandomAgent:
    """Placeholder agent that picks random actions."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

env = gym.make('CartPole-v1')            # Create a new Gym environment
agent = RandomAgent(env.action_space)    # A real agent would learn from rewards
observation = env.reset()                # Get the initial observation

for t in range(100):
    env.render()                         # Show the environment
    action = agent.act(observation)      # Agent acts based on the observation
    observation, reward, done, info = env.step(action)  # Perform the action and observe the new state
    if done:                             # Check if the episode is over
        print("Episode finished after {} timesteps".format(t + 1))
        break

env.close()

Memory events

Concept

For Game Boy games, tracking specific memory values lets us detect key in-game events and tie them to rewards in reinforcement learning. By linking these values to player achievements or failures, we can train AI agents more effectively, guiding their actions toward desired game outcomes through a tailored reward system.
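
Concretely, this means polling the emulator's RAM each frame and comparing against the previous value. Here is a minimal sketch, assuming PyBoy 1.x, where get_memory_value(addr) reads a byte of RAM and tick() advances one frame (returning True when the emulator stops); the health address used here is explained in the next section:

from pyboy import PyBoy

HEALTH_ADDR = 0xDB5A                 # Link's health (address explained below)

pyboy = PyBoy('DX.gbc')
prev_health = pyboy.get_memory_value(HEALTH_ADDR)

while not pyboy.tick():              # Advance one frame; True means the emulator stopped
    health = pyboy.get_memory_value(HEALTH_ADDR)
    if health != prev_health:
        reward = 1 if health > prev_health else -1  # Gain = +1, loss = -1
        print(f"Health changed, reward: {reward}")
    prev_health = health

pyboy.stop()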

How to

Let’s take “The Legend of Zelda: Link’s Awakening DX” as an example. Cheat codes provide a valuable resource for understanding how in-game events are triggered and managed through memory values. We can decrypt these codes with tools available on gamehacking.org and analyze how they alter the game’s memory. This decryption reveals specific memory addresses and the values written to them, guiding us in identifying which memory values to monitor for rewards.
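
The decoding itself is straightforward: a Game Boy GameShark code ttvvllhh packs a type byte (01 means an 8-bit RAM write), a value byte, and the target address stored low byte first. A small helper makes this explicit:

def decode_gameshark(code):
    """Decode a Game Boy GameShark code of the form ttvvllhh."""
    code_type = int(code[0:2], 16)    # 01 = 8-bit RAM write
    value = int(code[2:4], 16)        # Byte to write
    address = (int(code[6:8], 16) << 8) | int(code[4:6], 16)  # Little-endian address
    return code_type, value, address

code_type, value, address = decode_gameshark("01185ADB")
print(f"Writes 0x{value:02X} to 0x{address:04X}")  # Writes 0x18 to 0xDB5A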

For example, to monitor health changes, we can use the “Infinite Health” cheat code 01185ADB. According to gamehacking.org, this code writes 0x18 to memory address 0xDB5A. Knowing this, we can establish the following rules:

env.set_reward_rule(0xDB5A, 'increase', 1, "Health") # Gaining health = +1
env.set_reward_rule(0xDB5A, 'decrease', -1, "Health") # Losing health = -1

Example

By applying the earlier technique and including extra rules to monitor more memory addresses, we end up with the following code:

import gym
import pyboyenv

env = gym.make('PyBoy-v0', game='DX.gbc', visible=True)

env.set_reward_rule(0xDB5A, 'increase', 1, "Health")  # Label not required
env.set_reward_rule(0xDB5A, 'decrease', -1, "Health") # Health
env.set_reward_rule(0xDB5E, 'increase', 1, "Money")   # Money
env.set_reward_rule(0xDB45, 'increase', 1, "Arrows")  # Arrows
env.set_reward_rule(0xDB4D, 'increase', 1, "Bombs")   # Bombs
env.set_reward_rule(0xDBD0, 'increase', 1, "Keys")    # Keys
env.set_reward_rule(0xDBCF, 'increase', 5, "Big Keys")  # Big Keys
env.set_reward_rule(0xD368, 'equals 3', -25, "Death")   # Death
env.set_reward_rule(0xD360, 'equals 3', 1, "Hit Enemy") # Hit enemy
env.set_reward_rule(0xD360, 'equals 1', 2, "Loot")      # Loot
env.set_reward_rule(                                    # Events
    0xD368, 
    'in 59,15,16,21,49,24,25,27,30,33,34,39', 
    25, 
    "Event"
)  # Events
env.set_done_rule(0xD368, 'equals 3', "Death") # Done if player dies

env.reset()

cumul = 0
done = False
while not done:
    state, reward, done, info = env.step(16)  # Action 16 = no-op (do nothing)
    cumul += reward
    for i in info:                             # Each info entry is a (label, value) pair
        print(f"{i[0]}: {i[1]}")

print(f"Total reward: {cumul}")

The code provided sets up the following environment:

Demo GIF