In this chapter, we’re going to give our Game Boy CPU the ability to make decisions and reuse code. We’ll implement the crucial control flow instructions: JP (Jump), JR (Jump Relative), CALL, and RET (Return), along with their conditional variants. These instructions are fundamental to how programs execute, allowing them to branch, loop, and call subroutines.

By the end of this milestone, your emulator will be able to follow more complex program paths, enabling it to execute actual Game Boy program logic beyond simple linear instruction sequences. This is a significant step towards running real Game Boy ROMs, as it unlocks the ability for programs to react to different states and organize their code efficiently.

Project Overview

Our overarching goal is to build a functional Game Boy emulator in F# from first principles. This involves meticulously replicating the behavior of the Game Boy’s hardware components, including its custom 8-bit CPU (SM83), Memory Management Unit (MMU), Picture Processing Unit (PPU), and more. Each chapter incrementally adds a critical piece of this complex system.

Tech Stack

For this project, we are leveraging:

  • F# (latest stable release via .NET SDK): A functional-first language on the .NET platform, chosen for its strong type system, conciseness, and excellent tooling. Its immutability-first approach helps manage complex state in an emulator.
  • .NET SDK (latest stable release): Provides the runtime, libraries, and build tools for F# applications. As of 2026-05-05, this would typically be .NET 9 or .NET 10.
  • SDL.NET (or similar): While not directly implemented in this chapter, a cross-platform graphics library like SDL.NET will eventually be used for rendering the Game Boy’s display output.

Milestones and Build Plan

This chapter focuses on enabling the CPU to alter its execution path. Our plan includes:

  1. Update CPU State: Introduce the Stack Pointer (SP) register to our CpuState record.
  2. Implement Stack Operations: Create pushWord and popWord functions in the MMU to handle 16-bit data on the stack, respecting Game Boy’s little-endian architecture.
  3. Integrate Control Flow Opcodes: Extend the executeInstruction function with match cases for JP, JR, CALL, and RET instructions, including their conditional variants.
  4. Verify Execution: Test stack operations with unit tests and validate CPU control flow by observing CPU state during the execution of simple test ROMs.

Architecture

Implementing control flow requires a deeper interaction between the CPU and memory, specifically concerning the stack. The stack is a region of memory used by the CPU to temporarily store return addresses for CALL instructions and other local data.

The Stack and Stack Pointer

The Game Boy’s CPU (a custom Z80 variant, often called SM83) has a 16-bit Stack Pointer (SP) register. The stack grows downwards from higher memory addresses to lower ones. When a value is “pushed” onto the stack, the SP is decremented, and the value is written to the new SP address. When a value is “popped,” the value is read from SP, and SP is incremented. This Last-In, First-Out (LIFO) structure is crucial for managing subroutine calls.

Stack Behavior Summary:

  • PUSH value: SP decrements by 2 (for a 16-bit value), then value is written to [SP]. Due to little-endian ordering, the low byte goes to SP, and the high byte to SP+1.
  • POP value: value is read from [SP] (low byte at SP, high byte at SP+1), then SP increments by 2.
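
The push and pop rules above can be sketched on a plain byte array. This is a minimal illustration, assuming a flat 64 KiB array stands in for memory; the names push16 and pop16 are hypothetical and are not the chapter's MMU functions.

```fsharp
// Minimal sketch: a flat 64 KiB array stands in for Game Boy memory (an assumption
// made for illustration; a real MMU routes addresses to different regions).
let memory : byte[] = Array.zeroCreate 0x10000

/// Push a 16-bit value: SP drops by 2, low byte lands at SP, high byte at SP + 1.
let push16 (sp: uint16) (value: uint16) : uint16 =
    let newSp = sp - 2us
    memory.[int newSp] <- byte (value &&& 0xFFus)
    memory.[int (newSp + 1us)] <- byte (value >>> 8)
    newSp

/// Pop a 16-bit value: read low byte at SP and high byte at SP + 1, then SP rises by 2.
let pop16 (sp: uint16) : uint16 * uint16 =
    let low = uint16 memory.[int sp]
    let high = uint16 memory.[int (sp + 1us)]
    ((high <<< 8) ||| low, sp + 2us)

let spAfterPush = push16 0xFFFEus 0x1234us
let value, spAfterPop = pop16 spAfterPush
printfn "SP after push: 0x%04X" spAfterPush
printfn "Popped 0x%04X, SP after pop: 0x%04X" value spAfterPop
```

Because SP is a uint16, the subtraction wraps around at 0x0000 just as the hardware register does, so no extra masking is needed in this sketch.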

Control Flow Instruction Types

We’re implementing three primary categories of control flow instructions:

  1. Jumps (JP, JR): These instructions alter the Program Counter (PC) to a new address, either unconditionally or based on CPU flags.

    • JP nn: Jump to an absolute 16-bit address nn.
    • JR e: Jump relative by a signed 8-bit offset e, measured from the address of the instruction that follows the JR (PC + 2).
    • Conditional Jumps: JP Z, nn (jump if Zero flag is set), JP NZ, nn (jump if Zero flag is not set), JP C, nn (jump if Carry flag is set), JP NC, nn (jump if Carry flag is not set). Similar conditional variants exist for JR.
  2. Calls (CALL): These instructions are used to call subroutines. They save the current execution context (the address of the next instruction) before jumping.

    • CALL nn: Push the address of the next instruction (after the CALL itself) onto the stack, then jump to absolute 16-bit address nn.
    • Conditional Calls: CALL Z, nn, CALL NZ, nn, CALL C, nn, CALL NC, nn.
  3. Returns (RET): These instructions return from subroutines, restoring the program flow to where the CALL originated.

    • RET: Pop a 16-bit address from the stack, and jump to that address.
    • Conditional Returns: RET Z, RET NZ, RET C, RET NC.

CPU Decision Flow for Conditional Jumps

The CPU’s logic for handling conditional control flow is straightforward: fetch, check condition, then act.

flowchart TD
    A[Fetch Opcode] --> B{Is Conditional}
    B -->|Yes| C{Check CPU Flag}
    C -->|Condition Met| D[Update PC to Target]
    C -->|Condition Not Met| E[Increment PC by Opcode Length]
    B -->|No| D
    D --> F[Execute Next Instruction]
    E --> F

This diagram illustrates how the CPU evaluates a conditional jump or call. If the condition is met, the PC is updated to the target address. Otherwise, the PC simply advances past the instruction.
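
The "check condition" step can be sketched as a small pure function. This sketch assumes a Flags record shaped like the chapter's CpuFlags; Condition, decodeCondition, and conditionMet are hypothetical names. It relies on the SM83 opcode layout, in which bits 3-4 of the conditional JP, JR, CALL, and RET opcodes select NZ, Z, NC, or C.

```fsharp
type Flags = { Z: bool; N: bool; H: bool; C: bool }

type Condition =
    | NotZero
    | Zero
    | NoCarry
    | Carry

/// Bits 3-4 of a conditional control flow opcode select the condition.
/// Only meaningful for the conditional JP/JR/CALL/RET opcodes.
let decodeCondition (opcode: byte) : Condition =
    match (opcode >>> 3) &&& 0x03uy with
    | 0uy -> NotZero
    | 1uy -> Zero
    | 2uy -> NoCarry
    | _ -> Carry

/// True when the instruction should take its jump, call, or return.
let conditionMet (flags: Flags) (condition: Condition) : bool =
    match condition with
    | NotZero -> not flags.Z
    | Zero -> flags.Z
    | NoCarry -> not flags.C
    | Carry -> flags.C

let sampleFlags = { Z = true; N = false; H = true; C = false }
for opcode in [ 0xC2uy; 0xCAuy; 0x30uy; 0xD8uy ] do
    let condition = decodeCondition opcode
    printfn "0x%02X -> %A, taken: %b" opcode condition (conditionMet sampleFlags condition)
```

Decoding the condition from the opcode is one way to avoid writing four near-identical match cases per instruction; the explicit per-opcode cases used later in this chapter trade that compactness for readability.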

Step-by-Step Implementation

We’ll start by updating our CpuState to include the SP, then implement the necessary stack operations in Mmu.fs, and finally integrate the new opcodes into Cpu.fs.

1. Update CpuState

First, open Cpu.fs and add the SP register to our CpuState record. We’ll also update the initialCpuState to reflect a typical Game Boy boot-up value for SP.

// src/GbSharp.Core/Cpu.fs

module GbSharp.Core.Cpu

// ... (existing types)

type CpuState =
    {
        A: byte
        F: byte
        B: byte
        C: byte
        D: byte
        E: byte
        H: byte
        L: byte
        PC: uint16
        SP: uint16 // Add the Stack Pointer here
        IME: bool // Interrupt Master Enable
        Halted: bool
        Stopped: bool
        Flags: CpuFlags
    }
    // 16-bit register pair views. Records are immutable and a property setter must return
    // unit, so updates go through With* helpers that return a new record instead.
    member this.BC = (uint16 this.B <<< 8) ||| uint16 this.C
    member this.DE = (uint16 this.D <<< 8) ||| uint16 this.E
    member this.HL = (uint16 this.H <<< 8) ||| uint16 this.L
    member this.WithBC(value: uint16) = { this with B = byte (value >>> 8); C = byte (value &&& 0xFFus) }
    member this.WithDE(value: uint16) = { this with D = byte (value >>> 8); E = byte (value &&& 0xFFus) }
    member this.WithHL(value: uint16) = { this with H = byte (value >>> 8); L = byte (value &&& 0xFFus) }

// ... (rest of the module)

let initialCpuState =
    {
        A = 0x01uy // Typically 0x01 after boot ROM
        F = 0xB0uy // Typically 0xB0 after boot ROM
        B = 0x00uy
        C = 0x13uy
        D = 0x00uy
        E = 0xD8uy
        H = 0x01uy
        L = 0x4Duy
        PC = 0x0100us // Start after boot ROM, or 0x0000 if no boot ROM (uint16 literals use the `us` suffix)
        SP = 0xFFFEus // Initial stack pointer
        IME = false
        Halted = false
        Stopped = false
        Flags = { Z = true; N = false; H = true; C = true } // Corresponds to F = 0xB0
    }

We initialize SP to 0xFFFE, the value the boot ROM leaves in SP when it hands control to the cartridge. This address is the top of the high RAM area (HRAM, 0xFF80-0xFFFE). Programs are free to move the stack elsewhere, for example into work RAM, by loading a new value into SP.
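
Since the record carries both the raw F byte and a Flags record, the two need to agree. The following is a minimal sketch of that correspondence, assuming a Flags record shaped like the chapter's CpuFlags; flagsOfByte and byteOfFlags are hypothetical helper names.

```fsharp
type Flags = { Z: bool; N: bool; H: bool; C: bool }

/// Unpack the F register: bit 7 = Z, bit 6 = N, bit 5 = H, bit 4 = C.
let flagsOfByte (f: byte) : Flags =
    { Z = (f &&& 0x80uy) <> 0uy
      N = (f &&& 0x40uy) <> 0uy
      H = (f &&& 0x20uy) <> 0uy
      C = (f &&& 0x10uy) <> 0uy }

/// Pack flags back into the F register; the low nibble is always zero.
let byteOfFlags (flags: Flags) : byte =
    let bit (isSet: bool) (mask: byte) = if isSet then mask else 0uy
    bit flags.Z 0x80uy ||| bit flags.N 0x40uy ||| bit flags.H 0x20uy ||| bit flags.C 0x10uy

let bootFlags = flagsOfByte 0xB0uy
printfn "%A" bootFlags
printfn "0x%02X" (byteOfFlags bootFlags)
```

0xB0 is 1011 0000 in binary, which is why the initial state sets Z, H, and C and leaves N clear.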

2. Implement Stack Operations in Mmu.fs

The Game Boy’s CPU is little-endian, meaning the least significant byte (LSB) of a 16-bit value is stored at the lower memory address, and the most significant byte (MSB) at the higher address. We need to account for this when pushing and popping 16-bit values to/from memory.

Open Mmu.fs and add the following functions:

// src/GbSharp.Core/Mmu.fs

module GbSharp.Core.Mmu

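// No `open GbSharp.Core.Cpu` here: Cpu.fs opens this module, and F# does not allow
// circular references between files. Mmu.fs must come before Cpu.fs in the project.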
// ... (existing types and functions)

/// Writes a 16-bit word to memory, little-endian.
/// The low byte (LSB) is written to 'addr', and the high byte (MSB) to 'addr + 1'.
let writeWord (mmu: MmuState) (addr: uint16) (value: uint16) : MmuState =
    mmu
    |> writeByte addr (byte (value &&& 0xFFus))       // Write low byte to addr
    |> writeByte (addr + 1us) (byte (value >>> 8))    // Write high byte to addr + 1

/// Reads a 16-bit word from memory, little-endian.
/// The low byte (LSB) is read from 'addr', and the high byte (MSB) from 'addr + 1'.
let readWord (mmu: MmuState) (addr: uint16) : uint16 =
    let lowByte = readByte mmu addr
    let highByte = readByte mmu (addr + 1us)
    (uint16 highByte <<< 8) ||| (uint16 lowByte)

/// Pushes a 16-bit word onto the stack.
/// SP is decremented by 2, then the value is written to the new SP address.
let pushWord (mmu: MmuState) (sp: uint16) (value: uint16) : MmuState * uint16 =
    let newSp = sp - 2us // Stack grows downwards
    let newMmu = writeWord mmu newSp value
    (newMmu, newSp)

/// Pops a 16-bit word from the stack.
/// The value is read from the current SP address, then SP is incremented by 2.
let popWord (mmu: MmuState) (sp: uint16) : uint16 * uint16 * MmuState =
    let value = readWord mmu sp
    let newSp = sp + 2us // Popping shrinks the stack, so SP moves back up
    (value, newSp, mmu)

📌 Key Idea: The pushWord and popWord functions are crucial for managing the call stack. They abstract away the little-endian byte order and the SP manipulation, making CALL and RET logic cleaner and less error-prone in the CPU emulation.

3. Implement Control Flow Opcodes in Cpu.fs

Now, we’ll extend the executeInstruction function in Cpu.fs to handle the new opcodes. This will involve reading operands, checking flags, and updating PC and SP. We’ll also add helper functions for conditional checks to keep the match cases clean.

// src/GbSharp.Core/Cpu.fs

module GbSharp.Core.Cpu

open GbSharp.Core.Mmu
// ... (existing types and functions)

/// Helper to check if Zero flag condition is met (condition = true for Z, false for NZ)
let checkZ (flags: CpuFlags) (condition: bool) =
    if condition then flags.Z else not flags.Z

/// Helper to check if Carry flag condition is met (condition = true for C, false for NC)
let checkC (flags: CpuFlags) (condition: bool) =
    if condition then flags.C else not flags.C

let executeInstruction (cpu: CpuState) (mmu: MmuState) : CpuState * MmuState * int =
    let opcode = Mmu.readByte mmu cpu.PC

    let getOperand8 () = Mmu.readByte mmu (cpu.PC + 1us)
    let getOperand16 () = Mmu.readWord mmu (cpu.PC + 1us)

    let mutable cycles = 0

    let newCpu, newMmu =
        match opcode with
        // ... (existing opcodes)

        // Jumps (JP nn)
        | 0xC3uy -> // JP nn (unconditional)
            let addr = getOperand16 ()
            cycles <- 16
            { cpu with PC = addr }, mmu // PC directly set to new address
        | 0xC2uy -> // JP NZ, nn (Jump if Zero flag NOT set)
            let addr = getOperand16 ()
            cycles <- if checkZ cpu.Flags false then 16 else 12
            if checkZ cpu.Flags false then
                { cpu with PC = addr }, mmu
            else
                { cpu with PC = cpu.PC + 3us }, mmu // Not taken: skip opcode and 16-bit operand
        | 0xCAuy -> // JP Z, nn (Jump if Zero flag SET)
            let addr = getOperand16 ()
            cycles <- if checkZ cpu.Flags true then 16 else 12
            if checkZ cpu.Flags true then
                { cpu with PC = addr }, mmu
            else
                { cpu with PC = cpu.PC + 3us }, mmu
        | 0xD2uy -> // JP NC, nn (Jump if Carry flag NOT set)
            let addr = getOperand16 ()
            cycles <- if checkC cpu.Flags false then 16 else 12
            if checkC cpu.Flags false then
                { cpu with PC = addr }, mmu
            else
                { cpu with PC = cpu.PC + 3us }, mmu
        | 0xDAuy -> // JP C, nn (Jump if Carry flag SET)
            let addr = getOperand16 ()
            cycles <- if checkC cpu.Flags true then 16 else 12
            if checkC cpu.Flags true then
                { cpu with PC = addr }, mmu
            else
                { cpu with PC = cpu.PC + 3us }, mmu
        | 0xE9uy -> // JP HL, also written JP (HL): jump to the address held in HL (no memory read)
            cycles <- 4
            { cpu with PC = cpu.HL }, mmu

        // Relative Jumps (JR e)
        | 0x18uy -> // JR e (unconditional)
            let offset = int8 (getOperand8 ()) // Signed 8-bit offset
            cycles <- 12
            // PC points to opcode. Instruction is 2 bytes (opcode + operand).
            // So, new PC = (current PC + 2) + offset.
            { cpu with PC = uint16 (int cpu.PC + 2 + int offset) }, mmu
        | 0x20uy -> // JR NZ, e
            let offset = int8 (getOperand8 ())
            cycles <- if checkZ cpu.Flags false then 12 else 8
            if checkZ cpu.Flags false then
                { cpu with PC = uint16 (int cpu.PC + 2 + int offset) }, mmu
            else
                { cpu with PC = cpu.PC + 2us }, mmu // Not taken: skip opcode and 8-bit operand
        | 0x28uy -> // JR Z, e
            let offset = int8 (getOperand8 ())
            cycles <- if checkZ cpu.Flags true then 12 else 8
            if checkZ cpu.Flags true then
                { cpu with PC = uint16 (int cpu.PC + 2 + int offset) }, mmu
            else
                { cpu with PC = cpu.PC + 2us }, mmu
        | 0x30uy -> // JR NC, e
            let offset = int8 (getOperand8 ())
            cycles <- if checkC cpu.Flags false then 12 else 8
            if checkC cpu.Flags false then
                { cpu with PC = uint16 (int cpu.PC + 2 + int offset) }, mmu
            else
                { cpu with PC = cpu.PC + 2us }, mmu
        | 0x38uy -> // JR C, e
            let offset = int8 (getOperand8 ())
            cycles <- if checkC cpu.Flags true then 12 else 8
            if checkC cpu.Flags true then
                { cpu with PC = uint16 (int cpu.PC + 2 + int offset) }, mmu
            else
                { cpu with PC = cpu.PC + 2us }, mmu

        // Calls (CALL nn)
        | 0xCDuy -> // CALL nn (unconditional)
            let addr = getOperand16 ()
            cycles <- 24
            // Return address is PC of instruction AFTER CALL (opcode + 2 operand bytes = 3 bytes total)
            let returnAddr = cpu.PC + 3us
            let mmuAfterPush, newSp = pushWord mmu cpu.SP returnAddr
            { cpu with PC = addr; SP = newSp }, mmuAfterPush
        | 0xC4uy -> // CALL NZ, nn
            let addr = getOperand16 ()
            cycles <- if checkZ cpu.Flags false then 24 else 12
            if checkZ cpu.Flags false then
                let returnAddr = cpu.PC + 3us
                let mmuAfterPush, newSp = pushWord mmu cpu.SP returnAddr
                { cpu with PC = addr; SP = newSp }, mmuAfterPush
            else
                { cpu with PC = cpu.PC + 3us }, mmu // Not taken: skip opcode and 16-bit operand
        | 0xCCuy -> // CALL Z, nn
            let addr = getOperand16 ()
            cycles <- if checkZ cpu.Flags true then 24 else 12
            if checkZ cpu.Flags true then
                let returnAddr = cpu.PC + 3us
                let mmuAfterPush, newSp = pushWord mmu cpu.SP returnAddr
                { cpu with PC = addr; SP = newSp }, mmuAfterPush
            else
                { cpu with PC = cpu.PC + 3us }, mmu
        | 0xD4uy -> // CALL NC, nn
            let addr = getOperand16 ()
            cycles <- if checkC cpu.Flags false then 24 else 12
            if checkC cpu.Flags false then
                let returnAddr = cpu.PC + 3us
                let mmuAfterPush, newSp = pushWord mmu cpu.SP returnAddr
                { cpu with PC = addr; SP = newSp }, mmuAfterPush
            else
                { cpu with PC = cpu.PC + 3us }, mmu
        | 0xDCuy -> // CALL C, nn
            let addr = getOperand16 ()
            cycles <- if checkC cpu.Flags true then 24 else 12
            if checkC cpu.Flags true then
                let returnAddr = cpu.PC + 3us
                let mmuAfterPush, newSp = pushWord mmu cpu.SP returnAddr
                { cpu with PC = addr; SP = newSp }, mmuAfterPush
            else
                { cpu with PC = cpu.PC + 3us }, mmu

        // Returns (RET)
        | 0xC9uy -> // RET (unconditional)
            cycles <- 16
            let returnAddr, newSp, mmuAfterPop = popWord mmu cpu.SP
            { cpu with PC = returnAddr; SP = newSp }, mmuAfterPop
        | 0xC0uy -> // RET NZ
            cycles <- if checkZ cpu.Flags false then 20 else 8
            if checkZ cpu.Flags false then
                let returnAddr, newSp, mmuAfterPop = popWord mmu cpu.SP
                { cpu with PC = returnAddr; SP = newSp }, mmuAfterPop
            else
                { cpu with PC = cpu.PC + 1us }, mmu // Not taken: skip the 1-byte opcode
        | 0xC8uy -> // RET Z
            cycles <- if checkZ cpu.Flags true then 20 else 8
            if checkZ cpu.Flags true then
                let returnAddr, newSp, mmuAfterPop = popWord mmu cpu.SP
                { cpu with PC = returnAddr; SP = newSp }, mmuAfterPop
            else
                { cpu with PC = cpu.PC + 1us }, mmu
        | 0xD0uy -> // RET NC
            cycles <- if checkC cpu.Flags false then 20 else 8
            if checkC cpu.Flags false then
                let returnAddr, newSp, mmuAfterPop = popWord mmu cpu.SP
                { cpu with PC = returnAddr; SP = newSp }, mmuAfterPop
            else
                { cpu with PC = cpu.PC + 1us }, mmu
        | 0xD8uy -> // RET C
            cycles <- if checkC cpu.Flags true then 20 else 8
            if checkC cpu.Flags true then
                let returnAddr, newSp, mmuAfterPop = popWord mmu cpu.SP
                { cpu with PC = returnAddr; SP = newSp }, mmuAfterPop
            else
                { cpu with PC = cpu.PC + 1us }, mmu

        // ... (other opcodes)
        | _ ->
            // Fallback for unimplemented opcodes
            failwithf "Unimplemented opcode: 0x%02X at PC: 0x%04X" opcode cpu.PC

    // Advance PC only for instructions that do not manage PC themselves.
    // Every control flow case above sets PC explicitly (taken or not taken), so those opcodes
    // are excluded by value. Comparing newCpu.PC with cpu.PC alone would misfire on a jump
    // whose target is its own address, such as the idle loop `JR -2`.
    let isControlFlow =
        match opcode with
        | 0xC3uy | 0xC2uy | 0xCAuy | 0xD2uy | 0xDAuy | 0xE9uy // JP
        | 0x18uy | 0x20uy | 0x28uy | 0x30uy | 0x38uy          // JR
        | 0xCDuy | 0xC4uy | 0xCCuy | 0xD4uy | 0xDCuy          // CALL
        | 0xC9uy | 0xC0uy | 0xC8uy | 0xD0uy | 0xD8uy -> true  // RET
        | _ -> false

    let finalCpu =
        if isControlFlow || cpu.Halted || cpu.Stopped || newCpu.PC <> cpu.PC then
            newCpu
        else
            { newCpu with PC = newCpu.PC + 1us } // Remaining 1-byte instructions
    
    (finalCpu, newMmu, cycles)

🧠 Important: The PC calculation for relative jumps (JR e) is critical. The offset e is signed and added to PC + 2 (the current PC plus the length of the JR e instruction itself). For CALL nn, the returnAddr pushed to the stack is PC + 3 (the current PC plus the length of the CALL nn instruction). Pay close attention to these offsets; off-by-one errors are a very common source of bugs in emulators.
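
The two address calculations can be checked in isolation. The sketch below is a worked example, not part of the chapter's Cpu.fs; jrTarget and callReturnAddress are hypothetical names, and the sample addresses are arbitrary.

```fsharp
/// Target of a JR instruction located at `pc` with raw operand byte `operand`.
let jrTarget (pc: uint16) (operand: byte) : uint16 =
    let offset = int (sbyte operand)   // reinterpret the byte as signed: 0xFE becomes -2
    uint16 (int pc + 2 + offset)       // conversion wraps modulo 0x10000, like the 16-bit PC

/// Return address pushed by a 3-byte CALL located at `pc`.
let callReturnAddress (pc: uint16) : uint16 = pc + 3us

printfn "0x%04X" (jrTarget 0x0150us 0x05uy)   // forward jump
printfn "0x%04X" (jrTarget 0x0150us 0xFEuy)   // offset -2: jumps to itself
printfn "0x%04X" (jrTarget 0x0150us 0x80uy)   // offset -128: furthest backward jump
printfn "0x%04X" (callReturnAddress 0x0150us)
```

The second case is worth noting: an operand of 0xFE makes the JR target its own address, which is a common way to write an idle loop and a useful edge case for any logic that compares PC before and after an instruction.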

⚡ Quick Note: The cycle counts are the T-cycle (clock cycle) figures listed in common Game Boy CPU documentation such as Pan Docs, where one machine cycle equals 4 T-cycles. They are applied per instruction here; finer-grained timing will need attention as we build out more of the emulator’s timing system. For now, they provide a reasonable baseline for instruction execution cost.

Testing & Verification

With control flow implemented, we can start verifying that our CPU can correctly alter its execution path. This involves both isolated unit tests for stack operations and observing the CPU’s behavior with simple test ROMs.

Unit Tests for Stack Operations

Let’s add some basic unit tests for pushWord and popWord in MmuTests.fs to ensure our little-endian handling and SP manipulation are correct.

// tests/GbSharp.Core.Tests/MmuTests.fs

module GbSharp.Core.Tests.MmuTests

open Xunit
open GbSharp.Core
open GbSharp.Core.Mmu

// ... (existing tests)

[<Fact>]
let ``pushWord and popWord should correctly handle 16-bit values and SP`` () =
    let initialMmu = Mmu.create () // Assuming Mmu.create initializes memory
    let initialSp = 0xFFFEus
    let testValue = 0x1234us

    // Push word
    let mmuAfterPush, spAfterPush = Mmu.pushWord initialMmu initialSp testValue
    Assert.Equal(0xFFFCus, spAfterPush) // SP should decrement by 2 (0xFFFE -> 0xFFFC)
    // Verify little-endian storage: LSB at SP, MSB at SP+1
    Assert.Equal(0x34uy, Mmu.readByte mmuAfterPush 0xFFFCus) // Low byte (0x34) at 0xFFFC
    Assert.Equal(0x12uy, Mmu.readByte mmuAfterPush 0xFFFDus) // High byte (0x12) at 0xFFFD

    // Pop word
    let poppedValue, spAfterPop, _ = Mmu.popWord mmuAfterPush spAfterPush
    Assert.Equal(testValue, poppedValue) // Verify the correct value was popped
    Assert.Equal(initialSp, spAfterPop) // SP should return to its original value (0xFFFC -> 0xFFFE)

Verification with Simple Test ROMs

While full Game Boy ROMs will require more components (especially the PPU), we can use small, custom-written ROMs (or early stages of well-known test suites like Blargg’s CPU instruction tests) to verify these opcodes.

Consider a simple Game Boy assembly program:

; Example Assembly (RGBDS syntax)
; Filename: control_flow_test.asm
    SECTION "Test Code",ROM0[$100] ; Start at 0x100, past typical boot ROM area

    LD SP, $FFFE    ; Initialize stack pointer
    LD HL, $C050    ; Load an arbitrary value into HL for JP (HL) test

Start:
    CALL Subroutine1 ; Call a subroutine
    LD A, $AA       ; This instruction should execute after Subroutine1 returns
    JR NZ, SkipJump ; Conditional relative jump (if Z flag NOT set)
    JP $0000        ; Should not be reached if NZ is true

SkipJump:
    LD B, $BB       ; Should execute
    JP $0100        ; Jump back to the entry point at $0100 for a simple loop (or halt here for a real test)

Subroutine1:
    LD C, $CC       ; Set C register
    JR Z, SkipRet   ; Taken only if Z is set; LD does not change flags, so Z is whatever it was on entry
    RET             ; Return from subroutine (reached when Z is clear)
SkipRet:
    LD D, $DD       ; Reached only when Z is set
    RET

To test this:

  1. Assemble the ROM: Use an assembler like RGBDS (RGBASM) to compile this .asm file into a .gb ROM.
    rgbasm -o control_flow_test.o control_flow_test.asm
    rgblink -o control_flow_test.gb control_flow_test.o
    
  2. Load into Emulator: Modify your Program.fs (or main emulation loop) to load control_flow_test.gb.
  3. Log CPU State: Implement detailed logging of cpu.PC, cpu.SP, and relevant registers (A, B, C, D, E, H, L, F) at each instruction step.
  4. Trace Execution: Manually trace the PC and SP values in your logs:
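    • Observe LD SP, $FFFE at 0x0100 setting SP to 0xFFFE.
    • CALL Subroutine1 sits at 0x0106 (after the two 3-byte loads), so it should push 0x0109 (the address of LD A, $AA) onto the stack, and PC should jump to Subroutine1 at 0x0115. SP should decrement to 0xFFFC.
    • Inside Subroutine1, LD C, $CC executes. LD instructions do not modify flags.
    • JR Z, SkipRet jumps only if the Z flag is set. Nothing in this program changes the flags, so the outcome depends on the starting state: with the initialCpuState above (F = 0xB0, Z set) the jump is taken and LD D, $DD runs before the second RET; with Z clear, execution falls through to the first RET.
    • Either RET should pop 0x0109 from the stack, restoring PC to 0x0109. SP should increment back to 0xFFFE.
    • LD A, $AA should then execute.
    • JR NZ, SkipJump jumps to SkipJump (0x0110) only if Z is clear. With Z set, execution falls through to JP $0000.
    • If the jump was taken, LD B, $BB should execute.
    • JP $0100 should then set PC back to 0x0100, the entry point (the Start label itself is at 0x0106).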
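
When tracing by hand, the instruction addresses follow directly from the instruction lengths. The sketch below derives them for the listing above; it is an illustration rather than an assembler, and the listing and addresses names are hypothetical. The lengths assume the standard SM83 encodings (3 bytes for LD r16,nn, CALL, and JP nn; 2 bytes for LD r8,n and JR; 1 byte for RET).

```fsharp
// (mnemonic, length in bytes) for each instruction of the listing, in order.
let listing =
    [ ("LD SP,$FFFE", 3)
      ("LD HL,$C050", 3)
      ("CALL Subroutine1", 3)
      ("LD A,$AA", 2)
      ("JR NZ,SkipJump", 2)
      ("JP $0000", 3)
      ("LD B,$BB", 2)
      ("JP $0100", 3)
      ("LD C,$CC", 2)
      ("JR Z,SkipRet", 2)
      ("RET", 1)
      ("LD D,$DD", 2)
      ("RET", 1) ]

// A running sum of the lengths gives each instruction's start address, beginning at $0100.
let addresses =
    listing
    |> List.scan (fun addr (_, length) -> addr + length) 0x0100
    |> List.take listing.Length

List.zip addresses listing
|> List.iter (fun (addr, (mnemonic, _)) -> printfn "0x%04X  %s" addr mnemonic)
```

Comparing this table with the PC values in your emulator log is a quick way to spot an instruction whose length was miscounted.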

This manual inspection, possibly augmented with a debugger or detailed logging, is invaluable for understanding the subtle interactions of hardware components.

⚡ Real-world insight: Emulator development heavily relies on detailed logging of CPU state (PC, SP, registers, flags) and memory dumps. This “printf debugging” approach, especially in early stages, is often more effective than traditional debuggers for understanding the subtle, cycle-accurate behavior of emulated hardware.
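
A trace line can be as simple as one formatted string per instruction. The following is a minimal sketch of such a formatter; the Snapshot record and formatTrace are hypothetical and only carry a subset of the fields in CpuState.

```fsharp
/// Hypothetical snapshot of the fields worth logging at each step.
type Snapshot = { PC: uint16; SP: uint16; A: byte; F: byte; Opcode: byte }

/// Format one trace line; flags are shown as letters, with '-' for a cleared flag.
let formatTrace (s: Snapshot) : string =
    let flag (mask: byte) (ch: char) = if (s.F &&& mask) <> 0uy then ch else '-'
    sprintf "PC=%04X SP=%04X OP=%02X A=%02X F=%c%c%c%c"
        s.PC s.SP s.Opcode s.A
        (flag 0x80uy 'Z') (flag 0x40uy 'N') (flag 0x20uy 'H') (flag 0x10uy 'C')

let line = formatTrace { PC = 0x0106us; SP = 0xFFFEus; A = 0x01uy; F = 0xB0uy; Opcode = 0xCDuy }
printfn "%s" line
```

Keeping the format fixed-width makes it easy to diff two logs, for example your emulator's trace against one from a reference emulator.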

Production Considerations

The correctness and performance of control flow instructions are paramount. Any error here will quickly lead to a crashed or misbehaving emulator, making the system unstable and unpredictable.

  • Correctness is King: A single byte off in a PC calculation, an incorrect signed offset for JR, or an endianness error in stack operations will cause the CPU to execute garbage data or jump to incorrect locations. This leads to immediate crashes or subtle, hard-to-debug issues later in development. Thoroughly test PC updates and pushWord/popWord against official documentation.
  • Performance: Jumps, calls, and returns are frequent operations in any program. Our current F# implementation, while functional and clear, might involve some overhead due to immutable state updates and pattern matching. For a Game Boy, which runs at approximately 4.19 MHz, the CPU emulation loop needs to be extremely fast. We’ll revisit performance optimizations later, but for now, focus on absolute correctness.
  • Stack Overflow/Underflow: While Game Boy software typically manages the stack within its limits, a robust emulator might include checks to detect SP leaving the RAM regions a stack normally lives in (e.g., 0xC000-0xDFFF for WRAM, or 0xFF80-0xFFFE for HRAM). This could indicate a bug in the emulated program or, more critically, in the emulator itself. Detecting this early can prevent memory corruption.
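
Such a check could look like the sketch below. It is a debugging heuristic under stated assumptions, not hardware behavior: the real CPU never rejects an SP value. StackRegion, classifySp, and warnIfSuspicious are hypothetical names, and the upper bounds are widened by one or two bytes because an empty stack pointer may legitimately sit just past the end of its region (for example 0xE000).

```fsharp
type StackRegion =
    | Wram
    | Hram
    | Unexpected

/// Classify SP against the RAM regions a stack normally lives in.
/// WRAM is 0xC000-0xDFFF and HRAM is 0xFF80-0xFFFE; the upper bounds below also
/// accept an empty-stack pointer just past the end of each region.
let classifySp (sp: uint16) : StackRegion =
    if sp >= 0xC000us && sp <= 0xE000us then Wram
    elif sp >= 0xFF80us then Hram
    else Unexpected

/// Produce a warning message when SP looks suspicious, or None when it looks fine.
let warnIfSuspicious (pc: uint16) (sp: uint16) : string option =
    match classifySp sp with
    | Unexpected -> Some (sprintf "SP=0x%04X outside WRAM/HRAM at PC=0x%04X" sp pc)
    | Wram | Hram -> None

printfn "%A" (warnIfSuspicious 0x0106us 0xFFFCus)
printfn "%A" (warnIfSuspicious 0x0200us 0x7FF0us)
```

Returning an option instead of throwing keeps the check cheap to leave enabled while logging, since a program that deliberately points SP somewhere unusual should still run.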

🔥 Optimization / Pro tip: For extreme performance-critical sections, especially the main CPU loop, some functional emulators might resort to highly optimized, potentially more imperative, internal structures or leverage F#’s mutable features carefully. However, prioritize correctness and clarity first; optimize only when profiling reveals a bottleneck.

Common Issues & Solutions

  1. Incorrect PC after Jump/Call:

    • Issue: The CPU jumps to the wrong address, or returns to an incorrect instruction, leading to unexpected program flow or crashes.
    • Cause: Miscalculation of operand addresses (PC + 1us, PC + 2us), or incorrect handling of signed offsets for JR. For CALL, pushing PC instead of PC + instruction_length (the address after the CALL instruction).
    • Solution: Double-check the Game Boy CPU documentation (e.g., Pan Docs, GBDEV Wiki) for each instruction’s operand length and how PC is modified. Log PC before and after each instruction to trace the flow. ⚠️ What can go wrong: For JR e, the e is a signed 8-bit value. Using the unsigned byte directly, without converting it to int8 first, skips sign extension, so a backward offset such as 0xFE (-2) is treated as +254 and the jump lands at a wildly wrong address. Always convert the byte from readByte to int8 before using it as a relative offset.
  2. Stack Corruption:

    • Issue: CALLs push incorrect return addresses, or RETs pop garbage values, leading to program crashes or jumps to random memory locations.
    • Cause: Incorrect pushWord/popWord implementation, particularly regarding the little-endian byte order or SP increment/decrement logic.
    • Solution: Thoroughly test pushWord and popWord in isolation (as shown in unit tests). Print SP and the stack memory region (e.g., 0xFFF0 to 0xFFFF) before and after calls/returns to verify values are pushed and popped correctly. ⚡ Real-world insight: The stack is a common vector for security exploits in real systems. While our emulator isn’t directly exposed to external threats, a corrupted stack in emulation can mimic such issues, making it a valuable learning experience in debugging memory-related problems.
  3. Conditional Jumps/Calls Not Working:

    • Issue: Instructions like JP Z, nn always jump, or never jump, regardless of the Zero flag’s state, leading to incorrect program logic.
    • Cause: Incorrectly checking the CpuFlags record, or the flags themselves are not being set correctly by previous arithmetic/logic instructions (from previous chapters).
    • Solution: Ensure the checkZ and checkC helper functions are correct. Verify that arithmetic and logic instructions are correctly updating the Z, N, H, C flags in CpuState.Flags. Debug by inspecting the Flags state immediately before a conditional instruction.

🧠 Check Your Understanding

  • Why is the stack pointer initialized to 0xFFFE in the Game Boy, and what does this imply about the direction the stack grows?
  • Explain the critical difference between JP nn and JR e instructions, both in terms of addressing and common use cases.
  • What is the significance of pushing PC + 3us as the return address for a CALL nn instruction, rather than just PC?

⚡ Mini Task

Modify a simple Game Boy assembly program (or create a new one if you’re comfortable with assembly) to include a CALL to a subroutine, and then a RET. Because LD does not affect flags, use flag-setting instructions from earlier chapters to control the Zero flag: for example, XOR A (result zero, Z set) before a JP Z, $XXXX, and OR $01 (result non-zero, Z clear) before a JP NZ, $YYYY. Load this ROM into your emulator and trace the PC and SP values, along with the Zero flag, to confirm correct conditional execution.

🚀 Scenario

You’ve implemented all control flow instructions, but when running a test ROM, the emulator consistently crashes with an “Unimplemented opcode” error after a specific CALL instruction, even though the opcode itself is implemented. You suspect stack corruption. How would you approach debugging this, focusing on the stack, PC, and memory dumps? What specific data points would you log or inspect?

📌 TL;DR

  • CPU control flow instructions (JP, JR, CALL, RET) enable complex program execution paths.
  • The Stack Pointer (SP) and memory stack are fundamental for CALL/RET operations, managing return addresses.
  • Correctly handling 16-bit values (little-endian) and SP manipulation in pushWord/popWord is crucial for stack integrity.
  • Precise PC calculation, especially for relative jumps and call return addresses, is a common source of emulator bugs.

🧠 Core Flow

  1. Extend CpuState: Add the SP register to track the stack’s current top.
  2. Implement Stack Primitives: Create pushWord and popWord in Mmu.fs to safely interact with memory, handling little-endian byte order and SP updates.
  3. Integrate Opcodes: Add match cases for all JP, JR, CALL, and RET variants into the executeInstruction loop in Cpu.fs.
  4. Verify Logic: Ensure PC and SP are updated correctly, paying close attention to instruction lengths and conditional flag checks.
  5. Test and Debug: Use unit tests for stack operations and detailed logging with simple test ROMs to trace execution flow.

🚀 Key Takeaway

Mastering CPU control flow is the gateway to executing complex program logic. The interplay between the Program Counter, Stack Pointer, and memory stack is a fundamental pattern in computer architecture, and understanding its precise, cycle-accurate implementation is crucial for any low-level system development.


References

  1. Pan Docs: The Ultimate Game Boy Technical Manual. Essential for Game Boy hardware details, including CPU, MMU, and PPU specifications. https://gbdev.io/pandocs/
  2. F# Language Reference: Microsoft Learn. Official F# documentation, providing syntax, language features, and best practices. https://learn.microsoft.com/en-us/dotnet/fsharp/
  3. .NET SDK: Microsoft Learn. Official .NET documentation, covering the runtime, libraries, and tools for F# development. https://learn.microsoft.com/en-us/dotnet/
  4. Game Boy CPU Opcodes: GBDEV Wiki. Detailed opcode information for the SM83 CPU, including instruction formats, effects on flags, and cycle counts. https://gbdev.io/wiki/index.php?title=CPU_Instruction_Set

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.