# Audio Details

tl;dr:

The PPU is a bunch of state machines, and the APU is a bunch of counters.

Each of the four “conceptual” channels is composed of a “generation” circuit (designated “channel” in the above diagram), and a DAC. The digital value produced by the generator, which ranges between \$0 and \$F (0 and 15), is linearly translated by the DAC into an analog1 value between -1 and 1 (the unit is arbitrary).

The four analog channel outputs are then fed into the mixer2, which selectively adds them (depending on `NR51`) into two analog outputs (Left and Right). Thus, the analog range of those outputs is 4× that of each channel, -4 to 4.

Then, both of these two get their respective volume scaled, once from `NR50`, and once from the volume knob (if the console has one). Note that the former step never mutes a non-silent input, but the latter can.

Each of the two analog outputs then goes through a high-pass filter (HPF). For short, a HPF constantly tries to “pull” the signal towards analog 0 (neutral); the reason for that is explained further below.

1

To be clear: digital values are discrete and clear-cut; conversely, the analog domain is continuous. The former is what computers use, the latter is what the real world is made of.

2

Actually, VIN acts as a 5th channel fed into the mixer, whose control bits are in NR50 instead of NR51. This was omitted from the diagram for simplicity.

## PCM registers

These two registers, only present in the Game Boy Color and later models, allow reading the output of the generation circuits directly; this is very useful for testing “internal” APU behavior. These registers are not documented in any known Nintendo manual.

### FF76 — PCM12 (CGB Mode only): Digital outputs 1 & 2 [read-only]

The low nibble is a copy of sound channel 1’s digital output, the high nibble a copy of sound channel 2’s.

### FF77 — PCM34 (CGB Mode only): Digital outputs 3 & 4 [read-only]

Same, but with channels 3 and 4.

## Finer technical explanation

### DIV-APU

A “DIV-APU” counter is increased every time `DIV`’s bit 4 (5 in double-speed mode) goes from 1 to 0, therefore at a frequency of 512 Hz (regardless of whether double-speed is active). Thus, the counter can be made to increase faster by writing to `DIV` while its relevant bit is set (which clears `DIV`, and triggers the falling edge).

The following events occur every N DIV-APU ticks:

EventRateFrequency3
Envelope sweep864 Hz
Sound length2256 Hz
CH1 freq sweep4128 Hz
3

Indicated values are under normal operation; the frequencies will obviously differ if writing to `DIV` to increase the counter faster.

### Mixer

A high-pass filter (HPF) removes constant biases over time. The HPFs therefore remove the DC offset created by inactive channels with an enabled DAC, and off-center waveforms.

Avoiding audio pops

Enabling or disabling a DAC (see below), adding or removing it using NR51, or changing the volume in NR50, will cause an audio pop. (All of these actions cause a change in DC offset, which is smoothed out by the HPFs over time, but still creates a pop.)

To avoid this, a sound driver should avoid turning the DACs off; this can be done by writing \$08 to `NRx2` (silences the channel but keeps the DAC on) then \$80 to `NRx4` to retrigger the channel and reload `NRx2`.

The HPF is more aggressive on GBA than on GBC, which itself is more aggressive than on DMG. (The more “aggressive” a HPF, the faster it pulls the signal towards “analog 0”; this tends to also distort waveforms.)

### DACs

Channel x’s DAC is enabled if and only if `[NRx2] & \$F8 != 0`; the exception is CH3, whose DAC is directly controlled by bit 7 of NR30 instead. Note that the envelope functionality changes the volume, but not the value stored in NRx2, and thus doesn’t disable the DACs.

If a DAC is enabled, the digital range \$0 to \$F is linearly translated to the analog range -1 to 1, in arbitrary units. Importantly, the slope is negative: “digital 0” maps to “analog 1”, not “analog -1”.

If a DAC is disabled, it fades to an analog value of 0. The nature of this fade is not entirely deterministic and varies between models.

NR52’s low 4 bits report whether the channels are turned on, not their DACs.

### Channels

A channel is activated by a write to NRx4’s MSB, unless its DAC is off, which forces it to be disabled as well. The opposite is not true, however: a disabled channel outputs 0, which an enabled DAC will dutifully convert into “analog 1”.

A channel can be deactivated in one of the following ways:

### Pulse channels (CH1, CH2)

Each pulse channel has an internal “duty step” counter, which is used to index into the selected waveform (each background stripe corresponds to one “duty step”)4. The “duty step” increments at 8 times the channel’s frequency).

The “duty step” counter cannot be reset, except by turning the APU off, which sets both back to 0. Retriggering a pulse channel causes its “duty step timer” to reset, thus retriggering a pulse channel often enough will cause its “duty step” to never advance.

When first starting up a pulse channel, it will always output a (digital) zero.

4

Actually, there is not LUT, but the manipulations done to the counter’s bits are equivalent.

### Wave channel (CH3)

CH3 has an internal “sample index” counter. Each time it is ticked (as determined by the “wavelength” in NR33/NR34), that “sample index” is incremented, and then the corresponding “sample” (nibble) is read from wave RAM. (This means that sample #0 is skipped when first starting up CH3.)

CH3 does not emit samples directly, but stores every sample read into a buffer, and emits that continuously; (re)triggering the channel does not clear nor refresh this buffer, so the last sample ever read will be emitted again. This buffer is cleared when turning the APU on, so CH3 will emit a “digital 0” when first powered on.

CH3 output level control does not, in fact, alter the output level. It shifts the digital value CH3 is outputting, not the analog value. This only matters when changing the setting mid-playback: the digital values being shifted bias them towards 0, which biases the analog output towards “1”; the HPF will smooth this over time, but not instantly.

### Noise channel (CH4)

CH4 revolves around a LFSR, pictured above. The LFSR is 16-bit internally, but really acts as if it was 15-bit.

When CH4 is ticked (at the frequency specified via `NR43`):

1. The result of $\overline{){\mathrm{LFSR}}_{0}\oplus {\mathrm{LFSR}}_{1}}$ (`1` if bit 0 and bit 1 are identical, `0` otherwise) is written to bit 15.
2. If “short mode” was selected in `NR43`, then bit 15 is copied to bit 7 as well.
3. Finally, the entire LFSR is shifted right, and bit 0 selects between 0 and the chosen volume.

The LFSR is set to 0 when (re)triggering the channel.

Lock-up

If the “active” portion of the LFSR only contains “1” bits, only “1” bits will be generated; this prevents CH4 from ever changing values (until retriggered), essentially silencing it.

This does not happen under regular operation, but can be achieved by switching from 15-bit to 7-bit mode when the LFSR’s bottom 7 bits are all “1“s (which occurs relatively early after triggering the channel, for example).

• Instead of mixing being done by analog circuitry, it’s instead done digitally; then, sound is converted to an analog signal and an offset is added (see `SOUNDBIAS` in GBATEK for more details).