
Audio System


Marcel Pineoak

Summary:

1. Introduction to game audio management
1.1 Folder Structure
1.2 Audio Banks
1.3 Bus & Mixers
1.4 Emitter & Listener
1.5 3D Visualization & Attenuation
1.6 Performance & Technical Info

1. Introduction to game audio management

Hello! My name is Marcel, also known as Pineoak. I'm a musician and sound designer with experience using FMOD and game engines such as Unity, Godot, Ren'Py, and more. I've worked on Adventure Quest (2002) and participated in several game jams over the past three years. You can find me at marcel-pineoak.itch.io/

I've been working with game audio for a while, and the audio logic behind FMOD and Wwise can be applied effectively here.

1.1 Folder Structure

Here is my personal take on the structure. Just place the audio files inside:

project/audio/
├── mus
├── sfx
├── env
├── ui
└── voice

There are other naming conventions, such as the Japanese one (BGM, BGS, SE, ME: background music, background sound, sound effect, music effect).

1.2 Audio Banks

Audio files are registered to banks that are loaded during initialization. The banks can be classified as:

- Music: Background music tracks that set the tone and atmosphere for the game.
- Environment: Ambient sounds that enhance the game's setting. These can be either local (e.g., a crackling torch, a flowing stream) or general (e.g., wind, bird calls).
- SFX: Specific and non-specific one-shot sounds triggered by interactions (e.g., a door opening, a character jumping).
- UI: Sounds associated with user interface interactions (e.g., button clicks, menu navigation).
- Voice: Recorded dialogue and dubbing.

Environment and music sounds are treated differently since they might loop.
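
As a rough sketch of how the folder layout and the banks could line up, the snippet below picks a bank from the first folder under the audio root and flags music and environment entries as looping. The names `Bank`, `AudioEntry`, and `classify`, and the example file names, are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Bank(Enum):
    # Values match the folder names from section 1.1.
    MUSIC = "mus"
    SFX = "sfx"
    ENVIRONMENT = "env"
    UI = "ui"
    VOICE = "voice"


# Banks whose sounds usually repeat until stopped (an assumption of this sketch).
LOOPING_BANKS = {Bank.MUSIC, Bank.ENVIRONMENT}


@dataclass(frozen=True)
class AudioEntry:
    path: str
    bank: Bank
    loops: bool


def classify(path: str, root: str = "project/audio") -> AudioEntry:
    """Pick the bank from the first folder below the audio root."""
    relative = PurePosixPath(path).relative_to(root)
    bank = Bank(relative.parts[0])  # raises ValueError for unknown folders
    return AudioEntry(path=path, bank=bank, loops=bank in LOOPING_BANKS)
```

For example, `classify("project/audio/mus/theme.ogg")` would land in the music bank with `loops` set, while a file under `ui/` would be a one-shot.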

1.3 Bus & Mixers

Mixers, also known as Voltage Controlled Amplifiers (VCAs), control the overall volume of a subgroup of faders called Channels. They are linked to UI fader controls, converting a decimal value (between 0 and 1) to dB. The maximum level a digital audio system can represent is 0 dB; beyond that, the signal clips and distorts.
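
The fader conversion could look something like this minimal sketch. The function names and the -80 dB silence floor are assumptions for illustration; an engine may use a different floor or curve.

```python
import math

SILENCE_DB = -80.0  # assumed floor; a fader at 0 is treated as this level


def linear_to_db(value: float) -> float:
    """Convert a 0..1 fader value to decibels (1.0 maps to 0 dB)."""
    value = min(max(value, 0.0), 1.0)  # clamp so the result never exceeds 0 dB
    if value == 0.0:
        return SILENCE_DB  # log10(0) is undefined, so use the floor instead
    return max(20.0 * math.log10(value), SILENCE_DB)


def db_to_linear(db: float) -> float:
    """Convert decibels back to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)
```

With this mapping a fader at 0.5 sits at roughly -6 dB, and any value above 1 is clamped to 0 dB so the fader itself cannot push the signal into clipping.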

The VCAs are:

- Music
- Environment & SFX
- UI
- Voice

The Channels are subgroups, and each plays only one sound at a time by default (this can be changed). They are linked to Mixers (VCAs).

VCA: Music
├── layer01
├── layer02
├── layer03
* Layers are used in the vertical composition technique.

VCA: Environment & SFX
├── sfx01
├── sfx02
├── env01
├── env02

VCA: UI
├── ui01

VCA: Voice
├── voice01
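
One way to picture the VCA and Channel relationship is the sketch below, where a VCA's level is simply added (in dB) to the level of each channel under it. The class names `Vca` and `Channel` and the method `effective_db` are hypothetical and are not taken from FMOD or any engine.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    volume_db: float = 0.0


@dataclass
class Vca:
    name: str
    volume_db: float = 0.0
    channels: dict[str, Channel] = field(default_factory=dict)

    def add(self, name: str, volume_db: float = 0.0) -> Channel:
        """Register a channel under this VCA."""
        channel = Channel(name, volume_db)
        self.channels[name] = channel
        return channel

    def effective_db(self, channel_name: str) -> float:
        """Gains in dB add up, so the VCA offsets every channel under it."""
        return self.volume_db + self.channels[channel_name].volume_db
```

Setting a hypothetical `Vca("Music", volume_db=-6.0)` and adding `layer02` at -3 dB would give that layer an effective level of -9 dB, which is what a music slider in the options menu would be expected to do.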

Sound files that are loopable, like music and environment sounds, are treated differently:

- Music & Environment: Keeps repeating (loop)
- SFX & UI: One-shot sound (plays only once)

A good reference for this is the Ren'Py audio system documentation: www.renpy.org/doc/html/audio.html

1.4 Emitter & Listener

Emitter: The sound source. It can be the player or the object that produces the sound or interaction.

Listener: The point from which the sound source is heard. It is usually placed on the player's head or on the camera.

1.5 3D Visualization & Attenuation

The most effective tool for audio work is a visualization of the 3D area in which a sound can be heard. This is particularly useful for localized environmental sounds such as fireplaces, torches, and water streams. These sounds typically have a radius: as the listener moves away from the source, a low-pass filter is applied and the volume is reduced. To reduce processing cost, the sound is paused when the listener is far away and resumes only when the listener gets close again.
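
The radius behaviour described above can be sketched as follows. This assumes a simple linear falloff for both volume and low-pass cutoff, which is only one of many possible curves; the `Emitter` class, the `attenuate` function, and the cutoff values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Emitter:
    position: tuple[float, float, float]
    radius: float  # must be > 0; beyond this distance the sound is paused
    paused: bool = False


def attenuate(
    emitter: Emitter,
    listener_pos: tuple[float, float, float],
    max_cutoff_hz: float = 20000.0,
    min_cutoff_hz: float = 500.0,
) -> tuple[float, float]:
    """Return (linear volume, low-pass cutoff in Hz) for the listener position."""
    distance = math.dist(emitter.position, listener_pos)
    if distance >= emitter.radius:
        emitter.paused = True  # out of range: stop spending CPU on this sound
        return 0.0, min_cutoff_hz
    emitter.paused = False  # back in range: resume playback
    closeness = 1.0 - distance / emitter.radius  # 1 at the source, 0 at the edge
    cutoff = min_cutoff_hz + (max_cutoff_hz - min_cutoff_hz) * closeness
    return closeness, cutoff
```

In this sketch a listener halfway to the edge of the radius hears the sound at half volume with the cutoff halfway between the two limits, and stepping outside the radius pauses the emitter.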

1.6 Performance & Technical Info

You can improve loading times and runtime performance, and keep the mix consistent, by:

- Using .mp3 or .ogg files (convert automatically if they aren't in these formats).
- Streaming environment and music banks.
- Indexing all the audio files but not loading them unless they are used in a scene. Creating banks for every different scene can also be effective.
- Enabling a compressor to normalize the sound effects and music volume. A good article about it can be found here: blog.audiokinetic.com/en/loudness-processing-best-practices-series-chapter1-loudness-measurement-part1/
- Avoiding processing the signal with reverbs and delays, as they will increase RAM and CPU usage.
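
The idea of indexing everything but loading only per scene could be sketched like this. `BankManager`, its methods, and the placeholder loader are hypothetical; a real loader would decode or stream the file instead of returning a string.

```python
from typing import Callable, Optional


class BankManager:
    """Keeps an index of every audio path but loads data only per scene."""

    def __init__(
        self,
        index: dict[str, list[str]],
        loader: Optional[Callable[[str], object]] = None,
    ):
        self._index = index  # scene name -> audio paths (cheap to keep around)
        # Placeholder loader; a real one would decode or open a stream.
        self._loader = loader or (lambda path: f"<decoded {path}>")
        self._loaded: dict[str, dict[str, object]] = {}

    def load_scene(self, scene: str) -> None:
        """Load the bank for a scene once; later calls are no-ops."""
        if scene in self._loaded:
            return
        self._loaded[scene] = {p: self._loader(p) for p in self._index[scene]}

    def unload_scene(self, scene: str) -> None:
        """Free a scene's bank when the scene is left."""
        self._loaded.pop(scene, None)

    def get(self, scene: str, path: str) -> object:
        """Fetch a sound, loading its scene bank on first use."""
        self.load_scene(scene)
        return self._loaded[scene][path]

    def loaded_scenes(self) -> set[str]:
        return set(self._loaded)
```

Nothing is decoded when the manager is created; the first `get` for a scene loads that scene's bank, and `unload_scene` releases it again when the player moves on.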

Godot has a simple explanation of audio buses here: docs.godotengine.org/en/stable/tutorials/audio/audio_buses.html

For those unfamiliar with the decibel scale, it can be explained with a few facts:

- The decibel (dB) scale is a relative scale. It represents the ratio of sound pressure (amplitude) by using 20 times the base 10 logarithm of the ratio (20 × log10(P/P0)).
- For every 6 dB, sound amplitude doubles or halves. 12 dB represents a factor of 4, 18 dB a factor of 8, 20 dB a factor of 10, 40 dB a factor of 100, etc.
- Since the scale is logarithmic, true zero (no audio) can't be represented.
- 0 dB is the maximum amplitude possible in a digital audio system. This limit is not the human limit but a limit from the sound hardware. Audio with amplitudes that are too high to be represented properly below 0 dB creates a kind of distortion called clipping.
- To avoid clipping, your sound mix should be arranged so that the output of the master bus never exceeds 0 dB.
- Every 6 dB below the 0 dB limit, sound amplitude is halved. The signal at -6 dB has half the amplitude of 0 dB, -12 dB has half the amplitude of -6 dB, and so on.
- When working with decibels, sound is considered no longer audible between -60 dB and -80 dB. This makes your working range generally between -60 dB and 0 dB.
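
To tie these facts to actual samples, here is a small sketch that measures the peak level of a buffer and reports how much gain reduction would keep it at or below 0 dB. It assumes floating-point samples where full scale is ±1.0; the function names are hypothetical.

```python
import math
from typing import Iterable


def peak_dbfs(samples: Iterable[float]) -> float:
    """Peak level in dB relative to full scale (full scale is +/-1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    # True silence has no finite dB value, so report negative infinity.
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def gain_to_avoid_clipping(samples: Iterable[float], ceiling_db: float = 0.0) -> float:
    """Gain change in dB (never positive) that keeps the peak under the ceiling."""
    return min(0.0, ceiling_db - peak_dbfs(samples))
```

A buffer peaking at 0.5 measures about -6 dB and needs no change, while a mix peaking at 2.0 would measure about +6 dB and need roughly 6 dB of reduction on the master bus to stop clipping.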

Audio is a complex but delightful world. I hope I've described the audio system feature clearly.

I'd also like to know what the Discord server is.
