Glossary of terms


HRTF (Head-Related Transfer Function) is a mathematical function that describes how a sound is filtered by the listener’s head, outer ears, and torso on its way from the source to the eardrums. It is a crucial tool in the field of 3D audio, enabling convincing spatial sound rendering, particularly over headphones.

An HRTF captures the following factors that shape a sound before it reaches the ears:

  • Head shape and size:
    Sound interacts with the surface of the head before reaching the ears, causing diffraction and reflection, which affect the intensity and directionality of the sound.
  • Ear shape:
    The structure of the pinna (outer ear) enhances or attenuates different frequencies, aiding in the localization of sound sources in the vertical plane.
  • Shoulders and torso:
    These also affect the sound wave’s path, contributing subtle effects to directional perception.
  • Distance and angle:
    The interaural time difference (ITD) and interaural level difference (ILD) — the time and intensity differences as sound reaches each ear — are crucial for sound localization.
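The ITD mentioned above can be approximated with a simple geometric model. The sketch below uses the Woodworth formula, treating the head as a rigid sphere; the head radius and speed of sound are illustrative defaults, not values from any particular measurement:

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (Woodworth spherical-head
    model). azimuth_deg is the source angle from straight ahead
    (0 = front, 90 = directly to one side). head_radius_m and the speed
    of sound c are illustrative defaults."""
    theta = math.radians(azimuth_deg)
    # Path-length difference around a sphere: r * (theta + sin(theta)),
    # converted to seconds by dividing by the speed of sound.
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source directly ahead produces no ITD; a source at 90 degrees gives
# the maximum, roughly 0.65 ms for an average-sized head.
print(itd_woodworth(0.0))
print(round(itd_woodworth(90.0) * 1000, 2), "ms")
```

Real HRTF datasets are measured per listener and per direction; this model only captures the time-difference component, not the spectral cues from the pinna.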

Convolution reverb is a digital signal processing technique used to simulate the acoustic characteristics of real spaces. It relies on an impulse response (IR), a recording that captures how sound propagates and reflects in a specific environment. By performing a convolution operation between the original audio signal and the impulse response, it reproduces the natural reverb of that space. This method accurately recreates the acoustic ambiance of places like churches, theaters, or rooms and is widely used in music production, film sound design, and virtual reality applications.
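The convolution operation itself is straightforward to show. The toy sketch below applies a hypothetical five-sample impulse response to a single click using direct time-domain convolution; production convolution reverbs use FFT-based (fast) convolution with IRs thousands of samples long, but the math is the same:

```python
def convolve(dry, ir):
    """Direct time-domain convolution of a dry signal with an impulse
    response. Output length is len(dry) + len(ir) - 1. This O(N*M) loop
    illustrates the operation; real reverbs use FFT-based convolution."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out

# Toy impulse response: direct sound followed by two decaying echoes.
ir = [1.0, 0.0, 0.5, 0.0, 0.25]
dry = [1.0, 0.0, 0.0]        # a single click
wet = convolve(dry, ir)      # the click now carries the IR's echo pattern
print(wet)
```

Because the dry signal here is a unit impulse, the output simply reproduces the impulse response, which is exactly why recording an IR (a clap, a sweep, a starter pistol) captures a room's reverb.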

Equal loudness curves are graphs that describe how the human ear perceives the loudness of sounds at different frequencies and sound pressure levels. They show that our ears are most sensitive to mid-range frequencies (roughly 1 kHz to 5 kHz) and less sensitive to low and high frequencies, especially at lower volumes. Each curve is labeled in phons, representing the perceived loudness relative to a reference tone at 1 kHz. This concept is widely used in audio engineering and acoustic design, such as in equalizer adjustments and optimizing perceived loudness.
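A practical relative of these curves is the A-weighting filter, which roughly follows the inverse of the 40-phon equal-loudness contour. The sketch below evaluates the analytic A-weighting response (the IEC 61672 form) to show the shape described above: near 0 dB around 1 kHz, with heavy attenuation at low frequencies:

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB (analytic form from IEC 61672), a rough
    inverse of the 40-phon equal-loudness curve: close to 0 dB near
    1 kHz, strongly negative at low frequencies."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    # The +2.0 dB offset normalizes the response to 0 dB at 1 kHz.
    return 20.0 * math.log10(ra) + 2.0

print(round(a_weighting_db(1000.0), 2))  # ~0 dB by construction
print(round(a_weighting_db(100.0), 1))   # low frequencies heavily attenuated
```

Note that A-weighting is a fixed filter, while the equal-loudness contours themselves vary with level; a weighting tuned for quiet sounds undercorrects at high playback levels.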

Time-domain and frequency-domain analysis of audio signals are two fundamental approaches to understanding and processing sound:

  • Time-domain analysis:
    Focuses on how an audio signal changes over time, typically represented by a waveform. This method visually shows the variation of amplitude with time, helping to observe the signal’s dynamic properties, transients, and rhythm.
  • Frequency-domain analysis:
    Transforms the audio signal into a representation based on frequency (e.g., using Fourier Transform), revealing the signal’s frequency components and their amplitude distribution. This analysis is useful for understanding the signal’s tonal characteristics, frequency distribution, and noise components.

Time-domain analysis is ideal for visualizing how a signal evolves over time, while frequency-domain analysis provides insights into its spectral structure. Combining both approaches offers a comprehensive understanding of audio signals.
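The move from time domain to frequency domain can be shown with a naive discrete Fourier transform. The sketch below builds a sine that completes exactly 4 cycles across 32 samples and confirms that all its energy lands in frequency bin 4 (real tools use the FFT, which computes the same result far faster):

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform: time-domain samples -> magnitude
    per frequency bin. O(N^2); the FFT gives identical results in
    O(N log N)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

# Time domain: 32 samples of a sine completing exactly 4 cycles.
n = 32
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

# Frequency domain: the energy concentrates in bin 4 (and its mirror).
mags = dft_magnitudes(signal)
peak_bin = max(range(n // 2), key=lambda k: mags[k])
print(peak_bin)  # 4
```

The waveform view shows when things happen; the magnitude list shows what frequencies are present, which is exactly the complementary pair of views described above.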

Loudness refers to the subjective perception of the strength or intensity of a sound by the human ear. While it is related to the physical sound pressure level (SPL), it is not identical to it. Loudness is influenced not only by the sound’s intensity but also by its frequency, as the human ear is more sensitive to certain frequencies (e.g., mid-range frequencies). Loudness is often measured in units like phon or sone to represent perceived volume levels. In modern audio technology, loudness is calculated using international standards (such as ITU-R BS.1770) and is widely used in broadcasting, music production, and audio optimization to ensure balanced and comfortable listening experiences.
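A heavily simplified sketch of BS.1770-style measurement is below. The mean-square-to-LKFS formula (the -0.691 dB offset) is taken from the standard, but the K-weighting pre-filter and the level gating that the real standard requires are deliberately omitted, so this is an illustration of the structure, not a compliant meter:

```python
import math

def loudness_lkfs(samples):
    """Simplified integrated loudness in the style of ITU-R BS.1770:
    -0.691 + 10*log10(mean square of the signal). The real standard
    first applies a K-weighting filter and gating, both omitted here."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return -0.691 + 10.0 * math.log10(mean_square)

# One second of a full-scale 440 Hz sine at 48 kHz: mean square 0.5,
# so this sketch reports about -3.7 LKFS.
tone = [math.sin(2 * math.pi * 440 * t / 48000) for t in range(48000)]
print(round(loudness_lkfs(tone), 1))
```

The point of the logarithmic scale and the weighting is exactly the frequency-dependent sensitivity described above: two signals with the same RMS can have very different perceived loudness.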

Signal masking refers to the phenomenon where a stronger sound makes a weaker sound less perceptible or completely inaudible when they occur simultaneously. Masking effects are typically categorized into two types:

  • Frequency masking (spectral masking):
    When two sounds have similar frequencies, the stronger sound can mask the weaker one. This principle is widely used in audio compression technologies like MP3, where inaudible components of a signal are removed to save data.
  • Temporal masking (time-domain masking):
    A stronger sound can mask a weaker sound that occurs just before or after it, making the weaker sound harder to detect.

Masking is a key characteristic of human auditory perception and plays a significant role in audio processing, sound encoding, and auditory research.
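A toy version of the codec trick mentioned under frequency masking can be sketched as follows. Real perceptual coders compute a per-band masking curve from a psychoacoustic model; the flat global threshold relative to the loudest component here is a deliberate simplification:

```python
def apply_masking_threshold(magnitudes, threshold_db=-30.0):
    """Toy illustration of frequency masking as exploited by perceptual
    codecs: spectral components far below the strongest component are
    treated as inaudible and dropped. Real codecs (e.g. MP3) derive a
    frequency-dependent masking curve instead of this flat threshold."""
    peak = max(magnitudes)
    floor = peak * 10 ** (threshold_db / 20.0)
    return [m if m >= floor else 0.0 for m in magnitudes]

# Components more than 30 dB below the 8.0 peak are zeroed out.
spectrum = [0.001, 0.5, 8.0, 0.02, 0.3]
print(apply_masking_threshold(spectrum))
```

Zeroed components cost no bits to encode, which is where the compression saving comes from.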

Frequency refers to the number of vibrations or cycles a sound wave completes in one second, measured in Hertz (Hz). One Hz equals one cycle per second. In the context of sound, frequency determines the pitch, or how high or low a sound is perceived. The human hearing range typically spans from 20 Hz to 20,000 Hz, with low frequencies (e.g., bass sounds) perceived as deep and rich, while high frequencies (e.g., bird chirps) are sharp and clear. Frequency is a fundamental characteristic of audio signals and plays a critical role in music, acoustics, and audio engineering for describing and analyzing sound properties.
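For a clean single tone, frequency can be estimated directly from the definition above by counting zero crossings, since each full cycle crosses zero twice. This is a minimal sketch and breaks down for noisy or polyphonic signals, where spectral methods are used instead:

```python
import math

def estimate_frequency(samples, sample_rate):
    """Estimate the frequency of a clean single-tone signal by counting
    zero crossings: each full cycle crosses zero twice."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate
    return crossings / (2.0 * duration_s)

# One second of a 440 Hz sine sampled at 48 kHz: the estimate comes out
# very close to 440 Hz.
sr = 48000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
print(round(estimate_frequency(tone, sr)))
```

440 Hz is the standard concert pitch A4, a convenient mid-range reference well inside the 20 Hz to 20,000 Hz hearing range described above.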

