Music Production Vocabulary: 60+ Essential Terms Every Producer Must Know

The essential music production glossary — from EQ, compression, and sidechaining to buss, stem, and LUFS. Learn the vocabulary that separates beginners from pros.

You cannot mix what you cannot name. Music production vocabulary bridges the gap between what you hear in your head and what you can actually fix in your DAW. When a mix feels muddy, the name for the problem is low-mid buildup between 250 and 500 Hz. When a vocal disappears behind guitars, the fix has a name too: a high-mid presence boost at 3–5 kHz on the vocal, or sidechain compression that ducks the guitars whenever the vocal is present. This glossary covers 60+ essential terms, organized by discipline, so you can communicate like a professional and make decisions with precision.

Signal Flow Terms

Signal flow is the path audio takes from source to output. Understanding it prevents noise, distortion, and head-scratching routing mysteries. Every connection in your studio — from microphone to DAW to master output — is part of the signal chain.

Signal Chain
The complete path an audio signal travels from source to destination, passing through every piece of gear and processing along the way. A typical signal chain for a recorded vocal might be: microphone → XLR cable → preamp → audio interface input → DAW track → EQ plugin → compressor plugin → master buss → limiter → audio interface output. Each link in the chain colours the sound, so understanding the signal chain lets you identify which link is causing a problem.
Input / Output
Input is where audio enters a device or software track; output is where it leaves. On a hardware interface, inputs receive signal from microphones or instruments. In a DAW, every track has an input (the source it's recording or playing back) and an output (where it sends its signal in the mix). Matching input levels correctly at the source prevents clipping later in the chain.
Buss (Bus)
A routing destination that sums multiple signals into a single path. Routing drums to a drum buss lets you process them together with one compressor, adding cohesion. A master buss is the final destination for every track in your session. In analogue mixing desks, a buss is a physical summing circuit; in a DAW, it's a virtual routing concept.
Send / Return
A send routes a portion of a track's signal to a parallel destination (like a reverb or delay aux), while the original signal continues unchanged on the main track. The return is the channel that receives that sent signal and processes it independently before blending it back into the mix. Sends let you apply the same reverb or delay to multiple tracks without duplicating plugins on each channel.
Insert
An insert is a slot in a channel strip that interrupts the signal flow to process the entire signal. Unlike a send, which splits off only part of the signal, an insert processes 100% of what passes through. Insert slots are where you place EQ, compression, and saturation: processors that must affect the whole signal. Most DAWs offer 4–8 insert slots per channel.
Dry / Wet
Dry is the unprocessed original signal; wet is the fully processed output of an effects unit. A reverb with 100% wet output contains no dry signal — it is entirely the reverb's sound. A 50/50 blend plays the dry original alongside the reverb at equal levels, creating space without washing out the source. Most effects plugins offer a dry/wet knob for precise control.
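The dry/wet blend described above can be sketched as a simple weighted sum. This is a minimal illustration that assumes a linear crossfade; the function name `mix_dry_wet` and the sample values are hypothetical, and real plugins may use other curves, such as equal-power.

```python
def mix_dry_wet(dry, wet, mix):
    """Blend two equal-length sample lists.

    mix = 0.0 is fully dry, mix = 1.0 is fully wet (linear crossfade,
    an assumption made for this sketch).
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]


# Hypothetical sample values standing in for a source and its reverb.
dry = [1.0, 0.5, 0.0]
wet = [0.0, 0.5, 1.0]

print(mix_dry_wet(dry, wet, 0.5))  # a 50/50 blend
```

At `mix=1.0` the output contains no dry signal at all, which is the usual setting for a reverb sitting on a return channel.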
Gain Staging
The practice of setting optimal signal levels at every stage of your signal chain. Good gain staging means each processor receives a healthy level: strong enough to sit well above the noise floor, quiet enough to avoid clipping. The goal is roughly unity gain through the entire chain: each processor's output level matches its input level, so bypassing it does not change the loudness. Poor gain staging causes either noise (signals too quiet) or distortion (signals too hot).
Unity Gain
The point at which output level equals input level — no amplification, no attenuation. When a plugin or hardware unit is at unity gain, it is sonically neutral. Setting your gain staging to unity at the start of a mixing session gives you a clean baseline before you add any processing. Any volume change you hear after adding a plugin is therefore the effect of that plugin, not a level shift.
dBFS (Decibels Full Scale)
The decibel scale used in digital audio, where 0 dBFS represents the maximum representable level before clipping. Everything below 0 is negative. In digital, 0 dBFS is a ceiling — you never want sustained signals touching it. Headroom refers to the distance between your operating level and 0 dBFS. A mix peaking at -6 dBFS has 6 dB of headroom; a mix peaking at -1 dBFS has almost none.
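To make the headroom arithmetic concrete, here is a small sketch that converts a linear sample amplitude to dBFS and reports the remaining headroom. It assumes samples are normalised so that 1.0 is full scale; the function names are illustrative only.

```python
import math


def amplitude_to_dbfs(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    amplitude = abs(amplitude)
    if amplitude == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(amplitude)


def headroom_db(peak_dbfs):
    """Distance between a peak level and the 0 dBFS ceiling."""
    return 0.0 - peak_dbfs


peak = amplitude_to_dbfs(0.5)  # half of full scale, roughly -6 dBFS
print(round(peak, 2), round(headroom_db(peak), 2))
```

Halving the amplitude costs about 6 dB, which is why a peak at half of full scale leaves about 6 dB of headroom.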
LUFS (Loudness Units relative to Full Scale)
The international standard for measuring perceived audio loudness, accounting for how human hearing weights different frequencies. Unlike peak meters (which show instantaneous spikes) or RMS (which shows average level), LUFS integrates loudness over time and applies frequency weighting to reflect how we actually perceive loudness. Streaming platforms use LUFS targets to ensure consistent playback levels. A master at -14 LUFS will sound the same loudness on Spotify regardless of its peak or RMS values.
RMS (Root Mean Square)
A measure of average signal power over time, roughly corresponding to perceived loudness. Unlike peaks, which capture transients, RMS tells you what the sustained energy of a sound feels like. A drum hit might peak at 0 dBFS but have an RMS of -12 dBFS: the peak is brief, but the average level is much lower. RMS is more useful than peak for understanding how "loud" something will feel in a mix.
Peak
The highest instantaneous level a signal reaches, measured in dB. Peaks represent transients — the initial attack of a drum hit, a pluck, a consonant spike on a vocal. While peaks are important for avoiding digital clipping, they are a poor guide to perceived loudness. A snare drum can have sharp peaks that read as dangerous on a meter while its RMS sits comfortably within the mix. Use a peak meter alongside an RMS or LUFS meter for complete level awareness.
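The difference between peak and RMS is easy to see in code. The sketch below measures both for a full-scale sine wave, assuming samples normalised to 1.0 = 0 dBFS; the helper names are hypothetical and no meter ballistics or weighting are modelled.

```python
import math


def peak_dbfs(samples):
    """Highest instantaneous level of the block, in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """Root-mean-square level of the block, in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


# Ten whole cycles of a full-scale sine, 100 samples per cycle.
sine = [math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]

print(round(peak_dbfs(sine), 2), round(rms_dbfs(sine), 2))
```

A sine wave is the gentle case: its RMS sits only about 3 dB below its peak. Percussive material shows a far larger gap, which is why the two meters are read together.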

EQ and Frequency Terms

Equalization is the art of shaping tone by boosting or cutting specific frequency ranges. Every instrument occupies a frequency spectrum; EQ lets you carve space for each one so the mix breathes. Poor EQ decisions create either thin, hollow sounds or muddy collisions where instruments fight for the same acoustic space.

Low-Pass Filter
A filter that allows frequencies below a set cutoff point to pass through while attenuating everything above it. A low-pass filter at 5 kHz on a bass track removes frequencies above 5 kHz — the shrill, brittle harmonics — while keeping the sub and fundamental. Low-pass filters are also called high-cut filters. On synth leads, a gentle low-pass with resonance creates the classic sweeping, buzzy sound.
High-Pass Filter
A filter that allows frequencies above a set cutoff point to pass while attenuating everything below it. A high-pass filter at 80 Hz on a vocal removes low-end rumble from room noise, air conditioning, and proximity effect without affecting the voice itself. High-passing everything except kick and bass is one of the fastest ways to clean up a muddy mix. Also called a low-cut filter.
Band-Pass Filter
A combination of a low-pass and high-pass filter working together to allow only a specific frequency range to pass. A band-pass at 1 kHz with a 1-octave width lets frequencies around 1 kHz through while cutting everything below and above. Band-pass filters are used to isolate specific parts of a sound — like emulating the lo-fi character of telephone audio (300 Hz–3.4 kHz) or focusing a guitar on its most musical range.
Shelf (Shelving EQ)
An EQ type that boosts or cuts all frequencies on one side of a set point uniformly. A high shelf boosted at 8 kHz adds brightness by raising everything above 8 kHz by the same amount. A low shelf cut at 200 Hz reduces mud by attenuating everything below 200 Hz. Shelving EQs are broad tonal adjustments — use them when you want to darken or brighten a sound without surgical precision.
Bell / Parametric EQ
A bell is an EQ curve shaped like a bell that affects a defined frequency range around a centre point; an EQ whose bells are fully adjustable is called parametric. The three controls are frequency (centre), gain (depth of boost or cut), and Q (bandwidth: how wide or narrow the bell is). A narrow Q (high number) targets a specific frequency, like the surgical removal of a hum (60 Hz). A wide Q (low number) affects a broader range for more subtle tonal shaping.
Q / Resonance
Q (quality factor) defines the bandwidth of a parametric EQ bell — how wide or narrow the affected frequency range is. A high Q (8–10) is a very narrow bell that targets a surgical frequency. A low Q (0.5–1) is wide and affects a broader range. In filter contexts, resonance refers to a peak at the cutoff frequency of a low-pass or high-pass filter — it adds harmonic emphasis that makes the filter sound brighter and more characterful, like the classic "filter sweep" on a synth.
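The relationship between Q and bandwidth can be sketched with the textbook definition Q = centre frequency / bandwidth. This is an assumption for illustration: EQ plugins do not all define Q the same way, so treat the numbers as a guide rather than as what a given plugin will show.

```python
def bandwidth_hz(centre_hz, q):
    """Bandwidth implied by Q under the textbook definition Q = f0 / bandwidth."""
    if q <= 0:
        raise ValueError("Q must be positive")
    return centre_hz / q


# A narrow, surgical bell versus a wide, tonal one, both centred on 1 kHz.
print(bandwidth_hz(1000.0, 10.0))  # 100.0
print(bandwidth_hz(1000.0, 0.5))   # 2000.0
```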
Cut / Boost
Cut means reducing the level of a frequency range; boost means increasing it. Cuts are generally more transparent than boosts because removing problem frequencies reduces masking. A 3 dB cut at 300 Hz on a guitar can open space for a vocal without adding anything artificial. Boosts become audible as the ear locates the emphasized frequency — useful for adding presence (3–5 kHz) or air (10–16 kHz), but aggressive boosts can introduce harshness.
Frequency Range Allocations
Every mix can be divided into approximate frequency ranges, each responsible for a different aspect of the sound:
Range | Frequency | What it controls | Common instruments
Sub-bass | 20–60 Hz | Physical weight, felt rumble, power | Kick sub, 808 sub, organ, deep synths
Bass | 60–250 Hz | Warmth, body, fullness | Bass guitar, kick body, bass synth
Low-mid | 250–500 Hz | Body, thickness, and mud when excessive | Guitar body, snare body, piano low end
Mid | 500 Hz–2 kHz | Presence, honk, aggressive edge | Guitar amp, snare attack, vocal nasal tone
High-mid | 2–6 kHz | Clarity, definition, consonant intelligibility | Vocal presence, piano hammers, cymbals
Treble | 6–20 kHz | Air, sparkle, shimmer, detail | Hi-hats, cymbals, string harmonics, reverb tails
Fundamental vs Harmonic
The fundamental is the lowest, and usually the strongest, frequency of a pitched sound: the note you identify as the pitch. The harmonics are the higher frequencies that sit above the fundamental at integer multiples (2x, 3x, 4x the fundamental frequency) and give each instrument its timbre. A violin and a flute playing the same note (same fundamental) sound different because of their different harmonic series. EQing a guitar to sound dull means cutting harmonics; adding brightness means boosting them.
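Because harmonics sit at integer multiples of the fundamental, you can list them with a line of arithmetic. The sketch below uses a hypothetical helper and an A at 110 Hz as the example note; it shows where to look on an EQ when you want to shape a note's brightness.

```python
def harmonic_series(fundamental_hz, count):
    """Return the first `count` harmonics; the 1st harmonic is the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


print(harmonic_series(110.0, 5))  # [110.0, 220.0, 330.0, 440.0, 550.0]
```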
Transient vs Sustain
Transients are the initial attack phase of a sound — the crack of a snare hit, the pluck of a guitar string, the click of a kick drum beater. Transients are brief, high-energy spikes that carry punch and impact. Sustain is the portion of a sound that continues after the initial attack, the held note. Sounds with fast transients and short sustain (piano, guitar) feel percussive. Sounds with slow transients and long sustain (strings, pads, organ) feel lush and ambient. Fast attack on a compressor suppresses transients, reducing punch. Slow attack lets transients through, preserving punch while controlling the sustain.

Dynamics Terms

Dynamics processors control the variation between a signal's quietest and loudest moments. Where EQ shapes tone, dynamics shapes energy — the punch of a drum, the breathing of a vocal, the glue that holds a mix together.

Compression
Compression reduces dynamic range by automatically turning down signals that exceed a set level (the threshold). The ratio determines how much reduction occurs: at a 4:1 ratio, for every 4 dB the input exceeds the threshold, the output increases by only 1 dB. Used subtly (2–4 dB of gain reduction), compression adds punch, sustain, and glue. Used aggressively, it creates the dense, in-your-face energy of modern pop and hip-hop. Every compressor has five key parameters: threshold, ratio, attack, release, and makeup gain.
Limiter
A limiter is a compressor with an extreme ratio (usually 10:1 or higher, often infinite:1) that prevents a signal from exceeding the threshold. Where a compressor shapes dynamics over time, a limiter is a ceiling — it catches stray peaks that would otherwise clip. On a master buss, a limiter (sometimes called a brickwall limiter) is the final safety net before audio leaves your session. Over-limiting a master causes audible distortion called "limiting artefact" — a flattened, squashed sound that lacks life.
Gate (Noise Gate)
A gate is a dynamics processor that cuts the signal when it falls below a set threshold: essentially, a switch that only opens when the input is loud enough. Gates are used to silence the quiet periods between drum hits (so the bleed from other mics does not muddy the sound), to clean up guitar amp noise between riffs, or to remove room noise from a recorded source. A gate with a fast attack opens quickly, so transients pass through intact; a slow attack fades the gate open, softening or cutting off the start of the sound.
Expander
An expander is the opposite of a compressor — it increases dynamic range. Where compression reduces the difference between loud and quiet, expansion increases it. Downward expansion reduces signals below the threshold, making quiet parts even quieter — essentially a gentle gate. Upward expansion boosts signals above the threshold, making loud parts louder and restoring dynamics to overly compressed recordings. Expanders are less common than compressors but essential for advanced dynamics control.
Threshold and Ratio
Threshold is the dB level at which a dynamics processor begins to work. Ratio is how much gain reduction is applied once the threshold is exceeded. A threshold of -20 dB and a ratio of 4:1 means: when the signal exceeds -20 dB, only 1 dB of output increase happens for every 4 dB of input increase. Setting threshold is the first decision when compressing — set it so the loudest peaks trigger the compression, not the average level.
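The threshold and ratio arithmetic above can be checked with a short sketch of a compressor's static curve. It assumes a hard knee and ignores attack, release, and makeup gain; the function names are illustrative, and the defaults mirror the -20 dB, 4:1 example in the text.

```python
def compressor_output_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee compressor curve: output level for a given input level."""
    if input_db <= threshold_db:
        return input_db  # below the threshold the signal is untouched
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db=-20.0, ratio=4.0):
    """How many dB the compressor turns the signal down."""
    return input_db - compressor_output_db(input_db, threshold_db, ratio)


# An input 8 dB over the threshold comes out only 2 dB over it.
print(compressor_output_db(-12.0), gain_reduction_db(-12.0))  # -18.0 6.0
```

Reading it the other way round is a useful habit: if the gain reduction meter shows 6 dB at a 4:1 ratio, the input is 8 dB over the threshold.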
Attack and Release
Attack controls how quickly the compressor reduces gain once the signal crosses the threshold. Fast attack (1–10 ms) clamps down immediately — good for controlling vocal peaks or taming harsh transients. Slow attack (20–50 ms) lets the initial transient through before compression kicks in — good for adding punch to drums. Release controls how quickly gain reduction stops after the signal falls back below the threshold. Fast release (30–80 ms) restores volume quickly between notes — good for hi-hats. Slow release (100–300 ms) holds compression longer, creating smoother, sustained density.
Knee
The knee defines how gradually a compressor transitions from no compression to full-ratio compression as the signal approaches the threshold. A hard knee applies the full ratio the instant the signal crosses the threshold — aggressive and precise, like an SSL bus compressor. A soft knee begins applying compression gradually as the signal approaches the threshold, creating a smoother, more musical transition. Variable knee compressors let you dial in anywhere between the two.
Makeup Gain
After a compressor attenuates the signal, the output is quieter. Makeup gain restores the perceived loudness after compression: it does not undo the compression, it simply amplifies the compressed result. Matched-level A/B testing is only valid when makeup gain is set so the compressed version sits at the same loudness as the original. Without level matching, whichever version is louder tends to sound "better" simply because it is louder, not because of the compression itself.
Parallel Compression
Parallel compression (also called New York compression) blends a dry signal with a heavily compressed copy. The dry signal preserves transients and natural dynamics; the compressed copy adds density, sustain, and glue. The result gets the benefits of heavy compression without the downside of a squashed, lifeless sound. Set up by sending a track to an aux with an aggressive compressor (ratio 8:1, fast attack, 8–12 dB of gain reduction), then blend the aux return with the original channel.
Sidechain / Sidechaining
Sidechaining uses the signal from one track to control the dynamics of another. The most common example: a kick drum sidechains a compressor on the bass — every time the kick hits, the bass is pulled down, creating rhythmic space between the two. In EDM, this creates the "pumping" effect. In pop mixing, it's a functional tool for making room for the kick without ducking the bass manually. Sidechain compression uses the external signal as the trigger; the compressor reacts to the kick, not the bass itself.
Pumping and Clashing
Pumping is an audible artefact where a compressor's gain reduction recovers in an obvious, rhythmic way that draws attention, usually because the release time is poorly matched to the material. Clashing (or frequencies clashing) is when two instruments occupy the same frequency range at similar loudness, causing masking: each instrument becomes harder to hear clearly. The solution to clashing is EQ to separate them (cut one where the other lives) or volume automation to create space between them.

Time-Based Effects Terms

Time-based effects process audio by replicating, delaying, or modulating it over time. They add depth, space, and texture — transforming a dry, clinical recording into a living, dimensional mix.

Reverb: Room, Hall, Plate, Spring
Reverb is the acoustic simulation of a physical space — the collection of early reflections and diffuse reverberation that occurs when sound bounces off walls, ceiling, and floor. Room reverb emulates a small, tight acoustic space with fast reflections and short decay. Hall reverb simulates a large concert hall with slower, more expansive decay and distinct early reflections. Plate reverb is an artificial reverb created by exciting a large metal plate with a transducer — it has a dense, smooth tail with no natural acoustic colour, and is especially flattering on vocals and drums. Spring reverb uses a mechanical spring system to create a bouncy, wiry character — most associated with guitar amplifiers and surf music.
Delay / Echo
Delay records an incoming signal and plays it back after a set time interval. A short delay (20–80 ms) thickens a sound (often combined with slight pitch variation for doubling). A medium delay (80–300 ms) creates a distinct echo that is audible as a separate repetition. A long delay (500 ms+) creates rhythmic echoes that repeat in time with the music. Stereo delay uses two different delay times for left and right channels, creating width. Ping-pong delay bounces between left and right on each repeat — particularly effective on vocals and guitar arpeggios.
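For echoes that repeat in time with the music, the delay time follows from the tempo: one beat lasts 60,000 ms divided by the BPM. The sketch below is an illustration with hypothetical names; it assumes the beat is a quarter note, as in 4/4.

```python
def delay_time_ms(bpm, beats=1.0):
    """Delay time in milliseconds for a note length given in beats.

    Assumes one beat is a quarter note: 1.0 = quarter, 0.5 = eighth,
    0.75 = dotted eighth.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60000.0 / bpm * beats


for name, beats in [("quarter", 1.0), ("eighth", 0.5), ("dotted eighth", 0.75)]:
    print(name, delay_time_ms(120.0, beats))
```

At 120 BPM this gives 500 ms, 250 ms, and 375 ms, values you can type into any delay that lacks a tempo-sync switch.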
Chorus, Flanger, Phaser
These are all modulation effects that create richness and movement by combining the signal with a modulated copy of itself. Chorus uses a longer delay (15–35 ms) whose time is swept by a slow LFO (Low Frequency Oscillator); the changing delay produces small pitch variations, creating the impression of multiple instruments playing slightly out of tune, which adds width and thickness to thin sounds like single-coil guitar or dry vocals. Flanger uses a very short, sweeping delay (0–10 ms) mixed with the original, creating a resonant "whooshing" sweep. Phaser uses all-pass filters to create peaks and notches in the frequency spectrum that sweep with an LFO, giving a softer, subtler relative of flanging.
Tremolo and Auto-Pan
Tremolo is a volume modulation effect — an LFO repeatedly raises and lowers the volume of a signal at a set rate. On a guitar amp, this creates the "wiggle" of surf music. In a DAW, subtle tremolo on a vocal can add groove and energy. Auto-pan is similar but modulates stereo position (left-right) rather than volume — the signal appears to sweep between speakers. Both effects are simple LFO-controlled amplitude or pan modulation, not time-based in the same sense as reverb or delay.
Pre-Delay
Pre-delay is the time between the dry signal and the onset of reverb. A short pre-delay (10–30 ms) makes reverb feel close and tight — the source and the reverb are almost simultaneous. A long pre-delay (50–100 ms) separates the dry sound from its reverb tail, keeping the attack clear and defined while the tail fills the space behind it. On vocals, 20–40 ms pre-delay is standard — the voice remains present and articulate while gaining the spaciousness of reverb.
Decay Time and RT60
Decay time is how long a reverb tail takes to fade from its initial level to silence. RT60 (Reverberation Time 60 dB) is the standard measurement: how long it takes for reverb to decay by 60 dB from its initial level. A room with 0.8 s RT60 feels moderately live; 2.5 s RT60 is a large concert hall. In a DAW reverb plugin, decay time sets the overall length of the reverb tail. Short decay (0.5–1 s) for dry, present mixing; long decay (2–3 s) for spacious, ambient effects.
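RT60 becomes more intuitive when you compute how far a tail has dropped at a given moment. The sketch below assumes an idealised reverb whose level falls at a constant rate in dB, which real rooms and plugins only approximate; the function names are hypothetical.

```python
def reverb_level_db(seconds, rt60):
    """Level of an idealised reverb tail relative to its start, in dB."""
    if rt60 <= 0:
        raise ValueError("rt60 must be positive")
    return -60.0 * seconds / rt60


def reverb_amplitude(seconds, rt60):
    """The same level expressed as a linear amplitude factor."""
    return 10.0 ** (reverb_level_db(seconds, rt60) / 20.0)


# One second into a 2.5 s hall tail versus one second into a 0.8 s room.
print(reverb_level_db(1.0, 2.5), reverb_level_db(1.0, 0.8))
```

After one second the hall tail is still only 24 dB down and clearly audible, while the room has already decayed past 60 dB.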

Sound Design Terms

Sound design is the craft of creating and shaping sound from raw materials — oscillators, noise, and modulation. Whether you are programming a synth patch or sculpting a drum hit, the vocabulary of sound design gives you precise control over timbre and texture.

Oscillator
An oscillator generates a repeating electronic waveform at a set frequency — the raw building block of synthesised sound. The basic waveforms are sine (pure tone, no harmonics), triangle (mild harmonic content), sawtooth (rich in harmonics, bright), and square/pulse (hollow, with only odd harmonics). Multiple oscillators combined create thick, complex timbres. The frequency of an oscillator determines the pitch — A440 means the oscillator cycles at 440 times per second, producing the note A above middle C.
Wavetable
Wavetable synthesis, pioneered by the PPG Wave, carried forward by the Waldorf Microwave, and later made popular by Serum and Vital, uses a table of pre-defined waveforms that can be scanned, morphed, and combined to create complex, evolving timbres. Unlike a static oscillator that cycles one waveform, a wavetable oscillator can move through a sequence of waveforms in the table, creating smooth spectral morphing. Wavetable synthesis is particularly effective for modern bass sounds, atmospheric pads, and aggressive leads because of its harmonic complexity and ability to evolve over time.
Envelope (ADSR)
An envelope shapes how a parameter changes over time in response to a note trigger. The ADSR envelope (Attack, Decay, Sustain, Release) is the most common: Attack is the time to reach full level from zero; Decay is the time to fall from peak to the sustain level; Sustain is the level held while the key is held; Release is the time to fade to zero after the key is released. A fast attack and short decay on a pluck creates a percussive sound; a slow attack and long release creates a swelling pad.
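The four ADSR stages can be sketched as a function of time. This is a minimal illustration that assumes linear segments and a release that starts from whatever level the envelope had reached when the key was lifted; real synths often use curved segments, and the name `adsr_level` and its default times are hypothetical.

```python
def adsr_level(t, gate_time, attack=0.01, decay=0.1, sustain=0.7, release=0.3):
    """Envelope level (0.0 to 1.0) at time t seconds.

    The key goes down at t = 0 and is released at t = gate_time.
    Segments are linear, which is an assumption of this sketch.
    """
    def held_level(time_held):
        if attack > 0 and time_held < attack:
            return time_held / attack                      # rise to full level
        if decay > 0 and time_held < attack + decay:
            progress = (time_held - attack) / decay
            return 1.0 + (sustain - 1.0) * progress        # fall to sustain
        return sustain                                     # hold while key is down

    if t < 0:
        return 0.0
    if t < gate_time:
        return held_level(t)
    elapsed = t - gate_time
    if release <= 0 or elapsed >= release:
        return 0.0
    return held_level(gate_time) * (1.0 - elapsed / release)  # fade out


# A one-second note sampled during attack, sustain, release, and after the fade.
for moment in (0.005, 0.5, 1.15, 2.0):
    print(moment, round(adsr_level(moment, gate_time=1.0), 3))
```

Shortening `attack` and `decay` and dropping `sustain` toward zero gives the pluck described above; lengthening `attack` and `release` gives the swelling pad.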
LFO (Low Frequency Oscillator)
An LFO is an oscillator operating at sub-audio frequencies (typically 0.01–20 Hz) used to modulate parameters rather than generate sound directly. Routing an LFO to oscillator pitch creates vibrato; routing it to filter cutoff creates the classic "wobble" bass; routing it to amplitude creates tremolo. LFOs are what give synth patches movement and life — a static patch sounds robotic; an LFO modulating multiple parameters simultaneously creates a living, organic texture.
Filter and Resonance
In synthesizers, a filter removes (subtracts) energy from the frequency spectrum. The most common is a low-pass filter that rolls off highs while passing lows — the foundation of subtractive synthesis. Resonance (or Q) boosts energy at the filter's cutoff frequency, creating a peak that adds harmonic emphasis. High resonance on a low-pass filter creates the classic "squelchy" synth bass character (think 303 acid line). Too much resonance can self-oscillate — the filter begins to produce a tone at the cutoff frequency, useful for melodic effects but dangerous for hearing.
FM Synthesis
FM (Frequency Modulation) synthesis modulates one oscillator's frequency with another's output — the modulating oscillator changes the pitch of the carrier rapidly enough to create new, complex harmonics. FM synthesis produces bell-like, metallic timbres impossible to create with subtractive synthesis alone. The Yamaha DX7 made FM famous in the 1980s with its characteristic electric piano and bass sounds. Modern FM synths (like FM8, Vital's FM mode, or even Ableton's Operator) use multiple operators in algorithms to create intricate, evolving textures.
Detune and Unison
Detune slightly offsets one oscillator from another in frequency, creating chorus-like thickness from a single voice. Detuning a saw wave oscillator by 5–15 cents (hundredths of a semitone) from its pair creates the thick, rich sound of classic supersaw synth leads. Unison takes this further by stacking multiple detuned voices of the same note (4, 8, or even 16 voices slightly spread from centre) to create massive, thick stacks. Unison spread controls how wide the detuning is across the stacked voices.
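Cents translate to frequency through a power of two: there are 1200 cents in an octave, so an offset of c cents multiplies the frequency by 2^(c/1200). The sketch below applies that to a single detuned oscillator and to a unison stack. The even spacing of unison voices is an assumption made for illustration, since synths distribute their voices in different ways, and both function names are hypothetical.

```python
def detuned_frequency(base_hz, cents):
    """Frequency after shifting base_hz by a number of cents (1200 per octave)."""
    return base_hz * 2.0 ** (cents / 1200.0)


def unison_frequencies(base_hz, voices, spread_cents):
    """Evenly spread `voices` oscillators from -spread_cents to +spread_cents."""
    if voices < 1:
        raise ValueError("voices must be at least 1")
    if voices == 1:
        return [base_hz]
    step = 2.0 * spread_cents / (voices - 1)
    return [detuned_frequency(base_hz, -spread_cents + i * step)
            for i in range(voices)]


print(round(detuned_frequency(440.0, 10.0), 2))
print([round(f, 2) for f in unison_frequencies(440.0, 4, 15.0)])
```

A 10-cent offset moves A440 by only about 2.5 Hz, which is why detuning is heard as slow beating and thickness rather than as a wrong note.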
Portamento / Glide
Portamento (or glide) is a continuous pitch sweep from one note to the next, rather than an instant jump. On a monophonic synth (or in monophonic mode), portamento creates smooth transitions between notes, characteristic of analogue synth bass and lead lines. The glide time determines how long the pitch sweep takes. Legato playing (notes overlapping) triggers portamento; re-triggered notes (staccato) may reset the pitch immediately without glide, depending on the setting.
Arpeggiator and Sequencer
An arpeggiator takes a chord and plays its notes in a cycling pattern (up, down, up-down, random) at a rate set by a clock. Hold a Cmaj7 chord and an arpeggiator plays C–E–G–B–C–E–G–B in sequence — turning a static chord into a moving, melodic pattern. A step sequencer programs notes, velocities, and gate lengths into discrete steps, playing them back in a loop. A 16-step sequencer plays 16 notes in a repeating pattern. Both are essential for electronic music workflow and generative composition.
Granular
Granular synthesis slices audio into tiny fragments (grains, typically 10–100 ms) and reassembles them into new textures. Granular processing of a recorded vocal can stretch it beyond recognition — turning a half-second sample into an evolving, ambient soundscape. Grain size determines the character: short grains (10–30 ms) produce glitchy, digital textures; longer grains (50–100 ms) retain more of the original character. Granular effects are used for extreme time-stretching, texture creation, and ambient sound design.

Rhythm and Timing Terms

Rhythm is the temporal dimension of music — the organisation of sound and silence across time. Understanding rhythm vocabulary lets you programme beats that groove, humanize performances that feel mechanical, and diagnose why a mix feels stiff or rushed.

BPM, Bar, Beat, Downbeat, Upbeat
BPM (Beats Per Minute) defines tempo: how many beats (quarter notes in 4/4) occur in one minute. Around 120 BPM is common in modern pop and dance music; 70–90 BPM is typical for hip-hop. Bar (or measure) is a group of beats defined by the time signature. In 4/4, a bar contains 4 beats. Beat is the fundamental rhythmic pulse. Downbeat is the first beat of a bar (the "1"), the strongest accent. Upbeat is the unaccented beat that leads into the next downbeat: the last beat of the bar, or a pickup such as the "and" of 4. It is the anacrusis, the push into the next bar.
Swing, Shuffle, Groove
Swing is a timing offset where every second note in a straight subdivision is delayed slightly, creating a shuffle feel. At 50% swing, the timing is straight (equal); at 66% swing, it approximates a triplet feel. Shuffle is a specific swing pattern where the second note of each pair is delayed to match a triplet subdivision (think blues, house, and disco). Groove is the overall feel of a rhythm: not just the notes, but the subtle variations in timing, velocity, and micro-timing that make a performance feel human and alive. A quantised drum pattern has the right notes; a groovy one makes you move.
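Swing percentages are easier to reason about as note positions. In the sketch below, the swing value is the fraction of each note pair taken up by the first note, so 0.5 is straight and about 0.667 is a triplet feel. The function name and the choice to measure time in beats are assumptions for illustration; DAWs label and scale their swing controls differently.

```python
def swung_note_times(num_pairs, pair_length_beats=1.0, swing=0.5):
    """Onset times (in beats) for consecutive pairs of notes.

    swing is the share of each pair given to its first note:
    0.5 = straight, roughly 0.667 = triplet feel.
    """
    if not 0.5 <= swing < 1.0:
        raise ValueError("swing must be at least 0.5 and below 1.0")
    times = []
    for pair in range(num_pairs):
        start = pair * pair_length_beats
        times.append(start)                              # first note stays on the grid
        times.append(start + pair_length_beats * swing)  # second note is delayed
    return times


print(swung_note_times(2, 1.0, 0.5))  # straight eighths over two beats
print([round(t, 3) for t in swung_note_times(2, 1.0, 2.0 / 3.0)])
```

Only the second note of each pair moves; the first stays on the grid, which is why swung patterns still lock to the downbeats.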
Quantize and Humanize
Quantize snaps recorded or programmed notes to a grid (a 16th note grid, for example), correcting timing imperfections. Full quantization makes a performance mechanically perfect but often lifeless. Humanize deliberately reintroduces small random variations in timing, velocity, and note length to make a quantised performance feel more human and musical. The goal is not imperfection — it is controlled variation that avoids the sterile feel of perfect grid alignment while keeping the notes in the right place.
Triplet Feel, Polyrhythm, Syncopation
Triplet feel divides one beat into three equal notes instead of two, the basis of swing and shuffle. A 16th-note triplet fills the space of two 16th notes with three notes. Polyrhythm is two or more conflicting rhythmic patterns playing simultaneously; 3-against-2 and 3-against-4 (three notes in the space of four) are the most common. Syncopation places rhythmic accents off the downbeat, on the upbeat or in the middle of a beat, creating forward momentum and surprise. Funk and jazz are defined by syncopation.
Latency
Latency is the time delay between a physical action (playing a note, pressing a button) and the audible result. In a DAW with a small buffer size, latency might be 5–10 ms — barely perceptible. With a large buffer (for processing many plugins), latency can reach 40–100 ms — enough to feel laggy when recording. When recording through plugins in real time, low latency is essential. This is why many engineers record with the buffer size set low (256–512 samples) and only increase it when mixing, where latency matters less than processing power.
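The link between buffer size and latency is a single division: a buffer of N samples takes N divided by the sample rate seconds to fill. The sketch below covers only that one-way buffer delay; it is an assumption-laden lower bound, since drivers, converters, and plugins add more in practice. The function name is hypothetical.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """One-way delay contributed by a single audio buffer, in milliseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1000.0 * buffer_samples / sample_rate_hz


for size in (256, 512, 2048):
    print(size, round(buffer_latency_ms(size, 44100), 1), "ms at 44.1 kHz")
```

At 44.1 kHz, 256 and 512 samples work out to roughly 6 ms and 12 ms, while a 2048-sample mixing buffer is already past 46 ms.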

Mixing and Mastering Terms

Mixing is the process of balancing and shaping individual tracks into a cohesive stereo (or surround) mix. Mastering is the final polish applied to the stereo mix for distribution — ensuring it translates across playback systems and meets loudness standards.

Stem and Submix
A stem is a bounced file containing a grouped set of tracks — drums stem, bass stem, vocals stem, guitars stem. Stems are not the same as the full mix; they can be imported into another session and remixed. A submix is the in-session routing of multiple tracks to a common buss for shared processing. Grouping drums to a submix lets you compress them together for cohesion without affecting other elements. Stems are an export format; submixes are an in-session routing technique.
Master Buss
The master buss (or master bus, main output) is the final routing destination for every track in a DAW session. All tracks feed the master buss before audio leaves the DAW. On the master buss, you typically apply the final processing chain: EQ correction, stereo width, compression (very subtle — 1–2 dB of gain reduction), and limiting. The master buss is also where you monitor the final output level and check for clipping.
Phantom Power (+48V)
Phantom power is a 48-volt DC supply sent through an XLR cable to power condenser microphones and active DI boxes. It is called phantom because it travels on the same cable as the audio signal without interfering with it. Dynamic microphones and ribbon microphones do not need phantom power; in fact, sending phantom power to a passive ribbon mic can damage it. Always check your microphone type before engaging +48V. Most audio interfaces and mixing desks have a +48V switch in the mic preamp section, either global or per channel.
Pan and Stereo Width
Pan (from panorama) positions a mono signal anywhere between the left and right speakers. Hard left (-100) places the signal entirely in the left channel; centre (0) is equal in both channels. Stereo width controls the perceived spread of a stereo signal: increasing width pulls the sides apart, making the image wider; reducing width brings the sides toward centre, making it narrower. Overly wide stereo (especially in the low frequencies below 200 Hz) causes phase problems on mono playback systems like club sound systems and phone speakers.
Mono Sum and Correlation Meter
A mono sum collapses a stereo signal to mono — essential for checking how your mix translates to mono playback systems (clubs, phones, some TVs). If elements disappear or thin out in mono, they are likely out of phase or too wide. A correlation meter shows the phase relationship between left and right channels. A reading of +1 means perfect correlation (both channels are identical, mono-safe). A reading of 0 means no correlation (unrelated signals). Negative readings indicate phase cancellation — frequencies will disappear in mono. Keep bass and kick in mono or with positive correlation; width effects are for mid and high frequencies.
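The two ideas in this entry can be sketched in a few lines. This is a minimal illustration, assuming the correlation is computed once over a whole block of samples; real correlation meters work on short, sliding time windows. The function names and the test tone are made up for the example.

```python
import math

def mono_sum(left, right):
    """Average left and right samples into a mono signal."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def correlation(left, right):
    """Zero-lag correlation between two channels, from -1 to +1."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy else 0.0

tone = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
inverted = [-s for s in tone]

in_phase = correlation(tone, tone)          # identical channels
out_of_phase = correlation(tone, inverted)  # right channel polarity-flipped
cancelled = mono_sum(tone, inverted)        # what a mono system would play
```

With identical channels the reading is at the top of the scale; with one channel inverted it drops to the bottom, and the mono sum of that pair is silence.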
LCR Panning and Mid-Side EQ
LCR panning (Left-Centre-Right) is a simplified panning workflow where signals are placed only at hard left, centre, or hard right, with no intermediate positions. This forces more decisive mix decisions and creates clearer stereo imaging. Mid-side EQ processes the centre (mid) and sides of a stereo image independently. You can cut mud from the centre without affecting the wide overhead mics, or boost air on the sides without adding harshness to the centre. Many parametric EQ plugins offer mid-side processing as a mode.
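Mid-side processing rests on a simple sum-and-difference conversion. The sketch below assumes the common convention mid = (L + R) / 2 and side = (L - R) / 2; some tools scale the two differently. The function names and sample values are illustrative only.

```python
def encode_ms(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def decode_ms(mid, side):
    """Convert mid/side samples back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.75]
right = [0.5, -0.25, 0.25]

mid, side = encode_ms(left, right)
# Any processing applied to `mid` or `side` here affects only that component.
restored_left, restored_right = decode_ms(mid, side)
```

A mid-side EQ performs this conversion, filters the mid and side signals separately, and then decodes back to left and right.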
Spectral Balance and Reference Tracks
Spectral balance is the distribution of energy across the frequency spectrum — whether the mix feels bass-heavy, mid-forward, or bright. A well-balanced mix has energy in all ranges without any single range dominating. A reference track is a professionally mixed and mastered commercial track used as a sonic benchmark. A/B testing against a reference means alternating between your mix and the reference at matched levels, listening for differences in spectral balance, dynamics, stereo width, and overall loudness. Reference tracks are the single most objective tool for evaluating your own mix.
A/B Testing
A/B testing is the practice of comparing two signals (your mix vs a reference, processed vs dry, one setting vs another) at matched levels. Volume is the most misleading variable in audio perception: the louder signal tends to sound better. A/B testing at identical loudness forces an honest comparison. Toggle between your mix and a reference track every 30–60 seconds, not continuously. Extended listening causes ear fatigue and lets your ears adapt to whichever signal is playing. Take breaks. Trust the first impression.
Loudness Wars and Brickwall Limiter
The loudness wars refers to the industry trend of making recordings progressively louder to stand out on CD and radio. This led to heavily limited masters that trade dynamic range for loudness, resulting in a squashed, fatiguing sound. A brickwall limiter is a limiter with an effectively infinite ratio at the threshold: in theory, it allows no signal above the ceiling. In practice, a limiter that only looks at sample values can still let inter-sample peaks (peaks in the reconstructed waveform between samples) exceed 0 dBFS. Because streaming platforms now normalize loudness, many masters for streaming aim for around -14 LUFS rather than maximum loudness.
Inter-Sample Peak (ISP)
An inter-sample peak is a momentary level that occurs between two digital samples: not at a sample point itself, but in the reconstructed waveform between them. When a limiter flattens sample-peak levels, ISPs can still push above 0 dBFS, causing clipping even on a limited signal. True peak metering (defined in ITU-R BS.1770 and used by the EBU R 128 recommendation) estimates ISPs by oversampling and gives a more accurate picture of the actual maximum level of a digital signal. Always use true peak metering when mastering for digital distribution so that the signal reconstructed by the DAC (Digital-to-Analogue Converter) stays below 0 dBFS.
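A small experiment makes inter-sample peaks concrete. The sketch below samples a sine wave at a quarter of the sample rate with a phase offset, so that no sample lands on the crest, and then estimates the waveform between samples with a Hann-windowed sinc interpolator. This is a simplified stand-in for a true peak meter, not the filter specified by any standard; `true_peak`, the 4x oversampling factor, and the window length are assumptions chosen for illustration.

```python
import math

def sinc(x):
    """Normalized sinc function used for band-limited interpolation."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak(samples, oversample=4, half_width=16):
    """Estimate the peak of the reconstructed waveform between samples."""
    peak = 0.0
    # Skip the edges so every interpolated point has a full set of neighbours.
    for n in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = n + k / oversample
            value = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                d = t - m
                window = 0.5 * (1 + math.cos(math.pi * d / half_width))  # Hann taper
                value += samples[m] * sinc(d) * window
            peak = max(peak, abs(value))
    return peak

# Sine at one quarter of the sample rate, offset so samples miss the crests.
samples = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(64)]

sample_peak = max(abs(s) for s in samples)
estimated_true_peak = true_peak(samples)
```

Every sample of this signal sits at about 0.707 of full scale, yet the underlying waveform reaches close to 1.0 between samples, a difference of roughly 3 dB that a sample-peak meter would not show.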

Recording Terms

Great recordings are the foundation of great mixes. Understanding microphone types, polar patterns, and signal routing lets you capture a performance at its best — before any processing is applied.

Condenser, Dynamic, Ribbon Microphone
Condenser microphones use a capacitor capsule that needs power, usually phantom power (+48V). They are sensitive, accurate, and have a wide frequency response, which makes them the standard choice for studio vocal recording and acoustic instruments. Dynamic microphones use a moving coil mechanism (like a small speaker in reverse) and need no power. They are rugged, resistant to high SPLs (sound pressure levels), and have a warm, coloured character, which makes them the standard choice for close-miking loud sources like guitar amps, snare drums, and brass. Ribbon microphones use a thin metal ribbon in a magnetic field; they have a naturally bidirectional (figure-8) pickup pattern and a smooth frequency response with a gentle high-frequency rolloff. They are fragile (never send phantom power to a ribbon) and excel on overheads, guitar amps, and orchestral recording.
XLR, TRS, DI Box
XLR is a three-pin connector standard for balanced microphone cables: pin 1 is ground, pin 2 is hot (positive), pin 3 is cold (negative). Balanced connections reject noise over long cable runs. TRS (Tip-Ring-Sleeve) is a three-conductor jack connector that carries either one balanced mono signal (line connections between gear), an unbalanced stereo signal (headphone outputs), or an unbalanced send and return on a single plug (insert cables on mixing desks). A DI box (Direct Injection, also called Direct Input) converts an unbalanced, high-impedance instrument-level signal (from a bass guitar or keyboard) to a balanced, low-impedance mic-level signal for long cable runs to the preamp. Active DI boxes contain powered buffer circuitry and require a battery or phantom power; passive DI boxes use a transformer and need no power.
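The noise rejection of a balanced connection comes from the receiver taking the difference between the hot and cold conductors. The sketch below is a minimal model, assuming both conductors pick up exactly the same interference; the sample values and the helper names are illustrative only.

```python
def balanced_send(signal):
    """Drive the hot leg with the signal and the cold leg with its inverse."""
    hot = list(signal)
    cold = [-s for s in signal]
    return hot, cold

def balanced_receive(hot, cold):
    """Subtract cold from hot: the signal doubles, shared noise cancels."""
    return [(h - c) / 2 for h, c in zip(hot, cold)]

signal = [0.5, -0.25, 0.125]
noise = [0.0625, 0.0625, -0.03125]  # interference induced equally on both legs

hot, cold = balanced_send(signal)
noisy_hot = [h + n for h, n in zip(hot, noise)]
noisy_cold = [c + n for c, n in zip(cold, noise)]

recovered = balanced_receive(noisy_hot, noisy_cold)
```

Because the interference appears with the same polarity on both legs, it drops out of the subtraction while the signal survives. In a real cable the cancellation is good but not perfect.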
Preamplifier and Impedance
A preamp (preamplifier) amplifies a low-level microphone or instrument signal to line level, the standard operating level for audio interfaces and mixing consoles. Microphone signals are tiny (millivolts); line level signals are robust (hundreds of millivolts to over a volt). A good preamp adds clean gain while preserving the character of the microphone. Impedance (measured in ohms) is the opposition to AC current flow in a circuit. High-impedance sources (such as passive guitar pickups) lose high frequencies over long cables; low-impedance sources (most microphones) preserve high frequencies. A preamp's input impedance is normally several times higher than the microphone's output impedance, and the ratio between the two affects the tone, which is why some preamps offer switchable input impedance.
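The gap between microphone level and line level is easier to grasp in decibels. The sketch below converts voltages to dBu, where 0 dBu is about 0.7746 V RMS, and works out the gain a preamp would need. The 2 mV microphone signal is an assumed example value, and +4 dBu is used as the professional line-level reference.

```python
import math

DBU_REFERENCE_VOLTS = 0.7746  # 0 dBu is defined as about 0.7746 V RMS

def volts_to_dbu(volts):
    """Convert an RMS voltage to dBu."""
    return 20 * math.log10(volts / DBU_REFERENCE_VOLTS)

def dbu_to_volts(dbu):
    """Convert a level in dBu back to an RMS voltage."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)

mic_dbu = volts_to_dbu(0.002)   # assumed 2 mV microphone signal
line_volts = dbu_to_volts(4)    # +4 dBu professional line level
gain_needed_db = 4 - mic_dbu    # gain the preamp must supply
```

With these assumed figures the microphone sits near -52 dBu, so the preamp has to add roughly 56 dB of gain to reach a line level of about 1.23 V.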
Polar Patterns: Cardioid, Omnidirectional, Figure-8
A polar pattern describes a microphone's sensitivity to sound from different directions. Cardioid (heart-shaped) picks up sound primarily from the front and rejects sound from the sides and rear — the most common pattern for studio vocals and single-source recording because it isolates the target sound. Omnidirectional picks up sound equally from all directions — natural for room miking and orchestral recording, but susceptible to room noise and feedback. Figure-8 picks up from front and rear equally, rejecting the sides — characteristic of ribbon microphones and used in mid-side stereo recording techniques.
Proximity Effect and Pop Filter
Proximity effect is a low-frequency boost that occurs when a directional microphone (cardioid, figure-8) is placed very close to the sound source. The closer the source, the more low-end is captured — useful for adding warmth to a thin vocalist, but can make a voice boomy and muddy. A pop filter (or pop shield) is a mesh screen placed between the vocalist and the microphone to block plosive puffs of air (B and P sounds) that cause a low-frequency thump on the recording. A pop filter is non-negotiable for any close-mic vocal session.
Acoustic Treatment and Reflection Filter
Acoustic treatment uses absorptive and diffusive materials to control the sound in a room. Bass traps absorb low frequencies that accumulate in corners. Acoustic panels absorb mid and high frequencies to reduce flutter echo between parallel walls. Diffusers scatter mid and high frequencies to reduce echo without killing the room's liveliness. Acoustic treatment is different from soundproofing — treatment shapes the sound within the room; soundproofing prevents sound from entering or leaving. A reflection filter is a portable acoustic panel that wraps around a microphone to reduce reflections from the back and sides — useful for untreated rooms but not a substitute for proper treatment.

Workflow and DAW Terms

The DAW (Digital Audio Workstation) is your creative environment — the software where recordings become mixes. Understanding DAW-specific vocabulary helps you work faster and communicate with other producers.

MIDI (Musical Instrument Digital Interface)
MIDI is not audio — it is a protocol for transmitting musical data (note on/off, pitch, velocity, modulation, expression) between devices. A MIDI keyboard does not produce sound; it sends note data to a synthesiser or sampler which produces the sound. MIDI allows you to programme beats, play virtual instruments, and automate parameters without recording audio. Understanding MIDI is foundational because it decouples the performance (the note data) from the sound (the instrument), making editing and correction far easier than with audio recordings.
VST, AU, AAX
These are plugin formats: the standard formats for third-party audio effects and virtual instruments. VST (Virtual Studio Technology, Steinberg) is the most widely supported format across DAWs (Ableton, FL Studio, Cubase, Reaper, Studio One). AU (Audio Unit, Apple) is native to macOS and iOS; it is the format Logic and GarageBand use, and many other DAWs on macOS can load it as well. AAX (Avid Audio Extension) is Avid's format for Pro Tools. Most plugins are available in VST and AU; AAX versions sometimes arrive slightly later. When a plugin says "VST3," it refers to the newer VST specification with improved processing architecture.
Plugin and Instance
A plugin is a software module (effect or instrument) that runs within your DAW. An instance is a specific copy of that plugin loaded on a specific channel. If you load the same EQ plugin on 10 different tracks, you have 10 instances of that plugin. Each instance has its own settings and processing. Some plugins (particularly CPU-heavy ones like Omnisphere or Keyscape) allow you to load multiple sounds within a single instance via layering, rather than loading a new instance per sound.
Buffer Size and Sample Rate
Buffer size (measured in samples) determines how much audio the DAW processes at once. Small buffers (128–256 samples) give low latency for recording but high CPU load. Large buffers (1024–4096 samples) reduce CPU load but introduce latency that makes real-time monitoring difficult. During recording, set the buffer to 128–256 for minimal lag; during mixing, raise it to 1024+ for more CPU headroom. Sample rate (measured in kHz: 44.1 kHz, 48 kHz, 96 kHz) is how many times per second the analogue waveform is measured. 44.1 kHz captures frequencies up to 22.05 kHz (just beyond the roughly 20 kHz limit of human hearing). 48 kHz is the video standard. 96 kHz is used for archival and high-fidelity work but offers negligible audible benefit over 44.1/48 kHz for music.
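Both relationships in this entry are simple arithmetic. The sketch below assumes latency is just the time needed to fill one buffer; real round-trip latency is higher because drivers and converters add their own delay. The function names are illustrative.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Time in milliseconds needed to fill one buffer."""
    return buffer_samples / sample_rate_hz * 1000

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

tracking_latency = buffer_latency_ms(128, 48000)   # small buffer for recording
mixing_latency = buffer_latency_ms(1024, 48000)    # large buffer for mixing
cd_bandwidth = nyquist_hz(44100)                   # 22050.0 Hz
```

At 48 kHz a 128-sample buffer takes under 3 ms to fill, while a 1024-sample buffer takes over 21 ms, which is enough delay to be distracting when monitoring a live performance.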
Bit Depth: 24-bit vs 32-bit Float
Bit depth determines the resolution of each audio sample: how precisely the analogue signal is represented digitally. 16-bit (CD standard) has 65,536 possible values per sample and a theoretical dynamic range of 96 dB. 24-bit has 16,777,216 values and 144 dB dynamic range, far beyond any real-world need. 32-bit float adds a floating-point exponent, allowing it to represent values far beyond the 24-bit ceiling without clipping. Most DAWs mix internally in 32-bit or 64-bit float, so levels that exceed 0 dBFS inside the session do not clip until they reach a fixed-point file or the converters. An interface's converters can still clip on the way in, so 32-bit float does not replace sensible recording levels. For final export, dithering to 16-bit for CD or 24-bit for digital distribution is standard.
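The value counts and dynamic range figures for integer formats follow directly from the number of bits. A minimal sketch, assuming the usual theoretical formula of 20·log10(2^bits), which works out to about 6.02 dB per bit; the function names are illustrative.

```python
import math

def quantization_levels(bits):
    """Number of distinct values an integer sample of this width can hold."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical dynamic range of an integer format, about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

levels_16 = quantization_levels(16)   # 65536
levels_24 = quantization_levels(24)   # 16777216
range_16 = dynamic_range_db(16)
range_24 = dynamic_range_db(24)
```

The results come out slightly above the rounded 96 dB and 144 dB figures usually quoted.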
Offline Bounce, Real-Time Processing, Freeze, Commit
Offline bounce renders audio to a file as fast as the computer can process it: usually faster than real time, though a session with many heavy plugins can render more slowly. Real-time processing plays the session at normal speed, processing each plugin as audio plays, which is required when external hardware is in the signal path. Freeze (Ableton, Logic) temporarily renders CPU-heavy instrument tracks to audio, unloading the plugins and eliminating the CPU load until you unfreeze. Commit (Pro Tools; Logic's Bounce in Place and Ableton's Flatten are comparable) renders frozen or MIDI tracks to new audio with the plugins baked in, so the result can only be changed by going back to the original track if you kept it. Freeze is reversible; commit is meant to be permanent.
Bus Track, Folder Track, Aux Track
A bus track is a track that receives signal from multiple source tracks routed to it rather than directly to the master — used for grouping. A folder track (or group track) is a container that holds multiple tracks visually, letting you collapse and expand them in the arrangement view without affecting routing. An aux track (auxiliary) is a receive-only channel that accepts signal from sends on other tracks — the destination for reverb, delay, and parallel processing chains.
MIDI CC and Program Change
MIDI CC (Control Change, often called Continuous Controller) messages are MIDI data that control parameters in real time: volume (CC7), pan (CC10), modulation wheel (CC1), expression (CC11), and many others across the 128 available controller numbers. CC messages are what let a MIDI controller manipulate plugin parameters during playback. Program Change messages switch between presets or programs on a synthesiser or sampler, like pressing a button to change from piano to strings. Both are essential for live performance setups and for automating plugin parameters across a session.
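On the wire, both message types are only two or three bytes long. The sketch below builds them by hand under the MIDI 1.0 layout: a status byte (0xB0 for Control Change, 0xC0 for Program Change) combined with a channel from 0 to 15, followed by 7-bit data bytes. The helper names are hypothetical and no MIDI library is involved.

```python
def _check_7bit(name, value):
    """MIDI data bytes carry 7 bits, so valid values are 0 to 127."""
    if not 0 <= value <= 127:
        raise ValueError(f"{name} must be between 0 and 127")

def control_change(channel, controller, value):
    """Build a 3-byte Control Change message; channel is 0 to 15."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be between 0 and 15")
    _check_7bit("controller", controller)
    _check_7bit("value", value)
    return bytes([0xB0 | channel, controller, value])

def program_change(channel, program):
    """Build a 2-byte Program Change message; channel is 0 to 15."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be between 0 and 15")
    _check_7bit("program", program)
    return bytes([0xC0 | channel, program])

volume_message = control_change(0, 7, 100)  # CC7 (volume) to 100 on the first channel
preset_message = program_change(2, 5)       # select program 5 on the third channel
```

Channels are numbered 0 to 15 in the raw bytes even though most hardware and DAWs display them as 1 to 16.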

Frequently Asked Questions

What is the most important mixing term to learn first?
Gain staging is the foundation of everything in music production. Understanding how to set healthy signal levels throughout your chain — from input to output — prevents distortion, noise, and thin sounds. Every other tool in your arsenal, from EQ to compression, depends on a well-established gain structure. Master gain staging before touching any other processor.
What is the difference between RMS and peak levels?
Peak levels show the highest instantaneous level of a signal (the tallest spike), while RMS (Root Mean Square) measures the average level over time, which tracks perceived loudness more closely. A transient like a kick drum hit can peak at 0 dBFS while its RMS sits at -12 dBFS. For mixing, RMS tells you roughly how loud something feels; peak tells you how close you are to clipping. Streaming platforms normalize to loudness targets such as -14 LUFS, not to peak values; LUFS is related to RMS but adds frequency weighting and gating, so the two numbers are not interchangeable.
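The two measurements can be compared directly on a list of samples. A minimal sketch, assuming samples are floats where 1.0 is full scale (0 dBFS); the two test signals, a steady sine and a single-sample click, are made up to show how far apart peak and RMS can sit.

```python
import math

def peak_dbfs(samples):
    """Level of the largest single sample, in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """Root-mean-square level of the whole block, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

sine = [math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
click = [1.0] + [0.0] * 99   # one full-scale sample followed by silence

sine_peak, sine_rms = peak_dbfs(sine), rms_dbfs(sine)
click_peak, click_rms = peak_dbfs(click), rms_dbfs(click)
```

Both signals peak at full scale, but the sine's RMS is only about 3 dB below its peak while the click's RMS is about 20 dB below, which matches how much quieter a brief transient feels.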
How do I learn what frequencies to cut and boost?
Start with the frequency range allocations: sub bass (20–60 Hz) for physical weight, bass (60–250 Hz) for warmth, low-mid (250–500 Hz) for body, mid (500 Hz–2 kHz) for presence, high-mid (2–6 kHz) for clarity and attack, treble (6–20 kHz) for air and sparkle. Use a parametric EQ and sweep narrow boosts to identify problem frequencies — the frequency that sounds harsh when boosted is usually the one to cut. Reference tracks are the single most reliable guide for what a well-balanced mix sounds like.
What does LUFS mean and why does it matter?
LUFS (Loudness Units relative to Full Scale) is the international standard for measuring perceived loudness, accounting for how human hearing weights frequencies. It replaced peak and RMS as the go-to metering standard because it more accurately predicts how loud a mix will feel to listeners. Streaming platforms normalize playback to specific LUFS targets, commonly cited as -14 LUFS for Spotify, -16 LUFS for Apple Music, and -14 LUFS for YouTube. If your master sits at -10 LUFS but the platform expects -14, it will be turned down automatically: the extra loudness is lost, and the dynamics sacrificed to heavy limiting are not restored.
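The turn-down described here is a plain gain offset. The sketch below assumes the platform applies one static gain equal to the target minus the measured integrated loudness; actual platform behaviour varies (some do not turn quiet tracks up, for instance), and the function names are illustrative.

```python
def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Gain in dB that moves a track from its measured loudness to the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

gain_db = normalization_gain_db(-10.0)   # master measured at -10 LUFS
scale = db_to_linear(gain_db)            # factor applied to every sample
```

For a -10 LUFS master and a -14 LUFS target the gain is -4 dB, meaning every sample is multiplied by about 0.63.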
What's the difference between a buss and an aux track?
A buss (or bus) is a routing destination that sums multiple signals together — all tracks routed to a drum buss get mixed into a single output before further processing. An aux track (auxiliary) is a separate input channel that receives signal from other tracks via sends, processes it independently, and returns it to the mix. Think of a buss as a funnel and an aux as a parallel processing loop. Reverb and delay typically run on aux tracks because you want to blend the wet signal with the dry original, not replace it.
What's the difference between 24-bit and 32-bit float audio?
Both 24-bit and 32-bit float represent higher fidelity than the 16-bit CD standard, but they behave differently at extremes. 24-bit integer provides a theoretical 144 dB dynamic range, more than enough for any real-world recording. 32-bit float adds an exponent, allowing it to represent values far beyond 24-bit range without clipping. This means a 32-bit float session can absorb levels above 0 dBFS inside the DAW (for example, a hot plugin output) without hard clipping, as long as the level is brought down before export. An input that overloads the interface's converters is still clipped unless the recorder itself is designed for 32-bit float capture. For mixing and mastering, the practical audible difference between the two is negligible.

This glossary covers the vocabulary that separates beginner producers from professionals — the precise language that lets you diagnose a problem by name and know exactly which tool to reach for. Bookmark this page. Every time you encounter an unfamiliar term in a tutorial or mix session, come back here and anchor it to a definition. Terminology is not abstract knowledge — it is the framework through which every engineering decision is made. The more precisely you can name what you hear, the faster you can fix it.
