M05 · Audio Fundamentals — Filmmaker Curriculum

Phase 1 · Module 5

Audio Fundamentals

Recording clean sound with your Rode NTG, DJI Mic 2, and ATH-M40X

Focus: microphone types, polar patterns, and the signal chain from microphone to camera. Clean audio starts with understanding what microphones are designed to do.

Microphone types: shotgun, lavalier, condenser

Your Rode VideoMic NTG is a supercardioid shotgun microphone: highly directional, capturing sound primarily from directly in front while rejecting most sound from the sides and rear. It is a condenser microphone, so it needs power; the VideoMic NTG runs from its built-in rechargeable battery rather than phantom power. Your DJI Mic 2 transmitter captures sound through a small omnidirectional condenser capsule, clipped 15–20cm below the mouth. Worn this way as a lavalier, it provides consistent close-mic quality regardless of how the subject moves their head.

Polar patterns: what they mean in practice

A polar pattern shows how sensitive a microphone is to sound arriving from different directions. Omnidirectional: equal sensitivity in all directions, so it captures the room along with the subject. Cardioid: front-sensitive, rejects the rear. Supercardioid/hypercardioid: a tighter front pattern with strong side rejection; this is the pattern of your Rode NTG. Because off-axis rejection is strong, you must point the NTG directly at the source.
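
To put rough numbers on these patterns, the sketch below uses the standard first-order polar equation, sensitivity = a + (1 − a)·cos θ. The coefficients and the function name are textbook idealisations chosen for this example, not measured Rode or DJI data, and a real shotgun microphone behaves differently at high frequencies because of its interference tube.

```python
import math

# Idealised first-order patterns: sensitivity(theta) = a + (1 - a) * cos(theta).
# These coefficients are textbook values, not manufacturer measurements.
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "hypercardioid": 0.25,
}

def rejection_db(pattern: str, angle_deg: float) -> float:
    """Level relative to on-axis (0 degrees), in dB, for sound arriving at angle_deg."""
    a = PATTERNS[pattern]
    sensitivity = abs(a + (1.0 - a) * math.cos(math.radians(angle_deg)))
    if sensitivity < 1e-9:
        return float("-inf")  # a perfect null in the idealised model
    return 20.0 * math.log10(sensitivity)

for name in PATTERNS:
    side = rejection_db(name, 90)
    rear = rejection_db(name, 180)
    print(f"{name:16s} side: {side:6.1f} dB   rear: {rear:6.1f} dB")
```

In this model the omnidirectional capsule shows no rejection at any angle, while the supercardioid attenuates sound from the side by several dB, which is why aiming matters so much more for the NTG than for the DJI Mic 2.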

The signal chain: mic to camera

Sound → microphone capsule (converts acoustic energy to an electrical signal) → preamp (amplifies to a usable level) → ADC (converts analogue to digital) → camera recording. Target: record dialogue at an average of −18 to −12 dBFS, with peaks not exceeding −6 dBFS. This leaves headroom for unexpected loud moments.
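
The level targets above can be checked numerically. The following is a minimal sketch that converts linear sample values (1.0 = digital full scale) to dBFS and compares them with the targets in the text. It treats "average" as RMS, which is an assumption (camera meters differ in how they average), and the function names and test tone are hypothetical.

```python
import math

def dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = digital full scale) to dBFS."""
    if amplitude <= 0:
        return float("-inf")
    return 20.0 * math.log10(amplitude)

def check_dialogue_levels(samples: list[float]) -> dict:
    """Compare RMS and peak level with the recording targets from the text."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {
        "rms_dbfs": dbfs(rms),
        "peak_dbfs": dbfs(peak),
        "average_in_target": -18.0 <= dbfs(rms) <= -12.0,
        "peak_ok": dbfs(peak) <= -6.0,
    }

# Hypothetical test signal: a 220 Hz sine at 0.25 of full scale, 48 kHz, one second.
tone = [0.25 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(48000)]
print(check_dialogue_levels(tone))
```

Halving the amplitude lowers the level by about 6 dB, so a signal peaking at half of full scale sits right at the −6 dBFS peak limit.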

Rode NTG positioning

Mounted on the camera hotshoe, the NTG is a useful compromise for run-and-gun shooting. For proper use, mount it on a boom pole held 30–40cm above and in front of the subject, just outside the frame. At close range (under 1.5m) it is excellent. At 3m the room ambience becomes much more prominent relative to the direct signal. For critical recordings, always combine it with the DJI Mic 2 worn as a lapel mic.
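
The effect of distance can be estimated with the inverse-square law: in open space, direct sound drops by about 6 dB each time the distance doubles, while the room's reverberant level stays roughly constant. The sketch below assumes that free-field behaviour, which real rooms only approximate, and the function name is hypothetical.

```python
import math

def direct_level_change_db(near_m: float, far_m: float) -> float:
    """Change in direct-sound level when the mic moves from near_m to far_m."""
    return 20.0 * math.log10(near_m / far_m)

# Boom position (0.4 m) compared with the on-camera distances mentioned in the text.
print(round(direct_level_change_db(0.4, 1.5), 1))   # 0.4 m -> 1.5 m
print(round(direct_level_change_db(0.4, 3.0), 1))   # 0.4 m -> 3.0 m
```

Under these assumptions the voice arrives roughly 11 dB quieter at 1.5m and 17 dB quieter at 3m than at boom distance, so the preamp gain needed to compensate raises the room sound by the same amount.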

DJI Mic 2: wireless freedom and its limits

The DJI Mic 2 transmitter clips to your subject and sends audio wirelessly to the receiver on the camera. This gives complete freedom of movement and consistent close-mic quality at any camera-to-subject distance. Limitation: it picks up everything in the immediate vicinity, including handling noise from clothing, proximity effects, and wind outdoors. Place the transmitter at the centre of the chest under one thin layer of clothing, and fit a wind cover outdoors.

Focus: room acoustics, diagnosing bad audio by ear, and the environmental factors that ruin otherwise well-shot footage.

Room acoustics: why some rooms sound bad

Sound reflects off hard surfaces (walls, floors, glass, concrete) and reaches the microphone slightly later than the direct sound. These accumulated reflections are reverb. Short reverb tails (under 0.3s) are acceptable. Long tails (over 0.5s) make dialogue sound roomy and unclear. Flutter echo is a rapid repetition caused by parallel reflective walls. Kitchens, bathrooms, and garage-style spaces are audio nightmares for dialogue recording. Carpet, curtains, and soft furnishings absorb reflections, so your home office is probably your cleanest recording environment.
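
Reverb time can be roughly estimated with Sabine's formula, RT60 = 0.161 · V / A, where V is the room volume in cubic metres and A is the total absorption (each surface area multiplied by its absorption coefficient). The sketch below is only an illustration: the room dimensions and absorption coefficients are hypothetical guesses, not measured values.

```python
def rt60_sabine(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine estimate of reverb time in seconds.

    surfaces is a list of (area in m^2, absorption coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# Hypothetical 4 m x 3 m x 2.5 m room (30 m^3): floor, ceiling, walls.
# Absorption coefficients are illustrative guesses only.
bare_room = [(12.0, 0.02), (12.0, 0.05), (35.0, 0.05)]   # tiles, plaster, bare walls
furnished = [(12.0, 0.30), (12.0, 0.05), (35.0, 0.20)]   # carpet, plaster, curtains and shelves

print(round(rt60_sabine(30.0, bare_room), 2))
print(round(rt60_sabine(30.0, furnished), 2))
```

With these assumed values the bare room comes out well above the 0.5s threshold, while the furnished version drops to under half a second, which matches the advice to favour carpeted, soft-furnished spaces.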

HVAC, electrical hum, and environmental noise

HVAC (heating, ventilation, air conditioning) systems produce a constant low-frequency hum that your brain tunes out on set but that is clearly audible in playback. Always turn off HVAC before recording. Electrical hum (50Hz mains and its 100Hz harmonic in Australia) can be induced when unbalanced audio cables run near power sources or dimmer switches. Refrigerators, computers, and appliances all produce noise. Know what is in your recording environment and silence it before rolling.

Wind noise and its solutions

Wind passing over a microphone capsule creates a low-frequency rumble entirely distinct from the subject's voice. Your NTG's foam windshield handles very light breezes only. Outdoors in any significant wind, a furry 'dead cat' over the NTG is essential. The DJI Mic 2 transmitter should have its foam or fur cover attached outdoors at all times. A wind cover is the cheapest insurance in your audio kit.

Clothing rustle on lavaliers

Worn as a lapel mic, the DJI Mic 2 picks up clothing movement: fabric rubbing against the capsule creates a low-frequency rumbling thud that is nearly impossible to EQ out. Solutions: tape the transmitter body securely to the subject's skin or inner garment with medical tape. Run a loop of the lapel cable through the clothing so that it doesn't transmit movement. Avoid synthetic fabrics (polyester, nylon), which rustle loudly. Natural fibres (cotton, wool) are much quieter. Always do a movement test before rolling.

Sync: the critical relationship between audio and video

When you record audio directly into the camera (NTG on the hotshoe or DJI Mic 2 receiver), sync is automatic. Recording on a separate device requires a sync reference, traditionally a clapperboard. In DaVinci Resolve, selecting the clips on the Edit page timeline and choosing Auto Align Clips → Based on Waveform performs sync automatically. The DJI Mic 2 transmitter can also store a backup recording internally, which is synced the same way. Always recover this backup after every critical shoot.

Focus: audio post-production in DaVinci Resolve Fairlight — dialogue editing, EQ, compression, and the basic mix.

The five layers of a professional mix

(1) Dialogue: primary human speech, always the highest priority and the clearest element.
(2) Ambience/room tone: the environmental background sound that gives the world continuity. Without ambience, spaces feel dead and artificial.
(3) Sound effects (SFX): specific sounds tied to visual events.
(4) Foley: custom-recorded sounds that replace or enhance real sounds (footsteps, clothing, object handling).
(5) Music: score or licensed tracks driving emotion and pacing.

All five interact. Managing their relative volume and frequency content is the art of mixing.

Dialogue editing in Fairlight

Before any processing: remove breath sounds that interrupt flow, cut filler words where appropriate, and trim clip edges cleanly. In Fairlight use the clip tool to trim and the selection tool to slip clips. Cross-fades between dialogue clips (3–8 frames) prevent clicks and pops at edit points. Room tone fills gaps: loop a section of room tone on a separate track under the dialogue, slightly below the dialogue level, to fill the 'dead air' between clips and create continuity.

EQ for dialogue: frequency decisions

Dialogue EQ is subtractive before it is additive: remove problems before boosting. Typical decisions: a high-pass filter at 80–120Hz to remove low-frequency rumble and HVAC noise. Cut any resonant room frequencies that make the voice sound boxy (often 200–500Hz). A slight presence boost (2–4kHz) improves intelligibility if the voice sounds dull. High-frequency air (8–12kHz) can add openness if the recording is slightly muffled. Use your ATH-M40X to judge; they are accurate enough in the midrange for dialogue work.
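
To show what a high-pass filter does to rumble, here is a minimal sketch of a first-order (6 dB per octave) high-pass filter in plain Python. It is not Fairlight's EQ, whose filters are typically steeper; the 100Hz cutoff, the 30Hz "rumble" tone and the 1kHz "voice" tone are assumptions chosen for illustration.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate=48000):
    """First-order high-pass filter (simple RC model, 6 dB per octave)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, seconds=1.0, sample_rate=48000):
    """Full-scale sine wave used as a test signal."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def level_change_db(signal, cutoff_hz):
    """How many dB the filter changes the RMS level of the signal."""
    return 20.0 * math.log10(rms(high_pass(signal, cutoff_hz)) / rms(signal))

rumble = tone(30)     # stand-in for HVAC-style rumble
voice = tone(1000)    # stand-in for speech energy
print(f"30 Hz:  {level_change_db(rumble, 100):.1f} dB")
print(f"1 kHz:  {level_change_db(voice, 100):.1f} dB")
```

Even this gentle filter cuts the 30Hz tone by roughly 10 dB while leaving the 1kHz tone almost untouched, which is the reasoning behind applying the high-pass before any boosts.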

Compression for dialogue

A compressor reduces dynamic range: it makes loud parts quieter, which allows the overall level to be raised, creating consistent, intelligible speech where both quiet and loud moments are clearly audible. Key settings for dialogue: attack 5–15ms (lets transients and consonants through first), release 50–150ms, ratio 3:1 to 6:1, threshold set so that compression engages on louder passages rather than continuously. A gentle dialogue compressor should be nearly inaudible; audible 'breathing' or 'pumping' indicates too much compression.
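
The threshold and ratio settings can be illustrated with the static gain curve of a hard-knee compressor. This sketch ignores attack, release and make-up gain, and the −18 dB threshold and 4:1 ratio are hypothetical example values rather than recommended Fairlight settings.

```python
def compressed_level_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Output level of an idealised hard-knee compressor (no attack or release)."""
    if input_db <= threshold_db:
        return input_db  # below the threshold the signal passes unchanged
    return threshold_db + (input_db - threshold_db) / ratio

quiet_line, loud_line = -24.0, -6.0   # hypothetical dialogue levels in dBFS
before = loud_line - quiet_line
after = (compressed_level_db(loud_line, -18.0, 4.0)
         - compressed_level_db(quiet_line, -18.0, 4.0))
print(f"dynamic range before: {before:.0f} dB, after: {after:.0f} dB")
```

In this example the loud line is 12 dB over the threshold and comes out only 3 dB over it, so the gap between quiet and loud speech shrinks from 18 dB to 9 dB. The whole track can then be raised without the loud moments clipping.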

Music placement and the music-dialogue relationship

Music and dialogue compete for the same frequency range, primarily 200Hz–4kHz. When both run simultaneously, dialogue always wins. Reduce the music level significantly under dialogue (often 10–15 dB below the music-only level). Use Fairlight's volume automation to duck the music when dialogue is present and let it rise in pauses. The music should feel present throughout without competing: the viewer should feel it emotionally without being aware of the volume relationship.
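
Ducking by a number of decibels corresponds to multiplying the music by a linear gain of 10^(dB/20). The sketch below shows that conversion and a deliberately crude ducking function; the names are hypothetical, and real automation would ramp the gain smoothly rather than switching it per sample.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def duck(music, dialogue_present, duck_db=-12.0):
    """Lower the music by duck_db wherever dialogue is present.

    music: list of sample values; dialogue_present: list of booleans, same length.
    The gain switches instantly here; a real mix would fade between the two levels.
    """
    gain = db_to_gain(duck_db)
    return [m * (gain if talking else 1.0) for m, talking in zip(music, dialogue_present)]

print(round(db_to_gain(-10.0), 3))
print(round(db_to_gain(-15.0), 3))
print(duck([1.0, 1.0, 1.0], [False, True, False]))
```

A 10–15 dB reduction therefore leaves the music at roughly a third to a sixth of its original amplitude under speech.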

Focus: loudness standards, stereo field, and delivering broadcast-quality audio from a single-camera production.

Loudness standards: LUFS and why they matter

LUFS (Loudness Units relative to Full Scale) is the modern measurement of perceived loudness over time. YouTube normalises to −14 LUFS integrated. Spotify also targets −14 LUFS. European broadcast (EBU R128) targets −23 LUFS, and Australian free-to-air television (Free TV OP-59) targets −24 LKFS, so confirm the delivery spec with the broadcaster. If you deliver louder than the platform standard, the platform turns it down and your careful mix is overridden. If you deliver quieter, some platforms raise the level, amplifying noise, while others simply leave your programme quieter than everything around it. In Fairlight, use the Loudness Meter (View → Meters → Loudness) to check your integrated LUFS before export. Target for online delivery: −14 LUFS integrated, True Peak not exceeding −1 dBTP.
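
The effect of normalisation is simple arithmetic: the gain applied is the target minus the measured loudness, and everything in the mix, including peaks and noise, moves by that amount. This sketch assumes the platform applies one static gain to the whole programme, which is a simplification; the function names and example measurements are hypothetical.

```python
def normalisation_gain_db(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain a platform would apply to bring the programme to its target loudness."""
    return target_lufs - measured_lufs

def after_normalisation(measured_lufs, true_peak_dbtp, noise_floor_dbfs, target_lufs=-14.0):
    """Shift true peak and noise floor by the same gain as the programme loudness."""
    gain = normalisation_gain_db(measured_lufs, target_lufs)
    return {
        "gain_db": gain,
        "true_peak_dbtp": true_peak_dbtp + gain,
        "noise_floor_dbfs": noise_floor_dbfs + gain,
    }

# Hypothetical mixes: one delivered too loud, one too quiet.
print(after_normalisation(-9.0, -1.0, -70.0))    # turned down by 5 dB
print(after_normalisation(-20.0, -8.0, -60.0))   # raised by 6 dB, noise rises too
```

In the second case the noise floor comes up from −60 to −54 dBFS along with the dialogue, which is why a quiet, noisy delivery sounds worse after normalisation.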

The stereo field and panning decisions

A stereo mix positions sounds across a left-right field from −100% (hard left) to +100% (hard right). For video production, dialogue sits centred (0). Background ambience can be spread slightly left and right to create a sense of space. Music is typically mixed in stereo. Avoid panning important sounds (dialogue, prominent SFX) hard to one side; it becomes distracting and unpleasant on headphones. Subtle stereo placement (±15–30%) for non-dialogue elements adds depth without distraction.

True peak limiting: preventing digital clipping on delivery

True peak (TP) estimates the peak of the waveform as it will be reconstructed on playback. That peak can fall between samples and momentarily exceed the digital ceiling, causing distortion even when no individual sample clips. Platform standards require TP not to exceed −1 dBTP. In Fairlight, add a Limiter as the last plugin on the master bus, set to −1 dBTP. This is a transparent safety net: if your mix is at a reasonable level it should rarely trigger.
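
A classic way to see why sample peaks understate the real peak is a sine wave at one quarter of the sample rate, shifted by 45 degrees so that every sample lands between the crests. The sketch below builds that signal; it is a textbook illustration, not a true-peak meter, which would oversample the audio instead.

```python
import math

def dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = digital full scale) to dBFS."""
    return 20.0 * math.log10(amplitude)

# Sine with amplitude 1.0 at a quarter of the sample rate, offset by 45 degrees.
# Every sample falls at +/-0.707, never on the crest of the underlying waveform.
samples = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(16)]

sample_peak = max(abs(s) for s in samples)
waveform_peak = 1.0  # the amplitude the sine was generated with

print(f"sample peak:   {dbfs(sample_peak):.2f} dBFS")
print(f"waveform peak: {dbfs(waveform_peak):.2f} dBFS")
```

A sample-peak meter reads about 3 dB below the level the reconstructed waveform actually reaches, which is the gap a true-peak limiter is designed to cover.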

Monitoring your mix: checking on multiple systems

Your ATH-M40X are an excellent monitoring tool, but no single system tells the full story. A mix that sounds good on closed-back headphones may have too much bass on speakers. Professional practice: mix on the ATH-M40X, then check on built-in Mac speakers (reveals midrange balance), then on earbuds (reveals bass balance), then on a Bluetooth speaker (reveals how the mix holds up at casual listening volume). Adjust until the mix is acceptable on all four systems.

The stems delivery: dialogue and music separately

For professional deliveries, clients often request M&E (music and effects) and dialogue on separate tracks, which allows localisation or future re-edits. In Resolve, use bus outputs to create separate stems. Export the dialogue stem, music stem, and effects stem separately, then verify that the three combined reproduce the original mix exactly. This confirms that your routing was correct and provides a professional-grade delivery.
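
The verification step is often called a null test: sum the stems sample by sample and check that nothing is left over when the result is compared with the full mix. The sketch below uses short hypothetical lists in place of real audio files, and it assumes no processing (such as a master-bus limiter) was applied to the mix but not to the stems.

```python
def null_test(mix, stems, tolerance=1e-6):
    """Return True if the stems, summed sample by sample, reproduce the mix."""
    summed = [sum(values) for values in zip(*stems)]
    residual = max(abs(m - s) for m, s in zip(mix, summed))
    return residual <= tolerance

# Hypothetical sample values standing in for exported audio.
dialogue = [0.10, -0.20, 0.05]
music = [0.02, 0.03, -0.01]
effects = [0.00, 0.01, 0.00]
mix = [d + m + e for d, m, e in zip(dialogue, music, effects)]

print(null_test(mix, [dialogue, music, effects]))   # all three stems present
print(null_test(mix, [dialogue, music]))            # effects stem missing
```

If the test fails, a track is usually routed to the wrong bus or missing from a stem, which is much easier to fix before delivery than after.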

Setting gain too low and boosting in post

Recording too quietly (the audio equivalent of underexposure) and then amplifying in post raises the camera's noise floor by exactly the same amount as the signal. The result: hiss and electronic noise underneath dialogue that is expensive to remove and never fully clean.

Fix: Set camera gain so dialogue peaks consistently between −12 and −6 dBFS on the in-camera meter. Monitor on set with your ATH-M40X whenever possible. A correctly gained recording needs minimal post amplification.

Not turning off HVAC before recording dialogue

Your brain tunes out HVAC noise on set, but the microphone does not filter it. The result is dialogue recorded over a constant low-frequency rumble that is difficult to remove cleanly without introducing artefacts.

Fix: Before every dialogue recording, turn off HVAC systems, close windows, turn off refrigerators in nearby rooms. Record 10 seconds of room tone in this quieter environment. Your ATH-M40X will confirm whether the space is acceptably quiet before you roll on the subject.

Not recording room tone on location

Forgetting to record room tone means every edit point in the dialogue has a dead, unnatural gap between clips. This is instantly noticeable and cannot be adequately faked — the room tone must be recorded in the actual space with the actual microphone setup of the session.

Fix: At the end of every recording session, before striking the audio setup, record 60 seconds of room tone — everyone quiet, no movement. Label it 'RT' in your session. This takes 90 seconds and saves significant post-production time.

Clothing rustle destroying the lapel recording

The DJI Mic 2 transmitter picks up clothing movement extremely readily. A low-frequency rumbling thud from fabric movement is almost impossible to EQ out without also destroying the dialogue.

Fix: Tape the transmitter body securely to the subject's skin or inner garment with medical tape. Run the cable loop through the clothing so it doesn't transmit movement. Always do a movement test — walk, turn, nod — and listen on ATH-M40X before rolling. Avoid synthetic fabrics entirely.


Kit for this module

Rode VideoMic NTG

DJI Mic 2 + lapel

ATH-M40X headphones

Sony a6700 / FX30

DaVinci Resolve Fairlight

Quick reference

Recording level targets

Average: −18 to −12 dBFS
Peak max: −6 dBFS
Never clip: 0 dBFS

Delivery loudness

YouTube: −14 LUFS
Broadcast (EBU R128): −23 LUFS
True peak max: −1 dBTP

NTG boom placement

30–40cm above and in front of subject. Aim directly at the mouth. Stay outside the frame. Maintain consistent distance throughout the scene.
