Phase 1 · Module 5
Audio Fundamentals
Recording clean sound with your Rode NTG, DJI Mic 2, and ATH-M40X
Focus: microphone types, polar patterns, and the signal chain from microphone to camera. Clean audio starts with understanding what microphones are designed to do.
- Microphone types — shotgun, lavalier, condenser. 'Condenser' describes how a capsule converts sound into an electrical signal, while 'shotgun' and 'lavalier' describe form and placement; both of your microphones are condensers. Your Rode VideoMic NTG is a supercardioid shotgun microphone: highly directional, capturing sound primarily from directly in front while rejecting the sides and rear. Like any condenser it needs power; the VideoMic NTG runs from its built-in rechargeable battery rather than phantom power. Your DJI Mic 2 transmitter captures via a small omnidirectional condenser capsule clipped 15–20cm below the mouth. The lavalier element provides consistent close-mic quality regardless of how the subject moves their head.
- Polar patterns — what they mean in practice. A polar pattern shows how sensitive a microphone is to sound from different directions. Omnidirectional: equal sensitivity in all directions, capturing the room along with the subject. Cardioid: front-sensitive, with maximum rejection directly behind. Supercardioid/hypercardioid: a tighter front pattern and stronger side rejection than cardioid, at the cost of a small rear lobe that picks up some sound from directly behind. Supercardioid is the pattern of your Rode NTG. Because the pattern is so tight, you must point the NTG directly at the source; anything off-axis is strongly rejected, including the voice you want.
- The signal chain — mic to camera. Sound → microphone capsule (converts acoustic to electrical signal) → preamp (amplifies to a usable level) → ADC (converts analogue to digital) → camera recording. Target: record at −18 to −12 dBFS average for dialogue, with peaks not exceeding −6 dBFS. Leave headroom for unexpected loud moments.
- Rode NTG positioning. Mounted on the camera's hotshoe, the NTG is a useful compromise for run-and-gun shooting. For proper use, mount it on a boom pole held 30–40cm above and in front of the subject, just outside the frame. At close range (under 1.5m) it is excellent; at 3m it picks up significantly more room ambience than direct signal. Always combine it with the DJI Mic 2 lapel for critical recordings.
- DJI Mic 2 — wireless freedom and its limits. The DJI Mic 2 transmitter clips to your subject and sends wirelessly to the receiver mounted on the camera. This gives complete freedom of movement and consistent close-mic quality at any camera-to-subject distance. Limitation: it picks up everything in its immediate vicinity, including handling noise from clothing and wind outdoors. Place the transmitter at the centre of the chest, under no more than one thin layer of clothing, and fit a wind cover outdoors.
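The distance figures above follow from the inverse-square law. A minimal Python sketch, assuming free-field conditions (no reflections) and treating the room's reverberant sound as roughly constant, shows how much direct signal the NTG loses as it moves back from a 35cm boom position. The function name and the list of distances are illustrative only.

```python
import math


def direct_level_change_db(ref_distance_m: float, new_distance_m: float) -> float:
    """Change in direct-sound level (dB) when a mic moves from ref to new distance.

    Free-field inverse-square approximation: about -6 dB per doubling of
    distance. In a real room the reverberant level stays roughly constant,
    so each dB of direct sound lost is a dB of direct-to-room ratio lost.
    """
    if ref_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(ref_distance_m / new_distance_m)


if __name__ == "__main__":
    # Boom at 35 cm is the reference; 1.5 m and 3 m are the distances in the text.
    for distance in (0.35, 0.7, 1.5, 3.0):
        change = direct_level_change_db(0.35, distance)
        print(f"{distance:4.2f} m: {change:6.1f} dB of direct sound vs. the boom")
```

On this model, moving from the boom to 3m costs roughly 19dB of direct sound, which is why the same room sounds so much more reverberant from a distance.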
Drill 1
Mic comparison shoot
Record the same 60-second speech three ways: NTG on hotshoe, NTG on boom at 35cm above subject, and DJI Mic 2 lapel on subject's chest. With a single NTG, record the hotshoe and boom versions as two consecutive takes, keeping the DJI Mic 2 running on both. Import all three to the Fairlight page in DaVinci Resolve. Listen on your ATH-M40X. Compare: room ambience level, consonant clarity, proximity effect on the NTG, and handling noise on the lapel. Write a sentence describing the character of each.
Drill 2
Gain staging exercise
Record your subject at five gain settings: far too low (meter barely moving), slightly below ideal, ideal (peaks −12 to −6 dBFS), slightly above ideal (peaks between −6 and 0 dBFS), and clearly clipping. Import all five to Fairlight. On your ATH-M40X, listen for the noise floor at low gain, the solid quality of a correctly gained recording, and the harsh distortion of a clipped signal. The clipped signal is the one sound you must never record: once the waveform is flattened at 0 dBFS, it cannot be reliably repaired in post.
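The peak targets in this drill can also be checked numerically. A minimal sketch, assuming each take has already been decoded to floating-point samples between −1.0 and 1.0 (for example with a WAV reader of your choice); the thresholds are the drill's, while the function and category names are illustrative.

```python
import math
from typing import Sequence


def peak_dbfs(samples: Sequence[float]) -> float:
    """Highest absolute sample value in dBFS (samples scaled to -1.0..1.0)."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dBFS, where a full-scale square wave reads 0 dB.

    Some meters add about 3 dB so that a full-scale sine reads 0 instead.
    """
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def gain_verdict(samples: Sequence[float],
                 ideal_low_db: float = -12.0,
                 ideal_high_db: float = -6.0) -> str:
    """Sort a take into the drill's categories using its peak level."""
    peak = peak_dbfs(samples)
    if peak >= -0.1:  # at (or within 0.1 dB of) full scale: treat as clipped
        return "clipping"
    if peak > ideal_high_db:
        return "hot"
    if peak >= ideal_low_db:
        return "ideal"
    return "low"
```

The verdict uses peaks because that is what the drill's targets describe; `rms_dbfs` is there to show how far the average level of speech sits below its peaks.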
Drill 3
Boom positioning
Record a subject from four boom positions: above and in front (standard), below pointing up (floor boom), to the side (off-axis), and from behind. Listen on ATH-M40X. The standard above-and-front position should be clearly the cleanest. Note how dramatically quality changes when the NTG is aimed off-axis.
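To put numbers on the off-axis drop you hear in this drill, here is a sketch of the idealised first-order polar equation, sensitivity = a + (1 − a)·cos θ. The coefficients are textbook values (supercardioid a ≈ 0.366, hypercardioid a = 0.25), not measurements of the NTG; a real shotgun's interference tube makes its pattern frequency dependent and narrower at high frequencies.

```python
import math

# Idealised first-order patterns: sensitivity(theta) = a + (1 - a) * cos(theta).
# Textbook coefficients, not measured data for any specific microphone.
PATTERN_A = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.366,
    "hypercardioid": 0.25,
}


def off_axis_db(pattern: str, angle_deg: float) -> float:
    """Level relative to on-axis (dB) for a source at angle_deg off the mic's axis."""
    a = PATTERN_A[pattern]
    sensitivity = abs(a + (1.0 - a) * math.cos(math.radians(angle_deg)))
    if sensitivity < 1e-9:
        return float("-inf")  # a perfect null
    return 20.0 * math.log10(sensitivity)


if __name__ == "__main__":
    for name in PATTERN_A:
        row = ", ".join(f"{angle:3d}°: {off_axis_db(name, angle):6.1f} dB"
                        for angle in (0, 90, 180))
        print(f"{name:16s} {row}")
```

On this model a supercardioid is about 9dB down at the side and about 11dB down directly behind, while an ideal cardioid has a complete null at the rear. That small rear lobe is why the 'from behind' position still picks up some sound.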
Drill 4
Room tone recording
Record 60 seconds of pure room tone in 5 different spaces: your home office, bathroom, kitchen, outdoors in the street, and in a car. Make no noise during recording. Import to Fairlight and listen carefully on ATH-M40X. Every space has a distinct acoustic fingerprint — reverb length, frequency character, background noise content. Room tone recordings fill edit gaps in post.
Week 1 Assignment
"The mic comparison"
Record the same 90-second interview segment with all three approaches: NTG on boom, NTG on hotshoe, and DJI Mic 2 lapel. With a single NTG, record the boom and hotshoe versions as two consecutive takes of the segment, keeping the DJI Mic 2 recording throughout. Deliver all three as separate audio files. Include a written comparison (one paragraph per microphone) describing the sonic character, strengths, and ideal use case for each. Listen on ATH-M40X throughout.
- All three recordings correctly gained — peaks not exceeding −6 dBFS
- Written comparison demonstrates genuine listening, not theoretical knowledge
- Ideal use case for each microphone is correctly identified
- Boom technique is correct — microphone above and in front, outside frame
Rode VideoMic NTG · DJI Mic 2 · Sony a6700 or FX30 · ATH-M40X headphones
Focus: room acoustics, diagnosing bad audio by ear, and the environmental factors that ruin otherwise well-shot footage.
- Room acoustics — why some rooms sound bad. Sound reflects off hard surfaces (walls, floors, glass, concrete) and reaches the microphone slightly after the direct sound; the build-up of these reflections is reverb. Short reverb tails (under 0.3s) are acceptable. Long tails (over 0.5s) make dialogue sound roomy and unclear. Flutter echo is a rapid repetition caused by sound bouncing between parallel reflective walls. Kitchens, bathrooms, and garage-style spaces are audio nightmares for dialogue recording. Carpet, curtains, and soft furnishings absorb reflections, so your home office is probably your cleanest recording environment.
- HVAC, electrical hum, and environmental noise. HVAC (heating, ventilation, air conditioning) systems produce a constant low-frequency rumble that your brain tunes out on set but that is clearly audible in playback. Always turn off HVAC before recording. Australian mains power runs at 50Hz, so electrical hum appears at 50Hz and its harmonics (100Hz, 150Hz and up); it can be induced in unbalanced audio cables running near power sources or dimmer switches. Refrigerators, computers, and appliances all produce noise: know what is in your recording environment and silence it before rolling.
- Wind noise and its solutions. Wind passing over a microphone capsule creates a low-frequency rumble entirely distinct from the subject's voice. Your NTG's foam windshield handles only very light breezes. In outdoor conditions with any significant wind, a furry 'dead cat' over the NTG is essential. The DJI Mic 2 transmitter should have its foam or fur cover attached outdoors at all times. A wind cover is the cheapest insurance in your audio kit.
- Clothing rustle on lavaliers. The DJI Mic 2 lapel picks up clothing movement: fabric rubbing against the capsule creates a low-frequency rumbling thud that is nearly impossible to EQ out. Solutions: tape the transmitter body securely to the subject's skin or inner garment with medical tape. Run the cable loop through the clothing so it doesn't transmit movement. Avoid synthetic fabrics (polyester, nylon), which rustle loudly; natural fibres (cotton, wool) are much quieter. Always do a movement test before rolling.
- Sync — the critical relationship between audio and video. When you record audio directly into the camera (NTG on hotshoe, or the DJI Mic 2 receiver mounted on the camera), sync is automatic. Audio recorded on a separate device, including the DJI Mic 2's internal backup recording, needs a sync reference: a traditional clapperboard, or the camera's own scratch audio to match waveforms against. In DaVinci Resolve, waveform sync (Auto Sync Audio → Based on Waveform in the Media Pool, or Auto Align Clips → Based on Waveform for clips already in a timeline) lines the clips up automatically. The DJI Mic 2 can also store a backup recording internally on the transmitter; always recover it after every critical shoot.
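Waveform sync works by sliding one recording against another until they line up best. A brute-force sketch of that idea (cross-correlation) follows. Resolve's own implementation is not documented here, and practical tools use FFT-based correlation because this nested loop is far too slow for a full take at 48kHz; the function name and test signal are illustrative.

```python
from typing import Sequence


def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) at which `other` best matches `reference`.

    For each candidate lag, multiply the overlapping samples and sum them;
    the lag with the largest sum wins. A positive result means `other`
    starts late by that many samples and should be slid earlier.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    camera_scratch = [0.0, 1.0, -0.5, 0.3, 0.9, -0.7, 0.2, 0.0, -0.4, 0.6]
    backup = [0.0] * 5 + camera_scratch  # same sound, recorded 5 samples later
    lag = best_lag(camera_scratch, backup, max_lag=8)
    print(lag, "samples =", 1000 * lag / 48000, "ms at 48 kHz")
```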
Drill 1
Acoustics diagnosis
Record 30 seconds of speech in 5 different spaces. Import to Fairlight. For each space write: approximate reverb tail length, is there flutter echo, is there HVAC noise, are there electrical hums, and a cleanliness rating 1–10. Listen only on ATH-M40X. This trains your ear to diagnose audio problems rather than simply experiencing them as 'bad sound.'
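For the 'are there electrical hums' question, the level of a single frequency can be measured with the Goertzel algorithm. This sketch assumes a room-tone recording decoded to float samples; because Australian mains runs at 50Hz, it checks 50Hz and its first harmonics. The function names are illustrative, and a high reading is a prompt to listen again rather than a verdict.

```python
import math
from typing import Dict, Sequence


def tone_amplitude(samples: Sequence[float], freq_hz: float, sample_rate: int) -> float:
    """Estimate the amplitude of one sinusoidal component (Goertzel algorithm)."""
    if not samples:
        raise ValueError("no samples")
    w = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the DFT at freq_hz, then scaled back to a sine amplitude.
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / len(samples)


def hum_report(samples: Sequence[float], sample_rate: int,
               mains_hz: float = 50.0, harmonics: int = 3) -> Dict[float, float]:
    """Level in dBFS of mains hum and its harmonics (50, 100, 150 Hz by default)."""
    report = {}
    for k in range(1, harmonics + 1):
        amp = tone_amplitude(samples, k * mains_hz, sample_rate)
        report[k * mains_hz] = 20.0 * math.log10(amp) if amp > 0 else float("-inf")
    return report
```

Analysing a second or more of room tone gives fine enough frequency resolution to separate 50Hz hum from general low-frequency rumble.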
Drill 2
HVAC on/off comparison
Record 60 seconds of dialogue in a room with HVAC running. Turn the HVAC off and record the same dialogue again. Import both to Fairlight. Listen on ATH-M40X — the difference will likely be obvious. Then use Fairlight's noise reduction (Fairlight FX → Noise Reduction) on the HVAC version and attempt to clean it. Note how much processing is required and whether any artefacts are introduced.
Drill 3
Wind noise solutions test
Go outdoors on a day with a light breeze. Record 30 seconds of speech: (a) NTG bare, (b) NTG with foam windshield, (c) NTG pointed away from the wind direction. Compare all three recordings on ATH-M40X. Note which solutions are adequate for this wind level and which fail.
Drill 4
Lapel hiding technique
Practice hiding the DJI Mic 2 transmitter on three different subjects wearing three different clothing types: a buttoned shirt, a T-shirt, and a jacket. For each: transmitter must not be visible from the front, cable must not create a visible lump, clothing must not rustle audibly during normal movement. Record each subject talking and moving. Review both visually and aurally.
Week 2 Assignment
"The clean room"
Find the worst-sounding space you have access to — bare walls, hard floors, small room. Record a 60-second interview under three conditions: (1) no treatment, (2) improvised treatment (hang blankets, use furniture to break up reflections), (3) post-processing in Fairlight using noise reduction and EQ. Deliver all three audio versions alongside a written account of what acoustic problems you identified and how each approach addressed them.
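Why do blankets help so much? Sabine's reverberation formula, RT60 ≈ 0.161·V / A (V the room volume in m³, A the total absorption: each surface's area multiplied by its absorption coefficient, summed), gives a rough estimate. The sketch applies it to a hypothetical 3m × 3m × 2.5m bare room. The absorption coefficients (0.05 for hard surfaces, 0.5 for heavy blankets) are assumed round numbers for illustration; real coefficients vary with material and frequency, and Sabine's formula is least accurate in small rooms.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Surface:
    name: str
    area_m2: float
    absorption: float  # 0.0 (fully reflective) to 1.0 (fully absorbent); assumed value


def sabine_rt60(volume_m3: float, surfaces: Sequence[Surface]) -> float:
    """Sabine estimate of the time (s) for reverb to decay by 60 dB."""
    total_absorption = sum(s.area_m2 * s.absorption for s in surfaces)
    if total_absorption <= 0:
        raise ValueError("room needs some absorption")
    return 0.161 * volume_m3 / total_absorption


if __name__ == "__main__":
    volume = 3.0 * 3.0 * 2.5                       # 22.5 m^3
    hard_area = 2 * (3.0 * 3.0) + 4 * (3.0 * 2.5)  # floor + ceiling + four walls = 48 m^2
    bare = [Surface("hard surfaces", hard_area, 0.05)]
    treated = [Surface("hard surfaces", hard_area - 10.0, 0.05),
               Surface("blankets over 10 m^2 of wall", 10.0, 0.5)]
    print(f"bare:    {sabine_rt60(volume, bare):.2f} s")
    print(f"treated: {sabine_rt60(volume, treated):.2f} s")
```

With these assumed values the bare room comes out at roughly 1.5s and the blanketed room at roughly 0.5s, the same direction of change you should hear between conditions (1) and (2).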
- Three versions are clearly distinct in quality
- Written account correctly identifies the acoustic problems in the untreated version
- Improvised treatment genuinely improves the recording
- Post-processing does not introduce obvious artefacts
Rode VideoMic NTG · DJI Mic 2 · ATH-M40X headphones · DaVinci Resolve Fairlight
Focus: audio post-production in DaVinci Resolve Fairlight — dialogue editing, EQ, compression, and the basic mix.
- The five layers of a professional mix. (1) Dialogue — primary human speech, always the highest priority and clearest element. (2) Ambience/room tone — the environmental background sound that gives the world continuity. Without ambience, spaces feel dead and artificial. (3) Sound effects (SFX) — specific sounds tied to visual events. (4) Foley — custom-recorded sounds that replace or enhance real sounds (footsteps, clothing, object handling). (5) Music — score or licensed tracks driving emotion and pacing. All five interact, and managing their relative volume and frequency content is the art of mixing.
- Dialogue editing in Fairlight. Before any processing: remove breath sounds that interrupt flow, cut out filler words where appropriate, and trim clip edges cleanly. In Fairlight use the clip tool to trim and the selection tool to slip clips. Cross-fades between dialogue clips (3–8 frames) prevent clicks and pops at edit points. Room tone fills the gaps: a looped section of room tone on a separate track under the dialogue, set so it matches the background level already present in the dialogue clips, fills 'dead air' between clips and creates continuity.
- EQ for dialogue — frequency decisions. Dialogue EQ is subtractive before it is additive: remove problems before boosting. Typical decisions: a high-pass filter at 80–120Hz to remove low-frequency rumble and HVAC noise; cuts at any resonant room frequencies that make the voice sound boxy (often 200–500Hz); a slight presence boost (2–4kHz) to improve intelligibility if the voice sounds dull; and high-frequency 'air' (8–12kHz) to add openness if the recording is slightly muffled. Use your ATH-M40X to judge; they are accurate enough in the midrange for dialogue work.
- Compression for dialogue. A compressor reduces dynamic range by making loud parts quieter, which lets you raise the overall level so that both quiet and loud moments are clearly audible. Key settings for dialogue: attack 5–15ms (lets transients and consonants through first), release 50–150ms, ratio 3:1 to 6:1, and a threshold set so compression engages on louder passages rather than continuously. A gentle dialogue compressor should be nearly inaudible; audible 'breathing' or 'pumping' indicates too much compression.
- Music placement and the music-dialogue relationship. Music and dialogue compete for the same frequency range, primarily 200Hz–4kHz. When both run simultaneously, dialogue must always win. Reduce music level significantly under dialogue (often 10–15 dB below the music-only level). Use Fairlight's volume automation to duck the music when dialogue is present and bring it back up in pauses. The music should feel present throughout without competing: the viewer should feel it emotionally without being aware of the volume relationship.
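Fade lengths are set in frames, but the audio underneath is counted in samples. A minimal sketch converting a frame count to samples and applying an equal-power crossfade (cosine and sine gain curves whose squares always sum to 1). The curve choice is an assumption, not necessarily what Fairlight applies by default: equal-power suits joins between different material such as room tone, whereas for a cut inside one continuous take a linear fade avoids a level bump in the middle.

```python
import math
from typing import List, Sequence


def frames_to_samples(frames: int, fps: float, sample_rate: int) -> int:
    """Convert a fade length in video frames to audio samples."""
    return round(frames * sample_rate / fps)


def equal_power_crossfade(outgoing: Sequence[float], incoming: Sequence[float],
                          length: int) -> List[float]:
    """Overlap the last `length` samples of `outgoing` with the first `length` of `incoming`."""
    if length < 0 or length > min(len(outgoing), len(incoming)):
        raise ValueError("fade longer than one of the clips")
    start = len(outgoing) - length
    result = list(outgoing[:start])
    for i in range(length):
        t = (i + 0.5) / length                # position through the fade, 0..1
        fade_out = math.cos(t * math.pi / 2)  # 1 -> 0
        fade_in = math.sin(t * math.pi / 2)   # 0 -> 1
        result.append(outgoing[start + i] * fade_out + incoming[i] * fade_in)
    result.extend(incoming[length:])
    return result


if __name__ == "__main__":
    # A 5-frame fade in a 25 fps project at 48 kHz (frame rate assumed).
    n = frames_to_samples(5, 25, 48000)
    print(n, "samples =", 1000 * n / 48000, "ms")
```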
Drill 1
Dialogue clean-up session
Take any 3-minute dialogue recording from a previous module. In Fairlight: trim all clips to remove breaths and fillers, add 5-frame cross-fades at all edit points, place room tone on a separate track under the dialogue, and apply EQ: high-pass filter at 100Hz plus any problem frequency cuts. Export before and after versions as audio files and compare on ATH-M40X. The improvement should be clearly audible.
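The drill's 100Hz high-pass filter can be sketched as a standard second-order (12dB/octave) biquad using the formulas from Robert Bristow-Johnson's Audio EQ Cookbook. This illustrates the technique only; it is not Fairlight's EQ implementation, and Q = 0.707 (a Butterworth response) is an assumed setting.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float, float, float, float]  # b0, b1, b2, a1, a2 (a0 normalised to 1)


def highpass_coeffs(cutoff_hz: float, sample_rate: int, q: float = 0.7071) -> Coeffs:
    """RBJ cookbook high-pass biquad, normalised so a0 == 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2


def apply_biquad(samples: Sequence[float], c: Coeffs) -> List[float]:
    """Direct form I: y[n] = b0x[n] + b1x[n-1] + b2x[n-2] - a1y[n-1] - a2y[n-2]."""
    b0, b1, b2, a1, a2 = c
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def response_db(c: Coeffs, freq_hz: float, sample_rate: int) -> float:
    """Filter gain (dB) at one frequency, from the transfer function H(z)."""
    b0, b1, b2, a1, a2 = c
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    hpf = highpass_coeffs(100.0, 48000)
    for f in (25, 50, 100, 200, 1000):
        print(f"{f:5d} Hz: {response_db(hpf, f, 48000):6.1f} dB")
```

The printout shows the shape of the cut: about 3dB down at the 100Hz corner, much deeper two octaves below it, and essentially untouched through the voice's midrange.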
Drill 2
Compression before/after
Take an uncompressed dialogue recording with significant level differences between quiet and loud moments. Apply Fairlight's built-in compressor (insert on the dialogue track): ratio 4:1, attack 10ms, release 100ms, threshold set to engage on louder moments. Export both compressed and uncompressed versions. On ATH-M40X, the compressed version should have more consistent level throughout with no audible pumping.
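The drill's settings can be made concrete with a minimal feed-forward compressor sketch: a hard-knee static curve decides how much to turn down, and attack/release smoothing of that gain reduction decides how fast. This is a simplified textbook design with a per-sample peak detector; the −20dBFS threshold is an assumed example value, and Fairlight's compressor may define attack and release times differently.

```python
import math
from typing import List, Sequence


def gain_reduction_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Hard-knee static curve: dB of gain reduction for a given input level."""
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over - over / ratio  # output rises only 1 dB per `ratio` dB over threshold


def smoothing_coeff(time_ms: float, sample_rate: int) -> float:
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))


def compress(samples: Sequence[float], sample_rate: int, threshold_db: float = -20.0,
             ratio: float = 4.0, attack_ms: float = 10.0,
             release_ms: float = 100.0) -> List[float]:
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    reduction = 0.0  # smoothed gain reduction in dB
    out = []
    for x in samples:
        level = 20.0 * math.log10(abs(x)) if x != 0 else -120.0  # per-sample peak level
        target = gain_reduction_db(level, threshold_db, ratio)
        # Use the attack time while reduction is increasing, release while it recovers.
        coeff = attack if target > reduction else release
        reduction = coeff * reduction + (1.0 - coeff) * target
        out.append(x * 10.0 ** (-reduction / 20.0))
    return out


if __name__ == "__main__":
    # A passage peaking at -8 dBFS is 12 dB over a -20 dBFS threshold; at 4:1
    # it comes out 3 dB over, so the compressor applies 12 - 3 dB of reduction.
    print(gain_reduction_db(-8.0, -20.0, 4.0), "dB of reduction")
```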
Drill 3
Music and dialogue balance
Take a 2-minute clip of dialogue and add a music track underneath it. Set the music at a level where it feels emotionally present but dialogue remains completely dominant. Have someone else watch without context — does the dialogue feel clear? Does the music feel present? Adjust until both answers are yes. Note the approximate dB difference between your dialogue peak level and your music level.
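Ducking automation is a gain curve drawn over time. As a sketch of that shape, this hypothetical helper builds per-sample music gains that dip by a chosen number of dB under each dialogue region, with short ramps that start slightly before the dialogue so the music is already down when the first word lands. The −12dB depth and 4800-sample (100ms at 48kHz) ramp are assumed values; −12dB sits inside the 10–15dB range given earlier.

```python
from typing import List, Sequence, Tuple


def db_to_gain(db: float) -> float:
    """Convert decibels to a linear amplitude multiplier (-12 dB is about 0.25)."""
    return 10.0 ** (db / 20.0)


def duck_envelope(n_samples: int, dialogue: Sequence[Tuple[int, int]],
                  duck_db: float = -12.0, ramp: int = 4800) -> List[float]:
    """Per-sample linear gain for the music track.

    `dialogue` holds (start, end) sample indices. The ramp down finishes at
    `start` and the ramp back up begins at `end`, both `ramp` samples long.
    Ramps are linear in dB.
    """
    if ramp <= 0:
        raise ValueError("ramp must be at least one sample")
    gains = []
    for i in range(n_samples):
        depth = 0.0  # 0 = full music level, 1 = fully ducked
        for start, end in dialogue:
            if start <= i < end:
                d = 1.0
            elif start - ramp <= i < start:
                d = (i - (start - ramp)) / ramp
            elif end <= i < end + ramp:
                d = 1.0 - (i - end) / ramp
            else:
                d = 0.0
            depth = max(depth, d)
        gains.append(db_to_gain(duck_db * depth))
    return gains


if __name__ == "__main__":
    env = duck_envelope(100, [(40, 60)], duck_db=-12.0, ramp=10)
    print([round(g, 3) for g in env[28:72:4]])
```

For the drill's note, remember the conversion: a music level 12dB under the dialogue means roughly a quarter of the amplitude.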
Drill 4
Full audio layering exercise
Take one minute of footage from any previous module. Build a complete 5-layer audio timeline: dialogue (edited and EQ'd), room tone (looped under dialogue), one SFX event, one Foley element (footsteps, clothing, or object handling), and music (ducked under dialogue). Mix all five to sit together naturally: dialogue clear, music felt, effects appropriate in level. Export and listen on ATH-M40X.
Week 3 Assignment
"The complete audio mix"
Take the interview footage from Module 3 Week 4 (or any 3-minute interview you have) and produce a complete audio post-production pass: dialogue editing, EQ, compression, room tone fill, music bed, and at least two spot SFX. Deliver the finished mix as a stereo audio file and a screenshot of your Fairlight timeline showing all layers clearly labelled.
- Dialogue is edited — no breaths between sentences, cross-fades at all cuts
- Room tone fills all gaps — no dead air between clips
- EQ reduces low-frequency rumble and any boxiness
- Compression creates consistent dialogue level throughout
- Music is clearly present but never drowns the dialogue
Rode VideoMic NTG · DJI Mic 2 · ATH-M40X headphones · DaVinci Resolve Fairlight
Focus: loudness standards, stereo field, and delivering broadcast-quality audio from a single-camera production.
- Loudness standards — LUFS and why they matter. LUFS (Loudness Units relative to Full Scale) is the modern measurement of perceived loudness over time. YouTube normalises to −14 LUFS integrated, and Spotify's default normalisation target is also −14 LUFS. European broadcast (EBU R128) targets −23 LUFS; Australian free-to-air television (Free TV OP-59) targets −24 LKFS, which is the same unit under another name. If you deliver louder than the platform standard, the platform turns it down and your careful mix is overridden. If you deliver quieter, some platforms (Spotify, for example) turn it up, raising the noise floor with it, while YouTube generally leaves quiet uploads quieter than their neighbours. In Fairlight, use the Loudness Meter (View → Meters → Loudness) to check your integrated LUFS before export. Target for online delivery: −14 LUFS integrated, True Peak not exceeding −1 dBTP.
- The stereo field and panning decisions. A stereo mix positions sounds across a left-right field from −100% (hard left) to +100% (hard right). For video production, dialogue sits centred (0). Background ambience can be spread slightly left and right to create a sense of space. Music is typically mixed in stereo. Avoid panning important sounds (dialogue, prominent SFX) hard to one side; it is distracting, and especially unpleasant on headphones. Subtle stereo placement (±15–30%) for non-dialogue elements adds depth without distraction.
- True peak limiting — preventing digital clipping on delivery. Sample values alone can hide inter-sample peaks: when the audio is converted back to analogue (or decoded from a lossy format), the reconstructed waveform can swing above the highest sample and exceed the digital ceiling, causing distortion. True peak (TP) meters estimate this reconstructed peak by oversampling. Platform standards commonly require TP not to exceed −1 dBTP. In Fairlight, add a Limiter as the last plugin on the master bus with its ceiling at −1 dB, then confirm the true-peak reading on the Loudness Meter. This is a transparent safety net: if your mix is at a reasonable level, it should rarely trigger.
- Monitoring your mix — checking on multiple systems. Your ATH-M40X are an excellent monitoring tool, but no single system tells the full story. A mix that sounds good on closed-back headphones may have too much bass on speakers. Professional practice: mix on ATH-M40X, then check on built-in Mac speakers (reveals midrange balance), then on earbuds (reveals bass balance), then on a Bluetooth speaker (reveals the mix at normal casual listening volume). Adjust until the mix is acceptable on all four systems.
- The stems delivery — dialogue and music separately. For professional deliveries, clients often request M&E (music and effects) and dialogue on separate tracks, allowing localisation or future re-edits. In Resolve, use bus outputs to create separate stems. Export the dialogue, music, and effects stems separately, then verify that the three combined reproduce the original mix exactly. This confirms your routing was correct and provides a professional-grade delivery.
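The check that the three stems reproduce the mix is usually done as a null test: add the stems together, subtract the full mix, and see how much is left. A minimal sketch, assuming all files are decoded to float samples of the same length and alignment; the function name is illustrative. Expect a residual near the export's quantisation floor rather than pure digital silence, and note that master-bus processing that actually acted on the mix (such as a limiter catching peaks) is non-linear, so stems that pass through it separately will not null perfectly.

```python
import math
from typing import Sequence


def null_residual_dbfs(mix: Sequence[float], stems: Sequence[Sequence[float]]) -> float:
    """Peak level (dBFS) of mix minus the sum of stems; very low means they match."""
    if any(len(stem) != len(mix) for stem in stems):
        raise ValueError("stems must be the same length as the mix")
    residual_peak = 0.0
    for i, m in enumerate(mix):
        residual = m - sum(stem[i] for stem in stems)
        residual_peak = max(residual_peak, abs(residual))
    return 20.0 * math.log10(residual_peak) if residual_peak > 0 else float("-inf")


if __name__ == "__main__":
    dialogue = [0.20, -0.10, 0.05]
    music = [0.10, 0.10, -0.05]
    effects = [0.00, 0.05, 0.02]
    mix = [d + m + e for d, m, e in zip(dialogue, music, effects)]
    print(null_residual_dbfs(mix, [dialogue, music, effects]))
```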
Drill 1
LUFS measurement and correction
Take your Week 3 mix. Open the Fairlight Loudness Meter. Play the entire mix and note the integrated LUFS reading. Adjust the master fader until it reads −14 LUFS. Check True Peak — if any TP exceeds −1 dBTP, add a Limiter on the master bus. Export and upload a sample to YouTube. Compare how the platform handles your delivery versus a previous uncorrected export.
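The arithmetic behind this drill is simple because a plain gain change moves integrated loudness and true peak by the same number of dB, to a very good approximation. This hypothetical helper computes the master-fader change for a target and predicts whether the −1dBTP ceiling will then be exceeded; the measured values still have to come from the Fairlight Loudness Meter.

```python
from typing import NamedTuple


class LoudnessPlan(NamedTuple):
    gain_db: float              # master fader change to apply
    predicted_true_peak: float  # dBTP after that change
    needs_limiter: bool         # True if the prediction is over the ceiling


def plan_loudness(measured_lufs: float, measured_true_peak_dbtp: float,
                  target_lufs: float = -14.0, ceiling_dbtp: float = -1.0) -> LoudnessPlan:
    """Gain needed to hit a loudness target, and whether true peak then breaks the ceiling."""
    gain = target_lufs - measured_lufs
    predicted_tp = measured_true_peak_dbtp + gain
    return LoudnessPlan(gain, predicted_tp, predicted_tp > ceiling_dbtp)


if __name__ == "__main__":
    # Hypothetical readings from a first measurement pass.
    plan = plan_loudness(measured_lufs=-19.5, measured_true_peak_dbtp=-3.2)
    print(plan)
```

With those example readings the mix needs +5.5dB, which would push true peak to about +2.3dBTP, so a limiter is required. Limiting trims a little loudness, so measure the integrated reading again after adding it.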
Drill 2
Multi-system monitoring check
Take a finished mix and monitor it through four systems: ATH-M40X, Mac built-in speakers, earbuds, and a Bluetooth speaker. After each system, write two notes: what sounds good, and what sounds different or problematic compared to the ATH-M40X. This trains your awareness of how your primary monitoring compares to casual playback environments.
Drill 3
Stereo field exercise
Take a 2-minute scene with dialogue, room tone, and music. Experiment with panning: dialogue at centre, room tone at ±30%, music according to its natural stereo field. Listen on ATH-M40X. Then try everything (except dialogue) hard left/right. The first version should feel natural and spatial; the second should feel exaggerated and distracting.
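Pan positions translate into left and right gains through a pan law. This sketch uses a constant-power (sine/cosine) law, which keeps perceived loudness steady as a sound moves across the field and puts each channel about 3dB down at centre. Pan laws differ between applications, so treat the −3dB centre as an assumption rather than Fairlight's setting.

```python
import math
from typing import Tuple


def constant_power_pan(position_percent: float) -> Tuple[float, float]:
    """Left/right gains for a pan position from -100 (hard left) to +100 (hard right)."""
    if not -100.0 <= position_percent <= 100.0:
        raise ValueError("pan position must be between -100 and +100")
    angle = (position_percent / 100.0 + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for pos in (-100, -30, 0, 30, 100):
        left, right = constant_power_pan(pos)
        print(f"{pos:+4d}%: L {left:.3f}  R {right:.3f}")
```

The printout shows why ±30% reads as a gentle placement: both channels still carry substantial signal, whereas ±100% silences one side entirely.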
Drill 4
Full pipeline drill
Shoot a 2-minute interview from scratch. Record with both NTG and DJI Mic 2 simultaneously. Ingest audio into Fairlight, edit dialogue, add room tone, EQ, compress, add music, set levels to −14 LUFS / −1 dBTP, and export. Time the complete audio post-production process — for a 2-minute interview, professional audio post should take 45–90 minutes. This is your baseline.
Week 4 Assignment
"Broadcast-ready audio"
Produce a 3-minute interview from shoot to finished mix. Deliver: the video with mixed audio at −14 LUFS / −1 dBTP, a Fairlight screenshot showing all tracks and the loudness meter reading, and a written note explaining every EQ and dynamics decision you made for the dialogue track.
- Integrated loudness measures −14 LUFS (±1 LU) on the Fairlight meter
- True peak does not exceed −1 dBTP
- Dialogue is clear and consistent throughout
- Music is present but dialogue remains dominant at all times
- Written notes demonstrate understanding of EQ and compression decisions
Rode VideoMic NTG · DJI Mic 2 · ATH-M40X headphones · DaVinci Resolve Fairlight
Setting gain too low and boosting in post
Underexposing audio and then amplifying it in post amplifies the camera's noise floor equally with the signal. The result: hiss and electronic noise underneath dialogue that is expensive to remove and never fully clean.
Fix: Set camera gain so dialogue peaks consistently between −12 and −6 dBFS on the in-camera meter. Monitor on set with your ATH-M40X whenever possible. A correctly gained recording needs minimal post amplification.
Not turning off HVAC before recording dialogue
Your brain tunes out HVAC noise on set, but the microphone does not. The result is dialogue recorded over a constant low-frequency rumble that is difficult to remove cleanly without introducing artefacts.
Fix: Before every dialogue recording, turn off HVAC systems, close windows, turn off refrigerators in nearby rooms. Record 10 seconds of room tone in this quieter environment. Your ATH-M40X will confirm whether the space is acceptably quiet before you roll on the subject.
Not recording room tone on location
Forgetting to record room tone means every edit point in the dialogue has a dead, unnatural gap between clips. This is instantly noticeable and cannot be adequately faked — the room tone must be recorded in the actual space with the actual microphone setup of the session.
Fix: At the end of every recording session, before striking the audio setup, record 60 seconds of room tone — everyone quiet, no movement. Label it 'RT' in your session. This takes 90 seconds and saves significant post-production time.
Clothing rustle destroying the lapel recording
The DJI Mic 2 transmitter picks up clothing movement extremely readily. A low-frequency rumbling thud from fabric movement is almost impossible to EQ out without also destroying the dialogue.
Fix: Tape the transmitter body securely to the subject's skin or inner garment with medical tape. Run the cable loop through the clothing so it doesn't transmit movement. Always do a movement test — walk, turn, nod — and listen on ATH-M40X before rolling. Avoid synthetic fabrics entirely.
Always record both NTG and DJI Mic 2 simultaneously
On any important shoot, run both microphones at the same time. The NTG captures the room and ambience authentically; the DJI Mic 2 lapel captures the voice intimately. In post you can blend the two — using the NTG's room character with the lapel's clarity — or simply fall back to the lapel if the NTG picks up an unexpected noise. Two sources cost nothing extra once both are set up.
Rode VideoMic NTG · DJI Mic 2
Monitor with ATH-M40X on set — not the camera speaker
The built-in camera speaker produces audio of such poor quality that you cannot make meaningful judgements about gain level, background noise, or microphone quality from it. Your ATH-M40X connected to the camera headphone output gives you an accurate representation of what is being recorded. Develop the habit of monitoring audio through headphones on every dialogue shoot.
ATH-M40X · Sony a6700 · Sony FX30
The DJI Mic 2 internal recording as insurance
The DJI Mic 2 transmitter can record internally as a backup alongside the wireless feed. If the wireless signal drops or the receiver loses connection, the internal recording is unaffected, so make sure internal recording is switched on before you roll. After every critical shoot, recover the internal recordings from the transmitter via USB or the DJI Mic app. Label them clearly and keep them until the edit is picture-locked.
DJI Mic 2
Kit for this module
Rode VideoMic NTG
DJI Mic 2 + lapel
ATH-M40X headphones
Sony a6700 / FX30
DaVinci Resolve Fairlight
Quick reference
Recording level targets
Average: −18 to −12 dBFS
Peak max: −6 dBFS
Never clip: 0 dBFS
Delivery loudness
YouTube: −14 LUFS
Broadcast (AU): −24 LUFS (Free TV OP-59)
True peak max: −1 dBTP
NTG boom placement
30–40cm above and in front of subject. Aim directly at the mouth. Stay outside the frame. Maintain consistent distance throughout the scene.