Music terminology for videographers

When working with directors and videographers, one of the most common phrases I hear is “I know what I mean, but I don’t know how to say it.” Communication is key on creative projects, and getting the composer and director on the same wavelength is vital. Even if you’re not working with a composer directly, music licensing sites like Killer Tracks use the same sort of terminology for their keywords, so knowing how to articulate what you want will help there too.

So, here’s a list of common music terminology to describe the way music sounds that will help with this communication. I’ll also write what is actually going on with the music so it makes more sense. You’ll notice that a lot of the descriptors are the same ones you’d use for visual art forms, so realize that just about any visual descriptor can translate to an auditory one.

This post is broken into sections:

  • The basics (you probably already know many of these, but just in case you don’t, here they are)
  • Desktop music production terms
  • Speed and rhythm
  • Volume
  • Character of the sound
  • Instrumentation
  • Style/Genre
  • General Film Scoring Tips

– The Basics –

Unison – When all parts of the music are playing the same melody.

Harmony – An instrument or other part of the music plays notes that complement the others rather than playing in unison. The term often implies a lack of dissonance.

Pitch – How high or low a note is.

Interval – The space between the pitches of two notes.

Octave – Physically speaking, this is a note that has a frequency double or half that of the note it’s being compared to (an octave up means double, an octave down means half). It will sound like the same note, just higher or lower.
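
If it helps to see the arithmetic, here is a minimal Python sketch of the octave relationship. It assumes the usual convention that each octave up doubles the frequency and each octave down halves it; the function name and the 440 Hz reference pitch are just illustrative choices.

```python
def shift_octaves(frequency_hz, octaves):
    """Move a frequency up (positive) or down (negative) by whole octaves."""
    # Each octave up doubles the frequency; each octave down halves it.
    return frequency_hz * 2 ** octaves


a4 = 440.0  # the A above middle C in standard concert tuning
print(shift_octaves(a4, 1))   # one octave up
print(shift_octaves(a4, -1))  # one octave down
```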

Dissonance – Arguably the opposite of harmony, this is when two or more notes clash with each other. This can be used to create conflict or agitation, or to add character to a more complex chord. Dissonance isn’t necessarily bad; it’s a tool in the composer’s toolbox.

Key – This is difficult to describe. A given melody or song generally uses a set of notes, not all of them. However, it can change keys mid-song to add drama, exchanging the original set of notes for another (usually higher). Melodies will sound the same, but start on another note.

Scale – All the notes within a key, usually played in sequence, usually numbered 1-7 with the 8th note an octave above the root (1). These numbers, in conjunction with the type of scale used, can be used to define intervals and chords.

Minor key – This means that the song uses a set of notes that lend themselves to a darker tonality, often used in sad, angry, or moody music. Chords can be minor as well, but don’t necessarily mean the entire key is minor. There are a few different types of minor keys, but that’s not important to a non-musician.

Major key – Similar to the minor key in that it defines the set of notes used. However, this lends itself to brighter, happier music, and is the most common type of key used in popular music.

Other types of keys – Major and minor keys are not the only ones out there. However, the others are not used nearly as much. One good example is the song “Scarborough Fair.” At first listen, one might believe it’s in a minor key, but it’s actually in what’s called “dorian.” Dorian, along with others like mixolydian, pentatonic, etc., is commonly associated with timeless, folk, or even tribal music, due to the prevalence of these keys in those types of music. Pentatonic in particular is strongly associated with traditional East Asian music.
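
For the curious, the different kinds of keys described above can be sketched as step patterns. This is a simplified Python illustration, not a full music theory library: it spells every note with sharps only, uses the natural minor for “minor,” and the names `SCALE_STEPS` and `build_scale` are made up for this example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Distance between neighboring scale notes, in semitones (piano keys).
SCALE_STEPS = {
    "major": [2, 2, 1, 2, 2, 2, 1],
    "minor": [2, 1, 2, 2, 1, 2, 2],        # natural minor only
    "dorian": [2, 1, 2, 2, 2, 1, 2],
    "major pentatonic": [2, 2, 3, 2, 3],   # five notes instead of seven
}


def build_scale(root, kind):
    """Return the note names of a scale, ending an octave above the root."""
    index = NOTE_NAMES.index(root)
    notes = [root]
    for step in SCALE_STEPS[kind]:
        index = (index + step) % 12  # wrap around after B
        notes.append(NOTE_NAMES[index])
    return notes


print(build_scale("C", "major"))
print(build_scale("A", "minor"))
print(build_scale("D", "dorian"))
```

Note that C major, A minor, and D dorian all come out as the same seven white-key notes; what changes is which note feels like “home.”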

Chord – A grouping of notes, usually three or more, that creates varying degrees of harmony or dissonance, depending on the notes selected. Chords are often described using the “root” of the chord, as if it were built off of a key, and then adding qualifiers like major or minor, and added or suspended notes. Numbers refer to position on the scale within the key.

Resolve – This generally refers to when a dissonant chord shifts to remove the dissonance, or a minor chord or key shifts to a major chord or key. In both cases, the root of the chord or key stays the same, and the shift is usually only a note or two. This creates a feeling of relief or hope, in contrast to the chord before it.

Suspended (Sus) – This is a modified chord where one note (the middle note of a basic three-note chord) is temporarily replaced by its neighbor on the scale, most often the note one degree higher. These chords almost always resolve into their unsuspended versions. Despite the dissonance, these chords can sound “bright” due to additional harmonies created.

Chord progression – A series of chords, often repeated as a sequence. This drives the music forward and gives it character. Too much repetition in the chord progression can make music sound repetitive. When talking about a chord progression, it’s common to refer to chords by roman numerals referring to where the root falls on the scale rather than the actual note itself. This allows easier transcription as well as easier understanding of how the progression works.
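
As a rough illustration of the roman numeral idea, the Python sketch below builds a three-note chord on each requested degree of a major scale. It is an assumption-laden simplification: major keys only, sharps-only spelling, and uppercase numerals for every degree (the helper works out whether the chord is major, minor, or diminished). The names `major_scale` and `triad` are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7}


def major_scale(root):
    """Return the seven notes of a major scale as numbers 0-11."""
    index = NOTE_NAMES.index(root)
    pitches = [index]
    for step in MAJOR_STEPS[:-1]:
        index = (index + step) % 12
        pitches.append(index)
    return pitches


def triad(key_root, numeral):
    """Name the three-note chord built on a scale degree of a major key."""
    scale = major_scale(key_root)
    degree = ROMAN[numeral.upper()] - 1
    # Stack every other scale note: root, third, fifth.
    notes = [scale[(degree + offset) % 7] for offset in (0, 2, 4)]
    third = (notes[1] - notes[0]) % 12
    fifth = (notes[2] - notes[0]) % 12
    if (third, fifth) == (4, 7):
        quality = ""      # major
    elif (third, fifth) == (3, 7):
        quality = "m"     # minor
    else:
        quality = "dim"   # diminished
    return NOTE_NAMES[notes[0]] + quality


# The same progression written once, then played in two different keys.
progression = ["I", "V", "VI", "IV"]
print([triad("C", numeral) for numeral in progression])
print([triad("G", numeral) for numeral in progression])
```

This is why numerals are handy: the progression is written once and works in any key.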

Melody/Theme/Motif/Groove – All of these roughly translate to “a musical idea.” Melodies and themes are generally more flowing and memorable (a catchy tune). Motifs are less distinct or shorter and grooves are more rhythm-based.

Minimalist – In a minimalist style, the same musical idea is repeated over and over with only minor changes and variations to style, instrumentation, and timbre each time. This is a common compositional style for soundtracks because it doesn’t force the listener to pay a lot of attention, which allows it to recede to the background, but still keeps it from being monotonous.

Form – The form of a piece of music is how different musical ideas come together. For instance, the traditional pop song form is usually ABABCBB. In this case, the “A” parts are the verses, “B” denotes the choruses, and “C” is the bridge (defined below). Music can follow different forms, and can have variations on themes to add additional spice.

Chorus – The main theme of a song following a traditional pop song form. It’s a repeated melody/theme throughout the song and usually repeated at the end.

Verse – Placed around the choruses, this has the bulk of the “story” of a pop song. It’s less boisterous than the chorus and generally gives the listener a little bit of a break sonically. The verses usually have slightly different chords than the chorus. Where the chorus will often have the I, IV, and V chords, the verses will have the intermediary chords like II and VI.

Bridge – This “C” part of the music is fundamentally different from the rest of the song, presenting new information, new chords, and usually building into the final chorus, verse, or refrain. The new chords in the bridge can also be used to set up a key change in the final chorus to add drama.

Refrain – This is usually used in lieu of a chorus, consisting of a single, short musical idea and lyrics at the end of each verse.

Music editor versus composer – In small productions, the role of the music editor is usually merged with either the director or the composer (or shared between both). In larger productions, these are two separate jobs. The music editor scrubs through the script and film and marks where the music cues are, what type of music is needed where, and any other requirements needed from the composer. The music editor also selects where existing music needs to be licensed, and procures the necessary licenses to use those songs (usually current, popular music). The composer then takes that and creates the necessary music to fit the music editor’s notes. I added this just so people understand that the two jobs do not have to be done by the same person.

– Desktop Music Production Terms –

DAW – Digital Audio Workstation. This describes a computer with audio editing software like Pro Tools, Logic, or Ableton Live.

MIDI – This is a standard way for keyboards, DAWs, and other digital instruments to communicate note information. MIDI signals can say when to start playing a note (note on), when to stop (note off), how loud to play it (velocity), to gradually make a note louder or softer (expression), and a variety of other things. Once this MIDI signal is fed into a synthesizer, the synthesizer produces a standard audio signal.
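
To make that a little more concrete, here is a small Python sketch of what those messages look like as raw bytes. The status values (0x90 for note on, 0x80 for note off, 0xB0 for a control change, with controller 11 as expression) follow the MIDI 1.0 convention; the function names and the default channel are assumptions made for this example, and nothing here actually sends data to a device.

```python
def _check(value):
    """MIDI data values must fit in the range 0-127."""
    if not 0 <= value <= 127:
        raise ValueError("MIDI data values must be between 0 and 127")
    return value


def note_on(note, velocity, channel=0):
    """Start playing a note; velocity says how hard it was struck."""
    return bytes([0x90 | channel, _check(note), _check(velocity)])


def note_off(note, channel=0):
    """Stop playing a note."""
    return bytes([0x80 | channel, _check(note), 0])


def expression(value, channel=0):
    """Control change 11: swell or fade a note that is already sounding."""
    return bytes([0xB0 | channel, 11, _check(value)])


MIDDLE_C = 60
messages = [note_on(MIDDLE_C, 100), expression(64), note_off(MIDDLE_C)]
for message in messages:
    print(message.hex(" "))
```

Notice that no audio is involved at any point. The messages are only instructions, which is why the same performance can be replayed through a completely different instrument.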

Track – This is a term video editors use too. A track can be a software instrument/synthesizer, recorded audio, or a bus (see below).

Patch – Usually refers to settings on a synthesizer. This term is kind of interchangeable with “instrument,” since it changes the sound used by the synthesizer, especially sample-based ones.

Sequencer – This is a standard feature on most DAWs and some hardware items. This stores MIDI data so it can be played back into a synthesizer. This makes editing and creating music much easier. Sequencers are also commonly used to create placeholder music until a live recording of a musician playing that instrument can be substituted.

Sample-based synthesizer – Sometimes referred to simply as a “sampler” or “sound library,” this is a synthesizer that takes the MIDI note information and plays back recordings of real instruments or other sampled audio at the desired pitch. Lower-end samplers simply change the playback speed of the sample to get different pitches while higher-end ones have separate recorded samples for each individual note. Depending on the quality of the patch and the skill of the composer, these can sound very lifelike.

Articulations – A feature of high-end samplers, this loads multiple sets of sounds for the same instrument, usually for more expressive ones like woodwinds, brass, and strings. These patches will often have programmed “key switches” where a note outside of the instrument’s range can tell the sampler to change to another articulation on-the-fly. For instance, the first note of a melody could have a sharp attack, and then a key switch could be used to change to a more flowing articulation for the rest of the melody. This effectively allows digital instruments to have a more authentic sound.

ADSR – Attack, decay, sustain, release. Also called an envelope, this is how a synthesizer regulates volume and other parameters over time. Specifically referencing volume, attack is the rate at which the synthesizer increases in volume after the note-on signal is received. Once it’s reached full volume, decay defines the rate at which it will go down in volume until it hits the sustain level. Sustain is the volume where it will stay until the note-off signal is received, and release is how fast the volume will drop once the note-off signal is received. There are many creative applications for ADSR, and some synthesizers allow for even more complex envelopes.
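
Here is a minimal Python sketch of a volume envelope, under a few stated assumptions: attack, decay, and release are given as durations in seconds rather than rates, every segment is a straight line, and volume runs from 0.0 (silent) to 1.0 (full). Real synthesizers often use curved segments, and the function name and default values are invented for illustration.

```python
def adsr_level(t, note_off, attack=0.1, decay=0.2, sustain=0.6, release=0.5):
    """Return the envelope volume (0.0-1.0) at time t seconds after note-on.

    note_off is the time (in seconds after note-on) the key is released.
    """
    def held(time):
        # Level while the key is still held down.
        if time < attack:
            return time / attack                      # rising to full volume
        if time < attack + decay:
            progress = (time - attack) / decay
            return 1.0 - (1.0 - sustain) * progress   # falling to sustain level
        return sustain                                # holding steady

    if t < 0:
        return 0.0
    if t <= note_off:
        return held(t)
    # After note-off, fade from wherever the envelope was down to silence.
    elapsed = t - note_off
    if elapsed >= release:
        return 0.0
    return held(note_off) * (1.0 - elapsed / release)


# A one-second note sampled at a few moments.
for moment in (0.05, 0.1, 0.5, 1.25, 2.0):
    print(moment, round(adsr_level(moment, note_off=1.0), 3))
```

A short attack and release give a plucked, percussive feel; stretching both out makes the same sound feel like a slow pad.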

Arpeggiator – This is a MIDI manipulation function present in some synthesizers that takes a chord and segments it into an arpeggio (see entry below under Speed and Rhythm). Arpeggiators speed up the composing process but at the expense of note-by-note fine tuning.

Signal flow – This is important when processing MIDI and audio. Signal flow refers to how an original signal, either MIDI or audio, goes from its raw form or input, through the various forms of processing, and is finally output as the finished mix. It’s usually easy to see this for video due to layers, tracks, applied effects, etc., but with a chaotic mix of MIDI, recorded audio, and sometimes live audio, the composer needs a good understanding of where signals are coming from and where they’re going. A MIDI signal might go through an arpeggiator, then through a synthesizer, be synthesized into audio, be split into an effects bus and the main mix, then be remixed into a final EQ and compressor before it gets played through speakers (most of these terms are explained below). If there’s a problem, the musician needs to know where to look.

Bus – This is an audio element that can be used for a variety of purposes, and is heavily used by skilled audio engineers. Basically, it’s a separate path through which audio can be diverted. Usually, each separate track has a bus output where some or all of a track’s output can be diverted to the bus, or even to more than one bus. Most commonly, it’s used to apply a filter en masse to a group of tracks, both for consistency and to reduce the load on the processor. One can apply reverb, delay, or a variety of other effects (see below for descriptions). It can also be used to create a sort of sub-mix before the audio goes to the main mix, either for enhancement or creative effect (suddenly making the music sound like it’s being played through a telephone).

Dry/Wet/Mix – This refers to an audio signal pre- and post-processing. A dry signal is the audio before it’s been run through a filter or plug-in. The wet signal is the audio after it’s gone through the filter or plug-in. Many filters have a “mix” setting, where you can adjust the ratio of dry-to-wet signal that leaves the filter.
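
A “mix” control like this is usually just a weighted blend of the two signals. The Python sketch below assumes audio is a plain list of sample values and that the mix setting runs from 0.0 (all dry) to 1.0 (all wet); actual plug-ins may scale things differently, and `mix_dry_wet` is a made-up name.

```python
def mix_dry_wet(dry, wet, mix):
    """Blend unprocessed (dry) and processed (wet) audio sample by sample."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 (dry) and 1.0 (wet)")
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]


dry_signal = [1.0, 0.0]
wet_signal = [0.0, 1.0]
print(mix_dry_wet(dry_signal, wet_signal, 0.25))  # mostly dry, a touch of effect
```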

Side-chain – This is a somewhat obscure element of signal flow where the audio or MIDI signal of one instrument is patched into the input of a filter or another instrument. This can be used for a variety of creative effects and automations. A common one is to side-chain bass drum output in techno music into the compressor (see below) on a pad or other long instrument to make the instrument pulse. The pulse of the drum will cause the compressor to attenuate the pad’s signal, and with a long release on the compressor, the volume will slowly come back up until the next bass hit. Side-chains are also used in vocoders, which impose the changing frequency character of one audio signal onto a synthesized note. When used with a vocal signal, it makes the synthesizer seem to sing words. Daft Punk used vocoders heavily.

The final mix – Often referred to as just “the mix,” this refers to the audio output from a DAW once all the instruments, filters, patches, plug-ins, etc. are mixed together.

Mastering – This refers to the arduous process of fine-tuning the final mix and perfecting the balance between tracks, as well as adding final touches like compression and overall EQ (see below). This is a skill in and of itself, and something that people get paid good money to do. To compare it to the video production process, this would be the final color grading. It both fixes unwanted issues and adds character.

Common audio filters:

EQ – Equalizer, this allows adjustments in volume to individual frequency ranges for an instrument or the whole song.

Delay – Basically, echo. This filter takes the audio, delays it, and then mixes it back in. Settings can change the amount of delay, the number of echoes, and the strength of the echoes.
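
A bare-bones delay can be sketched in a few lines of Python. This version assumes audio is a list of sample values, measures the delay in samples rather than milliseconds, and makes each echo quieter than the last by a fixed factor; all of the names and parameters are hypothetical.

```python
def apply_delay(samples, delay_samples, echoes, strength):
    """Add a number of progressively quieter echoes to a signal."""
    # Leave room at the end for the last echo to ring out.
    out = list(samples) + [0.0] * (delay_samples * echoes)
    for n in range(1, echoes + 1):
        gain = strength ** n          # each echo is quieter than the previous
        offset = delay_samples * n    # and arrives later
        for i, sample in enumerate(samples):
            out[offset + i] += sample * gain
    return out


# A single click followed by two echoes, each half as loud as the one before.
print(apply_delay([1.0], delay_samples=2, echoes=2, strength=0.5))
```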

Reverb – This creates a sense of space in music. This simulates how sound scatters and reflects in a room or other enclosed space. This is NOT echo, although it can have elements of it. More sophisticated reverb plug-ins can simulate specific real-life spaces.

Compression – This is a filter that reduces the loud areas of an audio signal by a given amount. It can help make the overall volume of a recording louder, or create effects in the music. A more severe version of this is a “Limiter,” which simply does not allow anything louder than a given value to play back above that value.
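
The difference between compressing and limiting is easier to see with numbers. The Python sketch below works on volume levels in decibels and assumes the two controls found on most compressors, a threshold (where the effect kicks in) and a ratio (how hard it squeezes). Neither control is defined above, so treat them, and the function names, as assumptions for illustration. Real compressors also react over time, which this ignores.

```python
def compress_db(level_db, threshold_db=-12.0, ratio=4.0):
    """Reduce whatever is above the threshold by the given ratio."""
    if level_db <= threshold_db:
        return level_db                   # quiet parts pass through untouched
    overshoot = level_db - threshold_db
    return threshold_db + overshoot / ratio


def limit_db(level_db, ceiling_db=-6.0):
    """Never let the level rise above the ceiling."""
    return min(level_db, ceiling_db)


for level in (-20.0, -6.0, 0.0):
    print(level, compress_db(level), limit_db(level))
```

Because the loud peaks are pulled down, the whole track can then be turned up without clipping, which is how compression ends up making a recording sound louder overall.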

– Speed and Rhythm –

Measure – Sometimes referred to as a “bar,” this is a main building block of music. It is a group of notes that defines a short musical idea. A melody or groove is usually a combination of 4-8 measures repeated, but there’s usually a feeling of segmentation where the measures start and end.

Time signature – This defines the number of beats in a measure. Think of this as the “main beat” or pulse of the music. Techno music usually puts bass hits on each of these main beats, and “(Don’t) Fear the Reaper” used cowbell for those beats. In other music, it’s less obvious, but musicians will feel the beat regardless (especially classically trained ones who practiced with metronomes). A time signature is usually written as a fraction, with the top number meaning the number of beats, and the bottom meaning the length of note being counted (the bottom number is really only important to the musician though). Time signatures with top numbers divisible by four tend to feel strong and driving. Time signatures with top numbers divisible by three tend to feel flowing. Odd time signatures like 7/8, 5/4, etc., tend to feel off-kilter or sinister. This is because our brains naturally want to group beats into groups of three or four, so the added (or subtracted) beat throws us off. Dave Brubeck’s “Take Five,” Gustav Holst’s “Mars,” and Saruman’s theme from Lord of the Rings are all examples of 5/4 time signatures.

Tempo – This tells you how fast the beats in the time signature are. It’s important to realize that this only refers to those main beats, NOT the ones that may come in-between.

Rhythm – This refers to the temporal placement of notes in a defined group.

“Straight” rhythm – A rhythm that stays with the main beat or time signature, either playing all or most of the beats. Also refers to a beat that is not swung (see below).

Swing rhythm – Every other note of the main beat is delayed slightly for feel. Swing is used in jazz, hip hop (sometimes), and other genres.
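
To show what “delayed slightly” means in practice, this Python sketch lists when each note in a run of evenly spaced notes would start, pushing every other one late. The `swing` amount (a fraction of the gap between notes) and the function name are assumptions for illustration; players and DAWs measure swing in various ways.

```python
def swing_times(note_count, step_seconds, swing=0.33):
    """Return start times for a run of notes with every other note delayed."""
    times = []
    for i in range(note_count):
        # Notes 2, 4, 6... (odd positions counting from zero) arrive late.
        delay = swing * step_seconds if i % 2 == 1 else 0.0
        times.append(i * step_seconds + delay)
    return times


print(swing_times(4, 0.25, swing=0.0))  # straight
print(swing_times(4, 0.25, swing=0.5))  # heavily swung
```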

Syncopation – When a rhythm deviates from the time signature or other established rhythm in a complementary way. This is particularly popular in Latin music.

Subdivision – This is when notes are played at a faster rate than the tempo, at even fractions of the beat so they stay in sync. Subdivision is most common in faster, higher-frequency instruments, like hi-hats on a drum set or violins in an orchestra. Adding subdivision can make music seem faster or more intense without changing tempo.
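
Since editors think in frames, it can be handy to convert tempo into time. This is a small Python sketch under the assumption that tempo is given in beats per minute (BPM); the frame rate and function names are only examples.

```python
def beat_seconds(bpm):
    """Length of one main beat in seconds."""
    return 60.0 / bpm


def subdivision_seconds(bpm, notes_per_beat):
    """Length of each note when the beat is split into equal parts."""
    return beat_seconds(bpm) / notes_per_beat


def frames_per_beat(bpm, fps):
    """How many video frames pass during one main beat."""
    return beat_seconds(bpm) * fps


tempo = 120
print(beat_seconds(tempo))            # seconds per beat
print(subdivision_seconds(tempo, 4))  # four notes squeezed into each beat
print(frames_per_beat(tempo, 24))     # frames per beat at 24 fps
```

The tempo never changes in this example, yet the subdivided notes arrive four times as often, which is why subdivision reads as extra intensity rather than extra speed.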

Arpeggio – While not exactly referring to speed or rhythm, it belongs with subdivision due to its usage. Arpeggios are when a chord is played in sequence rather than all at once. The arpeggio can rise, fall, or mix the notes up completely, but it always plays a repeating pattern of notes derived from a chord.
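
In code terms, an arpeggio is just a chord’s notes looped in some order. The Python sketch below is a toy version with three assumed patterns (“up,” “down,” and “up-down”); hardware and software arpeggiators typically offer many more, and the function name is invented here.

```python
from itertools import cycle, islice


def arpeggiate(chord, length, direction="up"):
    """Repeat a chord's notes one at a time until `length` notes are produced."""
    if direction == "up":
        pattern = list(chord)
    elif direction == "down":
        pattern = list(reversed(chord))
    elif direction == "up-down":
        # Go up, then back down without repeating the top or bottom note.
        pattern = list(chord) + list(reversed(chord))[1:-1]
    else:
        raise ValueError("direction must be 'up', 'down', or 'up-down'")
    return list(islice(cycle(pattern), length))


c_major = ["C", "E", "G"]
print(arpeggiate(c_major, 6))
print(arpeggiate(c_major, 8, direction="up-down"))
```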

– Volume –

Dynamics – This is the musical term for volume.

Forte – This dynamic marking means to play the section loudly. This can be exaggerated by adding “issimo” to the end, meaning to play it even louder, e.g., fortissimo.

Piano – In terms of volume, this doesn’t refer to an instrument, but rather means “soft” or “quiet.” This too can be bolstered by adding “issimo” to the end, e.g., pianissimo. Fun trivia: the instrument we know as the “piano” was originally called a “piano-forte” due to how expressively and suddenly it could change in volume from soft to loud and back again. This was a new feature, since its predecessor, the harpsichord, couldn’t change dynamics on-the-fly.

Crescendo – Gradually rise in volume.

Decrescendo – Gradually fade in volume.

Sforzando – The note is briefly loud, then immediately soft, often building into a crescendo after.

Velocity versus Volume – This applies when using digital instruments. Volume is pretty self-explanatory, but refers specifically to how loud the sound is in the final mix. Velocity refers to how loudly the MIDI signal is telling the digital instrument to play. High-quality digital instruments will change in character as they get louder or softer, the same way a physical instrument does. This can be important when talking to your composer if you want something quieter or louder, and you do or do not want the character of the sound to change.

– Character of Sound –

Overtones – A pure sine wave contains only one specific frequency. However, if you pluck a string, it won’t just vibrate along its full length, but will also vibrate in fractions of its length, creating additional frequencies above its “fundamental.” All instruments generally produce more than one specific frequency based on their shape and method of producing sound, and this is what gives music its character. These additional frequencies are called overtones or harmonics (they are sometimes loosely called transients, though that term properly describes the brief burst at the very start of a sound). Modifying these overtones using EQ can drastically change the character of an instrument, as well as how it’s perceived in the final mix.

Timbre – This describes the overall character of a sound. It’s an esoteric term, like “bokeh,” that people understand but find hard to describe. Physically speaking, this usually refers to the number and relative strength of the overtones (the additional frequencies above the fundamental) in a sound.

Warm – This describes sound rich in lower frequencies and strong harmony. Cellos, French horns, and the lower end of a piano played softly can usually be described as “warm.” A sound can become warmer by reducing its higher-frequency overtones (the additional frequencies above the fundamental).

Cold – This generally describes sound with strong higher frequencies or very few mid-range frequencies. There can also be less harmony and more dissonance. The upper range of a piano, violins, trumpets, and metal mallet instruments can be described as “cold.” A sound can become colder by boosting its higher-frequency overtones (the additional frequencies above the fundamental). A sound with overtones that are too loud in the 4kHz range can be described as “shrill” and is unsettling to the listener.

Tonal – This refers to an instrument that has a well-defined pitch and harmonizes well.

Atonal – This refers to instruments that don’t have defined pitches, even if they can produce higher and lower frequencies. Most percussion falls into this category, but things like heavily distorted guitar can also cross into it. This can also refer to music that doesn’t follow a set key.

Crisp/Sharp – Usually refers to a sound that has a fast attack (refer back to ADSR in the Desktop Music Production Terms section) and often has strong upper-frequency overtones. Humans are most sensitive to frequencies around 4kHz, which happens to be where sibilance happens (the hissing consonant sounds that help make speech intelligible). So, sharp/crisp sounds are often strong around that specific range.

Soft – When not referring to volume, this refers to a sound with a slow attack and release.

Staccato – Notes are short and separated, and often crisp for emphasis.

Legato – Notes are long and connected.

Flowing – Derivative of legato, this generally also means the melody takes advantage of the connectedness, rising and falling in both volume and pitch.

Bright – Lots of higher frequencies, similar to cold, but with more harmony. Usually means a major key or chord too. The notes of chords are often spread out across the ranges of the instruments rather than tightly clustered.

Dark – More lower frequencies with minor or dissonant chords. Notes are more clustered together.

Energetic – High tempo with or without subdivision (usually only subdivided once or twice).

Relaxed – Slower tempo, often swung. Subdivision can be heavy (making for rapid notes) as long as the main beat is slow.

– Instrumentation –


– Style/Genre –


– General Film Scoring Tips –

Film score versus background music – This is arguably the most common mistake I see with people who don’t have a background in music. Music tells a story just as much as any other element of a visual production. It shouldn’t just be a “sound bed” or “background music” that sets a little bit of a mood, but doesn’t tell the story. Music has arcs and shifts, moods and pacing. These should match and/or complement your video. If you’re using existing music, listen to where the mood and timbre change, and think about which parts complement your video. You may have to rearrange the order of some of the musical elements. If you’re working with a composer, think about what the pivotal moments of your piece are, and convey those to the composer so they can write music to match those moments. Few things frustrate me more than when I hear a drastic change in music that isn’t reflected in the video, or vice versa. It’s a missed opportunity to propel a video into the emotional stratosphere.

In the example below, the opening is largely a piano motif while the story talks about the family’s background. The motif is somewhat melancholy with dissonant elements to give a happy/sad mood like something is missing. Once it becomes clear that the adoption is going to happen (1:40) high, quick strings begin a subdivision to build tension (1:45). Long, lower notes start to build as well, driving the pace of the video forward. There’s a short woodwind riff at 2:20 at the first photo of the adopted child, adding emphasis. The tension subsides to allow the viewer to dwell on the imagery of the child for a few seconds. Then, as the story turns toward the homecoming, tension builds again (2:38) with a short repeating pattern with a low hit on the piano that gets louder with each repetition, eventually building toward the pivotal moment when the entire family embraces for the first time (2:45). The music climaxes with alternating minor and major chords, becoming dissonant and then resolving to mimic the turbulent mix of emotions going through the family, the joy so great it causes tears. The lower instruments suddenly drop out (3:02), leaving the higher instruments to carry the melody, creating a feeling of relief as the music begins to drop back down again, transitioning into the original piano theme (3:19). Later, as the story begins to wrap up and show how far she’s come, the piano theme shifts into an entirely harmonic motif, free of dissonance, creating a feeling of hope for the future (4:10).

Timing – I know the general rule of thumb with nat sound is to have the viewer hear something, then see it. However, when you’re talking about synchronization, whether it be music or sync’d audio, the exact opposite is true. If you must have the synchronization be off a little, the music beat should come after the visual cue or cut, not before. The tolerance is roughly one frame before the visual cue to two frames after. Ideally, you want it to be perfect with the visual cue, or only one frame after. Notice that I say “visual cue” and not necessarily the cut. Sometimes, the visual moment that has the most weight is based on the action of the subject, not the cut itself. Evaluate where the weight is to figure out where the music should be. From a psychological standpoint, this makes sense because when we see something happen that has an associated sound, the sound never comes before we see it, but rather after, like watching someone kick a soccer ball from across the field. The nat sound rule applies to a sound bed, like how you hear the chatter of a room before you walk in.
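
The tolerance described above can be written down as a simple check. This Python sketch assumes you know the frame number of the visual cue and the frame number where the musical beat lands; the labels and function names are invented for the example, and the thresholds simply restate the rule of thumb from the paragraph above.

```python
def sync_offset(visual_frame, music_frame):
    """Frames by which the music beat trails (+) or leads (-) the visual cue."""
    return music_frame - visual_frame


def rate_sync(offset_frames):
    """Classify an offset using the rule of thumb described above."""
    if offset_frames in (0, 1):
        return "ideal"        # dead on, or one frame after
    if -1 <= offset_frames <= 2:
        return "acceptable"   # one frame early through two frames late
    return "off"


visual_cue = 240  # frame where the action lands
for beat_frame in (238, 239, 240, 241, 242, 243):
    offset = sync_offset(visual_cue, beat_frame)
    print(offset, rate_sync(offset))
```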

Ducking – Many good video editors know what ducking is, but I want to expand on the topic. For those who don’t know, ducking is the practice of lowering the volume of a nat sound or music track to make a main speaking part or other sound more prominent, and then raising it back up after the sound is over. Realistically, a good film score should need very little ducking, because it’s built into the score itself. That, or the ducking should work with the natural rises and falls of the music, not fight them. Warmer music, rich in low frequencies and lighter in the higher frequencies, works well behind voices because the two don’t compete. Conversely, music that is particularly saturated in frequencies around 4kHz will have to be reduced dramatically to make spoken words understandable, because it’s directly competing with sibilance.

Additionally, ducking should usually be slow, not fast, unless it coincides with a sudden rise or fall in the music itself. Get a feel for how the music naturally changes in dynamic level, and work your ducking into that natural flow. The absolute worst thing you can do is to start ducking immediately after a rise in the music, or stop the ducking suddenly right when it releases into a quieter section. It will feel unnatural to the audience, even if they can’t place why exactly.
