Captioning Quality Guidelines

Captions convey all meaningful audio content present in media, including spoken dialogue, narration, sound effects, and music description. To ensure captions can be effectively used by your audience, captions should be accurate, equal, consistent, complete, and readable. Please read more about each of these guidelines below. This information has been adapted from the Described and Captioned Media Program.

Accurate

Errorless captions require accurate spelling, punctuation, and grammar.

Ensure no spelling errors are present and be consistent with the spelling of words.
Use correct punctuation, including commas, periods, exclamation points, question marks, and quotation marks.
When a speaker stutters, caption what is said. You do not need to include unnecessary “uh” and “um” sounds.
Use an ellipsis to indicate trailing off or a short pause within a speaker’s language.
Use a hyphen to indicate an interruption or an abrupt halt or shift in speech.
Use italics to indicate the following: an off-screen voice-over reading, when a person is dreaming, thinking, or reminiscing, when there is background audio that is essential to the plot, the first time a new word is being defined, off-screen dialogue, narrator dialogue, sound effects, or music (this includes background music).

Equal

Equal access means that the meaning and intention of the material are completely preserved.

Represent details that aid the overall understanding of the intention of the video. Do not paraphrase or omit any spoken or meaningful audio content.
Include tone or volume descriptions to convey the style or impact of speech, in parentheses above or before captions, as needed [e.g., (whispering), (shouting), (over the phone), (singing)].
Indicate when no audio is present (e.g., [silence], [no talking], [no audio]).
If there is no audio for the duration of the video, include a single caption that says [silence] at the beginning of the video for approximately three to five seconds.
Preserve profanity and inflammatory language.
If speakers have an accent or are using provincial language that is relevant to the meaning of the material, note that information within brackets before the captions.
When a word is spoken phonetically, caption it the way it is commonly written (e.g., "N-double-A-C-P" should be captioned as NAACP).
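The rule above for media with no audio at all can be sketched in code. The example below is a minimal illustration, not a required workflow: it assumes the captions are delivered as a WebVTT file (a format this guideline does not specify), and the function names and the four-second default are hypothetical choices within the suggested three-to-five-second range.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def silent_video_captions(duration: float = 4.0) -> str:
    """Build a caption file holding one [silence] cue at the start of the video.

    Assumption: WebVTT output; the 4.0 s default is an arbitrary value inside
    the three-to-five-second window suggested by the guideline.
    """
    if not 3.0 <= duration <= 5.0:
        raise ValueError("the guideline suggests roughly three to five seconds")
    start = format_timestamp(0.0)
    end = format_timestamp(duration)
    return f"WEBVTT\n\n{start} --> {end}\n[silence]\n"


if __name__ == "__main__":
    print(silent_video_captions())
```

Writing the returned string to a .vtt file would give a caption track with a single cue that reads [silence].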

Consistent

Uniformity in style and presentation of all captioning features is crucial for viewer understanding.

Write captions in standard text, using normal upper- and lower-case capitalization conventions.
If possible, use a sans serif font such as Arial or Helvetica medium.
If possible, choose a font size and color that are easily read and provide accessible contrast with the background (white on a translucent gray background is ideal). You can check if your font size and color choice are accessible using the WebAIM Contrast Checker.
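For readers who want to check contrast programmatically, the sketch below computes the contrast ratio defined in WCAG 2.x, which is the kind of measure a contrast checker reports. Some assumptions apply: colors are given as 8-bit sRGB tuples, the background is treated as fully opaque (a translucent background blends with the video behind it, so real contrast will vary), and the gray value used in the example is hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color as defined in WCAG 2.x."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white = (255, 255, 255)
    gray = (85, 85, 85)  # hypothetical opaque stand-in for the caption background
    ratio = contrast_ratio(white, gray)
    # WCAG 2.x level AA asks for at least 4.5:1 for normal-size text.
    print(f"{ratio:.2f}:1", "passes AA" if ratio >= 4.5 else "fails AA")
```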

Complete

A complete textual representation of the audio, including speaker identification and non-speech information, provides clarity.

Use parentheses to indicate speaker information and identification. The speaker identification should be on a line of its own, above the captions.
When possible, identify the speaker by placing the caption under the speaker on the screen.
- When a speaker cannot be identified by placement and their name is known, identify the speaker by name in parentheses.
- When a speaker cannot be identified by placement and their name is unknown, identify the speaker with the information available (e.g., (Narrator #1), (Narrator #2)).
- If there is only one speaker, identify them once at the beginning of the media.
- When a speaker is portraying another person, identify the speaker as the person being portrayed (e.g., (Acting as Violet) or (Emelia acting as Violet)).
A description of sound effects, in brackets, should include the source of the sound, unless the source of the sound is visible on screen.
Caption background or non-speech sounds only when they are essential to the plot, and describe them in the third person.
- Abrupt sounds should be described in the present tense (e.g., [dog barks]).
- Ongoing sounds should use the progressive tense (e.g., [dog barking]).
Include onomatopoeia when possible (e.g., [thud], [splash], [woof], [pop]).
When captioning music, if the type or style of music is important, include objective descriptors. Avoid subjective words (e.g., use [classical string music] instead of [beautiful music]).
- If music contains lyrics, caption the lyrics verbatim. The lyrics should be introduced with the name of the vocalist/vocal group, and the song title (in brackets) if known/significant.
- Caption lyrics with music note icons (♪) if possible. Use one music icon at the beginning and end of each caption within a song.
- If only background music is playing for an extended time, indicate this with [music] for no longer than ten seconds.
If multiple people are speaking, the captions will usually not be a perfect representation of the speech. You can put speakers on different lines in the caption block or have alternating speakers for successive caption blocks. However, make sure that every caption block is visible for at least one second and that each change in speaker is noted.
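Several of the conventions in this section are mechanical enough to sketch in code. The example below is only an illustration under stated assumptions: the Cue class and all function names are hypothetical, times are in seconds, and the one-second minimum comes from the guidance on multiple speakers above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    """A hypothetical caption block with start/end times in seconds."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def render_cue(cue: Cue) -> str:
    """Put the speaker name in parentheses on its own line above the caption."""
    lines = []
    if cue.speaker:
        lines.append(f"({cue.speaker})")
    lines.append(cue.text)
    return "\n".join(lines)


def format_lyric(text: str) -> str:
    """Wrap one lyric caption with a music note at the beginning and end."""
    return f"♪ {text} ♪"


def cues_too_short(cues: list[Cue], minimum: float = 1.0) -> list[Cue]:
    """Return the cues that are visible for less than the minimum duration."""
    return [cue for cue in cues if cue.end - cue.start < minimum]


if __name__ == "__main__":
    cues = [
        Cue(0.0, 2.0, "Where are we going?", speaker="Emelia"),
        Cue(2.0, 2.5, "To the park.", speaker="Violet"),
    ]
    print(render_cue(cues[0]))
    print(format_lyric("la la la"))
    print([cue.text for cue in cues_too_short(cues)])
```

A check like cues_too_short is meant as a review aid: a flagged cue still needs a person to decide whether to extend its timing or merge it with a neighboring block.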

Readable

Captions should be displayed long enough to be read completely, be synchronized with the audio, and neither obscure nor be obscured by the visual content.

The text should be centered on the screen and left-aligned, with line breaks that follow logical grammatical chunks (e.g., “Let’s all go / to the park,” not “Let’s all go to the / park”). Where possible, try to ensure that captions do not end in the middle of a phrase.
The timing of captions should be synchronized with the audio in the media as closely as possible while allowing time for the captions to be read.
If possible, move captions below the visual content so that both captions and the video are fully viewable.
Captions should not exceed two visible lines of text at any point in time.
When possible, use closed captions (can be turned on/off) instead of open captions (always visible).
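The two-line limit and the line-break guidance above can be partly checked automatically. The sketch below is a rough heuristic, not a grammar parser: it assumes caption lines are separated by newline characters, and the function name and the short list of words that should not end a line are hypothetical. Final line breaks still need human judgment.

```python
# Hypothetical list of words that usually begin a phrase, so a line that ends
# with one of them has probably been broken in the middle of that phrase.
WEAK_LINE_ENDINGS = {"a", "an", "the", "to", "of", "in", "on", "at", "and", "or"}


def check_caption_block(text: str, max_lines: int = 2) -> list[str]:
    """Return a list of readability problems found in one caption block."""
    problems = []
    lines = text.split("\n")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines exceeds the limit of {max_lines}")
    # Only lines followed by another line can contain a bad break.
    for line in lines[:-1]:
        words = line.split()
        if words and words[-1].strip(".,!?\"'").lower() in WEAK_LINE_ENDINGS:
            problems.append(f"line break after '{words[-1]}' splits a phrase")
    return problems


if __name__ == "__main__":
    print(check_caption_block("Let's all go\nto the park"))
    print(check_caption_block("Let's all go to the\npark"))
```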