By Emily Clashe
“…if filmmakers were as involved with the captioning process as they are with sound editing, how might that improve the quality of the captions provided?”
Despite huge technological advances over the last 40 years, the stylistic conventions of captioning are essentially unchanged: one or two rows of small white text at the bottom of the screen, uniform in size and typeface.
Of course, the most important criterion for captions is that they be readable, and consistency helps: knowing where the next caption will appear gives you more time to read it. Even in films like Night Watch, where director Timur Bekmambetov helped design the English subtitles, only a few subtitles deviate from the norm, using movement, color, blurring, and other techniques for rhetorical emphasis and visual impact. Most of the subtitles look quite familiar. (Sean Zdenek wrote a fascinating blog post on Night Watch’s artistic subtitles, “Subtitles as Art.”)
But readability is contextual, and the current American conventions aren’t always the best answer. Captions that appear right by a speaker during an emotional scene might make the scene more readable because they allow the caption-user more time to take in facial expressions. Why don’t caption styles vary by genre, differing in appearance from documentaries to action movies to emergency news updates? What if the style of each caption was tailored to the scene it appears in?
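As a concrete (if web-centric) aside: the WebVTT caption format used by HTML5 video already lets each cue carry its own position and styling, so scene-tailored captions are technically possible today. Here is a minimal sketch; the cue text, timing, and the `.shout` class name are my own invented illustration, not drawn from any real production:

```
WEBVTT

STYLE
::cue(.shout) {
  color: yellow;
  font-weight: bold;
}

00:00:01.000 --> 00:00:03.000 line:10% position:20% align:left
<c.shout>Look out!</c>

00:00:04.000 --> 00:00:06.000 line:90%
(a door creaks open, far off)
```

The first cue is pulled toward the top-left of the frame and styled through the STYLE block; the second falls back to a conventional bottom placement. One file format supports both.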
The Current Model of Media Captioning
In the current model, captioning is most often “tacked on” at the end of production by a third-party vendor, chosen for the lowest cost and quickest turnaround possible. Most captionists don’t have the time, resources, or permission to experiment with the style of captions. But precisely because of these pressures, and because caption quality suffers under them, I believe it’s worth making space to imagine how things could be different.
I’m glad to be a speech-to-text provider at a moment when the emerging field of caption studies offers this space. Janine Butler’s articles “Embodied Captions in Multimodal Pedagogies” and “Integral Captions and Subtitles: Designing a Space for Embodied Rhetorics and Visual Access,” and Sean Zdenek’s webtext “Designing Captions: Disruptive Experiments with Typography, Color, Icons, and Effects,” are my jumping-off points here; in these works, both scholars reveal the rhetorical potential of captions and open up fascinating questions.
What if captions were considered as an integral part of how a message is conveyed?
Film translators are already making the argument that subtitles should be considered vital to storytelling. As subtitler David Buchanan pointed out in a Guardian article on the importance of quality subtitling, “Bad subtitles can ruin millions of dollars’ worth of hard work. A film-maker wouldn’t outsource their colour correction or audio mix and just think: ‘I’ll leave them to it, I’m sure it’ll be fine.’ They would want to see it, hear it, get a second opinion, make sure everybody is on the same page. It should be the same with subtitles.”
If filmmakers were as involved with the captioning process as they are with sound editing, how might that improve the quality of the captions provided?
Comic Books, Captions, and Miles Morales
Though there aren’t many (if any?) big-name movie studios taking on these questions, there are other forms of media that do so inherently, like comics. The text boxes in comic books physically and aesthetically fit on their pages, using color, typeface, bolding, italics, and more to creatively convey the tone of a comment or scene.
What could a film that drew from this comic book aesthetic show us about the possibilities of captions?
2018’s Spider-Man: Into the Spider-Verse aimed to bring Spider-Man comic books to life, using text boxes to represent dialogue in-frame, with striking color, movement, and other design elements. I don’t think anyone who worked on the film would have called these brief moments “open captions.” However, in building this aesthetic into their film, the creators tapped into the visual language that comics have developed for representing non-linear sound in linear(ish) text.
One unintended result that I find exciting is that audiences saw, just for a moment, “captions” that look very different from conventional closed captions.
Here’s an example where the protagonist, Miles Morales, is walking down a hallway at his middle school:
His thoughts, in voiceover, appear on the screen: “Wait, why is the voice in my head so loud?” These words pop up and then recede on either side of him in yellow boxes filled with Ben-Day dots, like the pages of a color comic book, and the text is stylized to mimic comic book narration/dialogue — all-caps, with certain words emphasized via italics, bolding, or larger font size.
In the moment screenshotted above, Miles Morales is thinking (in voiceover), “I gotta get new pants.” Compare the post-production captions to the stylized text boxes at the center: the plain white captions sit at the bottom of the screen with no emphasis, while the colorful text boxes are very much a part of the frame – a passing student even moves in front of them!
Because this line is delivered in voiceover, the conventional caption is entirely in italics; there’s no room for additional emphasis on the word “pants” even though that would better reflect the emotion and humor of the line.
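For what it’s worth, this limitation is conventional rather than technical: common caption formats allow nested inline styling. In WebVTT, for example, bold can nest inside italics within a single cue, so a voiceover line can stay italic while one word gets extra weight. A sketch using the line discussed above (the timing is invented):

```
WEBVTT

00:00:10.000 --> 00:00:12.000
<i>I gotta get new <b>pants</b>.</i>
```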
Another creative use of text occurs when Miles runs into a busy intersection and almost gets hit by a taxi that comes to a screeching halt. All of this happens in around a second; there’s really no room for a caption to the effect of “[car screeches to a halt]”.
Instead, the word “screech” is stylized like comic book onomatopoeia, in big block letters that stretch out to match the drawn-out “ee” of the sound effect.
My favorite use of the text boxes, though, happens as Miles is processing becoming Spider-Man: in this scene, he’s literally trying to outrun his thoughts, which follow him in the text boxes.
(Interestingly, the captionist chose “it’s” instead of “it was” in the actual caption, which introduces unnecessary confusion because the spider that bit Miles is dead.)
These text boxes take center stage, not the margins — they occupy the same space as characters and their actions. They also enrich the aesthetic of the film. Rather than a retrofitted accommodation, these are captions used as a site for meaning-making. They are integrated into the film as one element of a multimodal message.
By employing this comic style, the filmmakers demonstrated what it might look like to consider captions as a tool for expressing meaning, rather than as a post-production cost necessary in order to meet legal obligations.
Not as much of a novelty as you may think…
It’s worth pointing out, as Janine Butler does, that Deaf filmmakers have paid this kind of attention to the visual design of text in English subtitles for years now. Take a look at Justin Jackerson’s “Why ASL” or Rachel Benedict’s “Early Intervention,” both Journal of ASL papers. Both Jackerson and Benedict designed their videos with room for subtitles from the beginning, leaving space to use techniques that are impossible with traditional pop-on captions. For example, if a concept has to be split across multiple caption boxes, earlier captions remain on screen while new text joins below, so the concept can be displayed as a whole.
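That accumulate-rather-than-replace technique can even be approximated in a standard caption file: WebVTT permits overlapping cues pinned to different lines, so earlier text persists while new text appears beneath it. A sketch with invented text and timing:

```
WEBVTT

00:00:05.000 --> 00:00:11.000 line:0
The first half of the idea stays on screen...

00:00:08.000 --> 00:00:11.000 line:1
...while the second half joins below it.
```

Here the first cue remains visible for its full six seconds; the second overlaps it in time but occupies the next line down, so the whole thought is readable at once.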
In her article “Integral Captions and Subtitles: Designing a Space for Embodied Rhetorics and Visual Access,” Butler discusses Wayne Betts, Jr., a Deaf filmmaker who chose to move away from the sharp cuts and breaks that characterize traditional film editing. Instead, he prefers continuous shots that keep both sides of a conversation in the frame, reflecting how ASL users maintain eye contact throughout an interaction. This change in film technique results in work that better expresses his Deaf experience. As he notes, he brought this approach to the captions when making Gallaudet: The Film, in order to allow viewers to “maintain eye contact with the meaningful space of the screen.”
In his 2010 TEDx talk, he said (as translated from ASL by Janine Butler):
“Notice the captions? They weren’t fixed to the bottom of the screen. That feels the same as cuts and breaks. My eyes fall down to the bottom. No! I want to see and stay up here with the actor’s eyes. My eyes are following the action and I can still feel the flow. I feel connected to what’s going on. That’s my world. That’s it.”
Integral Captions against Marginalization
Butler created the term integral captions/subtitles to recognize how Deaf filmmakers have designed captions and subtitles as “integral to the meaning of their composition” in order to express the rhetorical meaning of their work and make it more accessible. In defining this term, she encourages all filmmakers to take the opportunity to design captions as essential elements of their productions.
By centering text in the middle of the screen and using other techniques, these videos also center caption/subtitle users’ experience (rather than the experience of viewers who don’t use captions, as conventional closed captioning is designed to do). Integral captions/subtitles can make it easier to comprehend fast action sequences or to catch a character’s facial expression. As Butler notes, integral captions reflect the visual/kinetic emphasis of ASL/Deaf Space design practices. Therefore, they work against an understanding of disability as unexpected, non-normative, or marginal.
Look away, look away…
To look at the rhetorical problems created by conventional captions (and the potential of integral captions to solve them!), consider “Penultimate Peril: Part 1,” an episode of the Netflix series A Series of Unfortunate Events.
In the episode, three siblings, in disguise as concierges at a hotel, perform various errands in the same short period of time, but we see their stories in sequential order: first Violet, then Klaus, then Sunny. All three stories end with the hotel clock striking at 3:00 PM. The narrator Lemony Snicket describes the sound of the clock in Violet’s story, saying: “The clock in the lobby of the Hotel Denouement was the stuff of legend, a phrase which here means ‘very famous for being very loud.’”
The way that the clock sounds when it strikes takes on unique importance within the story of the episode, because it is used to tie the three stories together. Here are the three lines delivered as the ending to each of the three siblings’ stories:
“The noise it makes sounded a lot like a certain word, and that word described Violet Baudelaire as she prepared to carry a harpoon gun up to the rooftop sunbathing salon.”
“This word described Klaus Baudelaire as he helped hang a strip of bird paper outside the fifth floor.”
“The sound of the clock in the lobby described Sunny Baudelaire as she turned the laundry room entrance into a Vernacularly Fastened Door. But it also described the story of the Baudelaires, because everything they thought they knew about their lives, their situation, and the Hotel Denouement, was wrong.”
If you are not a caption user, you only learn at the end of all three stories that the word the clock’s strike sounds like is “wrong.” (The audio of the clock striking does not sound precisely like “wrong” — it’s just a deep gong sound.)
However, if you are a caption user, this central piece of storytelling is given away after the first story (Violet’s), with the following caption:
This caption is used again at the end of Klaus’s story:
And so when narrator Lemony Snicket delivers the “everything they thought they knew about their lives, their situation, and the Hotel Denouement, was wrong” line, the impact of the reveal is lost. The storytelling has been sacrificed in an attempt to avoid excluding caption users.
Integral captions offer an alternative approach. What if the clock striking were indicated with wavy “sound” lines, like how Spider-Man: Into the Spider-Verse represented Spidey-sense?
Throughout the movie, “Spidey-sense” has a tense, vibrato sound, but it’s also clearly represented visually, eliminating the need for an “informing” caption that might take the audience out of the moment.
Imagine this visual indicator being used in the clock scene of Unfortunate Events, instead. The clock-striking noise is indicated visually, with lines mimicking sound waves emanating from the clock. These lines could transform slightly in shape to look more like words written in cursive, growing clearer after each story and only becoming readable as “Wrong! Wrong! Wrong!” when Lemony Snicket says “everything they thought they knew… was wrong.”
Would that result in clearer, more effective, more accessible storytelling?
At the very least, I think it’s worth investigating. I agree with Sean Zdenek that “it is time to imagine different futures for captioning, futures that welcome experimentation and multiple approaches, question the hegemony of the word, and elevate the needs of viewers who are deaf and hard of hearing.”
Does any of this apply to speech-to-text providers right now, especially if (like me) you aren’t a filmmaker and aren’t skilled in Adobe After Effects? I think so. The concept of integral captions offers us the opportunity to question established conventions and best practices, and to call for (or continue calling for) more research into caption usability.
About the Author:
Emily Clashe is a staff C-Print Captionist at the University of Minnesota with interests in disability studies, comics, and text of all kinds. She welcomes further discussion on anything caption-related at [email protected].