Key challenges in using automatic dubbing to translate educational YouTube videos


Rocío Baños, University College London



Keywords: automatic dubbing, machine translation, speech synthesis, dubbing, educational videos


The dubbing industry has been particularly concerned with how to cope with the increased demand for dubbing experienced over the past few years. Media companies have invested in Artificial Intelligence, with significant developments in automatic dubbing spearheaded by companies like Google (Kottahachchi & Abeysinghe, 2022), Amazon (Federico et al., 2020; Lakew et al., 2021) or AppTek (Di Gangi et al., 2022). These organisations have clearly seen the potential in automatic dubbing, understood as the combination of automatic speech recognition, machine translation (MT) and text-to-speech technologies to automatically replace the audio track of an original audiovisual text with synthetic speech in a different language, taking into consideration relevant synchronies. Against this backdrop, this article sets out to: 1) provide an overview of automatic dubbing in the current mediascape and contextualise this practice accordingly; and 2) identify some of the main challenges of its implementation, especially as regards the integration of MT and speech synthesis in the dubbing workflow. To this end, the performance of the tool Aloud will be evaluated by analysing the videos currently available in the Spanish version of the YouTube channel Amoeba Sisters, which have been dubbed using this tool. The evaluation will focus on the issues highlighted by YouTube users in their comments on this channel, which include naturalness and accuracy. Special attention will be paid to the use of synthetic voices, which are heavily criticised by users. However, users also highlight the usefulness of automatic dubbing for students interested in biology-related topics who are not fluent in English, in line with the tool developers' original intent of increasing accessibility.


