Towards a Corpus-based, Statistical Approach to Translation Quality: Measuring and Visualizing Linguistic Deviance in Student Translations


  • Gert De Sutter Ghent University
  • Bert Cappelle Université de Lille 3
  • Orphée De Clercq Ghent University
  • Rudy Loock Université de Lille 3
  • Koen Plevoets University of Leuven



translation quality, student translations, target-language norms, multifactorial analysis, corpus-based translation studies


In this article we present a corpus-based statistical approach to measuring translation quality, more particularly translation acceptability, by comparing the features of translated and original texts. We discuss initial findings that aim to support formative quality assessment and make it more objective. To that end, we extract a large set of linguistic and textual features from student and professional translation corpora, which consist of many different translations by several translators in two genres (fiction, news) and two translation directions (English to French and French to Dutch). The numerical information gathered from these corpora is explored with Principal Component Analysis, which enables us to identify stable, language-independent linguistic and textual indicators that set student translations apart from translations produced by professionals. The differences between the two types of translation are subsequently tested by means of ANOVA. The results clearly indicate that the proposed methodology is capable of distinguishing between student and professional translations. We argue that this deviant behaviour indicates an overall lower translation quality in student translations: student translations tend to score lower at the acceptability level, that is, they deviate significantly from target-language norms and conventions. In addition, the proposed methodology is capable of assessing the acceptability of an individual student’s translation: a smaller linguistic distance between a given student translation and the norm set by the professional translations correlates with higher quality. The methodology can also give students objective and concrete feedback about the linguistic dimensions on which their text diverges.
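The pipeline summarised in the abstract (feature extraction, Principal Component Analysis, ANOVA, distance to the professional norm) can be illustrated with a minimal sketch. This is not the authors' implementation. The three features, all numbers, and the function names are hypothetical. Only the first principal component is used, found by power iteration. Only the F statistic is computed, with no p-value. The "norm" is assumed to be the mean professional score on that component.

```python
import math
from statistics import mean, stdev

# Hypothetical feature vectors per text (invented values, not the study's data):
# [average sentence length, type-token ratio, connectives per 100 words]
professional = [[21.0, 0.52, 3.1], [19.5, 0.55, 2.8],
                [22.3, 0.50, 3.4], [20.1, 0.53, 3.0]]
student = [[25.2, 0.46, 4.0], [24.1, 0.47, 4.3],
           [20.8, 0.52, 3.2], [26.0, 0.44, 4.6]]

def standardize(rows):
    """Z-score every feature column over all texts."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def first_component(rows, iterations=200):
    """Leading principal axis via power iteration on the covariance matrix.

    Assumes the rows are already centred, as they are after standardize().
    """
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    return v

def anova_f(*groups):
    """One-way ANOVA F statistic (between-group over within-group variance)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Project every text onto the first principal component.
z = standardize(professional + student)
axis = first_component(z)
scores = [sum(a * x for a, x in zip(axis, row)) for row in z]
pro_scores = scores[:len(professional)]
stu_scores = scores[len(professional):]

# Group difference on that component, and each student's distance to the norm.
f_stat = anova_f(pro_scores, stu_scores)
norm = mean(pro_scores)
distances = [abs(s - norm) for s in stu_scores]

print(f"F = {f_stat:.2f}")
for i, d in enumerate(distances, start=1):
    print(f"student text {i}: distance to professional norm = {d:.2f}")
```

In this toy setting, a student text with a small distance sits close to the professional texts on the extracted dimension. The loadings in `axis` show which features drive the divergence, which is the kind of information that could feed back to the student.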






How to Cite

De Sutter, G., Cappelle, B., De Clercq, O., Loock, R., & Plevoets, K. (2018). Towards a Corpus-based, Statistical Approach to Translation Quality: Measuring and Visualizing Linguistic Deviance in Student Translations. Linguistica Antverpiensia, New Series – Themes in Translation Studies, 16.