Monday, May 18, 2009

String concatenation used to be a big issue in the translation world, once upon a long ago. Programmers would insert variables into the middle of a sentence in their software strings, not realising that pluralisation in other languages involves more than sticking the letter S on the end of a word. I thought that era was past, and that most programmers now knew better and internationalised their applications from the start. Based on some comments I heard at the recent ITI Conference in London, however, it appears that it is sometimes still an issue, so I am re-posting some old notes on the subject.

If we look at sentence structures in different languages, it quickly becomes clear that string concatenation is bound to result in problems in at least one language. Some languages, such as German, require the verb to be at the end of the sentence. Some require different structures for negative constructions (think of ne ... pas in French); others require adjectives to agree in gender and even in case.

Back in 1997, in a "Global from Day One" article in Byte, the authors made the point that stringing together the local string for "file", the local string for "error", and the local string for "has occurred" may not give the local string for "file error has occurred". Hall quotes the example of "%d long green blade(s) of grass", and points out that in many languages the adjectives ("long" and "green") must agree with the number supplied by the variable. He also gives an example in Polish, showing that plural agreements are not always as simple as one might think:

1 red armchair            1 czerwony fotel

2, 3, 4 red armchairs     2, 3, 4 czerwone fotele

5-20 red armchairs        5-20 czerwonych foteli
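The Polish example can be sketched in code. This is only an illustration, not anything taken from the article: the function and variable names are made up, and the rule for numbers above 20 is an assumption based on the plural rule commonly used for Polish in gettext catalogues, since the table above stops at 20. The point is that the program picks one of three complete strings rather than bolting an ending onto a word.

```python
def polish_plural_index(n: int) -> int:
    """Return 0, 1 or 2: which of the three Polish plural forms fits n.

    Assumption: this mirrors the plural rule commonly used for Polish
    in gettext catalogues; the source table only covers 1 to 20.
    """
    if n == 1:
        return 0  # singular: "1 czerwony fotel"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2, 3, 4 (and, by assumption, 22, 23, 24, ...)
    return 2      # 0, 5-21 and everything else


# One complete, translator-owned string per plural form (hypothetical table).
RED_ARMCHAIRS_PL = (
    "{n} czerwony fotel",
    "{n} czerwone fotele",
    "{n} czerwonych foteli",
)


def red_armchairs_pl(n: int) -> str:
    """Pick the whole string for n, then fill in the number."""
    return RED_ARMCHAIRS_PL[polish_plural_index(n)].format(n=n)


if __name__ == "__main__":
    for count in (1, 3, 5, 12, 20):
        print(red_armchairs_pl(count))
```

Note that both the adjective and the noun change between the three strings, which is exactly what a single "%d red armchair(s)" template cannot express.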

Well-intentioned programmers often used to try to save space by using variables in strings, rather than storing a number of similar complete strings for the program to call up. Unfortunately, this approach often does not work across different languages. In Developing International Software for Windows 95 and Windows NT, for instance, Nadine Kano gives the following three sentences:
  • "Not enough memory to open the file FileName1."
  • "Not enough memory to save the file FileName1."
  • "Not enough memory to spellcheck the file FileName1."

Let's have a look at what happens when we translate these sentences into German, and notice how the file name variable moves around (the sentences do not all follow the same pattern):
  • "Nicht genug Speicher, um DateiName1 zu öffnen."
  • "Nicht genug Speicher, um DateiName1 zu speichern."
  • "Nicht genug Speicher für Rechtschreibprüfung von DateiName1."
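One way to cope with this, sketched here under my own assumptions, is to store every sentence whole in each language and leave a named placeholder for the file name, so the translator decides where it goes. The message keys, the MESSAGES table and the message() helper are all hypothetical; a real application would more likely use an established resource or catalogue format.

```python
# Complete sentences per language, keyed by a hypothetical message ID.
# The {filename} placeholder can sit wherever each language needs it.
MESSAGES = {
    "en": {
        "nomem_open": "Not enough memory to open the file {filename}.",
        "nomem_save": "Not enough memory to save the file {filename}.",
        "nomem_spell": "Not enough memory to spellcheck the file {filename}.",
    },
    "de": {
        "nomem_open": "Nicht genug Speicher, um {filename} zu öffnen.",
        "nomem_save": "Nicht genug Speicher, um {filename} zu speichern.",
        "nomem_spell": "Nicht genug Speicher für Rechtschreibprüfung von {filename}.",
    },
}


def message(lang: str, key: str, **values: str) -> str:
    """Look up the whole sentence for lang/key and fill in its placeholders."""
    return MESSAGES[lang][key].format(**values)


if __name__ == "__main__":
    for lang in ("en", "de"):
        print(message(lang, "nomem_spell", filename="report.doc"))
```

The code never assembles a sentence from fragments such as "Not enough memory to" plus a verb plus "the file", so the third German sentence is free to use a different construction from the first two.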

Most books and articles on this subject strongly recommend storing each sentence in its entirety, and I would fully endorse that approach.

