Core Features

The basic concept behind the internal features of Snap! Websites is to offer only templates (no text). That scheme obviously has limits: text is used everywhere, so it has to be included somewhere. However, we want to clearly separate layouts and other text-agnostic data from the content itself. The layouts (templates) are not translated; the content is.

C++ has a real problem when it comes to writing messages. In most cases, people want to use the std::cout << "Something" << std::endl; syntax because it's C++. However, this breaks sentences into pieces, and in many languages those pieces cannot be translated, at least not coherently (see note 1).

Our current solution

We want to use a form that allows people to enter strings in all languages. The result will represent an XML file (although we do not have to have those translations in XML files per se.)

The interface should be able to share the data so all Snap! website installations can benefit from the translations from all installations.

The data can be saved in an XML file so that we can use it with a QTranslator object. The translations should somehow be limited to what the website needs (i.e. one file per plugin, although those files still need merging).

The plugin should be able to save the data in a form that can be placed in the plugin source, so that new installations benefit from existing translations without having to download all of them each time.

For programmers, we may want to allow Qt Linguist to search for all strings wrapped in QObject::tr() so that we can build a database of strings to translate without having to copy and paste existing on-screen strings. That said, this is always a moving target, and a better solution may be to capture all calls to tr() (probably by using a different function name) and save that data on the fly. Of course, that means messages that are rarely displayed, such as certain error messages, may never get translated.
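The "capture on the fly" idea can be sketched without Qt. This is only an illustration: snap_tr() and seen_messages() are hypothetical names, not existing Snap! or Qt functions, and the sketch returns the source string untranslated where a real version would look up a translation first.

```cpp
#include <set>
#include <string>

// Hypothetical registry of every source string seen at run time.
std::set<std::string> & seen_messages()
{
    static std::set<std::string> messages;
    return messages;
}

// Hypothetical stand-in for tr(): record the source string, then return it.
// A real version would look up a translation before falling back to the
// source string.
std::string snap_tr(std::string const & source)
{
    seen_messages().insert(source);
    return source;
}
```

The registry only ever contains strings that were actually requested, which is exactly why rarely triggered messages can stay untranslated with this approach.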

Problems with gettext()

In simple C/C++ software, using gettext() makes sense. There are several drawbacks to the gettext() interface, however:

1) It requires a pre-processing of the source code to generate the PO files; each time the source code changes, the PO files get out of date.

2) A message is generally defined in English in the code (to avoid problems with accents, etc.) and translated to other languages in each PO file. If the English sentence changes, even slightly (e.g. you added a period at the end of the sentence), that translation is lost (newer versions of gettext() should behave better than that, though).

3) The software needs to load the PO files from somewhere or link them using a resource-like feature (e.g. Qt resources). Linking resources is limiting since adding a new language requires recompiling everything.

4) Often considered a good thing about gettext(): if you have two words or sentences in two different places that are exactly the same (e.g. "Click Here"), then both get translated at the same time. At times this is wrong. The same text may need one translation in one context and a different translation in another (e.g. "Cliquez ici" or "Cliquer là").

5) gettext() is often referenced as a great tool because it can handle all plurals properly; this is true, but the syntactic needs make translations unreadable, and very difficult to manage when you change source messages all the time.
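Point 4 can be illustrated with a lookup keyed on a context as well as the source string. This is a minimal sketch under assumptions: the translation_table class and the context names used with it are hypothetical and are not part of gettext() or Snap!.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical table keyed on (context, source) rather than on source alone,
// so the same English text can have a different translation in each context.
class translation_table
{
public:
    void add(std::string const & context, std::string const & source, std::string const & translation)
    {
        f_entries[std::make_pair(context, source)] = translation;
    }

    // Return the translation registered for this context, or the source
    // string itself when there is none.
    std::string translate(std::string const & context, std::string const & source) const
    {
        auto const it(f_entries.find(std::make_pair(context, source)));
        return it == f_entries.end() ? source : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> f_entries;
};
```

With such a table, "Click Here" registered under a "button" context and under a "help" context can map to "Cliquez ici" and "Cliquer là" respectively.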

Solution 1 (Message Number)

1) Use a dedicated function to retrieve messages by number; it is defined in the base plug-in class, which knows how to load the corresponding translation files;

2) Each file has ONE language; the text file uses UTF-8; the format goes like this:

#<message number>:<message>

For messages that include new lines, continue the message on the next line after one or more spaces.

#123:This is a long message.
     It continues on the following line.
#1000:A short message here.
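A reader for this text format could look like the sketch below. It rests on assumptions the format description leaves open: the leading spaces of a continuation line are dropped, the continuation is joined to the previous text with a newline, and a repeated number silently overwrites the earlier entry. The function names are hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse "#<message number>:<message>" entries from a UTF-8 text stream.
// Assumption: a continuation line starts with one or more spaces, which are
// dropped, and its text is appended to the previous message after a '\n'.
std::map<std::uint32_t, std::string> parse_messages(std::istream & in)
{
    std::map<std::uint32_t, std::string> messages;
    std::string * current(nullptr);     // message being continued, if any
    std::string line;
    while(std::getline(in, line))
    {
        if(line.empty())
        {
            continue;
        }
        if(line[0] == '#')
        {
            std::size_t const colon(line.find(':'));
            if(colon == std::string::npos || colon == 1)
            {
                throw std::runtime_error("invalid message header: " + line);
            }
            unsigned long const value(std::stoul(line.substr(1, colon - 1)));
            if(value > 0xFFFFFFFFUL)
            {
                throw std::runtime_error("message number does not fit in 32 bits: " + line);
            }
            current = &messages[static_cast<std::uint32_t>(value)];
            *current = line.substr(colon + 1);
        }
        else if(line[0] == ' ' && current != nullptr)
        {
            std::size_t const start(line.find_first_not_of(' '));
            *current += '\n';
            if(start != std::string::npos)
            {
                *current += line.substr(start);
            }
        }
        else
        {
            throw std::runtime_error("unexpected line: " + line);
        }
    }
    return messages;
}

// Convenience wrapper to parse a message file already loaded in memory.
std::map<std::uint32_t, std::string> parse_messages_from_string(std::string const & text)
{
    std::istringstream in(text);
    return parse_messages(in);
}
```

Parsing the three sample lines above yields two messages, numbered 123 and 1000, with message 123 containing one embedded newline.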

3) Have a pre-processor to verify that all the messages defined in the C++ files are also defined in the message file (not required)

We still have the same problem of finding the translation files (see note 2). However, the translations do not get lost.

A message number may accidentally be used more than once, which causes problems. We can detect such duplicates while compiling the message files.
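Detecting a reused message number at compile time can be as simple as remembering where each number was first defined. A minimal sketch, assuming the entries were already read as (number, line) pairs; message_entry and find_duplicates are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of where a message was defined in the message file.
struct message_entry
{
    std::uint32_t f_number;
    int           f_line;
};

// Return one diagnostic per definition that reuses an existing number.
std::vector<std::string> find_duplicates(std::vector<message_entry> const & entries)
{
    std::map<std::uint32_t, int> first_seen;    // number -> line of first definition
    std::vector<std::string> errors;
    for(auto const & e : entries)
    {
        // emplace() leaves the map untouched when the number already exists
        auto const result(first_seen.emplace(e.f_number, e.f_line));
        if(!result.second)
        {
            errors.push_back("message #" + std::to_string(e.f_number)
                + " on line " + std::to_string(e.f_line)
                + " was already defined on line " + std::to_string(result.first->second));
        }
    }
    return errors;
}
```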

A message is found through a very small numeric index (the message number and its offset in the message file). The message number is limited to 32 bits and, since we don't expect to have over 4 GB of messages in all of Snap!, the offset is also 32 bits. The compiled message file is laid out like this:

32 bit magic word ('SMF1' with the '1' representing the file version)
32 bit representing the number of messages (count)
32 bit x count with indices
32 bit x count with offsets
at offsets the string, the length is computed using the next offset;
   (the last string goes to the end of the file)

Note that this means English, like all of the other languages, pays a lookup penalty before a message can be displayed, and the message file could go missing.

The idea of having an array of indices instead of just offsets gives us the ability to number our messages with any 32 bit number instead of having to go from 1 to the total number of messages.
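The compiled layout and the lookup it allows can be sketched as follows. Several details are assumptions, since the description does not fix them: values are stored little-endian, the magic word is the four bytes 'S', 'M', 'F', '1' in file order, offsets are counted from the start of the file, and the indices are sorted in ascending order so a binary search can be used. The names compile_messages and find_message are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace
{
// Append a 32-bit value, least significant byte first (assumed byte order).
void put32(std::vector<std::uint8_t> & out, std::uint32_t v)
{
    for(int shift(0); shift < 32; shift += 8)
    {
        out.push_back(static_cast<std::uint8_t>(v >> shift));
    }
}

std::uint32_t get32(std::vector<std::uint8_t> const & in, std::size_t pos)
{
    return static_cast<std::uint32_t>(in[pos])
         | static_cast<std::uint32_t>(in[pos + 1]) << 8
         | static_cast<std::uint32_t>(in[pos + 2]) << 16
         | static_cast<std::uint32_t>(in[pos + 3]) << 24;
}
}

// Build the file: magic, count, indices, offsets, then the strings.
// std::map iterates in key order, so the indices come out sorted.
std::vector<std::uint8_t> compile_messages(std::map<std::uint32_t, std::string> const & messages)
{
    std::vector<std::uint8_t> out{ 'S', 'M', 'F', '1' };
    std::uint32_t const count(static_cast<std::uint32_t>(messages.size()));
    put32(out, count);
    for(auto const & m : messages)
    {
        put32(out, m.first);
    }
    std::uint32_t offset(8 + count * 8);    // first string follows both tables
    for(auto const & m : messages)
    {
        put32(out, offset);
        offset += static_cast<std::uint32_t>(m.second.size());
    }
    for(auto const & m : messages)
    {
        out.insert(out.end(), m.second.begin(), m.second.end());
    }
    return out;
}

// Binary search the index table, then slice the string between two offsets.
bool find_message(std::vector<std::uint8_t> const & file, std::uint32_t number, std::string & result)
{
    if(file.size() < 8 || !std::equal(file.begin(), file.begin() + 4, "SMF1"))
    {
        return false;
    }
    std::size_t const count(get32(file, 4));
    if(file.size() < 8 + count * 8)
    {
        return false;
    }
    std::size_t lo(0);
    std::size_t hi(count);
    while(lo < hi)
    {
        std::size_t const mid(lo + (hi - lo) / 2);
        if(get32(file, 8 + mid * 4) < number)
        {
            lo = mid + 1;
        }
        else
        {
            hi = mid;
        }
    }
    if(lo == count || get32(file, 8 + lo * 4) != number)
    {
        return false;
    }
    std::size_t const offsets(8 + count * 4);
    std::size_t const start(get32(file, offsets + lo * 4));
    // the last string runs to the end of the file
    std::size_t const end(lo + 1 < count ? get32(file, offsets + (lo + 1) * 4) : file.size());
    if(start > end || end > file.size())
    {
        return false;
    }
    result.assign(file.begin() + start, file.begin() + end);
    return true;
}
```

Because the index table holds the message numbers themselves, numbers such as 123 and 1000 can sit side by side without reserving slots for the unused values in between.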

Solution 2 (XML)

Since we are using XML for all our output templates (well, all [X]HTML output at least), we can look into using XML for the translations. OASIS has developed an XML format called XLIFF which is used for that purpose.

There are tools on different platforms that can be used to translate sentences found in XLIFF files. For example, Pootle/Virtaal or WordForge.

We may also look into tools such as XLIFF Round Trip, which allows XML files to be transformed into XLIFF files that can be translated and then converted back to the original XML format, now translated. This would mean we'd put all the text content in XML files and not in C++.

That XML can also be converted to a Qt Linguist compatible file using an XSLT file. That way we can make use of it in our code with the QObject::tr() function.

What we probably want is to have all the translation work done online, directly in Snap!, rather than with external tools. That way (1) we can be sure to remain compatible with our own format; and (2) we do not force anyone to download a tool they have to learn on top of Snap! to do their translations (plus, a desktop tool may well not work right on everyone's computer).

Content Translation

The Content feature includes a revision table with a language parameter, which records the language each specific revision is written in.

The idea is pretty simple and is detailed in the Content Concept section.

1. An example of two sentences that can be translated to Russian when in a printf(), but not when using std::cout:

printf("Read %d files\n", total);
printf("New data were found in %d files\n", found);

Read %d files => Прочитано %d файлов
New data were found in %d files => Новые данные были найдены в %d файлах

std::cout << "Read " << total << " files\n";
std::cout << "New data were found in " << found << " files\n";

The C++ version has "files" at the end. However, the proper Russian translation is either файлов or файлах, as it is context sensitive.
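In C++, the sentence can be kept whole in the same way by substituting a numbered placeholder after translation. A minimal sketch; format_message is a hypothetical helper written for this note (Qt's QString::arg() offers a similar %1 style), and it only handles %1 through %9.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace %1 ... %9 in a complete (already translated) sentence with the
// corresponding argument. A '%' not followed by a usable digit is copied
// as is.
std::string format_message(std::string const & pattern, std::vector<std::string> const & args)
{
    std::string result;
    for(std::size_t i(0); i < pattern.length(); ++i)
    {
        char const c(pattern[i]);
        if(c == '%' && i + 1 < pattern.length())
        {
            char const n(pattern[i + 1]);
            if(n >= '1' && n <= '9' && static_cast<std::size_t>(n - '1') < args.size())
            {
                result += args[n - '1'];
                ++i;    // skip the digit
                continue;
            }
        }
        result += c;
    }
    return result;
}
```

The translator then receives "Read %1 files" as a single unit and can return "Прочитано %1 файлов", choosing the word form that fits that sentence.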

2. This is not really true since we can use Qt resources, although we'd want to have the data in the Cassandra database so we can update it dynamically over time.