Finding translations

Sometimes you realize that you or your developer have used the t() helper function a lot throughout your templates, snippets, and controllers. While this is good, at some point you need to find all these occurrences to provide proper translations in your language files.

The t() helper function may be used with one or two parameters, the first being a key for the translation, the second an optional string which will be inserted as a fallback if no translation is provided for the given key.

With the following command - which is in fact a combination of some common UNIX shell commands piped together - you can find all occurrences of t() and output the translations used in a convenient format. Although written and tested in a Linux bash shell, all commands can be used in a Windows environment as well if you install the required tools on your machine.

The following example assumes that your present working directory is the site folder of your installation. For readability, the one-line command is split across continuation lines at each pipe symbol (|), but it is still a single command. In case you are unfamiliar with "piping": it is a method that strings commands together so that the output of one command becomes the input of the next.
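As a minimal illustration of piping, the sketch below chains three small commands. The sample words are made up for the demonstration and have nothing to do with a real project.

```shell
# printf emits three lines, sort orders them, uniq drops adjacent duplicates.
result=$(printf 'submit\nback\nsubmit\n' | sort | uniq)
echo "$result"
```

Each command only sees the output of the command to its left, which is the same principle the long command relies on.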

The command

$ find controllers snippets templates -type f -print0 \
 | xargs -0r grep -Eoh "\s+t\([^\)]+\)" \
 | tr \" \' \
 | sed -Ee "s/^\s*t\(\s*(.+)\s*\)$/\1/g" \
 | sort \
 | uniq \
 | sed -e "/,/! s/$/,/" -e "s/\s*\.\s*\\$/$/g" \
 | sed -e "s/'//g" -e "s/\s*,\s*/: /" -e "s/\s*$//"

At first sight, this might look as weird as if an armadillo had rolled across the keyboard. Let's dig through it step by step:

find controllers snippets templates -type f -print0

The find command searches through the three subdirectories controllers, snippets, and templates of the site folder (the present working directory), finds all entries which are regular files (-type f) and prints them to the standard output (STDOUT), which is normally the terminal. -print0 is used here to make the following command robust against unusual filenames (for example, names containing spaces or newlines) by printing them with a NUL as separator instead of a newline. You can execute this command on its own, replacing -print0 with -print, to see all the files found line by line.
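To see why the NUL separator matters, here is a small sketch that runs in a throwaway directory. The directory and file names are hypothetical, and xargs -r is assumed to be available (it is a GNU extension).

```shell
# Hypothetical scratch directory standing in for the site folder.
demo=$(mktemp -d)
mkdir -p "$demo/templates" "$demo/snippets"
touch "$demo/templates/my page.php" "$demo/snippets/header.php"

# With -print0 and xargs -0, "my page.php" stays one argument despite the space,
# so exactly two file names are printed and counted.
count=$(cd "$demo" && find snippets templates -type f -print0 \
  | xargs -0r -n1 printf '%s\n' | wc -l)
echo "$count"

rm -rf "$demo"
```

Without the NUL separator, xargs would split the name at the space and hand two non-existent file names to the next command.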

As said above, the output of this first command is fed into the input of the next command, which is:

xargs -0r grep -Eoh "\s+t\([^\)]+\)"

The xargs command takes its input, which is assumed to be NUL-separated (-0), and passes the file names as arguments to the grep command; the -r option (a GNU extension) skips running grep altogether if there is no input. grep will search the content of all files and find all occurrences which match the extended (-E) regular expression pattern \s+t\([^\)]+\). This pattern matches any sequence of one or more whitespace characters (\s+), followed by t(, followed by one or more characters other than ), up to the closing ). The -o option causes grep to print only the matching parts, one per line, even if an input line contains more than one match, and -h suppresses the file name prefix.
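The pattern can be tried in isolation by feeding grep a single line on its standard input. The template line below is invented for the example, and GNU grep is assumed because \s is a GNU extension.

```shell
# Hypothetical template line containing two t() calls.
line="<h1><?= t('related.pages', 'Related pages') ?></h1> <?= t(\"submit\") ?>"

# -o prints every match on its own line; -h only matters when several files are searched.
matches=$(printf '%s\n' "$line" | grep -Eoh "\s+t\([^\)]+\)")
echo "$matches"
```

Both calls are reported separately although they sit on the same input line, each still carrying the leading whitespace that \s+ matched.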

The next command, tr \" \', will translate every " character into a ' character so that only one kind of quotation mark is left to deal with.
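A quick way to check this step on its own, using a made-up call as input:

```shell
# Every double quote becomes a single quote; all other characters pass through unchanged.
converted=$(printf '%s\n' 't("submit", "Submit")' | tr \" \')
echo "$converted"
```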

Next, we want to get rid of the surrounding t(), which is a good task for the stream editor sed:

sed -Ee "s/^\s*t\(\s*(.+)\s*\)$/\1/g"

By using the \s* construct multiple times, we will also get rid of some leading and trailing whitespace characters.
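The effect can be sketched on one sample line. The input is hypothetical, the expression uses \s* directly after the opening parenthesis, and GNU sed is assumed.

```shell
# Leading whitespace, the t( prefix, the space after it, and the final ) are removed;
# the capture group (.+) keeps everything in between.
stripped=$(printf '%s\n' "  t( 'back.to.overview', 'Back to overview')" \
  | sed -Ee "s/^\s*t\(\s*(.+)\s*\)$/\1/")
echo "$stripped"
```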

After that, the sort and uniq commands will first sort the output and then get rid of adjacent matching lines, because we might have used some translations multiple times.

Let's continue:

sed -e "/,/! s/$/,/" -e "s/\s*\.\s*\\$/$/g"

We might have used the t() helper with or without the second parameter. In the latter case, there is no separation character , in our function call and therefore none in our remaining sequence of characters. This will be fixed by the first regular expression substitution "/,/! s/$/,/", which will append a , at the end of every line that does not contain this character (/,/!). The second regular expression substitution takes into account that we might have used variables ($var) in our translation key, and will get rid of the string concatenation operator (.), regardless of whether it is surrounded by any amount of whitespace characters, but only if it is followed by the $ character. The $ character will be kept.

The command will not replace the variable with real values. You will have to fix this yourself.
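Both substitutions can be observed on two sample lines shaped like the output of the previous steps. The keys are only illustrative, and GNU sed is assumed.

```shell
# Two hypothetical lines: a key without a fallback, and a key built with a variable.
input="'in'
'articles.filtered.' . \$type, 'Articles filtered by'"

# 1st expression: append a comma to lines that contain none.
# 2nd expression: drop the concatenation dot (and surrounding spaces) in front of a $.
fixed=$(printf '%s\n' "$input" | sed -e "/,/! s/$/,/" -e "s/\s*\.\s*\\$/$/g")
echo "$fixed"
```

The dots inside the quoted key are left alone because they are not followed by a $ character.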

We are almost done! What's left is a nice output. With

sed -e "s/'//g" -e "s/\s*,\s*/: /" -e "s/\s*$//"

we get rid of the quote characters and replace the , by : which will result in a format suitable for the translations in YAML files. If you are using

sed -e "s/\s*,\s*/ => /" -e "s/\s*$/,/"

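instead, the output format will fit into PHP files as array elements; the quote characters are kept here because PHP needs them. Note that a key without a fallback ends up with an empty value, which you need to fill in before the PHP file is valid. In both cases, however, we get rid of whitespace characters at the end of the string.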
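The difference between the two variants is easiest to see side by side. The sketch below runs both on one illustrative line and assumes GNU sed.

```shell
# Hypothetical line as it arrives at the final formatting step.
line="'back.to.overview', 'Back to overview'"

# YAML variant: strip quotes, turn the first comma into ": ", trim trailing whitespace.
yaml=$(printf '%s\n' "$line" | sed -e "s/'//g" -e "s/\s*,\s*/: /" -e "s/\s*$//")

# PHP variant: keep quotes, turn the first comma into " => ", end the line with a comma.
php=$(printf '%s\n' "$line" | sed -e "s/\s*,\s*/ => /" -e "s/\s*$/,/")

echo "$yaml"
echo "$php"
```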

To redirect the output into a file, use > en.yml or > en.php respectively.

Let's have a look at the output. If I use the complete command sequence from the first code block, the output in a current project looks like this:

articles.filtered.$type: Articles filtered by
back.to.overview: Back to overview
external.links: External links
in:
keep.reading: Keep reading
message.success: Your message has been sent - thank you!
published.in: Category
related.pages: Related pages
similar.pages: Similar pages
submit: Submit
tagged.with: Tags
valid.category: Please provide a correct category
valid.email: Please enter a valid email address
valid.name: Please enter a valid name
valid.text: Please enter a text between 10 and 3000 characters

Remember, these are the keys used and their fallback values, if any. As you can see in the first line, we have a variable in a translation key which still needs our attention and replacement(s). And in line 4, there is no fallback value for the key in, which we probably want to provide.

In order to provide the translations in different languages, we just need to copy this file now and add or replace the English phrases.
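If you would like to try the whole pipeline before running it against a real project, the following sketch builds a throwaway site folder and runs the YAML variant on it. The directory, the template file, and its content are hypothetical, and GNU versions of find, xargs, grep, and sed are assumed.

```shell
# Build a scratch "site" folder with one template that uses t() three times.
demo=$(mktemp -d)
mkdir -p "$demo/controllers" "$demo/snippets" "$demo/templates"
cat > "$demo/templates/default.php" <<'EOF'
<h2><?= t('related.pages', 'Related pages') ?></h2>
<span><?= t("in") ?></span>
<h2><?= t('related.pages', 'Related pages') ?></h2>
EOF

# Run the full pipeline from inside the scratch folder.
output=$(cd "$demo" && find controllers snippets templates -type f -print0 \
 | xargs -0r grep -Eoh "\s+t\([^\)]+\)" \
 | tr \" \' \
 | sed -Ee "s/^\s*t\(\s*(.+)\s*\)$/\1/g" \
 | sort \
 | uniq \
 | sed -e "/,/! s/$/,/" -e "s/\s*\.\s*\\$/$/g" \
 | sed -e "s/'//g" -e "s/\s*,\s*/: /" -e "s/\s*$//")
echo "$output"

# Clean up the scratch folder again.
rm -rf "$demo"
```

The duplicate call is reported only once, and the key in appears without a value because the sample template gives it no fallback.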