My favorite language (code)

When internationalizing a website, you run into a problem: you don’t know which strings you’ve pulled out for localization and which you’ve missed.

For these tasks, my favorite language is “zxx,” the ISO 639-2 code for “no linguistic content.” I use that code for a fake locale in which every string is replaced with X’s. Now any strings (or images) I missed are immediately evident.

Meetme Play — Tagged (in zxx_XX)

Keeping the zxx localization file up to date is easy…

// {{{ l10n_upgrader($pofile,$callback)
/**
 * Allows you to upgrade a PO file automatically.
 *
 * Search for all "" (two double quotes) and translate these.
 *
 * @param string $pofile the messages.po file to upgrade
 * @param string $callback the callback function that does the string lookup
 * @return void
 */
function l10n_upgrader($pofile,$callback='')
{
    if ($callback) {
        $data = file_get_contents($pofile);
        $matches = array();
        $data = preg_replace_callback("!^msgid \"(.+?)\"\nmsgstr \"\"!m", $callback, $data); // if I add s it matches too aggressively
        //unlink($pofile);
        file_put_contents($pofile,$data);
    }
    // generate output
    //msgfmt message.po
    $exec = sprintf(
        'cd %s;msgfmt %s',
        escapeshellarg(dirname($pofile)),
        escapeshellarg('messages.po')
    );
    passthru($exec); // runs msgfmt via the shell
}
// }}}
// {{{ l10n_ZXXer($matches)
/**
 * Add a bunch of XXXX's to a string
 *
 * @param array $matches
 * @return string
 */
function l10n_ZXXer($matches)
{
    return sprintf(
        "msgid \"%s\"\nmsgstr \"%s\"",
        $matches[1],
        str_pad('',strlen($matches[1]),'XXXX ')
    );
}
// }}}
l10n_upgrader($gt_dird.'zxx_XX/LC_MESSAGES/messages.po', 'l10n_ZXXer');

You may have some trouble getting things working, so remember that gettext relies on the locales installed on your system to set the language. If your system doesn’t have zxx installed, you can’t just flip your locale over and expect it to work. As a quick hack for development, just symlink the directory on your dev machine over to a language you aren’t using. I’m currently using “zu_ZA,” which will be okay until Tagged localizes for Zulu.
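
One way to tell whether a fake locale is actually available is to ask setlocale(), which returns false when none of the requested locales is installed. This is only a minimal sketch: l10n_activate_locale() is a hypothetical helper that is not part of the code above, and the candidate names are just examples.

```php
<?php
/**
 * Try each locale name in order and activate the first one the system knows.
 * Hypothetical helper (an assumption, not from the original code).
 *
 * @param array $candidates locale names, most preferred first
 * @return string|false the locale that was activated, or false if none exists
 */
function l10n_activate_locale(array $candidates)
{
    // setlocale() accepts an array of names, tries them in order, and
    // returns false when none of them is installed on this machine.
    return setlocale(LC_ALL, $candidates);
}

// Example: prefer a real zxx locale, fall back to the symlinked stand-in.
$active = l10n_activate_locale(array('zxx_XX.UTF-8', 'zxx_XX', 'zu_ZA.UTF-8', 'zu_ZA'));
if ($active === false) {
    error_log('No fake locale is installed, so the XX strings will probably not show up.');
}
```

If this logs the warning on your dev box, the locale itself is missing and no amount of fiddling with the .po file will help.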

Also remember that your web server caches gettext strings, so be sure to shut down the web server to free the file lock before updating your .po file and recompiling it with msgfmt.

Improving your linguisticness

l10n_ZXXer() above is not that clever. It won’t handle the cases where you have nested sprintf() substitutions. But you can imagine improving it. Here are some ideas of different fake languages:

  • Multiply strlen() computation by 1.2 in order to handle spacing issues for localizing to German.
  • Replace many of the letters with letters from the Cyrillic alphabet. This way you know Unicode is correctly supported. Don’t replace all of them, so your pattern replace can be easier and strings don’t get smashed.
    Tagged in fake Russian
  • Replace the code to run it through a filter so it’s linguistically recognizable. Maybe turn your site into l33t-speak. Personally, I used Ebonics from my Plaxo days. Careful though: while it is amusing to see your website littered with argot, fo’ shizzle, I was apparently committing a lot of cuss words into the code base. It’s times like that you’re grateful for the symlink trick above preventing your users from stumbling on these fake languages.
    Tagged Meetme in Ebonics
  • Replace the strings with a zombie filter. RRRghgghghh! Brainz! Plus it’d be nice when you’re the only usable website after the zombie-pocalypse.
  • Replace the strings with a Pirate filter. Besides not taking sides in the age-old dilemma, you’ll have something ready to ship for the moment the CEO becomes the last person in the company to figure out that there is an International Talk Like A Pirate Day.
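
A few of these ideas can be sketched as small string filters that a callback like l10n_ZXXer() could call. Treat this as a minimal sketch under assumptions: the function names, the placeholder pattern, the letter mapping, and the '~' padding character are all hypothetical choices, not part of the original code. The first filter shows one way to leave sprintf() placeholders and backslash escapes untouched; it X's out characters one by one instead of using the str_pad() approach above.

```php
<?php
/**
 * X out letters and digits but keep printf-style placeholders (%s, %1$d, %%)
 * and backslash escapes (such as \n) intact. The pattern is an assumption
 * about which conversions appear in your msgids.
 */
function l10n_zxx_keep_placeholders($text)
{
    $pattern = '/(%(?:\d+\$)?[-+0-9.]*[a-zA-Z%]|\\\\.)/';
    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold literal text and
    // odd indexes hold the captured placeholders or escapes.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1) ? $part : preg_replace('/[A-Za-z0-9]/', 'X', $part);
    }
    return $out;
}

/**
 * Swap a few Latin letters for Cyrillic ones (save this file as UTF-8).
 * The map deliberately skips s, d, n and t so that common placeholders
 * (%s, %d) and escapes (\n, \t) survive unchanged.
 */
function l10n_fake_cyrillic($text)
{
    $map = array('P' => 'П', 'R' => 'Я', 'N' => 'И', 'y' => 'ч', 'w' => 'ш', 'k' => 'к');
    return strtr($text, $map);
}

/**
 * Pad a string to roughly 1.2 times its length to mimic longer German text.
 * strlen() counts bytes, which is fine for plain ASCII msgids.
 */
function l10n_pad_for_german($text, $factor = 1.2)
{
    $target = (int) round(strlen($text) * $factor);
    return str_pad($text, $target, '~');
}
```

To try one of these, wrap it in a callback shaped like l10n_ZXXer() (return the msgid line plus a msgstr line built from the filtered $matches[1]) and pass that callback's name to l10n_upgrader().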

Aye! Прeттч soon ARRZn’HG Brains! XXXXXX wiff da bomb o’ us all ye damn hood ratz. 🙂

5 thoughts on “My favorite language (code)”

  1. Jordi,

    Nice. It’s sort of like the fake Russian example above but for every character. I found that X’s are much easier to spot (for instance, you immediately see that gender, location and “yes” “no” weren’t localized above), but you might have to use memory to interact with the U.I.

    With websites it’s easier because you have an “out”; with an entire operating system… not so much. So I can totally understand where the Windows team is coming from.

  2. That made me think, because I was trying the X’s in a Flash project earlier, that you could just empty everything so any text you see isn’t localized. Of course in the web context if you can’t click on a link it can be inconvenient, but in Flash there is usually the underlying button that remains usable.

  3. Apparently it’s pretty trivial to add a locale:

    I added ebo_US like this:
    $ sudo localedef -f UTF-8 -i /usr/share/i18n/locales/en_US ebo_US
    and added zxx_XX by changing just the last argument:
    $ sudo localedef -f UTF-8 -i /usr/share/i18n/locales/en_US zxx_XX

    the -i is the input file which defines the locale, and -f is the charset, so basically this makes a copy of en_US.UTF-8

    Why do this? Well some of our people have their checkouts in Windows and the symbolic link doesn’t work on NTFS and subversion.
