Spent a good chunk of yesterday at OLPC re-teaching myself how to program in Python while making a library translator. I'm beginning to learn about internationalization; I spent far too long reading about gettext, and now I know why there are so many underscores inside printfs (print statements in the C language).
The short answer is that they allow substitution of strings within programs. For instance, _("Hello!") turns into "Hello!" when you tell your software to run in English, and "Ba'ax ka wa'alik!" when you tell it to run in Mayan. When it's called, gettext performs a lookup-substitution in the .po files you give it (compiled into binary .mo catalogs), which are basically lists of the phrases you've got inside your program, written in a bunch of different languages.
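Here's a minimal sketch of that lookup-substitution idea in Python. It leans on the standard library's gettext.NullTranslations class, but the DictTranslations subclass, the CATALOGUES table and the translator_for helper are all made up for illustration: a real setup would load compiled .mo files with gettext.translation() instead of hard-coding a dictionary.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled catalogue: phrases live in a dict."""

    def __init__(self, phrases):
        super().__init__()
        self.phrases = phrases

    def gettext(self, message):
        # Fall back to the original string when no translation exists.
        return self.phrases.get(message, message)


# Assumed language keys; a real project would use locale codes and .mo files.
CATALOGUES = {
    "english": DictTranslations({}),
    "mayan": DictTranslations({"Hello!": "Ba'ax ka wa'alik!"}),
}


def translator_for(language):
    """Return the lookup function conventionally bound to the name `_`."""
    return CATALOGUES.get(language, gettext.NullTranslations()).gettext


_ = translator_for("mayan")
print(_("Hello!"))

_ = translator_for("english")
print(_("Hello!"))
```

The underscore is just an ordinary function name, which is why it ends up wrapped around every user-visible string.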
.po files are apparently very widely used because they're so simple (they're literally just lists of translations of phrases in your program), but they break down when you try to translate larger bodies of information because there's no way to build structure into them. Every message has to have a unique ID. If you're building something on the scale of, say, the OLPC library, that's thousands and thousands of message IDs floating around with no way to categorize them. Ouch. Okay, you could specify the format of the msgid and parse it to create a structural hierarchy, but... really, there must be a better way. Any ideas?
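For what it's worth, the "specify the format of the msgid and parse it" workaround could look something like the sketch below. The slash-separated msgid convention and the build_tree helper are assumptions made up for this example; gettext itself treats a msgid as an opaque string and knows nothing about hierarchy.

```python
def build_tree(messages, sep="/"):
    """Turn flat {msgid: text} pairs into nested dicts, splitting msgids on `sep`."""
    tree = {}
    for msgid, text in messages.items():
        *branches, leaf = msgid.split(sep)
        node = tree
        for branch in branches:
            # Create intermediate levels on demand.
            node = node.setdefault(branch, {})
        node[leaf] = text
    return tree


# Hypothetical msgids following a "section/page/field" convention.
flat = {
    "science/biology/title": "Biology",
    "science/biology/intro": "The study of living things.",
    "science/physics/title": "Physics",
}

library = build_tree(flat)
print(sorted(library["science"]))
print(library["science"]["biology"]["title"])
```

It works, but the structure lives entirely in a naming convention that nothing enforces, which is the part that feels fragile.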
As part of translating the Biology page (read: while putzing around before writing the actual code that would translate the Biology page) I tried experimenting with XML to replace .po files, and it seems to work. Kent Quirk pointed out that Python dictionaries would do the same thing (and they're what I'm using now). The benefit of XML is that you can enforce template compliance with a DTD to make translations less random and spotty (like, you can't put in three different translations for the title in Spanish, you have to pick one). It also fails more gracefully in case someone messes up while hand-editing... and translators are going to be hand-editing these files.
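A rough sketch of the "one title per language" rule, using Python's xml.etree.ElementTree. The element and attribute names (page, title, lang) and the load_titles helper are invented for illustration and are not the format I actually used. Note that the standard-library parser doesn't validate against a DTD, so the check is done by hand here; a validating parser would be needed to get that enforcement from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation file for one library page.
PAGE_XML = """
<page id="biology">
  <title lang="en">Biology</title>
  <title lang="es">Biología</title>
</page>
""".strip()


def load_titles(xml_text):
    """Return {language: title}, rejecting duplicate titles for a language."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    titles = {}
    for element in root.findall("title"):
        lang = element.get("lang")
        if lang in titles:
            raise ValueError(f"more than one title for language {lang!r}")
        titles[lang] = element.text
    return titles


print(load_titles(PAGE_XML))
```

The result is an ordinary Python dictionary, so the two approaches aren't really in competition: the XML is the hand-editable file format, and the dictionary is what the program works with after loading it.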
There's also Wikipedia's approach to multilingual coordination, which (as best I can tell) is "put lots of links to variants in different languages and let people figure it out and hand-link it themselves." Weirdly enough, each language's wiki is completely separate from the rest, meaning I'd have to create separate accounts to edit the Chinese and English Wikipedias. I see the rationale behind the split for content-organizing reasons, but the inability to merge different accounts must be annoying for frequent translators there.
But yes. I learned a ton, I had a lot of fun, I started flaking the rust off my high-level programming fingers again (resulting in painfully slow coding yesterday, but it's getting better) and I'm going back. Hopefully over time my helpfulness-output will start outweighing my asks-stupid-questions-that-take-time-to-answer input. Hurrah, laptops!
11 comments:
Okay, you could specify the format of the msgid and parse it to create a structural hierarchy, but... really, there must be a better way. Any ideas?
If you *were* going to structure the msgid, you might want to use quad trees. Which are really cool.
Cool! I've seen quadtrees (or quadtree-like objects) used in collision detection algorithms before, but had no idea they had a name. It would go a long way towards keeping the hierarchy from getting unbalanced.
I wonder whether any taxonomy of human knowledge can remain naturally balanced, though. Seems like information growth is a weirdly unpredictable organic thing that defies attempts to put it in boxes. Wikipedia is messy, but it works (and I think the messiness is a big part of why it works, but that's a different topic).