Thursday, December 13, 2007

Runet and the Plight of the Transnational Hipster

Over the summer, while I was immersed / (submerged) in Russian at Middlebury, I did a project about the Russian internet. This was a great project, because it basically consisted of me playing around on the internet and then writing about it.


Anyway! So there I was, hanging out on Runet, when I noticed something. The main defining feature of "Runet" is that it's written in Russian. That's it. That's the only thing required to gain admission to this nebulous and somewhat imaginary "neighborhood" of the Internet. But the Russian that's written actually comes in two flavors: Russian in Cyrillic characters (the word "Runet" in Cyrillic looks like "Рунет") and Russian in Latin characters.

Okay, you say. Not a problem. As long as Russians can read it, more power to them, right? In fact, Russian transliterated into the alphabet that English-speakers know so well seems almost cosmopolitan. Like, "sure, I learned English, but I prefer to use a hip amalgamation of my native tongue and your funny little letters which happen to be on all keyboards ever."

Except, it's not just the Russians who have to read these super-hip interlingual amalgamations. Search engines do, too. And search engines aren't exactly transnational hipsters.

When I was writing my own transliterated Russian this summer, actually, I did notice that Gmail would pick up on the Russian even when it wasn't in Cyrillic, and feed me back ads in Cyrillic Russian. However, this could be accounted for by the fact that some human knowledge goes into determining AdWords. Russian advertisers, especially in the States, are definitely aware of this interlingual phenomenon, and probably bid on AdWords written in both alphabets.

But purely algorithmic search engines run into difficulties. They're textual creatures, after all, and the trouble only gets worse when so much of the transliteration is informal and purely phonetic, i.e. not systematized. When half of Runet is written in one alphabet and half in the other, how do you search both simultaneously?
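To make the problem concrete, here is a minimal sketch of one naive answer: fold every Cyrillic letter to a rough Latin spelling before indexing, so that both alphabets land on the same key. The table `CYR_TO_LAT` and the functions `fold`, `build_index`, and `search` are hypothetical names invented for this illustration, and the letter choices follow just one of many possible conventions. It is not how any real search engine is known to work.

```python
# Hypothetical folding table (an assumption for illustration):
# each Cyrillic letter maps to one rough Latin spelling.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def fold(text):
    """Lowercase the text and replace Cyrillic letters with Latin ones."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def build_index(docs):
    """Map each folded word to the set of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index.setdefault(fold(word), set()).add(doc_id)
    return index


def search(index, query):
    """Look up a one-word query in either alphabet."""
    return sorted(index.get(fold(query), set()))


docs = ["Рунет это интернет", "runet eto internet", "hello world"]
index = build_index(docs)
print(search(index, "Рунет"))
print(search(index, "internet"))
```

This only works when the writer happens to use the same spelling convention as the table. Someone who types "shch" where the table says "sch" falls straight through, which is exactly the problem with informal, phonetic transliteration.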

I never did solve that conundrum this summer. But I was reminded of it today by this article over at Google Blogoscoped about Yamli, a "transliterator" for Arabic-language web pages, which will transliterate Arabic written in English letters to proper Arabic script, and then feed that transliteration into a search engine. I agree with Google Blogoscoped that we're still pretty far from the ideal: a smart search engine that would understand all of these nuances and search in both the original letters and the transliterated ones at the same time. I don't know if that's feasible, but I do know that the tenuous identity of Runet will be fragmented further if search engines, magical portals that they are, don't learn to recognize the unity of languages that just happen to be written in a couple of alphabets.
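For the curious, the transliterate-then-search idea can be sketched for Russian in a few lines. This is a toy under loud assumptions: the table `LAT_TO_CYR` and the functions `to_cyrillic` and `expand_query` are hypothetical names made up here, the mapping is one arbitrary convention, and none of it is Yamli's actual method, which I don't know. It converts a Latin-letter query to Cyrillic by greedy longest match and returns both spellings, so a search could be run on each.

```python
# Hypothetical Latin -> Cyrillic table (an assumption, one convention of many).
# Multi-letter spellings are listed so the matcher can prefer them.
LAT_TO_CYR = {
    "sch": "щ", "zh": "ж", "ch": "ч", "sh": "ш", "ts": "ц", "yu": "ю", "ya": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о", "p": "п",
    "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "h": "х", "y": "ы",
}


def to_cyrillic(text):
    """Greedy longest-match conversion; unknown characters pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in LAT_TO_CYR:
                out.append(LAT_TO_CYR[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts here: keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


def expand_query(query):
    """Return the query as typed plus its Cyrillic form, without duplicates."""
    variants = [query.lower()]
    cyrillic = to_cyrillic(query)
    if cyrillic not in variants:
        variants.append(cyrillic)
    return variants


print(expand_query("Runet"))
print(expand_query("horosho"))
```

A single fixed table can only guess one Cyrillic spelling per query. Doing this well would mean generating several candidates and ranking them, which is presumably where the hard part lives.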