Google Translation via http Module

I wanted to use the Google translation service in an application that manages English and non-English UI messages, so I experimented a bit.

The code below is still rough and has some flaws:

  • any format specifier in the text to be translated confuses the translation service, so in my application I replace each format specifier (e.g. "%ld") with an uppercase token and map the tokens back to the original specifiers after the translation
  • the translation may contain HTML entities, so I use the tcllib package htmlparse to replace them with the characters they stand for; this adds a dependency on tcllib's htmlparse
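The token mapping from the first point is not part of the code below. Here is a minimal sketch of how it could look. The procs protectFormats and restoreFormats and the token scheme XFMT0X, XFMT1X, ... are hypothetical and not the original author's code. The pattern covers the common printf-style specifiers (%s, %ld, %5.2f, %%, ...), but not positional ones like %1$s.

```tcl
# Hypothetical helpers sketching the "protect format specifiers" idea.
# The token names (XFMT0X, XFMT1X, ...) are an assumption of this sketch.

# Replaces every printf-style specifier with an uppercase token.
# Returns a two-element list: the protected text and the mapping
# needed to restore the original specifiers afterwards.
proc protectFormats {text} {
    # %% or: flags, width, precision, size modifier, conversion character.
    # The space flag is left out on purpose, so "100% sure" is not touched.
    set pattern {%(?:%|[-+#0]*[0-9]*(?:\.[0-9]+)?(?:ll|l|h)?[diouxXcsfeEgG])}
    set forward {}
    set restore {}
    set i 0
    # Each distinct specifier gets its own token; repeated ones share it.
    foreach spec [lsort -unique [regexp -all -inline -- $pattern $text]] {
        set token "XFMT${i}X"
        lappend forward $spec $token
        lappend restore $token $spec
        incr i
    }
    return [list [string map $forward $text] $restore]
}

# Puts the original specifiers back into the translated text.
proc restoreFormats {translation restore} {
    return [string map $restore $translation]
}
```

A possible usage, still assuming these hypothetical names: `lassign [protectFormats $text] protected restore`, then `restoreFormats [googleTranslation::translate $protected] $restore`. Whether Google really leaves such tokens untouched in every language would have to be checked.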

Please see the differences in the example translations returned by Google!

Have fun,

Martin male

 package require htmlparse;
 package require http;
 namespace eval ::googleTranslation {
     variable postUrl http://translate.google.com/translate_t?langpair=en%7Cde;

     http::config -useragent {Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12};

     # -progress callback of http::geturl: prints the number of bytes received
     # so far and, if the total size is known, the percentage
     proc progress {token total count} {
         if {$total == 0} {
             set output  [format \
                 {-> Received from Google: %ld Bytes} \
                 $count \
             ];
         } else {
             set output  [format \
                 {-> Received from Google: %.1f %% (%ld Bytes)} \
                 [expr {double($count) / $total * 100}] \
                 $count \
             ];
         }

         puts stdout $output;
         flush stdout;

         return;
     }

     proc translate {text {source en} {destination de}} {
         variable postUrl;

         set query   [http::formatQuery \
             hl          en \
             ie          UTF8 \
             text        $text \
             langpair    $source|$destination \
         ];

         # -query makes http::geturl send the form data as a POST request
         if {[catch {
             set post [http::geturl \
                 $postUrl \
                 -query      $query \
                 -progress   ::googleTranslation::progress \
             ];
         } reason]} {
             error "couldn't translate a text via Google: $reason";
         }

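         # extract the translation from the result page; this depends on
         # the HTML markup Google used at the time of writing
         set success [regexp -- \
             {<div id=result_box dir=ltr>([^<]+)<} \
             [http::data $post] \
             whole translation \
         ];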

         http::cleanup $post;

         if {$success == 0} {
             error "couldn't translate a text via Google: error returned from Google's translation service";
         }

         return [htmlparse::mapEscapes $translation];
     }
 }

Here are some examples:

 % googleTranslation::translate "This is only a test!"
 -> Received from Google: 2668 Bytes
 -> Received from Google: 4098 Bytes
 -> Received from Google: 5528 Bytes
 -> Received from Google: 6958 Bytes
 -> Received from Google: 9100 Bytes
 -> Received from Google: 9100 Bytes
 Dies ist nur ein Test!

 % googleTranslation::translate "The Alarm '%s (%ld)' occurred"
 -> Received from Google: 2668 Bytes
 -> Received from Google: 4098 Bytes
 -> Received from Google: 5528 Bytes
 -> Received from Google: 6958 Bytes
 -> Received from Google: 9260 Bytes
 -> Received from Google: 9260 Bytes
 The Alarm "% s, (%" SVN_REVNUM_T_FMT) 'aufgetreten

 % googleTranslation::translate "The Alarm %s occurred"
 -> Received from Google: 2668 Bytes
 -> Received from Google: 4098 Bytes
 -> Received from Google: 5528 Bytes
 -> Received from Google: 6958 Bytes
 -> Received from Google: 8388 Bytes
 -> Received from Google: 9108 Bytes
 -> Received from Google: 9108 Bytes
 The Alarm% n aufgetreten

 % googleTranslation::translate "The Alarm occurred"
 -> Received from Google: 2668 Bytes
 -> Received from Google: 4098 Bytes
 -> Received from Google: 5528 Bytes
 -> Received from Google: 6958 Bytes
 -> Received from Google: 9093 Bytes
 -> Received from Google: 9093 Bytes
 The Alarm aufgetreten

 % googleTranslation::translate "the alarm occurred"
 -> Received from Google: 2668 Bytes
 -> Received from Google: 4098 Bytes
 -> Received from Google: 5528 Bytes
 -> Received from Google: 6958 Bytes
 -> Received from Google: 9093 Bytes
 -> Received from Google: 9093 Bytes
 Der Alarm aufgetreten

male - 2007-11-09:

Now that I've run a lot of translations, Google seems to have identified me as an "automated" user and redirects every request to a special page where a graphical code (CAPTCHA) has to be entered to get the translation. So this example is of limited use until Google releases its translation API, which is currently only available via the Google University Research Program for Google Translate [L1 ].
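As far as I know, the http package of Tcl 8.x does not follow redirects by itself. Such a redirect therefore presumably surfaces in the code above only as the generic "error returned from Google's translation service" raised when the regexp fails. A minimal sketch for reporting it more clearly: the proc checkReply is hypothetical. It takes the values returned by http::ncode and http::meta, so it can be tried out without network access.

```tcl
# Hypothetical helper: decides whether an HTTP reply is usable or whether
# the service redirected the request elsewhere (e.g. to a CAPTCHA page).
# ncode: numeric status code, as returned by http::ncode
# meta:  header name/value list, as returned by http::meta
proc checkReply {ncode meta} {
    if {$ncode == 200} {
        return ok
    }
    if {$ncode >= 300 && $ncode < 400} {
        # Look up the redirect target; header names are case-insensitive.
        set location ""
        foreach {name value} $meta {
            if {[string equal -nocase $name Location]} {
                set location $value
            }
        }
        error "request was redirected ($ncode) to \"$location\""
    }
    error "unexpected HTTP status $ncode"
}
```

In translate, it could be called as `checkReply [http::ncode $post] [http::meta $post]` right after http::geturl, before the regexp. In that case, http::cleanup should also run when it raises an error.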

Does anyone else have an idea how to translate via the Web, using http?


jokerozen - 2010-03-29 13:30:38

I'm trying to do this too, but now with the AJAX API. Your link about the Google University Research Program scares me a little: if their limit is 1000 translations a day, what is the limit for the API? I haven't found anything about that. I have to check 4000 titles and article abstracts and translate those that are not in English, and I don't want to do it manually.

Let's test it anyway; to start with, I'll wait a few seconds between calls.
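Waiting between calls could be sketched like this. The proc translateAll and its default delay of 3000 ms are assumptions, not limits published by Google. The translation command is passed in as a command prefix, e.g. ::googleTranslation::translate, so the loop can be exercised with a stub. One failed text is recorded rather than aborting the whole batch.

```tcl
# Hypothetical sketch: translates a list of texts one at a time and pauses
# between requests. translateCmd is a command prefix that takes a text and
# returns its translation (or raises an error).
# Returns a list of {status text result} triples, status being ok or error.
proc translateAll {texts translateCmd {delayMs 3000}} {
    set results {}
    set first 1
    foreach text $texts {
        if {!$first} {
            # Blocking pause; no events are processed meanwhile.
            after $delayMs
        }
        set first 0
        if {[catch {{*}$translateCmd $text} translation]} {
            # Keep going, but remember which text failed and why.
            lappend results [list error $text $translation]
        } else {
            lappend results [list ok $text $translation]
        }
    }
    return $results
}
```

With 4000 texts and a 3 second pause, a run would take well over three hours, so the delay is a trade-off between speed and the risk of being flagged as an automated user again.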

I'll post my procs here if everything works as expected.

First: Official documentation for non-JavaScript environments