Greeklish

Richard Suchenwirth - Greeklish is a name used on the Web for Greek text written in Latin letters, in other words, a transliteration. See http://homepages.lycos.com/cast00/lypersonal/ for an example, which uses 8 for Theta and differs slightly from the encoding used here.

The following proc translates Greeklish text into the corresponding Unicode characters (cf. Unicode and UTF-8) for the Greek letters. Transliteration is mostly strict, i.e. a 1:1 mapping (that's why slight oddities like Q for Theta occur, but Q can be memorized as "a circle with something attached"). I made one exception for the accented letters, which in Greeklish are written with a trailing apostrophe, so that two Latin characters such as a' yield a single Greek letter (ά).

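 # Latin -> Greek letters: uppercase from U+0391, lowercase from U+03B1.
 # Q/q is Theta, J/j is Xi, c is the final sigma. Latin ";" becomes the
 # ano teleia (U+0387), and "?" becomes ";", the Greek question mark.
 array set i18n_a2g {
    A \u391 B \u392 G \u393 D \u394 E \u395 Z \u396 H \u397 Q \u398
    I \u399 K \u39a L \u39b M \u39c N \u39d J \u39e O \u39f P \u3a0
    R \u3a1 S \u3a3 T \u3a4 U \u3a5 F \u3a6 X \u3a7 Y \u3a8 W \u3a9
    a \u3b1 b \u3b2 g \u3b3 d \u3b4 e \u3b5 z \u3b6 h \u3b7 q \u3b8
    i \u3b9 k \u3ba l \u3bb m \u3bc n \u3bd j \u3be o \u3bf p \u3c0
    r \u3c1 c \u3c2 s \u3c3 t \u3c4 u \u3c5 f \u3c6 x \u3c7 y \u3c8 w \u3c9
    ";" \u387 ? ";"
 }

 proc greeklish {args} {
    global i18n_a2g
    # Join the arguments into one string (not the list representation,
    # which would add braces), so that both greeklish Kalhme'ra sas
    # and greeklish "Kalhme'ra sas" give the same result
    set text [join $args]
    # Accented vowels: letter + apostrophe -> precomposed letter with tonos
    set text [string map {
        A' \u386 E' \u388 H' \u389 I' \u38a O' \u38c U' \u38e W' \u38f
        a' \u3ac e' \u3ad h' \u3ae i' \u3af o' \u3cc u' \u3cd w' \u3ce
    } $text]
    # Word-final sigma: an "s" followed by whitespace, one of . , : ; ?
    # or the end of the text becomes "c" (no trailing padding needed)
    regsub -all {s(?![^\s.,:;?])} $text c text
    # Map the remaining characters in one pass; characters without an
    # entry (digits, spaces, other punctuation) pass through unchanged.
    # [string map] does not rescan its output, so "?" -> ";" is not
    # turned into an ano teleia afterwards.
    return [string map [array get i18n_a2g] $text]
 }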
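The accent exception works only if a two-character key like h' is checked before the plain letter h. Tcl's [string map] tries its keys in the order they are listed, so accent pairs and plain letters could even share a single map. The sketch below assumes a tiny subset of lowercase letters; the proc name translitSubset is hypothetical, and final sigma is deliberately left out.

```tcl
# Hypothetical sketch: a handful of lowercase Greeklish letters in one map.
# [string map] tries keys in list order at each position, so the
# two-character accented keys (a', h') are listed before the plain letters;
# at "h'" the accented eta matches first and both characters are consumed.
proc translitSubset {text} {
    # Braced list: the \uXXXX escapes are substituted when
    # [string map] parses it as a list
    set map {
        a' \u3ac h' \u3ae
        a \u3b1 h \u3b7 i \u3b9 n \u3bd q \u3b8
    }
    return [string map $map $text]
}

set word [translitSubset "qh'nai"]
# word now holds theta, eta with tonos, nu, alpha, iota
```

If the plain h were listed before h', the apostrophe would be left over in the output, which is why the order of the list matters here.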

Example: [greeklish Aqh'nai] returns Αθήναι, the Greek name of Athens.


Nov 15, 2000 RS: added conversion of word-final "s" to "c", to produce the lowercase final sigma (ς).
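The final-sigma rule can be tried in isolation. This minimal sketch treats an "s" as word-final when it is followed by whitespace, one of . , : ; ?, or the end of the text, expressed as a negative lookahead in Tcl's advanced regular expressions. The helper name finalSigma is hypothetical and not part of the original page.

```tcl
# Hypothetical helper: turn every Latin "s" that ends a word into "c",
# the Greeklish letter for final sigma. "s(?![^\s.,:;?])" matches an s
# that is NOT followed by any character outside the word-end set, which
# also covers an s at the very end of the text.
proc finalSigma {text} {
    regsub -all {s(?![^\s.,:;?])} $text c text
    return $text
}

set converted [finalSigma "sas, kaneis?"]
# converted is now "sac, kaneic?"
```

Using a lookahead avoids padding the input with a trailing space just so that an "s" at the very end is recognized.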


See also: The Lish family - Arts and crafts of Tcl-Tk programming