Updated 2012-07-09 10:48:38 by RLE

Richard Suchenwirth 2001-06-21 - A Python user from Latvia asked how to make a Tk widget display Cyrillic (Russian) characters when typed into with a normal Latin (English) keyboard.

This is related to the more general problem of so-called "input managers", software that maps keyboard input to widget output in a non-trivial way. For each platform there are existing solutions, but we don't have a generalized approach in Tk yet. So here comes my tiny "Tk input manager", or tim for short, which takes the first steps in that direction.

As required, the first application is for Cyrillic; but the principle is clear enough to add other mappings with little effort (for a mapping foo, add a proc foo and a setupfoo). One problem is that the Cyrillic alphabet contains more than the 26 [A-Z] characters of the Latin one, so a sort-of "dead key" approach had to be taken. In the sketch below, I disable the exclamation mark, but use it at the beginning of two-character sequences that produce one Cyrillic character each (see Ruslish for a discussion of this approach). To get one real exclamation mark, type an extra space after it (thanks to rmax for that tip!)
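
To try out the dead-key idea without any widgets, here is a minimal pure-Tcl sketch that applies the same kind of table to a finished string with string map. The proc name ruslish is hypothetical and not part of tim, and the table is only a small subset chosen for the demo; the real thing works on key events, not on strings.

```tcl
# Hypothetical helper (not part of tim): transliterate a Latin string
# with a small subset of the Cyrillic table.
proc ruslish {s} {
    # string map tries the keys in list order at each position, so the
    # two-character "!" sequences must come before the lone "!".
    set map {
        !Z \u0416  !C \u0427  !S \u0428  !A \u042f  !U \u042e
        !z \u0436  !c \u0447  !s \u0448  !a \u044f  !u \u044e
        "! " !   ! ""
        M \u041c  D \u0414  a \u0430  d \u0434  i \u0438  k \u043a
        n \u043d  o \u043e  s \u0441  v \u0432  z \u0437
    }
    return [string map $map $s]
}

puts [ruslish "Moskva"]   ;# prints the Cyrillic spelling of Moskva
```

Letters missing from the subset simply pass through unchanged; "! " (exclamation mark plus space) yields one real exclamation mark, and a lone "!" is swallowed, mirroring the behaviour described above.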

The widgets (text or entry - as both accept the "insert insert" method) created with the prefix tim::Russian get the Russian bindings prepended to their bindtags list; apart from that, they inherit everything from the original widgets.

A real input manager needs of course some more work. For example, allow toggling the keyboard encoding at runtime; display the current scheme, etc. Most challenging however is the extension to writing systems with a large character set: Chinese, Japanese, Korean... See taiku goes multilingual
 namespace eval tim {
     proc Russian {type w args} {makeit Russian $type $w $args}

     proc makeit {name type w argl} {
         variable know
         # set up the bindings for this mapping only once
         if {![info exists know($name)]} {setup$name}
         eval $type $w $argl
         bindtags $w [concat $name [bindtags $w]]
         set w
     }
     proc setupRussian {} {
         variable know; set know(Russian) 1
         foreach {in out} {
             ! "{}" !<space> !
             A  \u0410 B  \u0411 V  \u0412 G \u0413 D \u0414 E \u0415
             !Z \u0416 Z  \u0417 I  \u0418 J \u0419 K \u041A L \u041b
             M  \u041c N  \u041d O  \u041e P \u041f R \u0420 S \u0421
             T  \u0422 U  \u0423 F  \u0424 X \u0425 C \u0426 !C \u0427
             !S \u0428 !T \u0429 Q  \u042a Y \u042b H \u042c !E \u042d
             !U \u042e !A \u042F !O \u0401
             a  \u0430 b  \u0431 v  \u0432 g \u0433 d \u0434 e \u0435
             !z \u0436 z  \u0437 i  \u0438 j \u0439 k \u043a l \u043b
             m  \u043c n  \u043d o  \u043e p \u043f r \u0440 s \u0441
             t  \u0442 u  \u0443 f  \u0444 x \u0445 c \u0446 !c \u0447
             !s \u0448 !t \u0449 q  \u044a y \u044b h \u044c !e \u044d
             !u \u044e !a \u044f !o \u0451
         } {bind Russian $in "%W insert insert $out; break"}
     }
 }
 #------------------------------- demo and test code, usage examples
 # Hint: type in e.g. "Moskva i Leningrad - dva gorody Rossii".
 # or: "!A ne zna!u nicevo, a ne ponema!u nicevo."
 tim::Russian entry .e
 tim::Russian text .t
 eval pack [winfo children .] -fill x
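
Toggling the keyboard encoding at runtime, mentioned above as future work, could be approached by adding and removing the Russian tag in a widget's bindtags list. The following is only a sketch: toggleTag is a hypothetical helper, not part of tim, and the <F12> binding shown in the comment is an assumed usage that needs Tk and a widget created with tim::Russian.

```tcl
# Hypothetical helper: return the tag list with $tag removed if it is
# present, or prepended if it is absent.
proc toggleTag {tags tag} {
    set i [lsearch -exact $tags $tag]
    if {$i >= 0} {
        return [lreplace $tags $i $i]
    }
    return [linsert $tags 0 $tag]
}

# Assumed usage with Tk loaded (not run here):
#   bind .e <F12> {bindtags %W [toggleTag [bindtags %W] Russian]}
```

Because the helper only manipulates a list, it can be tried in a plain tclsh; the Russian bindings themselves stay defined and merely stop applying to the widget while the tag is absent.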

Dimitry Golubovsky - If you have an X Window keyboard switch set up to generate keycodes for non-Latin characters (e.g. Cyrillic), you may also use the following approach.

Put the following someplace in your script to be executed at startup:
 # Pairs of <Keysym> and Unicode character value. Add a pair for every
 # character you want to map: this is not limited to Cyrillic.
 foreach {keysym unichar} {
   <Cyrillic_yu> \u044e
   <Cyrillic_a> \u0430
   <Cyrillic_be> \u0431
   <Cyrillic_SHCHA> \u0429
   <Cyrillic_CHE> \u0427
   <Cyrillic_HARDSIGN> \u042a
 } {
   bind . $keysym \
     [concat "catch \{" \
             {[focus -displayof %W] insert insert} \
             $unichar \
             "\}" \
     ]
 }

In principle, the whole of <keysymdef.h> could be processed (perhaps manually), so that a Unicode mapping is set up for every keysym.
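
As a rough sketch of such processing: the proc below assumes header lines of the form "#define XK_name 0xNNNN /* U+XXXX description */", as found in recent X.Org versions of keysymdef.h; older headers may lack the U+XXXX comment, in which case nothing is extracted. The name keysymPairs is hypothetical. It returns a flat list of <keysym> and character pairs, ready to be fed to a foreach like the one above.

```tcl
# Hypothetical helper: extract <keysym> / Unicode character pairs from
# the text of keysymdef.h. Assumes each relevant line carries a
# "/* U+XXXX ... */" comment; lines without one are skipped.
proc keysymPairs {text} {
    set pairs {}
    set re {^#define\s+XK_(\w+)\s+0x[0-9a-fA-F]+\s+/\*\s*U\+([0-9A-Fa-f]{4})\s}
    foreach line [split $text \n] {
        if {[regexp $re $line -> name hex]} {
            # format %c turns the code point into the character itself
            lappend pairs <$name> [format %c 0x$hex]
        }
    }
    return $pairs
}

set sample "#define XK_Cyrillic_yu   0x06c0  /* U+044E CYRILLIC SMALL LETTER YU */"
puts [keysymPairs $sample]
```

Only four-digit code points are matched in this sketch; entries whose comment is written differently would still need the manual treatment mentioned above.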

The map contains pairs of symbolic event code and Unicode character value. The iterator walks over the map and binds each event to the insertion of the corresponding Unicode character at the current insertion point of the widget in focus. The insertion is wrapped in catch, so if any error occurs (for example, the widget in focus does not support "insert insert"), no error popup should appear.

Scripts bound to the root window are constructed for each event code mapped. The trick is that [focus -displayof %W] must be evaluated when the script is invoked, but $unichar must be supplied at the time of binding. Therefore concat is called to build the script from parts.
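
To see what concat actually builds, the same call can be run in a plain tclsh without Tk. This is just an illustration with one hard-coded character; the variable name script is an assumption for the demo.

```tcl
# Value known at binding time
set unichar \u044e

# The braced part is kept literally, so the [focus ...] call is not
# evaluated now; only $unichar is substituted at this point.
set script [concat "catch \{" \
                   {[focus -displayof %W] insert insert} \
                   $unichar \
                   "\}"]
puts $script
```

The resulting string is the catch command with the bracketed focus call still unevaluated inside its braces, followed by the already substituted character; Tk replaces %W and evaluates the brackets each time the event fires.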

This method could be generalized further if 'event generate' made it possible to create events that substitute %A (Unicode character value). Currently, of the substitutions relevant to this issue, only %k and %K are supported. The root of the issue seems to lie in the X locale setup, when for some reason keysyms outside Latin-1 are not translated into Unicode properly by Xlib.

To find out whether your keyboard switch is set up to produce correct key codes, use the method described in bind, topic by KBK (You can find the keysym ... ).

I switch my keyboard using xmodmap, and the beginning of my .Xmodmap file looks like this:
 !        Key   Base              Shift           Mode    Mode+Shift
 keycode  24    = q               Q               Cyrillic_shorti       Cyrillic_SHORTI
 keycode  25    = w               W               Cyrillic_tse          Cyrillic_TSE
 keycode  26    = e               E               Cyrillic_u            Cyrillic_U
 keycode  27    = r               R               Cyrillic_ka           Cyrillic_KA
 keycode  28    = t               T               Cyrillic_ie           Cyrillic_IE
 keycode  29    = y               Y               Cyrillic_en           Cyrillic_EN

[Peter Schweitzer] 2006-03-24 I'm puzzled by a line above, within proc setupRussian:
           t  \u0442 u  \u0443 f  \u0444 x \22323u044b h \u044c !e \u044d

Is "\22323" correct syntax? Also the article Ruslish seems to show the value \u0445 corresponding to "x"; should it be different here?

RS - no, seems like some chaos paste, in which a row of the table got lost. Fixed - thanks!