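proc tr:to {cmd str} {
    # Turkish pairs dotted i with İ (U+0130) and dotless ı (U+0131) with I,
    # so fix up the one letter whose default case mapping is wrong first
    switch -- $cmd {
        upper {regsub -all i $str \u0130 str}
        lower {regsub -all I $str \u0131 str}
        default {error "bad option '$cmd': must be upper or lower"}
    }
    # $cmd is now known to be upper or lower: call string toupper/tolower
    string to$cmd $str
}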
# usage examples
tr:to upper izmir ;# produces İZMİR
tr:to lower YILDIZ ;# produces yıldız

Notice how the minor command, once the switch has accepted it as valid, is pasted into the string to$cmd call.

See also Eurolish for easy input of Turkish diacritics, and The Lish family for the whole picture.

An even worse anomaly, which cannot be reversed correctly, exists in German: the lowercase Eszett/scharfes S (ß, \u00DF) corresponds to the two uppercase letters SS, but not every SS sequence may be lowercased to ß.
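Uppercasing is the easy direction. A minimal sketch, assuming the text is already known to be German, expands ß to ss before calling string toupper. The name de:toupper is hypothetical, in the spirit of tr:to. No reverse de:tolower is attempted, because the result shows why it cannot work: both Maße and Masse come out as MASSE.

```tcl
# Hypothetical helper (not part of Tcl): German-aware uppercasing.
# Expand each sharp s (U+00DF) to "ss" first, then uppercase the rest,
# so the result uses the traditional two-letter capital form SS.
proc de:toupper str {
    string toupper [string map [list \u00DF ss] $str]
}

# usage (written with \u escapes to stay independent of source encoding)
puts [de:toupper stra\u00DFe]   ;# STRASSE
puts [de:toupper Ma\u00DFe]     ;# MASSE, the same as for "Masse"
```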
Greek Sigma: There are two different lowercase forms of the Greek letter sigma, \u03C2 (used only at the end of a word) and \u03C3 (used in all other positions), but only one uppercase form, \u03A3 (the preceding code point, \u03A2, is unassigned, so software that wants to keep this distinction might 'abuse' it for an uppercase final sigma...) RS
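In the lowercase direction the distinction can be restored after the fact: lowercase everything, then turn each σ that follows a word character and is not followed by one into ς. The sketch below does this with a Tcl regular-expression lookahead; the name gr:tolower is hypothetical, and treating "not followed by \w" as "end of word" only approximates the final-sigma rule in the Unicode standard.

```tcl
# Hypothetical helper (not part of Tcl): Greek-aware lowercasing.
# string tolower maps capital sigma (U+03A3) to the medial form (U+03C3);
# afterwards, a sigma preceded by a word character and not followed by
# one is taken to end a word and is replaced by final sigma (U+03C2).
proc gr:tolower str {
    set str [string tolower $str]
    regsub -all {(\w)\u03C3(?!\w)} $str "\\1\u03C2" str
    return $str
}

# usage: ΟΔΥΣΣΕΥΣ -> οδυσσευς
puts [gr:tolower \u039F\u0394\u03A5\u03A3\u03A3\u0395\u03A5\u03A3]
```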
LV: Richard, has this special case been mentioned to Scriptics, so that they might have the routines do the right thing without programmers having to special-case things?

RS: No. The problem is that there is no general solution. Even a system localized for Turkey would be wrong to always upper- or lowercase as above when dealing with filenames: imagine how much code would break (there are files like CONFIG.SYS, which Turkish rules would lowercase to confıg.sys...). The application must know that a string is Turkish, and only then apply tr:to {upper,lower} to it.
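One way for the application to know that a string is Turkish is to make the language an explicit argument, so Turkish rules are applied only where the caller asks for them. In this sketch the name lang:tolower and the language tags tr and en are hypothetical; only Turkish is special-cased, and everything else, filenames included, falls through to plain string tolower.

```tcl
# Hypothetical helper (not part of Tcl): the caller states the language.
# Only text explicitly tagged as Turkish ("tr") gets dotless ı (U+0131)
# for capital I; all other text uses plain string tolower.
proc lang:tolower {lang str} {
    if {$lang eq "tr"} {
        set str [string map [list I \u0131] $str]
    }
    string tolower $str
}

puts [lang:tolower tr YILDIZ]       ;# yıldız
puts [lang:tolower en CONFIG.SYS]   ;# config.sys
puts [lang:tolower tr CONFIG.SYS]   ;# confıg.sys, the breakage RS warns about
```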
KBK: Case conversion is also different in Dutch, where converting 'ijssel' to titlecase results in 'IJssel' (see Things Dutch).

AM: Alas, precious little software is aware of this, one culprit being MS Word (unless you take the trouble to instruct it to do the "right thing"). At the beginning of a word, the combination "ij" is always capitalised as "IJ".
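AM's rule can be sketched for one word at a time: if a word begins with ij in any case, both letters are capitalised and the rest is lowercased; otherwise string totitle does the job. The name nl:totitle is hypothetical, and multi-word phrases are left to the caller.

```tcl
# Hypothetical helper (not part of Tcl): Dutch-aware titlecasing of one word.
# A leading "ij" digraph (matched case-insensitively) becomes "IJ";
# any other word is handled by plain string totitle.
proc nl:totitle word {
    if {[string match -nocase ij* $word]} {
        return IJ[string tolower [string range $word 2 end]]
    }
    string totitle $word
}

puts [nl:totitle ijssel]     ;# IJssel
puts [nl:totitle IJMUIDEN]   ;# IJmuiden
puts [nl:totitle utrecht]    ;# Utrecht
```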
