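proc collatesort {list map} {
    # Decorate: pair each element with its sort key, i.e. the element
    # after applying the [string map] substitutions given in $map
    set l2 {}
    foreach e $list {
        lappend l2 [list $e [string map $map $e]]
    }
    # Sort the pairs by key, then keep only the original elements
    set res {}
    foreach e [lsort -index 1 $l2] {lappend res [lindex $e 0]}
    set res
}

Testing, Portuguese:

% collatesort {ab ãc ãd ae} {ã a}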
ab ãc ãd ae

Spanish (ll sorts after lz):

% collatesort {llano luxación leche} {ll lzz}
leche luxación llano

German (umlauts sorted as if "ä" were "ae"):

% lsort {Bar Bär Bor}
Bar Bor Bär
% collatesort {Bar Bär Bor} {ä ae}
Bär Bar Bor

jima: To be precise, in what is normally known outside Spain as the Spanish language (I don't want to mess things up with Catalan or any other tongue spoken there), no letter is accented with the ` character. Our tilde (that is the term we use for the written accent mark) is ´. Therefore, it should be luxación. If my precision is annoying to anyone, please just delete it from this page.

RS: No, every correction is welcome. Fixed above - thanks!
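Back to the German example: [string map] is case-sensitive, so the map {ä ae} leaves Ä, Ö, Ü and ß untouched, and words containing them are still ordered by raw code point (Äpfel and Öl land after Ofen). The following is a sketch that extends the "ä as ae" convention used above to ö, ü, the capitals, and ß as ss; the names germanMap and words and the sample words are made up for illustration. collatesort is repeated so the snippet runs on its own, and the non-ASCII characters are written as \u escapes so the script does not depend on the source file's encoding.

```tcl
proc collatesort {list map} {
    # Decorate each element with its mapped sort key, sort by key, undecorate
    set l2 {}
    foreach e $list {
        lappend l2 [list $e [string map $map $e]]
    }
    set res {}
    foreach e [lsort -index 1 $l2] {lappend res [lindex $e 0]}
    set res
}

# Hypothetical full map: ä ae  ö oe  ü ue  Ä Ae  Ö Oe  Ü Ue  ß ss
set germanMap {\u00e4 ae \u00f6 oe \u00fc ue \u00c4 Ae \u00d6 Oe \u00dc Ue \u00df ss}

# Sample words: Öl Apfel Äpfel Ofen Maß Mast
set words {\u00d6l Apfel \u00c4pfel Ofen Ma\u00df Mast}

puts [lsort $words]                   ;# -> Apfel Mast Maß Ofen Äpfel Öl
puts [collatesort $words {\u00e4 ae}] ;# -> Apfel Mast Maß Ofen Äpfel Öl (no change)
puts [collatesort $words $germanMap]  ;# -> Äpfel Apfel Maß Mast Öl Ofen
```

With the full map, Äpfel is compared as "Aepfel" and therefore sorts before Apfel, just as Bär sorted before Bar above.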
RS: Even English data may have collation problems if they contain the ff, fi, fl, ffi, or ffl ligature characters (U+FB00 to U+FB04). Then it might help to expand each ligature back to its plain letters before sorting:
set sorted [collatesort $input {\uFB00 ff \uFB01 fi \uFB02 fl \uFB03 ffi \uFB04 ffl}]

See also custom sorting.
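A quick check of RS's ligature map, as a self-contained sketch: the word list is made up for illustration, and collatesort is repeated from the top of the page. Plain lsort puts the ligature words last, because U+FB01 and U+FB02 come after every ASCII letter; once the ligatures are expanded, the words sort among the other f words.

```tcl
proc collatesort {list map} {
    # Decorate each element with its mapped sort key, sort by key, undecorate
    set l2 {}
    foreach e $list {
        lappend l2 [list $e [string map $map $e]]
    }
    set res {}
    foreach e [lsort -index 1 $l2] {lappend res [lindex $e 0]}
    set res
}

set ligatureMap {\uFB00 ff \uFB01 fi \uFB02 fl \uFB03 ffi \uFB04 ffl}

# "ﬂat" and "ﬁsh" are spelled with the single ligature characters
set input {\uFB02at \uFB01sh fjord fig}

puts [lsort $input]                    ;# -> fig fjord ﬁsh ﬂat
puts [collatesort $input $ligatureMap] ;# -> fig ﬁsh fjord ﬂat
```

Only the sort keys are expanded; the returned list still contains the original ligature spellings.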
