    set list [split [string map [list $substring $splitchar] $string] $splitchar]
Now, what character could you choose for $splitchar? One fascinating choice is \u0080: it is part of a region of the Unicode character map that is more or less forbidden or reserved. That means it is very unlikely to be present in the original string (unless that is a binary string, of course, in which case most if not all bets are off, but splitting binary strings is a rare and dangerous thing anyway).
If you need to split on a substring that may vary (for instance a sequence of one or more empty lines), check out the [split_re] method in Tcllib.
WJP (10 August 2006) \u0080 is fairly safe but you can't be quite sure since it is a legal Unicode control character. A better choice is \uFFFE or \uFFFF. Both are guaranteed not to be characters and so are absolutely safe.
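As a minimal sketch, the idiom can be wrapped in a small proc that uses \uFFFF as the placeholder. The name splitOnSubstring is hypothetical (it is not a core or Tcllib command), and the guard is an added assumption: nothing actually prevents an input string from containing \uFFFF, so the sketch refuses such input rather than splitting it incorrectly.

```tcl
# Hypothetical helper: split $string on each occurrence of the literal
# (possibly multi-character) $substring.
proc splitOnSubstring {string substring} {
    # Placeholder suggested above; assumed to be absent from the input.
    set splitchar \uFFFF
    # Guard (an addition in this sketch): fail loudly if the assumption is wrong.
    if {[string first $splitchar $string] >= 0} {
        error "input already contains the placeholder character"
    }
    # Replace every occurrence of the substring by the single placeholder
    # character, then let the core split command do the real work.
    return [split [string map [list $substring $splitchar] $string] $splitchar]
}

puts [splitOnSubstring "alpha--beta--gamma" "--"]
# prints: alpha beta gamma
```

As with the core split, a separator at the very start or end of the input yields an empty element in the result.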
JMN 2006-11-02
My timings indicate the above method is about 10x faster than textutil::splitx. However, Tcl's split alone on a single-char separator is 4x faster again.
I'd love to see a multi-character 'split' in the core. [string split] perhaps?
Lars H, 2008-07-18: Thinking about that same idea, I believe the following syntax may be appropriate for a split extended that way:
- split text ?string list ...? ?chars?
    split {1+1-3+4-2} + plus - minus
to get
    1 plus 1 minus 3 plus 4 minus 2
or
    split {1+1-3+4-2} + {} - minus
to get
    1 1 minus 3 4 minus 2
However, one could probably use clever combinations of regexp -all, regsub, and/or string map to get this effect as well, so it's no great leap in expressive power.
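The proposed behaviour can be approximated in pure Tcl. What follows is only one reading of the two examples, not an implementation of any existing command: xsplit is a hypothetical name, the optional trailing ?chars? argument is left out, and each list argument is taken to be the list of elements inserted wherever its separator string matches. It uses a plain left-to-right scan instead of regexp or string map, and needs Tcl 8.5 or later for {*}.

```tcl
# Hypothetical sketch of the extended split discussed above.
# args holds alternating "separator string" / "list of elements to insert".
proc xsplit {text args} {
    if {[llength $args] % 2} {
        error "expected pairs: separator list ?separator list ...?"
    }
    set result {}
    set piece ""
    set i 0
    set n [string length $text]
    while {$i < $n} {
        set matched 0
        foreach {sep insert} $args {
            set len [string length $sep]
            # Does this separator occur at position i? Empty separators are skipped.
            if {$len > 0 && [string range $text $i [expr {$i + $len - 1}]] eq $sep} {
                # Close the current piece, then add the replacement elements
                # (none at all when the list is empty).
                lappend result $piece
                lappend result {*}$insert
                set piece ""
                incr i $len
                set matched 1
                break
            }
        }
        if {!$matched} {
            # Ordinary character: extend the current piece.
            append piece [string index $text $i]
            incr i
        }
    }
    lappend result $piece
    return $result
}

puts [xsplit {1+1-3+4-2} + plus - minus]
# prints: 1 plus 1 minus 3 plus 4 minus 2
puts [xsplit {1+1-3+4-2} + {} - minus]
# prints: 1 1 minus 3 4 minus 2
```

When several separators could match at the same position, this sketch takes the first one listed; a real implementation might prefer the longest match instead.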
