string match

Synopsis

string match ?-nocase? pattern string

Description

Determines whether pattern matches string, returning 1 if it does and 0 if it doesn't. If -nocase is specified, the pattern is matched against the string in a case-insensitive manner.

string equal compares strings literally, whereas string match interprets its first argument as a pattern expression and matches the string against that.
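As a minimal illustration of that difference, the same two arguments can be passed to both commands. The variable names literal and pattern are chosen here for illustration only:

```tcl
# Same two arguments, two different commands.
set literal [string equal {a*} abc]   ;# compares "a*" with "abc" character by character
set pattern [string match {a*} abc]   ;# treats "a*" as a pattern: "a" followed by anything
puts "equal: $literal, match: $pattern"
# -> equal: 0, match: 1
```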

For the two strings to match, their contents must be identical except that the following special sequences may appear in pattern:

*
Matches any sequence of characters in string, including a null string.
?
Matches any single character in string.
[chars]
Matches any character in the set given by chars. If a sequence of the form x-y appears in chars, then any character between x and y, inclusive, will match. When used with -nocase, the end points of the range are converted to lower case first. Whereas {[A-z]} matches '_' when matching case-sensitively ('_' falls between 'Z' and 'a'), with -nocase the range is treated like {[A-Za-z]} (which is probably what was meant in the first place).
\x
Matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in pattern.
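The special sequences listed above can be tried one at a time. The following is a small sketch, with every pattern braced so that Tcl itself leaves it alone; the sample strings are made up for illustration:

```tcl
# * matches any sequence of characters, including none
puts [string match {a*} apple]                       ;# -> 1
puts [string match {a*} a]                           ;# -> 1

# ? matches exactly one character
puts [string match {a?c} abc]                        ;# -> 1
puts [string match {a?c} ac]                         ;# -> 0

# [chars] matches one character from a set or range
puts [string match {[a-c]x} bx]                      ;# -> 1
puts [string match {[a-c]x} dx]                      ;# -> 0

# \x matches the character x literally
puts [string match {\*} *]                           ;# -> 1
puts [string match {\*} x]                           ;# -> 0

# -nocase ignores case, and lowers range end points first
puts [string match -nocase {HELLO*} "hello world"]   ;# -> 1
puts [string match {[A-z]} _]                        ;# -> 1
puts [string match -nocase {[A-z]} _]                ;# -> 0
```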

Beware that the parsing of strings inside grouping [] is not particularly robust -- neither the manual, the tests nor the code takes pains to specify how to interpret combinations of \[]*?- inside brackets. If you need a character class which includes any of these special characters, you are probably better off with a regexp. (see also [L1 ]).

string match does not use the same code as glob.

JJM - With the primary notable difference being that glob supports the notion of (optionally nested?) curly braces allowing for a logical OR-style operation in the pattern.
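Since string match treats curly braces as ordinary characters, an OR-style test has to be spelled out by hand. One possible sketch, where matchAny is a hypothetical helper and not part of Tcl:

```tcl
# Hypothetical helper: return 1 if str matches at least one pattern in the list.
proc matchAny {patterns str} {
    foreach p $patterns {
        if {[string match $p $str]} {
            return 1
        }
    }
    return 0
}

# Braces are literal characters to string match, not alternation.
puts [string match {{a,b}*} apple]      ;# -> 0
puts [string match {{a,b}*} "{a,b}c"]   ;# -> 1

# Trying each alternative in turn gives the OR behaviour.
puts [matchAny {a* b*} apple]           ;# -> 1
puts [matchAny {a* b*} cherry]          ;# -> 0
```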

Example

string match {*\\*} "thistest \\" ;# -> 1

Layers of Quoting

To match a single left bracket, the match pattern should be a backslash followed by a left bracket, so that string match sees the left bracket as a literal character. One possibility is to place the backslash and left bracket in braces so that Tcl leaves them alone:

string match {\[} {[}

Alternatively, the backslash could be preceded by a backslash and the left bracket could be preceded by a backslash:

string match \\\[ \[
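To check that both spellings hand the same two-character pattern to string match, the values can be stored and compared first. The variable names braced and escaped are illustrative:

```tcl
set braced  {\[}     ;# braces: Tcl performs no substitution
set escaped \\\[     ;# bare word: \\ becomes \ and \[ becomes [

puts [string equal $braced $escaped]   ;# -> 1
puts [string length $braced]           ;# -> 2
puts [string match $braced {[}]        ;# -> 1
```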

Pattern Ending in Backslash

A pattern ending in a backslash doesn't match a string ending in a backslash. Bug?

string match a\\ a\\
# -> 0

2015-01-26: one can match explicitly against a backslash character, though:

string match {a[\]} a\\
# -> 1

MG The reason the first fails is that the backslash at the end of the pattern is eaten by the string match parser as an escape character, leaving the pattern as just 'a' and the string as 'a\'. You'd need to use

string match a\\\\ a\\
# -> 1

aspect was about to suggest "conventional escaping with backslash" but MG beat me to it!

The manual leaves this in nasal demon territory, so the simple answer is: don't use such strings for patterns. glob seems to have the same issue.
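One way to stay out of that territory is to check a pattern for a dangling backslash before using it. A minimal sketch, where endsWithDanglingBackslash is a hypothetical helper name and the check only covers the trailing-backslash case, not unterminated brackets:

```tcl
# Hypothetical helper: return 1 if pattern ends in an odd number of
# backslashes, i.e. the last backslash has nothing left to escape.
proc endsWithDanglingBackslash {pattern} {
    # \\*$ always matches, possibly with an empty result.
    regexp {\\*$} $pattern trailing
    expr {[string length $trailing] % 2 == 1}
}

puts [endsWithDanglingBackslash a\\]      ;# pattern is a\   -> 1
puts [endsWithDanglingBackslash a\\\\]    ;# pattern is a\\  -> 0
puts [endsWithDanglingBackslash abc]      ;# -> 0
```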

Another unspecified case:

string match {[ab} a
# -> 1
string match {[ab} b
# -> 1
string match {[ab} c
# -> 0

... I note also that a generic escaping routine for these patterns is not simply a matter for string map, as metacharacters lose their special meaning within groups (as [name redacted]'s example above shows).

AMG: [string map] should be fine, or [regsub]. If you want to match the literal sequence a[\], you would prefix each character other than the first with a backslash, giving a\[\\\]. All [name redacted] is demonstrating is that backslash is not the only possible way to quote a metacharacter.

set needle {a[\]}
# a[\]
set pattern [string map {* \\* ? \\? [ \\[ ] \\] \\ \\\\} $needle]
# a\[\\\]
string match $pattern $needle
# 1
set pattern [regsub -all {[][\\*?]} $needle {\\&}]
# a\[\\\]
string match $pattern $needle
# 1
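The same mapping can be wrapped in a proc and combined with a real wildcard, for example to test for a literal prefix. This is a sketch only: globEscape is a hypothetical name, and the sample strings are invented for illustration.

```tcl
# Hypothetical helper: backslash-quote every string match metacharacter in s.
proc globEscape {s} {
    string map {* \\* ? \\? [ \\[ ] \\] \\ \\\\} $s
}

set prefix  {data[1]*}
set pattern "[globEscape $prefix]*"   ;# escaped prefix plus a real trailing *

puts $pattern                                 ;# -> data\[1\]\**
puts [string match $pattern {data[1]*.txt}]   ;# -> 1
puts [string match $pattern {data1x.txt}]     ;# -> 0

# Without escaping, [1] is read as a character set and the test fails.
puts [string match "$prefix*" {data[1]*.txt}] ;# -> 0
```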

AMG: I think [string match] ought to throw an error when presented with an invalid pattern. Either that or update the documentation to correctly describe the current behavior when the pattern ends with an unterminated [...] sequence or an odd number of backslashes.


AMG: There's a bug in [string match] and related commands due to inconsistent interpretation of -] within a bracket expression.

(See the existing tickets for this bug: [L2 ][L3 ].)

Let's take the expression [AZ-]X]. I understand this to be a single bracket expression matching one character that is A, anything from Z to ], or X. But does Tcl agree?

% string match {[AZ-]X]} Z
1
% string match {[AZ-]X]} ]
1
% string match {[AZ-]X]} X
1
% string match {[AZ-]X]} A
0
% string match {[AZ-]X]} AX]
1

The last two runs above have a problem. A doesn't match unless followed by X].

Following a successful bracket match [L4 ], the pattern is advanced to the end of the bracket expression [L5 ]. The first bit of logic understands that ] isn't the end of the bracket expression if it's the end of a character range. The second bit of logic doesn't. Consequently, the bracket expression ends at a different character depending on which character of the bracket expression matches.

See Also