** Summary **
The '''array''' command manipulates [Tcl]'s array [variables], which can also
be manipulated using [Dodekalogue%|%array(key)] syntax.
** Synopsis **
: '''[array anymore]''' ''arrayName searchId''
: '''[array donesearch]''' ''arrayName searchId''
: '''[array exists]''' ''arrayName''
: '''[array get]''' ''arrayName ''?''pattern''?
: '''[array names]''' ''arrayName ''?''mode''? ?''pattern''?
: '''[array nextelement]''' ''arrayName searchId''
: '''[array set]''' ''arrayName list''
: '''[array size]''' ''arrayName''
: '''[array startsearch]''' ''arrayName''
: '''[array statistics]''' ''arrayName''
: '''[array unset]''' ''arrayName ''?''pattern''?
** Documentation **
[http://www.tcl.tk/man/tcl/TclCmd/array.htm%|%man page]:
** Description **
A Tcl array is a special kind of variable which acts as a container for other
variables, analogously to the way a directory in a filesystem is a special file
which acts as a container for other files. The variables in an array are
accessed by name using [Dodekalogue%|%array variable substitution syntax]. An
array can be used like a "hash" or "associative array" in other languages, as a
data structure that allows one to associate a name with a value.
Using the `$arrayName(varName)` syntax, [variable substitution] can be used to obtain the values of individual
variables within an array, but it cannot be used on the array itself, because
an array is an [handle%|%opaque handle]. For this reason, all `[[array]`
commands expect the ''arrayName'' rather than the array value itself.
In '''Tcl''', an array is really better termed a hash map (à la Perl) or, to
use the term from SNOBOL, awk, Python, etc., an "associative array".
Array keys are not restricted to just numbers, but can be any string.
Array keys are not ordered. It isn't straightforward to get values out of a
Tcl array in the same order that they were set. Of course, it is possible to
get the names and then order them.
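A minimal sketch of the "get the names and then order them" approach; the array name `price` and its contents are made up for illustration, and plain `[lsort]` (ASCII order) is just one possible ordering:

======
array set price {pear 3 apple 1 fig 7}
set report {}
# array names returns the keys in no particular order, so sort them first
foreach name [lsort [array names price]] {
    lappend report "$name=$price($name)"
}
puts [join $report ", "]   ;# apple=1, fig=7, pear=3
======

This gives a repeatable order, but not the order in which the elements were set; to keep insertion order one would have to record the keys separately or use a [dict].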
** Similar Constructs **
[vector] ([BLT]): one-dimensional arrays of values
[TclX] keyed lists:
[NAP]: has some sort of array/vector data structure.
[dictionary]: unlike arrays, dictionaries are first-class values rather than [handles], and dictionary values are ordered by the sequence in which they were set
** See Also **
[Arrays / hash maps]: more details about arrays
[A simple database]:
[parray]:
[Arrays as cached functions]:
[Arrays of function pointers]:
[Memory costs with Tcl]: for measurement of array/list element consumption in bytes.
[Persistent arrays]:
[Procedures stored in arrays]:
[array name string matching extension]:
[GUI for editing a Tcl array]:
[Fitting a new Rear-End to Arrays]:
[foreach]:
[iterating through an array]:
** Creating an Array **
To create an array, `[[[set]]` a variable within the array using the
``arrayName(key)`` form:
======
set balloon(color) red
======
or use `[[[array set]]`:
======
array set balloon color red
======
To create multiple array keys and values in one operation, use `[[[array set]]`.
To create an empty array:
======
array set myArray {}
======
Array names have the same restrictions as any other Tcl variable.
Array keys can be any string. Leading spaces, punctuation, etc. must all match
exactly when retrieving the value.
When referencing a specific array element, the element portion is considered a
part of the name. Thus, if the array name itself requires {} for variable
substitution, then the element reference will too.
That is to say:
======
array unset {this stuff}
set {this stuff(one)} 1
parray {this stuff}
======
A common beginner mistake is to "over-quote" the key:
======none
#warning, bad code ahead!
% set a("key") value
value
% array get a
{"key"} value
======
In Tcl, [everything is a string]. Quoting strings is mostly not necessary - and
can even be harmful, as in this example.
Quotes group words when they '''start with a quote'''. The "inner quotes" are
just kept as part of the value.
In the following example the quoting is redundant:
======
array set myArray {"element" "value"}
======
Better syntax would be:
======
array set myArray {element value} ;# or:
set myArray(element) value
======
More examples:
======
# Each statement below is an independent alternative, not a sequence.
unset x                    ;# x doesn't exist at all anymore
unset x ; array set x {}   ;# x exists as an array but has no elements
array unset x              ;# x doesn't exist at all anymore
foreach idx [array names x] {
    set x($idx) {}
}                          ;# array exists - all the elements still
                           ;# exist, but values of each element are now
                           ;# empty
======
======
array set colors {
    red   #ff0000
    green #00ff00
    blue  #0000ff
}
foreach name [array names colors] {
    puts "$name is $colors($name)"
}
======
** Retrieving the Value of an array key **
======
set value1 $balloon(color) ; # use a literal key
set value2 $balloon($key) ; # use a key within a variable
======
----
[Iain B. Findleton] 2004-06: asked whether there was an easier way to
dereference an array element, given that the name of the array was in a
variable and the array key was in a variable. His example was:
======
eval { set ${key}($item) }
======
[DKF] writes, Just remove the [eval] from the outside (it just confuses things)
and it'll be fine:
======
puts [set ${key}($item)] ;# Read
set ${key}($item) $val ;# Write
======
If you're in a procedure, use `[[[upvar]]` to create a local reference to the
array so you get something like this:
======
upvar ${key} v
puts $v($item)
set v($item) $val
======
It's possible to also do an `[[[upvar]]` pointer to an array element, but I
don't recommend it (for example, it fails if you decide to set key equal to
::env since env-var management is done via a whole-array trace).
** Determine whether a Key Exists **
======
info exists array(key)
======
See [info exists]
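A short sketch of how this behaves for present and missing keys; the `balloon` array is the same made-up example used earlier on this page:

======
set balloon(color) red
puts [info exists balloon(color)]   ;# 1, the key exists
puts [info exists balloon(size)]    ;# 0, no such key
puts [info exists balloon]          ;# 1, the array itself exists
======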
** Unset an Array **
======
unset balloon
======
`[[[array unset]]` provides a way to unset a subset of keys, selected by a glob-style pattern.
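For instance, a sketch that removes every key starting with `s` and leaves the rest; the keys and values here are invented for the example:

======
array set balloon {color red size 12 shape round string nylon}
array unset balloon s*          ;# removes size, shape and string
puts [array names balloon]      ;# color
======

Without a pattern, `array unset balloon` removes the whole array and, unlike plain `unset`, does not raise an error if the array does not exist.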
** Incrementing an Array, Creating it if it doesn't Exist **
======
proc incrArrayElement {var key {incr 1}} {
    upvar $var a
    if {[info exists a($key)]} {
        incr a($key) $incr
    } else {
        set a($key) $incr
    }
}
======
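On Tcl 8.5 or later the helper above should not be needed, since `[incr]` there treats a missing variable as if it held 0. A small sketch under that assumption (the word list and the `count` array are made up):

======
# Assumes Tcl 8.5 or later, where incr creates a missing variable.
foreach word {a b a c b a} {
    incr count($word)
}
foreach word [lsort [array names count]] {
    puts "$word: $count($word)"
}
======

On older interpreters the plain `incr` fails for the first occurrence of each word, which is exactly the case `incrArrayElement` handles.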
** Simulating Multiple Dimensions **
There are no multi-dimensional arrays in Tcl but they can be simulated by a
naming convention:
======
set a(1,1) 0 ;# set element 1,1 to 0
======
This works if the keys used do not contain the ',' character. If the keys can
be arbitrary strings then one can use the [list] of the indices as index into
the array:
======
set a([list $i1 $i2 $i3]) 0; # set element (i1,i2,i3) of array a to 0
======
This is completely unambiguous, but might look a bit uglier than the comma
solution. Also remember that
======
set a([list 1 2 3]) 0
======
is equivalent to
======
set {a(1 2 3)} 0
======
but not to
======
set a(1 2 3) 0
======
since the last passes four arguments to `[[[set]]`.
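A small sketch of why the comma convention can be ambiguous while the `[list]` form is not. The index values are contrived so that they contain commas themselves:

======
set i1 a,b ; set i2 c
set j1 a   ; set j2 b,c

set comma($i1,$i2) first
set comma($j1,$j2) second           ;# same key "a,b,c", overwrites the first

set safe([list $i1 $i2]) first      ;# key is "a,b c"
set safe([list $j1 $j2]) second     ;# key is "a b,c"

puts [array size comma]   ;# 1
puts [array size safe]    ;# 2
======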
[AMG]: To implement multidimensional arrays, I often use the convention given
above (commas, not [[[list]]], but that's a good idea), but it prevents me from
easily getting a list of elements in any one dimension. For the following
array:
======
array set data {
    foo,x ecks foo,y why foo,z zed
    bar,x ECKS bar,y WHY bar,z ZED
}
======
I'd like some means to get a list {foo bar}. How is this useful? I have
written many server programs that use multidimensional arrays to keep track of
state for all connected clients. To get a list of all client IDs, I have
another variable or special array element listing the client IDs, but I have to
always keep it in sync with the rest of the array. I dislike this.
What if multidimensional arrays were accessed using
'''$name(dim1)(dim2)(dim3)''' syntax? Thanks to a bug, we once had
multidimensional arrays, but the syntax was of course very very weird (I think
it used [[[uplevel] 0]]). This is a bit cleaner-looking. But it has very bad
interactions with [['''array''']]. How would the following be converted to use
[[[array set]]]?
======
set data(foo)(x) ecks; set data(foo)(y) why; set data(foo)(z) zed
set data(bar)(x) ECKS; set data(bar)(y) WHY; set data(bar)(z) ZED
======
What should [[[array get] data]] return?
[Lars H]: Well, why don't you ask Tcl? :-) It would tell you that after the
above commands, [[array get data]] returns
======
bar)(z ZED foo)(x ecks bar)(x ECKS foo)(y why bar)(y WHY foo)(z zed
======
and (as an aid to help overcome one's prejudices about how the above should be
interpreted)
======
join [array names data] \n
======
returns
======none
bar)(z
foo)(x
bar)(x
foo)(y
bar)(y
foo)(z
======
This is a recurring problem with attempts to extend Tcl syntax: the "new
syntaxes" people come up with usually already mean something, even if that
"something" looks rather silly.
[AMG]: in response to Lars: Wow, I didn't realize Tcl would accept such
syntax! It turns out that I'm simply using ''')(''' as my dimension delimiter.
Alright, now let's think about how to get a list of all elements ''in a given
dimension.'' This is easiest to do if the array indices are proper lists:
======
array set data [list \
    [list foo x] ecks [list foo y] why [list foo z] zed \
    [list bar x] ECKS [list bar y] WHY [list bar z] ZED \
]
proc array_dimnames {array_var dim_index} {
    upvar 1 $array_var array
    set result [list]
    foreach name [lsort -unique -index $dim_index [array names array]] {
        lappend result [lindex $name $dim_index]
    }
    return $result
}
% array_dimnames data 0
bar foo
% array_dimnames data 1
x y z
======
======
That works. For other delimiters, each element of [[[array names]] needs to
be [[[split]] before the list can be passed to [[[lsort]]. Another job for
[[[lcomp]] I guess.
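A possible sketch of the split-based variant for delimiter-joined keys. The helper name `array_dimnames_delim`, the `clients` array and the default comma delimiter are all assumptions for illustration; it only handles single-character delimiters, because `[split]` treats each character of its second argument as a separator:

======
# Hypothetical helper: names along one dimension of an array whose keys
# are joined by a single-character delimiter (a comma by default).
proc array_dimnames_delim {array_var dim_index {delim ,}} {
    upvar 1 $array_var array
    set result [list]
    foreach name [array names array] {
        # Turn "foo,x" into the list {foo x} and pick one dimension
        lappend result [lindex [split $name $delim] $dim_index]
    }
    return [lsort -unique $result]
}

array set clients {
    foo,x ecks foo,y why foo,z zed
    bar,x ECKS bar,y WHY bar,z ZED
}
puts [array_dimnames_delim clients 0]   ;# bar foo
puts [array_dimnames_delim clients 1]   ;# x y z
======

It walks every key on each call, so the cost concern raised for very large arrays applies here as well.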
For really big arrays, such as the enormous '''MV''' catchall array used in
[OpenVerse], I wonder if this costs too much, so much that it's worth it to
separately maintain element lists rather than extract that information from the
array names.
[AMG]: Continued from before: [[[array names] data]] should return {foo bar},
but '''$data(foo)''' wouldn't be valid, breaking old assumptions. Should
[[[set] data(foo) dummy]] unset '''data(foo)(*)'''? And so on.
If array notation could be applied to [dict]s we'd be in great shape. Doesn't
[Jim] do this?
[Lars H]: Why don't you just use nested [dict]s? It seems those will do
precisely what you ask for above.
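A minimal sketch of the nested-dict suggestion, assuming Tcl 8.5 or later; the variable name `nested` is made up, and the values mirror the `foo`/`bar` example above:

======
set nested {}
dict set nested foo x ecks
dict set nested foo y why
dict set nested bar x ECKS
dict set nested bar y WHY

puts [dict keys $nested]                  ;# foo bar
puts [dict keys [dict get $nested foo]]   ;# x y
puts [dict get $nested bar y]             ;# WHY
======

The first-dimension names come straight from `dict keys`, so no separate list has to be kept in sync.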
[AMG]: I can do some things with '''array'''s that I can't do with [dict]s:
namely, [trace]s and [upvar]s and everything else that uses those features.
So, I often use '''array'''s when I need to use elements as
'''-textvariable'''s. Perhaps I should be using [namespace]s instead,
preferably wrapped by [snit].
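To illustrate the point about [trace]s, here is a sketch of a write trace on a single array element, assuming Tcl 8.4 or later for `trace add`. The names `onChange`, `state` and `::log` are invented for the example:

======
# Called by Tcl after every write to the traced element.
proc onChange {name1 name2 op} {
    upvar 1 $name1 arr
    lappend ::log "${name1}($name2) -> $arr($name2)"
}

trace add variable state(speed) write onChange
set state(speed) 10
set state(speed) 20
puts [join $::log \n]
======

Each `set` appends one line to `::log`, which is the kind of per-element hook that a value stored inside a dict cannot offer.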
----
[LV]: Over on comp.lang.tcl, 2007-02, ''Fredderic'' provides the following
[proc] in response to someone who wanted to declare an empty array at the start of
a Tcl script.
======
proc declare_array {arrayName} {
    upvar 1 $arrayName array
    catch {unset array}
    array set array {}
}
======
The idea here is - catch the unset, in case the variable was not already
declared. Then, the array set makes the variable an array, but without any
members. That way, a later reference to the name in a non-array setting
generates a ''variable is array'' error.
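A short sketch of that effect, done inline rather than through the procedure; the variable name `settings` is made up:

======
array set settings {}
puts [array exists settings]   ;# 1
puts [array size settings]     ;# 0

# Using the name as a scalar now fails instead of silently succeeding
if {[catch {set settings 42} msg]} {
    puts $msg   ;# can't set "settings": variable is array
}
======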
** Memory Usage **
Arrays use more memory than lists. Arrays provide O(1) access due to their
hashtable nature, while lists provide O(1) access only for numerical indices.
----
[escargo] 2002-11-11: Thinking about using arrays as sets got me wondering:
Assuming the keys are what is important to me, I would want to take up the
least amount of storage for the values. So, what's smallest? An integer (or
zero specifically)? An empty string? The key itself?
[Lars H]:
This is a very tricky question (especially since Tcl does not provide much for
[Introspection] into the matter). I had expected that ''any'' value (TclObject)
which already exists should yield the same result, but it seems to matter:
======none
Bytes allocated Code
--------------- ----------------
970752 for {set n 1} {$n<10000} {incr n} {set A($n) [expr 0]}
729088 set zero [expr 0]; for {set n 1} {$n<10000} {incr n} {set A($n) $zero}
729088 for {set n 1} {$n<10000} {incr n} {set A($n) 0}
729088 for {set n 1} {$n<10000} {incr n} {set A($n) {}}
1130496 for {set n 1} {$n<10000} {incr n} {set A($n) $n}
970752 for {set n 1} {$n<10000} {incr n} {set A([format %d $n]) $n}
======
These measurements were essentially obtained by comparing the vsize (as
reported by ps) of tclsh before and after evaluating the above code, hence it
is rather crude.
[escargo]: Those last two seem ''strange!'' Why would having the pure string
as the name make such a difference in the storage? Makes me wonder what this
would be.
======
??????? for {set n 1} {$n<10000} {incr n} {set A($n) [format %d $n]}
======
Also, isn't there a fence post error here? Shouldn't the range start with `set
n 0`? Otherwise I see 9999 instances being created, not 10,000.
[Lars H]: And 10000 instances would be more natural than 9999 for what reason?
We're just trying to see what's best, and aren't particularly concerned with
how good the best are.
As for that mysterious result when the key was used as value, I'm just as
surprised as you are. But try it yourself. The code used for obtaining the
measurements can be found on [Measuring memory usage]. I also set up [Compact
data storage] for discussing matters of this kind.
[escargo] 2002-11-22: I would think that 10000 would be more natural than
9999 just in terms of thinking about averages. I would rather mentally try to
divide a number by 10000 than worry about dividing by 9999.
[Michael Schlenker]: Trying to explain what's going on: Tcl arrays do not yet
use Tcl_Obj* for the array keys (some code for it is in the core but #ifdef'ed
out for compatibility reasons) instead they use char* as keys. So 10000 char*
are created, with the string reps for 1-10000 for the first 4 examples, but
with a larger string rep for the last two examples. Example 1 creates a new
Tcl_Obj for every entry, as it cannot easily be shared. Examples 2,3 and 4
create only one Tcl_Obj that is shared. Examples 5 and 6 create one unshared
Tcl_Obj for each entry.
[MS]: Starting from Tcl 8.5, array keys are Tcl_Obj and not strings; also the
measurements above should be much improved in 8.5+.
[Lars H]: I might add that the reason that example 5 is more costly than
example 6 is that each of the unshared objects in example 5 has a string
representation (generated when the argument A($n) of set is substituted),
whereas the unshared objects in example 6 do not ([format] makes do with the
internal representation).
** Efficiently Comparing Arrays **
[escargo] 2002-11-19:
What is the most efficient way to compare the contents of two arrays?
If '''array get''' had an option to specify the method and order of the
results, then a simpler comparison could be done.
In [Icon] a table can be turned into a list by its sort function, which can
return the results in one of four ways:
1. List of key, value pairs sorted by key.
1. List of key, value pairs sorted by value.
1. List of alternating key and value sorted by key.
1. List of alternating key and value sorted by value.
This puts the table into a known canonical order. There appears to be no way to
know that '''array get''' would linearize two arrays in the same way.
It makes me wish there was an '''array compare''' function that could easily
answer the question.
[Michael A. Cleverly] 2002-11-19: Here's an ''array compare'' type proc:
======
proc array-compare {array1 array2} {
    upvar 1 $array1 foo $array2 bar
    if {![array exists foo]} {
        return -code error "$array1 is not an array"
    }
    if {![array exists bar]} {
        return -code error "$array2 is not an array"
    }
    if {[array size foo] != [array size bar]} {
        return 0
    }
    if {[array size foo] == 0} {
        return 1
    }
    set keys(foo) [lsort [array names foo]]
    set keys(bar) [lsort [array names bar]]
    set keys(keys) $keys(foo)
    if {![string equal $keys(foo) $keys(bar)]} {
        return 0
    }
    foreach key $keys(keys) {
        if {![string equal $foo($key) $bar($key)]} {
            return 0
        }
    }
    return 1
}
======
[Michael Schlenker]: If using Tcl 8.4 one can speed this up a bit, by
optimizing the lsort:
======
proc array-compare2 {array1 array2} {
    upvar 1 $array1 foo $array2 bar
    if {![array exists foo]} {
        return -code error "$array1 is not an array"
    }
    if {![array exists bar]} {
        return -code error "$array2 is not an array"
    }
    if {[array size foo] != [array size bar]} {
        return 0
    }
    if {[array size foo] == 0} {
        return 1
    }
    # some 8.4 optimization using the lsort -unique feature
    set keys [lsort -unique [concat [array names foo] [array names bar]]]
    if {[llength $keys] != [array size foo]} {
        return 0
    }
    foreach key $keys {
        if {$foo($key) ne $bar($key)} {
            return 0
        }
    }
    return 1
}
======
[escargo] 2002-11-20: So, just to summarize: Arrays are equal iff ''(if and
only if)''
1. They are equal size.
1. They have the same ''names''.
1. For all the names the values (associated with each name in each array) are equal.
Is there a significant performance or space penalty for having to call [lsort]
external to '''array names''' instead of having '''array names''' have a
parameter that does the sorting internally?
The performance and space penalty is insignificant if [lsort] is used as in the
above example.
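The "alternating key and value sorted by key" form mentioned earlier in this section can be approximated in plain Tcl. This is only a sketch; the helper name `array_canonical` and the sample arrays are assumptions, and values are compared as strings, as in the procedures above:

======
# Hypothetical helper: key/value list of an array in sorted key order.
proc array_canonical {array_var} {
    upvar 1 $array_var arr
    set result [list]
    foreach key [lsort [array names arr]] {
        lappend result $key $arr($key)
    }
    return $result
}

array set first  {x 1 y 2 z 3}
array set second {z 3 y 2 x 1}
puts [expr {[array_canonical first] eq [array_canonical second]}]   ;# 1
======

Two arrays with the same keys and values then linearize identically, whatever order `array get` would have used.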
** Copying an Array **
[Lars H]: Usually using [[array get] and [[array set], like so:
======
array set copy [array get original]
======
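One detail worth keeping in mind: `array set` adds to or overwrites an existing array, it does not clear it first. A sketch, with made-up array contents, that empties the target before copying and shows that the copy is independent of the original:

======
array set original {a 1 b 2}
array set copy {stale x}

array unset copy                      ;# drop any old elements first
array set copy [array get original]

set copy(a) changed
puts $original(a)                     ;# 1, the original is untouched
puts [lsort [array names copy]]       ;# a b
======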
** Passing arrays to procedures **
See [How to pass arrays]
** Why arrays are handles instead of values **
The simple reason is that that's not how they were implemented.
[RS]: Also, [array]s are collections of variables (so: not a value), and have
been in Tcl for a long time. Given modern [dict] and [namespace], they might
not even have been invented...
[RHS]: Would it be unreasonable to treat $arrayName the same as [[array get
$arrayName]]? One could shimmer between the array rep and the list (and other)
representations by how they are accessed. In that vein, you could do something
like:
======
set bob [list a 1 b 2 c 3]
puts $bob(a)
======
...and it would shimmer the list to an array. The only "gotcha" I can think of
would be that the list order might? change when you modified the variable as an
array, but I don't think that would be unreasonable.
I can see namespaces being the preferred method for encapsulation. Still not
understanding dict, I don't understand the pros and cons of dicts vs arrays for
randomly accessible hash type data structures.
----
[KJN]: Yes, [[array get $arrayName]] is a good string representation. What
makes me slightly uncomfortable is that Tcl has two types of compound variables
(lists and arrays) that are appropriate in different situations and need
different handling (with arrays arguably not first-class objects). I wasn't
aware of the [dict] (in Tcl 8.5).
This would be most useful if it could do everything that lists and arrays can
do now, so that lists and arrays can either be deprecated, or implemented in
terms of a dict.
[RS] protests - [list]s are the most versatile containers (for structs,
vectors, matrices, trees, stacks, queues, ...), while [dict]s are more
specialized (but can take over most jobs of arrays, except for traces on array
elements). I'd like to have both of them in the future :)
[LV]: Some might say that using lists for vectors and matrices is a bit like
using duct tape to hold a boat together... [BLT]'s vector data type is often
mentioned as being a useful data structure when vectors are intended for
visualization. Also I guess I misremembered dicts as having more restrictions
than just traces.
[RS]: Hm.. vectors are one-dimensional containers for elements - as are
[list]s. Matrices are two-(or more-)dimensional containers for elements - as
are lists of lists. Tcl lists are implemented in C as Tcl_Obj*[[]], costing ~12
bytes of overhead per list element. Restricted vectors or matrices could be
implemented slightly more efficiently, but would needlessly enlarge the variety
of data types that is seen as a problem on this page. Tcl isn't an
extreme-performance language (C or Assembler are much better at that), but it
has great abstractions (like [list]s and [array]s) to boast.
So I'd not call [list]s just "duct tape", but rather: simple yet powerful
abstractions of containers. More like Swiss Army Knives :)
[AM]: I consider Tcl's lists to be very similar to C's arrays and Fortran's
one-dimensional arrays, with the added advantages of bounds checking, automatic
size management and heterogeneous content. That makes them more versatile than
either of Tcl's arrays or dicts in many ways, but these have their advantages
too ...
Compare this to the wealth of data structures that is described in literature!
If you only look at the different ways of specialising tree structures! Of
course you can do a lot with just C-style arrays. But it does not mean that
other structures are not useful from time to time.
[DKF]: Tcl arrays have a lot in common with [Java]'s ''java.util.HashMap''
class, as do [dict]s. Tcl [list]s are more like ''ArrayList''s.
[LV]: I guess I would see the connection between lists and vectors and
matrices easier if there were built in syntactical sugar to allow accessing the
elements of a list simpler. For instance, maybe something like
======
set l [list this is a series of vector elements]
set m [$l{3}] ;# Sugar for [lindex $l 3]
set l{2:4} [list not just any] ;# Results in [list this is not just any vector elements]
======
or something else that made things seem a bit cleaner.
[DKF]: Well, I'm thinking of [fitting a new rear-end to arrays] which might
make such things easier.
** Misc **
[AB]: Is there a boolean function or command that identifies if an index of an
array (or the element of a list) is empty? For instance, if xcrd(1) = {} , is
there a boolean function that'll take in xcrd(1) and return 1, confirming that
it's an empty index?
[LES]: Does that help?
======
proc isempty {foo} {
    regexp {^([^(]+)\(([^(]+)\)} $foo => array key
    global $array
    if { [info exists $array] == 0 } {
        return "$array? There is no $array array."
    }
    if { [array get $array $key] == "" } {
        return "$array exists, but there is no $key key in $array array"
    }
    if { [ string length [lindex [array get $array $key] 1] ] == 0 } {
        return "YES - [join "$array ( $key )" {}] exists and IT IS EMPTY"
    }
    return "NO - [join "$array ( $key )" {}] exists and IT IS NOT EMPTY"
}
======
Testing:
======
set xcrd(1) this
set xcrd(2) that
set xcrd(3) ""
puts [isempty xcrd(1)]
puts [isempty xcrd(2)]
puts [isempty xcrd(3)]
puts [isempty xcrd(4)]
puts [isempty blah(4)]
======
[MG]: The regexp above is actually a little wrong - [[set myArray(key() value]]
sets the "key(" element of myArray to "value", but LES's regexp won't
match it (or the 'empty variable', '''$()'''). You can even use [[set
array(key(name)) value]] and get an element in "array" called "key(name)". So I
think the regexp pattern would need to be:
======
regexp {^([^(]*)\((.*)\)$} $foo => array key
======
''though there's probably a hole in that, somewhere, too (and it's untested, at
20 to midnight, so may not do what I meant anyway ;)''
[MG]: offers an alternative which works on lists, as well as arrays. It
treats nonexistent array elements as empty, rather than raising an error.
======
proc isempty2 {_var elem} {
    upvar $_var var
    if { ![info exists var] } {
        return -code error "variable \"$_var\" is not set";
    }
    if { ![array exists var] } {
        if { ![string is integer -strict $elem] } {
            return -code error "second arg must be a number, for non-arrays";
        }
        set text [lindex $var $elem]
    } elseif { ![info exists var($elem)] } {
        return 1; # empty - element doesn't exist
    } else {
        set text $var($elem)
    }
    return [expr {$text == ""}];
};# isempty2
set list [list 0 {} 2]
set a(zero) "value"
set a(one) ""
set a(two) "value"
% isempty2 list 0
0
% isempty2 list 1
1
% isempty2 a zero
0
% isempty2 a one
1
======
When using a list, instead of an array, the second argument has to be a number.
----
[LV] 2006-11-16: Looks like Wikipedia's page on associative arrays covers only
the minimal aspects of Tcl's contribution
[http://en.wikipedia.org/wiki/Associative_array#TCL].
[AMG]: This section has been moved:
[http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(mapping)#Tcl].