[AMG]: I guess [RS] wanted this page, so here goes... :^) [RS]: Oops, sorry, what I meant was [Shuffle a list] ... [AMG]: Ah, well, at least I got to write another un/sorting story page.

This page discusses shuffling algorithms and implementations thereof.

----

**Version 1: sorting random reals**

Pair each list element with a random real number, sort the pairs with [lsort -real], and strip off the random keys.

----

**Version 2: sorting random integers**

For starters, the real sort bothers me. I imagine sorting by integer would be more efficient, but then again producing random integers is (currently) slower than producing random reals. Change rand() to int(0x7fffffff * rand()) and -real to -integer, and it winds up taking longer. See the benchmark below. (By the way, how am I supposed to portably get the maximum and minimum integer?) Next, keeping up to three copies of the list (one of which is augmented) appears to be a waste of RAM.
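For reference, the sort-based shuffle of versions 1 and 2 might look something like this. This is a sketch of the technique as described, not necessarily the original code, and the proc name is mine:

```tcl
proc shuffle_sort {data} {
    # Pair each element with a random real key.
    set tagged [list]
    foreach elem $data {
        lappend tagged [list [expr {rand()}] $elem]
    }
    # Sort on the random keys, then discard them.
    set shuffled [list]
    foreach pair [lsort -real -index 0 $tagged] {
        lappend shuffled [lindex $pair 1]
    }
    return $shuffled
}
```

For the integer variant, substitute int(0x7fffffff * rand()) for rand() and -integer for -real. Note the augmented copy of the list ($tagged) that the in-place versions below avoid.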
----

**Version 3: in-place swaps**

So would it be better to do an in-place shuffle? For each element of $data it could swap with another, randomly-selected element of $data. I'll take a shot at it:

 proc shuffle {data} {
     set length [llength $data]
     for {set idx_1 0} {$idx_1 < $length} {incr idx_1} {
         set idx_2 [expr {int($length * rand())}]
         set temp [lindex $data $idx_1]
         lset data $idx_1 [lindex $data $idx_2]
         lset data $idx_2 $temp
     }
     return $data
 }

Big improvement.

----

**Benchmarking 1, 2, and 3**

Here's my magical test data:

 set data [lrepeat 100000 x y z z y]

If you're trying this at home from your interactive tclsh, you might want to type "; puts -nonewline {}" after the end of the above line, or else its printed result will scroll for a very long time. Now I [time] the shuffles:

 ..................................................
 :                    :  400 MHz   :  1500 MHz  :
 :....................:.............:.............:
 : 1, random reals    : 21.847024 s :  9.541117 s :
 : 2, random integers : 24.649960 s :  9.857447 s :
 : 3, in-place swaps  :  7.650484 s :  2.328508 s :
 :....................:.............:.............:

Wow, I think I will go with #3! Hold on while I update [unsort]... :^)

----

**Version 4: in-place swaps, better random behavior**

[AMG]: Per [Lars H]'s suggestion, I improved the code again. As Lars explains in [unsort], the old code doesn't have a perfectly uniform distribution because, for any given list, the number of equally likely swap sequences it can generate is not a multiple of the number of possible permutations. This is similar to doing "random() % 3" in C and getting 0's more often than 2's.

 proc shuffle {data} {
     set length [llength $data]
     for {} {$length > 1} {incr length -1} {
         set idx_1 [expr {$length - 1}]
         set idx_2 [expr {int($length * rand())}]
         set temp [lindex $data $idx_1]
         lset data $idx_1 [lindex $data $idx_2]
         lset data $idx_2 $temp
     }
     return $data
 }

That ought to do it!
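Lars's point can be checked exhaustively for a three-element list: version 3 makes 3^3 = 27 equally likely swap sequences, which cannot map evenly onto 3! = 6 permutations. Here's a sketch that enumerates every sequence and tallies the resulting orderings (the proc name is mine):

```tcl
proc count_naive_outcomes {} {
    # Enumerate all 27 swap sequences of the version 3 shuffle on {a b c}.
    set counts [dict create]
    foreach j0 {0 1 2} {
        foreach j1 {0 1 2} {
            foreach j2 {0 1 2} {
                set data {a b c}
                set i 0
                foreach j [list $j0 $j1 $j2] {
                    # Swap element i with the "randomly chosen" element j.
                    set temp [lindex $data $i]
                    lset data $i [lindex $data $j]
                    lset data $j $temp
                    incr i
                }
                dict incr counts $data
            }
        }
    }
    return $counts
}
```

Since 27 is not a multiple of 6, the six tallies cannot all be equal, so some orderings necessarily come up more often than others. Version 4 avoids this because it makes exactly n! equally likely sequences of choices.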
However, it's (theoretically) a little bit slower due to modifying more variables in the inner loop. I also made a less-readable version that [incr]'s idx_1 rather than recalculating it with [expr], for comparison's sake. Here are the numbers, using the same benchmark:

 ..................................................
 :                    :  400 MHz   :  1500 MHz  :
 :....................:.............:.............:
 : 4 , more random    :  8.428660 s :  2.286190 s :
 : 4a, less readable  :  7.824165 s :  2.163990 s :
 :....................:.............:.............:

For reasonably-sized lists (under 500,000 elements, I guess), the speed difference between #3, #4, and #4a is negligible. So I choose #4, which has a more uniform distribution and doesn't have grody source due to silly optimizations. I wonder why the 400 MHz results for #4 and #4a are slower than for #3, yet the 1500 MHz results are the other way around...? I guess this proc is random in more than one way.

----

**More?**

Here's another algorithm: randomly select an element from the input list, remove it from the input list, and append it to the end of the output list. The reason I thought of it is that for [unsort] it's not necessary to maintain an output list; simply [puts] the elements as they come. But this algorithm involves a lot of copying. [lset] is nice for in-place modification, but there's no in-place element removal, so it's liable to be slower than #4.

----

!!!!!! %| [Category Algorithm] | [Category Performance] |% !!!!!!
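The removal-based algorithm from the "More?" section might be sketched like this (the proc name is mine); [lreplace] with no replacement elements does the removal, copying the remaining list each time:

```tcl
proc shuffle_remove {data} {
    set shuffled [list]
    while {[llength $data] > 0} {
        # Pick a random element and append it to the output...
        set idx [expr {int([llength $data] * rand())}]
        lappend shuffled [lindex $data $idx]
        # ...then remove it from the input (this copies the list).
        set data [lreplace $data $idx $idx]
    }
    return $shuffled
}
```

For [unsort], the lappend could be replaced by [puts] and the output list dropped entirely, but the per-iteration copy in [lreplace] is why this is liable to be slower than #4.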