Version 4 of Shuffle a file

Updated 2008-10-01 20:16:26 by ferrieux

Based on the code from shuffle a list, here's a first draft of a program that shuffles a file: read in the file, split it into a list of lines, shuffle the list, then output the lines in random order.

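Note that, from my initial test, the code is not right yet. When the file ends with a newline, split produces an extra empty last element, which then gets shuffled into the output as a spurious blank line. The code needs to trim off that possible last empty element, perhaps setting a flag indicating whether or not there was a trailing newline.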

Why bother with this? Sometimes having a set of data come into a program in a random order is useful for testing.

if { $::argc == 0 } {
	puts stderr "USAGE: $::argv0 filename"
	exit 1
}

# Read the whole file into memory and split it into lines.
set fd [open [lindex $::argv 0] "r"]
set str [read $fd]
close $fd
set lst [split $str "\n"]

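# Fisher-Yates shuffle: repeatedly pick a random element from the
# not-yet-shuffled front part and swap it into the last unshuffled slot.
proc shuffle10a list {
    set len [llength $list]
    while {$len} {
        set n [expr {int($len*rand())}]
        set tmp [lindex $list $n]
        # incr shrinks the unshuffled part by one before the swap
        lset list $n [lindex $list [incr len -1]]
        lset list $len $tmp
    }
    return $list
}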

set str [join [shuffle10a $lst] "\n"]
puts $str
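
One possible way to handle the trailing-newline problem noted above is sketched here. The proc names splitLines and joinLines are made up for this illustration and are not part of the original code; the idea is simply to strip a final newline before splitting and to remember it in a flag so the output can end the same way the input did.

```tcl
# Sketch only: splitLines and joinLines are hypothetical helper names.

# Split file contents into lines. Returns a two-element list:
# the list of lines, and 1 if the data ended with a newline (else 0).
proc splitLines {str} {
    set trailing [string match "*\n" $str]
    if {$trailing} {
        # Drop the final newline so split yields no empty last element.
        set str [string range $str 0 end-1]
        # split returns an empty list for "", but "\n" is one empty line.
        if {$str eq ""} {
            return [list [list {}] 1]
        }
    }
    return [list [split $str "\n"] $trailing]
}

# Join lines back together, restoring the trailing newline if there was one.
proc joinLines {lines trailing} {
    set out [join $lines "\n"]
    if {$trailing} {
        append out "\n"
    }
    return $out
}
```

With these in place, the last two lines of the program could become something like: foreach {lst trailing} [splitLines $str] break, followed by puts -nonewline [joinLines [shuffle10a $lst] $trailing].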

ferrieux A slight variation on this uses offset indexing to allow for very large files: first build a list containing the byte offsets of all beginnings-of-line in the file, then shuffle that list, and finally read back the lines with seek. Only the offsets are held in memory, not the lines themselves. Notice that tell is not even used: the offsets are computed from the line lengths instead, by sheer superstition (no perf measurements, sorry).

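	puts stderr "(indexing...)"
	set fd [open [lindex $::argv 0] r]
	# Binary mode makes string length count bytes, so offsets are byte offsets.
	fconfigure $fd -translation binary
	set ll {}
	set off 0
	while {[gets $fd line]>=0} {
		lappend ll $off
		# +1 accounts for the newline that gets strips off
		incr off [expr {[string length $line]+1}]
	}

	puts stderr "(now shuffling !)"
	# Lines were read as raw bytes, so write them back unmodified.
	fconfigure stdout -translation binary
	foreach off [shuffle10a $ll] {
		seek $fd $off
		gets $fd line
		puts $line
	}
	close $fd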
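
For comparison, here is a rough sketch of the same indexing pass using tell instead of adding up line lengths. The proc name lineOffsets is invented for this example, and no claim is made about which version is faster, since neither has been measured.

```tcl
# Sketch only: lineOffsets is a hypothetical helper name.
# Returns the byte offset of the start of every line in the file.
proc lineOffsets {path} {
    set fd [open $path r]
    # Binary mode so that positions reported by tell are plain byte offsets.
    fconfigure $fd -translation binary
    set offsets {}
    set off [tell $fd]
    while {[gets $fd line] >= 0} {
        lappend offsets $off
        # Position just after the line read is where the next one starts.
        set off [tell $fd]
    }
    close $fd
    return $offsets
}
```

The resulting list can be passed to shuffle10a and walked with seek and gets exactly as in the loop above.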