Performance of string reversing algorithms

Arjen Markus (13 december 2005) In the chatroom yesterday we discussed (or should I say mused over) the pros and cons of using Tcl for biomathematics. One of the subjects was how to reverse a string. Several possibilities have already been published on the Wiki, but this is a comparison of the performance of several on short and long strings.

I find the results remarkable: the behaviour for short strings is completely different than for moderately long strings ... well, judge for yourself.


 # Experiments:
 #    Reversing a string - several ways
 #
 proc rev_by_string {str} {
     set rev ""
     set length [string length $str]

     for {set i 0} {$i < $length} {incr i} {
         set rev "[string index $str $i]$rev"
     }
     return $rev
 }
 proc rev_by_append {str} {
     set rev ""
     set length [string length $str]

     for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} {
         append rev [string index $str $i]
     }
     return $rev
 }
 proc rev_by_list {str} {
     set rev ""

     foreach c [split $str ""] {
         set rev "$c$rev"
     }
     return $rev
 }
 proc rev_recursive {str} {
     set rev ""
     set length [string length $str]

     if { $length <= 4 } {
         return "[string index $str 3][string index $str 2][string index $str 1][string index $str 0]"
     } elseif { $length % 2 == 0 } {
         set half   [expr {$length/2}]
         set halfp1 [expr {$length/2+1}]
         return "[rev_recursive [string range $str $halfp1 end]][rev_recursive [string range $str 0 $half]]"
     } else {
         set half   [expr {$length/2}]
         set halfp1 [expr {$length/2+1}]
         return "[string index $str end][rev_recursive [string range $str $halfp1 end-1]][rev_recursive [string range $str 0 $half]]"
     }
 }

 # Test the procedures ...
 puts [rev_by_string   "abcdefg"]
 puts [rev_by_append   "abcdefg"]
 puts [rev_by_list     "abcdefg"]
 puts [rev_recursive   "abcdefg"]
 puts [rev_recursive   "abcdef"]

 # Measure the performance
 foreach len {1 10 100 1000 10000} repeat {10000 10000 1000 1000 10} {
     set str [string repeat ABCDE $len]
     puts "String length: [string length $str]"
     puts "rev_by_string:  [time {rev_by_string   $str} $repeat]"
     puts "rev_by_append:  [time {rev_by_append   $str} $repeat]"
     puts "rev_by_list:\    [time {rev_by_list     $str} $repeat]"
     puts "rev_recursive:  [time {rev_recursive   $str} $repeat]"
 }

Here is the result:

 gfedcba
 gfedcba
 gfedcba
 gfedcba
 fedcba
 String length: 5
 rev_by_string:  7 microseconds per iteration
 rev_by_append:  8 microseconds per iteration
 rev_by_list:    11 microseconds per iteration
 rev_recursive:  15 microseconds per iteration 
 String length: 50
 rev_by_string:  58 microseconds per iteration
 rev_by_append:  44 microseconds per iteration
 rev_by_list:    48 microseconds per iteration
 rev_recursive:  152 microseconds per iteration 
 String length: 500
 rev_by_string:  707 microseconds per iteration
 rev_by_append:  403 microseconds per iteration
 rev_by_list:    480 microseconds per iteration
 rev_recursive:  1561 microseconds per iteration 
 String length: 5000
 rev_by_string:  16722 microseconds per iteration
 rev_by_append:  3671 microseconds per iteration
 rev_by_list:    13910 microseconds per iteration
 rev_recursive:  13971 microseconds per iteration 
 String length: 50000
 rev_by_string:  1051731 microseconds per iteration
 rev_by_append:  37300 microseconds per iteration
 rev_by_list:    1032877 microseconds per iteration
 rev_recursive:  137735 microseconds per iteration

The solution with append is the definitive winner, but quite surprisingly the solution with recursion, which is slow for short strings, is a good second for long strings.

Lars H, 14 Dec 2005: Why on earth do you find it surprising that the recursive solution comes in second for long strings? This is precisely the result a trivial complexity analysis of the procedures would produce! Remember that string concatenation (as performed in e.g. commands like

  set rev "$c$rev"

above) is linear in the sum of the sizes of the strings concatenated. This implies rev_by_string and rev_by_list both do an O(n) operation at each iteration of the loop, giving a total O(n^2) complexity. rev_by_append on the other hand takes advantage of the fact that append is on average an O(1) (constant time) operation, whence the total complexity there is a mere O(n). rev_recursive manages an O(n log n) total complexity through the same recursive halving trick as a classical mergesort -- most of its concatenations are on short strings, so even though there are the same number of concatenations as in rev_by_string, the average cost is much lower. See also quadratic time.


RS The canonical recursive solution would be

 proc rev_rec2 str {
   expr {[string length $str]<2? $str: 
   "[string index $str end][rev_rec2 [string range $str 0 end-1]]"} 
 }

but that hits the recursion limit when strings get longer than 500 or so...


JMN The following two solutions show that a list-based approach is not necessarily slow as suggested by the above timings. These two appear to be ever so slightly faster than the append solution for medium & large lists.

 proc rev_by_lset {str} {
    set len [string length $str]
    foreach c [set str [split $str ""]] {
        lset str [incr len -1] $c
    }
    return [join $str ""]
 }
 proc rev_by_lappend {str} {
    set len [llength [set str [split $str ""]]]
    while {$len} {
        lappend res [lindex $str [incr len -1]]
    }
    return [join $res ""]
 }

For a teeny extra oomph in rev_by_lappend, preallocate the res list:

 proc rev_by_lappend_2 {str} {
    set len [llength [set str [split $str ""]]]
    set res [lreplace [lrepeat $len ""] 0 end] ;# == {}
    while {$len} {
        lappend res [lindex $str [incr len -1]]
    }
    return [join $res ""]
 } 

The saving then comes from not having to periodically realloc the internal list structure as it gets lappend'd to. Whether this saving outweighs the cost of calling lreplace & lrepeat... well YMMV :-)

 % proc main {} {
        set N 10
        foreach len { 999 9999 99999 999999 } {
                set STR [string repeat "abcde" $len]
                time { rev_by_lappend $STR } 1 ;# flush the timing
                puts "STR len [string length $STR]"
                puts "1 : [time { rev_by_lappend $STR } $N]"
                puts "2 : [time { rev_by_lappend_2 $STR } $N]"
        }
 }
 %
 % main
 STR len 4995
 1 : 1652.4 microseconds per iteration
 2 : 1608.6 microseconds per iteration
 STR len 49995
 1 : 17144.6 microseconds per iteration
 2 : 17155.3 microseconds per iteration
 STR len 499995
 1 : 170775.5 microseconds per iteration
 2 : 167460.7 microseconds per iteration
 STR len 4999995
 1 : 1729771.0 microseconds per iteration
 2 : 1726618.5 microseconds per iteration

See also: [string reverse]


bsg This is all kind-of moot since the core does it already, but I wanted to add the obvious combination missing from above. Combine the append version with the list version:

proc rev_by_append_list {str} {
    set rev ""
    set length [string length $str]
    set lstr [split $str ""]
    for {set i [expr {$length-1}]} {$i >= 0} {incr i -1} {
        append rev [lindex $lstr $i]
    }
    return $rev
 }

This variation mixes in nicely with the others.

Tcl patchLevel: 8.5.6
String length: 5
rev_by_string_reverse:  1.8265948 microseconds per iteration
rev_by_append:  3.2407781 microseconds per iteration
rev_by_string:  3.1731415000000003 microseconds per iteration
rev_by_list:  6.9053605 microseconds per iteration
rev_by_append_list:  6.885185099999999 microseconds per iteration
rev_by_lappend:  7.940962699999999 microseconds per iteration
rev_by_lset:  7.6243322000000004 microseconds per iteration
rev_by_lappend_2:  8.6450237 microseconds per iteration
rev_recursive:  10.6051011 microseconds per iteration
String length: 50
rev_by_string_reverse:  2.2105929 microseconds per iteration
rev_by_append_list:  17.4244228 microseconds per iteration
rev_by_append:  18.8571488 microseconds per iteration
rev_by_lappend:  20.859946400000002 microseconds per iteration
rev_by_lappend_2:  21.3999276 microseconds per iteration
rev_by_lset:  22.044439 microseconds per iteration
rev_by_list:  27.5186891 microseconds per iteration
rev_by_string:  32.9303397 microseconds per iteration
rev_recursive:  96.4504848 microseconds per iteration
String length: 500
rev_by_string_reverse:  2.172858 microseconds per iteration
rev_by_append_list:  103.158742 microseconds per iteration
rev_by_lappend_2:  123.131986 microseconds per iteration
rev_by_lset:  149.26255600000002 microseconds per iteration
rev_by_lappend:  162.248617 microseconds per iteration
rev_by_append:  165.596969 microseconds per iteration
rev_by_list:  243.793305 microseconds per iteration
rev_by_string:  369.550072 microseconds per iteration
rev_recursive:  961.125404 microseconds per iteration
String length: 5000
rev_by_string_reverse:  5.58321 microseconds per iteration
rev_by_lappend:  1132.712496 microseconds per iteration
rev_by_append_list:  1182.556687 microseconds per iteration
rev_by_lappend_2:  1392.8486950000001 microseconds per iteration
rev_by_lset:  1466.0263300000001 microseconds per iteration
rev_by_append:  1639.695497 microseconds per iteration
rev_by_list:  2891.902023 microseconds per iteration
rev_by_string:  3694.58081 microseconds per iteration
rev_recursive:  10026.851468 microseconds per iteration
String length: 50000
rev_by_string_reverse:  39.9818 microseconds per iteration
rev_by_lappend_2:  15873.6687 microseconds per iteration
rev_by_lappend:  16857.907199999998 microseconds per iteration
rev_by_append:  17876.612100000002 microseconds per iteration
rev_by_append_list:  24683.5453 microseconds per iteration
rev_by_lset:  29211.918199999996 microseconds per iteration
rev_recursive:  112051.476 microseconds per iteration
rev_by_list:  159145.1982 microseconds per iteration
rev_by_string:  164660.5375 microseconds per iteration