Fold in functional programming

See also fold, which discusses the same concept


What is a fold?


A fold is a way of getting a single value by operating on a list of values.

In the functional world (as opposed to the mostly imperative world of Tcl --see imperative programming), folding a list is the only way to "collapse" or "reduce" (or "fold") a list into a single value. Since the list is transformed into a single value, the fold function is a special type of transformer called a consumer.

The abstract reasoning behind list consumers is this:

  1. If I can do something to the first item in the list and the first item in the rest of the list
  2. Then I can do something to the whole list.

This is actually a very useful and important concept in computer science. Whenever you have anything that can be viewed as a list (whether or not it actually is a list), you can "consume" it using this principle.

As an example, let's consider summing a list of numbers. To do that, I just take the first item in the list and add it to the first item in the rest of the list. I keep adding-in the first item of the next "rest of the list" until there is no more list left. Let's see that in action.

  (1 2 3 4 5)       Here is my list.
  1   + (2 3 4 5)   I take the  first-item-of-the-list  and  the-rest-of-the-list.
  1+2 <-(_ 3 4 5)   I add  the-first-item-of-the-rest-of-the-list  to the initial first item.
  3   + (3 4 5)     This leaves me with a new  rest-of-the-list.
  6   + (4 5)       I do the same thing again.
  10  + (5)         And again.
  15  + ()          And again. Now I have no list left.
  15                Therefore, the sum is 15. (((1 +2) +3) +4) +5 = 15.

This is the essence of a fold operation.


Kinds of folds


Addition is associative, so it doesn't really matter in what order you add things. However, most things are not associative: the order in which the list is consumed is very important.

In the following examples, we have an initial value given to us (zero). The first "rest of the list" is just the list we start with. (That might seem silly, but in most functional contexts this is actually the simplest way to do it.)

Fold-left can be considered as operating on the list from the left:

  foldl + 0 (1 2 3)
  ((0 + 1) + 2) + 3
  (1 + 2) + 3
  3 + 3
  6

Fold-right works from the opposite end:

  foldr + 0 (1 2 3)
  ((0 + 3) + 2) + 1
  (3 + 2) + 1
  5 + 1
  6

There are two things to notice here. First, foldl works left-to-right, while foldr works in reverse. The second is the order in which our initial value was combined with the list. In both cases, the initial value was the first value, and the appropriate end of the list was the second. This is important.


Data type


The type of object in the list is generally considered homogeneous: it is a list of integers, or strings, or booleans, etc. However, the type of thing returned from the fold operation need not be the same type as the list elements. Consider a function that takes a string and a number, converts the number to a letter of the alphabet (where A=1, B=2, etc.), and returns the concatenation of the string and the letter. Invalid numbers represent spaces.

  proc decode {msg index} {
    # msg is a string. index is a number.
    if {($index > 0) && ($index < 27)} \
      then {set c [format %c [expr {65 +$index -1}]]} \
      else {set c { }}
    return $msg$c
    }

We can use this function to consume a list of numbers into a message.

  set message {8 5 12 12 15 -1 23 15 18 12 4}

  foldl decode "The message is: " $message
  --> The message is: HELLO WORLD

This works because the function used takes a string and a number and returns a string. The returned string is fed back into the function with the next number in the list, until the list is used up and all we are left with is a string. The only caveat is that the initial value must also be a string.

The thing to note is that the initial value, the function's first argument, and the function's return value must all have the same type. Each element in the list and the function's second argument must likewise have the same type. But these two types need not be the same.

To be complete, suppose we had consumed the list from the right?

  foldr decode "The message is: " $message
  --> The message is: DLROW OLLEH

Yeah!


Finally, let's see some code


With the advent of apply in Tcl 8.5 we can write a version of foldl that takes a true lambda. The lambda always takes exactly two arguments.

  proc ::foldl {lambda args} {
    set argc [llength $args]
    if {($argc > 2) || ($argc < 1)} {error {wrong # args: should be "foldl lambda ?init? ls"}}
    if {$argc == 2} \
      then {
        foreach {init ls} $args break
        } \
      else {
        set ls   [lindex $args 0]
        set init [lindex $ls 0]
        set ls   [lrange $ls 1 end]
        }
    foreach el $ls {set init [eval apply [list $lambda] [list $init] [list $el]]}
    return $init
    }

(NEM notes that you can drop the eval and just use

 set init [apply $lambda $init $el]

here.)

This version of foldl addresses one of the ideas mentioned by NEM over at the older fold page: you can choose to supply an initial value or have it automatically selected from the first item in the list.

Again recurring to Tcl 8.5's new addition of lreverse, we can express foldr in terms of foldl:

  proc ::foldr {lambda args} {
    set argc [llength $args]
    if {($argc > 2) || ($argc < 1)} {error {wrong # args: should be "foldr lambda ?init? ls"}}
    return [eval foldl [list $lambda] [list [lreverse [lindex $args end]]] [lindex $args end-1]]
    }

And here's that message function I showed you above:

  % foldl {{msg idx} {decode $msg $idx}} "The message is: " $message
  The message is: HELLO WORLD

  % foldr {{msg idx} {decode $msg $idx}} "The message is: " $message
  The message is: DLROW OLLEH

Hope you enjoy!

01-05-2007 Duoas

NEM thinks these examples should be:

 % foldl {{msg idx} {decode $msg $idx}} "The message is: " $message
 The message is: HELLO WORLD

(Duoas) Yes you're right. It was really late when I typed in the examples. I fixed them now.

I'd also again recommend against incorporating apply directly into these procedures, as it limits the commands that can be passed in, while not gaining anything in terms of readability. Instead, just use uplevel and a lambda-constructor:

 proc foldl {f z xs} {
     foreach x $xs { set z [invoke 1 $f $z $x] }
     return $z
 }
 proc invoke {level f args} {
     if {[string is integer $level]} { incr level }
     uplevel $level $f $args
 }
 proc lambda {params body} { list ::apply [list $params $body ::] }

You can then do:

 % foldl [lambda {msg idx} { decode $msg $idx }] "The message is: " $message
 The message is: HELLO WORLD

(Duoas) Using that you could drop the lambda altogether:

  % foldl decode "" $message
  HELLO WORLD

I have a problem with the fact that the supplied function is forced to operate at the top-level stackframe in the global namespace. But it does not limit the commands that can be passed in --only the command must be a Tcl lambda. Whether I say "{{msg idx} {decode $msg $idx}}" or just "decode", the functional constraints are the same. (However, it is a nice idea not to have to make a lambda.)

  foldl {f args} {
    ...
    foreach el $ls {set init [uplevel 1 $f [list $init] [list $el]]}
    return $init
    }
  proc lambda {formals body} {list apply [list $formals $body]}
  % foldl decode "" $message
  HELLO WORLD

  % foldl [lambda {msg idx} {decode $msg $idx}] "" $message
  HELLO WORLD

NEM I fixed invoke above to use the caller's scope. I prefer using it to using uplevel directly as it avoids the need for list-quoting arguments (the args magic takes care of that). As you say, using invoke/uplevel rather than apply directly allows you to just pass in decode rather than having to wrap it in a lambda, which is the point. There are a lot of advantages to avoiding explicit use of apply in this code: it works with 8.4; you can use more fancy invocation (e.g., simple closures and objects); and you can pass in "curried" commands, e.g.:

 dict set foo a b c 12
 foldl {dict get} $foo {a b c} ;# returns "12"

OK, dict get can do this already, but it illustrates the point. If you did still want to keep in apply, then you at least do not have to force the lambda to be a single argument like apply itself does. You can construct this lambda outside of the loop and then apply it:

 proc foldl {var body z xs} {
     set f [list $var [list expr $body]]
     foreach x $xs { set z [invoke $f $z $x] }
     return $z
 }
 foldl {x y} {$x + $y} {1 2 3 4 5}

This way you avoid all those ugly extra braces, while performance should be identical. Generally though I still prefer using a separate constructor proc.


Toto - 2016-05-12 07:03:26

foldr + 0 (1 2 3)

  ((0 + 3) + 2) + 1
  (3 + 2) + 1
  5 + 1
  6

rather:

 foldr + 0 (1 2 3)
  1 + (2 + (3 + 0))
  1 + (2 + 3)
  1 + 5
  6

Makes no difference with addition, but would not work with non symmetrical operation.