Scan and modify text files

Summary

Arjen Markus (21 February 2006) I am facing the task of modifying a lot of text files in a rather mechanical way. I used to do this kind of thing with AWK, but Tcl lends itself to this too. It is just a matter of the right "little language".

The task itself is not really interesting to anyone else, but its characteristics are fairly common:

  • Certain modifications are required for a particular part of the file.
  • Some modifications apply to particular lines only.
  • Defining regular expressions that capture exactly the lines you need can be tricky,
    so it is probably easier to do it in steps.

The script below allows you to delimit sections of a file by a start pattern and a stop pattern. For every line that falls within a section, the associated script is run. To apply default processing to all other lines (just copying them to the output, for instance), there is a fallback: "otherwise".
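
Reduced to a single section, the start/stop idea can be sketched as follows. This is only an illustration: the proc name markSection and the sample lines are made up for this sketch and are not part of the script. As in the full script, the line that matches the stop pattern is still treated as part of the section.

```tcl
# Hypothetical helper (not part of modify.tcl): tag each line as being
# inside or outside a section delimited by two regular expressions.
proc markSection {lines start stop} {
    set active 0
    set result {}
    foreach line $lines {
        # Remember whether this line belongs to the section; the line
        # matching the stop pattern is still handled as part of it.
        set inside $active
        if { $active } {
            if { [regexp -- $stop $line] } {
                set active 0
            }
        } elseif { [regexp -- $start $line] } {
            set active 1
            set inside 1
        }

        if { $inside } {
            lappend result "section:   $line"
        } else {
            # Fallback, comparable to the "otherwise" actions
            lappend result "otherwise: $line"
        }
    }
    return $result
}

set lines {"header" "BEGIN data" "1 2 3" "END data" "footer"}
puts [join [markSection $lines {^BEGIN} {^END}] \n]
```

The full script generalises this to any number of sections, each with its own "active" flag, which is why sections may overlap.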


 # modify.tcl --
 #     Yet another AWK-like utility. This one reads a file line by line
 #     and decides on the basis of patterns marking the beginning and
 #     end of a block of lines (section) what actions to take.
 #
 #     Note:
 #     - sections may overlap
 #     - what they do is up to you
 #     - special sections are: begin, end and otherwise
 #     - the command "nextline" causes the actions for any subsequent
 #       sections to be cancelled for the current line (their start
 #       and stop patterns are not checked for that line either).
 #

 namespace eval ::Sections {
     variable section_number 0
     variable section_data   {}
     variable section_active {}
     variable nextline       0

     namespace export section begin end otherwise nextline scanfile

     proc _begin     {} {}
     proc _end       {} {}
     proc _otherwise line {}
 }

 # section --
 #     Define the beginning and end of a section and the actions to take
 #
 # Arguments:
 #     begin       The regexp pattern marking the start
 #     end         The regexp pattern marking the end
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::section {begin end actions} {
     variable section_number
     variable section_active
     variable section_data

     lappend section_data   $begin $end
     lappend section_active 0
     proc ::Sections::$section_number line $actions
     incr section_number
 }

 # begin --
 #     Define the actions for the beginning of a file
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::begin {actions} {
     proc ::Sections::_begin {} $actions
 }

 # end --
 #     Define the actions for the end of a file
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::end {actions} {
     proc ::Sections::_end {} $actions
 }

 # otherwise --
 #     Define the actions for lines not falling in any section
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::otherwise {actions} {
     proc ::Sections::_otherwise line $actions
 }

 # nextline --
 #     Instruct the scanning procedure to skip all remaining sections
 #
 # Arguments:
 #     None
 #
 # Result:
 #     None
 #
 proc ::Sections::nextline {} {
     variable nextline
     set nextline 1
 }

 # scanfile --
 #     Scan the file, taking actions appropriate for the
 #     sections the line is part of
 #
 # Arguments:
 #     filename    Name of the file to scan
 #
 # Result:
 #     None
 #
 proc ::Sections::scanfile {filename} {
     variable section_number
     variable section_data
     variable section_active
     variable nextline

     set infile [open $filename r]

     _begin

     while { [gets $infile line] >= 0 } {
         set nextline 0

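         set id -1
         set insection 0
         # foreach iterates over a snapshot of the lists, so the lset
         # calls below only take effect from the next line onwards
         foreach {start stop} $section_data active $section_active {
             incr id
             if { $active } {
                 # The line matching the stop pattern still belongs
                 # to the section
                 if { [regexp -- $stop $line] } {
                     lset section_active $id 0
                 }
             } else {
                 if { [regexp -- $start $line] } {
                     lset section_active $id 1
                     set active 1
                 }
             }

             if { $active } {
                 set insection 1
                 # Run the actions stored as procedure ::Sections::$id
                 $id $line
                 if { $nextline } {
                     break
                 }
             }
         }
         # No section handled this line: use the fallback
         if { ! $insection } {
             _otherwise $line
         }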
     }

     _end
     close $infile
 }

 # main --
 #     Simple test case and demo
 #
 namespace import ::Sections::*

 begin {
     puts "List of procedures:"
     set ::count 0
 }

 section "^#.*--" "^ *proc" {
     puts "| $line"
     if { [regexp "#.*--" $line] } {
         set ::count 0
     }
 }

 section "{" "^#.*--" {
     incr ::count

     if { $line eq "\}" } {
         # Naive criterion for the end of a procedure
         puts "(Number of lines: $::count)"
     }
 }

 scanfile $argv0
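
How an action script gets to see the variable "line" may not be obvious from the code: "section" turns the script into a procedure with a single argument called line, named after the section's index, and "scanfile" calls it by that index. A minimal sketch of that mechanism, using a hypothetical namespace ::Demo and made-up proc names (register, run) so that it does not interfere with ::Sections:

```tcl
namespace eval ::Demo {
    variable count 0
}

# Store a script as a procedure named after a running index; the
# script can refer to the current line as $line.
proc ::Demo::register {actions} {
    variable count
    proc ::Demo::$count line $actions
    set id $count
    incr count
    return $id
}

# Run a registered script on one line. The numeric command name is
# looked up in the current namespace first, so "0" finds ::Demo::0.
proc ::Demo::run {id line} {
    $id $line
}

set id [::Demo::register { return [string toupper $line] }]
puts [::Demo::run $id "hello"]   ;# prints HELLO
```

Because the script runs as a procedure body, any variable that must survive from one line to the next has to be a global or namespace variable, which is why the demo uses ::count.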

Comments

Very useful indeed! I fixed a small bug: the "if { ! $insection } ..." test is better placed outside the foreach loop.


An excerpt of the demo's output:

 | # scanfile --
 | #     Scan the file, taking actions appropriate for the
 | #     sections the line is part of
 | #
 | # Arguments:
 | #     filename    Name of the file to scan
 | #
 | # Result:
 | #     None
 | #
 | proc ::Sections::scanfile {filename} {
 (Number of lines: 46)

JM (4 April 2024) When copy-pasting this source, make sure you remove the single space at the beginning of each line (it is only there for wiki formatting). Otherwise the demo breaks, because it uses these patterns:

    "^#.*--"
    "^ *proc"

With the extra space at the beginning of each line, the header comments (such as "# scanfile --") no longer match the anchored pattern "^#.*--".
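
Removing that leading space by hand is tedious for a script of this length. A minimal sketch that does it with regsub; the proc name stripWikiIndent is made up for this example, and it assumes that every code line carries exactly one extra leading space:

```tcl
# Hypothetical helper: remove one leading space from every line of a
# block of text copied from the wiki. Lines without a leading space
# are left unchanged.
proc stripWikiIndent {text} {
    # -line makes ^ match at the start of each line, -all repeats the
    # substitution for every line
    return [regsub -all -line {^ } $text {}]
}

set pasted " # section --\n proc demo {} {\n     puts hello\n }"
puts [stripWikiIndent $pasted]
```

Only the first space of each line is removed, so the indentation inside procedure bodies is preserved.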