Scan and modify text files

**Summary**
[Arjen Markus] (21 February 2006) I am facing the task of modifying a lot of text files 
in a rather mechanical way. 
I used to do this kind of thing with [AWK], but Tcl lends itself to this too. 
It is just a matter of finding the right "little language". 

The task I am facing is not really interesting to anyone else, 
but its characteristics are fairly common:

   * Certain modifications are required for a particular part of the file
   * Some modifications apply to particular lines
   * Defining regular expressions to capture exactly the lines you need can be tricky. <<br>>So it is probably easier to do it in steps.

The script below lets you delimit sections of a file by a start and a stop pattern. 
For each line that falls within a section (including the lines matching the start and stop patterns), the associated script is run. 
To apply default processing to all other lines (just copying them to the output, for instance), 
there is a fallback: "otherwise". 
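
As a rough illustration of the start/stop idea on its own, here is a minimal sketch that is independent of the script below. The helper name `linesInSection` and the sample data are hypothetical; it only collects the lines of one section instead of running an action script for each line.

======tcl
# Hypothetical helper, not part of modify.tcl: return the lines that lie
# between a line matching $start and a line matching $stop.
# Both delimiter lines are included in the result.
proc linesInSection {lines start stop} {
    set active 0
    set result {}
    foreach line $lines {
        if { $active } {
            lappend result $line
            # The stop line still belongs to the section
            if { [regexp -- $stop $line] } {
                set active 0
            }
        } elseif { [regexp -- $start $line] } {
            set active 1
            lappend result $line
        }
    }
    return $result
}

# Made-up sample input
set sample [list "header" "BEGIN data" "1 2 3" "END data" "footer"]
foreach line [linesInSection $sample {^BEGIN} {^END}] {
    puts "| $line"
}
======

With the sample above, only the three lines from "BEGIN data" up to and including "END data" are printed.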

----
======tcl
 # modify.tcl --
 #     Yet another AWK-like utility. This one reads a file line by line
 #     and decides on the basis of patterns marking the beginning and
 #     end of a block of lines (section) what actions to take.
 #
 #     Note:
 #     - sections may overlap
 #     - what they do is up to you
 #     - special sections are: begin, end and otherwise
 #     - the command "nextline" causes the actions for any subsequent
 #       sections to be cancelled.
 #

 namespace eval ::Sections {
     variable section_number 0
     variable section_data   {}
     variable section_active {}
     variable nextline       0

     namespace export section begin end otherwise nextline scanfile

     proc _begin     {} {}
     proc _end       {} {}
     proc _otherwise line {}
 }

 # section --
 #     Define the beginning and end of a section and the actions to take
 #
 # Arguments:
 #     begin       The regexp pattern marking the start
 #     end         The regexp pattern marking the end
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::section {begin end actions} {
     variable section_number
     variable section_active
     variable section_data

     lappend section_data   $begin $end
     lappend section_active 0
     proc ::Sections::$section_number line $actions
     incr section_number
 }

 # begin --
 #     Define the actions for the beginning of a file
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::begin {actions} {
     proc ::Sections::_begin {} $actions
 }

 # end --
 #     Define the actions for the end of a file
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::end {actions} {
     proc ::Sections::_end {} $actions
 }

 # otherwise --
 #     Define the actions for lines not falling in any section
 #
 # Arguments:
 #     actions     The script to be run
 #
 # Result:
 #     None
 #
 proc ::Sections::otherwise {actions} {
     proc ::Sections::_otherwise line $actions
 }

 # nextline --
 #     Instruct the scanning procedure to skip all remaining sections
 #
 # Arguments:
 #     None
 #
 # Result:
 #     None
 #
 proc ::Sections::nextline {} {
     variable nextline
     set nextline 1
 }

 # scanfile --
 #     Scan the file, taking actions appropriate for the
 #     sections the line is part of
 #
 # Arguments:
 #     filename    Name of the file to scan
 #
 # Result:
 #     None
 #
 proc ::Sections::scanfile {filename} {
     variable section_number
     variable section_data
     variable section_active
     variable nextline

     set infile [open $filename r]

     _begin

     while { [gets $infile line] >= 0 } {
         set nextline 0

         set id -1
         set insection 0
         foreach {start stop} $section_data active $section_active {
             incr id
             if { $active } {
                 if { [regexp -- $stop $line] } {
                     lset section_active $id 0
                 }
             } else {
                 if { [regexp -- $start $line] } {
                     lset section_active $id 1
                     set active 1
                 }
             }

             if { $active } {
                 set insection 1
                 $id $line
                 if { $nextline } {
                     break
                 }
             }
         }
         if { ! $insection } {
             _otherwise $line
         }
     }

     _end
     close $infile
 }

 # main --
 #     Simple test case and demo
 #
 namespace import ::Sections::*

 begin {
     puts "List of procedures:"
     set ::count 0
 }

 section "^#.*--" "^ *proc" {
     puts "| $line"
     if { [regexp "#.*--" $line] } {
         set ::count 0
     }
 }

 section "{" "^#.*--" {
     incr ::count

     if { $line eq "\}" } {
         # Naive criterion for the end of a procedure
         puts "(Number of lines: $::count)"
     }
 }

 scanfile $argv0
======
----
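
The "little language" in the script above rests on one technique: each action script is turned into a procedure with a single argument, `line`, so the script can refer to `$line` directly. Below is a minimal standalone sketch of that technique. The namespace `::Mini` and the commands `onmatch` and `run` are hypothetical names chosen for illustration, and the sketch matches single lines only (no start/stop sections).

======tcl
# Hypothetical mini-DSL: "onmatch" registers a pattern and an action script.
# The script becomes the body of a proc whose only parameter is "line".
namespace eval ::Mini {
    variable rules {}
    variable count 0
}

proc ::Mini::onmatch {pattern actions} {
    variable rules
    variable count

    set name ::Mini::rule$count
    proc $name line $actions
    lappend rules $pattern $name
    incr count
}

# Run every matching rule on every line and collect the results
proc ::Mini::run {lines} {
    variable rules

    set out {}
    foreach line $lines {
        foreach {pattern name} $rules {
            if { [regexp -- $pattern $line] } {
                lappend out [$name $line]
            }
        }
    }
    return $out
}

::Mini::onmatch {^#}   { return "comment: $line" }
::Mini::onmatch {proc} { return "proc: [string trim $line]" }

puts [join [::Mini::run [list "# hello" "proc foo {} {}" "set x 1"]] \n]
======

Because the actions run inside a procedure, any state they need to keep between lines has to live in global or namespace variables, which is why the demo above uses `::count`.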

**Comments**
Very useful indeed! I fixed a small bug: the "if { ! $insection } ..." test 
is better placed outside the foreach loop.

----
An excerpt of the demo's output:

 | # scanfile --
 | #     Scan the file, taking actions appropriate for the
 | #     sections the line is part of
 | #
 | # Arguments:
 | #     filename    Name of the file to scan
 | #
 | # Result:
 | #     None
 | #
 | proc ::Sections::scanfile {filename} {
 (Number of lines: 46)
----
[JM] (4 April 2024) When copy-pasting this source, make sure you remove the single space at the beginning of each line (it is there for wiki formatting). The extra space breaks the demo, which uses these patterns:<<br>>
    "^#.*--"
    "^ *proc"
With the extra leading space, the comment lines no longer match the anchored pattern "^#.*--".
----
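
The leading-space issue described above can be checked, and worked around, with a few lines of Tcl. This is only a sketch: the helper name `stripWikiIndent` is hypothetical, and it assumes the pasted text carries at most one extra leading space per line.

======tcl
# Hypothetical helper: remove exactly one leading space from each line
# of a pasted text, leaving any further indentation intact.
proc stripWikiIndent {text} {
    set out {}
    foreach line [split $text \n] {
        if { [string index $line 0] eq " " } {
            set line [string range $line 1 end]
        }
        lappend out $line
    }
    return [join $out \n]
}

set pasted " # scanfile --"

# The anchored pattern fails on the pasted line ...
puts [regexp {^#.*--} $pasted]                     ;# prints 0
# ... and matches once the extra space is gone
puts [regexp {^#.*--} [stripWikiIndent $pasted]]   ;# prints 1
======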

<<categories>>  File | String Processing