Version 16 of diff

Updated 2005-03-01 12:53:46 by lwv

The word diff, in many computer circles, refers to comparing two items and displaying, in some manner, the differences between them. Most frequently, the items are two files. When the output is text, the Unix tradition is to express the differences as the changes that must be made to the first file to turn it into the second.
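As an illustration of that tradition, here is a minimal Tcl sketch (Tcl 8.5 or later) of a line-based diff built on the textbook longest-common-subsequence (LCS) table. The proc name lineDiff and the "=", "-" and "+" markers (unchanged, delete from the first list, insert from the second) are choices made for this example, not an existing API. The table costs O(N*M) time and memory, so practical diff tools use faster algorithms.

```tcl
# Hypothetical sketch: line-based diff of two lists via an LCS table.
# Returns a list of {op line} edits that turn list a into list b.
proc lineDiff {a b} {
    set n [llength $a]
    set m [llength $b]
    # L($i,$j) = length of the LCS of the suffixes starting at a[i] and b[j]
    for {set i $n} {$i >= 0} {incr i -1} {
        for {set j $m} {$j >= 0} {incr j -1} {
            set i1 [expr {$i + 1}]
            set j1 [expr {$j + 1}]
            if {$i == $n || $j == $m} {
                set L($i,$j) 0
            } elseif {[lindex $a $i] eq [lindex $b $j]} {
                set L($i,$j) [expr {$L($i1,$j1) + 1}]
            } elseif {$L($i1,$j) >= $L($i,$j1)} {
                set L($i,$j) $L($i1,$j)
            } else {
                set L($i,$j) $L($i,$j1)
            }
        }
    }
    # Walk the table from the start, emitting the edits
    set edits {}
    set i 0
    set j 0
    while {$i < $n && $j < $m} {
        set i1 [expr {$i + 1}]
        set j1 [expr {$j + 1}]
        if {[lindex $a $i] eq [lindex $b $j]} {
            lappend edits [list = [lindex $a $i]]
            incr i
            incr j
        } elseif {$L($i1,$j) >= $L($i,$j1)} {
            lappend edits [list - [lindex $a $i]]
            incr i
        } else {
            lappend edits [list + [lindex $b $j]]
            incr j
        }
    }
    # Whatever is left over is purely deleted or purely inserted
    while {$i < $n} { lappend edits [list - [lindex $a $i]]; incr i }
    while {$j < $m} { lappend edits [list + [lindex $b $j]]; incr j }
    return $edits
}

# Usage: prints "= a", "- b", "= c", "+ d" on separate lines
foreach edit [lineDiff {a b c} {a c d}] {
    lassign $edit op line
    puts "$op $line"
}
```

The "-" and "+" markers mirror the prefixes of unified diff output; "=" stands in for unchanged context lines.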

In a GUI application, coloring or other techniques are often used to convey more information about what changed. Some applications highlight entire lines, while others highlight the particular characters that differ.


See diff in Tcl

The code that was previously here was, according to its author, of poor quality and has been removed.

Arjen Markus We faced a slightly different problem: comparing two files while taking special care with (floating-point) numbers. The solution was simple in design:

  • Read the files line by line (all lines should be comparable; we did not need to deal with inserted or deleted lines).
  • Split the lines into words and compare the words either as strings or as numbers.
  • Using [string is double], we determined whether a "word" is actually a number; if so, we compared the two words numerically (even allowing a certain tolerance if required).

This makes the comparison immune to differences in number formatting: 0.1, +.1, 1.0E-01 and +1.00e-001 all denote the same number, and all of these forms do occur in practice (sometimes you have little control over the precise output format).
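A minimal Tcl sketch of that approach (Tcl 8.5 or later, for max() in expr). The proc names wordsEqual, linesEqual and differingLines, and the particular tolerance rule, are illustrative assumptions; the text above only says that a tolerance may be applied. Reading the files is left out here; one way to get a list of lines is split [read $chan] \n.

```tcl
# Hypothetical helper: are two words equal, comparing numbers numerically?
proc wordsEqual {w1 w2 tol} {
    # -strict makes the empty string fail the test
    if {[string is double -strict $w1] && [string is double -strict $w2]} {
        # Assumed rule: relative tolerance with a floor of 1.0, so it acts
        # as an absolute tolerance for values near zero.
        # Caveat: in Tcl 8.x, expr reads integers with a leading 0 (e.g. 010) as octal.
        set scale [expr {max(abs($w1), abs($w2), 1.0)}]
        return [expr {abs($w1 - $w2) <= $tol * $scale}]
    }
    return [expr {$w1 eq $w2}]
}

# Hypothetical helper: compare two lines word by word
proc linesEqual {line1 line2 {tol 1e-6}} {
    # Split on runs of whitespace; [split] would yield empty words for repeated blanks
    set words1 [regexp -all -inline {\S+} $line1]
    set words2 [regexp -all -inline {\S+} $line2]
    if {[llength $words1] != [llength $words2]} {
        return 0
    }
    foreach w1 $words1 w2 $words2 {
        if {![wordsEqual $w1 $w2 $tol]} {
            return 0
        }
    }
    return 1
}

# Compare two lists of lines pairwise (no insertions or deletions, as in the
# original problem) and return the numbers of the lines that differ
proc differingLines {lines1 lines2 {tol 1e-6}} {
    set result {}
    set lineno 0
    foreach l1 $lines1 l2 $lines2 {
        incr lineno
        if {![linesEqual $l1 $l2 $tol]} {
            lappend result $lineno
        }
    }
    return $result
}
```

With this sketch, linesEqual "x = 0.1" "x = +.1" returns 1, while linesEqual "a 1.0" "a 1.1" returns 0.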


Arjen Markus Question: wouldn't this be a nice addition to the fileutil module in Tcllib?

GPS maybe it would...

Arjen Markus If so, it would benefit (in my opinion) from two custom procedures:

  • A procedure one can supply to compare the lines (for instance, to ignore whitespace, or to interpret numbers as numbers, which was my original problem)
  • A procedure to process the output (for instance, to display it the way Tkdiff does)
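A minimal Tcl sketch (Tcl 8.5 or later) of how such hooks might look, passed as command prefixes. compareWith, ignoreWhitespace and printDifference are hypothetical names, not part of Tcllib's fileutil module, and the sketch keeps to the simple line-by-line model without insertions or deletions. A numeric-aware comparison like the one Arjen describes above could be passed as the comparison callback in the same way.

```tcl
# Hypothetical sketch: compare two lists of lines pairwise, with the
# comparison and the reporting supplied as callbacks (command prefixes)
proc compareWith {lines1 lines2 compareCmd reportCmd} {
    set differing {}
    set lineno 0
    # foreach pads the shorter list with empty strings
    foreach l1 $lines1 l2 $lines2 {
        incr lineno
        # {*} expands the command prefix so extra arguments can be baked in
        if {![{*}$compareCmd $l1 $l2]} {
            {*}$reportCmd $lineno $l1 $l2
            lappend differing $lineno
        }
    }
    return $differing
}

# Example comparison callback: lines that differ only in whitespace are equal
proc ignoreWhitespace {a b} {
    expr {[regexp -all -inline {\S+} $a] eq [regexp -all -inline {\S+} $b]}
}

# Example report callback: print the differing pair as plain text
proc printDifference {lineno a b} {
    puts "line $lineno:"
    puts "  < $a"
    puts "  > $b"
}

# Usage: only line 2 is reported; the call returns {2}
compareWith [list "a  b" c] [list "a b" d] ignoreWhitespace printDifference
```

A GUI front end could pass a report callback that tags lines in a text widget instead of printing them.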

Arjen Markus A few thoughts on improving performance:

  • Store the lines as {lineno content}
  • Sort by content (lsort has this ability via "-index")
  • Use binary search to replace the inner loop.

This would reduce the number of iterations from O(N^2) to O(N log N). But perhaps it is not worth the trouble :-)
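A minimal Tcl sketch of those three steps, assuming the "inner loop" being replaced is a linear scan of the second file for lines identical to the current line of the first. The procs matchingLines and lowerBound are hypothetical names; the binary search is written out by hand to keep the technique explicit. This only locates exact matches; turning them into a minimal diff would still need further work.

```tcl
# Hypothetical helper: binary-search lower bound. Returns the index of the
# first {lineno content} pair whose content is not less than $key.
proc lowerBound {sorted key} {
    set lo 0
    set hi [llength $sorted]
    while {$lo < $hi} {
        set mid [expr {($lo + $hi) / 2}]
        # string compare uses the same code-point order as lsort's default -ascii mode
        if {[string compare [lindex $sorted $mid 1] $key] < 0} {
            set lo [expr {$mid + 1}]
        } else {
            set hi $mid
        }
    }
    return $lo
}

# For each line of lines1, find the line numbers of identical lines in lines2.
# Returns a flat list: lineno1 {matching linenos in lines2} ...
proc matchingLines {lines1 lines2} {
    # Store the second file's lines as {lineno content} pairs
    set pairs {}
    set n 0
    foreach line $lines2 {
        lappend pairs [list [incr n] $line]
    }
    # Sort by content; lsort is a stable merge sort, so equal lines keep line order
    set sorted [lsort -index 1 $pairs]
    set result {}
    set lineno 0
    foreach line $lines1 {
        incr lineno
        # O(log N) search replaces a scan over every line of the second file
        set i [lowerBound $sorted $line]
        set hits {}
        while {$i < [llength $sorted] && [lindex $sorted $i 1] eq $line} {
            lappend hits [lindex $sorted $i 0]
            incr i
        }
        lappend result $lineno $hits
    }
    return $result
}

# Usage: returns {1 {2 4} 2 {} 3 1}
matchingLines {a b c} {c a x a}
```

Sorting costs O(N log N), and each lookup costs O(log N) plus the number of matches found, which is where the O(N log N) estimate comes from.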


See also Using Snit to glue diff, patch, and md5sum.

