'''diff''' utilities determine and present the differences between texts.

** See Also **

   [comparing files in tcl]:   
   [Using Snit to glue diff, patch, and md5sum]:   

** Tcl Tools **

   [diff in Tcl]:   
   [Another diff in Tcl]:   
   [DiffUtilTcl]:   
   [tkdiff]:   
   [eskil]:   A very nice graphical diff tool.
   [bindiff]:   

** Non-Tcl Tools **

   [http://formulasoft.com/afc.html%|%Active File Compare]:   Proprietary. [CL] has received mild testimonials. There's no particular Tcl connection; it's just been valuable to me as a Tcl developer when working under Windows.
   [http://winmerge.org/%|%WinMerge]:   Open-source, for Windows. A future cross-platform version is planned.
   [http://xdelta.org/%|%xdelta]:   Open-source binary diff, differential compression tools, VCDIFF (RFC 3284) delta compression. Jean-Samuel Gauthier: ...supports a simple but flexible callback interface to feed/extract data to/from the compressor. Tcl Examples included.

** Description **

Most frequently, '''diff''' is a comparison of two files. When the output is text, the Unix tradition is to display the differences as the changes that must be made to the first file to produce the second file. Often in a [GUI] application, coloring or other techniques are used to convey more information about what changed. In some applications, entire lines are highlighted, while in others, individual characters are highlighted.

** Specialized Diff **

[Arjen Markus]: We have faced a slightly different problem: two files that should be compared with special care for (floating-point) numbers. The solution was simple in design:

   * Read the files line by line (all lines should be comparable; we did not need to deal with inserted or deleted lines).
   * Split the lines into words and compare the words as either strings or as numbers.
   * By using [[string is double]] we identified whether a "word" is actually a number and, if so, compared the two words numerically (even allowing a certain tolerance if required).
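The steps above could be sketched roughly as follows. This is only an illustration, not the original code: the proc names `wordsEqual`, `linesEqual` and `compareFiles` and the default tolerance of 0.0 are made up for this example, and it assumes ordinary decimal numbers (no special handling for NaN, Inf, or octal/hex notation).

```tcl
# Compare two words: numerically if both look like numbers, else as strings.
# (Hypothetical helper; "tol" is an absolute tolerance.)
proc wordsEqual {a b {tol 0.0}} {
    if {[string is double -strict $a] && [string is double -strict $b]} {
        return [expr {abs($a - $b) <= $tol}]
    }
    return [expr {$a eq $b}]
}

# Compare two lines word by word; words are runs of non-whitespace.
proc linesEqual {line1 line2 {tol 0.0}} {
    set words1 [regexp -all -inline {\S+} $line1]
    set words2 [regexp -all -inline {\S+} $line2]
    if {[llength $words1] != [llength $words2]} {
        return 0
    }
    foreach w1 $words1 w2 $words2 {
        if {![wordsEqual $w1 $w2 $tol]} {
            return 0
        }
    }
    return 1
}

# Read both files line by line and return the numbers of the lines that
# differ. Inserted or deleted lines are not handled: reading stops as soon
# as either file runs out of lines.
proc compareFiles {file1 file2 {tol 0.0}} {
    set f1 [open $file1 r]
    set f2 [open $file2 r]
    set differing {}
    set lineno 0
    while {[gets $f1 line1] >= 0 && [gets $f2 line2] >= 0} {
        incr lineno
        if {![linesEqual $line1 $line2 $tol]} {
            lappend differing $lineno
        }
    }
    close $f1
    close $f2
    return $differing
}
```

With these definitions, `linesEqual "t = 0.1 s" "t = +1.00e-001 s"` should report the lines as equal, while `linesEqual "x 1.000" "x 1.001"` should only do so when a tolerance such as 0.01 is passed as the third argument.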
This way you are immune to numbers formatted in different ways: 0.1, +.1, 1.0E-01, +1.00e-001 all spell the same number, and you can encounter all of these forms (sometimes you have less than perfect control over the precise format).

----

[Arjen Markus]: Question: would this not be a nice addition for the fileutil module in Tcllib?

[GPS]: maybe it would...

[Arjen Markus]: If so, it would benefit (in my opinion) from two custom procedures:

   * A procedure one can supply to compare the lines (for instance: ignore white-space, or interpret numbers as numbers, which was my original problem)
   * A procedure to process the output (in the manner of [Tkdiff], for instance)

----

[Arjen Markus]: A few thoughts for improving the performance:

   * Store the lines as {lineno content}
   * Sort by content (lsort has this ability via "-index")
   * Use binary search to replace the inner loop.

This would reduce the number of iterations from O(N^2) to O(N log N). But perhaps it is not worth the trouble :-)

<<categories>> Glossary | Dev. Tools | File