The regexp engine of Tcl should be improved in the following ways:

   * '''Fixed character width:''' The code assumes that the text being searched is a C vector of chrs, i.e., that every character has the same byte width (2 bytes in normal builds), even though regular expression matching in principle needs only sequential access. One disadvantage is that data to be matched frequently has to be converted from Tcl's primary UTF-8 string representation to a fixed-width representation. Another is that it blocks some approaches to extending Tcl's Unicode support to characters beyond the BMP. Therefore, the width of a character should be dynamic.
   * '''Convert to Tcl coding style:''' When the regexp engine was originally incorporated into the Tcl core, further upstream development was expected, so it was admitted even though it did not adhere to the Tcl Style Guide and was less readable than the rest of the Tcl core. Today there is no upstream development, so the code should be brought in line with the rest of the core.
   * '''Implement stream interface:''' Make it possible to run the engine on a stream of characters delivered by a callback.
   * '''Implement lookbehind constraints:''' The engine supports lookahead constraints (?=...), but not lookbehind constraints (?<=...). It should support both.
   * '''Implement reversion:''' The reverse of a regular language is also a regular language, so there is a theoretical foundation for an RE syntax or option meaning "this regexp is to be read backwards". Reversion has a practical application in backwards searches.
   * '''Improved performance and/or memory usage:''' ''To be specified''

Is there any reason not to use PCRE?
[Larry Smith]

**Schedule:**

Start date ('''May 23''')

   * '''May 30''' - Get used to the code / rewrite the code to use Tcl's coding style
   * '''June 6''' - Get used to the code / rewrite the code to use Tcl's coding style
   * '''June 13''' - Remove the "fixed character width" assumption
   * '''June 20''' - Implement stream interface
   * '''June 27''' - Implement lookbehind constraints
   * '''July 4''' - Implement regexp reversion
   * '''July 11''' - Improve performance and/or memory usage
   * '''July 18''' - Improve performance and/or memory usage
   * '''July 25''' - Improve performance and/or memory usage
   * '''August 1''' - Improve performance and/or memory usage
   * '''August 8''' - Improve performance and/or memory usage

End date ('''August 17''')

**About me:**

[http://danielkloeck.wikidot.com/ %|%Daniel Klöck's portfolio%|%]
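
**Illustration:**

The "fixed character width" and "stream interface" items listed at the top of this page both rest on the claim that matching needs only sequential access to characters. The following is a minimal sketch of that idea, not the engine's actual code: a callback hands out one code point at a time, decoded on the fly from UTF-8, and a tiny hand-written automaton for the pattern `a.c` consumes it. All names here (`NextCharProc`, `Utf8Source`, `Utf8Next`, `MatchADotC`, `MatchUtf8`) are hypothetical and are not part of Tcl's API. The decoder is simplified and does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns the next code point, or -1 at end of input. */
typedef int (*NextCharProc)(void *clientData);

/* Hypothetical state for a callback that reads code points from a UTF-8 buffer. */
typedef struct {
    const unsigned char *p;    /* next byte to read */
    const unsigned char *end;  /* one past the last byte */
} Utf8Source;

/* Decode one code point; malformed sequences yield U+FFFD (simplified decoder). */
static int Utf8Next(void *clientData)
{
    Utf8Source *s = clientData;
    unsigned c;
    int extra;

    if (s->p >= s->end) {
        return -1;
    }
    c = *s->p++;
    if (c < 0x80) {
        return (int) c;
    } else if ((c & 0xE0) == 0xC0) {
        c &= 0x1F; extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
        c &= 0x0F; extra = 2;
    } else if ((c & 0xF8) == 0xF0) {
        c &= 0x07; extra = 3;
    } else {
        return 0xFFFD;
    }
    while (extra-- > 0) {
        if (s->p >= s->end || (*s->p & 0xC0) != 0x80) {
            return 0xFFFD;
        }
        c = (c << 6) | (*s->p++ & 0x3F);
    }
    return (int) c;
}

/* Hand-written automaton for "a.c" matched against the whole stream.
 * It never indexes into the input; it only asks the callback for the next character. */
static int MatchADotC(NextCharProc next, void *clientData)
{
    int state = 0;
    int ch;

    while ((ch = next(clientData)) != -1) {
        switch (state) {
        case 0: state = (ch == 'a') ? 1 : -1; break;
        case 1: state = 2; break;                     /* '.' accepts any code point */
        case 2: state = (ch == 'c') ? 3 : -1; break;
        default: state = -1; break;                   /* input after the accepting state */
        }
        if (state < 0) {
            return 0;
        }
    }
    return state == 3;
}

/* Convenience wrapper: match a NUL-terminated UTF-8 string. */
int MatchUtf8(const char *s)
{
    Utf8Source src;

    assert(s != NULL);
    src.p = (const unsigned char *) s;
    src.end = src.p + strlen(s);
    return MatchADotC(Utf8Next, &src);
}
```

With this arrangement the middle character may occupy one, two, or four bytes without the matcher noticing, and no conversion of the whole input to a fixed-width buffer is needed. A real engine would additionally need backtracking or buffering for features such as back references, which this sketch leaves out.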