Names, packages, versions, builds

The aim of this page is to provide a spot to discuss the various ways in which Tcl apps are named, versioned, structured, and deployed -jcw


To kick off with a contentious statement (and start this mini-rant): I think there are too many ways to modularize things, and that flexibility is a disadvantage:

  • packages are versioned entities, grouping scripts and shared libs into directories
  • files are frequently used to collect related code together for easy editing
  • directories often only play a role as the basic unit in which packages are split
  • namespaces prevent global name clashes, and introduce a hierarchy

One trouble spot is naming conventions: package "Aha" version 1.2 may live in directory "aha1.2" (or "Aha-1.2", or "myaha"), the main script may be called "aha.tcl" (or "Aha.tcl", or "main.tcl"), it may or may not define its own "::aha" namespace (or "::Aha", or "::AHA"), and it may or may not define procs by the name "aha" (or "Aha", or "AhaInit/AhaDo/etc").

Some may say it's no big deal. The mechanism is deterministic, it works, anyone can figure it out.

But I think it adds nothing but trouble when re-using components, or merging them from different sources. One lookup to the docs is not enough. First "package require <what?>", then call <what?> in namespace <what?>, then docs are <where?>. Not to mention tracking upgrades at <what-was-that-url?>.

Take "expect". Who doesn't instantly associate this insanely great system with the name of its creator, Don Libes? Of course - but now the nitty gritty of finding it, getting it, using it, inspecting it. Not that expect is in any way an issue for me, I just wanted to use a well-known package as an example. Oops, it's not a package, it's usually a custom tclsh. Well, there you go, see? - yet another way to modularize.

And it'll get worse, far worse. Binaries anyone? What if we start deploying compiled code - which is inevitable IMO (ask anyone on the most used platform in the world). Luckily, Tcl now has stubs - so at least the issue of matching the Tcl version is gone.

What about net-based deployment? What if procs no longer end up living in files, but somewhere in repositories (in files, databases, or whatever someone else decides)? Ever thought of generalizing arrays as repositories for procs? It's one way in which the Tequila shared global array server has been evolving.
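To make the "arrays as repositories for procs" idea concrete, here is a minimal sketch. It is not how Tequila works; the array name procRepo and the two sample procs are invented for illustration, and the array is filled locally where a networked setup would fill it from a server. It assumes Tcl 8.5 or later and a standard tclsh where ::unknown is defined.

```tcl
# Hypothetical repository: keys are proc names, values are {params body}.
# In a networked setup this array could be filled from a server or database.
array set procRepo {
    greet  {{name} {return "hello, $name"}}
    double {{n} {expr {$n * 2}}}
}

# Keep the standard handler around, then install one that consults the array first.
rename ::unknown ::unknown_original

proc ::unknown {args} {
    global procRepo
    set name [lindex $args 0]
    if {[info exists procRepo($name)]} {
        # Define the proc on first use, then re-run the original call.
        lassign $procRepo($name) params body
        proc ::$name $params $body
        return [uplevel 1 $args]
    }
    # Not in the repository: fall back to Tcl's normal behaviour.
    return [uplevel 1 [linsert $args 0 ::unknown_original]]
}

puts [greet world]   ;# hello, world
puts [double 21]     ;# 42
```

Once a proc has been fetched it is an ordinary proc, so the lookup cost is paid only on the first call. Note that this says nothing yet about versions: the array holds exactly one definition per name.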

What about the granularity of versioning? Do you want to track an entire app, with all the package versions it is known to work with? Or single procs, which are just plain handy to tweak and re-use all over the place? And what if it fails, to what level does one revert?

And here's a new issue I ran into, which in fact prompted this page. There are now several implementations of the MD5 algorithm: in Tcl, in Trf, in critlib, and there used to be one in TclKit. They are not identical but, as so often, reflect a combination of trade-offs and different coding styles. How does one use MD5 while leaving users maximum flexibility in deciding which code base they combine it with? Some might not have a compiled version, others might already be using a specific one. Unfortunately, "package require <name?> <version?>" does not solve it. One version may need a trivial wrapper to work like the other one. Who writes and maintains that wrapper, and where does it end up? What if someone else wrote a wrapper too? Is there a gently motivating structure in place to move towards synergy? Should there be? Can there be?
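One shape such a wrapper could take is a list of candidates tried in order of preference, each paired with the command prefix that adapts it to a single calling convention. The sketch below is only that, a sketch: fastmd5 and slowmd5 are hypothetical package names, not the real MD5 packages mentioned above, and the stand-in does not compute a real digest, it merely tags its input so the chosen backend is visible. It assumes Tcl 8.5 or later.

```tcl
# Stand-in for a pure-Tcl implementation that happens to be installed.
# (Hypothetical package; it only tags its input, it is NOT a real MD5.)
package ifneeded slowmd5 1.0 {
    package provide slowmd5 1.0
    namespace eval ::slowmd5 {}
    proc ::slowmd5::digest {data} { return "slowmd5($data)" }
}

# Candidates in order of preference: package name, then the command prefix
# that adapts it to the one calling convention the application uses.
# fastmd5 stands for a compiled version that is not installed here.
set candidates {
    fastmd5 {::fastmd5::hash -hex}
    slowmd5 {::slowmd5::digest}
}

namespace eval ::digest {}
foreach {pkg prefix} $candidates {
    if {[catch {package require $pkg}]} continue
    # The adapter is a plain alias here; a real one might need a small proc.
    interp alias {} ::digest::md5 {} {*}$prefix
    set ::digest::backend $pkg
    break
}
if {![info exists ::digest::backend]} {
    error "no MD5 implementation available"
}

puts $::digest::backend            ;# slowmd5
puts [::digest::md5 "some data"]   ;# slowmd5(some data)
```

This answers the "how" but not the "who": the candidate list still has to live somewhere, and two applications with two such lists are back to the original problem.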

Another example of things that I find hard. I have a package A, version X. It works, but requires compilation. I have another package B, version Y, which tries to emulate A in pure Tcl. It's not perfect, but it runs anywhere, so for some cases it becomes the only option. A continues to evolve, so X increases while A probably remains compatible but gains more features. B continues to be improved, so it emulates A more and more (which X?). How does one number X and Y? But that's a detail compared to the naming issue: given that B emulates A, what names does one use for the namespaces and procs they define? In the end, the application wants to use A, and preferably never notice that there is a fallback B in place sometimes. The worst possible outcome would be that the application ends up with lots of special-cased calls.

It's not a trivial issue. Some functions may work fine, others may not work at all in emulation mode, either by design or because the emulation is incomplete.
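For what it's worth, one way to keep the application free of special-cased calls is a facade: the application only ever calls names in ::A, and those names are bound to B when A cannot be loaded. This is a sketch under assumptions, not a recommendation: the commands sum and fastsort are invented, A is deliberately left unregistered so the fallback path runs, and Tcl 8.5 or later is assumed.

```tcl
# Hypothetical stand-in for B, the pure-Tcl emulation of A.
package ifneeded B 0.3 {
    package provide B 0.3
    namespace eval ::B {
        # Which version of A's interface this emulation claims to follow.
        variable emulates 1.0
    }
    proc ::B::sum {args} { ::tcl::mathop::+ {*}$args }
}

# The facade namespace ::A is the only name the application ever uses.
namespace eval ::A {}
if {[catch {package require A}]} {
    package require B
    foreach cmd {sum fastsort} {
        if {[llength [info commands ::B::$cmd]]} {
            interp alias {} ::A::$cmd {} ::B::$cmd
        } else {
            # Not emulated: fail loudly rather than silently misbehave.
            proc ::A::$cmd args \
                [list error "A::$cmd is not available in emulation mode"]
        }
    }
}

puts [A::sum 1 2 3]                      ;# 6
puts [catch {A::fastsort {3 1 2}} msg]   ;# 1
puts $msg
```

The emulates variable is one possible answer to "which X?": B states the version of A it follows, separately from its own version number Y.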

Who has a solution which not only offers a way out for the above, but also provides a broad context for all such issues? It seems to me that having all the package, auto_path, auto_index, and namespace hierarchy machinery in the world is - from this perspective - more of a hindrance than a help. Tcl has been around for ages, and huge projects have been built with it. But how?

-jcw


RS currently has this perspective:

  • Distinguish internal resources (what the interpreter has available already) from external (what's out in the world). Namespaces are internal, the rest refers to external.
  • auto_path is used for tclIndex and pkgIndex files. Simple, well-known
  • packages are the way to go for serious development (they already hide details like pure-Tcl/compiled extensions)
  • auto_index is a simple but sometimes dangerous facility. For years it has been described as deprecated in the Welch book, while packages are recommended
  • So I'd say: pack stuff in packages of the same name as the namespace they go in. Install packages next to tcl/lib, so you don't have to worry about auto_path. Keep things simple.
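As an illustration of the last point, a minimal sketch of a package whose name matches its namespace, using the "aha" example from above. The proc greet is invented, and the package script is inlined so the snippet runs on its own; in a real install the ifneeded line would live in pkgIndex.tcl inside a directory next to tcl/lib and would source aha.tcl instead.

```tcl
# What pkgIndex.tcl would normally register. In a real layout this would be:
#   package ifneeded aha 1.2 [list source [file join $dir aha.tcl]]
package ifneeded aha 1.2 {
    # Contents of the hypothetical aha.tcl
    package provide aha 1.2
    namespace eval ::aha {
        namespace export greet
    }
    proc ::aha::greet {who} { return "aha, $who" }
}

# One name to remember: the package, the namespace and the file all agree.
package require aha 1.2
puts [aha::greet world]     ;# aha, world
puts [package present aha]  ;# 1.2
```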