Discussions on making data file access simpler

This page covers thoughts from the chatroom here and others on the wiki regarding the usefulness, or lack thereof, of a trace-like facility for abstracting data file accesses. Please add more comments (or code) as you see fit.


LV

My idea was this - I would love to be able to get at the password data file as well as other similar files without having to write my own proc. Sure, I could write it, debug it, then submit it to tcllib - but it still means that the next person who wants the group file, or some similar file, has to clone my code and the number of parallel routines grows.

And certainly a set of code could be generalized to do that sort of thing. I would like to find some sort of abstraction that would seem natural to Tcl and make using it very painless (nearly transparent) for the user.


Initial chat discussion


lvirden: This morning on the drive to work I was thinking...

lvirden: are you guys familiar with the perl concept of "tie"ing a variable?

bach: Nope.

bbh: don't know squat about perl ('cept that any I've looked at is ugly)

davidw: vaguely... I think that's sort of what jc is trying to do with tequila

lvirden: what tie is is a bit different from tequila

lvirden: What tie is is an infrastructure set of classes.

lvirden: The idea is to bind a variable to a package that then implements access methods for that variable.

lvirden: Accessing the variable triggers the appropriate method calls in the appropriate class.

lvirden: The purpose is to create a 'magic link' between say an array and a database.

lvirden: All they have to do is associate an array name with the right magic, and accessing the array for instance would cause the right database stuff to go on in the black box

bbh: sort of like trace

lvirden: Thus one can tie the password file, group file, oracle database, favorite csv, dbase II, Excel spreadsheet, whatever to the array, and then use normal array references to get at the data

lvirden: bruce, it is like a trace, array, and code to do parsing. The neat thing is that 99% of the magic is done for the user.

davidw: I don't know, seems like bad voodoo to me

lvirden: All they have to do is to specify the array name and the type of database and the database name ...

lvirden: it is very good magic - it makes programming database accesses a dream.

lvirden: package require OracleArray

bbh: but not necessarily efficient

lvirden:

    ::associate::oracle my_funny_table funtime

rmax: If you are doing just plain table access: yes, but what about joins, transactions referential integrity and so on?

lvirden:

    set abc $funtime(john doe)

suchenwi: Sounds promising to me too.. don't the Pythonites have something similar with their "pickle" concept?

lvirden: rmax - the bare mechanism is designed for the millions of cases where those types of things are not needed

lvirden: richard, I believe so

lvirden: for specialized uses, one would either extend the concept for the developer to do a bit more work or just continue to use the more specific database access

lvirden: Right now though, think about how many people are rewriting parsing code and access code for structured data like password files, group files, etc.

bbh: would make a nice module for tcllib...

lvirden: exactly my thought

davidw: tie is orthogonal to code reuse

suchenwi: No branches though.

davidw: suchenwi: or you could just use a database;-)

lvirden: perhaps my understanding of the word orthogonal is different than yours

lvirden: I always took it to mean 'opposing' to - while I see tie as being the HEIGHT of reuse

davidw: lvirden: maybe I'm using it incorrectly - the point being that tie doesn't help with code reuse

lvirden: Write the database access code once and everyone wins

lvirden: no - you were using it the way I thought, I guess. We just have differing interps of how reusable the code is!

suchenwi: Sure. My thoughts always start from a bare-bones Tcl environment (cause that's what I have at home...)

rmax: I think all that makes perfect sense for plain file access, but it is only of limited use for non-trivial database access.

lvirden: I would find it very useful not to have to reinvent the database code over and over again

lvirden: even if data happens to be IN a relational database, there are no joins, views, rollbacks, etc.

lvirden: Others I know have other types of work.

davidw: well, from that point of view, writing a layer could be useful

davidw: but tie is just one way of doing it

    package require flatfilemanipulator
    ffm::init backend passwd
    ffm::array get usernames

that's a silly example, but you could do it lots of ways

suchenwi:

    ffm::tie arr backend passwd

suchenwi: -- no, I withdraw that.

lvirden: certainly there are lots of ways to do things. I wasn't advocating using the perl name 'tie'. However, I myself would rather access the data through a variable rather than through commands.

lvirden: with just a command to set up the associations and traces done to do the reads and updates

davidw: I think that by accessing it through commands, your code is more legible

davidw: having generic commands, though, is probably a good idea

lvirden: Sounds like we are 'orthogonal' <smile>

suchenwi: [set] is a very generic command ;-)

davidw: I guess if you put it in its own namespace, that might be another way of keeping things cleaner

bbh: the array access method has some advantages. Initially your data is just in-memory stuff and you use arrays to keep it all. Later you want to persist the data: you "tie" your array to a flat file, with no other code changes. Now you want multiple instances to share the data: tie your array to a DB (or a tequila-like server), again with no other code changes.

suchenwi: Right.

rmax: Good, but if you then find yourself to need more database features than plain read/write access, this model is at the end.

bbh: it's more about mapping simple data to persistent stores, than mapping persistent stores to data

rmax: How would you express something like

    SELECT * FROM workers, department
    WHERE workers.dept = department.id

in array accesses, Richard?

rmax: bbh: agreed.

suchenwi: In SQL ? ;-)

davidw: tying it to a variable is a neat trick, but what's the advantage over having a command to do the same thing?

suchenwi: To tell the truth, years ago I wrote Tcl procs SELECT, WHERE etc.. that implement a subset of SQL behavior, but drew their data out of.. a Tcl array.

bbh: similar to using $a instead of [set a]

rmax:

    set workers_by_dept $db(SELECT ...)

;)

suchenwi: so I could write in Tcl:

    SELECT FNM,LNM FROM db WHERE FNM=J* AND LNM=BR*

lvirden: david, if a simple set of commands can be written to be used regardless of the database underneath, then perhaps there would be no big deal

bbh: also easier to swap back ends - i.e. array access is a common single API

lvirden: the idea is to simplify things down to the bare minimum needed to get the job done quickly.

suchenwi: so: set workers_by_dept [SELECT ...]

lvirden: I don't want to have to mess around looking up a dozen different namespaces to write code that reads through the password and group file, compares against the company phone book, and looks up the results in a couple of different other tables.

rmax: Yes, Richard. You can of course _simulate_ a SQL database with Tcl to a certain degree, but if you spent $$$ on a large scale RDBMS you certainly don't want to degrade it to a set of plain ASCII tables.

lvirden: I just want to get the job done

lvirden: it isn't a 'high profile' project that has any champion to get it in place

lvirden: and no one is going to champion it.

lvirden: So we are stuck with a dozen or more sources of employee related data

davidw: having a couple of simple things, like flatfile::set will help you realize what is going on 2 years down the road, instead of scratching your head wondering about the weird stuff going on with some variable

davidw: IMO, at least

lvirden: And it sticks you with either leaving it as a flatfile, or rewriting your code.

lvirden: Neither of which is an attractive option

lvirden: It is like saying we shouldn't have mega widgets, or other kinds of objects, because using lower level programming is 'more obvious'.

lvirden: I prefer seeing people write abstractions over common sets of tasks, making the logic being implemented more obvious. And if those abstractions are abstract enough, they can be reused.

rmax: Re commands vs. variable access: How about a set of namespaces, that all define the same commands like (open, close, get, set, etc.). One could import the commands for the backend he is currently using into a namespace owned by the application and use that namespace in his code...

rmax: ... Then, if the backend or backend driver changes, it is only needed to import the set of functions from a different namespace.

suchenwi: Yes.. a set of namespaces that implement a defined interface.

suchenwi: so the interface could be:

    puts $db "SELECT..."
    gets $db data

rmax: Oops, sorry Richard, I have misread your "defined interface" as "different interface"

suchenwi: no, it should remain pretty indifferent, that's what definitions are good for ;-)

rmax: Yes, of course. That's why I complained.

rmax: I think, the perl DBI has such a concept.

suchenwi: This gets/puts approach can be done with an [open |dbcli ...], so the only work left is parsing what the gets returns.

rmax: You can access everything from plain ASCII to Oracle databases with a single interface. But I don't know how they handle database specific things.

rmax: If the database in question already has a Tcl interface, it is not needed to start a second process and parse its output.
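
rmax's suggestion above (several namespaces that export the same commands, with the application importing from whichever backend it currently uses) could be sketched roughly as follows. This is only an illustration: the namespace names `backend::arraystore`, `backend::dictstore` and `app`, and the commands `put` and `get`, are made up for this example and do not come from any existing package.

```tcl
# Two hypothetical backends that export the same interface: put and get.
namespace eval backend::arraystore {
    namespace export put get
    variable store
    array set store {}
    proc put {key value} {
        variable store
        set store($key) $value
    }
    proc get {key} {
        variable store
        return $store($key)
    }
}

namespace eval backend::dictstore {
    namespace export put get
    variable store [dict create]
    proc put {key value} {
        variable store
        dict set store $key $value
    }
    proc get {key} {
        variable store
        return [dict get $store $key]
    }
}

namespace eval app {
    # Swapping the backend means changing only this import line.
    namespace import ::backend::arraystore::*

    proc demo {} {
        put jdoe "John Doe"
        return [get jdoe]
    }
}

puts [app::demo]
```

The application code in `app::demo` never names the backend, so replacing the import with `::backend::dictstore::*` should leave it unchanged. A real backend would of course talk to a file or database rather than to an in-memory variable.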


Interesting discussion (I wish my internet connection weren't metered - "chat" is off limits for me), this is precisely what I was aiming at in comments at the bottom of Better Arrays for Tcl9 about "virtualizing" Tcl's array model, which is not powerful enough today to mimic Perl's "tie" -- JCW

AK: Good that Larry made this talk persistent then, no? Jean-Claude, I will dig out our email conversation on this topic and post the relevant bits here.

The trace command, and the current implementation of Tcl, let the experienced Tcler do the same kind of thing as an experienced perl hacker can do with perl's tie. The only thing I can imagine which is "missing" is the ability to take a reference to the tied variable and pass that around -- but in Tcl the name is the reference, and if you want more anonymity you use an interp to hide things. The one other thing perl has over Tcl which might be relevant in this context is a bigger collection of prewritten code (CPAN). But maybe I missed something?
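
As a rough illustration of the point about traces, here is a minimal sketch of a read-only "tie" between an array and a colon-separated, passwd-style file. The proc names `tieFile` and `passwdLookup` are invented for this example, the sample data is made up, and `file tempfile` assumes Tcl 8.6. It handles only lookups by the first field and caches each entry after the first read; writes, deletions and enumeration are left out.

```tcl
# Read-trace callback: find the line whose first field is $name2 and store
# its fields (as a list) in the array element just before Tcl reads it.
proc passwdLookup {fileName name1 name2 op} {
    # Ignore accesses that do not name an element.
    if {$name2 eq ""} return
    upvar 1 $name1 arr
    # Already looked up once: keep the cached value.
    if {[info exists arr($name2)]} return
    set chan [open $fileName r]
    set data [read $chan]
    close $chan
    foreach line [split $data \n] {
        set fields [split $line :]
        if {[lindex $fields 0] eq $name2} {
            set arr($name2) $fields
            return
        }
    }
    # Not found: leave the element unset so Tcl reports the usual error.
}

# Associate an array in the caller's scope with a passwd-style file.
proc tieFile {arrayName fileName} {
    upvar 1 $arrayName arr
    array set arr {}
    trace add variable arr read [list passwdLookup $fileName]
}

# Demo with a temporary sample file instead of the real password file.
set chan [file tempfile samplePath]
puts $chan "root:x:0:0:root:/root:/bin/sh"
puts $chan "jdoe:x:1000:1000:John Doe:/home/jdoe:/bin/sh"
close $chan

tieFile users $samplePath
puts [lindex $users(jdoe) 4]   ;# fifth field: the full-name entry
file delete $samplePath
```

After `tieFile`, ordinary `$users(name)` references are all the calling code needs, which is the nearly transparent access asked for at the top of this page. Supporting other file formats would mean swapping in a different lookup callback.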

AM I can think of two "natural" solutions for storing the data: ::struct::matrix in Tcllib and MetaKit. The first has the advantage of an all-Tcl solution and the second adds more flexibility and functionality "out of the box". Filling the tables or matrices by parsing some file will be the major part of the work. The matrix module already provides functions for printing/reporting and MetaKit offers permanent storage, searching and other functions.

AK: It should be possible to either write a matrix implementation based upon Metakit, or to write other code transferring matrices into a metakit db and back.


Stu Netinfo "... provides access to system protocol, service, network and host information which is retrieved from system databases and made available for querying." might be of interest; it would be easy to adapt it for passwd and other similar type files.