Updated 2011-04-29 03:56:18 by RLE

Working on making use of MetaKit within the Tcl Web Server, also known as TclHttpd...

Jeff Smith wrote - I am heading down this path also. I would like to share my idea and hopefully get some feedback as to whether this is feasible.

I hope to create a database so my Squid proxy/cache server can authenticate users before they go to the Internet. I will use TclHttpd wrapped as a starkit installed on the Squid box to provide the server front end to the database. The Squid client authentication program will be a starkit using the http package to deliver the "username" and "password" to TclHttpd for authentication. I also want my users to be able to change their own passwords, which should be possible via a form on the TclHttpd server, and I want an "Administrator" form to add more users. I don't know if this can all happen at the same time; I am reading the MetaKit documentation at the moment and noticed there is an issue about concurrency!

Down the track, as I become more familiar with Metakit, I want to parse (is that the correct term?) the Squid access.log file with Tcl into individual user db files for reporting, billing, etc., and have them accessible via TclHttpd. Users should be able to access their own reports on TclHttpd using the same authentication scheme as above.

Finally, if this all works, I will share it with the Squid community, promote Tcl, and demonstrate what I think is a very powerful combination: TclHttpd and Tclkit.
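The Squid-side helper described above could be sketched roughly as follows. It assumes Squid's basic-auth helper convention (one "username password" line per request on stdin, answered with OK or ERR on stdout) and Tcl 8.5 or later. The URL, the form field names, and the "HTTP 200 means accepted" convention are hypothetical choices made for this sketch, as are the proc names checkViaHttp, replyFor and serve.

```tcl
package require http

# Ask the TclHttpd front end whether the credentials are valid.
# Assumption: the server answers HTTP 200 for a good login and
# anything else for a bad one. The URL and field names are made up.
proc checkViaHttp {user pass} {
    set url http://localhost:8015/auth/check
    set query [http::formatQuery user $user pass $pass]
    if {[catch {http::geturl $url -query $query} tok]} {
        return 0   ;# server unreachable: refuse rather than allow
    }
    set ok [expr {[http::ncode $tok] == 200}]
    http::cleanup $tok
    return $ok
}

# Turn one request line from Squid into the helper's reply.
# checkCmd is a command prefix called with the user name and password.
proc replyFor {line checkCmd} {
    set fields [split [string trim $line]]
    if {[llength $fields] != 2} {
        return ERR
    }
    lassign $fields user pass
    if {[{*}$checkCmd $user $pass]} {
        return OK
    }
    return ERR
}

# Answer requests until Squid closes the input channel.
proc serve {in out checkCmd} {
    while {[gets $in line] >= 0} {
        puts $out [replyFor $line $checkCmd]
        flush $out
    }
}
```

The helper would be started with `serve stdin stdout checkViaHttp`. Passing the check as a command prefix keeps the line handling testable without a running server.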

PS A simple trick to safely update the squid user database is to have two copies, one for reading only, the other for updating. Assuming you are using Unix and squid re-opens the database every time it is changed, simply [file copy -force workingcopy tmpcopy] and [file rename -force tmpcopy readingcopy] every time you update your working copy. Provided tmpcopy and readingcopy are on the same filesystem, the rename is atomic, so the squid process sees either the old file or the new one, never a partial write. Assuming your user database is small, this adds almost no overhead at all.
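As a minimal sketch, the copy-then-rename trick can be wrapped in a small proc. The name publishCopy and the ".tmp" suffix are hypothetical; the only real requirement is that the temporary file lives on the same filesystem as the reading copy.

```tcl
# Publish the working copy so that readers never see a half-written file.
proc publishCopy {working reading} {
    # Keep the temporary file next to the target so the rename stays
    # on one filesystem, where it is atomic on Unix.
    set tmp $reading.tmp
    file copy -force -- $working $tmp
    file rename -force -- $tmp $reading
}
```

It would be called after each update, for example `publishCopy users.work users.read`.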

NB I have uploaded a small package, formkit, demonstrating the use of Metakit and Tclhttpd for web forms.

Jeff Smith Thanks for the feedback! Greatly appreciated.

Jeff Smith OK I have made a start. Here is a Username/Password Database for Tclhttpd using Metakit.

2003-07-04 Jeff Smith I have made further progress; check out [1] for a Starkit. I think the best advice I can give anyone who is thinking of using Metakit with Tclhttpd is to use the Session module in Tclhttpd for session management and record locking of the Metakit database when updating. Another example can be found at [2].

2003-11-12 Jeff Smith Well I have been in production now for 2 months. 1500 Squid users authenticating daily to my Metakit database with a TclHttpd front end. It works great! The reason behind this little project was my Squid users originally authenticated against a database which ran on an old Firewall. The Firewall was at its end of life and the new Firewall did not allow authentication from our Squid machines. Luckily the passwords on the old Firewall used unix crypt so I used crypt in pure tcl from the wiki so the passwords could remain the same. One night I extracted the usernames, passwords and other info from the database on the old Firewall and imported it to the Metakit database. I then made the changes to Squid so it would authenticate against the TclHttpd Metakit database. Next day my Squid users didn't even realise there had been a change!

OK, now that I am feeling adventurous I want to move on to the next stage, but as a novice I am uncertain how to approach this.

1. I want to process the Squid access.log file and store information about sites visited, bytes downloaded, etc. by individual users in a Metakit database for reporting and billing. At present the daily Squid access.log file is approx. 40MB, or around 1.2GB per month. My thoughts are to process the access.log file on a daily basis and store individual user data in their own Metakit datafile (average 27KB daily). I think this may be the best approach?

2. a) When I process the Squid access.log file should I first open 1500 Metakit datafiles and then store the data in its respective user datafile as I process the log (Is this a ridiculous suggestion?).

b) Or as I process each line of the log, open the respective user datafile, write the information, then close the datafile.

c) Or as I process each line of the log, open the respective user datafile, write the information, if the next log file line is for a different user, open another user datafile and so on. Limit this to say a maximum of 10 Metakit user datafiles open at any given time, closing the one that has been open the longest without data being stored in it.

Any suggestions??

d) Process in memory and then update Metakit datafiles.

e) add the lines to a single metakit view, then iterate over it by user.
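Whichever option is chosen, each log line first has to be split into fields. A rough sketch follows, assuming Squid's native access.log format (timestamp, elapsed time, client address, result code, bytes, method, URL, user name, hierarchy, content type) and Tcl 8.5 or later for dict. The proc name parseSquidLine and the choice of fields to keep are illustrative only.

```tcl
# Split one access.log line into a dict of the fields of interest.
# Returns an empty dict for lines that do not have the expected ten fields.
proc parseSquidLine {line} {
    # Fields are separated by runs of spaces, so collect the non-space runs.
    set f [regexp -all -inline {\S+} $line]
    if {[llength $f] < 10} {
        return {}
    }
    return [dict create \
        time   [expr {int([lindex $f 0])}] \
        client [lindex $f 2] \
        bytes  [lindex $f 4] \
        url    [lindex $f 6] \
        user   [lindex $f 7]]
}
```

The timestamp is truncated to whole seconds here, which is one way of keeping it in a plain integer column.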

jcw - Juggling 1500 open files would be awkward and may hit OS limits. One file per day would not be hard of course, but it leaves the info scattered in the wrong way. One datafile would hit MK limits (32-bit address space limits datafiles to be under 2 GB, but the practical limit is considerably lower). Some thoughts:

  • think carefully about size and storage: there is a lot of redundancy in logfiles, if you bring this down 5..10-fold, it'd all become a lot faster and easier to maintain in a single monthly DB - this is the most flexible / DB-like approach
  • single days can easily be converted to MK db - again, there should be several ways to reduce size (store dates and IPs as ints, create a second view with paths accessed and store an index into it in the actual log view, perhaps store some fields in a separate file if not used often ...)
  • consider updating user db's on-demand: when accessed, go through any not-yet processed daily db's and append new info, once
  • if single-db remains too large to update efficiently, consider using oomk, which has a lot of relational/set operations - what you can do is store in "blocked" format (which scales way better), and what you can also do is store in a few datafiles, and join or concat them during use (MK view operators work across datafiles)
  • other tricks to consider are: batching (go through parts of the collection, open/close, then next part), and scanning (go through all data, process users 0..99, then again for 100..199, etc)
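Two of the size reductions mentioned above (IPs as ints, and a separate table of paths referenced by index) might look like this in plain Tcl 8.5 or later. The names ipToInt, urlId, urlIds and urlList are hypothetical. Addresses from 128.0.0.0 upwards exceed a signed 32-bit range, so whether they fit the chosen Metakit integer property should be checked before relying on this.

```tcl
# Pack a dotted-quad IPv4 address into a single integer.
proc ipToInt {ip} {
    if {[scan $ip %d.%d.%d.%d a b c d] != 4} {
        error "not a dotted-quad address: $ip"
    }
    expr {($a << 24) | ($b << 16) | ($c << 8) | $d}
}

# urlList maps id -> URL, the array urlIds maps URL -> id.
set urlList {}

# Return a small integer id for a URL, assigning a new id on first sight.
# The log view then stores only this id; urlList becomes the second view.
proc urlId {url} {
    global urlIds urlList
    if {![info exists urlIds($url)]} {
        set urlIds($url) [llength $urlList]
        lappend urlList $url
    }
    return $urlIds($url)
}
```

Because ids are handed out in order of first appearance, the id is also the row index of the URL in the second view.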

IOW, 2a probably hits OS limits, 2b will be too slow (massive commits), 2c could work (I'd try 100, not 10). If you want a Tcl-style approach, you could consider the following:

  • append each log line to Tcl array, one item per user
  • once an entry is say 100 lines long, open the matching MK file, convert/store/save, and clear array entry
  • repeat, then save all the remaining ones

In the worst case, this buffers 150,000 lines in memory (1500 users × 100 lines each) as all data gets processed.

This last approach probably leads to the best trade-offs in such "data pivoting" tasks, assuming per-user r/o access is the most frequent activity.
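The buffering approach above can be sketched as follows. All proc and variable names are hypothetical, and the storage step assumes Mk4tcl with one datafile per user, named after the user, holding a single string property; a real layout would more likely store separate compact fields.

```tcl
# Lines to buffer per user before writing, as suggested above.
set flushAt 100

# Append one log line to the user's buffer; write it out once it is full.
proc bufferLine {user line} {
    global pending flushAt
    lappend pending($user) $line
    if {[llength $pending($user)] >= $flushAt} {
        flushUser $user
    }
}

# Hand a user's buffered lines to the storage step and clear the entry.
proc flushUser {user} {
    global pending
    storeLines $user $pending($user)
    unset pending($user)
}

# Save whatever is still buffered once the whole log has been read.
proc flushAll {} {
    global pending
    foreach user [array names pending] {
        flushUser $user
    }
}

# Storage step: open the user's datafile, append the batch, commit once.
# Assumption: user names are safe to use as file names.
proc storeLines {user lines} {
    package require Mk4tcl
    mk::file open db $user.mk
    mk::view layout db.log {line}
    foreach line $lines {
        mk::row append db.log line $line
    }
    mk::file commit db
    mk::file close db
}
```

Each datafile is opened and committed once per batch of 100 lines rather than once per line, which is what avoids the massive commits of option 2b.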

2003-11-13 Jeff Smith Thanks for taking the time to respond, your efforts are greatly appreciated! :)