myvzw.com and Tcl

See COMPANY: Re-route Inc. for background...

Between August 2002 and December 2002, a small company (Re-route Inc.) provided email forwarding services for Verizon Wireless customers (myvzw.com) as part of the VZW/MSN migration effort. The systems supported tens of thousands of user accounts with over 10 million email deliveries per month.

The system handling the email forwarding was hosted by Re-route and deployed in less than 2 months. It all revolved around Qmail and Tcl.

In August, myvzw.com's DNS MX record was redirected to Re-route's servers, so every email destined for myvzw.com was delivered to Re-route's customized Qmail configuration and then re-delivered to myvzw.com's email servers. Customers were offered the option to sign up for mail forwarding to Hotmail/MSN accounts. During the Hotmail/MSN signup, Re-route was notified of the forwarding address. From then on, when email arrived at Re-route for that user, it was forwarded to their new Hotmail account (as well as to the Verizon Wireless email server). In addition, the sender of the email was sent a "change of address" notice indicating the user's new email account. Every piece of email received by Re-route (forwarded or not) was logged for statistical purposes.

Re-route ran 2 email servers, each with a Qmail system fronted by a custom C application that performed local database lookups to determine where the forwarded mail should go. The local databases were created by Tcl code connected to a tuplespace running on a central Re-route server. Whenever a new signup occurred, the web server notified the tuplespace (via a Tcl script) and the databases on both email servers were updated simultaneously. The tuplespace, backed by Metakit, ran 24x7 without a glitch. It never went down on its own and it never leaked memory. We only bounced it 3 times during the 4 month run (to upgrade its capabilities). See Street Programming for the story behind the tuplespace implementation.
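The signup flow described above can be sketched in a few lines of Tcl. This is only a minimal in-memory illustration, not the tuplespace Re-route actually ran: the proc names (ts::put, ts::read, ts::subscribe), the tuple layout {signup user address} and the updateServer callback standing in for the two mail servers' local databases are all hypothetical.

```tcl
# Minimal in-memory tuplespace sketch (hypothetical API, not the original code).
namespace eval ts {
    variable tuples {}       ;# list of stored tuples
    variable subscribers {}  ;# list of {pattern callback} pairs
}

# Return 1 if every field of the tuple matches the corresponding glob pattern.
proc ts::matches {pattern tuple} {
    if {[llength $pattern] != [llength $tuple]} { return 0 }
    foreach p $pattern f $tuple {
        if {![string match $p $f]} { return 0 }
    }
    return 1
}

# Store a tuple and notify every subscriber whose pattern matches it.
proc ts::put {tuple} {
    variable tuples
    variable subscribers
    lappend tuples $tuple
    foreach sub $subscribers {
        foreach {pattern callback} $sub break
        if {[matches $pattern $tuple]} {
            uplevel #0 [linsert $callback end $tuple]
        }
    }
}

# Return all stored tuples matching the pattern (non-destructive read).
proc ts::read {pattern} {
    variable tuples
    set result {}
    foreach t $tuples {
        if {[matches $pattern $t]} { lappend result $t }
    }
    return $result
}

# Register a callback to run for every future tuple matching the pattern.
proc ts::subscribe {pattern callback} {
    variable subscribers
    lappend subscribers [list $pattern $callback]
}

# Usage: an array stands in for each mail server's local forwarding database.
proc updateServer {name tuple} {
    global forward
    foreach {tag user address} $tuple break
    set forward($name,$user) $address
}
ts::subscribe {signup * *} {updateServer mail1}
ts::subscribe {signup * *} {updateServer mail2}

# The web server announces a new signup; both "servers" see it.
ts::put {signup alice@myvzw.com alice@hotmail.com}
```

In the real system the subscribers were separate machines reached over the network; here they are plain callbacks so the idea fits in one script.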

All statistical information collected by the email servers (incoming, forwarded and successfully delivered mail) was placed in the tuplespace, where it was tracked and dumped into special log files. The tracked data could be viewed by custom Tk apps that showed the system running "live" (you could watch the incoming, forwarded and delivered message counts increase at 1 second intervals). You could also use the Tk app to search and view all account signups.

Once the system was deployed, we were able to do dynamic modifications and tune-ups using Tcl without dropping any email. This became important as we were subject to massive spam surges at night and on weekends. Each email server was handling over 150 constantly active SMTP connections. We used the tuplespace log to build a list of "spam" accounts so that we wouldn't bother sending them change of address notices.

The change of address notice was tuned in September to only deliver 1 "reply" per sender/recipient pair per day (to prevent us from annoying senders with redundant messages). The change of address notice software was written completely in Tcl.
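The once-per-day rule could be expressed roughly as follows. This is a sketch under assumptions, not the original notice software: the proc name shouldNotify and the lastNotice array are hypothetical, and "per day" is interpreted here as a calendar day in GMT (the original may well have used a different window).

```tcl
# Sketch: allow at most one change-of-address notice per
# sender/recipient pair per calendar day (GMT). Names are hypothetical.
proc shouldNotify {sender recipient {now ""}} {
    global lastNotice
    if {[string length $now] == 0} { set now [clock seconds] }

    # Addresses are compared case-insensitively.
    set key [string tolower "$sender|$recipient"]
    set day [clock format $now -format %Y%m%d -gmt 1]

    if {[info exists lastNotice($key)] && [string equal $lastNotice($key) $day]} {
        return 0   ;# this pair was already notified today
    }
    set lastNotice($key) $day
    return 1
}
```

The optional now argument only exists so the rule can be exercised with fixed timestamps; in normal use it would be called as [shouldNotify $sender $recipient].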

By December, Verizon wanted to stop forwarding messages back to themselves for customers who had not signed up for Hotmail/MSN accounts (essentially, this meant that Verizon Wireless stopped being an email server for their customers). This prompted us to deploy a Tcl wrapper for Qmail's SMTP that performed lookups against the signup database and rejected any email that wasn't addressed to a user with a Hotmail/MSN forwarding address. So, by that time, all email was being touched by Tcl code.
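The lookup-and-reject decision might look something like the sketch below. How the real wrapper hooked into Qmail is not described here, so this only shows the decision itself: the signups array standing in for the signup database, the checkRecipient proc and the wording of the replies are all assumptions. The 250 and 550 reply codes are the standard SMTP "accepted" and "mailbox unavailable" codes.

```tcl
# Hypothetical signup table: myvzw.com address -> Hotmail/MSN forwarding address.
array set signups {
    alice@myvzw.com alice@hotmail.com
}

# Decide the SMTP reply for the address given in a RCPT TO command.
proc checkRecipient {address} {
    global signups
    # Strip angle brackets and whitespace, and ignore case.
    set key [string tolower [string trim $address "<> "]]
    if {[info exists signups($key)]} {
        return "250 ok"
    }
    return "550 no forwarding address on file for this recipient"
}
```

For example, [checkRecipient "<Alice@myvzw.com>"] accepts the recipient, while an address that never signed up gets the 550 reply.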

Keep in mind that all of this was running 24x7 on 4 dual processor Pentium IIIs (2 email servers, 1 web server and 1 tuplespace server) running just Qmail, Apache and Tcl 8.3.

It was all implemented by two developers (me doing Tcl, the other guy doing the C code and monitoring software).

If this isn't a testimonial to Tcl's prowess, I don't know what is ;-)

-- Todd Coram


Does this project demonstrate any new benchmarks on Metakit's capabilities? Number of records? Rate of access? Concurrency issues?


SEH -- How much money did Verizon kick over to jcw for his excellent freely-available Metakit tool?


The system, in general, didn't make very sophisticated use of Metakit. It simply provided persistence for the tuplespace....

Concurrency: Single process; single thread. No concurrency issues ;-) -- Todd Coram

I'll take your word that you don't have concurrency issues, but single process, single thread isn't enough to establish that, with event loops and traces available. Do you have any event-driven communication over sockets? {01/07/03 - All operations were driven by events (fileevent and after). All Metakit interactions were triggered by a single variable read/write trace. The traced variable's proc is blocked (no other background invocations) until it is completed. -- Todd Coram}
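One way to read that reply is sketched below: event handlers never touch the store directly, they only write to one variable, and a write trace on that variable does the actual work. This is an illustration of the idea, not the original code. The names request, committed and persist are hypothetical, a plain list stands in for the Metakit file, and the sketch uses [trace add], which needs Tcl 8.4 or later (the original ran on 8.3, where the older [trace variable] form would have been used).

```tcl
# Sketch: funnel all persistence through one traced variable.
set committed {}   ;# stands in for the Metakit file

# Write-trace handler: runs to completion each time 'request' is set.
proc persist {name1 name2 op} {
    global request committed
    # Tcl disables traces on a variable while its trace handler runs,
    # so this proc cannot re-enter itself through 'request'.
    lappend committed $request
}
trace add variable request write persist

# Event handlers only ever write to the variable.
after 0 {set request {count incoming 1}}
after 0 {set request {count delivered 1}}

# Run the event loop briefly so both events are processed.
after 10 {set done 1}
vwait done
```

Because the interpreter is single threaded and the handler never enters the event loop itself, each request is fully handled before the next event is dispatched.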

Rate of access: Initially, all logging was persisted through Metakit (until a log manager consumed the log tuples one-by-one and committed them to the appropriate log files). Since the tuplespace rarely went down, in September log tuples were demoted to transient and never again committed to Metakit. The short of this is that Metakit wasn't being hit with too much traffic. That being said, for every email handled by the system, Metakit was being updated 3 times (mostly counts: incoming, forwarded, delivered). So, at an average rate of 5 emails per second, Metakit records were being updated around 15 times per second (3 updates x 5 emails). The Metakit database was only read when a tuplespace tuple was not in memory (that occurred when a tuple hadn't been accessed for 5 minutes and had therefore been deleted from memory). -- Todd Coram
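The "drop from memory after 5 idle minutes, reload on demand" behaviour could be sketched like this. It is a guess at the shape of the mechanism, not the tuplespace's real code: the procs cacheGet and cacheSweep and the arrays mem, lastUsed and disk (which stands in for the Metakit file) are all hypothetical.

```tcl
# Sketch: in-memory tuple cache with idle eviction and a fallback store.
set idleLimit 300   ;# seconds (5 minutes)

# Fetch a tuple, reloading it from the persistent store on a miss.
proc cacheGet {key {now ""}} {
    global mem lastUsed disk
    if {[string length $now] == 0} { set now [clock seconds] }
    if {![info exists mem($key)]} {
        set mem($key) $disk($key)   ;# miss: read from the persistent store
    }
    set lastUsed($key) $now         ;# record the access time
    return $mem($key)
}

# Drop every tuple that has not been accessed for idleLimit seconds.
proc cacheSweep {{now ""}} {
    global mem lastUsed idleLimit
    if {[string length $now] == 0} { set now [clock seconds] }
    foreach key [array names lastUsed] {
        if {$now - $lastUsed($key) >= $idleLimit} {
            unset mem($key) lastUsed($key)
        }
    }
}
```

A periodic [after] script calling cacheSweep would keep memory bounded; the optional now argument is only there so the behaviour can be checked with fixed timestamps.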

Number of Records: The Metakit database also contained account information for all registered users. The disk file size of the database grew to about 17MB. One single Metakit database file was used for the whole tuplespace (accounts, counts, etc). The tuplespace used just one record type: (tuplesignature, tuple) -- Todd Coram