 burp:/tcl/s/bookmarks# ./deadlinks.tcl bookmarks.html
 Checking 1150 sites... Testing site
 1024 Good sites
 126 Dead sites

Caveats: Just because a site doesn't respond within 10 seconds while the script is running doesn't mean it is permanently a dead link. You can reprocess your file.dead file again later (deadlinks.tcl file.dead) and it will produce a file.dead.good containing any good links it found. Those sites then need to be re-added to your bookmarks manually.
lv Would bracing the exprs below result in the application running a wee bit faster?

MT: If you meant the -timeout value in 'proc verifySite', I just changed that now; it makes much more sense. However, shaving milliseconds in a script that has to wait 10 seconds on each failed attempt isn't going to save much overall, but it is better programming in general, and I thank you for that. Or do you mean in 'proc extractSite'? If so, I suppose it would, so I did it, but frankly I don't really like that proc myself. Got a better suggestion?

Frink and procheck pointed to a number of lines (there are still at least 6) where there were exprs without their arguments braced; those are the ones to which I was referring. I was just facing feeding deadlinks a massive bookmarks file and figured that if bracing them would reduce each link's processing by a quarter of a second, I'd still be saving nearly an hour of processing time...

MT: OK, I never did understand this expr performance issue before. I did some reading, and also wrote a test script that created 10,000 vars using expr, and indeed there is a huge performance gain from bracing, and I think I almost understand why: the braced portion gets byte-compiled rather than substitution occurring on every invocation. I think that's right? And I hope my applying of the rule below is correct and complete. Thanks for making me learn.

lv The way I think it is explained is that expr does a level of substitution of its own as it processes its arguments. Thus, if you brace its arguments, then in most cases you save the overhead of trying to do a substitution twice. There are, of course, possible designs which depend on two rounds of substitution, but I don't believe that your code is one of those designs.
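To make the bracing point above concrete, here is a minimal, self-contained sketch (not taken from deadlinks.tcl itself): two procs computing the same expression, one with an unbraced expr and one braced, timed with the built-in [time] command. The braced form is compiled to bytecode once, while the unbraced form is substituted by Tcl and then re-parsed by expr on every call.

```tcl
# Unbraced: Tcl substitutes $a and $b first, then expr re-parses the
# resulting string on every single call.
proc unbraced {} {
    set a 3
    set b 4
    expr $a * $b + 1
}

# Braced: expr sees the literal {...} and byte-compiles it; it also
# cannot be tricked into evaluating unexpected substituted code.
proc braced {} {
    set a 3
    set b 4
    expr {$a * $b + 1}
}

puts "unbraced: [time unbraced 10000]"
puts "braced:   [time braced 10000]"
```

Both procs return 13; on most builds the braced version reports a noticeably lower microseconds-per-iteration figure.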
TV You may want to consider a -command callback with the http::geturl command; at the least you should be able to get a few sockets (probably a couple of dozen without getting too upsetting to servers) trying to set up connections simultaneously. Or consider it as a background process; depending on the OS, it shouldn't eat up much processor time while waiting. Multiple simultaneous queries give you roughly n*100% speedup...

MT: I am looking at this because it looks correct, but I'm struggling with how to implement it. The problem is that if the script continues on, the HREF being waited for could end up getting written to the wrong position within the bookmarks.good file if it's just a slow validation or a retry, as I would assume you'd want to do. It may have originally been in a folder named "Tcl" but end up in a folder named "Perl".
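One way to reconcile TV's parallelism with MT's ordering concern is to key each result by the HREF's original index and only write the output file once everything has come back. This is a sketch only, not the deadlinks.tcl implementation; the proc names (urlDone, checkAll) and the dict-based bookkeeping are mine, and error handling is minimal.

```tcl
package require http

set results [dict create]   ;# index -> status, filled in as replies arrive
set pending 0

# Callback invoked by the http package when a request finishes.
proc urlDone {idx token} {
    global results pending
    dict set results $idx [http::status $token]
    http::cleanup $token
    if {[incr pending -1] == 0} { set ::done 1 }
}

# Fire all requests without waiting, then sit in the event loop until
# the last callback runs. -validate 1 asks for a HEAD-style check only.
proc checkAll {urls} {
    global results pending
    set idx 0
    foreach url $urls {
        if {[catch {
            http::geturl $url -timeout 10000 -validate 1 \
                -command [list urlDone $idx]
        } err]} {
            dict set results $idx "error: $err"   ;# e.g. DNS failure
        } else {
            incr pending
        }
        incr idx
    }
    if {$pending > 0} { vwait ::done }
    # $results now maps original position -> status, so bookmarks.good
    # can be written in bookmark-file order, folders intact.
}
```

Because the results dict is indexed by the link's position in the source file, a slow "Tcl"-folder link can finish after a fast "Perl"-folder link without the two swapping places in the output.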
lv Bug report. After running for a couple of hours, and getting about 60% done, I see this error message:
When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there. For example, you can use Wget to check your bookmarks:

 wget --spider --force-html -i bookmarks.html

But checking the output from that, and removing the dead links, is left as an exercise for the user :)