Error processing request

Parameters

CONTENT_LENGTH	0
REQUEST_METHOD	GET
REQUEST_URI	/revision/Wibble+bugs?V=64
QUERY_STRING	V=64
CONTENT_TYPE
DOCUMENT_URI	/revision/Wibble+bugs
DOCUMENT_ROOT	/var/www/nikit/nikit/nginx/../docroot
SCGI	1
SERVER_PROTOCOL	HTTP/1.1
HTTPS	on
REMOTE_ADDR	172.71.254.77
REMOTE_PORT	19356
SERVER_PORT	4443
SERVER_NAME	wiki.tcl-lang.org
HTTP_HOST	wiki.tcl-lang.org
HTTP_CONNECTION	Keep-Alive
HTTP_ACCEPT_ENCODING	gzip, br
HTTP_X_FORWARDED_FOR	18.216.43.163
HTTP_CF_RAY	889fe2019a0322e8-ORD
HTTP_X_FORWARDED_PROTO	https
HTTP_CF_VISITOR	{"scheme":"https"}
HTTP_ACCEPT	/
HTTP_USER_AGENT	Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
HTTP_REFERER	http://wiki.tcl.tk/revision/Wibble+bugs?V=64
HTTP_CF_CONNECTING_IP	18.216.43.163
HTTP_CDN_LOOP	cloudflare
HTTP_CF_IPCOUNTRY	US

Body

Error

Unknow state transition: LINE -> END

-code

-level

-errorstack

INNER {returnImm {Unknow state transition: LINE -> END} {}} CALL {my render_wikit {Wibble bugs} \[AMG\]:\ This\ page\ documents\ bugs\ in\ the\ \[Wibble\]\ web\ server,\ as\ well\ as\ bugs\ in\ \[Tcl\]\ that\ affect\ Wibble.\ \ There\ may\ be\ some\ overlap\ with\ the\ \[Wibble\ discussion\]\ page.\n\n<<backrefs:Category\ Wibble>>\n\n----\n**Current\ bugs**\n\n***\ Wibble\ multiple\ header\ lines\ are\ overwritten\ ***\n\n\[APN\]\ The\ HTTP\ spec\ says\ if\ multiple\ lines\ with\ the\ same\ header\ name\ are\ received,\ they\ should\ be\ treated\ as\ a\ single\ header\ line\ with\ a\ comma\nseparating\ the\ values.\ Wibble\ overwrites\ values\ in\ previous\ headers\ with\ those\ in\ later\ headers.\n\n***\ Wibble\ forces\ HTTP/1.1\ features\ ***\n\n\[andrewsh\]:\ Wibble\ assumes\ all\ HTTP\ requests\ without\ `Connection:\ close`\ to\ initiate\ a\ persistent\ connection,\ even\ if\ it\ says\ `HTTP/1.0`.\ That\ isn't\ correct\ and\ breaks\ some\ clients.\ See\ http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.10%|%RFC2616\ §14.10%|%:\n\n\ \ A\ system\ receiving\ an\ HTTP/1.0\ (or\ lower-version)\ message\ that\n\ \ includes\ a\ Connection\ header\ MUST,\ for\ each\ connection-token\ in\n\ \ this\ field,\ remove\ and\ ignore\ any\ header\ field(s)\ from\ the\ message\n\ \ with\ the\ same\ name\ as\ the\ connection-token.\ This\ protects\ against\n\ \ mistaken\ forwarding\ of\ such\ header\ fields\ by\ pre-HTTP/1.1\ proxies.\n\n\[AMG\]:\ Wibble\ doesn't\ support\ HTTP/1.0\ or\ /0.9\ or\ anything\ pre-HTTP/1.1.\ \ Such\ support\ can\ be\ added\ but\ was\ not\ requested\ until\ now.\n\n\[andrewsh\]:\ The\ part\ of\ the\ spec\ I\ quoted\ was\ from\ HTTP/1.1.\ HTTP/1.1\ servers\ '''must'''\ support\ HTTP/1.0\ non-persistent\ connections,\ it's\ a\ part\ of\ a\ specification.\ Without\ this\ particular\ feature\ it's\ impossible\ to\ interface\ Wibble\ with\ many\ clients.\ Please\ also\ note\ that\ HTTP/1.0\ clients\ are\ still\ widespread.\n\nBasically,\ the\ fix\ here\ would\ be\ to\ default\ to\ `Connection:\ close`\ when\ the\ request\ has\ HTTP\ version\ less\ than\ 1.1.\ Also\ (however\ this\ needs\ checking\ with\ the\ standard),\ it\ would\ be\ great\ if\ Wibble\ could\ add\ `Connection:`\ header\ to\ the\ response\ (when\ it's\ HTTP/1.1\ of\ course).\n\nHere's\ a\ proof-of-concept\ patch\ to\ fix\ the\ issue:\n\n======\ndiff\ --git\ a/wibble.tcl\ b/wibble.tcl\n---\ a/wibble.tcl\n+++\ b/wibble.tcl\n@@\ -1180,6\ +1180,10\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ set\ size\ 0\n\ \ \ \ \ set\ dict_get_request_method\ \[dict\ get\ \$request\ method\]\n\ \ \ \ \ set\ dict_get_response_status\ \[dict\ get\ \$response\ status\]\n+\ \ \ \ set\ persistent_connection\ \[expr\ \{(\[dict\ get\\\n+\ \ \ \ \ \ \ \ \$request\ protocol\]\ >=\ \"HTTP/1.1\")\ &&\n+\ \ \ \ \ \ \ \ (!\[string\ equal\ -nocase\\\n+\ \ \ \ \ \ \ \ \[dict\ getnull\ \$request\ header\ connection\]\ close\])\}\]\n\ \ \ \ \ if\ \{\[dict\ exists\ \$response\ contentfile\]\}\ \{\n\ \ \ \ \ \ \ \ \ set\ dict_get_response_contentfile\ \[dict\ get\ \$response\ contentfile\]\n\ \ \ \ \ \ \ \ \ set\ size\ \[file\ size\ \$dict_get_response_contentfile\]\n@@\ -1219,8\ +1223,12\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ set\ end_begin_1\ \[expr\ \{\$end\ -\ \$begin\ +\ 1\}\]\n\ \ \ \ \ dict\ set\ response\ header\ content-length\ \$end_begin_1\n\ \n+\ \ \ \ if\ \{!\$persistent_connection\}\ \{\n+\ \ \ \ \ \ \ \ dict\ set\ response\ header\ connection\ close\n+\ \ \ \ \}\n+\n\ \ \ \ \ #\ Send\ the\ response\ header\ to\ the\ client.\n-\ \ \ \ chan\ puts\ \$socket\ \"HTTP/1.1\ \$dict_get_response_status\"\n+\ \ \ \ chan\ puts\ \$socket\ \"\[dict\ get\ \$request\ protocol\]\ \$dict_get_response_status\"\n\ \ \ \ \ chan\ puts\ \$socket\ \[enheader\ \[dict\ get\ \$response\ header\]\]\\n\n\ \n\ \ \ \ \ #\ If\ requested,\ send\ the\ response\ content\ to\ the\ client.\n@@\ -1250,8\ +1258,7\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ \}\n\ \n\ \ \ \ \ #\ Return\ 1\ to\ keep\ going\ or\ 0\ if\ the\ connection\ needs\ to\ close.\n-\ \ \ \ expr\ \{!\[string\ equal\ -nocase\\\n-\ \ \ \ \ \ \ \ \[dict\ getnull\ \$request\ header\ connection\]\ close\]\}\n+\ \ \ \ return\ \$persistent_connection\n\ \}\n\ \n\ #\ Main\ connection\ processing\ loop.\n@@\ -1344,7\ +1351,7\ @@\ proc\ ::wibble::panic\ \{options\ port\ socke\n\ \ \ \ \ if\ \{!\[dict\ exists\ \$response\ nonhttp\]\ &&\ \$socket\ ne\ \"\"\}\ \{\n\ \ \ \ \ \ \ \ \ catch\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ configure\ \$socket\ -translation\ crlf\n-\ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"HTTP/1.1\ 500\ Internal\ Server\ Error\"\n+\ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"\[dict\ get\ \$request\ protocol\]\ 500\ Internal\ Server\ Error\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Content-Type:\ text/plain\;charset=utf-8\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Content-Length:\ \[string\ length\ \$message\]\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Connection:\ close\"\n======\n\n\[AMG\]:\ Thanks.\ \ Will\ investigate\ when\ I\ have\ time.\n\n\[AMG\]:\ Again\ thanks.\ \ Give\ \[https://chiselapp.com/user/andy/repository/wibble/info/c62e5a1b5d\]\ a\ look.\ \ Unlike\ your\ patch,\ this\ version\ is\ baselined\ prior\ to\ \[SEH\]'s\ performance\ improvements.\ \ By\ the\ way,\ it\ indeed\ looks\ like\ I\ need\ to\ move\ to\ a\ different\ SCM\ because\ this\ wiki\ doesn't\ support\ branching.\n\n\[andrewsh\]:\ Great.\ Are\ you\ sure\ about\ this?\n\n======\n+\ \ \ \ #\ Determine\ if\ the\ connection\ is\ persistent.\n+\ \ \ \ set\ persist\ \[expr\ \{\n+\ \ \ \ \ \ \ \ \[dict\ get\ \$request\ protocol\]\ >=\ \"http/1.1\"\n+\ \ \ \ \ &&\ !\[string\ equal\ -nocase\ \[dict\ getnull\ \$request\ header\ connection\]\ close\]\n+\ \ \ \ \}\]\n======\n\nIf\ I\ understand\ this\ correctly,\ the\ first\ comparison\ will\ be\ always\ true\ as\ you\ do\ `\[string\ toupper\]`.\nSpeaking\ of\ VCS,\ is\ that\ chiselapp\ link\ the\ official\ upstream\ location\ for\ Wibble\ from\ now?\n\n\[AMG\]:\ No,\ actually\ always\ false.\ \ Capitals\ are\ less\ than\ lowercase.\ \ Even\ so,\ very\ good\ catch!\ \ I\ didn't\ have\ a\ good\ way\ to\ test\ it\ myself.\ \ See\ \[http://chiselapp.com/user/andy/repository/wibble/info/6dc367091c\]\ for\ the\ fix.\ \ I\ did\ test\ the\ comparisons\ in\ the\ Tcl\ shell\ to\ make\ sure\ they\ work\ the\ way\ we\ intend.\n\nAlso,\ yes,\ that\ is\ the\ official\ upstream\ location\ for\ the\ time\ being.\ \ I\ have\ no\ plans\ to\ move\ it\ further,\ but\ that\ doesn't\ mean\ it'll\ never\ happen.\ \ When\ I\ have\ time\ (can't\ give\ an\ ETA),\ I\ will\ see\ about\ migrating\ bug\ reports,\ documentation,\ and\ maybe\ discussion,\ reducing\ Wibble's\ presence\ on\ this\ wiki\ as\ I\ go.\ \ I\ also\ have\ a\ lot\ of\ new\ code\ I'd\ love\ to\ publish,\ but\ it\ breaks\ backward\ compatibility,\ so\ I\ hesitated.\ \ But\ now\ I\ have\ branch\ capability,\ so\ there's\ no\ reason\ not\ to.\n\n\[andrewsh\]:\ Well,\ now\ you\ need\ to\ fix\ the\ shebang\ :)\n\nRegarding\ version\ control,\ without\ knowing\ you're\ considering\ Fossil,\ I've\ imported\ the\ wiki\ page\ to\ Mercurial\ about\ two\ weeks\ ago:\ \[https://bitbucket.org/andrew_shadura/wibble/commits/all\].\ I\ think\ I\ will\ remove\ that\ and\ resync\ it\ to\ your\ Fossil\ repo\ instead.\n\n\[AMG\]:\ Thanks,\ also\ fixed\ \[http://chiselapp.com/user/andy/repository/wibble/info/180aada6d2\].\ \ Also\ I\ appreciate\ your\ effort\ to\ convert\ to\ Mercurial\ and\ am\ very\ sorry\ to\ see\ it\ wasted.\ \ Is\ there\ anything\ you\ have\ that\ I\ missed?\ \ I\ walked\ through\ every\ version\ of\ the\ \[Wibble\]\ and\ \[Wibble\ implementation\]\ pages,\ extracting\ the\ code\ using\ this\ wiki's\ little-known\ \".code\"\ suffix\ feature\ (e.g.\ \[http://wiki.tcl.tk/27382.code\]),\ then\ compared\ the\ checksums\ of\ each\ to\ identify\ the\ distinct\ versions,\ each\ of\ which\ I\ checked\ in.\ \ For\ the\ older\ versions\ I\ wrote\ descriptive\ check-in\ comments,\ and\ for\ newer\ I\ converted\ the\ comments\ from\ what\ I\ had\ recorded\ at\ \[Wibble\ change\ log\].\ \ I\ could\ well\ have\ made\ a\ mistake.\ \ I\ mean,\ it\ took\ several\ tries\ to\ get\ all\ this\ right,\ or\ at\ least\ not\ so\ wrong\ that\ I'd\ notice.\ \;^)\n\nI'm\ sorry\ I've\ not\ been\ able\ to\ do\ any\ Wibble\ work\ or\ testing\ lately.\ \ Life\ has\ been\ impossible.\ \ Converting\ to\ Fossil\ was\ the\ most\ I've\ been\ able\ to\ do\ for\ a\ long\ time.\ \ Actually\ I\ had\ long\ planned\ to\ move\ to\ Fossil,\ but\ every\ previous\ time\ I\ had\ made\ the\ move,\ I\ aborted\ when\ I\ found\ that\ the\ then-current\ version\ of\ Fossil\ was\ too\ clever\ for\ its\ own\ good\ with\ the\ way\ it\ displayed\ check-in\ comments\ on\ the\ timeline.\ \ Things\ have\ improved\ over\ time\ due\ to\ Fossil\ being\ used\ to\ host\ Tcl,\ Tk,\ and\ related\ repositories,\ all\ of\ which\ had\ the\ same\ issues\ with\ \[\[brackets\]\]\ that\ the\ historical\ Wibble\ check-in\ comments\ did.\ \ I\ also\ quailed\ at\ converting\ the\ historical\ and\ current\ related\ documentation\ to\ the\ Fossil\ wiki\ format.\n\nI\ really\ ought\ to\ dust\ off\ my\ private\ development\ code\ and\ check\ it\ in\ on\ a\ branch\ (it's\ incompatible),\ but\ even\ though\ it\ runs,\ it's\ not\ coherent\ enough\ to\ share.\n\n***\ Protection\ code?\ ***\n\n\[MaxJarek\]:\ Sample\ protection\ ::wibble::getrequest\ for\ error\ like:\ can't\ read\ \"uri\":\ no\ such\ variable\n\n======\n...\n#\ Receive\ and\ parse\ the\ first\ line.\ \ Normalize\ the\ path.\nset\ method\ GET\nset\ uri\ \"\"\nset\ protocol\ \"HTTP/1.1\"\nwhile\ \{\[set\ line\ \[getline\]\]\ eq\ \"\"\}\ \{continue\}\nregexp\ \{^\\s*(\\S*)\\s+(\\S*)\\s+(\\S*)\}\ \$line\ _\ method\ uri\ protocol\n...\n======\n\n\[AMG\]:\ What\ is\ the\ benefit\ of\ this\ protection\ code?\ \ Send\ invalid\ HTTP\ to\ Wibble,\ and\ it\ should\ die\ with\ an\ error.\ \ This\ won't\ take\ down\ all\ of\ Wibble,\ just\ the\ coroutine\ for\ the\ one\ connection,\ triggering\ the\ \[\[panic\]\]\ handler\ in\ the\ process.\n\n***Wiki\ corrupts\ \[\[enhtml\]\]\ and\ \[\[enattr\]\]***\n\n\[AMG\]:\ The\ \[\[enhtml\]\]\ and\ \[\[enattr\]\]\ procedures\ convert\ adjacent\ spaces\ to\ be\ alternating\ spaces\ and\ nonbreaking\ spaces.\ \ For\ some\ reason,\ the\ Wiki\ is\ replacing\ my\ nonbreaking\ space\ HTML\ entity\ (&\ #\ 1\ 6\ 0\ \;)\ with\ an\ ordinary\ space.\ \ You\ can\ still\ see\ the\ correct\ code\ in\ the\ WikiText,\ but\ it's\ gone\ from\ the\ formatted\ HTML.\ \ Therefore\ I\ have\ to\ remember\ to\ re-add\ it\ every\ time\ I\ edit\ the\ page.\ \ Sometimes\ I\ forget.\ :^(\ \ See\ \[http://code.google.com/p/wikitcl/issues/detail?id=209\]\ for\ the\ official\ bug\ report.\n\n\[AMG\],\ update:\ I\ put\ a\ workaround\ in\ the\ latest\ version.\ \ I\ simply\ replaced\ \"#\"\ with\ \"\\#\",\ so\ that\ the\ Wiki\ won't\ think\ it's\ an\ HTML\ entity.\n\n***Coroutine\ already\ running***\n\n\[dzach\]:\ I'm\ getting\ an\ error:\ \n\n\ coroutine\ \"::wibble::sock60\"\ is\ already\ running\n\ while\ executing\n\ \ \ \ \ \ \ \ \"\$coro\ \$args\"\n\ \ \ \ \ \ \ \ (lambda\ term\ \"\{coro\ args\}\ \{\$coro\ \$args\}\ ::wibble\"\ line\ 1)\n\ \ \ \ \ \ \ \ invoked\ from\ within\n\ \ \ \ \ \ \ \ \"apply\ \{\{coro\ args\}\ \{\$coro\ \$args\}\ ::wibble\}\ ::wibble::sock60\ copydone\ 0\"\ \nwhen\ putting\ wibble\ ver.\[http://wiki.tcl.tk/_/revision?N=27377.code&V=21\]\ under\ stress.\ I've\ yet\ to\ find\ under\ what\ exactly\ conditions\ this\ happens.\n\n\[AMG\]:\ It\ seems\ \[\[\[chan\ copy\]\]\]\ calls\ its\ -command\ script\ directly\ when\ it\ has\ zero\ bytes\ to\ copy.\ \ However,\ here\ this\ means\ having\ a\ coroutine\ call\ itself,\ which\ will\ give\ the\ \"already\ running\"\ error.\ \ Try\ printing\ \$begin\ and\ \$end\ just\ before\ the\ \[\[chan\ copy\]\].\ \ I\ expect\ this\ error\ to\ happen\ when\ \$begin\ is\ greater\ than\ \$end,\ which\ will\ occur\ if\ the\ file\ being\ downloaded\ is\ zero\ bytes.\ \ I\ don't\ think\ the\ range\ parsing\ logic\ (which\ sets\ \$begin\ and\ \$end)\ allows\ zero-byte\ ranges.\n\n\[dzach\]:\ I'm\ also\ getting\ the\ following\ error\ when\ hitting\ wibble\ with\ 5\ or\ more\ concurrent\ requests,\ using\ siege,\ http://www.joedog.org/index/siege-home:\n\ errorinfo:\ error\ writing\ \"sock8\":\ broken\ pipe\n\ \ \ \ while\ executing\n\ \"chan\ puts\ -nonewline\ \$socket\ \[string\ range\ \[dict\ get\ \$response\ content\]\ \$begin\ \$end\]\"\n\$begin\ \ \$end\ are\ as\ follows\ (\[\[puts\]\]\ inserted\ right\ after\ ''#\ Send\ buffered\ response\ content.''\ in\ wibble::process):\n\ 0\ \ \ \ \ \ \ 50947\nrepeated\ many\ times,\ and\ then,\ when\ siege\ stops,\ comes\ the\ error\ repeated\ many\ times\ too,\ depending\ on\ request\ concurrency.\n\nIn\ the\ same\ context\ I\ also\ get\ this\ error:\n\ errorinfo:\ error\ writing\ \"sock186\":\ broken\ pipe\n\ \ \ \ while\ executing\n\ \"error\ \[lindex\ \$result\ 2\]\"\n\nI\ should\ mention\ that\ older\ versions\ of\ wibble,\ I\ think\ before\ icc\ was\ introduced,\ did\ not\ give\ these\ errors.\ Hard\ to\ find\ exactly\ which\ version\ these\ were,\ as\ they\ all\ appear\ as\ '0.1'.\n\n\[AMG\]:\ Sorry\ about\ the\ versioning.\ \ Just\ use\ the\ Wiki\ page\ revision\ number.\ \ Regarding\ the\ errors:\ I\ expect\ \"broken\ pipe\"\ errors\ in\ many\ cases\ when\ the\ client\ disconnects\ unexpectedly.\ \ I\ don't\ think\ that's\ a\ big\ deal,\ and\ it's\ a\ nuisance\ to\ fix.\ \ I\ doubt\ ICC\ has\ anything\ to\ do\ with\ it.\ \ I\ haven't\ had\ time\ to\ repeat\ any\ of\ the\ tests\ you've\ been\ doing\ nor\ make\ any\ changes,\ but\ when\ I\ do,\ I\ suspect\ Siege\ will\ do\ a\ fair\ bit\ of\ disconnecting\ without\ waiting\ to\ receive\ the\ full\ page.\n\nAs\ for\ the\ \"coroutine\ already\ running\"\ problem,\ I\ will\ be\ looking\ at\ the\ Tcl\ sources\ to\ track\ down\ any\ and\ all\ circumstances\ in\ which\ \[\[\[chan\ copy\]\]\]\ will\ directly\ call\ its\ -command\ rather\ than\ schedule\ it\ as\ an\ event\ handler,\ then\ make\ sure\ none\ of\ those\ cases\ happen.\ \ I\ already\ know\ that\ it'll\ happen\ when\ working\ with\ zero-size\ files,\ but\ there\ may\ be\ other\ cases\ too.\ \ Whatever\ I\ find,\ I\ should\ also\ document\;\ I\ don't\ see\ any\ mention\ of\ this\ behavior\ in\ the\ man\ page.\n\nWhenever\ I\ get\ time\ for\ this,\ I'll\ also\ tackle\ the\ Firefox\ filename\ character\ encoding\ problem.\ \ After\ that,\ I\ have\ some\ more\ thinking\ to\ do\ about\ whether\ or\ not\ ICC\ is\ the\ right\ thing\ for\ the\ application\ I\ have\ in\ mind\;\ it\ might\ still\ change\ quite\ a\ bit\ before\ I\ decide\ to\ leave\ it\ alone.\ \ Right\ now\ I'm\ just\ too\ busy,\ sorry.\n\n***HTML\ entities\ embedded\ in\ HTTP\ headers***\n\n\[dzach\]:\ When\ faced\ with\ '''numeric\ character\ references'''\ \[http://www.w3.org/TR/html4/charset.html#h-5.3.1\],\ e.g.\ ''\"Υ\ \;Π\ \;Ε\ \;Σ\ \;-google.png\"'',\ wibble\ erroneously\ splits\ it\ at\ every\ ''\;''\ character,\ so\ for\ the\ example\ given\ here\ we\ get:\ ''content-disposition\ \{\{\}\ form-data\ name\ imagedata\ filename\ \{\"Υ\}\ Π\ \{\}\ Ε\ \{\}\ Σ\ \{\}\ -google.png\\\"\ \{\}\}''\ in\ the\ post\ data,\ where\ ''filename''\ contains\ \ ''\"Υ''\ instead\ of\ ''\"Υ\ \;Π\ \;Ε\ \;Σ\ \;-google.png\"''\ (spaces\ inserted\ before\ '\;'\ for\ display\ purposes).\ If\ the\ encoding\ of\ the\ page\ is\ set\ correctly,\ e.g.\ utf-8\ for\ the\ example\ above,\ the\ problem\ doesn't\ appear,\ although\ it\ still\ exists.\n\n\[AMG\]:\ Are\ you\ sure\ numeric\ character\ references\ are\ valid\ in\ the\ context\ of\ an\ HTTP\ header?\ \ Don't\ confuse\ HTML\ with\ HTTP.\ \ Is\ this\ HTTP\ header\ generated\ by\ your\ web\ browser\ (which\ one?)\ or\ are\ you\ doing\ it\ from\ a\ script\ or\ by\ hand?\ \ I\ will\ need\ a\ test\ case,\ if\ you\ want\ me\ to\ fix\ this.\n\n\[dzach\]:\ Sure.\ It\ is\ the\ browser\ that\ generates\ this\ (Firefox\ 3.6.13,\ Kubuntu\ 10.04),\ I\ use\ wibble\ \[http://wiki.tcl.tk/_/revision?N=27377.code&V=21\]\ without\ any\ modification\ and\ no\ extra\ scripts\ whatsoever.\ For\ a\ test,\ just\ rename\ any\ image\ to\ ''????-google.png'',\ which\ is\ the\ filename\ used\ in\ the\ example\ above\ and\ which\ contains\ 4\ non-latin\ characters.\ Do\ NOT\ change\ the\ page\ encoding\ to\ utf-8,\ or\ if\ that\ is\ the\ default\ change\ it\ to\ ISO-8859-1,\ so\ that\ the\ browser\ be\ forced\ to\ use\ numeric\ character\ references.\ Then\ use\ the\ wibble\ upload\ form\ to\ find\ and\ upload\ the\ renamed\ image\ and\ observe\ the\ \"File\ name\"\ in\ the\ table.\ I\ inserted\ a\ ''puts\ \$post''\ statement\ in\ the\ index.html.tmpl\ to\ get\ the\ content-disposition\ data.\ About\ the\ encoding\ used\ in\ an\ HTTP\ header\ \[http://www.w3.org/Protocols/HTTP/Issues/content-disposition.txt\],\ I\ understand\ that\ numeric\ character\ references\ are\ standard\ US-ASCII\ and\ therefore\ should\ be\ valid,\ depending\ on\ the\ operating\ system\ whether\ it\ can\ save\ such\ a\ file\ or\ not.\ Linux,\ as\ far\ as\ I\ can\ tell,\ accepts\ ''&#\;''\ as\ valid\ characters\ in\ a\ file\ name.\n\n\[AMG\]:\ I'm\ still\ not\ sure\ that's\ technically\ legal,\ but\ I\ don't\ see\ any\ other\ way\ for\ Firefox\ to\ send\ Greek\ in\ the\ HTTP\ header\ when\ constrained\ to\ Latin-1.\ \ So\ I\ guess\ I\ need\ to\ find\ a\ way\ to\ support\ this.\ \ I\ suppose\ I\ should\ replace\ numeric\ character\ references\ in\ the\ headers\ before\ attempting\ to\ split\ them.\ \ \[\[\[regexp\]\ -all\ -indices\]\],\ \[\[\[binary\ format\]\ s\]\],\ and\ \[\[\[encoding\ convertfrom\]\ unicode\]\]\ should\ do\ the\ trick:\n\n======\n%\ encoding\ convertfrom\ unicode\ \[binary\ format\ s*\ \{933\ 928\ 917\ 931\}\]\n????\n======\n\nAlso\ I\ ought\ to\ support\ hexadecimal,\ which\ means\ using\ \[\[\[scan\]\]\]\ or\ maybe\ \[\[binary\ format\ H\]\].\n\n\[dzach\]:\ The\ conversion\ itself\ might\ not\ be\ wibble's\ job,\ a\ user\ script\ could\ do\ it,\ but\ it\ should\ preserve\ the\ numeric\ character\ references\ in\ the\ filename,\ which\ it\ doesn't,\ as\ it\ stands\ now.\ I\ think\ the\ splitter\ should\ be\ able\ to\ distinguish\ a\ numeric\ character\ reference\ from\ a\ value\ separator\ '\;'.\n\n\[AMG\]:\ So\ we\ know\ that\ Firefox\ applies\ this\ encoding\ to\ characters\ in\ the\ names\ of\ uploaded\ files.\ \ Where\ else\ does\ it\ apply\ this\ encoding?\ \ Will\ it\ do\ it\ in\ cookies,\ for\ example?\ \ If\ it's\ only\ in\ filenames,\ I'm\ inclined\ to\ agree\ with\ you\ that\ Wibble\ should\ pass\ along\ the\ encoded\ text\ unmolested.\ \ But\ if\ it's\ universal\ throughout\ the\ HTTP\ headers,\ it\ makes\ more\ sense\ for\ Wibble\ to\ automatically\ decode\ all\ character\ references\ when\ constructing\ the\ header\ dictionaries.\ \ I\ would\ have\ already\ included\ this\ feature\ in\ Wibble\ had\ I\ found\ any\ mention\ of\ numeric\ character\ encoding\ in\ RFC\ 2616.\n\nThe\ danger\ is\ that\ this\ processing\ makes\ it\ impossible\ for\ browsers\ to\ include\ text\ resembling\ a\ character\ reference\ in\ a\ HTTP\ header\ (short\ of\ also\ encoding\ the\ ampersand,\ which\ I\ have\ ''not''\ seen\ Firefox\ do).\ \ But\ that\ danger\ exists\ not\ in\ Wibble\ but\ in\ Firefox\ or\ any\ other\ browser\ that\ has\ chosen\ to\ so\ extend\ HTTP.\ \ I\ think\ I\ might\ take\ a\ peek\ at\ the\ Firefox\ source\ code\ to\ see\ what's\ really\ going\ on\ behind\ the\ scenes.\ \ With\ luck,\ the\ sources\ contain\ references\ to\ standards\ documents.\n\n\[dzach\]:\ I\ checked\ Firefox\ with\ cookies\ and\ it\ looks\ like\ the\ cookie\ is\ not\ converted\ to\ numeric\ character\ references.\ I\ created\ a\ cookie:\n\ document.cookie=\"test=????-google.png\"\nThis\ is\ what\ the\ wibble\ sees:\n\ test\ \{\{\}\ ¥ £-google.png\}\nI\ also\ checked\ filename\ and\ cookies\ with\ Chromium.\ The\ filename\ appears\ in\ the\ table\ as:\n\ File\ name:\ ????-google.png\nwhile\ the\ cookie\ appears\ as:\n\ test\ \{\{\}\ Î¥Î ÎÎ£-google.png\}\ #\ first\ time\n\ test\ \{\{\}\ ????-google.png\}\ #\ with\ image\ load\nIt\ looks\ like\ Firefox\ is\ trying\ its\ best\ to\ properly\ represent\ the\ filename\ selected\ by\ an\ end\ user\ and\ thus\ decides\ to\ use\ numeric\ char\ refs\ for\ that,\ while\ for\ cookies,\ which\ are\ under\ the\ control\ of\ a\ developer,\ it\ sends\ them\ without\ conversion.\ Developer\ should\ know\ better.\n\nChromium\ does\ not\ understand\ the\ non-latin\ chars\ in\ the\ user\ selected\ filename\ and\ converts\ them\ to\ '?'\ while\ it\ second\ guesses\ the\ cookie\ value\ as\ utf-8\ and\ sends\ it\ correctly.\n\n\[AMG\]:\ Too\ bad\ Firefox\ chose\ to\ use\ the\ &#decimal\;\ notation,\ since\ the\ semicolon\ was\ already\ defined\ by\ HTTP\ to\ be\ a\ string\ delimiter.\n\n\[AMG\]:\ I\ just\ had\ another\ look\ at\ this.\ \ I\ created\ a\ file\ called\ は.txt\ containing\ \"hello\",\ then\ tried\ uploading\ it\ via\ a\ post\ with\ enctype=\"multipart/form-data\".\ \ I\ also\ included\ some\ text\ inputs\ in\ the\ form,\ into\ which\ I\ wrote\ は.\ \ Here's\ the\ rawpost\ from\ Firefox\ 8.0:\n\n======none\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"key1\"\n\nã\[81\]¯\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"key2\"\n\nã\[81\]¯\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"ã\[81\]¯.txt\"\nContent-Type:\ text/plain\n\nhello\n\n-----------------------------000000000000000--\n======\n\n(Replace\ `\[\[81\]\]`\ with\ \\x81.\ \ I\ had\ to\ change\ it\ to\ dodge\ the\ wiki's\ bogus\ character\ reject\ filter.)\n\nAs\ you\ might\ expect,\ `\[\[\[encoding\ convertfrom\]\ \[utf-8\]\ ã\[\[81\]\]¯\]\]`\ returns\ `は`.\n\nIt\ appears\ Firefox\ is\ taking\ the\ liberty\ of\ encoding\ its\ posted\ form\ data\ in\ UTF-8,\ even\ though\ there's\ nothing\ in\ the\ request\ headers\ indicating\ the\ charset.\ \ This\ happens\ both\ with\ and\ without\ `accept-charset=\"utf-8\"`\ in\ the\ `<form>`\ markup.\ \ The\ default\ should\ be\ ISO-8859-1,\ not\ UTF-8.\n\nPutting\ `accept-charset=\"iso-8859-1\"`\ in\ the\ `<form>`\ yields:\n\n======none\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"key1\"\n\nは\;\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"key2\"\n\nは\;\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"は\;.txt\"\nContent-Type:\ text/plain\n\nhello\n-----------------------------00000000000000--\n======\n\nAgain,\ `\[\[encoding\ convertfrom\ unicode\ \[\[\[string\ trimright\]\ \[\[\[binary\ format\]\ i*\ 12399\]\]\ \\0\]\]\]\]`\ returns\ `は`.\ \ Also,\ again\ there's\ no\ indication\ in\ the\ request\ header\ what\ the\ character\ set\ is.\n\nFirefox\ encodes\ は.txt\ in\ the\ URI\ as\ `%E3%81%AF.txt`.\ \ `\[\[encoding\ convertfrom\ utf-8\ \\xe3\\x81\\xaf\]\]`\ returns\ `は`.\n\n`<form\ enctype=\"text/plain\"\ accept-charset=\"iso-8859-1\">`\ yields:\n\n======none\nkey1=\nkey2=は\;\nfile=は\;.txt\n======\n\nRemoving\ `accept-charset`\ or\ setting\ it\ to\ `utf-8`\ yields:\n\n======none\nkey1=\nkey2=ã\[81\]¯\nfile=ã\[81\]¯.txt\n======\n\nUsing\ `enctype=\"application/x-www-form-urlencoded\"`\ or\ removing\ enctype\ altogether\ yields:\n\n%|`accept-charset`|`rawpost`|%\n&|`iso-8859-1`|`key1=&key2=%26%2312399%3B&file=%26%2312399%3B.txt`|&\n&|`utf-8`|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|(none)|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n\nNow,\ let's\ decode:\n\n&|%26|%23|12399|%3B|&\n&|&|#|12399|\;|&\n\nI'll\ check\ other\ browsers\ another\ day.\n\n\[dzach\]\ :\ So,\ decoding\ the\ charset,\ when\ the\ charset\ ''is''\ set\ for\ the\ form/page,\ looks\ straight\ forward\ for\ Firefox,\ although\ Wibble\ will\ need\ to\ do\ a\ double\ conversion\ for\ unicode\ characters\ sent\ with\ iso-8859-1.\n\n\[AMG\]:\ Yeah,\ not\ a\ huge\ deal.\ \ My\ big\ concern\ is\ that\ Firefox\ uses\ UTF-8\ by\ default\ even\ though\ RFC\ 2616\ requires\ that\ it\ use\ ISO-8859-1\ by\ default.\ \ That\ means\ other,\ compliant\ browsers\ would\ be\ incompatible,\ and\ I'd\ either\ have\ to\ do\ different\ processing\ based\ on\ the\ user-agent,\ or\ implement\ autodetection\ heuristics,\ or\ require\ that\ `enctype`\ be\ explicitly\ specified\ in\ all\ forms.\n\nI\ just\ had\ a\ look\ at\ `method=\"get\"`.\ \ It\ works\ similarly\ to\ the\ above,\ except\ the\ data\ goes\ in\ the\ query\ string,\ and\ it's\ always\ URL-encoded.\n\n%|`accept-charset`|`rawquery`|%\n&|`iso-8859-1`|`?key1=&key2=%26%2312399%3B&file=%26%2312399%3B.txt`|&\n&|`utf-8`|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|(none)|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n\nAs\ expected,\ MSIE\ 8\ does\ things\ differently.\n\n%|`method`|`enctype`|`accept-charset`|`rawquery`\ or\ `rawpost`|%\n&|`get`|N/A|`iso-8859-1`\ or\ (none)|`?key1=&key2=%82%CD&file=%82%CD.txt`|&\n&|`get`|N/A|`utf-8`|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|`post`|(none)|`iso-8859-1`\ or\ (none)|`key1=&key2=%82%CD&file=%82%CD.txt`|&\n&|`post`|(none)|`utf-8`|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|`post`|`text/plain`|`iso-8859-1`\ or\ (none)|`key1=<<br>>key2=\[\[82\]\]\[\[CD\]\]<<br>>file=\[\[82\]\]\[\[CD\]\].txt`|&\n&|`post`|`text/plain`|`utf-8`|`key1=<<br>>key2=\[\[E3\]\]\[\[81\]\]\[\[AF\]\]<<br>>file=\[\[E3\]\]\[\[81\]\]\[\[AF\]\].txt`|&\n&|`post`|`multipart/form-data`|`iso-8859-1`\ or\ (none)|`-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key1\"<<br>><<br>><<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key2\"<<br>><<br>>\[\[82\]\]\[\[CD\]\]<<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"\[\[82\]\]\[\[CD\]\].txt\"<<br>>Content-Type:\ text/plain<<br>><<br>>hello<<br>><<br>>-----------------------------7dbf24760414--`|&\n&|`post`|`multipart/form-data`|`utf-8`|`-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key1\"<<br>><<br>><<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key2\"<<br>><<br>>\[\[E3\]\]\[\[81\]\]\[\[AF\]\]<<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"\[\[E3\]\]\[\[81\]\]\[\[AF\]\].txt\"<<br>>Content-Type:\ text/plain<<br>><<br>>hello<<br>><<br>>-----------------------------7dbf24760414--`|&\n\n\[\[82\]\]\[\[CD\]\]?\ \ What's\ that?\ \ Turns\ out\ it's\ shiftjis\ (or\ macJapan\ or\ cp932).\ \ `\[\[encoding\ convertfrom\ shiftjis\ \\x82\\xcd\]\]`\ returns\ は.\n\nTo\ summarize,\ Firefox\ defaults\ to\ utf-8,\ whereas\ MSIE\ 8\ defaults\ to\ ISO-8859-1.\ \ As\ much\ as\ I\ hate\ to\ say\ it,\ MSIE\ 8\ is\ doing\ the\ correct\ thing\ here.\ :^)\ \ The\ other\ big\ difference\ is\ encoding\ non-Latin\ characters\ in\ ISO-8859-1.\ \ Firefox\ encodes\ them\ like\ HTML\ entities,\ whereas\ MSIE\ 8\ encodes\ them\ according\ to\ whatever\ character\ set\ it\ thinks\ is\ most\ appropriate\ for\ that\ character.\ \ Rough.\ \ I\ think\ I'll\ give\ this\ point\ to\ Firefox.\ \ Even\ though\ both\ are\ nonstandard\ behaviors\ (RFC\ 2616\ doesn't\ dictate\ what\ to\ do,\ so\ ''everything''\ is\ nonstandard),\ Firefox's\ approach\ is\ manageable.\n\nI'm\ not\ sure\ how\ I\ can\ deal\ with\ this\ situation.\ \ Different\ browsers\ handle\ the\ default\ (and\ likely\ common)\ case\ in\ different,\ incompatible\ ways,\ and\ at\ least\ two\ browses\ don't\ tell\ the\ server\ what\ charset\ is\ being\ used.\ \ MSIE\ 8's\ approach\ seems\ completely\ unpredictable.\ \ It\ seems\ to\ be\ guessing\ what\ character\ set\ is\ most\ appropriate\ based\ on\ the\ data\ it's\ encoding.\ \ How\ could\ I\ possibly\ reverse\ that\ encoding,\ unless\ I\ already\ knew\ the\ data\ in\ advance!?\ \ Maybe\ I\ try\ every\ possible\ encoding,\ then\ run\ each\ possible\ result\ through\ a\ multilingual\ spellchecker,\ and\ take\ whichever\ one\ has\ the\ fewest\ errors.\ \;^)\n\nSolution?\ \ Require\ everyone\ to\ put\ `content-type=\"utf-8\"`\ in\ their\ forms,\ every\ time.\ \ That's\ the\ only\ case\ in\ which\ Firefox\ and\ MSIE\ 8\ agree.\ \ Grumble\ grumble...\ \ Surely\ I\ can\ do\ better\ than\ that.\n\nOh\ well,\ at\ least\ Firefox\ and\ MSIE\ 8\ both\ use\ consistent\ encodings\ across\ the\ four\ types\ of\ `<form>`s:\n\n\ \ \ *\ `<form\ method=\"get\">`\n\ \ \ *\ `<form\ method=\"post\">`\ or\ `<form\ method=\"application/x-www-form-urlencoded\">`\n\ \ \ *\ `<form\ method=\"post\"\ enctype=\"text/plain\">`\n\ \ \ *\ `<form\ method=\"post\"\ enctype=\"multipart/form-data\">`\n\nOf\ course,\ for\ `get`\ and\ `application/x-www-form-urlencoded`,\ the\ character\ encoding\ is\ additionally\ URL-encoded.\ \ But\ that\ is\ also\ being\ done\ consistently.\ \ The\ inconsistencies\ are\ in:\n\n\ \ \ *\ How\ to\ encode\ characters\ not\ present\ in\ the\ target\ character\ set.\n\ \ \ *\ Which\ character\ set\ to\ use\ when\ the\ `<form>`\ doesn't\ specify.\n\nI\ just\ had\ a\ look\ at\ \[Chrome\].\ \ Like\ MSIE\ 8,\ it\ defaults\ to\ ISO-8859-1\ when\ `accept-charset`\ isn't\ specified.\ \ Like\ Firefox,\ in\ the\ form\ data\ it\ encodes\ characters\ not\ in\ the\ target\ charset\ using\ HTML\ entities.\ \ Like\ both\ browsers,\ it\ doesn't\ send\ the\ charset\ in\ the\ response.\ \ Unlike\ either\ browser,\ in\ the\ form\ metadata\ (filename)\ it\ replaces\ characters\ not\ in\ the\ target\ charset\ with\ question\ marks.\n\n\[dzach\]\ :\ I\ believe\ that\ all\ browsers\ have\ user\ definable\ settings\ regarding\ the\ Accept-Charset\ header,\ so\ the\ differences\ that\ you\ see\ might\ be\ because\ of\ this\ default\ setting.\ Probably\ Firefox's\ default\ is\ utf-8,\ mine\ is.\ You\ could\ look\ in\ the\ Accept-Charset\ HTTP\ request\ header,\ if\ present,\ and\ believe\ that\ it\ tells\ the\ truth.\ Otherwise\ I'd\ choose\ utf-8\ as\ the\ default,\ since\ the\ RFC-2616\ \ \[https://tools.ietf.org/html/rfc2616#section-14.2\]\ specification\ says:\n\ \ If\ no\ Accept-Charset\ header\ is\ present,\ the\ default\ is\ that\ any\n\ \ character\ set\ is\ acceptable.\ If\ an\ Accept-Charset\ header\ is\ present,\n\ \ and\ if\ the\ server\ cannot\ send\ a\ response\ which\ is\ acceptable\n\ \ according\ to\ the\ Accept-Charset\ header,\ then\ the\ server\ SHOULD\ send\n\ \ an\ error\ response\ with\ the\ 406\ (not\ acceptable)\ status\ code,\ though\n\ \ the\ sending\ of\ an\ unacceptable\ response\ is\ also\ allowed.\ \nand\ sending\ a\ 406\ response\ would\ seem\ to\ be\ an\ unnecessary\ punishment\ for\ the\ user.\ Even\ better,\ make\ it\ user\ definable\ in\ Wibble,\ with\ a\ utf-8\ default.\n\nThere\ is\ also\ the\ numeric\ character\ reference\ issue\ (see\ earlier\ on\ this\ page),\ or\ has\ that\ been\ solved\ already?\n\n\[AMG\]:\ Accept-Charset\ describes\ what\ the\ client\ is\ willing\ to\ accept\ from\ the\ server,\ not\ necessarily\ what\ the\ client\ is\ trying\ to\ send\ to\ the\ server.\ \ That's\ what\ the\ charset=\ parameter\ of\ Content-Type\ is\ for.\ \ However,\ I\ haven't\ seen\ any\ browsers\ actually\ use\ charset=\ yet,\ so\ Accept-Charset\ may\ be\ all\ that\ I\ have\ to\ go\ on.\ \ I'll\ research\ how\ well\ Accept-Charset\ aligns\ with\ the\ actual\ encodings\ used\ by\ the\ browser.\ \ Also\ I'll\ try\ changing\ the\ language\ options\ to\ see\ what\ effect\ that\ has.\ \ My\ pessimistic\ guess\ is\ that\ all\ it\ does\ is\ control\ the\ outgoing\ Accept-Charset,\ without\ affecting\ the\ charset\ used\ to\ encode\ form\ data.\ \ Also,\ \"Accept-Charset:\ *\"\ tells\ me\ nothing.\ \ And\ I\ still\ don't\ know\ what\ to\ do\ in\ the\ case\ of\ MSIE\ 8\ sending\ me\ shiftjis!\n\nI\ have\ not\ solved\ the\ numeric\ character\ reference\ issue\ yet,\ since\ I'm\ concerned\ about\ overzealously\ applying\ character\ substitution.\ \ For\ instance,\ if\ the\ user\ tries\ to\ upload\ HTML,\ and\ the\ browser\ doesn't\ apply\ any\ extra\ encoding\ on\ top\ of\ the\ text\ entered\ by\ the\ user,\ the\ server\ should\ ''not''\ replace\ character\ references.\ \ I\ will\ need\ to\ do\ some\ more\ investigation\ to\ see\ how\ each\ major\ browser\ handles\ HTML-like\ text\ entry.\ \ Fixing\ one\ problem\ is\ all\ well\ and\ good,\ but\ not\ if\ it\ creates\ two\ more!\n\nOne\ thing\ I\ haven't\ established\ is\ whether\ Firefox's\ default\ form\ charset\ is\ ''really''\ ISO-8859-1\ or\ if\ it's\ ASCII.\ \ They\ look\ alike,\ for\ the\ text\ I\ used.\ \ If\ it's\ actually\ ASCII,\ then\ I\ am\ free\ to\ assume\ UTF-8,\ which\ is\ fully\ compatible\ with\ ASCII\ but\ not\ ISO-8859-1,\ except\ to\ the\ extent\ that\ ISO-8859-1\ is\ also\ compatible\ with\ ASCII.\n\nThe\ problem\ I'm\ working\ is\ unrelated\ to\ 406\ errors.\ \ I'm\ trying\ to\ interpret\ the\ client's\ request,\ and\ 406\ is\ only\ appropriate\ when\ the\ server\ is\ unable\ to\ generate\ a\ response\ that\ meets\ the\ client's\ requirements.\ \ Besides,\ Tcl\ has\ such\ powerful\ \[encoding\]\ capabilities\ that\ I\ would\ be\ shocked\ if\ I\ ever\ actually\ needed\ to\ send\ a\ 406,\ at\ least\ once\ I\ implement\ outbound\ charset\ encoding!\n\nMeanwhile,\ I\ think\ I'll\ try\ lobbying\ the\ major\ open\ source\ browsers\ to\ include\ Content-Type\ charset\ with\ the\ form\ data\ in\ their\ requests.\n\n***Wibble\ doesn't\ play\ nice\ with\ ab\ (apache\ benchmark)***\n\n\[jnc\]\n\n\ tclsh8.6\ wibble.tcl\n\nIn\ another\ console\n\n\ \ ab\ -n\ 500\ -c\ 5\ http://localhost:8080/vars\n\nWibble\ accepts\ and\ responds\ to\ the\ first\ requests\ but\ for\ some\ reason\ ab\ is\ not\ accepting\ the\ result\ as\ complete\ and\ is\ not\ making\ the\ subsequent\ 495\ requests.\ Eventually\ ab\ gives\ up\ and\ reports:\n\n\ \ apr_poll:\ The\ timeout\ specified\ has\ expired\ (70007)\n\nI\ know\ it\ sounds\ a\ little\ like\ an\ ab\ bug\ because\ wibble\ seems\ to\ work\ with\ the\ browsers\ I\ tested\ but\ I\ can't\ imagine\ that\ the\ apache\ benchmark\ tool\ is\ broke,\ it\ works\ with\ other\ web\ servers\ I\ have\ used\ it\ on\ over\ the\ past\ few\ years.\ I\ wonder\ if\ there\ isn't\ some\ subtle\ thing\ that\ wibble\ isn't\ doing/closing\ out\ and\ the\ browsers\ are\ being\ forgiving\ while\ ab\ is\ not?\n\n***contentsize\ required\ when\ using\ contentchan***\n\n\[AMG\]:\ My\ intention\ was\ that\ contentsize\ be\ optional\ when\ using\ contentchan,\ but\ from\ looking\ at\ the\ code\ I\ can\ see\ that\ using\ contentchan\ without\ contentsize\ will\ result\ in\ an\ error\ when\ \[\[defaultsend\]\]\ tries\ to\ calculate\ the\ value\ of\ end.\n\nThat's\ not\ the\ only\ problem.\ \ The\ server\ would\ either\ need\ to\ use\ Connection:\ Close\ or\ chunked\ encoding,\ since\ those\ are\ the\ only\ two\ ways\ (I\ know\ of)\ to\ tell\ the\ client\ when\ the\ data\ ends\ without\ telling\ it\ the\ total\ size\ up\ front.\ \ \[JBR\]\ requests\ chunked\ encoding.\n\nWhen\ Wibble\ doesn't\ know\ the\ size\ in\ advance,\ it\ can't\ use\ \[\[\[chan\ copy\]\]\]\ with\ chunked\ encoding.\ \ That's\ because\ each\ chunk\ has\ a\ size\ header,\ and\ Wibble\ won't\ know\ until\ after\ \[\[chan\ copy\]\]\ terminates\ whether\ the\ chunk\ was\ cut\ short\ by\ EOF.\ \ Therefore,\ in\ this\ situation\ Wibble\ must\ buffer\ each\ chunk.\n\nHow\ large\ should\ the\ chunks\ be?\ \ The\ chunk\ size\ represents\ a\ tradeoff\ between\ overhead\ and\ latency.\ \ Likely\ the\ best\ thing\ to\ do\ is\ have\ Wibble\ do\ ''nonblocking''\ reads\ from\ the\ contentchan.\ \ This\ will\ minimize\ latency,\ and\ the\ chunks\ may\ have\ variable\ size.\n\nOne\ wacky\ possibility\ is\ that\ the\ contentchan\ is\ a\ \[refchan\],\ socket,\ pipe,\ etc.\ connected\ to\ a\ real-time\ data\ provider.\ \ Nonblocking\ reads\ will\ serve\ well.\ \ However,\ the\ only\ advantage\ of\ using\ chunked\ encoding\ in\ this\ situation\ instead\ of\ Connection:\ Close\ is\ that\ when\ the\ data\ source\ terminates,\ the\ connection\ can\ remain\ open.\n\n\[Bezoar\]\ 02-25-2016\n\nI\ have\ been\ force\ to\ do\ some\ development\ on\ Windows\ and\ have\ found\ that\ wibbles::handle\ function\ corrupts\ the\ zh_dict.\ On\ linux\ the\ same\ proc\ works\ correctly.\ The\ problem\ seems\ to\ be\ the\ file\ split\ on\ windows.\ The\ file\ split\ on\ windows\ seems\ to\ convert\ '\[\ file\ split\ \"/handle\\\\x0/2\"\ \]'\ to\ '/\ /handle\ /2'\ \ rather\ than\ '/\ handle\ 2'.\ Ive\ had\ to\ correct\ the\ dict\ as\ follows\n\nCurrent\ Code:\ \n\n======\nproc\ ::wibble::handle\ \{prefix\ cmd\ args\}\ \{\n\ \ \ \ \ set\ command\ \[concat\ \[list\ \$name\]\ \[lrange\ \$cmd\ 1\ end\]\]\n\ \ \ \ \ lappend\ zonehandlers\ \$prefix\ \$command\ \$args\n\ \ \ \ \ set\ h_count\ \[expr\ \[llength\ \$zonehandlers\]/3\ -\ 1\]\n\ \ \ \ \ dict\ set\ zh_dict\ \{*\}\[file\ split\ \$prefix/handlers\\x0/\$h_count\]\ \[list\ \$prefix\ \$command\ \$args\]\n\ \ \ \ \ #\ New:\ return\ place\ of\ newly-added\ handler\ in\ list.\n\ \ \ \ \ return\ \$h_count\ \n\ \}\n\ \n\n======\n\nCorrected\ Code\ \ :\n======\nproc\ ::wibble::handle\ \{prefix\ cmd\ args\}\ \{\n\ \ \ \ \ set\ command\ \[concat\ \[list\ \$name\]\ \[lrange\ \$cmd\ 1\ end\]\]\n\ \ \ \ \ lappend\ zonehandlers\ \$prefix\ \$command\ \$args\n\ \ \ \ \ set\ h_count\ \[expr\ \[llength\ \$zonehandlers\]/3\ -\ 1\]\n\ \ \ \ \ \ \ set\ slist\ \{\}\n\ \ \ \ \ \ \ if\ \{\ \$prefix\ eq\ \"/\"\ \}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ set\ slist\ \[list\ \"/\"\ handlers\\x0\ \$h_count\ \]\n\ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ set\ slist\ \[\ concat\ \[file\ split\ \$prefix\ \]\ handlers\\x0\ \$h_count\ \]\ \n\ \ \ \ \}\ \n\ \ \ \ dict\ set\ zh_dict\ \{*\}\$slist\ \ \[list\ \$prefix\ \$command\ \$args\]\n\ \ \ \ #\ New:\ return\ place\ of\ newly-added\ handler\ in\ list.\n\ \ \ \ return\ \$h_count\ \n\ \}\n\ \n\n\n----\n**Resolved\ bugs**\n\n***\[\[getline\]\]\ line\ limit\ check***\n\n\[APN\]\ Does\ the\ getline\ proc\ have\ a\ bug\ in\ its\ line\ limit\ check?\ It\ appears\ it\ will\ raise\ an\ error\ if\ the\ ''total''\ buffered\ input\ exceeds\ maxline,\ as\ opposed\ to\ individual\ lines\ exceeding\ that\ limit.\ For\ example,\ if\ it\ receives\ 50\ lines\ of\ 100\ chars\ each\ in\ a\ single\ network\ chunk,\ it\ will\ raise\ an\ error\ when\ it\ should\ not.\n\n\[AMG\]:\ I\ think\ you're\ right,\ but\ at\ the\ moment\ I\ don't\ have\ the\ means\ to\ test.\ \ If\ a\ client\ sends\ a\ bunch\ of\ short\ lines\ while\ the\ server\ is\ taking\ a\ nap,\ when\ the\ server\ wakes,\ it\ will\ act\ as\ if\ the\ client\ had\ sent\ one\ long\ line.\ \ For\ some\ reason\ I\ had\ thought\ \[\[\[chan\ pending\]\]\]\ would\ tell\ me\ how\ many\ characters\ would\ be\ in\ the\ next\ line\ returned\ by\ \[\[\[chan\ gets\]\]\],\ but\ here\ this\ is\ only\ the\ case\ when\ there's\ at\ most\ one\ line\ per\ packet.\ \ By\ \"packet\"\ I\ mean\ the\ batch\ of\ input\ characters\ which\ triggered\ the\ readable\ event.\n\nWhat's\ the\ right\ way\ to\ handle\ this?\n\nWhen\ I\ detect\ that\ the\ next\ \[\[gets\]\]\ ''may''\ exceed\ \$maxline,\ I\ could\ instead\ do\ a\ \[\[\[chan\ read\]\ \[\[expr\ \{\$maxline\ +\ 1\}\]\]\]\]\ and\ check\ for\ newline.\ \ If\ there's\ a\ newline,\ everything\ up\ to\ the\ (first)\ newline\ is\ the\ next\ line\ to\ process,\ and\ the\ newline\ itself\ is\ discarded.\ \ But\ what\ do\ I\ do\ with\ the\ rest\ of\ the\ input?\ \ What\ has\ been\ read,\ cannot\ be\ unread.\ \ Basically\ I\ would\ have\ to\ move\ the\ input\ buffering\ facility\ from\ the\ Tcl\ I/O\ subsystem\ into\ my\ own\ script.\ \ Things\ get\ really\ nasty\ when\ the\ read\ contains\ both\ HTTP\ header\ and\ body\ text,\ because\ the\ header\ text\ needs\ CR/LF\ translation\ and\ the\ body\ text\ does\ not.\ \ I\ can't\ put\ CR's\ back\ into\ the\ body\ text,\ because\ I\ don't\ know\ if\ they\ were\ there\ originally.\ \ So\ I\ would\ not\ be\ able\ to\ use\ Tcl's\ CR/LF\ translation\ for\ the\ header\;\ instead\ I'd\ need\ to\ strip\ CR's\ in\ the\ script.\n\nI\ think\ the\ goal\ of\ \[\[chan\ pending\]\]\ was\ to\ avoid\ having\ to\ do\ this,\ because\ the\ approach\ I\ have\ described\ is\ the\ same\ as\ what\ is\ required\ in\ the\ absence\ of\ \[\[chan\ pending\]\].\n\n\[APN\]\ How\ about\ doing\ the\ chan\ gets\ before\ the\ chan\ pending?\ Something\ like\ -\n\n======\nproc\ wibble::getline\ \{chan\}\ \{\n\ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ if\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ Got\ a\ whole\ line,\ may\ be\ more\ than\ 4096,\ but\ that's\ ok\n\ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ blocked\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ Partial\ line,\ see\ if\ max\ length\ exceeded\n\ \ \ \ \ \ \ \ \ \ \ \ if\ \{\[chan\ pending\ input\ \$chan\]\ >\ 4096\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ Be\ anal.\ Do\ a\ gets\ again\ since\ data\ might\ have\ come\ in\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ between\ the\ chan\ gets\ and\ the\ chan\ pending\ calls.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ For\ example,\ there\ are\ 4000\ bytes\ without\ crlf\ when\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ the\ chan\ gets\ was\ called.\ Then\ another\ 1000\ bytes\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ arrive\ with\ a\ crlf\ at\ the\ front\ before\ the\ chan\ pending\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ call.\ The\ line\ length\ limit\ is\ not\ exceeded\ in\ that\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ case\ even\ through\ chan\ pending\ returns\ >\ 4096.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ error\ \"line\ length\ greater\ than\ 4096\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ Incomplete\ line,\ but\ not\ exceeding\ max\ length.\ Wait\ for\ more.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ EOF\n\ \ \ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \}\n\}\n======\n\n\[AMG\],\ 2010-11-06:\ I\ reverted\ the\ changes\ I\ made\ for\ this\ bug.\ \ I\ really\ don't\ think\ they're\ necessary.\ \ Here's\ my\ current\ code:\n\n======\n#\ Get\ a\ line\ of\ data\ from\ a\ channel.\nproc\ wibble::getline\ \{chan\}\ \{\n\ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ if\ \{\[chan\ names\ \$chan\]\ eq\ \"\"\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ pending\ input\ \$chan\]\ >\ 4096\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ error\ \"line\ length\ exceeds\ limit\ of\ 4096\ bytes\"\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \}\n\}\n======\n\nIf\ a\ line\ is\ available,\ no\ matter\ how\ long\ it\ is,\ it\ gets\ returned.\ \ If\ there\ isn't\ a\ complete\ line,\ but\ there's\ more\ than\ 4K\ in\ the\ buffer\ (and\ no\ CRLF),\ the\ client\ is\ sending\ too\ long\ a\ line,\ and\ an\ error\ report\ is\ generated\ and\ the\ connection\ is\ closed.\ \ The\ client\ can\ send\ a\ long\ line,\ but\ there's\ no\ guarantee\ that\ it'll\ work.\ \ Also\ there's\ no\ guarantee\ that\ it'll\ get\ rejected\ immediately,\ but\ I\ think\ this\ is\ okay,\ so\ long\ as\ it\ gets\ rejected\ before\ the\ server\ runs\ out\ of\ RAM!\n\nNote\ that\ this\ only\ affects\ \[\[getline\]\].\ \ \[\[getblock\]\]\ has\ no\ line\ length\ limits,\ since\ it's\ not\ line-oriented.\ \ Line-oriented\ reads\ are\ only\ done\ for\ HTTP\ headers\ and\ such,\ which\ the\ client\ can\ legally\ break\ onto\ multiple\ lines.\n\nI\ suppose\ the\ next\ problem\ is\ the\ client\ sending\ a\ never-ending\ stream\ of\ header\ data,\ split\ onto\ multiple\ lines!\n\n***Invalid\ command\ name\ process,\ C\ stack\ busy***\n\n\[APN\]:\ When\ run\ against\ Tcl\ built\ against\ CVS\ head\ as\ of\ 12/18/2009,\ wibble\ fails\ with\ the\ error\ \"invalid\ command\ name\ process\".\ The\ fix\ is\ to\ change\ the\ coroutine\ call\ in\ wibble::accept\ to\ pass\ in\ \[\[\[namespace\ current\]\]\]::process\ instead\ of\ process.\ The\ question\ is\ whether\ this\ change\ in\ coroutine\ command\ in\ how\ it\ resolves\ names\ was\ intentional\ or\ not.\n\n\[AMG\]:\ That's\ odd.\ \ \[\[namespace\ current\]\]::process\ works\ fine,\ but\ \[\[\[namespace\ code\]\ process\]\]\ does\ not.\ \ When\ I\ try\ the\ latter,\ I\ get\ \"cannot\ yield:\ C\ stack\ busy\".\ \n\n\[MS\]\ (after\ today's\ fix,\ but\ it\ shouldn't\ matter)\n\ \ \ \ \n\ \ \ \ %\ proc\ a\ \{\}\ \{yield\;\ return\ hello\}\n\ \ \ \ %\ coroutine\ foo\ \{*\}\[namespace\ code\ a\]\n\ \ \ \ %\ foo\n\ \ \ \ hello\n\n\[APN\]\ The\ current\ coroutine\ docs\ state\ '''At\ the\ point\ when\ command\ is\ called,\ the\ current\ namespace\ will\ be\ the\ global\ namespace\ and\ there\ will\ be\ no\ stack\ frames\ above\ it\ (in\ the\ sense\ of\ upvar\ and\ uplevel).\ However,\ which\ command\ to\ call\ will\ be\ determined\ in\ the\ namespace\ that\ the\ coroutine\ command\ was\ called\ from.'''\ Based\ on\ this\ I\ would\ presume\ the\ original\ code\ should\ have\ worked\ as\ well\ (without\ qualifiers)\ since\ the\ coroutine\ command\ should\ have\ resolved\ '''process'''\ to\ be\ '''wibble::process'''\ based\ on\ the\ namespace\ it\ was\ called\ from.\ This\ behaviour\ seems\ to\ have\ changed\ very\ recently,\ perhaps\ the\ documentation\ has\ not\ been\ updated?\n\n\[MS\]\ 2009-12-19\ looks\ like\ \[\[bug\ #2917627\]\]\ \[https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2917627&group_id=10894\],\ fixed\ in\ head.\n\n\[APN\]\ Confirmed.\n\n\[AMG\]:\ MS,\ thanks\ for\ your\ fix.\ \ I\ reverted\ my\ workaround,\ since\ it's\ no\ longer\ needed.\ \ New\ topic:\ Why\ does\ \[\[namespace\ code\]\]\ add\ a\ frame\ to\ the\ C\ stack?\n\n\[AMG\]:\ As\ of\ 2010-11-07,\ \[\[\[namespace\ code\]\]\]\ seems\ to\ work\ fine.\n\n***Wibble\ hangs\ on\ IE8\ POST***\n\n\[dzach\]\ 2010-05-12:\ I've\ encountered\ an\ odd\ problem\ with\ proc\ ''getblock'',\ which\ hangs\ wibble\ when\ sending\ a\ POST\ request\ from\ an\ IE8\ in\ Windows\ XP:\ the\ ''append\ chunk\ \$chunklet''\ line\ seems\ to\ be\ failing\ to\ initialize\ variable\ ''chunk''.\ IE8,\ in\ my\ configuration,\ sends\ the\ POST\ content\ in\ two\ packets,\ the\ first\ of\ which\ is\ empty\ (size\ is\ 0).\ When\ this\ happens\ wibble\ hangs\ for\ ever.\ Firefox\ sends\ the\ POST\ content\ in\ one\ packet\ and\ the\ problem\ does\ not\ appear.\ Initializing\ ''chunk''\ outside\ the\ ''while''\ loop\ with:\ ''set\ chunk\ \"\"'',\ solves\ the\ problem.\ In\ the\ test\ setup,\ Wibble\ runs\ on\ Linux\ with\ Tcl8.6b1.2.\n\n\[AMG\]:\ That\ is\ very\ odd.\ \ I\ should\ get\ IE8\ for\ myself\ to\ test,\ but\ I\ don't\ have\ it\ right\ this\ minute.\ \ Two-argument\ \[\[\[append\]\]\]\ always\ creates\ the\ variable\ if\ it\ doesn't\ exist,\ regardless\ of\ its\ second\ argument.\ \ In\ tclExecute.c,\ doStoreStk\ is\ part\ of\ the\ implementation\ of\ compiled\ append,\ and\ it\ calls\ TclObjLookupVarEx()\ with\ the\ \"create\"\ flags\ set.h.\ \ For\ the\ sake\ of\ argument,\ let's\ say\ that\ \[\[append\]\]\ is\ buggy\ and\ fails\ to\ create\ chunk\ when\ chunklet\ is\ empty.\ \ \$size\ should\ be\ nonzero\ in\ this\ case,\ and\ \[\[chan\ eof\ \$chan\]\]\ is\ false\ because\ IE8\ hasn't\ closed\ the\ channel\ yet,\ so\ \[\[\[yield\]\]\]\ is\ executed.\ \ Nowhere\ in\ this\ code\ path\ does\ chunk's\ existence\ matter,\ and\ there's\ no\ reason\ why\ pre-creating\ chunk\ would\ have\ an\ effect.\ \ The\ little\ information\ available\ suggests\ there's\ a\ bug\ in\ Tcl,\ but\ I\ can't\ say\ for\ sure.\ \ Please\ put\ in\ some\ debug\ prints\ to\ stderr.\ \ In\ particular,\ I\ want\ prints\ before\ and\ after\ the\ yield,\ to\ see\ which\ branch\ of\ the\ \[\[\[if\]\]\]\ is\ being\ taken\ and\ whether\ the\ coroutine\ is\ ever\ resumed.\ \ Do\ this\ both\ without\ and\ with\ your\ ''set\ chunk\ \"\"''\ workaround.\ \ Oh,\ another\ question:\ When\ Wibble\ hangs,\ does\ it\ take\ up\ 0%\ or\ 100%\ CPU\ time?\n\n\[dzach\]:\ The\ tests\ run\ with\ utf-8\ encoding.\ I\ noticed\ that\ when\ I\ insert\ the\ ''puts\ stderr\ \$chunklet''\ line,\ the\ problem\ does\ not\ appear.\ If\ I\ take\ out\ that\ line,\ I\ reproduce\ the\ problem.\ CPU\ load\ is\ 100%.\ Here\ come\ the\ tests:\n\n\ #\ test\ proc\n\ proc\ ::wibble::getblock\ \{chan\ size\}\ \{\n\ \ \ \ \ puts\ stderr\ size=\$size\n\ \ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ set\ chunklet\ \[chan\ read\ \$chan\ \$size\]\n\ \ \ \ \ \ \ \ puts\ stderr\ \"\{\$size\ -\ \[string\ length\ \$chunklet\]\}\"\n\ \ \ \ \ \ \ \ set\ size\ \[expr\ \{\$size\ -\ \[string\ length\ \$chunklet\]\}\]\n\ \ \ \ \ \ \ \ append\ chunk\ \$chunklet\n\ \ \ \ \ \ \ \ if\ \{\$size\ ==\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ size==0\"\n\ \ \ \ \ \ \ \ \ \ return\ \$chunk\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ eof\"\n\ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"before\ yield\"\n\ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"after\ yield\"\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \}\n\ \}\n\n\ #\ output\n\ size=56\n\ \{56\ -\ 0\}\n\ before\ yield\n\ after\ yield\n\ \{56\ -\ 56\}\n\n\ #\ test\ proc\ (work-around)\n\ proc\ ::wibble::getblock\ \{chan\ size\}\ \{\n\ \ \ \ \ puts\ stderr\ size=\$size\n\ \ \ \ \ set\ chunk\ \"\"\n\ \ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ set\ chunklet\ \[chan\ read\ \$chan\ \$size\]\n\ \ \ \ \ \ \ \ puts\ stderr\ \"\{\$size\ -\ \[string\ length\ \$chunklet\]\}\"\n\ \ \ \ \ \ \ \ puts\ stderr\ chunklet=\$chunklet\n\ \ \ \ \ \ \ \ set\ size\ \[expr\ \{\$size\ -\ \[string\ length\ \$chunklet\]\}\]\n\ \ \ \ \ \ \ \ append\ chunk\ \$chunklet\n\ \ \ \ \ \ \ \ if\ \{\$size\ ==\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ size==0\"\n\ \ \ \ \ \ \ \ \ \ return\ \$chunk\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ eof\"\n\ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"before\ yield\"\n\ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"after\ yield\"\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \}\n\ \}\n\n\ #\ output\ (work-around)\n\ size=56\n\ \{56\ -\ 0\}\n\ chunklet=\n\ before\ yield\n\ after\ yield\n\ \{56\ -\ 56\}\n\ chunklet=name=&geometry=38.079226,23.930992&description=&dtstart=\n\ chunk=name=&geometry=38.079226,23.930992&description=&dtstart=\n\ return\ size==0\n\n\[AMG\]:\ I\ created\ a\ simple\ method=\"post\"\ form,\ installed\ IE8,\ and\ submitted\ data\ several\ times.\ \ IE8\ always\ sent\ the\ full\ form\ data\ as\ a\ single\ packet,\ with\ no\ incomplete\ or\ zero-sized\ reads.\ \ Is\ there\ anything\ special\ about\ your\ form\ that's\ making\ IE8\ do\ what\ it\ does?\ \ I'm\ running\ the\ Tcl\ CVS\ HEAD\ compiled\ earlier\ today\ (12\ May\ 2010,\ ChangeLog\ revision\ 1.5108),\ on\ \[Slackware\]\ 13.0.\ \ How\ about\ you?\n\nAlso,\ please\ explain\ why\ you\ stated\ that\ you\ used\ UTF-8\ encoding.\ \ At\ present,\ Wibble\ is\ only\ designed\ to\ work\ with\ ISO8859-1,\ because\ it\ doesn't\ have\ the\ smarts\ to\ figure\ out\ what\ encodings\ are\ acceptable\ to\ the\ client,\ and\ ISO8859-1\ is\ the\ legal\ fallback\ in\ that\ case.\ \ (I\ have\ a\ project\ to\ fix\ this,\ but\ it's\ stalled\ due\ to\ lack\ of\ time.)\ \ How\ is\ it\ that\ you're\ using\ UTF-8?\n\nIf\ simply\ \[puts\]'ing\ the\ value\ of\ \$chunklet\ makes\ the\ problem\ go\ away,\ most\ likely\ there\ is\ a\ Tcl\ bug.\ \ Please\ explore\ this\ a\ little\ more.\ \ What\ happens\ if\ you\ take\ the\ value\ of\ \$chunklet\ and\ ignore\ it?\ \ For\ example,\ \"list\ \$chuklet\".\ \ If\ adding\ that\ makes\ the\ problem\ go\ away,\ something\ is\ definitely\ going\ wacky\ under\ the\ hood,\ and\ you\ should\ file\ a\ bug\ report.\ \ Or\ maybe\ it's\ the\ puts\ that's\ having\ an\ effect,\ and\ it\ doesn't\ matter\ what\ text\ is\ being\ printed.\ \ Again,\ that's\ a\ Tcl\ problem,\ since\ the\ output\ channel\ is\ unrelated\ to\ network\ communication\ with\ the\ client.\n\n100%\ CPU\ load\ suggests\ that\ \[\[\[chan\ read\]\]\]\ is\ returning\ empty\ string,\ yet\ the\ channel\ is\ still\ readable\ and\ \[\[\[chan\ event\]\]\]\ keeps\ calling\ into\ the\ \[coroutine\].\ \ That\ doesn't\ make\ sense\ to\ me.\n\nI\ see\ one\ more\ oddity\ I\ can't\ explain.\ \ Your\ debug\ print\ shows\ geometry\ to\ be\ \"38.079226,23.930992\",\ yet\ both\ Firefox\ 3.6.3\ and\ IE8\ encode\ this\ as\ \"38.079226%2C23.930992\".\ \ What's\ going\ on\ here?\ \ Why\ isn't\ your\ comma\ also\ encoded?\n\nBy\ the\ way,\ this\ page\ is\ getting\ too\ long\ for\ my\ liking.\ \ I\ wish\ some\ benevolent\ \[wikignomes\]\ would\ clean\ it\ up\ for\ me!\ :^)\n\n\[dzach\]:\ Wibble,\ in\ my\ tests,\ substitutes\ a\ hand\ crafted\ -for\ speed-\ web\ server,\ and\ runs\ on\ a\ Kubuntu\ (2.6.32-22\ kernel)\ with\ tcl\ from\ CVS\ HEAD\ (changelog\ 1.5098/Sat\ May\ \ 1\ 11:20:12\ 2010).\ IE8\ is\ on\ the\ same\ machine\ running\ windows\ XP\ in\ a\ Sun's\ VirtualBox.\ \n\nThe\ POST\ is\ sent\ using\ a\ typical\ javascript\ AJAX\ XMLHttpRequest:\ `req.setRequestHeader(\"Content-type\",\ \"application/x-www-form-urlencoded\;\ charset=utf-8\")`.\ In\ my\ narrow\ scope\ application,\ Wibble\ serves\ UTF\ encoded\ (mostly\ dynamic)\ content\ (in\ Greek)\ and\ user\ POSTed\ submissions\ get\ decoded\ inside\ a\ modified\ `wibble::process`\ using\ `encoding\ convertto\ UTF-8\ \[\[dict\ get\ \$response\ content\]\]`\ (could\ as\ well\ do\ it\ outside\ Wibble,\ since\ all\ this\ is\ done\ after\ `getblock`\ finishes\ reading\ the\ channel).\ I\ suspect\ UTF-8\ might\ have\ something\ to\ do\ with\ the\ problem.\ \n\nThe\ POST\ content\ send\ by\ the\ browser\ using\ AJAX\ is\ encoded\ with\ javascript's\ ''encodeURI()''\ which\ leaves\ some\ characters\ (,\ /\ ?\ :\ @\ &\ =\ +\ \$\ #)\ unencoded.\ I'll\ check\ if\ something\ is\ missing\ or\ if\ changing\ this\ makes\ a\ difference.\ I'll\ also\ try\ to\ run\ IE8\ on\ another\ machine\ using\ a\ native\ windows\ XP\ installation.\n\n(A\ little\ later):\ IE8\ hangs\ the\ server\ from\ a\ native\ XP\ installation\ too.\ Changing\ the\ ''encodeURI''\ to\ ''encodeURIComponent''\ (which\ encodes\ comma\ too)\ has\ no\ effect\ to\ the\ result.\ Away\ from\ the\ UTF\ issue,\ it\ seems\ that\ the\ problem\ is\ cured\ as\ soon\ as\ ''chunklet''\ gets\ \"stringified\".\ I\ guess\ it\ would\ be\ worth\ while\ to\ test\ if\ ''chan\ read''\ returns\ a\ string\ type\ when\ it\ reads\ nothing.\ Having\ said\ that,\ initializing\ ''chunklet''\ instead\ of\ ''chunk''\ doesn't\ help,\ so\ my\ guessing\ might\ be\ wrong.\n\n\[AMG\]:\ dzach,\ thanks\ for\ your\ testing.\ \ Please\ keep\ us\ posted.\ \ If\ anyone\ has\ a\ clue\ what\ might\ be\ going\ on\ here,\ please\ jump\ in\ and\ give\ me\ a\ hint,\ 'cuz\ I'm\ mystified.\ :^)\n\n\[dzach\]:\ I\ like\ Wibble's\ Zen,\ so\ testing\ it\ is\ fun,\ given\ the\ existance\ of\ a\ work-around\ for\ this\ problem.\ If\ I\ get\ the\ time,\ I'll\ try\ to\ write\ a\ simple\ AJAX\ test\ so\ that\ others\ can\ reproduce\ it.\n\n\[dzach\]\ 2010-5-17:\ The\ problem\ appears\ with\ an\ unmodified\ Wibble\ (ISO8859-1\ encoding),\ ruling\ out\ a\ UTF-8\ involvement.\nTo\ reproduce\ the\ error,\ use\ the\ code\ in\ http://paste.tclers.tk/2087\ and\ Internet\ Explorer\ 8.\n\n\[AMG\]:\ Works\ fine\ for\ me.\ \ Am\ I\ doing\ something\ wrong?\ \;^)\ \ I\ get\ the\ same\ behavior\ for\ both\ Firefox\ and\ IE8:\n\n======\nsize=22\n\{22\ -\ 22\}\nreturn\ size==0\n======\n\nTonight\ I\ plan\ to\ cobble\ together\ a\ HTTP\ client\ that\ will\ split\ its\ POSTs\ over\ two\ packets,\ just\ to\ see\ if\ this\ really\ is\ the\ cause\ of\ the\ problem.\n\n\[AMG\]:\ Looks\ like\ I\ haven't\ found\ time\ to\ do\ this,\ yet.\n\n\[AMG\],\ update:\ This\ appears\ to\ be\ a\ Tcl\ bug,\ which\ I\ have\ now\ reported\ on\ Sourceforge\ \[http://sourceforge.net/tracker/?func=detail&aid=3091701&group_id=10894&atid=110894\].\ \ Thanks\ \[dzach\]\ for\ bringing\ it\ to\ my\ attention!\ \ It\ sure\ took\ me\ a\ long\ time\ to\ reproduce\ it.\n\n\[dzach\]\ I'm\ glad\ to\ see\ the\ bug\ nailed.\n\n\[AMG\]:\ The\ bug\ is\ fixed\ in\ the\ Tcl\ CVS\ HEAD,\ as\ of\ 2010-09-15.\ \ See\ \[http://sourceforge.net/tracker/?func=detail&aid=3067036&group_id=10894&atid=110894\].\ \ Thanks\ \[DKF\]!\ \ I\ confirmed\ the\ fix\ earlier\ tonight.\n\n***Error:\ command\ returned\ bad\ code:\ 2***\n\n\[AMG\]:\ This\ error\ pops\ up\ whenever\ the\ client\ closes\ the\ connection,\ at\ least\ when\ Wibble\ is\ running\ inside\ \[tkcon\].\ \ I\ really\ have\ no\ clue\ what's\ going\ on.\ \ I\ reported\ it\ on\ the\ Tcl\ bug\ tracker:\ \[https://sourceforge.net/tracker/?func=detail&aid=3104471&group_id=10894&atid=110894\].\ \ This\ is\ new\;\ older\ CVS\ versions\ didn't\ have\ this\ issue.\n\n\[AMG\],\ update:\ Alright,\ I've\ found\ a\ workaround.\ \ In\ \[\[listen\]\],\ change\ this\ line:\n\n======\nchan\ event\ \$socket\ readable\ \[namespace\ code\ \$socket\]\n======\n\nto\ instead\ read:\n\n======\nchan\ event\ \$socket\ readable\ \[list\ apply\ \[list\ \{\}\ \[list\ \$socket\]\ \[namespace\ current\]\]\]\n======\n\nIt's\ ugly,\ but\ it\ works.\n\n\[AMG\],\ update\ #2:\ I\ incorporated\ this\ workaround\ in\ the\ latest\ version.\ \ I\ hope\ to\ back\ it\ out\ again\ someday,\ 'cuz\ it's\ gross.\n\n\[AMG\],\ update\ #3:\ Now\ the\ workaround\ is\ integral\ to\ the\ way\ Wibble\ wakes\ up\ coroutines.\ \ See\ \[\[wake\]\].\ \ Now\ this\ is\ just\ a\ Tcl\ bug,\ with\ no\ impact\ on\ Wibble.\n\n\[AMG\],\ update\ #4:\ Further\ Wibble\ developments\ have\ caused\ this\ workaround\ to\ no\ longer\ be\ integral,\ so\ once\ again\ it's\ back\ to\ being\ an\ annoying\ bug.\ \ Hopefully\ the\ bug\ will\ be\ fixed\ one\ day\ so\ that\ I\ can\ simplify\ \[\[listen\]\]\ to\ have\ \[\[\[socket\]\]\]\ call\ \[\[\[coroutine\]\]\]\ directly\ without\ using\ \[\[\[apply\]\]\]\ to\ construct\ an\ intervening\ stack\ layer.\ \ At\ that\ time\ I'll\ also\ have\ to\ update\ \[\[cleanup\]\]\ to\ use\ \[\[\[upvar\]\ #1\]\]\ instead\ of\ \[\[upvar\ #2\]\].\n\n***Zone\ handlers\ applied\ in\ wrong\ order***\n\n\[AMG\]:\ Zone\ handlers\ should\ be\ applied\ in\ the\ order\ they're\ defined.\ \ Instead,\ they're\ sorted\ first\ by\ zone\ (directory).\n\n\[AMG\]:\ This\ is\ now\ fixed.\n\n***Filename\ spoofing***\n\n\[AMG\]:\ It's\ possible\ to\ defeat\ pattern-based\ filtering\ by\ appending\ a\ NUL\ (%00)\ to\ the\ URL.\ \ I\ blame\ Tcl.\ \;^)\ \ See\ \[https://sourceforge.net/tracker/?func=detail&aid=3118489&group_id=10894&atid=110894\].\n\n\[SEH\]\ --\ note\ that\ on\ the\ page\ \[Tilde\ Substitution\]\ I've\ placed\ a\ package\ of\ procs\ that\ can\ substitute\ by\ aliasing\ for\ the\ standard\ Tcl\ file\ access\ commands\;\ these\ procs\ both\ eliminate\ tilde\ substitution\ and\ throw\ errors\ if\ NULs\ are\ found\ in\ pathnames.\ \ This\ package\ should\ transparently\ insulate\ Wibble\ and\ any\ other\ Tcl\ code\ from\ the\ above\ vulnerability.\n\n\[AMG\]:\ This\ is\ now\ fixed\ in\ Tcl:\ \[http://core.tcl.tk/tcl/info/b5dd510e85bb0711c9f00a94620d602ffd2a9350\].\ \ Thanks\ \[Jan\ Nijtmans\]!\n\n***Cookie:\ Expires***\n\n\[dzach\]:\ It\ looks\ like\ the\ set-cookie\ expires\ handling\ in\ ''::wibble::enheader''\ has\ a\ bug:\n\ \ \}\ expires\ \{\n\ \ \ \ switch\ \[lindex\ \$val3\ 0\]\ \{\n\ \ \ \ \ \ \ abstime\ \{append\ str\ \\\;expires=\[entime\ \$val3\ 1\]\}\n\ \ \ \ \ \ \ reltime\ \{append\ str\ \\\;max-age=\[lindex\ \$val3\ 1\]\}\n\ \ \ \ \}\n\ \ \}\n''::wibble::entime''\ expects\ a\ two\ item\ list,\ ''\[list\ \$key3\ \$val3\]'',\ but\ gets\ two\ separate\ arguments\ instead,\ i.e.\ ''\$val3\ 1''.\ The\ ''entime''\ command\ should\ be\ as\ follows:\n\ \ \}\ expires\ \{\n\ \ \ \ switch\ \[lindex\ \$val3\ 0\]\ \{\n\ \ \ \ \ \ \ abstime\ \{append\ str\ \\\;expires=\[entime\ \$val3\]\}\n\ \ \ \ \ \ \ ...\n\ \ \ \ \}\n\ \ \}\n\n\n<<categories>>\ Wibble regexp2} CALL {my render {Wibble bugs} \[AMG\]:\ This\ page\ documents\ bugs\ in\ the\ \[Wibble\]\ web\ server,\ as\ well\ as\ bugs\ in\ \[Tcl\]\ that\ affect\ Wibble.\ \ There\ may\ be\ some\ overlap\ with\ the\ \[Wibble\ discussion\]\ page.\n\n<<backrefs:Category\ Wibble>>\n\n----\n**Current\ bugs**\n\n***\ Wibble\ multiple\ header\ lines\ are\ overwritten\ ***\n\n\[APN\]\ The\ HTTP\ spec\ says\ if\ multiple\ lines\ with\ the\ same\ header\ name\ are\ received,\ they\ should\ be\ treated\ as\ a\ single\ header\ line\ with\ a\ comma\nseparating\ the\ values.\ Wibble\ overwrites\ values\ in\ previous\ headers\ with\ those\ in\ later\ headers.\n\n***\ Wibble\ forces\ HTTP/1.1\ features\ ***\n\n\[andrewsh\]:\ Wibble\ assumes\ all\ HTTP\ requests\ without\ `Connection:\ close`\ to\ initiate\ a\ persistent\ connection,\ even\ if\ it\ says\ `HTTP/1.0`.\ That\ isn't\ correct\ and\ breaks\ some\ clients.\ See\ http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.10%|%RFC2616\ §14.10%|%:\n\n\ \ A\ system\ receiving\ an\ HTTP/1.0\ (or\ lower-version)\ message\ that\n\ \ includes\ a\ Connection\ header\ MUST,\ for\ each\ connection-token\ in\n\ \ this\ field,\ remove\ and\ ignore\ any\ header\ field(s)\ from\ the\ message\n\ \ with\ the\ same\ name\ as\ the\ connection-token.\ This\ protects\ against\n\ \ mistaken\ forwarding\ of\ such\ header\ fields\ by\ pre-HTTP/1.1\ proxies.\n\n\[AMG\]:\ Wibble\ doesn't\ support\ HTTP/1.0\ or\ /0.9\ or\ anything\ pre-HTTP/1.1.\ \ Such\ support\ can\ be\ added\ but\ was\ not\ requested\ until\ now.\n\n\[andrewsh\]:\ The\ part\ of\ the\ spec\ I\ quoted\ was\ from\ HTTP/1.1.\ HTTP/1.1\ servers\ '''must'''\ support\ HTTP/1.0\ non-persistent\ connections,\ it's\ a\ part\ of\ a\ specification.\ Without\ this\ particular\ feature\ it's\ impossible\ to\ interface\ Wibble\ with\ many\ clients.\ Please\ also\ note\ that\ HTTP/1.0\ clients\ are\ still\ widespread.\n\nBasically,\ the\ fix\ here\ would\ be\ to\ default\ to\ `Connection:\ close`\ when\ the\ request\ has\ HTTP\ version\ less\ than\ 1.1.\ Also\ (however\ this\ needs\ checking\ with\ the\ standard),\ it\ would\ be\ great\ if\ Wibble\ could\ add\ `Connection:`\ header\ to\ the\ response\ (when\ it's\ HTTP/1.1\ of\ course).\n\nHere's\ a\ proof-of-concept\ patch\ to\ fix\ the\ issue:\n\n======\ndiff\ --git\ a/wibble.tcl\ b/wibble.tcl\n---\ a/wibble.tcl\n+++\ b/wibble.tcl\n@@\ -1180,6\ +1180,10\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ set\ size\ 0\n\ \ \ \ \ set\ dict_get_request_method\ \[dict\ get\ \$request\ method\]\n\ \ \ \ \ set\ dict_get_response_status\ \[dict\ get\ \$response\ status\]\n+\ \ \ \ set\ persistent_connection\ \[expr\ \{(\[dict\ get\\\n+\ \ \ \ \ \ \ \ \$request\ protocol\]\ >=\ \"HTTP/1.1\")\ &&\n+\ \ \ \ \ \ \ \ (!\[string\ equal\ -nocase\\\n+\ \ \ \ \ \ \ \ \[dict\ getnull\ \$request\ header\ connection\]\ close\])\}\]\n\ \ \ \ \ if\ \{\[dict\ exists\ \$response\ contentfile\]\}\ \{\n\ \ \ \ \ \ \ \ \ set\ dict_get_response_contentfile\ \[dict\ get\ \$response\ contentfile\]\n\ \ \ \ \ \ \ \ \ set\ size\ \[file\ size\ \$dict_get_response_contentfile\]\n@@\ -1219,8\ +1223,12\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ set\ end_begin_1\ \[expr\ \{\$end\ -\ \$begin\ +\ 1\}\]\n\ \ \ \ \ dict\ set\ response\ header\ content-length\ \$end_begin_1\n\ \n+\ \ \ \ if\ \{!\$persistent_connection\}\ \{\n+\ \ \ \ \ \ \ \ dict\ set\ response\ header\ connection\ close\n+\ \ \ \ \}\n+\n\ \ \ \ \ #\ Send\ the\ response\ header\ to\ the\ client.\n-\ \ \ \ chan\ puts\ \$socket\ \"HTTP/1.1\ \$dict_get_response_status\"\n+\ \ \ \ chan\ puts\ \$socket\ \"\[dict\ get\ \$request\ protocol\]\ \$dict_get_response_status\"\n\ \ \ \ \ chan\ puts\ \$socket\ \[enheader\ \[dict\ get\ \$response\ header\]\]\\n\n\ \n\ \ \ \ \ #\ If\ requested,\ send\ the\ response\ content\ to\ the\ client.\n@@\ -1250,8\ +1258,7\ @@\ proc\ ::wibble::defaultsend\ \{socket\ reque\n\ \ \ \ \ \}\n\ \n\ \ \ \ \ #\ Return\ 1\ to\ keep\ going\ or\ 0\ if\ the\ connection\ needs\ to\ close.\n-\ \ \ \ expr\ \{!\[string\ equal\ -nocase\\\n-\ \ \ \ \ \ \ \ \[dict\ getnull\ \$request\ header\ connection\]\ close\]\}\n+\ \ \ \ return\ \$persistent_connection\n\ \}\n\ \n\ #\ Main\ connection\ processing\ loop.\n@@\ -1344,7\ +1351,7\ @@\ proc\ ::wibble::panic\ \{options\ port\ socke\n\ \ \ \ \ if\ \{!\[dict\ exists\ \$response\ nonhttp\]\ &&\ \$socket\ ne\ \"\"\}\ \{\n\ \ \ \ \ \ \ \ \ catch\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ configure\ \$socket\ -translation\ crlf\n-\ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"HTTP/1.1\ 500\ Internal\ Server\ Error\"\n+\ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"\[dict\ get\ \$request\ protocol\]\ 500\ Internal\ Server\ Error\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Content-Type:\ text/plain\;charset=utf-8\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Content-Length:\ \[string\ length\ \$message\]\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ chan\ puts\ \$socket\ \"Connection:\ close\"\n======\n\n\[AMG\]:\ Thanks.\ \ Will\ investigate\ when\ I\ have\ time.\n\n\[AMG\]:\ Again\ thanks.\ \ Give\ \[https://chiselapp.com/user/andy/repository/wibble/info/c62e5a1b5d\]\ a\ look.\ \ Unlike\ your\ patch,\ this\ version\ is\ baselined\ prior\ to\ \[SEH\]'s\ performance\ improvements.\ \ By\ the\ way,\ it\ indeed\ looks\ like\ I\ need\ to\ move\ to\ a\ different\ SCM\ because\ this\ wiki\ doesn't\ support\ branching.\n\n\[andrewsh\]:\ Great.\ Are\ you\ sure\ about\ this?\n\n======\n+\ \ \ \ #\ Determine\ if\ the\ connection\ is\ persistent.\n+\ \ \ \ set\ persist\ \[expr\ \{\n+\ \ \ \ \ \ \ \ \[dict\ get\ \$request\ protocol\]\ >=\ \"http/1.1\"\n+\ \ \ \ \ &&\ !\[string\ equal\ -nocase\ \[dict\ getnull\ \$request\ header\ connection\]\ close\]\n+\ \ \ \ \}\]\n======\n\nIf\ I\ understand\ this\ correctly,\ the\ first\ comparison\ will\ be\ always\ true\ as\ you\ do\ `\[string\ toupper\]`.\nSpeaking\ of\ VCS,\ is\ that\ chiselapp\ link\ the\ official\ upstream\ location\ for\ Wibble\ from\ now?\n\n\[AMG\]:\ No,\ actually\ always\ false.\ \ Capitals\ are\ less\ than\ lowercase.\ \ Even\ so,\ very\ good\ catch!\ \ I\ didn't\ have\ a\ good\ way\ to\ test\ it\ myself.\ \ See\ \[http://chiselapp.com/user/andy/repository/wibble/info/6dc367091c\]\ for\ the\ fix.\ \ I\ did\ test\ the\ comparisons\ in\ the\ Tcl\ shell\ to\ make\ sure\ they\ work\ the\ way\ we\ intend.\n\nAlso,\ yes,\ that\ is\ the\ official\ upstream\ location\ for\ the\ time\ being.\ \ I\ have\ no\ plans\ to\ move\ it\ further,\ but\ that\ doesn't\ mean\ it'll\ never\ happen.\ \ When\ I\ have\ time\ (can't\ give\ an\ ETA),\ I\ will\ see\ about\ migrating\ bug\ reports,\ documentation,\ and\ maybe\ discussion,\ reducing\ Wibble's\ presence\ on\ this\ wiki\ as\ I\ go.\ \ I\ also\ have\ a\ lot\ of\ new\ code\ I'd\ love\ to\ publish,\ but\ it\ breaks\ backward\ compatibility,\ so\ I\ hesitated.\ \ But\ now\ I\ have\ branch\ capability,\ so\ there's\ no\ reason\ not\ to.\n\n\[andrewsh\]:\ Well,\ now\ you\ need\ to\ fix\ the\ shebang\ :)\n\nRegarding\ version\ control,\ without\ knowing\ you're\ considering\ Fossil,\ I've\ imported\ the\ wiki\ page\ to\ Mercurial\ about\ two\ weeks\ ago:\ \[https://bitbucket.org/andrew_shadura/wibble/commits/all\].\ I\ think\ I\ will\ remove\ that\ and\ resync\ it\ to\ your\ Fossil\ repo\ instead.\n\n\[AMG\]:\ Thanks,\ also\ fixed\ \[http://chiselapp.com/user/andy/repository/wibble/info/180aada6d2\].\ \ Also\ I\ appreciate\ your\ effort\ to\ convert\ to\ Mercurial\ and\ am\ very\ sorry\ to\ see\ it\ wasted.\ \ Is\ there\ anything\ you\ have\ that\ I\ missed?\ \ I\ walked\ through\ every\ version\ of\ the\ \[Wibble\]\ and\ \[Wibble\ implementation\]\ pages,\ extracting\ the\ code\ using\ this\ wiki's\ little-known\ \".code\"\ suffix\ feature\ (e.g.\ \[http://wiki.tcl.tk/27382.code\]),\ then\ compared\ the\ checksums\ of\ each\ to\ identify\ the\ distinct\ versions,\ each\ of\ which\ I\ checked\ in.\ \ For\ the\ older\ versions\ I\ wrote\ descriptive\ check-in\ comments,\ and\ for\ newer\ I\ converted\ the\ comments\ from\ what\ I\ had\ recorded\ at\ \[Wibble\ change\ log\].\ \ I\ could\ well\ have\ made\ a\ mistake.\ \ I\ mean,\ it\ took\ several\ tries\ to\ get\ all\ this\ right,\ or\ at\ least\ not\ so\ wrong\ that\ I'd\ notice.\ \;^)\n\nI'm\ sorry\ I've\ not\ been\ able\ to\ do\ any\ Wibble\ work\ or\ testing\ lately.\ \ Life\ has\ been\ impossible.\ \ Converting\ to\ Fossil\ was\ the\ most\ I've\ been\ able\ to\ do\ for\ a\ long\ time.\ \ Actually\ I\ had\ long\ planned\ to\ move\ to\ Fossil,\ but\ every\ previous\ time\ I\ had\ made\ the\ move,\ I\ aborted\ when\ I\ found\ that\ the\ then-current\ version\ of\ Fossil\ was\ too\ clever\ for\ its\ own\ good\ with\ the\ way\ it\ displayed\ check-in\ comments\ on\ the\ timeline.\ \ Things\ have\ improved\ over\ time\ due\ to\ Fossil\ being\ used\ to\ host\ Tcl,\ Tk,\ and\ related\ repositories,\ all\ of\ which\ had\ the\ same\ issues\ with\ \[\[brackets\]\]\ that\ the\ historical\ Wibble\ check-in\ comments\ did.\ \ I\ also\ quailed\ at\ converting\ the\ historical\ and\ current\ related\ documentation\ to\ the\ Fossil\ wiki\ format.\n\nI\ really\ ought\ to\ dust\ off\ my\ private\ development\ code\ and\ check\ it\ in\ on\ a\ branch\ (it's\ incompatible),\ but\ even\ though\ it\ runs,\ it's\ not\ coherent\ enough\ to\ share.\n\n***\ Protection\ code?\ ***\n\n\[MaxJarek\]:\ Sample\ protection\ ::wibble::getrequest\ for\ error\ like:\ can't\ read\ \"uri\":\ no\ such\ variable\n\n======\n...\n#\ Receive\ and\ parse\ the\ first\ line.\ \ Normalize\ the\ path.\nset\ method\ GET\nset\ uri\ \"\"\nset\ protocol\ \"HTTP/1.1\"\nwhile\ \{\[set\ line\ \[getline\]\]\ eq\ \"\"\}\ \{continue\}\nregexp\ \{^\\s*(\\S*)\\s+(\\S*)\\s+(\\S*)\}\ \$line\ _\ method\ uri\ protocol\n...\n======\n\n\[AMG\]:\ What\ is\ the\ benefit\ of\ this\ protection\ code?\ \ Send\ invalid\ HTTP\ to\ Wibble,\ and\ it\ should\ die\ with\ an\ error.\ \ This\ won't\ take\ down\ all\ of\ Wibble,\ just\ the\ coroutine\ for\ the\ one\ connection,\ triggering\ the\ \[\[panic\]\]\ handler\ in\ the\ process.\n\n***Wiki\ corrupts\ \[\[enhtml\]\]\ and\ \[\[enattr\]\]***\n\n\[AMG\]:\ The\ \[\[enhtml\]\]\ and\ \[\[enattr\]\]\ procedures\ convert\ adjacent\ spaces\ to\ be\ alternating\ spaces\ and\ nonbreaking\ spaces.\ \ For\ some\ reason,\ the\ Wiki\ is\ replacing\ my\ nonbreaking\ space\ HTML\ entity\ (&\ #\ 1\ 6\ 0\ \;)\ with\ an\ ordinary\ space.\ \ You\ can\ still\ see\ the\ correct\ code\ in\ the\ WikiText,\ but\ it's\ gone\ from\ the\ formatted\ HTML.\ \ Therefore\ I\ have\ to\ remember\ to\ re-add\ it\ every\ time\ I\ edit\ the\ page.\ \ Sometimes\ I\ forget.\ :^(\ \ See\ \[http://code.google.com/p/wikitcl/issues/detail?id=209\]\ for\ the\ official\ bug\ report.\n\n\[AMG\],\ update:\ I\ put\ a\ workaround\ in\ the\ latest\ version.\ \ I\ simply\ replaced\ \"#\"\ with\ \"\\#\",\ so\ that\ the\ Wiki\ won't\ think\ it's\ an\ HTML\ entity.\n\n***Coroutine\ already\ running***\n\n\[dzach\]:\ I'm\ getting\ an\ error:\ \n\n\ coroutine\ \"::wibble::sock60\"\ is\ already\ running\n\ while\ executing\n\ \ \ \ \ \ \ \ \"\$coro\ \$args\"\n\ \ \ \ \ \ \ \ (lambda\ term\ \"\{coro\ args\}\ \{\$coro\ \$args\}\ ::wibble\"\ line\ 1)\n\ \ \ \ \ \ \ \ invoked\ from\ within\n\ \ \ \ \ \ \ \ \"apply\ \{\{coro\ args\}\ \{\$coro\ \$args\}\ ::wibble\}\ ::wibble::sock60\ copydone\ 0\"\ \nwhen\ putting\ wibble\ ver.\[http://wiki.tcl.tk/_/revision?N=27377.code&V=21\]\ under\ stress.\ I've\ yet\ to\ find\ under\ what\ exactly\ conditions\ this\ happens.\n\n\[AMG\]:\ It\ seems\ \[\[\[chan\ copy\]\]\]\ calls\ its\ -command\ script\ directly\ when\ it\ has\ zero\ bytes\ to\ copy.\ \ However,\ here\ this\ means\ having\ a\ coroutine\ call\ itself,\ which\ will\ give\ the\ \"already\ running\"\ error.\ \ Try\ printing\ \$begin\ and\ \$end\ just\ before\ the\ \[\[chan\ copy\]\].\ \ I\ expect\ this\ error\ to\ happen\ when\ \$begin\ is\ greater\ than\ \$end,\ which\ will\ occur\ if\ the\ file\ being\ downloaded\ is\ zero\ bytes.\ \ I\ don't\ think\ the\ range\ parsing\ logic\ (which\ sets\ \$begin\ and\ \$end)\ allows\ zero-byte\ ranges.\n\n\[dzach\]:\ I'm\ also\ getting\ the\ following\ error\ when\ hitting\ wibble\ with\ 5\ or\ more\ concurrent\ requests,\ using\ siege,\ http://www.joedog.org/index/siege-home:\n\ errorinfo:\ error\ writing\ \"sock8\":\ broken\ pipe\n\ \ \ \ while\ executing\n\ \"chan\ puts\ -nonewline\ \$socket\ \[string\ range\ \[dict\ get\ \$response\ content\]\ \$begin\ \$end\]\"\n\$begin\ \ \$end\ are\ as\ follows\ (\[\[puts\]\]\ inserted\ right\ after\ ''#\ Send\ buffered\ response\ content.''\ in\ wibble::process):\n\ 0\ \ \ \ \ \ \ 50947\nrepeated\ many\ times,\ and\ then,\ when\ siege\ stops,\ comes\ the\ error\ repeated\ many\ times\ too,\ depending\ on\ request\ concurrency.\n\nIn\ the\ same\ context\ I\ also\ get\ this\ error:\n\ errorinfo:\ error\ writing\ \"sock186\":\ broken\ pipe\n\ \ \ \ while\ executing\n\ \"error\ \[lindex\ \$result\ 2\]\"\n\nI\ should\ mention\ that\ older\ versions\ of\ wibble,\ I\ think\ before\ icc\ was\ introduced,\ did\ not\ give\ these\ errors.\ Hard\ to\ find\ exactly\ which\ version\ these\ were,\ as\ they\ all\ appear\ as\ '0.1'.\n\n\[AMG\]:\ Sorry\ about\ the\ versioning.\ \ Just\ use\ the\ Wiki\ page\ revision\ number.\ \ Regarding\ the\ errors:\ I\ expect\ \"broken\ pipe\"\ errors\ in\ many\ cases\ when\ the\ client\ disconnects\ unexpectedly.\ \ I\ don't\ think\ that's\ a\ big\ deal,\ and\ it's\ a\ nuisance\ to\ fix.\ \ I\ doubt\ ICC\ has\ anything\ to\ do\ with\ it.\ \ I\ haven't\ had\ time\ to\ repeat\ any\ of\ the\ tests\ you've\ been\ doing\ nor\ make\ any\ changes,\ but\ when\ I\ do,\ I\ suspect\ Siege\ will\ do\ a\ fair\ bit\ of\ disconnecting\ without\ waiting\ to\ receive\ the\ full\ page.\n\nAs\ for\ the\ \"coroutine\ already\ running\"\ problem,\ I\ will\ be\ looking\ at\ the\ Tcl\ sources\ to\ track\ down\ any\ and\ all\ circumstances\ in\ which\ \[\[\[chan\ copy\]\]\]\ will\ directly\ call\ its\ -command\ rather\ than\ schedule\ it\ as\ an\ event\ handler,\ then\ make\ sure\ none\ of\ those\ cases\ happen.\ \ I\ already\ know\ that\ it'll\ happen\ when\ working\ with\ zero-size\ files,\ but\ there\ may\ be\ other\ cases\ too.\ \ Whatever\ I\ find,\ I\ should\ also\ document\;\ I\ don't\ see\ any\ mention\ of\ this\ behavior\ in\ the\ man\ page.\n\nWhenever\ I\ get\ time\ for\ this,\ I'll\ also\ tackle\ the\ Firefox\ filename\ character\ encoding\ problem.\ \ After\ that,\ I\ have\ some\ more\ thinking\ to\ do\ about\ whether\ or\ not\ ICC\ is\ the\ right\ thing\ for\ the\ application\ I\ have\ in\ mind\;\ it\ might\ still\ change\ quite\ a\ bit\ before\ I\ decide\ to\ leave\ it\ alone.\ \ Right\ now\ I'm\ just\ too\ busy,\ sorry.\n\n***HTML\ entities\ embedded\ in\ HTTP\ headers***\n\n\[dzach\]:\ When\ faced\ with\ '''numeric\ character\ references'''\ \[http://www.w3.org/TR/html4/charset.html#h-5.3.1\],\ e.g.\ ''\"Υ\ \;Π\ \;Ε\ \;Σ\ \;-google.png\"'',\ wibble\ erroneously\ splits\ it\ at\ every\ ''\;''\ character,\ so\ for\ the\ example\ given\ here\ we\ get:\ ''content-disposition\ \{\{\}\ form-data\ name\ imagedata\ filename\ \{\"Υ\}\ Π\ \{\}\ Ε\ \{\}\ Σ\ \{\}\ -google.png\\\"\ \{\}\}''\ in\ the\ post\ data,\ where\ ''filename''\ contains\ \ ''\"Υ''\ instead\ of\ ''\"Υ\ \;Π\ \;Ε\ \;Σ\ \;-google.png\"''\ (spaces\ inserted\ before\ '\;'\ for\ display\ purposes).\ If\ the\ encoding\ of\ the\ page\ is\ set\ correctly,\ e.g.\ utf-8\ for\ the\ example\ above,\ the\ problem\ doesn't\ appear,\ although\ it\ still\ exists.\n\n\[AMG\]:\ Are\ you\ sure\ numeric\ character\ references\ are\ valid\ in\ the\ context\ of\ an\ HTTP\ header?\ \ Don't\ confuse\ HTML\ with\ HTTP.\ \ Is\ this\ HTTP\ header\ generated\ by\ your\ web\ browser\ (which\ one?)\ or\ are\ you\ doing\ it\ from\ a\ script\ or\ by\ hand?\ \ I\ will\ need\ a\ test\ case,\ if\ you\ want\ me\ to\ fix\ this.\n\n\[dzach\]:\ Sure.\ It\ is\ the\ browser\ that\ generates\ this\ (Firefox\ 3.6.13,\ Kubuntu\ 10.04),\ I\ use\ wibble\ \[http://wiki.tcl.tk/_/revision?N=27377.code&V=21\]\ without\ any\ modification\ and\ no\ extra\ scripts\ whatsoever.\ For\ a\ test,\ just\ rename\ any\ image\ to\ ''????-google.png'',\ which\ is\ the\ filename\ used\ in\ the\ example\ above\ and\ which\ contains\ 4\ non-latin\ characters.\ Do\ NOT\ change\ the\ page\ encoding\ to\ utf-8,\ or\ if\ that\ is\ the\ default\ change\ it\ to\ ISO-8859-1,\ so\ that\ the\ browser\ be\ forced\ to\ use\ numeric\ character\ references.\ Then\ use\ the\ wibble\ upload\ form\ to\ find\ and\ upload\ the\ renamed\ image\ and\ observe\ the\ \"File\ name\"\ in\ the\ table.\ I\ inserted\ a\ ''puts\ \$post''\ statement\ in\ the\ index.html.tmpl\ to\ get\ the\ content-disposition\ data.\ About\ the\ encoding\ used\ in\ an\ HTTP\ header\ \[http://www.w3.org/Protocols/HTTP/Issues/content-disposition.txt\],\ I\ understand\ that\ numeric\ character\ references\ are\ standard\ US-ASCII\ and\ therefore\ should\ be\ valid,\ depending\ on\ the\ operating\ system\ whether\ it\ can\ save\ such\ a\ file\ or\ not.\ Linux,\ as\ far\ as\ I\ can\ tell,\ accepts\ ''&#\;''\ as\ valid\ characters\ in\ a\ file\ name.\n\n\[AMG\]:\ I'm\ still\ not\ sure\ that's\ technically\ legal,\ but\ I\ don't\ see\ any\ other\ way\ for\ Firefox\ to\ send\ Greek\ in\ the\ HTTP\ header\ when\ constrained\ to\ Latin-1.\ \ So\ I\ guess\ I\ need\ to\ find\ a\ way\ to\ support\ this.\ \ I\ suppose\ I\ should\ replace\ numeric\ character\ references\ in\ the\ headers\ before\ attempting\ to\ split\ them.\ \ \[\[\[regexp\]\ -all\ -indices\]\],\ \[\[\[binary\ format\]\ s\]\],\ and\ \[\[\[encoding\ convertfrom\]\ unicode\]\]\ should\ do\ the\ trick:\n\n======\n%\ encoding\ convertfrom\ unicode\ \[binary\ format\ s*\ \{933\ 928\ 917\ 931\}\]\n????\n======\n\nAlso\ I\ ought\ to\ support\ hexadecimal,\ which\ means\ using\ \[\[\[scan\]\]\]\ or\ maybe\ \[\[binary\ format\ H\]\].\n\n\[dzach\]:\ The\ conversion\ itself\ might\ not\ be\ wibble's\ job,\ a\ user\ script\ could\ do\ it,\ but\ it\ should\ preserve\ the\ numeric\ character\ references\ in\ the\ filename,\ which\ it\ doesn't,\ as\ it\ stands\ now.\ I\ think\ the\ splitter\ should\ be\ able\ to\ distinguish\ a\ numeric\ character\ reference\ from\ a\ value\ separator\ '\;'.\n\n\[AMG\]:\ So\ we\ know\ that\ Firefox\ applies\ this\ encoding\ to\ characters\ in\ the\ names\ of\ uploaded\ files.\ \ Where\ else\ does\ it\ apply\ this\ encoding?\ \ Will\ it\ do\ it\ in\ cookies,\ for\ example?\ \ If\ it's\ only\ in\ filenames,\ I'm\ inclined\ to\ agree\ with\ you\ that\ Wibble\ should\ pass\ along\ the\ encoded\ text\ unmolested.\ \ But\ if\ it's\ universal\ throughout\ the\ HTTP\ headers,\ it\ makes\ more\ sense\ for\ Wibble\ to\ automatically\ decode\ all\ character\ references\ when\ constructing\ the\ header\ dictionaries.\ \ I\ would\ have\ already\ included\ this\ feature\ in\ Wibble\ had\ I\ found\ any\ mention\ of\ numeric\ character\ encoding\ in\ RFC\ 2616.\n\nThe\ danger\ is\ that\ this\ processing\ makes\ it\ impossible\ for\ browsers\ to\ include\ text\ resembling\ a\ character\ reference\ in\ a\ HTTP\ header\ (short\ of\ also\ encoding\ the\ ampersand,\ which\ I\ have\ ''not''\ seen\ Firefox\ do).\ \ But\ that\ danger\ exists\ not\ in\ Wibble\ but\ in\ Firefox\ or\ any\ other\ browser\ that\ has\ chosen\ to\ so\ extend\ HTTP.\ \ I\ think\ I\ might\ take\ a\ peek\ at\ the\ Firefox\ source\ code\ to\ see\ what's\ really\ going\ on\ behind\ the\ scenes.\ \ With\ luck,\ the\ sources\ contain\ references\ to\ standards\ documents.\n\n\[dzach\]:\ I\ checked\ Firefox\ with\ cookies\ and\ it\ looks\ like\ the\ cookie\ is\ not\ converted\ to\ numeric\ character\ references.\ I\ created\ a\ cookie:\n\ document.cookie=\"test=????-google.png\"\nThis\ is\ what\ the\ wibble\ sees:\n\ test\ \{\{\}\ ¥ £-google.png\}\nI\ also\ checked\ filename\ and\ cookies\ with\ Chromium.\ The\ filename\ appears\ in\ the\ table\ as:\n\ File\ name:\ ????-google.png\nwhile\ the\ cookie\ appears\ as:\n\ test\ \{\{\}\ Î¥Î ÎÎ£-google.png\}\ #\ first\ time\n\ test\ \{\{\}\ ????-google.png\}\ #\ with\ image\ load\nIt\ looks\ like\ Firefox\ is\ trying\ its\ best\ to\ properly\ represent\ the\ filename\ selected\ by\ an\ end\ user\ and\ thus\ decides\ to\ use\ numeric\ char\ refs\ for\ that,\ while\ for\ cookies,\ which\ are\ under\ the\ control\ of\ a\ developer,\ it\ sends\ them\ without\ conversion.\ Developer\ should\ know\ better.\n\nChromium\ does\ not\ understand\ the\ non-latin\ chars\ in\ the\ user\ selected\ filename\ and\ converts\ them\ to\ '?'\ while\ it\ second\ guesses\ the\ cookie\ value\ as\ utf-8\ and\ sends\ it\ correctly.\n\n\[AMG\]:\ Too\ bad\ Firefox\ chose\ to\ use\ the\ &#decimal\;\ notation,\ since\ the\ semicolon\ was\ already\ defined\ by\ HTTP\ to\ be\ a\ string\ delimiter.\n\n\[AMG\]:\ I\ just\ had\ another\ look\ at\ this.\ \ I\ created\ a\ file\ called\ は.txt\ containing\ \"hello\",\ then\ tried\ uploading\ it\ via\ a\ post\ with\ enctype=\"multipart/form-data\".\ \ I\ also\ included\ some\ text\ inputs\ in\ the\ form,\ into\ which\ I\ wrote\ は.\ \ Here's\ the\ rawpost\ from\ Firefox\ 8.0:\n\n======none\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"key1\"\n\nã\[81\]¯\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"key2\"\n\nã\[81\]¯\n-----------------------------000000000000000\nContent-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"ã\[81\]¯.txt\"\nContent-Type:\ text/plain\n\nhello\n\n-----------------------------000000000000000--\n======\n\n(Replace\ `\[\[81\]\]`\ with\ \\x81.\ \ I\ had\ to\ change\ it\ to\ dodge\ the\ wiki's\ bogus\ character\ reject\ filter.)\n\nAs\ you\ might\ expect,\ `\[\[\[encoding\ convertfrom\]\ \[utf-8\]\ ã\[\[81\]\]¯\]\]`\ returns\ `は`.\n\nIt\ appears\ Firefox\ is\ taking\ the\ liberty\ of\ encoding\ its\ posted\ form\ data\ in\ UTF-8,\ even\ though\ there's\ nothing\ in\ the\ request\ headers\ indicating\ the\ charset.\ \ This\ happens\ both\ with\ and\ without\ `accept-charset=\"utf-8\"`\ in\ the\ `<form>`\ markup.\ \ The\ default\ should\ be\ ISO-8859-1,\ not\ UTF-8.\n\nPutting\ `accept-charset=\"iso-8859-1\"`\ in\ the\ `<form>`\ yields:\n\n======none\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"key1\"\n\nは\;\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"key2\"\n\nは\;\n-----------------------------00000000000000\nContent-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"は\;.txt\"\nContent-Type:\ text/plain\n\nhello\n-----------------------------00000000000000--\n======\n\nAgain,\ `\[\[encoding\ convertfrom\ unicode\ \[\[\[string\ trimright\]\ \[\[\[binary\ format\]\ i*\ 12399\]\]\ \\0\]\]\]\]`\ returns\ `は`.\ \ Also,\ again\ there's\ no\ indication\ in\ the\ request\ header\ what\ the\ character\ set\ is.\n\nFirefox\ encodes\ は.txt\ in\ the\ URI\ as\ `%E3%81%AF.txt`.\ \ `\[\[encoding\ convertfrom\ utf-8\ \\xe3\\x81\\xaf\]\]`\ returns\ `は`.\n\n`<form\ enctype=\"text/plain\"\ accept-charset=\"iso-8859-1\">`\ yields:\n\n======none\nkey1=\nkey2=は\;\nfile=は\;.txt\n======\n\nRemoving\ `accept-charset`\ or\ setting\ it\ to\ `utf-8`\ yields:\n\n======none\nkey1=\nkey2=ã\[81\]¯\nfile=ã\[81\]¯.txt\n======\n\nUsing\ `enctype=\"application/x-www-form-urlencoded\"`\ or\ removing\ enctype\ altogether\ yields:\n\n%|`accept-charset`|`rawpost`|%\n&|`iso-8859-1`|`key1=&key2=%26%2312399%3B&file=%26%2312399%3B.txt`|&\n&|`utf-8`|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|(none)|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n\nNow,\ let's\ decode:\n\n&|%26|%23|12399|%3B|&\n&|&|#|12399|\;|&\n\nI'll\ check\ other\ browsers\ another\ day.\n\n\[dzach\]\ :\ So,\ decoding\ the\ charset,\ when\ the\ charset\ ''is''\ set\ for\ the\ form/page,\ looks\ straight\ forward\ for\ Firefox,\ although\ Wibble\ will\ need\ to\ do\ a\ double\ conversion\ for\ unicode\ characters\ sent\ with\ iso-8859-1.\n\n\[AMG\]:\ Yeah,\ not\ a\ huge\ deal.\ \ My\ big\ concern\ is\ that\ Firefox\ uses\ UTF-8\ by\ default\ even\ though\ RFC\ 2616\ requires\ that\ it\ use\ ISO-8859-1\ by\ default.\ \ That\ means\ other,\ compliant\ browsers\ would\ be\ incompatible,\ and\ I'd\ either\ have\ to\ do\ different\ processing\ based\ on\ the\ user-agent,\ or\ implement\ autodetection\ heuristics,\ or\ require\ that\ `enctype`\ be\ explicitly\ specified\ in\ all\ forms.\n\nI\ just\ had\ a\ look\ at\ `method=\"get\"`.\ \ It\ works\ similarly\ to\ the\ above,\ except\ the\ data\ goes\ in\ the\ query\ string,\ and\ it's\ always\ URL-encoded.\n\n%|`accept-charset`|`rawquery`|%\n&|`iso-8859-1`|`?key1=&key2=%26%2312399%3B&file=%26%2312399%3B.txt`|&\n&|`utf-8`|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|(none)|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n\nAs\ expected,\ MSIE\ 8\ does\ things\ differently.\n\n%|`method`|`enctype`|`accept-charset`|`rawquery`\ or\ `rawpost`|%\n&|`get`|N/A|`iso-8859-1`\ or\ (none)|`?key1=&key2=%82%CD&file=%82%CD.txt`|&\n&|`get`|N/A|`utf-8`|`?key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|`post`|(none)|`iso-8859-1`\ or\ (none)|`key1=&key2=%82%CD&file=%82%CD.txt`|&\n&|`post`|(none)|`utf-8`|`key1=&key2=%E3%81%AF&file=%E3%81%AF.txt`|&\n&|`post`|`text/plain`|`iso-8859-1`\ or\ (none)|`key1=<<br>>key2=\[\[82\]\]\[\[CD\]\]<<br>>file=\[\[82\]\]\[\[CD\]\].txt`|&\n&|`post`|`text/plain`|`utf-8`|`key1=<<br>>key2=\[\[E3\]\]\[\[81\]\]\[\[AF\]\]<<br>>file=\[\[E3\]\]\[\[81\]\]\[\[AF\]\].txt`|&\n&|`post`|`multipart/form-data`|`iso-8859-1`\ or\ (none)|`-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key1\"<<br>><<br>><<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key2\"<<br>><<br>>\[\[82\]\]\[\[CD\]\]<<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"\[\[82\]\]\[\[CD\]\].txt\"<<br>>Content-Type:\ text/plain<<br>><<br>>hello<<br>><<br>>-----------------------------7dbf24760414--`|&\n&|`post`|`multipart/form-data`|`utf-8`|`-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key1\"<<br>><<br>><<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"key2\"<<br>><<br>>\[\[E3\]\]\[\[81\]\]\[\[AF\]\]<<br>>-----------------------------7dbf24760414<<br>>Content-Disposition:\ form-data\;\ name=\"file\"\;\ filename=\"\[\[E3\]\]\[\[81\]\]\[\[AF\]\].txt\"<<br>>Content-Type:\ text/plain<<br>><<br>>hello<<br>><<br>>-----------------------------7dbf24760414--`|&\n\n\[\[82\]\]\[\[CD\]\]?\ \ What's\ that?\ \ Turns\ out\ it's\ shiftjis\ (or\ macJapan\ or\ cp932).\ \ `\[\[encoding\ convertfrom\ shiftjis\ \\x82\\xcd\]\]`\ returns\ は.\n\nTo\ summarize,\ Firefox\ defaults\ to\ utf-8,\ whereas\ MSIE\ 8\ defaults\ to\ ISO-8859-1.\ \ As\ much\ as\ I\ hate\ to\ say\ it,\ MSIE\ 8\ is\ doing\ the\ correct\ thing\ here.\ :^)\ \ The\ other\ big\ difference\ is\ encoding\ non-Latin\ characters\ in\ ISO-8859-1.\ \ Firefox\ encodes\ them\ like\ HTML\ entities,\ whereas\ MSIE\ 8\ encodes\ them\ according\ to\ whatever\ character\ set\ it\ thinks\ is\ most\ appropriate\ for\ that\ character.\ \ Rough.\ \ I\ think\ I'll\ give\ this\ point\ to\ Firefox.\ \ Even\ though\ both\ are\ nonstandard\ behaviors\ (RFC\ 2616\ doesn't\ dictate\ what\ to\ do,\ so\ ''everything''\ is\ nonstandard),\ Firefox's\ approach\ is\ manageable.\n\nI'm\ not\ sure\ how\ I\ can\ deal\ with\ this\ situation.\ \ Different\ browsers\ handle\ the\ default\ (and\ likely\ common)\ case\ in\ different,\ incompatible\ ways,\ and\ at\ least\ two\ browses\ don't\ tell\ the\ server\ what\ charset\ is\ being\ used.\ \ MSIE\ 8's\ approach\ seems\ completely\ unpredictable.\ \ It\ seems\ to\ be\ guessing\ what\ character\ set\ is\ most\ appropriate\ based\ on\ the\ data\ it's\ encoding.\ \ How\ could\ I\ possibly\ reverse\ that\ encoding,\ unless\ I\ already\ knew\ the\ data\ in\ advance!?\ \ Maybe\ I\ try\ every\ possible\ encoding,\ then\ run\ each\ possible\ result\ through\ a\ multilingual\ spellchecker,\ and\ take\ whichever\ one\ has\ the\ fewest\ errors.\ \;^)\n\nSolution?\ \ Require\ everyone\ to\ put\ `content-type=\"utf-8\"`\ in\ their\ forms,\ every\ time.\ \ That's\ the\ only\ case\ in\ which\ Firefox\ and\ MSIE\ 8\ agree.\ \ Grumble\ grumble...\ \ Surely\ I\ can\ do\ better\ than\ that.\n\nOh\ well,\ at\ least\ Firefox\ and\ MSIE\ 8\ both\ use\ consistent\ encodings\ across\ the\ four\ types\ of\ `<form>`s:\n\n\ \ \ *\ `<form\ method=\"get\">`\n\ \ \ *\ `<form\ method=\"post\">`\ or\ `<form\ method=\"application/x-www-form-urlencoded\">`\n\ \ \ *\ `<form\ method=\"post\"\ enctype=\"text/plain\">`\n\ \ \ *\ `<form\ method=\"post\"\ enctype=\"multipart/form-data\">`\n\nOf\ course,\ for\ `get`\ and\ `application/x-www-form-urlencoded`,\ the\ character\ encoding\ is\ additionally\ URL-encoded.\ \ But\ that\ is\ also\ being\ done\ consistently.\ \ The\ inconsistencies\ are\ in:\n\n\ \ \ *\ How\ to\ encode\ characters\ not\ present\ in\ the\ target\ character\ set.\n\ \ \ *\ Which\ character\ set\ to\ use\ when\ the\ `<form>`\ doesn't\ specify.\n\nI\ just\ had\ a\ look\ at\ \[Chrome\].\ \ Like\ MSIE\ 8,\ it\ defaults\ to\ ISO-8859-1\ when\ `accept-charset`\ isn't\ specified.\ \ Like\ Firefox,\ in\ the\ form\ data\ it\ encodes\ characters\ not\ in\ the\ target\ charset\ using\ HTML\ entities.\ \ Like\ both\ browsers,\ it\ doesn't\ send\ the\ charset\ in\ the\ response.\ \ Unlike\ either\ browser,\ in\ the\ form\ metadata\ (filename)\ it\ replaces\ characters\ not\ in\ the\ target\ charset\ with\ question\ marks.\n\n\[dzach\]\ :\ I\ believe\ that\ all\ browsers\ have\ user\ definable\ settings\ regarding\ the\ Accept-Charset\ header,\ so\ the\ differences\ that\ you\ see\ might\ be\ because\ of\ this\ default\ setting.\ Probably\ Firefox's\ default\ is\ utf-8,\ mine\ is.\ You\ could\ look\ in\ the\ Accept-Charset\ HTTP\ request\ header,\ if\ present,\ and\ believe\ that\ it\ tells\ the\ truth.\ Otherwise\ I'd\ choose\ utf-8\ as\ the\ default,\ since\ the\ RFC-2616\ \ \[https://tools.ietf.org/html/rfc2616#section-14.2\]\ specification\ says:\n\ \ If\ no\ Accept-Charset\ header\ is\ present,\ the\ default\ is\ that\ any\n\ \ character\ set\ is\ acceptable.\ If\ an\ Accept-Charset\ header\ is\ present,\n\ \ and\ if\ the\ server\ cannot\ send\ a\ response\ which\ is\ acceptable\n\ \ according\ to\ the\ Accept-Charset\ header,\ then\ the\ server\ SHOULD\ send\n\ \ an\ error\ response\ with\ the\ 406\ (not\ acceptable)\ status\ code,\ though\n\ \ the\ sending\ of\ an\ unacceptable\ response\ is\ also\ allowed.\ \nand\ sending\ a\ 406\ response\ would\ seem\ to\ be\ an\ unnecessary\ punishment\ for\ the\ user.\ Even\ better,\ make\ it\ user\ definable\ in\ Wibble,\ with\ a\ utf-8\ default.\n\nThere\ is\ also\ the\ numeric\ character\ reference\ issue\ (see\ earlier\ on\ this\ page),\ or\ has\ that\ been\ solved\ already?\n\n\[AMG\]:\ Accept-Charset\ describes\ what\ the\ client\ is\ willing\ to\ accept\ from\ the\ server,\ not\ necessarily\ what\ the\ client\ is\ trying\ to\ send\ to\ the\ server.\ \ That's\ what\ the\ charset=\ parameter\ of\ Content-Type\ is\ for.\ \ However,\ I\ haven't\ seen\ any\ browsers\ actually\ use\ charset=\ yet,\ so\ Accept-Charset\ may\ be\ all\ that\ I\ have\ to\ go\ on.\ \ I'll\ research\ how\ well\ Accept-Charset\ aligns\ with\ the\ actual\ encodings\ used\ by\ the\ browser.\ \ Also\ I'll\ try\ changing\ the\ language\ options\ to\ see\ what\ effect\ that\ has.\ \ My\ pessimistic\ guess\ is\ that\ all\ it\ does\ is\ control\ the\ outgoing\ Accept-Charset,\ without\ affecting\ the\ charset\ used\ to\ encode\ form\ data.\ \ Also,\ \"Accept-Charset:\ *\"\ tells\ me\ nothing.\ \ And\ I\ still\ don't\ know\ what\ to\ do\ in\ the\ case\ of\ MSIE\ 8\ sending\ me\ shiftjis!\n\nI\ have\ not\ solved\ the\ numeric\ character\ reference\ issue\ yet,\ since\ I'm\ concerned\ about\ overzealously\ applying\ character\ substitution.\ \ For\ instance,\ if\ the\ user\ tries\ to\ upload\ HTML,\ and\ the\ browser\ doesn't\ apply\ any\ extra\ encoding\ on\ top\ of\ the\ text\ entered\ by\ the\ user,\ the\ server\ should\ ''not''\ replace\ character\ references.\ \ I\ will\ need\ to\ do\ some\ more\ investigation\ to\ see\ how\ each\ major\ browser\ handles\ HTML-like\ text\ entry.\ \ Fixing\ one\ problem\ is\ all\ well\ and\ good,\ but\ not\ if\ it\ creates\ two\ more!\n\nOne\ thing\ I\ haven't\ established\ is\ whether\ Firefox's\ default\ form\ charset\ is\ ''really''\ ISO-8859-1\ or\ if\ it's\ ASCII.\ \ They\ look\ alike,\ for\ the\ text\ I\ used.\ \ If\ it's\ actually\ ASCII,\ then\ I\ am\ free\ to\ assume\ UTF-8,\ which\ is\ fully\ compatible\ with\ ASCII\ but\ not\ ISO-8859-1,\ except\ to\ the\ extent\ that\ ISO-8859-1\ is\ also\ compatible\ with\ ASCII.\n\nThe\ problem\ I'm\ working\ is\ unrelated\ to\ 406\ errors.\ \ I'm\ trying\ to\ interpret\ the\ client's\ request,\ and\ 406\ is\ only\ appropriate\ when\ the\ server\ is\ unable\ to\ generate\ a\ response\ that\ meets\ the\ client's\ requirements.\ \ Besides,\ Tcl\ has\ such\ powerful\ \[encoding\]\ capabilities\ that\ I\ would\ be\ shocked\ if\ I\ ever\ actually\ needed\ to\ send\ a\ 406,\ at\ least\ once\ I\ implement\ outbound\ charset\ encoding!\n\nMeanwhile,\ I\ think\ I'll\ try\ lobbying\ the\ major\ open\ source\ browsers\ to\ include\ Content-Type\ charset\ with\ the\ form\ data\ in\ their\ requests.\n\n***Wibble\ doesn't\ play\ nice\ with\ ab\ (apache\ benchmark)***\n\n\[jnc\]\n\n\ tclsh8.6\ wibble.tcl\n\nIn\ another\ console\n\n\ \ ab\ -n\ 500\ -c\ 5\ http://localhost:8080/vars\n\nWibble\ accepts\ and\ responds\ to\ the\ first\ requests\ but\ for\ some\ reason\ ab\ is\ not\ accepting\ the\ result\ as\ complete\ and\ is\ not\ making\ the\ subsequent\ 495\ requests.\ Eventually\ ab\ gives\ up\ and\ reports:\n\n\ \ apr_poll:\ The\ timeout\ specified\ has\ expired\ (70007)\n\nI\ know\ it\ sounds\ a\ little\ like\ an\ ab\ bug\ because\ wibble\ seems\ to\ work\ with\ the\ browsers\ I\ tested\ but\ I\ can't\ imagine\ that\ the\ apache\ benchmark\ tool\ is\ broke,\ it\ works\ with\ other\ web\ servers\ I\ have\ used\ it\ on\ over\ the\ past\ few\ years.\ I\ wonder\ if\ there\ isn't\ some\ subtle\ thing\ that\ wibble\ isn't\ doing/closing\ out\ and\ the\ browsers\ are\ being\ forgiving\ while\ ab\ is\ not?\n\n***contentsize\ required\ when\ using\ contentchan***\n\n\[AMG\]:\ My\ intention\ was\ that\ contentsize\ be\ optional\ when\ using\ contentchan,\ but\ from\ looking\ at\ the\ code\ I\ can\ see\ that\ using\ contentchan\ without\ contentsize\ will\ result\ in\ an\ error\ when\ \[\[defaultsend\]\]\ tries\ to\ calculate\ the\ value\ of\ end.\n\nThat's\ not\ the\ only\ problem.\ \ The\ server\ would\ either\ need\ to\ use\ Connection:\ Close\ or\ chunked\ encoding,\ since\ those\ are\ the\ only\ two\ ways\ (I\ know\ of)\ to\ tell\ the\ client\ when\ the\ data\ ends\ without\ telling\ it\ the\ total\ size\ up\ front.\ \ \[JBR\]\ requests\ chunked\ encoding.\n\nWhen\ Wibble\ doesn't\ know\ the\ size\ in\ advance,\ it\ can't\ use\ \[\[\[chan\ copy\]\]\]\ with\ chunked\ encoding.\ \ That's\ because\ each\ chunk\ has\ a\ size\ header,\ and\ Wibble\ won't\ know\ until\ after\ \[\[chan\ copy\]\]\ terminates\ whether\ the\ chunk\ was\ cut\ short\ by\ EOF.\ \ Therefore,\ in\ this\ situation\ Wibble\ must\ buffer\ each\ chunk.\n\nHow\ large\ should\ the\ chunks\ be?\ \ The\ chunk\ size\ represents\ a\ tradeoff\ between\ overhead\ and\ latency.\ \ Likely\ the\ best\ thing\ to\ do\ is\ have\ Wibble\ do\ ''nonblocking''\ reads\ from\ the\ contentchan.\ \ This\ will\ minimize\ latency,\ and\ the\ chunks\ may\ have\ variable\ size.\n\nOne\ wacky\ possibility\ is\ that\ the\ contentchan\ is\ a\ \[refchan\],\ socket,\ pipe,\ etc.\ connected\ to\ a\ real-time\ data\ provider.\ \ Nonblocking\ reads\ will\ serve\ well.\ \ However,\ the\ only\ advantage\ of\ using\ chunked\ encoding\ in\ this\ situation\ instead\ of\ Connection:\ Close\ is\ that\ when\ the\ data\ source\ terminates,\ the\ connection\ can\ remain\ open.\n\n\[Bezoar\]\ 02-25-2016\n\nI\ have\ been\ force\ to\ do\ some\ development\ on\ Windows\ and\ have\ found\ that\ wibbles::handle\ function\ corrupts\ the\ zh_dict.\ On\ linux\ the\ same\ proc\ works\ correctly.\ The\ problem\ seems\ to\ be\ the\ file\ split\ on\ windows.\ The\ file\ split\ on\ windows\ seems\ to\ convert\ '\[\ file\ split\ \"/handle\\\\x0/2\"\ \]'\ to\ '/\ /handle\ /2'\ \ rather\ than\ '/\ handle\ 2'.\ Ive\ had\ to\ correct\ the\ dict\ as\ follows\n\nCurrent\ Code:\ \n\n======\nproc\ ::wibble::handle\ \{prefix\ cmd\ args\}\ \{\n\ \ \ \ \ set\ command\ \[concat\ \[list\ \$name\]\ \[lrange\ \$cmd\ 1\ end\]\]\n\ \ \ \ \ lappend\ zonehandlers\ \$prefix\ \$command\ \$args\n\ \ \ \ \ set\ h_count\ \[expr\ \[llength\ \$zonehandlers\]/3\ -\ 1\]\n\ \ \ \ \ dict\ set\ zh_dict\ \{*\}\[file\ split\ \$prefix/handlers\\x0/\$h_count\]\ \[list\ \$prefix\ \$command\ \$args\]\n\ \ \ \ \ #\ New:\ return\ place\ of\ newly-added\ handler\ in\ list.\n\ \ \ \ \ return\ \$h_count\ \n\ \}\n\ \n\n======\n\nCorrected\ Code\ \ :\n======\nproc\ ::wibble::handle\ \{prefix\ cmd\ args\}\ \{\n\ \ \ \ \ set\ command\ \[concat\ \[list\ \$name\]\ \[lrange\ \$cmd\ 1\ end\]\]\n\ \ \ \ \ lappend\ zonehandlers\ \$prefix\ \$command\ \$args\n\ \ \ \ \ set\ h_count\ \[expr\ \[llength\ \$zonehandlers\]/3\ -\ 1\]\n\ \ \ \ \ \ \ set\ slist\ \{\}\n\ \ \ \ \ \ \ if\ \{\ \$prefix\ eq\ \"/\"\ \}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ set\ slist\ \[list\ \"/\"\ handlers\\x0\ \$h_count\ \]\n\ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ set\ slist\ \[\ concat\ \[file\ split\ \$prefix\ \]\ handlers\\x0\ \$h_count\ \]\ \n\ \ \ \ \}\ \n\ \ \ \ dict\ set\ zh_dict\ \{*\}\$slist\ \ \[list\ \$prefix\ \$command\ \$args\]\n\ \ \ \ #\ New:\ return\ place\ of\ newly-added\ handler\ in\ list.\n\ \ \ \ return\ \$h_count\ \n\ \}\n\ \n\n\n----\n**Resolved\ bugs**\n\n***\[\[getline\]\]\ line\ limit\ check***\n\n\[APN\]\ Does\ the\ getline\ proc\ have\ a\ bug\ in\ its\ line\ limit\ check?\ It\ appears\ it\ will\ raise\ an\ error\ if\ the\ ''total''\ buffered\ input\ exceeds\ maxline,\ as\ opposed\ to\ individual\ lines\ exceeding\ that\ limit.\ For\ example,\ if\ it\ receives\ 50\ lines\ of\ 100\ chars\ each\ in\ a\ single\ network\ chunk,\ it\ will\ raise\ an\ error\ when\ it\ should\ not.\n\n\[AMG\]:\ I\ think\ you're\ right,\ but\ at\ the\ moment\ I\ don't\ have\ the\ means\ to\ test.\ \ If\ a\ client\ sends\ a\ bunch\ of\ short\ lines\ while\ the\ server\ is\ taking\ a\ nap,\ when\ the\ server\ wakes,\ it\ will\ act\ as\ if\ the\ client\ had\ sent\ one\ long\ line.\ \ For\ some\ reason\ I\ had\ thought\ \[\[\[chan\ pending\]\]\]\ would\ tell\ me\ how\ many\ characters\ would\ be\ in\ the\ next\ line\ returned\ by\ \[\[\[chan\ gets\]\]\],\ but\ here\ this\ is\ only\ the\ case\ when\ there's\ at\ most\ one\ line\ per\ packet.\ \ By\ \"packet\"\ I\ mean\ the\ batch\ of\ input\ characters\ which\ triggered\ the\ readable\ event.\n\nWhat's\ the\ right\ way\ to\ handle\ this?\n\nWhen\ I\ detect\ that\ the\ next\ \[\[gets\]\]\ ''may''\ exceed\ \$maxline,\ I\ could\ instead\ do\ a\ \[\[\[chan\ read\]\ \[\[expr\ \{\$maxline\ +\ 1\}\]\]\]\]\ and\ check\ for\ newline.\ \ If\ there's\ a\ newline,\ everything\ up\ to\ the\ (first)\ newline\ is\ the\ next\ line\ to\ process,\ and\ the\ newline\ itself\ is\ discarded.\ \ But\ what\ do\ I\ do\ with\ the\ rest\ of\ the\ input?\ \ What\ has\ been\ read,\ cannot\ be\ unread.\ \ Basically\ I\ would\ have\ to\ move\ the\ input\ buffering\ facility\ from\ the\ Tcl\ I/O\ subsystem\ into\ my\ own\ script.\ \ Things\ get\ really\ nasty\ when\ the\ read\ contains\ both\ HTTP\ header\ and\ body\ text,\ because\ the\ header\ text\ needs\ CR/LF\ translation\ and\ the\ body\ text\ does\ not.\ \ I\ can't\ put\ CR's\ back\ into\ the\ body\ text,\ because\ I\ don't\ know\ if\ they\ were\ there\ originally.\ \ So\ I\ would\ not\ be\ able\ to\ use\ Tcl's\ CR/LF\ translation\ for\ the\ header\;\ instead\ I'd\ need\ to\ strip\ CR's\ in\ the\ script.\n\nI\ think\ the\ goal\ of\ \[\[chan\ pending\]\]\ was\ to\ avoid\ having\ to\ do\ this,\ because\ the\ approach\ I\ have\ described\ is\ the\ same\ as\ what\ is\ required\ in\ the\ absence\ of\ \[\[chan\ pending\]\].\n\n\[APN\]\ How\ about\ doing\ the\ chan\ gets\ before\ the\ chan\ pending?\ Something\ like\ -\n\n======\nproc\ wibble::getline\ \{chan\}\ \{\n\ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ if\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ Got\ a\ whole\ line,\ may\ be\ more\ than\ 4096,\ but\ that's\ ok\n\ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ blocked\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ Partial\ line,\ see\ if\ max\ length\ exceeded\n\ \ \ \ \ \ \ \ \ \ \ \ if\ \{\[chan\ pending\ input\ \$chan\]\ >\ 4096\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ Be\ anal.\ Do\ a\ gets\ again\ since\ data\ might\ have\ come\ in\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ between\ the\ chan\ gets\ and\ the\ chan\ pending\ calls.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ For\ example,\ there\ are\ 4000\ bytes\ without\ crlf\ when\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ the\ chan\ gets\ was\ called.\ Then\ another\ 1000\ bytes\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ arrive\ with\ a\ crlf\ at\ the\ front\ before\ the\ chan\ pending\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ call.\ The\ line\ length\ limit\ is\ not\ exceeded\ in\ that\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ case\ even\ through\ chan\ pending\ returns\ >\ 4096.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ error\ \"line\ length\ greater\ than\ 4096\"\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ #\ Incomplete\ line,\ but\ not\ exceeding\ max\ length.\ Wait\ for\ more.\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ #\ EOF\n\ \ \ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \}\n\}\n======\n\n\[AMG\],\ 2010-11-06:\ I\ reverted\ the\ changes\ I\ made\ for\ this\ bug.\ \ I\ really\ don't\ think\ they're\ necessary.\ \ Here's\ my\ current\ code:\n\n======\n#\ Get\ a\ line\ of\ data\ from\ a\ channel.\nproc\ wibble::getline\ \{chan\}\ \{\n\ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ if\ \{\[chan\ names\ \$chan\]\ eq\ \"\"\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ gets\ \$chan\ line\]\ >=\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ return\ \$line\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ pending\ input\ \$chan\]\ >\ 4096\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ error\ \"line\ length\ exceeds\ limit\ of\ 4096\ bytes\"\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \}\n\}\n======\n\nIf\ a\ line\ is\ available,\ no\ matter\ how\ long\ it\ is,\ it\ gets\ returned.\ \ If\ there\ isn't\ a\ complete\ line,\ but\ there's\ more\ than\ 4K\ in\ the\ buffer\ (and\ no\ CRLF),\ the\ client\ is\ sending\ too\ long\ a\ line,\ and\ an\ error\ report\ is\ generated\ and\ the\ connection\ is\ closed.\ \ The\ client\ can\ send\ a\ long\ line,\ but\ there's\ no\ guarantee\ that\ it'll\ work.\ \ Also\ there's\ no\ guarantee\ that\ it'll\ get\ rejected\ immediately,\ but\ I\ think\ this\ is\ okay,\ so\ long\ as\ it\ gets\ rejected\ before\ the\ server\ runs\ out\ of\ RAM!\n\nNote\ that\ this\ only\ affects\ \[\[getline\]\].\ \ \[\[getblock\]\]\ has\ no\ line\ length\ limits,\ since\ it's\ not\ line-oriented.\ \ Line-oriented\ reads\ are\ only\ done\ for\ HTTP\ headers\ and\ such,\ which\ the\ client\ can\ legally\ break\ onto\ multiple\ lines.\n\nI\ suppose\ the\ next\ problem\ is\ the\ client\ sending\ a\ never-ending\ stream\ of\ header\ data,\ split\ onto\ multiple\ lines!\n\n***Invalid\ command\ name\ process,\ C\ stack\ busy***\n\n\[APN\]:\ When\ run\ against\ Tcl\ built\ against\ CVS\ head\ as\ of\ 12/18/2009,\ wibble\ fails\ with\ the\ error\ \"invalid\ command\ name\ process\".\ The\ fix\ is\ to\ change\ the\ coroutine\ call\ in\ wibble::accept\ to\ pass\ in\ \[\[\[namespace\ current\]\]\]::process\ instead\ of\ process.\ The\ question\ is\ whether\ this\ change\ in\ coroutine\ command\ in\ how\ it\ resolves\ names\ was\ intentional\ or\ not.\n\n\[AMG\]:\ That's\ odd.\ \ \[\[namespace\ current\]\]::process\ works\ fine,\ but\ \[\[\[namespace\ code\]\ process\]\]\ does\ not.\ \ When\ I\ try\ the\ latter,\ I\ get\ \"cannot\ yield:\ C\ stack\ busy\".\ \n\n\[MS\]\ (after\ today's\ fix,\ but\ it\ shouldn't\ matter)\n\ \ \ \ \n\ \ \ \ %\ proc\ a\ \{\}\ \{yield\;\ return\ hello\}\n\ \ \ \ %\ coroutine\ foo\ \{*\}\[namespace\ code\ a\]\n\ \ \ \ %\ foo\n\ \ \ \ hello\n\n\[APN\]\ The\ current\ coroutine\ docs\ state\ '''At\ the\ point\ when\ command\ is\ called,\ the\ current\ namespace\ will\ be\ the\ global\ namespace\ and\ there\ will\ be\ no\ stack\ frames\ above\ it\ (in\ the\ sense\ of\ upvar\ and\ uplevel).\ However,\ which\ command\ to\ call\ will\ be\ determined\ in\ the\ namespace\ that\ the\ coroutine\ command\ was\ called\ from.'''\ Based\ on\ this\ I\ would\ presume\ the\ original\ code\ should\ have\ worked\ as\ well\ (without\ qualifiers)\ since\ the\ coroutine\ command\ should\ have\ resolved\ '''process'''\ to\ be\ '''wibble::process'''\ based\ on\ the\ namespace\ it\ was\ called\ from.\ This\ behaviour\ seems\ to\ have\ changed\ very\ recently,\ perhaps\ the\ documentation\ has\ not\ been\ updated?\n\n\[MS\]\ 2009-12-19\ looks\ like\ \[\[bug\ #2917627\]\]\ \[https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2917627&group_id=10894\],\ fixed\ in\ head.\n\n\[APN\]\ Confirmed.\n\n\[AMG\]:\ MS,\ thanks\ for\ your\ fix.\ \ I\ reverted\ my\ workaround,\ since\ it's\ no\ longer\ needed.\ \ New\ topic:\ Why\ does\ \[\[namespace\ code\]\]\ add\ a\ frame\ to\ the\ C\ stack?\n\n\[AMG\]:\ As\ of\ 2010-11-07,\ \[\[\[namespace\ code\]\]\]\ seems\ to\ work\ fine.\n\n***Wibble\ hangs\ on\ IE8\ POST***\n\n\[dzach\]\ 2010-05-12:\ I've\ encountered\ an\ odd\ problem\ with\ proc\ ''getblock'',\ which\ hangs\ wibble\ when\ sending\ a\ POST\ request\ from\ an\ IE8\ in\ Windows\ XP:\ the\ ''append\ chunk\ \$chunklet''\ line\ seems\ to\ be\ failing\ to\ initialize\ variable\ ''chunk''.\ IE8,\ in\ my\ configuration,\ sends\ the\ POST\ content\ in\ two\ packets,\ the\ first\ of\ which\ is\ empty\ (size\ is\ 0).\ When\ this\ happens\ wibble\ hangs\ for\ ever.\ Firefox\ sends\ the\ POST\ content\ in\ one\ packet\ and\ the\ problem\ does\ not\ appear.\ Initializing\ ''chunk''\ outside\ the\ ''while''\ loop\ with:\ ''set\ chunk\ \"\"'',\ solves\ the\ problem.\ In\ the\ test\ setup,\ Wibble\ runs\ on\ Linux\ with\ Tcl8.6b1.2.\n\n\[AMG\]:\ That\ is\ very\ odd.\ \ I\ should\ get\ IE8\ for\ myself\ to\ test,\ but\ I\ don't\ have\ it\ right\ this\ minute.\ \ Two-argument\ \[\[\[append\]\]\]\ always\ creates\ the\ variable\ if\ it\ doesn't\ exist,\ regardless\ of\ its\ second\ argument.\ \ In\ tclExecute.c,\ doStoreStk\ is\ part\ of\ the\ implementation\ of\ compiled\ append,\ and\ it\ calls\ TclObjLookupVarEx()\ with\ the\ \"create\"\ flags\ set.h.\ \ For\ the\ sake\ of\ argument,\ let's\ say\ that\ \[\[append\]\]\ is\ buggy\ and\ fails\ to\ create\ chunk\ when\ chunklet\ is\ empty.\ \ \$size\ should\ be\ nonzero\ in\ this\ case,\ and\ \[\[chan\ eof\ \$chan\]\]\ is\ false\ because\ IE8\ hasn't\ closed\ the\ channel\ yet,\ so\ \[\[\[yield\]\]\]\ is\ executed.\ \ Nowhere\ in\ this\ code\ path\ does\ chunk's\ existence\ matter,\ and\ there's\ no\ reason\ why\ pre-creating\ chunk\ would\ have\ an\ effect.\ \ The\ little\ information\ available\ suggests\ there's\ a\ bug\ in\ Tcl,\ but\ I\ can't\ say\ for\ sure.\ \ Please\ put\ in\ some\ debug\ prints\ to\ stderr.\ \ In\ particular,\ I\ want\ prints\ before\ and\ after\ the\ yield,\ to\ see\ which\ branch\ of\ the\ \[\[\[if\]\]\]\ is\ being\ taken\ and\ whether\ the\ coroutine\ is\ ever\ resumed.\ \ Do\ this\ both\ without\ and\ with\ your\ ''set\ chunk\ \"\"''\ workaround.\ \ Oh,\ another\ question:\ When\ Wibble\ hangs,\ does\ it\ take\ up\ 0%\ or\ 100%\ CPU\ time?\n\n\[dzach\]:\ The\ tests\ run\ with\ utf-8\ encoding.\ I\ noticed\ that\ when\ I\ insert\ the\ ''puts\ stderr\ \$chunklet''\ line,\ the\ problem\ does\ not\ appear.\ If\ I\ take\ out\ that\ line,\ I\ reproduce\ the\ problem.\ CPU\ load\ is\ 100%.\ Here\ come\ the\ tests:\n\n\ #\ test\ proc\n\ proc\ ::wibble::getblock\ \{chan\ size\}\ \{\n\ \ \ \ \ puts\ stderr\ size=\$size\n\ \ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ set\ chunklet\ \[chan\ read\ \$chan\ \$size\]\n\ \ \ \ \ \ \ \ puts\ stderr\ \"\{\$size\ -\ \[string\ length\ \$chunklet\]\}\"\n\ \ \ \ \ \ \ \ set\ size\ \[expr\ \{\$size\ -\ \[string\ length\ \$chunklet\]\}\]\n\ \ \ \ \ \ \ \ append\ chunk\ \$chunklet\n\ \ \ \ \ \ \ \ if\ \{\$size\ ==\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ size==0\"\n\ \ \ \ \ \ \ \ \ \ return\ \$chunk\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ eof\"\n\ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"before\ yield\"\n\ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"after\ yield\"\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \}\n\ \}\n\n\ #\ output\n\ size=56\n\ \{56\ -\ 0\}\n\ before\ yield\n\ after\ yield\n\ \{56\ -\ 56\}\n\n\ #\ test\ proc\ (work-around)\n\ proc\ ::wibble::getblock\ \{chan\ size\}\ \{\n\ \ \ \ \ puts\ stderr\ size=\$size\n\ \ \ \ \ set\ chunk\ \"\"\n\ \ \ \ \ while\ \{1\}\ \{\n\ \ \ \ \ \ \ \ set\ chunklet\ \[chan\ read\ \$chan\ \$size\]\n\ \ \ \ \ \ \ \ puts\ stderr\ \"\{\$size\ -\ \[string\ length\ \$chunklet\]\}\"\n\ \ \ \ \ \ \ \ puts\ stderr\ chunklet=\$chunklet\n\ \ \ \ \ \ \ \ set\ size\ \[expr\ \{\$size\ -\ \[string\ length\ \$chunklet\]\}\]\n\ \ \ \ \ \ \ \ append\ chunk\ \$chunklet\n\ \ \ \ \ \ \ \ if\ \{\$size\ ==\ 0\}\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ size==0\"\n\ \ \ \ \ \ \ \ \ \ return\ \$chunk\n\ \ \ \ \ \ \ \ \}\ elseif\ \{\[chan\ eof\ \$chan\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ chan\ close\ \$chan\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"return\ eof\"\n\ \ \ \ \ \ \ \ \ \ return\ -level\ \[info\ level\]\n\ \ \ \ \ \ \ \ \}\ else\ \{\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"before\ yield\"\n\ \ \ \ \ \ \ \ \ \ yield\n\ \ \ \ \ \ \ \ \ \ puts\ stderr\ \"after\ yield\"\n\ \ \ \ \ \ \ \ \}\n\ \ \ \ \ \}\n\ \}\n\n\ #\ output\ (work-around)\n\ size=56\n\ \{56\ -\ 0\}\n\ chunklet=\n\ before\ yield\n\ after\ yield\n\ \{56\ -\ 56\}\n\ chunklet=name=&geometry=38.079226,23.930992&description=&dtstart=\n\ chunk=name=&geometry=38.079226,23.930992&description=&dtstart=\n\ return\ size==0\n\n\[AMG\]:\ I\ created\ a\ simple\ method=\"post\"\ form,\ installed\ IE8,\ and\ submitted\ data\ several\ times.\ \ IE8\ always\ sent\ the\ full\ form\ data\ as\ a\ single\ packet,\ with\ no\ incomplete\ or\ zero-sized\ reads.\ \ Is\ there\ anything\ special\ about\ your\ form\ that's\ making\ IE8\ do\ what\ it\ does?\ \ I'm\ running\ the\ Tcl\ CVS\ HEAD\ compiled\ earlier\ today\ (12\ May\ 2010,\ ChangeLog\ revision\ 1.5108),\ on\ \[Slackware\]\ 13.0.\ \ How\ about\ you?\n\nAlso,\ please\ explain\ why\ you\ stated\ that\ you\ used\ UTF-8\ encoding.\ \ At\ present,\ Wibble\ is\ only\ designed\ to\ work\ with\ ISO8859-1,\ because\ it\ doesn't\ have\ the\ smarts\ to\ figure\ out\ what\ encodings\ are\ acceptable\ to\ the\ client,\ and\ ISO8859-1\ is\ the\ legal\ fallback\ in\ that\ case.\ \ (I\ have\ a\ project\ to\ fix\ this,\ but\ it's\ stalled\ due\ to\ lack\ of\ time.)\ \ How\ is\ it\ that\ you're\ using\ UTF-8?\n\nIf\ simply\ \[puts\]'ing\ the\ value\ of\ \$chunklet\ makes\ the\ problem\ go\ away,\ most\ likely\ there\ is\ a\ Tcl\ bug.\ \ Please\ explore\ this\ a\ little\ more.\ \ What\ happens\ if\ you\ take\ the\ value\ of\ \$chunklet\ and\ ignore\ it?\ \ For\ example,\ \"list\ \$chuklet\".\ \ If\ adding\ that\ makes\ the\ problem\ go\ away,\ something\ is\ definitely\ going\ wacky\ under\ the\ hood,\ and\ you\ should\ file\ a\ bug\ report.\ \ Or\ maybe\ it's\ the\ puts\ that's\ having\ an\ effect,\ and\ it\ doesn't\ matter\ what\ text\ is\ being\ printed.\ \ Again,\ that's\ a\ Tcl\ problem,\ since\ the\ output\ channel\ is\ unrelated\ to\ network\ communication\ with\ the\ client.\n\n100%\ CPU\ load\ suggests\ that\ \[\[\[chan\ read\]\]\]\ is\ returning\ empty\ string,\ yet\ the\ channel\ is\ still\ readable\ and\ \[\[\[chan\ event\]\]\]\ keeps\ calling\ into\ the\ \[coroutine\].\ \ That\ doesn't\ make\ sense\ to\ me.\n\nI\ see\ one\ more\ oddity\ I\ can't\ explain.\ \ Your\ debug\ print\ shows\ geometry\ to\ be\ \"38.079226,23.930992\",\ yet\ both\ Firefox\ 3.6.3\ and\ IE8\ encode\ this\ as\ \"38.079226%2C23.930992\".\ \ What's\ going\ on\ here?\ \ Why\ isn't\ your\ comma\ also\ encoded?\n\nBy\ the\ way,\ this\ page\ is\ getting\ too\ long\ for\ my\ liking.\ \ I\ wish\ some\ benevolent\ \[wikignomes\]\ would\ clean\ it\ up\ for\ me!\ :^)\n\n\[dzach\]:\ Wibble,\ in\ my\ tests,\ substitutes\ a\ hand\ crafted\ -for\ speed-\ web\ server,\ and\ runs\ on\ a\ Kubuntu\ (2.6.32-22\ kernel)\ with\ tcl\ from\ CVS\ HEAD\ (changelog\ 1.5098/Sat\ May\ \ 1\ 11:20:12\ 2010).\ IE8\ is\ on\ the\ same\ machine\ running\ windows\ XP\ in\ a\ Sun's\ VirtualBox.\ \n\nThe\ POST\ is\ sent\ using\ a\ typical\ javascript\ AJAX\ XMLHttpRequest:\ `req.setRequestHeader(\"Content-type\",\ \"application/x-www-form-urlencoded\;\ charset=utf-8\")`.\ In\ my\ narrow\ scope\ application,\ Wibble\ serves\ UTF\ encoded\ (mostly\ dynamic)\ content\ (in\ Greek)\ and\ user\ POSTed\ submissions\ get\ decoded\ inside\ a\ modified\ `wibble::process`\ using\ `encoding\ convertto\ UTF-8\ \[\[dict\ get\ \$response\ content\]\]`\ (could\ as\ well\ do\ it\ outside\ Wibble,\ since\ all\ this\ is\ done\ after\ `getblock`\ finishes\ reading\ the\ channel).\ I\ suspect\ UTF-8\ might\ have\ something\ to\ do\ with\ the\ problem.\ \n\nThe\ POST\ content\ send\ by\ the\ browser\ using\ AJAX\ is\ encoded\ with\ javascript's\ ''encodeURI()''\ which\ leaves\ some\ characters\ (,\ /\ ?\ :\ @\ &\ =\ +\ \$\ #)\ unencoded.\ I'll\ check\ if\ something\ is\ missing\ or\ if\ changing\ this\ makes\ a\ difference.\ I'll\ also\ try\ to\ run\ IE8\ on\ another\ machine\ using\ a\ native\ windows\ XP\ installation.\n\n(A\ little\ later):\ IE8\ hangs\ the\ server\ from\ a\ native\ XP\ installation\ too.\ Changing\ the\ ''encodeURI''\ to\ ''encodeURIComponent''\ (which\ encodes\ comma\ too)\ has\ no\ effect\ to\ the\ result.\ Away\ from\ the\ UTF\ issue,\ it\ seems\ that\ the\ problem\ is\ cured\ as\ soon\ as\ ''chunklet''\ gets\ \"stringified\".\ I\ guess\ it\ would\ be\ worth\ while\ to\ test\ if\ ''chan\ read''\ returns\ a\ string\ type\ when\ it\ reads\ nothing.\ Having\ said\ that,\ initializing\ ''chunklet''\ instead\ of\ ''chunk''\ doesn't\ help,\ so\ my\ guessing\ might\ be\ wrong.\n\n\[AMG\]:\ dzach,\ thanks\ for\ your\ testing.\ \ Please\ keep\ us\ posted.\ \ If\ anyone\ has\ a\ clue\ what\ might\ be\ going\ on\ here,\ please\ jump\ in\ and\ give\ me\ a\ hint,\ 'cuz\ I'm\ mystified.\ :^)\n\n\[dzach\]:\ I\ like\ Wibble's\ Zen,\ so\ testing\ it\ is\ fun,\ given\ the\ existance\ of\ a\ work-around\ for\ this\ problem.\ If\ I\ get\ the\ time,\ I'll\ try\ to\ write\ a\ simple\ AJAX\ test\ so\ that\ others\ can\ reproduce\ it.\n\n\[dzach\]\ 2010-5-17:\ The\ problem\ appears\ with\ an\ unmodified\ Wibble\ (ISO8859-1\ encoding),\ ruling\ out\ a\ UTF-8\ involvement.\nTo\ reproduce\ the\ error,\ use\ the\ code\ in\ http://paste.tclers.tk/2087\ and\ Internet\ Explorer\ 8.\n\n\[AMG\]:\ Works\ fine\ for\ me.\ \ Am\ I\ doing\ something\ wrong?\ \;^)\ \ I\ get\ the\ same\ behavior\ for\ both\ Firefox\ and\ IE8:\n\n======\nsize=22\n\{22\ -\ 22\}\nreturn\ size==0\n======\n\nTonight\ I\ plan\ to\ cobble\ together\ a\ HTTP\ client\ that\ will\ split\ its\ POSTs\ over\ two\ packets,\ just\ to\ see\ if\ this\ really\ is\ the\ cause\ of\ the\ problem.\n\n\[AMG\]:\ Looks\ like\ I\ haven't\ found\ time\ to\ do\ this,\ yet.\n\n\[AMG\],\ update:\ This\ appears\ to\ be\ a\ Tcl\ bug,\ which\ I\ have\ now\ reported\ on\ Sourceforge\ \[http://sourceforge.net/tracker/?func=detail&aid=3091701&group_id=10894&atid=110894\].\ \ Thanks\ \[dzach\]\ for\ bringing\ it\ to\ my\ attention!\ \ It\ sure\ took\ me\ a\ long\ time\ to\ reproduce\ it.\n\n\[dzach\]\ I'm\ glad\ to\ see\ the\ bug\ nailed.\n\n\[AMG\]:\ The\ bug\ is\ fixed\ in\ the\ Tcl\ CVS\ HEAD,\ as\ of\ 2010-09-15.\ \ See\ \[http://sourceforge.net/tracker/?func=detail&aid=3067036&group_id=10894&atid=110894\].\ \ Thanks\ \[DKF\]!\ \ I\ confirmed\ the\ fix\ earlier\ tonight.\n\n***Error:\ command\ returned\ bad\ code:\ 2***\n\n\[AMG\]:\ This\ error\ pops\ up\ whenever\ the\ client\ closes\ the\ connection,\ at\ least\ when\ Wibble\ is\ running\ inside\ \[tkcon\].\ \ I\ really\ have\ no\ clue\ what's\ going\ on.\ \ I\ reported\ it\ on\ the\ Tcl\ bug\ tracker:\ \[https://sourceforge.net/tracker/?func=detail&aid=3104471&group_id=10894&atid=110894\].\ \ This\ is\ new\;\ older\ CVS\ versions\ didn't\ have\ this\ issue.\n\n\[AMG\],\ update:\ Alright,\ I've\ found\ a\ workaround.\ \ In\ \[\[listen\]\],\ change\ this\ line:\n\n======\nchan\ event\ \$socket\ readable\ \[namespace\ code\ \$socket\]\n======\n\nto\ instead\ read:\n\n======\nchan\ event\ \$socket\ readable\ \[list\ apply\ \[list\ \{\}\ \[list\ \$socket\]\ \[namespace\ current\]\]\]\n======\n\nIt's\ ugly,\ but\ it\ works.\n\n\[AMG\],\ update\ #2:\ I\ incorporated\ this\ workaround\ in\ the\ latest\ version.\ \ I\ hope\ to\ back\ it\ out\ again\ someday,\ 'cuz\ it's\ gross.\n\n\[AMG\],\ update\ #3:\ Now\ the\ workaround\ is\ integral\ to\ the\ way\ Wibble\ wakes\ up\ coroutines.\ \ See\ \[\[wake\]\].\ \ Now\ this\ is\ just\ a\ Tcl\ bug,\ with\ no\ impact\ on\ Wibble.\n\n\[AMG\],\ update\ #4:\ Further\ Wibble\ developments\ have\ caused\ this\ workaround\ to\ no\ longer\ be\ integral,\ so\ once\ again\ it's\ back\ to\ being\ an\ annoying\ bug.\ \ Hopefully\ the\ bug\ will\ be\ fixed\ one\ day\ so\ that\ I\ can\ simplify\ \[\[listen\]\]\ to\ have\ \[\[\[socket\]\]\]\ call\ \[\[\[coroutine\]\]\]\ directly\ without\ using\ \[\[\[apply\]\]\]\ to\ construct\ an\ intervening\ stack\ layer.\ \ At\ that\ time\ I'll\ also\ have\ to\ update\ \[\[cleanup\]\]\ to\ use\ \[\[\[upvar\]\ #1\]\]\ instead\ of\ \[\[upvar\ #2\]\].\n\n***Zone\ handlers\ applied\ in\ wrong\ order***\n\n\[AMG\]:\ Zone\ handlers\ should\ be\ applied\ in\ the\ order\ they're\ defined.\ \ Instead,\ they're\ sorted\ first\ by\ zone\ (directory).\n\n\[AMG\]:\ This\ is\ now\ fixed.\n\n***Filename\ spoofing***\n\n\[AMG\]:\ It's\ possible\ to\ defeat\ pattern-based\ filtering\ by\ appending\ a\ NUL\ (%00)\ to\ the\ URL.\ \ I\ blame\ Tcl.\ \;^)\ \ See\ \[https://sourceforge.net/tracker/?func=detail&aid=3118489&group_id=10894&atid=110894\].\n\n\[SEH\]\ --\ note\ that\ on\ the\ page\ \[Tilde\ Substitution\]\ I've\ placed\ a\ package\ of\ procs\ that\ can\ substitute\ by\ aliasing\ for\ the\ standard\ Tcl\ file\ access\ commands\;\ these\ procs\ both\ eliminate\ tilde\ substitution\ and\ throw\ errors\ if\ NULs\ are\ found\ in\ pathnames.\ \ This\ package\ should\ transparently\ insulate\ Wibble\ and\ any\ other\ Tcl\ code\ from\ the\ above\ vulnerability.\n\n\[AMG\]:\ This\ is\ now\ fixed\ in\ Tcl:\ \[http://core.tcl.tk/tcl/info/b5dd510e85bb0711c9f00a94620d602ffd2a9350\].\ \ Thanks\ \[Jan\ Nijtmans\]!\n\n***Cookie:\ Expires***\n\n\[dzach\]:\ It\ looks\ like\ the\ set-cookie\ expires\ handling\ in\ ''::wibble::enheader''\ has\ a\ bug:\n\ \ \}\ expires\ \{\n\ \ \ \ switch\ \[lindex\ \$val3\ 0\]\ \{\n\ \ \ \ \ \ \ abstime\ \{append\ str\ \\\;expires=\[entime\ \$val3\ 1\]\}\n\ \ \ \ \ \ \ reltime\ \{append\ str\ \\\;max-age=\[lindex\ \$val3\ 1\]\}\n\ \ \ \ \}\n\ \ \}\n''::wibble::entime''\ expects\ a\ two\ item\ list,\ ''\[list\ \$key3\ \$val3\]'',\ but\ gets\ two\ separate\ arguments\ instead,\ i.e.\ ''\$val3\ 1''.\ The\ ''entime''\ command\ should\ be\ as\ follows:\n\ \ \}\ expires\ \{\n\ \ \ \ switch\ \[lindex\ \$val3\ 0\]\ \{\n\ \ \ \ \ \ \ abstime\ \{append\ str\ \\\;expires=\[entime\ \$val3\]\}\n\ \ \ \ \ \ \ ...\n\ \ \ \ \}\n\ \ \}\n\n\n<<categories>>\ Wibble} CALL {my revision {Wibble bugs}} CALL {::oo::Obj3754098 process revision/Wibble+bugs} CALL {::oo::Obj3754096 process}

-errorcode

NONE

-errorinfo

Unknow state transition: LINE -> END
    while executing
"error $msg"
    (class "::Wiki" method "render_wikit" line 6)
    invoked from within
"my render_$default_markup $N $C $mkup_rendering_engine"
    (class "::Wiki" method "render" line 8)
    invoked from within
"my render $name $C"
    (class "::Wiki" method "revision" line 31)
    invoked from within
"my revision $page"
    (class "::Wiki" method "process" line 56)
    invoked from within
"$server process [string trim $uri /]"

Error processing request

Parameters

Body

Error

-code

-level

-errorstack

-errorcode

-errorinfo

-errorline