File Upload with tcl's http

CMcC 20050227: The process of uploading a file to a server entails encoding the file (and other form elements) as multipart/form-data. The code below handles that.

The second part of the upload will use http to send a -query with -type multipart/form-data, causing the client to POST the mime multipart/form-data to the server. Voila.

Note: Has anyone been able to upload a file with this anywhere? Not only that it does not read the file in binary mode (which causes data corruption).. also the returned data does contain headerlines that do not get put in the actual http header, if you simply pass the returned block to the -query parameter of http::geturl .. which causes a wrong content-type and then everything goes down the drain.. also, but thats the smallest issue, most places like it when you pass a filename aswell - but that could just be added to the disposition of add_binary as 'filename="somename.jpg"'.. MHo: Yes, a while ago I implemented file uploads with tclhttpd in our environment. But that's not used anymore. It wasn't much complicated, if I remember right... I only programmed a little dialog and used the Upload_Url-mechanism... it just worked, with one drawback: control over what is uploaded wasn't possible before the upload, only after.

Note II: I did some changes to bring the script a little closer to a state, where its actually working, but its still not fully doing what it should because of the "header-problem"..


BR 2005-06-02 - I also tried to make it work. I wanted to test a Java upload servlet which was based on Jakarta's commons-fileupload.

Problems:

  • The mime package doesn't have a getheaders method and getbody doesn't work for multipart. I think buildmessage should be split into those two functions. That is the "headers-problem" mentioned above.
  • The mime package appends a spurious additional \r\n to each item in the multipart data. That is because mime appends a \r\n to each item and than also adds a \r\n before the boundary.
  • The http package can't source POST data from memory, it wants a channel (or alternatively, the mime package can't produce a channel for use with http).
  • Commons-fileupload doesn't understand the quoted boundary spec correctly which the mime package produces in the Content-Type header. This is a bug in commons-fileupload, I guess, but maybe the mime package should not make things complicated here. Update 2006-06-06 (just for the record): Commons-fileupload has this fixed in the CVS.

All those problems can be worked around but the amount of code is so much that it seems easier to just roll my own for now.


 # Provide multipart/form-data for http
 
 package provide form-data 1.0
 package require mime
 
 namespace eval form-data {}
 
 proc form-data::compose {partv {type multipart/form-data}} {
     upvar 1 $partv parts
 
     set mime [mime::initialize -canonical $type -parts $parts]
     set packaged [mime::buildmessage $mime]
     foreach part $parts {
         mime::finalize $part
     }
     mime::finalize $mime
 
     return $packaged
 }
 
 proc form-data::add_binary {partv name filename value type} {
     upvar 1 $partv parts
     set disposition "form-data; name=\"${name}\"; filename=\"$filename\""
     lappend parts [mime::initialize -canonical $type \
                    -string $value \
                    -encoding binary \
                    -header [list Content-Disposition $disposition]]
 }
 
 proc form-data::add_field {partv name value} {
     upvar 1 $partv parts
     set disposition "form-data; name=\"${name}\""
     lappend parts [mime::initialize -canonical text/plain -string $value \
                        -header [list Content-Disposition $disposition]]
 }
 
 proc form-data::format {name filename value type args} {
     set parts {}
     foreach {n v} $args {
         add_field parts $n $v
     }
     add_binary parts $name $filename $value $type
     return [compose parts]
 }
 
 if {[info script] eq $argv0} {
 
     # format a gif file upload according to the following form:
     #<FORM METHOD="POST" ENCTYPE="multipart/form-data" ACTION="upload.php"> 
     #<INPUT TYPE="HIDDEN" NAME="MAX_FILE_SIZE" VALUE=" "> 
     #<INPUT TYPE="HIDDEN" NAME="action" VALUE="1"> 
     #<INPUT TYPE="FILE" NAME="file1">
     #<INPUT TYPE="SUBMIT" VALUE="Host It"> <br> 
     #<INPUT TYPE="text" NAME="img_resize"  SIZE="4" MAXLENGTH="4">
     #</FORM>
 
     # get contents of the gif
     set fd [open ./logo125.gif]
     fconfigure $fd -translation binary
     set image [read $fd]
     close $fd
 
     # set up other fields
     array set fields {
         MAX_FILE_SIZE " "
         action 1
         img_resize "100%"
     }
 
     # format the image and form
     puts [form-data::format file1 "logo125.gif" $image image/gif {*}[array get fields]]
 }

BR 2005-06-02 - This proc works around the "header-problem" and it uses a temporary file for the body data to connect to the http package.

 package require http
 
 proc form-data::post {url field type file {params {}} {headers {}}} {
     # get contents of the file
     set fd [open $file r]
     fconfigure $fd -translation binary
     set content [read $fd]
     close $fd
 
     # format the file and form
     set message [eval [list form-data::format \
                                $field [file tail $file] $content $type] \
                          $params]
 
     # parse the headers out of the message body
     set message [split [string map {"\r\n\r\n" "\1"} $message] "\1"]
     set headers_raw [lindex $message 0]
     set body [join [lrange $message 1 end] "\r\n\r\n"]
 
     set headers_raw [string map {"\r\n " " " "\r\n" "\n"} $headers_raw]
     regsub {  +} $headers_raw " " headers_raw
     #set headers {} -- initial value comes from parameter
     foreach line [split $headers_raw "\n"] {
         regexp {^([^:]+): (.*)$} $line all label value
         lappend headers $label $value
     }
 
     # get the content-type
     array set ha $headers
     set content_type $ha(Content-Type)
     unset ha(Content-Type)
     set headers [array get ha]
 
     # create a temporary file for the body data (getting the temp directory
     # is more involved if you want to support Windows right)
     set datafile "/tmp/post[pid]"
     set data [open $datafile w+]
     fconfigure $data -translation binary
     puts -nonewline $data $body
     seek $data 0
 
     # POST it
     set token [http::geturl $url -type $content_type -binary true \
                        -headers $headers -querychannel $data]
     http::wait $token
 
     # cleanup the temporary
     close $data
     file delete $datafile
 
     return $token
 }

Erl 2005-08-09 (August 9)

I have submitted a patch (#1254934 in SourceForge) to mime.tcl to fix the extra line feed added to attachments. Just line feeds removed in two places. I created a SourceForge tcllib bug #1254937 for it as well.

Further, the form-data-post function above has a problem with binary files, because it replaces all 0x01 bytes with a \r\n sequence. Here is a modified version, which does not require an external file either.


 proc  form-data::post {url field type file {params {}} {headers {}}} {
     # get contents of the file
     set fd [open $file r]
     fconfigure $fd -translation binary -encoding binary
     set content [read $fd]
     close $fd

     # format the file and form
     set message [eval [list form-data::format \
                             $field [file tail $file] $content $type] \
                       $params]

     # parse the headers out of the message body because http get url wants
     # them as a separate parameter
     set headerEnd [string first "\r\n\r\n" $message]
     incr headerEnd 1
     set bodystart [expr $headerEnd + 3]
     set headers_raw [string range $message 0 $headerEnd]
     set body [string range $message $bodystart end]
     set headers_raw [string map {"\r\n " " " "\r\n" "\n"} $headers_raw]
     regsub {  +} $headers_raw " " headers_raw
                                                                               
     foreach line [split $headers_raw "\n"] {
         regexp {^([^:]+): (.*)$} $line all label value
         lappend headers $label $value
     }

     # get the content-type
     array set ha $headers
     set content_type $ha(Content-Type)
     unset ha(Content-Type)
     set headers [array get ha]

     # POST it
     set token [http::geturl $url -type $content_type -binary true \
                             -headers $headers -query $body]
     http::wait $token

     return $token
 }

vinniyo - 2013-10-26 02:27:13

Has anyone seen any method of upload large files as a xml part(500MB to 2GB)? I have been unsucessful. ---Correction. The Header-Problem from BR 2005-06-02 has a temp file that it writes to and uses "seek $data 0" for posting. Great code. Here is my youtube Data API video uploader derived from everyone elses code:

package require mime
package require xmlgen
namespace import ::xmlgen::*

proc format_upload {file_location title description category keywords} {
    
    set authx [get_refresh]
    set del_key <>
    
    declaretag entry
    declaretag media:group
    declaretag media:title
    declaretag media:description
    declaretag media:category
    declaretag media:keywords
    
    xmlgen::buffer xml_meta {entry xmlns=http://www.w3.org/2005/Atom xmlns:media=http://search.yahoo.com/mrss/ xmlns:yt=http://gdata.youtube.com/schemas/2007 ! {
        media:group ! {
            media:title type=plain - $title
            media:description type=plain - $description
            media:category scheme=http://gdata.youtube.com/schemas/2007/categories.cat - $category
            media:keywords - $keywords
        }  
    }}
    
    set parts {}
    lappend parts [mime::initialize -canonical {application/atom+xml} -string $xml_meta -encoding binary]
    lappend parts [mime::initialize -canonical {video/avi} -string "video_file" -encoding binary]

    set mime [mime::initialize -canonical {multipart/related} -parts $parts]

    set packaged [mime::buildmessage $mime]
    ::mime::finalize [lindex $parts 0]
    ::mime::finalize [lindex $parts 1]
    ::mime::finalize $mime
    
    puts "getting header"
    update
    
    set headerEnd [string first "\r\n\r\n" $packaged]
    incr headerEnd 1
    set bodystart [expr $headerEnd + 3]
    set headers_raw [string range $packaged 0 $headerEnd]
    
    set bodyend [string first "video_file" $packaged]
    set body [string range $packaged $bodystart $bodyend-1]
    set ender [string range $packaged $bodyend+10 end]
    
    set headers_raw [string map {"\r\n " " " "\r\n" "\n"} $headers_raw]
    regsub {  +} $headers_raw " " headers_raw
    foreach line [split $headers_raw "\n"] {
        regexp {^([^:]+): (.*)$} $line all label value
        lappend headers $label $value
    }
    
    array set ha $headers
    set content_type $ha(Content-Type)
    
    set datafile [file join tmp post[pid]]
    set data [open $datafile w+]
    fconfigure $data -translation binary
    puts -nonewline $data $body
    
    set input [open $file_location r]
    fconfigure $input -translation binary
    while {[gets $input line] != -1} {puts $data $line}
    close $input
    
    puts -nonewline $data $ender
    seek $data 0
    
    puts "uploading now"
    update

    if {[catch {set token [http::geturl "http://uploads.gdata.youtube.com/feeds/api/users/default/uploads" -binary true -type $content_type -headers "Slug afv.mp4 Connection close GData-Version 2 X-GData-Key key=$del_key Authorization {Bearer $authx}" -querychannel $data]} error]} {puts "error from geturl in upload: $error"; return 0}     
    http::wait $token
    puts "done with upload"
    update
    set post_return [http::data $token]
    http::cleanup $token
    close $data
    file delete -- $datafile
    return $post_return   
}

proc get_refresh {} {
    set refresh_token <>
    set client_secret <>
    set client_id <>
    
    set token [http::geturl https://accounts.google.com/o/oauth2/token -headers "Content-Type application/x-www-form-urlencoded" -query client_id=$client_id&client_secret=$client_secret&refresh_token=$refresh_token&grant_type=refresh_token]
    set info [http::data $token]
    http::cleanup $token
    if {[regexp {\"access_token\"...\"([^\"]*)\"} $info dump access]} {
        return $access
    } else {
        puts "refreshing didnt work :( Im poor $info"
        return 0
    }
}

proc upload {t3_id title username} {
    
    regsub -all {[\"\;\'\-\]\[$^?+*()|\\%&#]} $title "" title
    set desc $title
    set desc_db $title
    append desc \n\n[annotate]
    if {[string length $title] > 60} {set title "[string range $title 0 56]..."}
    
    set keywords [get_longest $desc_db 2]
    set keywords [string map {" " ", "} $keywords]
    set data [format_upload [file join O: AFV $t3_id.f4v] $title $desc Comedy $keywords]
    
    switch -regexp -- $data {
       <yt:videoid> {regexp {<yt:videoid>([^<]*)</yt:videoid>} $data dump videoID; write_db $username $t3_id $videoID $desc_db; wait 2}
       too_many_recent_calls {puts "waiting 1 min Too many calls"; wait 60} 
       Forbidden {puts "Forbidden..Stopping"; vwait forever}
       default {puts "UPLOAD ERROR IS: $data"; return 0} 
    }
}