[Web-SIG] Random thoughts

Tue Nov 4 21:01:15 EST 2003

On 04 November 2003, David Fraser said:
> However, we should deal with uploaded files differently here - they 
> could be huge! You don't want them read in automatically.

I'm pretty happy with the solution I came up with for Quixote 0.5.1: a
subclass of HTTPRequest, HTTPUploadRequest, specialized to handle
"multipart/form-data" requests (which are mainly used for uploads, hence
the name of the class).  From upload.py in the Quixote distribution:

class HTTPUploadRequest (HTTPRequest):
    """
    Represents a single HTTP request with Content-Type
    "multipart/form-data", which is used for HTTP uploads.  (It's
    actually possible for any HTML form to specify an encoding type of
    "multipart/form-data", even if there are no file uploads in that
    form.  In that case, you'll still get an HTTPUploadRequest object --
    but since this is a subclass of HTTPRequest, that shouldn't cause
    you any problems.)

    When processing the upload request, any uploaded files are stored
    under a temporary filename in the directory specified by the
    'upload_dir' instance attribute (which is normally set, by
    Publisher, from the UPLOAD_DIR configuration variable).
    HTTPUploadRequest then creates an Upload object which contains the
    various filenames for this upload.

    Other form variables are stored as usual in the 'form' dictionary,
    to be fetched later with get_form_var().  Uploaded files can also be
    accessed via get_form_var(), which returns the Upload object created
    at upload-time, rather than a string.

    Eg. if your upload form contains this:
      <input type="file" name="upload">

    then, when processing the form, you might do this:
      upload = request.get_form_var("upload")

    after which you could open the uploaded file immediately:
      file = open(upload.tmp_filename)

    or move it to a more permanent home before doing anything with it:
      permanent_name = os.path.join(permanent_upload_dir,
                                    upload.base_filename)
      os.rename(upload.tmp_filename, permanent_name)
    """

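To make the last example in that docstring concrete, here is a minimal
sketch of the "move it to a more permanent home" step.  It is not code
from Quixote: FakeUpload is a hypothetical stand-in that carries only the
two attributes the docstring mentions (tmp_filename and base_filename),
keep_upload() is a made-up helper name, and I have swapped in
shutil.move() for os.rename() on the assumption that the upload
directory and the permanent directory might sit on different filesystems,
where a plain rename can fail.

```python
import os
import shutil
import tempfile


class FakeUpload:
    """Hypothetical stand-in for the Upload object (NOT Quixote's class).

    It carries only the two attributes used in the docstring above.
    """

    def __init__(self, tmp_filename, base_filename):
        self.tmp_filename = tmp_filename
        self.base_filename = base_filename


def keep_upload(upload, permanent_upload_dir):
    """Move an uploaded file out of the temp area; return its new path."""
    permanent_name = os.path.join(permanent_upload_dir,
                                  upload.base_filename)
    # shutil.move() renames when it can and otherwise falls back to
    # copy-and-delete, so it also works across filesystems.
    shutil.move(upload.tmp_filename, permanent_name)
    return permanent_name


if __name__ == "__main__":
    # Simulate an upload that has already been written to a temp file.
    upload_dir = tempfile.mkdtemp()
    permanent_dir = tempfile.mkdtemp()
    tmp_path = os.path.join(upload_dir, "upload.tmp")
    with open(tmp_path, "wb") as f:
        f.write(b"file contents")
    print(keep_upload(FakeUpload(tmp_path, "report.txt"), permanent_dir))
```

In a real handler the Upload object would come from
request.get_form_var("upload") rather than being built by hand.
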
Even though this design was fairly strongly motivated by backwards
compatibility concerns, it turns out to be pretty neat and elegant.  The
request body isn't read until Quixote's Publisher calls
request.process_inputs(), which means that Quixote can still return
certain types of error response (mainly 404 "not found" or 403 "access
denied") before it reads that potentially huge upload.  And the uploaded
file is written to disk with a secure temporary name, so the application
can rename it, read it, or whatever without worrying about sudden leaps
in memory consumption.
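
The two ideas in that paragraph (decide on cheap errors before touching
the body, then copy the body to disk in bounded pieces) can be sketched
roughly as below.  This is an illustration under stated assumptions, not
Quixote's actual code: handle(), spool_body(), known_paths and CHUNK_SIZE
are all names invented for the example, and it copies the raw body to a
temp file rather than parsing "multipart/form-data".

```python
import io
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; bounds memory use per read


def spool_body(stream, length, upload_dir):
    """Copy up to `length` bytes from `stream` into a new temp file.

    Returns the temp file's path.  Only one chunk is held in memory
    at a time.
    """
    # mkstemp() creates the file with an unpredictable name and
    # owner-only permissions.
    fd, tmp_filename = tempfile.mkstemp(prefix="upload.", dir=upload_dir)
    with os.fdopen(fd, "wb") as out:
        remaining = length
        while remaining > 0:
            chunk = stream.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break  # the client sent less than it promised
            out.write(chunk)
            remaining -= len(chunk)
    return tmp_filename


def handle(path, stream, length, upload_dir, known_paths):
    """Return (status, tmp_filename); read the body only when needed."""
    if path not in known_paths:
        return 404, None  # rejected without reading a byte of the body
    return 200, spool_body(stream, length, upload_dir)


if __name__ == "__main__":
    body = io.BytesIO(b"x" * 200000)
    upload_dir = tempfile.mkdtemp()
    print(handle("/missing", body, 200000, upload_dir, {"/upload"}))
    print(handle("/upload", body, 200000, upload_dir, {"/upload"}))
```

The point of the ordering in handle() is that the 404 branch returns
while the stream is still untouched, which is what deferring the read
until process_inputs() buys you.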

        Greg
-- 
Greg Ward <gward at python.net>                         http://www.gerg.ca/
Hand me a pair of leather pants and a CASIO keyboard -- I'm living for today!