upload file via web form
Trent Mick
trentm at ActiveState.com
Fri Mar 7 18:35:55 EST 2003
[John Hunter wrote]
>
> I want to automatically upload some data to a password protected web
> server where some form variables must be filled out and a file must be
> uploaded.
>
> I know how to do it directly with httplib. Is there a better way?
I don't think so. Normally urllib.urlopen() (or urllib2.urlopen()) would
be what you should use here; however, they do not handle HTTP POST
requests properly when a file upload is involved. Uploading form data
when a file is involved means that the HTTP POST body must be encoded as
multipart/form-data. urlopen() is hardcoded to use a Content-Type
header of application/x-www-form-urlencoded (which is what you want if
_no_ file is involved).
Here is a method that I have been working on to do this. I suppose a bug
should be logged on urlopen for this as well.
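
To make the difference between the two encodings concrete, here is a minimal sketch. It assumes a current Python 3 interpreter (where urllib.urlencode lives at urllib.parse.urlencode), and encode_multipart is a hypothetical helper written for illustration only; it is not part of the attached module or of the standard library.

```python
import uuid
from urllib.parse import urlencode


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body_bytes) for a multipart/form-data POST.

    fields: dict mapping a form field name to a str value.
    files:  dict mapping a form field name to a
            (filename, content_bytes, mimetype) tuple.
    Hypothetical helper: names and signature are assumptions.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Plain form fields: one part each, with only a Content-Disposition header.
    for name, value in fields.items():
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # File fields: the part also carries a filename and a Content-Type.
    for name, (filename, content, mimetype) in files.items():
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % mimetype).encode("ascii"),
            b"",
            content,
        ]
    # The closing delimiter has two extra dashes.
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=%s" % boundary
    return content_type, body


# No file involved: the simple urlencoded form is enough.
simple_body = urlencode({"name": "Gisle Aas", "born": "1964"})

# A file is involved: every field becomes its own MIME part.
content_type, body = encode_multipart(
    {"name": "Gisle Aas"},
    {"init": (".profile", b"export PATH\n", "text/plain")},
    boundary="6G+f",
)
```

The returned content_type must be sent as the request's Content-Type header so that the server knows which boundary separates the parts.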
Trent
--
Trent Mick
TrentM at ActiveState.com
-------------- next part --------------
"""Some httplib helper methods."""

def httprequest(url, postdata={}, headers={}):
    """A urllib.urlopen() replacement for http://... that gets the
    content-type right for multipart POST requests.

    "url" is the http URL to open.
    "postdata" is a dictionary describing data to post. If the dict is
        empty (the default) a GET request is made, otherwise a POST
        request is made. Each postdata item maps a string name to
        either:
        - a string value; or
        - a file part specification of the form:
            {"filename": <filename>,    # file to load content from
             "content": <content>,      # (optional) file content
             "headers": <headers>}      # (optional) headers
          <filename> is used to load the content (can be overridden by
          <content>) and as the filename to report in the request.
          <headers> is a dictionary of headers to use for the part.
          Note: currently the file part content must be US-ASCII text.
    "headers" is an optional dictionary of headers to send with the
        request. Note that the "Content-Type" and "Content-Length"
        headers are automatically determined.

    The current urllib.urlopen() *always* uses:
        Content-Type: application/x-www-form-urlencoded
    for POST requests. This is incorrect if the postdata includes a file
    to upload. If a file is to be posted the post data is:
        Content-Type: multipart/form-data

    This returns the response content if the request was successful
    (HTTP code 200). Otherwise an IOError is raised.

    For example, this invocation:

        url = 'http://www.perl.org/survey.cgi'
        postdata = {
            "name": "Gisle Aas",
            "email": "gisle at aas.no",
            "gender": "M",
            "born": "1964",
            "init": {"filename": "~/.profile"},
        }

    Would generate a request similar to this (your boundary and
    ~/.profile content will likely be different):

        POST http://www.perl.org/survey.cgi
        Content-Length: 388
        Content-Type: multipart/form-data; boundary="6G+f"

        --6G+f
        Content-Disposition: form-data; name="name"

        Gisle Aas
        --6G+f
        Content-Disposition: form-data; name="email"

        gisle at aas.no
        --6G+f
        Content-Disposition: form-data; name="gender"

        M
        --6G+f
        Content-Disposition: form-data; name="born"

        1964
        --6G+f
        Content-Disposition: form-data; name="init"; filename=".profile"
        Content-Type: text/plain

        PATH=/local/perl/bin:$PATH
        export PATH

        --6G+f--

    Limitations:
    - I don't think binary files are handled properly. And I don't
      think Unicode files will be handled properly. We will have to
      get smart on allowing the mimetype and (if text) charset to be
      specified. By default we try to guess: text/plain or
      application/octet-stream. If text/* then try to guess the
      charset. See Lib/email/Charset.py for inspiration here. There
      are also a couple of Python Cookbook recipes for encoding
      guessing.
    - This doesn't handle some HTTP status codes the way
      urllib.urlopen() does for codes 301, 302 and 401.
    - I don't know if the return semantics are good. For instance
      the response headers are not accessible.

    Inspiration: Perl's HTTP::Request module.
    http://aspn.activestate.com/ASPN/Reference/Products/ActivePerl/site/lib/HTTP/Request/Common.html
    """
    import httplib, urllib, urlparse
    from email.MIMEText import MIMEText
    from email.MIMEMultipart import MIMEMultipart

    if not url.startswith("http://"):
        raise ValueError("Invalid URL, only http:// URLs are allowed: "
                         "url='%s'" % url)

    # Work on a copy so that neither the caller's dict nor the shared
    # default argument is modified below.
    headers = headers.copy()

    if not postdata:
        method = "GET"
        body = None
    else:
        method = "POST"

        # Determine whether a multipart content-type is required:
        # 'contentType'.
        for part in postdata.values():
            if isinstance(part, dict):
                contentType = "multipart/form-data"
                break
        else:
            contentType = "application/x-www-form-urlencoded"
        headers["Content-Type"] = contentType

        # Encode the post data: 'body'.
        if contentType == "application/x-www-form-urlencoded":
            body = urllib.urlencode(postdata)
        elif contentType == "multipart/form-data":
            message = MIMEMultipart(_subtype="form-data")
            for name, value in postdata.items():
                if isinstance(value, dict):
                    # Get content.
                    if "content" in value:
                        content = value["content"]
                    else:
                        fp = open(value["filename"], "rb")
                        content = fp.read()
                        fp.close()

                    # Create text part. Do not use ctor to set payload
                    # to avoid adding a trailing newline.
                    part = MIMEText(None)
                    part.set_payload(content, "us-ascii")

                    # Add content-disposition header.
                    dispHeaders = value.get("headers", {})
                    if "Content-Disposition" not in dispHeaders:
                        #XXX Should be a case-INsensitive check.
                        part.add_header("Content-Disposition", "form-data",
                                        name=name, filename=value["filename"])
                    # Iterate over (name, value) pairs, not just the keys.
                    for dhName, dhValue in dispHeaders.items():
                        part.add_header(dhName, dhValue)
                else:
                    # Do not use ctor to set payload to avoid adding a
                    # trailing newline.
                    part = MIMEText(None)
                    part.set_payload(value, "us-ascii")
                    part.add_header("Content-Disposition", "form-data",
                                    name=name)
                message.attach(part)
            message.epilogue = "" # Make sure body ends with a newline.
            # Split off the headers block from the .as_string() to get
            # just the message content. Also add the multipart Message's
            # headers (mainly to get the Content-Type header _with_ the
            # boundary attribute).
            headerBlock, body = message.as_string().split("\n\n",1)
            for hName, hValue in message.items():
                headers[hName] = hValue
            #print "XXX ~~~~~~~~~~~~ multi-part body ~~~~~~~~~~~~~~~~~~~"
            #import sys
            #sys.stdout.write(body)
            #print "XXX ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
        else:
            raise ValueError("Invalid content-type: '%s'" % contentType)

    # Make the HTTP request and get the response.
    # Precondition: 'url', 'method', 'headers', 'body' are all setup properly.
    scheme, netloc, path, parameters, query, fragment = urlparse.urlparse(url)
    if parameters or query or fragment:
        raise ValueError("Unexpected URL form: parameters, query or fragment "
                         "parts are not allowed: parameters=%r, query=%r, "
                         "fragment=%r" % (parameters, query, fragment))
    conn = httplib.HTTPConnection(netloc)
    try:
        conn.request(method, path, body, headers)
        response = conn.getresponse()
        # Process the response. Here is a summary of HTTP responses:
        #   http://www.btinternet.com/~wildfire/reference/httpstatus/index.htm
        if response.status == 200:
            return response.read()
        else:
            #print "XXX http error:"
            #print "    status:", response.status
            #print "    reason:", response.reason
            #print "    msg:", response.msg
            raise IOError, ('http error', response)
    finally:
        conn.close()
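
The first limitation in the docstring above mentions guessing between text/plain and application/octet-stream for a file part. One possible starting point, sketched here for a current Python 3 interpreter, is the standard mimetypes module, which guesses from the file name only. The helper name guess_part_type and its default argument are assumptions made for illustration; they do not appear in the module above.

```python
import mimetypes


def guess_part_type(filename, default="application/octet-stream"):
    """Guess a Content-Type for an uploaded file from its name alone.

    Hypothetical helper: falls back to 'default' when the extension is
    unknown or missing. The file content is never inspected.
    """
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default


text_type = guess_part_type("notes.txt")
dotfile_type = guess_part_type(".profile")
```

Because the guess is based on the extension, a dotfile such as .profile has nothing to match and gets the fallback type, so a caller who knows the file is text would still want a way to override the result, much like the "headers" entry of a file part specification allows.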