[Python-Dev] Bug in SimpleHTTPRequestHandler.send_head?

Kim Gräsman kim.grasman at gmail.com
Fri Sep 5 13:56:02 CEST 2008


Hi all,

I'm new to this group, and to the Python language itself. I came across
it when I joined a project building a rich network library for C++,
whose unit test suite uses Python and its CGI HTTP server
implementation.

We're having trouble serving a text file that contains Windows line
endings (CRLF): the body we receive contains only Unix line endings
(LF). This breaks our tests, because we can't verify that the body, as
parsed by our HTTP client, is identical to the source file served by
the Python HTTP server.

I've isolated it to the SimpleHTTPRequestHandler.send_head method in
SimpleHTTPServer.py:

--
        ctype = self.guess_type(path)
        if ctype.startswith('text/'):
            mode = 'r'
        else:
            mode = 'rb'
        try:
            f = open(path, mode)
        except IOError:
            self.send_error(404, "File not found")
            return None
--
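The effect of the mode choice in this excerpt can be seen in isolation with a minimal sketch. It assumes Python 3, where text mode uses universal newlines and translates CRLF to LF on every platform. In Python 2, mode 'r' only does this on platforms such as Windows, where the C runtime distinguishes text files from binary files. The file contents and variable names are illustrative.

```python
import os
import tempfile

# Write a small file with Windows (CRLF) line endings.
data = b"line one\r\nline two\r\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(data)

# Text mode (Python 3 default newline=None) translates CRLF to LF on read.
with open(path, "r") as f:
    text_body = f.read()

# Binary mode returns the bytes exactly as stored on disk.
with open(path, "rb") as f:
    binary_body = f.read()

os.remove(path)

print(repr(text_body))    # 'line one\nline two\n'
print(repr(binary_body))  # b'line one\r\nline two\r\n'
```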

The f object is returned from this method, and used with
shutil.copyfileobj to copy the contents to the output stream.

This is easily fixed by dropping the content-type check entirely and
always opening the file in mode 'rb'. I think that makes sense: the
server should not be concerned with the contents of the body, so
treating it as a binary stream seems right.

This also fixes a related issue: the actual body size differs from the
Content-Length header, because the CR characters are stripped when the
body is served, while Content-Length is taken from the source file's
size on disk (via os.fstat).
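The mismatch can be reproduced with a minimal Python 3 sketch. It computes the declared length with os.fstat, as send_head does, and compares it with the number of bytes a text-mode read would actually yield. The variable names (declared_length, sent_body) are illustrative. The explicit encode step stands in for writing the text to the socket and is an assumption of the sketch.

```python
import os
import tempfile

data = b"a\r\nb\r\nc\r\n"  # 9 bytes on disk
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(data)

# Text mode: Content-Length comes from the on-disk size, but the read drops the CRs.
with open(path, "r", newline=None) as f:
    declared_length = os.fstat(f.fileno()).st_size
    sent_body = f.read().encode("ascii")

# Binary mode: the declared length and the bytes read agree.
with open(path, "rb") as f:
    declared_length_rb = os.fstat(f.fileno()).st_size
    sent_body_rb = f.read()

os.remove(path)

print(declared_length, len(sent_body))        # 9 6
print(declared_length_rb, len(sent_body_rb))  # 9 9
```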

I'm not sure which source control system you're using, so I won't try
to provide a patch, but I believe the code should read:

--
        if os.path.isdir(path):
            if not self.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(301)
                self.send_header("Location", self.path + "/")
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        #patch: removed content-type check
        try:
            f = open(path, 'rb')  #patch: always open in binary mode
        except IOError:
            self.send_error(404, "File not found")
            return None
        self.send_response(200)
        self.send_header("Content-type", self.guess_type(path))  #patch: content-type check here instead
        fs = os.fstat(f.fileno())
--

My changes are marked with "#patch[...]" comments.
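A check along the lines of the test suite described above can be sketched end to end. This version is written against Python 3's http.server, whose SimpleHTTPRequestHandler opens files in mode 'rb', rather than Python 2's SimpleHTTPServer. The QuietHandler class, the crlf.txt file name and the use of port 0 are illustrative assumptions. The `directory` argument requires Python 3.7 or later.

```python
import functools
import http.server
import os
import shutil
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """SimpleHTTPRequestHandler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


# A CRLF text file, served from a throwaway directory.
data = b"first\r\nsecond\r\n"
root = tempfile.mkdtemp()
with open(os.path.join(root, "crlf.txt"), "wb") as out:
    out.write(data)

# Port 0 asks the OS for a free port; `directory` needs Python 3.7+.
handler = functools.partial(QuietHandler, directory=root)
server = http.server.HTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Bypass any proxy settings from the environment when talking to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    url = "http://127.0.0.1:%d/crlf.txt" % server.server_address[1]
    with opener.open(url) as resp:
        body = resp.read()
        content_length = int(resp.headers["Content-Length"])
        content_type = resp.headers["Content-Type"]
finally:
    server.shutdown()
    server.server_close()
    shutil.rmtree(root)

# The body should match the file byte for byte and agree with Content-Length.
print(body == data, content_length == len(body))  # True True
```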

Grateful for any comments!

Best wishes,
- Kim

