[Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4
Glenn Linderman
v+python at g.nevcal.com
Sun Nov 21 08:52:45 CET 2010
On 11/20/2010 10:19 AM, Glenn Linderman wrote:
> Oops. Yes, that fixes the problem with creation of the temp file,
> thanks for catching that. I now get a complete report of the
> original error in the temp file (below). I am a bit less confused
> now... but it seems that there are still a number of issues. Here is
> an enumeration of problems I was hard pressed to make before you
> removed my confusion on this issue.
Related issues, regarding binary stream requirements for cgi interface.
Perhaps the cgi module should have the API to set binary mode.
http://bugs.python.org/issue1610654
http://bugs.python.org/issue8077
http://bugs.python.org/issue4953
Sadly, cgi.py input handling seems to depend on the email module, which
was expected to be fixed for 3.2; it is not clear whether that has been
achieved, or whether the surrogate encode workaround is sufficient here.
More testing is needed, but I haven't developed such a test case yet.
> 1. cgitb should expect to report to a binary stdout, using whatever
> encoding (possibly ASCII) that seems appropriate for the output that
> it generates.
Maybe cgi.py should have an API to set the stdin and stdout to binary
streams. Although cgi.py deals more with stdin than stdout, cgitb
deals more with stdout.
Created http://bugs.python.org/issue10479
>
> 2. Some appropriate documentation or API or both should be provided to
> enable a script to set "binary" mode for stdout for CGI scripts. This
> link
> <http://www.eggheadcafe.com/software/aspnet/36023550/cgi-python-3-write-raw-binary.aspx>
> demonstrates the confusion (wish I had found it earlier) that is
> encountered by such lack. One must tell msvcrt the stream is binary
> (I had figured that out early on), one must also sidestep the use of
> the cp1252 default when printing binary, one must also choose a proper
> text encoding corresponding to the HTTP headers sent. My second email
> in this thread, sent a few hours after the first, shows a convenient
> set of cures for all but msvcrt (as long as only "write" is used for
> writing. "print" support could be added, similarly). Likely
> something along this line is needed for stdin as well, I haven't yet
> experimented with uploading binary content to a CGI.
>
> One could speculate about having the Python runtime auto-detect CGI
> mode, but I don't know of any foolproof technique for that, and the
> selection of the "proper" text encoding depends on the details of the
> CGI, so having instead an API or two that assists with doing this sort
> of thing would be better; the need for documentation, at least, seems
> imperative.
Created http://bugs.python.org/issue10480
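To make the three steps in item 2 concrete, here is a minimal sketch, not a proposed API. The helper names set_binary_mode and write_cgi_response are hypothetical; the only library calls used are msvcrt.setmode (Windows only) and the .buffer attribute of the text stream. It assumes the script declares the same charset in its headers that it uses to encode the body.

```python
import io
import os
import sys


def set_binary_mode(stream):
    """On Windows, ask the C runtime not to translate line endings."""
    if sys.platform == "win32":
        import msvcrt  # Windows-only module
        msvcrt.setmode(stream.fileno(), os.O_BINARY)


def write_cgi_response(raw_out, body, charset="utf-8"):
    """Write headers and body as bytes, bypassing the default text encoding."""
    payload = body.encode(charset)
    headers = (
        "Content-Type: text/html; charset={}\r\n"
        "Content-Length: {}\r\n"
        "\r\n"
    ).format(charset, len(payload))
    raw_out.write(headers.encode("ascii"))
    raw_out.write(payload)
    raw_out.flush()


# Demonstration against an in-memory binary stream.
demo = io.BytesIO()
write_cgi_response(demo, "<p>caf\u00e9</p>")
```

In a real CGI script one would presumably call set_binary_mode(sys.stdout) first and then pass sys.stdout.buffer as raw_out.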
>
> 3. subprocess documentation could be improved to point out that when
> using subprocess.PIPE to talk to a Python subprocess, the
> communications will be in binary. Again, I don't know of any way to
> autodetect the subprocess environment, but if it were possible to
> select an appropriate encoding and use it consistently on both sides
> of the PIPE, that would be a convenience to its use; if not possible,
> documenting the issue, and providing an API to use to easily select
> such encodings both in client and server, would be helpful.
>
> While the layers are all there, and ".buffer" is documented for
> TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a
> full set of operations isn't immediately obvious from the reference
> material; perhaps it is in a tutorial I haven't found, but... I was
> looking, and didn't find it.
>
> Of course, subprocess may launch non-Python programs; they will have
> their own ideas of binary vs text encoding, so it is important that it
> is convenient to match them on the Python side.
>
> It would be nice if subprocess had a mechanism for providing
> no-deadlock stdout data to the parent prior to the child terminating.
> A CGI implementation via subprocess shouldn't accumulate all of stdout
> (or all of stderr, for that matter, although less important). I don't
> (yet) know enough about Python threading to know if this is possible,
> but it certainly would be useful.
http://bugs.python.org/issue1048 for subprocess to document that
communicate produces byte stream output.
http://bugs.python.org/issue10482 for subprocess enhancements to handle
more cases without deadlock.
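To illustrate the point about subprocess.PIPE carrying bytes, here is a small sketch. CHILD_CODE and run_child are hypothetical names made up for the example, and the choice of UTF-8 on both sides is an assumption; the point is only that both ends have to agree on the encoding explicitly.

```python
import subprocess
import sys

# Hypothetical child program: writes UTF-8 bytes to its binary stdout.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))\n"
)


def run_child(encoding="utf-8"):
    """Run the child and return its output as (bytes, decoded str)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )
    raw, _ = proc.communicate()  # raw is bytes, not str
    return raw, raw.decode(encoding)


raw, text = run_child()
```

The parent decodes with the same encoding the child used to encode; nothing in subprocess negotiates that for you.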
Found http://bugs.python.org/issue4571 which documents how to switch
stdin/stdout/stderr to binary mode, and even back! I couldn't trace
that issue's change to the actual documentation, but I did find the
technique in section 26.1, under the documentation for the three stdio
streams:
    import sys

    def make_streams_binary():
        # detach() returns the underlying binary buffer of each text stream
        sys.stdin = sys.stdin.detach()
        sys.stdout = sys.stdout.detach()
> 4. http.server has a number of bugs and limitations.
> 4a. _url_collapse_path_split seems inefficient (although I have to
> benchmark it against what I think would be more efficient), and for
> its only use within http.server it produces the wrong information, so
> the information has to be recombined and resplit to make it function
> properly, adding to the perception of inefficiency.
> 4b. Detection of "executable" on Windows is simply wrong. Unix
> execution bits do not exist.
http://bugs.python.org/issue10483 for 4b.
> 4c. is_cgi doesn't properly handle PATHINFO parts of the path; this is
> the other half of 4a. The Python2.x CGIHTTPServer.py had this right,
> but the introduction and use of _url_collapse_path_split broke it.
http://bugs.python.org/issue10484 for 4a and 4c.
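For 4c, the behaviour I have in mind is roughly the following sketch. It is not the http.server code and only approximates the 2.x approach; split_cgi_path is a hypothetical name, and the directory list is an assumption for the example. The idea is to match the CGI directory as a whole path component and keep everything after it, including any trailing path information, intact.

```python
def split_cgi_path(path, cgi_directories=("/cgi-bin", "/htbin")):
    """Return (cgi_dir, rest) if path lies under a CGI directory, else None."""
    for prefix in cgi_directories:
        n = len(prefix)
        # The prefix must end at a component boundary, so that
        # "/cgi-binary" is not mistaken for "/cgi-bin".
        if path[:n] == prefix and (len(path) == n or path[n] == "/"):
            return prefix, path[n + 1:]
    return None


with_info = split_cgi_path("/cgi-bin/script.py/extra/info")
not_cgi = split_cgi_path("/cgi-binary/script.py")
```

The rest part still contains the script name followed by the extra path, so the caller can then look for the script and treat what follows as path information.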
> 4d. Searching for a ? to find an explicit query string should use
> .find('?') rather than .rfind('?') as there is no prohibition on using
> '?' within a query string, AFAIK.
http://bugs.python.org/issue10485 for 4d.
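A quick illustration of the difference in 4d; split_first and split_last are hypothetical helper names, and the URL is made up.

```python
def split_first(path):
    """Split at the first '?', as suggested in 4d."""
    i = path.find("?")
    return (path, "") if i < 0 else (path[:i], path[i + 1:])


def split_last(path):
    """Split at the last '?', which is what rfind does."""
    i = path.rfind("?")
    return (path, "") if i < 0 else (path[:i], path[i + 1:])


url = "/cgi-bin/search.py?q=what?&lang=en"
first = split_first(url)  # script path and the whole query string
last = split_last(url)    # part of the query ends up in the path
```

With rfind, the text "?q=what" is left attached to the script path, so the CGI would not be found.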
> 4e. doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment
> variables for the CGI.
http://bugs.python.org/issue10486 for 4e.
> 4f. Should not send the 200 response until it sees if the CGI sends a
> Status: header.
http://bugs.python.org/issue10487 for 4f and 4g.
> 4g. Should not buffer all of stdout: subprocess.communicate is
> inappropriate for a web server CGI interface. The data should stream
> through to avoid consuming inordinate amounts of memory. The only
> solution within the current limitations of subprocess is to abandon
> stderr, force the CGI to do its own error logging, and use
> shutil.copyfileobj to hook up p.stdout to self.wfile once the Status:
> message processing has happened.
> 4h. Doesn't seem to close p.stdin (I'm not sure if that is necessary,
> it may happen when p is garbage collected, but effort was made to
> close p.stdout and p.stderr, which seem similar.)
Discovered that subprocess.communicate closes p.stdin, so it wasn't
needed until I quit using .communicate in my version of the code.
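For 4f and 4g, the approach I described can be sketched as follows. This is a simplified illustration rather than my actual patch: relay_cgi_output and CHILD_CODE are hypothetical names, the HTTP/1.0 status line and the default of 200 OK are assumptions, stderr is simply left unpiped, and stdin is not handled at all.

```python
import io
import shutil
import subprocess
import sys

# Hypothetical CGI child: emits a Status: header, one more header, a body.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.buffer.write("
    "b'Status: 404 Not Found\\nContent-Type: text/plain\\n\\nmissing')\n"
)


def relay_cgi_output(args, sink):
    """Read the CGI headers, emit the status line, then stream the body."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    try:
        status = b"200 OK"  # used only if the CGI sends no Status: header
        headers = []
        for line in iter(proc.stdout.readline, b""):
            if line in (b"\r\n", b"\n"):
                break  # blank line ends the header section
            if line.lower().startswith(b"status:"):
                status = line.split(b":", 1)[1].strip()
            else:
                headers.append(line.rstrip(b"\r\n"))
        sink.write(b"HTTP/1.0 " + status + b"\r\n")
        for header in headers:
            sink.write(header + b"\r\n")
        sink.write(b"\r\n")
        # Copy the body in chunks rather than accumulating it in memory.
        shutil.copyfileobj(proc.stdout, sink)
    finally:
        proc.stdout.close()
    return proc.wait()


sink = io.BytesIO()
returncode = relay_cgi_output([sys.executable, "-c", CHILD_CODE], sink)
```

In the server, sink would be self.wfile. Because only one pipe is being read, there is no second pipe to fill up and block the child, which is why stderr has to be abandoned in this scheme.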