[Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4

Sat Nov 20 19:19:11 CET 2010

On 11/20/2010 3:38 AM, Éric Araujo wrote:
> Hello
>
>> cgitb.enable(0,"d:\temp")
> Isn’t that expanded to “d:<tab>emp”?
>

Oops.  Yes, that fixes the problem with creation of the temp file;
thanks for catching that.  I now get a complete report of the original
error in the temp file (below).  I am a bit less confused now... but it
seems that there are still a number of issues.  Here is an enumeration
of the problems, which I was hard pressed to put together before you
removed my confusion on this issue.
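
For the record, the escape problem Éric spotted can be checked in a few
lines.  This is only a minimal sketch; the variable names are mine and
are not from the original script.

```python
# "\t" inside an ordinary string literal is a tab character, so the
# directory name passed to cgitb.enable() was not the one intended.
plain = "d:\temp"      # d, :, TAB, e, m, p
escaped = "d:\\temp"   # doubled backslash gives a literal backslash
raw = r"d:\temp"       # a raw string leaves the backslash alone

has_tab = "\t" in plain
same_path = escaped == raw
```

Either the doubled backslash or the raw string gives the intended path.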

1. cgitb should expect to report to a binary stdout, using whatever 
encoding (possibly ASCII) seems appropriate for the output that it 
generates.

2. Some appropriate documentation or API or both should be provided to 
enable a script to set "binary" mode for stdout for CGI scripts. This 
link 
<http://www.eggheadcafe.com/software/aspnet/36023550/cgi-python-3-write-raw-binary.aspx> 
demonstrates the confusion (wish I had found it earlier) that is 
encountered by such lack.  One must tell msvcrt the stream is binary (I 
had figured that out early on), one must also sidestep the use of the 
cp1252 default when printing binary, one must also choose a proper text 
encoding corresponding to the HTTP headers sent.  My second email in 
this thread, sent a few hours after the first, shows a convenient set of 
cures for all but msvcrt, as long as only "write" is used for writing 
("print" support could be added similarly).  Likely something along this 
line is needed for stdin as well; I haven't yet experimented with 
uploading binary content to a CGI.
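
A rough sketch of the three steps just listed (binary mode via msvcrt,
bypassing the default text encoding, and encoding the body to match the
headers) might look like the following.  The helper names make_binary
and send_response are hypothetical, not an existing API, and utf-8 is
only an assumed default charset.

```python
import sys


def make_binary(stream):
    """Return the byte stream under a text stream (hypothetical helper).

    On Windows the C runtime must also be told the descriptor is binary,
    otherwise it translates "\n" to "\r\n" in the output.
    """
    if sys.platform == "win32":
        import msvcrt
        import os
        msvcrt.setmode(stream.fileno(), os.O_BINARY)
    return stream.buffer


def send_response(out, body, content_type="text/html", charset="utf-8"):
    """Write a CGI header and body as bytes to the binary stream `out`."""
    header = "Content-Type: {}; charset={}\r\n\r\n".format(content_type, charset)
    out.write(header.encode("ascii"))   # headers are plain ASCII
    out.write(body.encode(charset))     # body uses the declared charset
    out.flush()
```

A CGI script would then call something like
send_response(make_binary(sys.stdout), "<p>hello</p>") and never go
through the cp1252 text layer at all.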

One could speculate about having the Python runtime auto-detect CGI 
mode, but I don't know of any foolproof technique for that, and the 
selection of the "proper" text encoding depends on the details of the 
CGI, so having instead an API or two that assists with doing this sort 
of thing would be better; the need for documentation, at least, seems 
imperative.

3. subprocess documentation could be improved to point out that when 
subprocess.PIPE is used to talk to a Python subprocess, the 
communications will be in binary.  Again, I don't know of any way to 
autodetect the subprocess environment, but if it were possible to select 
an appropriate encoding and use it consistently on both sides of the 
PIPE, that would be a convenience to its use; if not possible, 
documenting the issue, and providing an API to use to easily select such 
encodings both in client and server, would be helpful.
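
To illustrate the point about using one encoding on both sides of the
PIPE, here is a minimal sketch.  It assumes the child is also Python, so
that the PYTHONIOENCODING environment variable can fix the child's side;
run_child and the ENCODING constant are names I made up for the example.

```python
import os
import subprocess
import sys

ENCODING = "utf-8"  # assumed; the only requirement is that both sides agree


def run_child(code, text_in):
    """Run Python `code` in a child, sending and receiving text as bytes."""
    # Tell the child which encoding to use for its stdin and stdout.
    env = dict(os.environ, PYTHONIOENCODING=ENCODING)
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
    )
    # The pipes carry bytes, so the parent encodes and decodes explicitly.
    out_bytes, _ = proc.communicate(text_in.encode(ENCODING))
    return out_bytes.decode(ENCODING)


CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"
result = run_child(CHILD, "h\u00e9llo")
```

For a non-Python child the environment variable does nothing, and the
parent simply has to know what the child emits.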

While the layers are all there, and ".buffer" is documented for 
TextIOWrapper, the use of sys.stdout.buffer and the fact that it has a 
full set of operations isn't immediately obvious from the reference 
material; perhaps it is in a tutorial I haven't found, but... I was 
looking, and didn't find it.

Of course, subprocess may launch non-Python programs; they will have 
their own ideas of binary vs text encoding, so it is important that it 
is convenient to match them on the Python side.

It would be nice if subprocess had a mechanism for providing no-deadlock 
stdout data to the parent prior to the child terminating.  A CGI 
implementation via subprocess shouldn't accumulate all of stdout (or all 
of stderr, for that matter, although less important).  I don't (yet) 
know enough about Python threading to know if this is possible, but it 
certainly would be useful.
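
From what I can tell, one way to get stdout to the parent before the
child terminates is to drain stderr on a helper thread while the main
thread forwards stdout in chunks.  The following is only a sketch under
that assumption; stream_child is a hypothetical name, and stderr is
still collected in memory here, which is the less important half.

```python
import subprocess
import sys
import threading


def stream_child(args, sink, chunk_size=8192):
    """Copy the child's stdout to `sink` as it arrives; return (rc, stderr)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    err_chunks = []

    def drain_stderr():
        # Reading stderr concurrently keeps its pipe from filling up and
        # blocking the child while we are busy with stdout.
        for chunk in iter(lambda: proc.stderr.read(chunk_size), b""):
            err_chunks.append(chunk)

    worker = threading.Thread(target=drain_stderr)
    worker.start()
    # read1() returns whatever is available instead of waiting for a
    # full chunk, so data is forwarded without waiting for the child.
    for chunk in iter(lambda: proc.stdout.read1(chunk_size), b""):
        sink.write(chunk)
    worker.join()
    proc.stdout.close()
    proc.stderr.close()
    return proc.wait(), b"".join(err_chunks)
```

Only one chunk of stdout is held in memory at a time, however much the
child writes.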

4. http.server has a number of bugs and limitations.
4a. _url_collapse_path_split seems inefficient (although I have to 
benchmark it against what I think would be more efficient), and for its 
only use within http.server it produces the wrong information, so the 
information has to be recombined and resplit to make it function 
properly, adding to the perception of inefficiency.
4b. Detection of "executable" on Windows is simply wrong.  Unix 
execution bits do not exist.
4c. is_cgi doesn't properly handle the PATH_INFO part of the path; this 
is the other half of 4a.  The Python 2.x CGIHTTPServer.py had this 
right, but the introduction and use of _url_collapse_path_split broke it.
4d. Searching for a ? to find an explicit query string should use 
.find('?') rather than .rfind('?') as there is no prohibition on using 
'?' within a query string, AFAIK.
4e. It doesn't set the REQUEST_URI, HTTP_HOST, or HTTP_PORT environment 
variables for the CGI.
4f. Should not send the 200 response until it sees if the CGI sends a 
Status: header.
4g. Should not buffer all of stdout: subprocess.communicate is 
inappropriate for a web server CGI interface.  The data should stream 
through to avoid consuming inordinate amounts of memory.  The only 
solution within the current limitations of subprocess is to abandon 
stderr, force the CGI to do its own error logging, and use 
shutil.copyfileobj to hook up p.stdout to self.wfile once the Status: 
message processing has happened.
4h. Doesn't seem to close p.stdin.  (I'm not sure if that is necessary; 
it may happen when p is garbage collected, but effort was made to close 
p.stdout and p.stderr, which seem similar.)
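
Items 4d and 4f can be sketched in a few lines.  These are hypothetical
helpers written for illustration, not the http.server code; the default
of 200 "OK" when the CGI sends no Status: header is my assumption.

```python
def split_query(path):
    """Split at the FIRST '?', so any later '?' stays in the query string."""
    rest, _, query = path.partition("?")
    return rest, query


def parse_status(header_lines, default=(200, "OK")):
    """Return (code, reason) from a CGI 'Status:' header, else the default.

    The server would call this on the CGI's header block before sending
    any response line of its own.
    """
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "status":
            code, _, reason = value.strip().partition(" ")
            return int(code), reason
    return default
```

For example, split_query("/cgi-bin/t.py?a=1?b=2") keeps "a=1?b=2"
together as the query, which is what .find('?') gives and .rfind('?')
does not.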

*TypeError* 	Python 3.2a4: c:\python32\python.exe
Sat Nov 20 09:28:41 2010

A problem occurred in a Python script. Here is the sequence of function 
calls leading up to the error, in the order they occurred.

d:\my\py\test12.py in **()
     4 import cgitb
     5 sys.stdout.write("out")
     6 fhb = open("fhb", "wb")
     7 cgitb.enable(0,"d:\\temp")
=>    8 fhb.write("abcdef")  # try writing non-binary to binary file.  Expect an error, of course.
*fhb* = <_io.BufferedWriter name='fhb'>, fhb.*write* = <built-in method 
write of _io.BufferedWriter object>

*TypeError*: 'str' does not support the buffer interface
args = ("'str' does not support the buffer interface",)
with_traceback = <built-in method with_traceback of TypeError object>
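
The error in the report above is the intended one: a str was written to
a file opened in binary mode.  A minimal sketch of the failure and of
the fix follows; io.BytesIO stands in for open("fhb", "wb") so nothing
touches the disk, and ASCII is an assumed encoding for this text.

```python
import io

fhb = io.BytesIO()  # stand-in for open("fhb", "wb")

try:
    fhb.write("abcdef")        # str to a binary stream: TypeError
    raised = False
except TypeError:
    raised = True

# Encoding the text first produces bytes, which the stream accepts.
fhb.write("abcdef".encode("ascii"))
written = fhb.getvalue()
```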