[Web-SIG] Re: Bill's comments on WSGI draft 1.4
Phillip J. Eby
pje at telecommunity.com
Thu Sep 2 05:25:56 CEST 2004
At 06:07 PM 9/1/04 -0700, Bill Janssen wrote:
>1. The "environ" parameter must be a Python dict: I think subclasses
>should be allowed. A true subclass supports all methods of its
>ancestors, so the rationale presented in the back of the PEP for
>excluding them doesn't hold water. I think the appropriate check
>would be to see if the returned class is a subclass of the "dict"
>class. That is, "isinstance(e, dict)" should return True.
Paradoxically, allowing subclasses eliminates the usefulness of allowing
subclasses. Presumably, the purpose of using a subclass is to provide some
extended behavior, e.g. as an attribute/method, or as a byproduct of
requesting particular keys or values. In both cases, these extended
behaviors would be destroyed the minute that a piece of middleware decides
to use its *own* dictionary subclass.
This also ignores the issue that creating a dictionary subclass that
*consistently* enforces some extended behavior (e.g. lazy evaluation of a
key) is intrinsically difficult and fragile, because new versions of Python
often introduce new dictionary methods that are not implemented in terms of
other existing methods, thus breaking a previously "perfect" subclass when
a new Python version is released.
These are "practicality beats purity" arguments, so I need to see some
*practical* applications of dictionary subclasses that would be useful
enough to outweigh both of the above issues.
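Both issues can be illustrated with a small sketch. The class name
LazyEnviron and the key "app.expensive" are made up for the example and
are not part of the PEP; the point is only that a lookup hook placed in
__getitem__ is skipped by get(), and that a plain copy made by middleware
loses the subclass entirely.

```python
class LazyEnviron(dict):
    """Hypothetical environ subclass that computes one key on first lookup."""

    def __getitem__(self, key):
        if key == "app.expensive" and key not in self:
            self[key] = "computed"  # stand-in for some costly work
        return dict.__getitem__(self, key)


# dict.get() is not implemented in terms of __getitem__, so it skips the hook.
via_get = LazyEnviron(REQUEST_METHOD="GET").get("app.expensive")

# Plain subscripting does go through the override.
via_subscript = LazyEnviron(REQUEST_METHOD="GET")["app.expensive"]

# Middleware that copies the environ into its own dict drops the subclass.
copied = dict(LazyEnviron(REQUEST_METHOD="GET"))
```

Here via_get is None while via_subscript is "computed", and copied is an
ordinary dict with no lazy behavior left.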
>2. The "fileno" attribute on the returned iterable. I'm a bit
>concerned about using operating system file descriptors, due to
>resource constraints; I think a better check would be to see if the
>returned iterable is a subclass of the "file" class. That is,
>"isinstance(f, file)" should return true.
The purpose of 'fileno' is specifically to allow the use of operating
system APIs that copy data from one file descriptor to another. Many
Python objects have valid 'fileno' attributes besides files, including
sockets and pipes. Many non-stdlib objects in common use have 'fileno'
attributes that serve this purpose. 'select.select' takes objects with
'fileno', and so on.
Because 'file' has a 'fileno' attribute, 'isinstance(f,file)' implies
'hasattr(f,"fileno")'. Therefore, the latter is the preferred behavior
here, because it doesn't unnecessarily exclude other valid wrappers of file
descriptors.
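As a rough sketch of the attribute check (written for current Python,
which has no built-in 'file' type; the helper name has_fileno is an
assumption for illustration only):

```python
import socket
import tempfile


def has_fileno(obj):
    """Duck-typed check: does obj expose a callable 'fileno' attribute?"""
    return callable(getattr(obj, "fileno", None))


sock_a, sock_b = socket.socketpair()
tmp = tempfile.TemporaryFile()
try:
    results = {
        "socket": has_fileno(sock_a),
        "temp file": has_fileno(tmp),
        "plain list": has_fileno([b"body"]),
    }
finally:
    sock_a.close()
    sock_b.close()
    tmp.close()
```

The socket and the temporary file both pass, the list does not. Note that
the attribute's presence alone does not guarantee a usable descriptor
(io.StringIO, for instance, has a fileno() that raises), so a server would
presumably still need to handle a failure when it calls it.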
>3. Comments about "The [status-line] string must be 7-bit
>ASCII...containing no control characters." That's overly restrictive;
>I think it would be better to simply refer to RFC 2616 and say that it
>should follow the rules defined there for "Reason-Phrase".
>
>4. Similarly, the rules about header values are more restrictive than
>HTTP; they therefore prevent perfectly valid HTTP header values from
>being returned. That's bad. Again, I think the PEP should simply
>refer to RFC 2616 and say, "Use those rules".
These restrictions are intended to simplify servers and middleware; nobody
has yet presented an example of a scenario where this imposed any practical
limitation.
The fallback position would be that the status string and headers must not
be CR or CRLF terminated. But, I'd prefer to stick with a "no embedded
control characters" approach, mainly to avoid situations where people embed
'\n' and think that will be correct.
Here's what RFC 2616 has to say about TEXT, which is the format of the
status message and of header values:
    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
    [14].

        TEXT = <any OCTET except CTLs,
                but including LWS>

    A CRLF is allowed in the definition of TEXT only as part of a header
    field continuation. It is expected that the folding LWS will be
    replaced with a single SP before interpretation of the TEXT value.
In other words, no control characters except for folding, and 7-bit ASCII
with optional ISO-8859-1. In practice, however, RFC 2047 allows for
encoding ISO-8859-1 *in* 7-bit ASCII as well. So, the only actual
limitation being imposed by the PEP is on folding, and on the necessary
encoding of non-ASCII characters.
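For illustration, the standard library's email.header module can produce
an RFC 2047 encoded word. This is only a sketch of the encoding step; the
sample text is an assumption, and nothing here is mandated by the PEP.

```python
from email.header import Header, decode_header

text = "caf\u00e9"  # contains one non-ASCII character (e-acute)

# Encode as an RFC 2047 encoded word; the result is pure 7-bit ASCII.
encoded = Header(text, "iso-8859-1").encode()

# Reverse the encoding to confirm nothing was lost.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```

The encoded form carries its own charset label, so a recipient can recover
the original text without guessing.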
Again, this is a practicality v. purity issue. Are you aware of any
applications that currently fold their headers, or transmit ISO-8859-1
characters without using the encoding prescribed by RFC 2047? Is there a
practical use case for either one?
I'm willing to listen on this point, but as of the moment I find it hard to
imagine what the use case for either of these features is. By contrast, I
do have very specific use cases in mind where supporting those features
causes problems:
* Applications creating broken headers (e.g. with '\n' instead of '\r\n')
or broken folds
* Applications mistakenly transmitting Unicode without considering encoding
issues
* Middleware and servers forgetting to factor out folds when parsing data
for interpretation
* In order to ensure safe interpretation, smart middleware and server
developers will have to write routines to *unfold* potentially-folded
headers; why not just disallow folding to begin with?
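To make the trade-off concrete, here is a minimal sketch of both sides:
the check a server could apply under the "no control characters" rule, and
the unfolding routine that middleware would need if folding were allowed.
The function names are hypothetical and not taken from the PEP.

```python
import re

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
_FOLD = re.compile(r"\r\n[ \t]+")


def is_simple_value(value):
    """True if value is 7-bit ASCII with no control characters at all."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return False
    return _CONTROL_CHARS.search(value) is None


def unfold(value):
    """Collapse each RFC 2616 header fold (CRLF + spaces/tabs) to one space."""
    return _FOLD.sub(" ", value)
```

With the stricter rule, is_simple_value() is the only routine anyone has
to write; a value with an embedded '\n' or a raw non-ASCII character is
simply rejected, and unfold() is never needed.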
>5. The phrase about "if a server or gateway discards or overrides any
>application header for any reason, it must record this in a log"; that
>should be "should" instead of "must". Otherwise you'll have your log
>cluttered with innocuous header re-write messages, and no way to turn
>that off.
How about "must provide the *option*" and "must be enabled by default"? Or,
leave it as is, but add something like, "may provide the user with the
option of suppressing this output, so that users who cannot fix a broken
application are not forced to bear the pain of its error."
>6. The "write()" callable is important; it should not be deprecated
>or in some other way made a poor stepchild of the iterable.
But it *is* one. The presence of the 'write()' facility significantly
increases the implementation complexity for middleware and server
authors. If it weren't necessary to support existing streaming APIs, it
wouldn't exist.
Earlier drafts treated it as a peer, which led to people making bad
assumptions about its proper use. Making it a "poor stepchild" encourages
people to investigate it only if they really need it, and only a very few
applications actually need it.
>7. If an application returns an iterable after calling write(), are
>the strings produced by iteration written after those written by calls
>to write?
Yes. This is implicit in the way 'write()' and the iterable are defined,
because the server must transmit a block yielded or passed to write()
before returning control to the application. The only way to meet this
constraint is for them to occur in sequence.
However, the language should perhaps be clarified to be explicit about this
point, and to address what happens if code *within* the iterator calls
'write()'. (I don't think it should be allowed to, but I'm open to
arguments either way.)
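The ordering can be sketched with a toy server loop. This is a heavily
simplified assumption of how a server might be structured (it ignores the
status and headers and just records blocks in a list rather than sending
them); run_app and demo_app are made-up names.

```python
def run_app(app, environ):
    """Call a WSGI-style app and return the body blocks in transmit order."""
    sent = []

    def write(data):
        # The block is "transmitted" before control returns to the app.
        sent.append(data)

    def start_response(status, headers, exc_info=None):
        return write

    result = app(environ, start_response)
    try:
        for block in result:
            sent.append(block)
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return sent


def demo_app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"first ")
    return [b"second"]


blocks = run_app(demo_app, {})
```

Because write() completes before the application returns its iterable,
blocks comes out as [b"first ", b"second"]; there is no point at which the
two could interleave unless the iterator itself calls write().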
>8. The note on Unicode: Unfortunately, Web standards like HTTP rely
>on using proper character sets. By *not* using Unicode strings, and
>by *not* specifying the character set encoding of the "raw" byte
>strings, we open the door for disastrous misunderstandings. The
>safest thing to do would be to require the framework to traffic in
>Unicode strings for things like header values, which the WSGI
>middleware would translate to or from the various required encodings
>used by the server and external protocols. At least with Unicode
>strings you know what encoding is being used.
This seems at odds with your previous desire to use RFC 2616, which is
pretty clear that it's ISO-8859-1 or RFC 2047. PEP 333 goes further and
says, it's ASCII, dammit, and use MIME header encodings (RFC 2047) if you
need to do something special, because God help you if you're trying to mess
with non-ASCII in HTTP headers and you don't know how to deal with that stuff.
Granted, that part could be more explicit in the PEP, so I'll work on that. :)
(Maybe not this week; I expect to spend tomorrow putting hurricane panels
on my house, just ahead of Frances' arrival...)
>A riskier, more error-prone option would be to require the byte
>strings to be in particular encodings.
That's actually what's required; it's merely implied by the PEP rather than
explicitly stated. But it's a fully RFC-compliant way to do it.
>The content strings, those written to the "write()" calls, or returned
>by the iterable, should in fact be byte vectors, exactly as they are
>currently specified.
Glad there was something you liked. ;) (j/k)
>9. There should be a non-optional way of indicating the URL scheme,
>whether it is "http", "https", or "ftp". I'd suggest "wsgi.scheme" in
>the environ.
I rather like this, although I don't at all see how FTP gets into
this. What the heck would CGI variables for FTP look like, I
wonder? Anyway, it's handy for "http" and "https" at the very least. I'd
prefer "wsgi.url_scheme" for the name, though, as it's otherwise a somewhat
ambiguous name.
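As a sketch of why this is handy, assuming the key is named
"wsgi.url_scheme" as suggested above, an application could rebuild its
base URL without guessing at the scheme. The helper name base_url is
hypothetical; the other keys are the usual CGI variables.

```python
def base_url(environ):
    """Rebuild scheme://host[:port] from a CGI-style environ dict."""
    scheme = environ["wsgi.url_scheme"]
    host = environ.get("HTTP_HOST")
    if host:
        # The client-supplied Host header already includes any port.
        return scheme + "://" + host
    host = environ["SERVER_NAME"]
    port = environ["SERVER_PORT"]
    default_port = {"http": "80", "https": "443"}.get(scheme)
    if port != default_port:
        host += ":" + port
    return scheme + "://" + host
```

For example, an environ with scheme "https", SERVER_NAME "example.com" and
SERVER_PORT "443" gives "https://example.com".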