[Web-SIG] My experiences implementing WSGI on java/j2ee/jython.

Mon Aug 30 04:22:25 CEST 2004

Great to see more implementations.  My thoughts on some of the questions 
(only quoting the relevant portions)...

Alan Kennedy wrote:
> 1. Default values of environment variables when not present.
> ============================================================
> 
> The spec says that compulsory environment variables, for example 
> "CONTENT_LENGTH" or "CONTENT_TYPE", must have a value, i.e. "must be 
> present, but may be an empty string, if there is no more appropriate 
> value for them". I read "empty string" to mean "".
> 
> There are obviously two different choices for how to represent values 
> for headers/env-vars that are not present in the request, i.e. 1. an 
> empty string as described above or 2. as a python None value. It seems 
> more correct to me to use the latter option, None, for when the 
> header/env-var is not available, i.e. the client did not send it. This 
> allows the use of the "" value to indicate (the admittedly rare and 
> malformed case) that the client sent the header name, but did not 
> specify a header value. If WSGI uses the empty string for both cases, 
> then we lose the ability to distinguish between when the header was sent 
> with no value, and when it wasn't sent at all.

Elsewhere in the spec (I forget where) I believe it is very strict that 
all CGI variables (if present) must have non-unicode string values.  So 
None would not be allowed in any CGI variable (only in extension 
variables).  I think for all the required variables the empty string 
should be sufficient to indicate ambiguity.  Applications can't depend 
on there being a good distinction between a missing key and an empty 
string, as different parent containers can go either way, so the WSGI 
gateway might not have any information to work on.
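A minimal sketch of what that looks like on both sides, assuming the gateway only needs to fill in the two variables mentioned above (the helper names `fill_cgi_defaults` and `request_body_length` are hypothetical, not part of the spec): the gateway turns None into "", and the application treats a missing key and an empty string identically.

```python
def fill_cgi_defaults(environ, required=("CONTENT_TYPE", "CONTENT_LENGTH")):
    """Hypothetical gateway helper: ensure each required CGI variable is a string.

    A missing key or a None value becomes "", so the application never sees None.
    """
    for key in required:
        value = environ.get(key)
        environ[key] = "" if value is None else str(value)
    return environ


def request_body_length(environ):
    """Hypothetical application helper: a missing key and "" both mean 0."""
    return int(environ.get("CONTENT_LENGTH") or 0)


# Usage: the container reported no Content-Type and a null length.
env = fill_cgi_defaults({"CONTENT_LENGTH": None})
print(repr(env["CONTENT_TYPE"]), request_body_length(env))
```

Because the application never relies on the missing/empty distinction, it behaves the same whichever way the parent container went.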

> 2. The SCRIPT_NAME variable.
> ============================
> 
> At first I was a little wary of the SCRIPT_NAME variable, and how I 
> would construct it, until I realised that the beginning of the 
> URL->Callable mapping is outside the scope of WSGI: it is in the control 
> of whichever program/process/container is receiving HTTP requests 
> through sockets from the client, and resolving/dispatching them 
> according to its configuration files: in my case that was a J2EE 
> container, e.g. Tomcat.
> 
> The J2EE call that returns a value equivalent to the CGI SCRIPT_NAME 
> variable is the HttpServletRequest.getServletPath method. There is an 
> interesting note on it which says that "This method will return an empty 
> string ("") if the servlet used to process this request was matched 
> using the "/*" pattern." Which seems a little odd, until you realise 
> that the SCRIPT_NAME = "" case is when the application object is 
> responsible for dealing with the entire URL space. Maybe it's worth 
> adding a note to this effect in the WSGI spec as well? It helped me 
> understand things better.

That makes sense to me.  I don't think SCRIPT_NAME should ever be "/" -- 
usually PATH_INFO should either be the empty string, or start with /, so 
if your application applies to the root domain then PATH_INFO should be 
the entire request path, and SCRIPT_NAME the empty string.
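As a sketch of that rule, assuming the gateway knows the prefix the application is mounted at (the `split_path` helper is hypothetical): a root mount always yields SCRIPT_NAME == "", and PATH_INFO keeps its leading slash.

```python
def split_path(mount_point, request_path):
    """Hypothetical helper: split request_path into (SCRIPT_NAME, PATH_INFO).

    A root mount ("/" or "") yields SCRIPT_NAME == "", never "/", so
    PATH_INFO carries the whole path including its leading slash.
    """
    script_name = mount_point.rstrip("/")
    if script_name and not (request_path == script_name
                            or request_path.startswith(script_name + "/")):
        raise ValueError("%r is not under %r" % (request_path, mount_point))
    return script_name, request_path[len(script_name):]


print(split_path("/", "/a/b"))      # ('', '/a/b')
print(split_path("/app", "/app"))   # ('/app', '')
```

The root case lines up with the getServletPath note quoted above: a servlet matched with "/*" gets an empty servlet path.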

> An idea occurs to me for a nice little reusable WSGI middleware 
> component which is a URI mapper, with functionality akin to apache 
> mod_rewrite, resolving URIs to Python callables. A lot of frameworks 
> like to do things with URL rewriting and mapping, in order to present a 
> nice clean URL interface to a tree of objects. Quixote is one such 
> framework that likes to have crisp URLs. But much of the time installing 
> such frameworks requires configuring apache and invoking mod_rewrite and 
> its "cool voodoo" to get the job done. Which can be difficult to debug 
> and get working, and scares newbies. (On re-reading the spec, and the 
> mailing list, I see I'm not the only one to have thought of such a uri 
> mapping component :-)

Definitely.  I like the idea that most WSGI servers and middleware 
(except for the URL mappers) would just take a single application, to 
keep the techniques separate.
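A minimal sketch of such a URL mapper, assuming simple path-prefix matching rather than anything as general as mod_rewrite (the class name `PrefixMapper` and its constructor arguments are hypothetical). The mapper is itself a single WSGI application, so servers can stay single-application; it just moves the matched prefix from PATH_INFO onto SCRIPT_NAME before delegating.

```python
class PrefixMapper:
    """Hypothetical middleware: dispatch to a WSGI app by URL path prefix."""

    def __init__(self, mapping, not_found=None):
        # Try longer prefixes first, so "/app/admin" wins over "/app".
        self.mapping = sorted(
            ((prefix.rstrip("/"), app) for prefix, app in mapping.items()),
            key=lambda item: len(item[0]),
            reverse=True,
        )
        self.not_found = not_found or self._default_not_found

    def __call__(self, environ, start_response):
        path_info = environ.get("PATH_INFO", "")
        for prefix, app in self.mapping:
            if path_info == prefix or path_info.startswith(prefix + "/"):
                # Copy so the caller's environ is left untouched.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path_info[len(prefix):]
                return app(environ, start_response)
        return self.not_found(environ, start_response)

    @staticmethod
    def _default_not_found(environ, start_response):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
```

A prefix of "" (or "/") sorts last and acts as a catch-all for the whole URL space.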

> 3. Status code and message.
> ===========================
> 
> The WSGI spec states that the status value passed to start_response 
> should be of the form "999 Message here". That's fine, I can parse up 
> the string easily enough to get the java data types I need to send to 
> the container. However, J2EE does not allow me to set the message 
> string: I can only set the status code, and that must have an integer 
> value.

That raises an interesting question.  As far as I know, no client ever 
pays any attention to the message.  It's purely noise, conveying no 
information.  It might make sense, for simplicity, for the status code 
to be an integer, as it apparently is in Java.
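For the J2EE case, the status string has to be split anyway. A small sketch (the `parse_status` name is hypothetical): the integer is what would go to HttpServletResponse.setStatus(int), and the reason phrase is simply dropped.

```python
def parse_status(status):
    """Hypothetical helper: split "404 Not Found" into (404, "Not Found")."""
    code, _, reason = status.partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError("malformed status: %r" % (status,))
    return int(code), reason


print(parse_status("404 Not Found"))  # (404, 'Not Found')
```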

> 5A. Python 2.1 vs. python 2.2: iterators and generators.
> ========================================================
> 
> The WSGI spec says that python 2.2 features are required to be 
> compliant. However, it appears to me that the only python 2.2 features 
> in use are iterators and generators, used when the application object 
> returns an iterator. In fact, it's just that the example in the WSGI 
> spec uses a generator (and its corresponding 'yield' keyword): actual 
> applications are not required to use a generator: they can also return 
> an object that implements the iterator protocol. Which means returning 
> an object with a .next() method when the .__iter__() method is called. 
> The iterator.next() method keeps returning values, until the iterator 
> runs out, in which case it raises StopIteration. Like generators, the 
> iterator protocol was also introduced in python 2.2, but they are two 
> separate things.
> 
> However, even though jython is based on python 2.1, and thus doesn't 
> have built-in support for either iterators or generators, I have still 
> implemented the iterator protocol in my java/jython framework, by simply 
> invoking the .__iter__() and .next() methods on application objects, and 
> catching StopIteration exceptions. So I can support components and 
> applications returning iterators, and I'm thus compliant with the spec, 
> even though I'm running on 2.1. (This is only possible because I'm 
> embedding: it is still not possible to support the iterator protocol in, 
> say, jython for-loops.)
> 
> Does the spec need to be changed to reflect this iterators/versioning 
> issue? Or to more clearly define the difference between iterators and 
> generators?
> 
> It's conceivable that even a python 1.5 framework could be programmed to 
> support the iterator protocol: it's *very* easy to implement.

That's also an interesting question.  I guess with both Jython and Zope 
2.6 and earlier being Python 2.1, it should be given some consideration.
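To illustrate Alan's point that applications don't need generators, here is a sketch of an application returning a hand-written iterator (the names `CountdownBody` and `countdown_app` are hypothetical). It yields str blocks as the Python 2 servers discussed here expect; under Python 3's WSGI (PEP 3333) the blocks would have to be bytes.

```python
class CountdownBody:
    """Hypothetical response body: a hand-written iterator, no generators."""

    def __init__(self, count):
        self.count = count

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def next(self):
        # Python 2.2 spelling of the iterator protocol method.
        if self.count <= 0:
            raise StopIteration
        self.count -= 1
        return "line %d\n" % (self.count + 1)

    # Python 3 renamed next() to __next__(); alias it so the sketch runs there too.
    __next__ = next


def countdown_app(environ, start_response):
    """Hypothetical WSGI application returning the iterator above."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return CountdownBody(3)
```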

One question: should the application iterable be a Python 2.2 style 
iterable?  I.e., is it up to Python 2.1 servers to implement the Python 
2.2 iterator protocol themselves?  Or, should the application be 
responsible for returning an iterator appropriate for the Python version?

In Python <2.2 (including 1.5.2) the protocol was that you called 
__getitem__ with ever-increasing integers, until an IndexError was 
raised.  There was no concept of a special __iter__() method.  But I 
guess Python 2.2's iter() builtin could be simulated:

import types

def iter(obj):
    # Stand-in for Python 2.2's iter() builtin: returns something a
    # pre-2.2 for-loop can walk with __getitem__.
    if type(obj) in (types.ListType, types.TupleType):
        return obj
    elif type(obj) is types.FileType:
        return FileIter(obj)
    elif hasattr(obj, '__iter__'):
        return IterWrapper(obj.__iter__())
    elif hasattr(obj, 'next'):
        # already an iterator, just without an __iter__() method
        return IterWrapper(obj)
    else:
        # assume an old-style sequence that supports __getitem__
        return obj

class FileIter:
    def __init__(self, file):
        self.file = file
    def __getitem__(self, index):
        # while this copies Python 2.2, you wouldn't actually have to
        # iterate line by line:
        value = self.file.readline()
        if value == '':
            raise IndexError
        return value

class IterWrapper:
    def __init__(self, obj):
        self.obj = obj
    def __getitem__(self, index):
        # we ignore the index; obj is assumed to have a .next() method
        try:
            return self.obj.next()
        except StopIteration:
            raise IndexError

Then in Jython you'd do:

for s in iter(obj):
    write(s)

One issue is that StopIteration isn't defined in earlier versions of 
Python.  You may be able to add it to __builtins__.
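A sketch of adding it to the builtins module, assuming an interpreter with "import ... as" (Python 2.0 or later, so not 1.5.2). The module is called __builtin__ in Python 2 and builtins in Python 3; on any interpreter that already has StopIteration the block does nothing.

```python
try:
    import __builtin__ as builtins_module  # Python 2.x spelling
except ImportError:
    import builtins as builtins_module  # Python 3 spelling

if not hasattr(builtins_module, "StopIteration"):
    # Only reached on an interpreter that predates the iterator protocol.
    class StopIteration(Exception):
        """Stand-in raised by hand-written .next() methods."""

    builtins_module.StopIteration = StopIteration
```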

Obviously none of this means anything if the application uses 
generators, but in many cases that should make it more portable.

I think it might be the right idea to have the server implement this 
kind of backward portability, rather than applications.  But that might 
be something for the spec, if so.
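If the server does take on that job, its output loop might look roughly like this sketch (the `drive_iterable` name and the `write` callback are hypothetical). It calls the iterator protocol by hand, as an embedding server on Python 2.1 would, and falls back to the pre-2.2 __getitem__ protocol for objects without __iter__.

```python
def drive_iterable(result, write):
    """Hypothetical server loop: pass each block of `result` to `write`."""
    if hasattr(result, "__iter__"):
        iterator = result.__iter__()
        # Python 2.2 names the method next(); Python 3 renamed it __next__().
        step = getattr(iterator, "next", None) or iterator.__next__
        while True:
            try:
                block = step()
            except StopIteration:
                break
            write(block)
    else:
        # Pre-2.2 sequence protocol: __getitem__(0), (1), ... until IndexError.
        index = 0
        while True:
            try:
                block = result[index]
            except IndexError:
                break
            write(block)
            index += 1
```

Only the call that fetches a block sits inside the try, so a StopIteration or IndexError raised by `write` itself is not mistaken for the end of the body.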

> 5B. A "python.version" WSGI variable?
> =====================================
> 
> Of course, it will be the case that some middleware and applications will 
> require more advanced and recent (2.2, 2.3, 2.4) language 
> features, such as generators, generator expressions, decorators, etc. 
> But such components and applications will not be usable under jython, 
> which is 2.1. It would be nice for components and applications to have a 
> way of knowing what version of python they are running under. Similarly, 
> there will be jython components and applications that require java 
> libraries, and thus won't be usable on cpython of any version.
> 
> Would it be useful to define a WSGI variable "python.version", similar 
> to "wsgi.version", which gives the python version in effect? In most 
> cases under jython, it wouldn't help, because its 2.1 compiler would 
> choke when loading python files with newer python syntax anyway, giving 
> syntax errors. But it might be useful in some circumstances, perhaps for 
> sophisticated dispatchers with the requisite meta-data available to 
> them? I'm not sure on this one. Maybe the values of sys.platform and 
> os.name give enough information to deal with this problem?

sys.version_info has the information you are looking for.
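A component can work this out for itself without a new WSGI key. A sketch (the `runtime_info` helper and its keys are hypothetical); Jython's sys.platform begins with "java".

```python
import sys


def runtime_info():
    """Hypothetical helper summarising the interpreter a component runs under."""
    return {
        "version": tuple(sys.version_info[:2]),
        # Jython reports a platform string beginning with "java".
        "is_jython": sys.platform.startswith("java"),
        # The __iter__/next iterator protocol arrived in Python 2.2.
        "has_iterator_protocol": sys.version_info[:2] >= (2, 2),
    }
```

As Alan notes, this only helps for code the interpreter can actually compile; newer syntax still fails at load time on Jython 2.1.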

> 7. Redirects.
> =============
> 
> I read some discussion in the lists on how to handle container-specific 
> facilities, e.g. Apache/mod_python's ability to internally redirect a 
> request.
> 
> J2EE offers the same capabilities, to internally redirect a request, 
> without sending a response back to the client. It happens in a slightly 
> different way, because you first ask your container for a dispatcher, 
> based on a url, and then call that dispatcher to redirect to the URL. 
> And the client may not see any redirect HTTP responses: it's all 
> internal to the container.
> 
> I see the solution to this redirect platform-dependence problem in the 
> implementation of a platform-independent WSGI middleware component that 
> takes all responsibility for redirects. This component examines the 
> WSGI environment, seeking hints for the optimal way to redirect 
> the request: if mod_python is available, use the mod_python API call; 
> if modjy is available, use the getRequestDispatcher(uri).forward() dance, etc. 
> If none of these platform-specific techniques are available, it can fall 
> back to sending a 302 or 307 response back to the client, and let the 
> client re-request the new URL.
> 
> If the platform-specific techniques are available, their availability 
> will be signalled in the WSGI environ by the presence of variables such as 
> "mod_python.request" or "modjy.servlet_context", etc. So one 
> ultraportable component could do it all (albeit chock full of special 
> cases).
> 
> Problem solved?

I can also imagine in some future version of WSGI (or some standard 
building on it) that we could decide on a standard interface for doing 
internal redirects, available under a standard key.
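A sketch of the fallback structure Alan describes, assuming platform-specific handlers are supplied by the caller as a mapping from environ key to function. That handler interface is entirely hypothetical (WSGI defines nothing of the kind, and the sketch does not call any real mod_python or servlet API); only the 302 fallback is plain HTTP.

```python
def redirect(environ, start_response, location, internal_handlers=None):
    """Redirect to `location`, preferring a platform-specific internal redirect.

    internal_handlers (hypothetical) maps an environ key such as
    "modjy.servlet_context" to a function(environ_value, location) that
    performs the internal redirect and returns the response iterable.
    """
    for key, handler in (internal_handlers or {}).items():
        if key in environ:
            return handler(environ[key], location)
    # Fallback: ask the client to re-request the new URL itself.
    start_response("302 Found", [("Location", location),
                                 ("Content-Type", "text/plain")])
    return [b"Redirecting to " + location.encode("latin-1")]
```

A standard key of the kind suggested above would let the loop collapse to a single lookup.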

> 9. Server-detected headers.
> ===========================
> 
> I can see the reason for servers/containers intercepting application headers 
> and translating/augmenting/deleting them. However, do we need a 
> specification of what to do with certain specified headers? As with 
> CGI, should I recognise the "Status: " header or the "Location: " 
> header, and translate it to the relevant status code, or do a redirect, 
> respectively? If I don't do those translations, won't I be breaking 
> reams of python CGI code out there that relies on Apache doing this?

Right now there should be no Status header, and a Location header should 
not imply a redirect, unlike with CGI.  Any CGI responses have to be 
wrapped to comply.  But there are other issues besides this, so they 
already had to be wrapped.
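Part of such a wrapper could be a header translation step like this sketch (the `cgi_headers_to_wsgi` name is hypothetical). It is a simplification of CGI: any Location without a Status is treated as a 302 client redirect, whereas a real CGI server handles a local path in Location as an internal redirect.

```python
def cgi_headers_to_wsgi(cgi_headers):
    """Hypothetical helper: turn CGI-style headers into a WSGI (status, headers) pair."""
    status = None
    headers = []
    for name, value in cgi_headers:
        if name.lower() == "status":
            # CGI's pseudo-header becomes the WSGI status string.
            status = value.strip()
        else:
            headers.append((name, value))
    if status is None:
        # Simplification: any Location implies a client redirect.
        has_location = any(n.lower() == "location" for n, _ in headers)
        status = "302 Found" if has_location else "200 OK"
    return status, headers
```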

> 10. The "wsgi.errors" environment variable.
> ==========================================
> 
> Under J2EE, setting the "wsgi.input" variable is easy, I just wrap the 
> HttpServletRequest.getInputStream() with an org.python.core.PyFile, and 
> bingo.
> 
> However, the J2EE HttpServletRequest has no corresponding error stream, 
> nor does the corresponding HttpServletResponse paired with each request. 
> The only mechanism I can use to send error output is the "sendError(int, 
> message)" method of HttpServletResponse. Which allows me to send both an 
> integer status code and a textual message, which the J2EE docs say "The 
> server defaults to creating the response to look like an HTML-formatted 
> server error page containing the specified message, setting the content 
> type to "text/html", leaving cookies and other headers unmodified".

Stuff written to wsgi.errors isn't supposed to go to the client.  Under Apache 
it would typically end up in the error log.  Under CGI wsgi.errors is 
usually stderr (and CGI scripts run under Apache that write to stderr 
also end up writing to the error log).  Error logs -- at least the kind 
that WSGI implies -- are fairly free form.  Though I guess a server 
could buffer the output sent to wsgi.errors, put in some delimiters, add 
some request information, and turn it into a nicely formatted log entry.
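A sketch of that buffering idea, assuming the server calls a hypothetical `close_entry()` method when the request finishes and hands entries to Python's standard logging module (on J2EE the logger could instead forward to whatever log the container provides). The write/writelines/flush methods are the file-like interface wsgi.errors needs; `BufferedErrors` and `close_entry` are illustrative names.

```python
import logging


class BufferedErrors:
    """Hypothetical wsgi.errors object: buffers one request's error output
    and emits it as a single delimited log entry."""

    def __init__(self, environ, logger):
        self.environ = environ
        self.logger = logger
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def flush(self):
        # Nothing to do until the request ends; see close_entry().
        pass

    def close_entry(self):
        """Called by the server once the request is finished."""
        if not self.chunks:
            return
        request = "%s %s%s" % (self.environ.get("REQUEST_METHOD", "-"),
                               self.environ.get("SCRIPT_NAME", ""),
                               self.environ.get("PATH_INFO", ""))
        self.logger.error("---- errors for %s ----\n%s---- end ----",
                          request, "".join(self.chunks))
        self.chunks = []
```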

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org