[Web-SIG] Reviewing WSGI open issues, again...

Alan Kennedy py-web-sig at xhaus.com
Thu Sep 9 18:01:51 CEST 2004


[Phillip J. Eby]
 > * File-like objects -- I think anything we offer for file-like objects
 > should be optional.  The big question is whether to offer a single,
 > introspection-based extension for all file-like things, or whether to
 > use separate extensions for different sorts of things, like
 > 'wsgi.fd_wrapper' for file descriptors and 'wsgi.nio_wrapper' for Java
 > NIO objects, etc.  Does anybody have any arguments/use cases one way
 > or the other?

Optionality is fine by me.

But I don't understand what reason there might be to have separate
class names per platform.

It's always been my understanding that the intention of this capability
is to let applications give a "hint", to servers that support
high-performance methods of file transmission, that the resource being
returned is a candidate for bulk transfer. So, as an application author,
I'll surely want that hinting process to work on as many servers as
possible, regardless of the platform.

So, if there is a choice of multiple such hinting processes, and I have
to look for each one of them at runtime, my code is longer and less
efficient than it could be, e.g.:

def app_object(environ, start_response):
    start_response('200 AuQuay', [('content-type', 'x-humungous-pdf')])
    result = open('humungous.pdf', 'rb')
    for cname in ['fd', 'nio', 'dotnet', 'stackless', 'pypy', 'smalltalk']:
        try:
            return environ['wsgi.%s_wrapper' % cname](result)
        except KeyError:
            pass
    return result

Instead, if a single class name is used, the definition of which differs
per server, then I have only to look for that one key.

def app_object(environ, start_response):
    start_response('200 AuQuay', [('content-type', 'x-humungous-pdf')])
    result = open('humungous.pdf', 'rb')
    if 'wsgi.file_wrapper' in environ:
        return environ['wsgi.file_wrapper'](result)
    return result
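For completeness, here is a sketch of the server side of that single-class
contract. All the names here (FileWrapper, run_app, the 'bulk'/'iterate'
return values) are hypothetical, purely for illustration; the point is
that the server can recognise its own wrapper class at transmission time:

```python
# Hypothetical sketch: a server supplies its own wrapper class in
# environ['wsgi.file_wrapper'], then checks for instances of that same
# class when it sees the application's return value.

class FileWrapper:
    """Marker/wrapper the server puts in environ['wsgi.file_wrapper']."""
    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize

def run_app(app, environ):
    environ['wsgi.file_wrapper'] = FileWrapper
    captured = []
    def start_response(status, headers):
        captured.append((status, headers))
    result = app(environ, start_response)
    if isinstance(result, FileWrapper):
        # The server recognises its own wrapper class, so it can hand
        # result.filelike to a platform bulk-transfer primitive
        # (sendfile, transferTo, ...) instead of iterating in Python.
        return 'bulk', result.filelike
    return 'iterate', result
```

The application never needs to know which wrapper class it got; it only
looks up the one well-known key.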

One reason I can see for having multiple classes is if they really 
represent fundamentally different concepts.

For example, there are possibly more types of optimisation available,
e.g. returning a stream of bytes from a shared memory partition, which,
if the platform supported DMA access to that shared memory, would then
be bulk-transferable, i.e. bypassing the CPU. Since shared memory is a
concept whose implementation varies subtly between platforms, should we
be trying to abstract that concept into one class with a single
interface, whose implementation differs between platforms, or into
separate classes, one for each platform?

What about an optimised transfer from an RDBMS, say a BLOB stored in a
database row? Should that be wrapped with a file_wrapper (because it's
really coming from a file descriptor?), or with a special
db_blob_wrapper class? Would these db_blob_wrappers differ between
different database platforms? Because it is quite possible that the
RDBMS data is also coming through the network subsystem, this bulk
transfer could potentially be arranged at the network level, conceivably
on a sophisticated network card/router/etc., and thus never even reach
the bus on the serving machine. OK, that's a bit wild and unlikely :-),
but I'm just trying to foresee as many scenarios for bulk transfers as I
can, to see if the proposed WSGI model fits.

I suppose it's about recording enough meta-information for the server to
recognise such optimisable scenarios. So the question has to be asked:
how portable do we need these optimisations to be between servers? Is
medusa likely to have a middleware component dedicated to sendfile, for
example? And twisted its own, thread-pool based, implementation? In
which case portability of, say, the sendfile optimisation becomes an
issue of server configuration, not support classes.

Or might it be that we need to facilitate the application at two levels
in the server? Take the example of shared memory:

1. In the middleware stack, a component maps a certain URL space into 
the shared memory partition, and returns a specialised wrapper class 
that contains a shared memory reference, i.e. a handle, start/end/len, etc.

2. The application also needs to plug into the server, below the
middleware stack, so that it can implement the actual bulk transfer from
the shared memory (assuming that the shared_memory_wrapper wasn't
obscured by some component below it in the stack). Since shared memory
support, and probably DMA support, varies between platforms, this is
where the platform-specific element comes in: there would be different
versions of that "server plug-in" for different platforms/servers.

Lastly, I should also point out that, with the current jython I/O
subsystem, the sendfile/transferTo optimisation is not currently
possible, inside most existing J2EE containers anyway. This is because
sockets created using the old java.net APIs do not, by default, have
nio.channels associated with them. Most existing J2EE containers, which
must support blocking servlets by definition, don't bother to handle
sockets using java.nio, because it's more work, not necessary, and not
portable to older versions of the platform. So it's not possible to use
the sockets they create for bulk transfers.

A container could be redesigned to use the java.nio APIs, completely in
a blocking fashion, if desired. But that still wouldn't be any use in
existing jython, because jython's current socket modules are entirely
based on the old java.net classes. Which means that jython code couldn't
access the channel nature of the sockets, even if those sockets
supported it, without modification of the standard library.

I have a (~60% complete) side-project to develop asynchronous socket
support on jython 2.1, by porting the socket, select and (maybe)
asyncore modules to java.nio. When that is complete (timescale==months,
v busy), I hope to see experimentation, from myself and others, on
running python asynchronous models on jython.

Here is what the jython file_wrapper code might look like:

class jython_file_wrapper:

    # Sketch: assumes the wrapped object is backed by a java.io stream
    # whose getChannel() returns a java.nio.channels.FileChannel.
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def sendfile(self, jynio_socket):
        if hasattr(self.wrapped, 'getChannel'):
            # FileChannel.transferTo(position, count, target) hands the
            # copy to the OS, potentially as a zero-copy bulk transfer.
            channel = self.wrapped.getChannel()
            channel.transferTo(0, channel.size(), jynio_socket)
        else:
            self.send_in_chunks_instead(jynio_socket)
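The send_in_chunks_instead fallback is the one piece that is entirely
platform-neutral. A minimal sketch of such a fallback in plain Python
(names and the 8192-byte block size are my own, purely illustrative)
might be:

```python
# Platform-neutral fallback: when no channel/sendfile primitive is
# available, push the wrapped file to the socket in fixed-size blocks.
def send_in_chunks(filelike, write, blksize=8192):
    """Copy filelike to a write() callable in blksize chunks.

    Returns the number of bytes sent. Sketch only: a real server would
    also handle partial writes and socket errors here.
    """
    sent = 0
    while True:
        block = filelike.read(blksize)
        if not block:
            break
        write(block)
        sent += len(block)
    return sent
```

Every server can implement this path; the channel/sendfile path is the
platform-specific optimisation layered on top of it.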

Regards,

Alan.
