[Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.

Wed Mar 21 11:36:07 CET 2007

When a WSGI application is hosted via CGI, content should be returned
through the iterable returned from the application or by calling the
write() callable returned from start_response(). Nothing stops the
application from writing to sys.stdout directly as well, though, since
that is where the WSGI adapter writes the response anyway.

Obviously this isn't something that should be done, but the WSGI PEP
doesn't say anything about code not writing to sys.stdout, and more
than likely at some point someone is going to think they can just use
'print' to have some debugging statements output where they think they
will see them. In the case of CGI such output would wrongly end up in
the response and screw things up.

To address this, a future update to the WSGI specification, or the
environment specification people have been talking about, should
perhaps spell out what behaviour one can expect from sys.stdin,
sys.stdout and sys.stderr.

In the case of sys.stdout, do people see it as being at least good
practice, if not required by specification, that the WSGI adapter
should ensure that sys.stdout cannot be written to directly or by
using 'print' from a WSGI application? A CGI adapter would then do
something like:

  import sys

  class dummystdout:
    def write(self, *args):
      raise IOError("WSGI prohibits use of sys.stdout.")
    ....

  def run_with_cgi(application):
    ...

    stdout = sys.stdout
    sys.stdout = dummystdout()

    ...

    def write(data):
      ...
      stdout.write(data)
      stdout.flush()

In other words, it saves a reference to sys.stdout for its own use and
then replaces sys.stdout with a dummy file-like object that raises an
exception if written to in any way or flushed.
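
As a rough, runnable sketch of that idea (not taken from any existing
adapter): the names ForbiddenStdout and guard_stdout below are
hypothetical, and the real adapter would pass in the original
sys.stdout rather than an arbitrary stream.

```python
import sys


class ForbiddenStdout:
    """Hypothetical stand-in for sys.stdout that rejects all output."""

    def _refuse(self, *args):
        raise IOError("WSGI prohibits use of sys.stdout.")

    # Every way of producing output, including flushing, is refused.
    write = _refuse
    writelines = _refuse
    flush = _refuse


def guard_stdout(real_stdout):
    """Replace sys.stdout with the guard.

    Returns a write() callable that still reaches the saved stream, for
    use by the adapter itself.
    """
    sys.stdout = ForbiddenStdout()

    def write(data):
        real_stdout.write(data)
        real_stdout.flush()

    return write
```

An adapter would call guard_stdout(sys.stdout) once at startup and use
the returned write() for the response. Any later 'print' in the
application then fails immediately with IOError instead of silently
corrupting the response.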

Even in Apache, where sys.stdout (if flushed) eventually makes its way
to the Apache error log, it seems it would also be a good idea to
disable sys.stdout. The idea here is that if all WSGI adapters ensured
that sys.stdout wasn't usable, you would reduce the possibility of
someone's code inadvertently using it with one server, having it
seemingly work, and then screwing everything up after a move to CGI.
Thus we are sort of protecting people by locking down the environment
a bit so application portability issues are more easily found.

With sys.stdin, you have a similar issue with CGI whereby you don't
want a WSGI application reading from it directly. Thus sys.stdin
should probably also be replaced with a file-like object that always
returns EOF (empty string). Having sys.stdin do anything meaningful in
a multiple process server system like Apache also doesn't make sense,
although Apache already ensures that stdin returns EOF.
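
A minimal sketch of such a replacement follows. EmptyStdin and
guard_stdin are hypothetical names, and the sketch assumes the adapter
keeps the returned original stream for building wsgi.input itself.

```python
import sys


class EmptyStdin:
    """Hypothetical sys.stdin replacement: every read reports EOF."""

    def read(self, size=-1):
        return ""

    def readline(self, size=-1):
        return ""

    def readlines(self, hint=-1):
        return []

    def __iter__(self):
        # Iterating over the stream yields no lines at all.
        return iter(())


def guard_stdin():
    """Install the EOF-only stream and return the original sys.stdin."""
    real_stdin = sys.stdin
    sys.stdin = EmptyStdin()
    return real_stdin
```

With this installed, an application that reads sys.stdin directly just
sees an empty stream and cannot consume the request body out from
under the adapter.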

The tricky case is single process servers that, unlike CGI, don't use
sys.stdin themselves, as people may want to use interactive debuggers
such as pdb. Where that single process is actually multithreaded, this
is precluded to a degree unless you can stop two interactive debugger
sessions being triggered at the same time. In Apache, even if one
configures it to use only one child process, this will still not
work. To get Apache to allow you to use pdb you have to run httpd
directly with the -DONE_PROCESS option.

Anyway, it would seem good practice for a WSGI adapter to still
prevent use of sys.stdin unless explicitly configured to allow it, and
even then to allow it only if the server is running in a mode where it
would work.

Finally, sys.stderr presents problems of its own. Although
wsgi.errors is provided with the request environment, it can't be
used at global scope within a module at import time and also shouldn't
be used beyond the lifetime of the specific request. Thus, there
isn't a way to log stuff outside of a request and ensure it gets to
the server log. One could try and mandate use of the 'logging' module,
but this isn't available in old versions of Python. Thus it is
probably easier to say that a WSGI adapter should always ensure that
sys.stderr is redirected to the server log. The only problem with this
idea is that you can potentially get interleaving of text when
multithreading is being used. What you need is for sys.stderr to be
backed by thread specific log objects, each with its own buffering
mechanism that ensures that only complete lines of text get sent to
the actual log file. For a log object associated with a thread created
to service a request, it is easy enough to flush and clean up the log
object at the end of the request. What to do about user created
threads is less clear, as it is harder to know when such a thread has
finished so that its log object can be flushed and cleaned up.
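
One way the thread specific buffering could look is sketched below.
LineBufferedLog is a hypothetical name, and 'target' stands for
whatever file-like object the server uses for its log. It only covers
the buffering, not the cleanup question for user created threads.

```python
import threading


class LineBufferedLog:
    """Hypothetical sys.stderr replacement with per-thread line buffers."""

    def __init__(self, target):
        self._target = target            # the real, shared log stream
        self._lock = threading.Lock()    # serialises writes to the target
        self._local = threading.local()  # holds each thread's partial line

    def write(self, text):
        pending = getattr(self._local, "pending", "") + text
        lines = pending.split("\n")
        # The last element is the trailing partial line (possibly empty).
        self._local.pending = lines.pop()
        if lines:
            complete = "".join(line + "\n" for line in lines)
            with self._lock:
                self._target.write(complete)
        return len(text)

    def flush(self):
        # Emit the calling thread's partial line, then flush the target.
        pending = getattr(self._local, "pending", "")
        self._local.pending = ""
        with self._lock:
            if pending:
                self._target.write(pending + "\n")
            self._target.flush()
```

The adapter would install it with something like
sys.stderr = LineBufferedLog(server_log) and call flush() at the end of
each request. A user created thread that exits without flushing loses
its partial line here, which is exactly the open issue described above.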

Yes, one could simply ignore the whole issue, but I feel that a good
quality WSGI adapter/server should address these issues and either
lock things down as appropriate to protect users from themselves or
ensure that using these streams results in a sensible outcome.

Does anyone who appreciates what I am talking about here have any
opinions of their own about these issues?

Graham