[Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.

Thu Mar 22 17:03:50 CET 2007

Graham Dumpleton wrote:
> When one is using CGI as a means of implementing a WSGI application,
> although one would return content through the iterable returned from
> the application or by calling write() method returned from
> start_response(), one could actually write to sys.stdout directly as
> well since that is where the WSGI adapter writes it to anyway.
> 
> Obviously this isn't something that should be done but then the WSGI
> PEP doesn't say anything about code not writing to sys.stdout and more
> than likely at some point someone is going to think they can just use
> 'print' to have some debugging statements output where they think they
> will see them. In the case of CGI such output would wrongly end up in
> the response and screw things up.

Apparently I never fixed up sys.stdout in my CGI-related code (I 
don't know if anyone actually uses it either), but I always intended 
to, particularly because the bugs that result when people print stuff 
will be totally weird and hard to understand.

I personally would capture stdout and put everything on stderr.
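
A rough sketch of what I mean, written for current Python 3 rather than
anything that existed when this thread started. The name
stdout_to_stderr and the demo() helper are made up for illustration;
they are not part of any WSGI adapter I know of. The adapter keeps the
real stdout for itself and points sys.stdout at stderr, so a stray
print lands in the error log instead of the response:

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def stdout_to_stderr():
    """Yield the real stdout for the adapter; send sys.stdout to stderr."""
    real_stdout = sys.stdout
    sys.stdout = sys.stderr
    try:
        yield real_stdout
    finally:
        sys.stdout = real_stdout


def demo():
    # Hypothetical stand-ins for the CGI process's real streams.
    response, log = io.StringIO(), io.StringIO()
    saved = sys.stdout, sys.stderr
    sys.stdout, sys.stderr = response, log
    try:
        with stdout_to_stderr() as out:
            print("debug: handling request")          # stray print from the app
            out.write("Status: 200 OK\r\n\r\nhello")  # the adapter's own output
    finally:
        sys.stdout, sys.stderr = saved
    return response.getvalue(), log.getvalue()
```

Nothing is raised here, unlike the dummystdout idea quoted below, so
print-style debugging keeps working; the output just goes somewhere
harmless.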

> To clarify this, a future update to WSGI specification or this
> environment specification people have been talking about, should
> perhaps clarify what behaviour one can expect out of sys.stdin,
> sys.stdout and sys.stderr.
> 
> In the case of sys.stdout, do people see it as being at least good
> practice, if not required by specification, that the WSGI adapter
> should ensure that sys.stdout cannot be written to directly or by
> using 'print' from a WSGI application. Thus, in a CGI adapter it would
> do something like:
> 
>   import sys
> 
>   class dummystdout:
>     def write(self, *args):
>       raise IOError("WSGI prohibits use of sys.stdout.")
>     ....
> 
>   def run_with_cgi(application):
>     ...
> 
>     stdout = sys.stdout
>     sys.stdout = dummystdout()
> 
>     ...
> 
>     def write(data):
>       ...
>       stdout.write(data)
>       stdout.flush()
> 
> In other words, it saves a reference to sys.stdout for its own use and
> then replaces sys.stdout with a dummy file like object that raises an
> exception if written to in any way or flushed.

As an avid user of "print" for debugging, this would bug me.  I would 
prefer just avoiding the CGI case where stdout goes to the client, and 
otherwise saying that the server should try to put stdout output 
someplace where it can be read.  But it could very well be a console, 
not necessarily a log file.  Or the same log file as stderr, or... 
something.

> With sys.stdin, you have a similar issue with CGI whereby you don't
> want a WSGI application reading from it directly. Thus sys.stdin
> should probably also be replaced with a file like object that always
> returns EOF (empty string). Having sys.stdin do anything meaningful in
> a multiple process server system like Apache also doesn't make sense,
> although in the case of Apache it already ensures that stdin returns
> EOF.

Yes, I don't see any real utility to sys.stdin, except potential confusion.
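
For the CGI case, the "always returns EOF" object could look something
like the following. This is only a sketch in current Python 3; the
names EmptyStdin and guard_stdin are invented here, and a real adapter
would presumably build wsgi.input from the stream that guard_stdin
hands back:

```python
import io
import sys


class EmptyStdin(io.TextIOBase):
    """File-like object that reports end-of-file on every read."""

    def readable(self):
        return True

    def read(self, size=-1):
        return ""

    def readline(self, size=-1):
        return ""


def guard_stdin():
    """Swap sys.stdin for an always-EOF object; return the real stream."""
    real_stdin = sys.stdin
    sys.stdin = EmptyStdin()
    return real_stdin
```

Iterating over the replacement also ends immediately, because the io
base class implements iteration in terms of readline().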

> The tricky one is single process servers (which don't use sys.stdin
> like CGI), as people may want to use interactive debuggers such as
> pdb, although where a single process is actually multithreaded it
> could preclude that to a degree unless you can stop two interactive
> debugger sessions being triggered at the same time. In Apache even if
> one configures it to use only one child process this will still not
> work. To get Apache to allow you to use pdb you have to run httpd
> directly with the -DONE_PROCESS option.

Well... that's all true.  So I think this can be left up to the server.  
Any CGI server should protect the user from unintentionally bypassing 
the server.  Otherwise, using sys.stdin probably implies some intention 
that we don't really need to get in the way of.

> Finally, sys.stderr also presents problems of its own. Although
> wsgi.errors is provided with the request environment, this can't be
> used at global scope within a module when importing and also shouldn't
> be used beyond the life time of the specific request. Thus, there
> isn't a way to log stuff outside of a request and ensure it gets to
> the server log. One could try and mandate use of 'logging' module, but
> this isn't available in old versions of Python. Thus probably easier
> to say that a WSGI adapter should always ensure that sys.stderr is
> redirected to the server log. Only problem with this idea is that you
> can potentially get interleaving of text when multithreading is being
> used. What you need is for sys.stderr to be underlayed with thread
> specific log objects each with its own buffering mechanism that
> ensures that only complete lines of text get sent to the actual log
> file. For log object associated with threads created to service a
> request, easy enough to flush and cleanup such log object at the end
> of the request, but what to do about user created threads as harder to
> know when thread has finished and cleanup as necessary.

I think sys.stderr and sys.stdout are fairly similar.  wsgi.errors 
*could* be improved over a simple stream (e.g., you could cache stuff 
written to it, and write it in one chunk that is all the errors for the 
request).  But you could also just create some middleware that does 
that, writing to the server logs.
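
Such middleware might look roughly like this. It is a sketch in current
Python 3, the class name BufferedErrors is made up, and it assumes the
server's wsgi.errors object has write() and flush() as the spec
requires. Everything the application writes to wsgi.errors during one
request is held in memory and handed to the server's stream in a single
write once the response iterable is exhausted or closed:

```python
import io


class BufferedErrors:
    """Middleware: emit a request's wsgi.errors output as one write."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        server_errors = environ["wsgi.errors"]
        buffer = io.StringIO()
        environ["wsgi.errors"] = buffer  # the app now logs into the buffer
        try:
            result = self.app(environ, start_response)
            try:
                for chunk in result:
                    yield chunk
            finally:
                # Honour the WSGI close() protocol on the wrapped iterable.
                if hasattr(result, "close"):
                    result.close()
        finally:
            text = buffer.getvalue()
            if text:
                server_errors.write(text)
                server_errors.flush()
```

Whether that single write stays unbroken in the log still depends on
how the server's own stream behaves under threads; this only reduces
the number of writes per request to one.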

> Yes one could simply ignore the whole issue, but I feel that a good
> quality WSGI adapter/server should address these issues and either
> lock things down as appropriate to protect users from themselves or
> ensure that using them results in a sensible outcome.
> 
> Anyone who appreciates what I am talking here got any opinions of
> their own about these issues?

I guess in practice this hasn't been a problem for me.  In a CGI context 
these things certainly should be resolved, because stdin and stdout 
double as the request and response streams.  But very few people use a 
CGI server, so it doesn't seem to come up often.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org
             | Write code, do good | http://topp.openplans.org/careers