[Web-SIG] about WSGI adoption

Manlio Perillo manlio_perillo at libero.it
Mon Nov 19 13:06:07 CET 2007


Graham Dumpleton ha scritto:
> [...]
>> I think that the "deployment" must be done by the WSGI gateway/server
>> and not by the application.
>>
>> That is, the "application" should only expose the callable object, and
>> should not "start a server", opening logging and configuration files, or
>> stacking middlewares.
> 
> This would require the WSGI adapter layer to encompass the means of
> loading the script file (as Python module) when required the first
> time. The only thing that really does it that way at present is
> mod_wsgi.
> 

Right.

> Current CGI-WSGI adapters expect the WSGI application entry point to
> effectively be in the same file as the main for the CGI script. Ie.,
>

Ok.

 > [...]
 >
> Anyway, hope this at least half illustrates that it isn't necessarily
> that simple to come up with one concept of having a single WSGI
> application script file which knows nothing about the means in which
> it is launched. In mod_wsgi it has made this as seamless as possible,
> but with other hosting mechanisms such as CGI, FASTCGI and SCGI where
> the WSGI adapter isn't actually embedded within the web server itself,
> but is within the process launched, it is much harder to make it
> transparent to the point where one could just throw a whole lot of
> WSGI application scripts in a directory and have it work.
> 

Not sure here.
As an example, in the trac.fcgi script, the code that runs the server
can be moved to a separate file.

It is true that this makes things more complicated, but maybe one
could write a generic flup server "launcher" script:

flup_run -p 4030 -b 127.0.0.1 --script=/usr/local/bin/myapp.wsgi \
         --application=application --protocol=fastcgi --daemon \
         --user=x --group=x --log=/var/log/myapp.log
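A minimal sketch of what such a launcher could look like (the name
flup_run is my invention, and only the load-the-script part is shown in
full; flup's WSGIServer is imported lazily so the loader can be used on
its own):

```python
# Hypothetical "flup_run" launcher: import the .wsgi script as a module
# and hand its callable to a flup FastCGI server.
import argparse
from importlib.machinery import SourceFileLoader
from importlib.util import module_from_spec, spec_from_loader


def load_application(script_path, name='application'):
    # Load the script explicitly as Python source, whatever its suffix,
    # and return the named WSGI callable from it.
    loader = SourceFileLoader('wsgi_app', script_path)
    spec = spec_from_loader('wsgi_app', loader)
    module = module_from_spec(spec)
    loader.exec_module(module)
    return getattr(module, name)


def main(argv=None):
    parser = argparse.ArgumentParser(prog='flup_run')
    parser.add_argument('-p', '--port', type=int, default=4030)
    parser.add_argument('-b', '--bind', default='127.0.0.1')
    parser.add_argument('--script', required=True)
    parser.add_argument('--application', default='application')
    args = parser.parse_args(argv)

    app = load_application(args.script, args.application)
    # Deferred import: only the launcher itself needs flup installed.
    from flup.server.fcgi import WSGIServer
    WSGIServer(app, bindAddress=(args.bind, args.port)).run()
```

The point is that the application script itself stays free of any
server-starting code; the launcher owns the deployment details.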


> [...]
 >
>> As an example, WSGI says nothing about what happens when an application
>> module is imported (and the Python application process is created).
> 
> And it can't easily do so as the differences in hosting technology
> make it hard to come up with one system which would work for
> everything. For some ideas put up previously, see thread about Web
> Site Process bus in:
> 
>   http://mail.python.org/pipermail/web-sig/2007-June/thread.html
> 

Thanks for the link.
However, a function called at first module import should suffice for now.

> Some of the things that make it difficult are multi process web
> servers, plus web servers that only load applications on demand and
> not at the start when the processes are started up. 


The server can simply execute the function when the module is imported
(the open question is what should be done when the module script is
reloaded in the same process).

An application can execute startup code at module level, but a
dedicated function is necessary, since the application may need more
information from the web server (the log object, for example).

I don't see any problems with multiprocess web servers.
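To illustrate the proposed convention (this is hypothetical, nothing
like it is in the WSGI spec): the gateway would call init_application()
exactly once, right after importing the module, passing an environment
that contains the server-supplied objects but no HTTP headers and no
input object:

```python
# Hypothetical application module under the proposed convention.
_log = None


def init_application(environ):
    # Proposed one-time hook, called by the gateway at import time;
    # 'environ' holds per-process objects such as the server's log.
    global _log
    _log = environ['wsgi.errors']
    _log.write('application initialized\n')


def application(environ, start_response):
    # Normal WSGI entry point, unchanged.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']
```

A multiprocess server would simply run the hook once per process, which
is why I see no problem there.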

> Some hosting
> technologies from memory allow a logical application to be stopped and
> started within the context of the same process, whereas others don't.
> So, where as atexit() may be a reasonable of doing shutdown actions
> for some hosting technologies, it isn't for others.
> 

Ok.

>> It can be useful if the gateway can execute an
>>
>>     init_application(enviroment)
>>
>> function, where environment contains the same objects of the request
>> enviroment, excluding the HTTP headers and the input object, and with a
>> separate errors object.
> 
> The closest you can probably get to portable application
> initialisation is for the application itself to track whether it has
> been called before and do something special if it hasn't. Even this is
> tricky because of multithreading issues.
> 
>> Logging is another thing that should be clarified.
>> How should an application do logging?
>>
>> As an example for a WSGI gateway embedded in an existing server (like
>> Apache and Nginx) it can be useful and convenient to keep logging in an
>> unique log file.
>> And if the server logging system uses "log levels", this should be
>> usable by the WSGI application.
> 
> There is always the Python 'logging' module.  Where things get
> interesting with this is how to configure the logging. 

Right, this is exactly the problem.
But there is a bigger problem still: if I want to use the server's
logging (for Apache or Nginx), I have to use a non-portable solution.

> In Pylons,
> provided you use 'paster', it will note that the .ini file mentions
> 'loggers' and so will push the config automatically to the 'logging'
> module. Run a Pylons application under mod_wsgi though and this
> doesn't happen so Pylons logging doesn't work. 


This is the reason why I think it is necessary to standardize a
deployment method.

> Thus need to make the
> magic Pylons call to get it to push the config to the 'logging' module
> manually. Use of log levels is almost impossible. If using CGI your
> only logging mechanism is sys.stderr and that gets logged as ERR in
> Apache. Same for mod_wsgi, and similar for SCGI and FASTCGI I think.
> In mod_python its sys.stderr is broken in that output isn't
> automatically flush. Yes WSGI specification says that error output
> needs to be flushed to ensure it is displayed, but usually isn't done.
> 


Here is an idea.
First of all, wsgi.errors should have an additional log_level attribute.
It must be an integer; its exact value is not specified (though perhaps
it should use the level values from the standard logging module?).


For mod_wsgi, we can add a log_level directive (instead of using a fixed
value) and a log_level_map directive to map server error levels to
Python logging levels.

As an example (for nginx):
wsgi_log_level       NGX_LOG_INFO;
wsgi_log_level_map   7:20;
wsgi_log_level_order asc;


The second directive maps NGX_LOG_INFO to logging.INFO, and the last
directive is necessary since in nginx the more critical errors have
lower values (but I'm not sure whether this information is needed).


A WSGI application can now set up the logging module:
logging.basicConfig(level=wsgi.errors.log_level,
                    format='%(levelname)s %(message)s',
                    stream=wsgi.errors)

This is not perfect, since a log entry will be something like:
2007/10/29 19:41:09 [info] 29902#0: *1 CRITICAL ops

That is: the log level is duplicated.

There is one more problem here: the log object *MUST* be stored in the
wsgi dictionary, since it is defined on a per-request basis.

What happens if an "external" module in the application makes use of
the global logging object?
And what should I do if, for example, I use SQLAlchemy and want to
enable logging for the connection pool?
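One possible workaround for the per-request problem (a sketch, not part
of any spec): install a logging handler whose target stream is swapped
to the current wsgi.errors at the start of each request, so that
external modules using the global logging machinery end up in the
server's log anyway:

```python
# Bridge the stdlib 'logging' module to a per-request wsgi.errors stream.
import logging


class WSGIErrorsHandler(logging.Handler):
    # Forwards log records to whatever stream is currently attached.
    def __init__(self):
        logging.Handler.__init__(self)
        self.stream = None  # set per request by the application

    def emit(self, record):
        if self.stream is None:
            return
        self.stream.write(self.format(record) + '\n')
        # The WSGI spec requires error output to be flushed promptly.
        flush = getattr(self.stream, 'flush', None)
        if flush is not None:
            flush()


handler = WSGIErrorsHandler()
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
logging.getLogger().addHandler(handler)

# In the application, per request:
#     handler.stream = environ['wsgi.errors']
```

This still has the duplication problem described above, and swapping
the stream is not thread-safe as written, but it shows the shape of a
solution.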


>> The same is valid for application configuration.
> 
> And you will probably never get everyone to agree on that.
> 

Options should be stored in the wsgi dictionary.
For mod_wsgi, options can be set from the server configuration file;
paste can instead read a config file and copy the values into the wsgi
environment dictionary:

[aaa]
x = 2          {
y = 5            'aaa.x': '2',
          ===>    'aaa.y': '5',
[bbb]            'bbb.a': '1',
a = 1            'bbb.b': '2'
b = 2          }
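The flattening above can be done in a few lines (a sketch of the
suggested convention; the 'section.option' key format is the one from
my example, not a standard):

```python
# Flatten an .ini file into 'section.option' keys, as strings, ready to
# be copied into each request's WSGI environ by the gateway.
import configparser


def config_to_environ_options(path):
    parser = configparser.ConfigParser()
    parser.read(path)
    options = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            options['%s.%s' % (section, key)] = value
    return options
```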



> The whole thing with WSGI was that it defined as little as possible so
> it left enough room for people to experiment with how to do all the
> other issues. I doubt you will ever seem a single solution, instead,
> you will though see different ways come together into a number of
> different frameworks. (or no frameworks). Overall, that probably isn't
> a bad thing.
> 


This is true, but it has some problems.
Good logging is one of them, IMHO.




Regards,  Manlio Perillo

