[Web-SIG] Standardized configuration

Sun Jul 24 02:08:03 CEST 2005

On Fri, 2005-07-22 at 17:26 -0500, Ian Bicking wrote:

> >   To do this, we use a ConfigParser-format config file named
> >   'myapplication.conf' that looks like this::
> > 
> >     [application:sample1]
> >     config = sample1.conf
> >     factory = wsgiconfig.tests.sample_components.factory1
> > 
> >     [application:sample2]
> >     config = sample2.conf
> >     factory = wsgiconfig.tests.sample_components.factory2
> > 
> >     [pipeline]
> >     apps = sample1 sample2
> 
> I think it's confusing to call both these applications.  I think 
> "middleware" or "filter" would be better.  I think people understand 
> "filter" far better, so I'm inclined to use that.  So...

The reason I called them applications instead of filters is that all
of them implement the WSGI "application" API (each is "a callable that
accepts two parameters, environ and start_response").  Some happen to
be gateways/filters/middleware/whatever, but at least one is just an
application and does no delegation.  In my example above, "sample2" is
not a filter; it is the end-point application.  "sample1" is a filter,
but of course it's also an application.
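To make that concrete, here is a minimal sketch of two objects that both
satisfy the WSGI application API; only the second one delegates to a
"next" application.  The names endpoint_app and UppercaseFilter are made
up for illustration.

```python
def endpoint_app(environ, start_response):
    """End-point application: answers the request itself, delegates nothing."""
    body = b"hello from the end-point\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class UppercaseFilter:
    """A filter (middleware) that is *also* a WSGI application: it has the
    same (environ, start_response) call signature, but delegates to the
    next application and transforms the response body on the way out."""

    def __init__(self, next_app):
        self.next_app = next_app

    def __call__(self, environ, start_response):
        result = self.next_app(environ, start_response)
        try:
            # bytes.upper() keeps the length, so Content-Length stays valid.
            return [chunk.upper() for chunk in result]
        finally:
            # We consumed the iterable ourselves, so we must close it.
            if hasattr(result, "close"):
                result.close()
```

Composed as UppercaseFilter(endpoint_app), the result is still just a
WSGI application, which is why I called them all applications.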

Would you maybe rather make it more explicit that some apps are also
gateways, e.g.:

[application:bleeb]
config = bleeb.conf
factory = bleeb.factory

[filter:blaz]
config = blaz.conf
factory = blaz.factory

?  I don't know of any way the configurator could make use of the
distinction between the two types other than using validation to
prevent people from placing an application "before" a filter in a
pipeline.  Is there something else you had in mind?
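If we did split the section types, that validation might look roughly
like this.  It's a sketch that assumes the [application:...] and
[filter:...] prefixes above and a space-separated "apps" key in
[pipeline]; validate_pipeline is a hypothetical name, and treating a
filter in the last position as an error is my assumption (it would have
nothing to delegate to).

```python
import configparser


def validate_pipeline(parser: configparser.ConfigParser):
    """Check that only the last name in [pipeline] 'apps' is an
    [application:...] section; every earlier name must be a [filter:...]."""
    names = parser.get("pipeline", "apps").split()
    if not names:
        raise ValueError("the pipeline is empty")
    for position, name in enumerate(names):
        is_last = position == len(names) - 1
        if parser.has_section("filter:" + name):
            if is_last:
                raise ValueError(f"filter {name!r} ends the pipeline")
        elif parser.has_section("application:" + name):
            if not is_last:
                raise ValueError(f"application {name!r} is placed before a filter")
        else:
            raise ValueError(f"no section defined for {name!r}")
    return names
```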

> [application:sample2]
> # What is this relative to?  I hate both absolute paths and
> # paths relative to pwd equally...
> config = sample1.conf
> factory = wsgiconfig...

This was from a doctest I wrote so I could rely on relative paths,
sorry.  You're right.  Ummmm... we could probably use the environment
as "defaults" for ConfigParser interpolation and set whatever we need
before the configurator is run:

$ export APP_ROOT=/home/chrism/myapplication
$ ./wsgi-configurator.py myapplication.conf

And in myapplication.conf:

[application:sample1]
config = %(APP_ROOT)s/sample1.conf
factory = myapp.sample1.factory

That would probably be the least-effort and most flexible thing to do,
and it doesn't mandate any particular directory structure.  Of course,
we could provide a convention for a recommended directory structure,
but this gives us an "out" from being painted into that in specific
cases.
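A minimal sketch of that, assuming Python's configparser and a
hypothetical make_parser() helper in the configurator.  ConfigParser
lower-cases option names, but it also lower-cases the name inside
%(...)s before looking it up, so %(APP_ROOT)s still finds $APP_ROOT.
One side effect: every environment variable becomes a [DEFAULT] option
visible in every section.

```python
import configparser
import os


def make_parser(environ=None):
    """Build a ConfigParser whose interpolation defaults come from the
    environment, so '%(APP_ROOT)s' resolves to $APP_ROOT."""
    if environ is None:
        environ = os.environ
    # Double any '%' so unrelated environment values cannot trigger
    # interpolation syntax errors when they are read back.
    defaults = {key: value.replace("%", "%%") for key, value in environ.items()}
    # strict=False tolerates variables whose names differ only by case,
    # since ConfigParser lower-cases option names.
    return configparser.ConfigParser(defaults=defaults, strict=False)
```

The configurator would then do parser = make_parser() followed by
parser.read("myapplication.conf").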

> [pipeline]
> # The app is unique and special...?
> app = sample2
> filters = sample1
> 
> 
> 
> Well, that's just a first refactoring; I'm having other inclinations...

I'm not sure whether this is just a stylistic thing or if there's a
reason you want to treat the endpoint app specially.  By definition, in
my implementation, the endpoint app is just the last app mentioned in
the pipeline.
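In other words, the two [pipeline] spellings carry the same information.
A sketch (pipeline_order is a hypothetical helper, assuming configparser)
that normalizes either one into a list with the endpoint last:

```python
import configparser


def pipeline_order(parser: configparser.ConfigParser):
    """Return pipeline component names in request order, endpoint last.

    Accepts either [pipeline] form from this thread:
      apps = sample1 sample2     (the endpoint is simply the last name)
    or
      filters = sample1
      app = sample2              (the endpoint is named explicitly)
    """
    section = parser["pipeline"]
    if "apps" in section:
        if "app" in section or "filters" in section:
            raise ValueError("use either 'apps' or 'app'/'filters', not both")
        names = section["apps"].split()
    else:
        endpoint = section["app"].split()
        if len(endpoint) != 1:
            raise ValueError("'app' must name exactly one endpoint")
        names = section.get("filters", "").split() + endpoint
    if not names:
        raise ValueError("the pipeline is empty")
    return names
```

Either way the endpoint ends up last, so the rest of the configurator
doesn't need to care which form was used.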

> > Potential points of contention
> > 
> >  - The WSGI configurator assumes that you are willing to write WSGI
> >    component factories which accept a filename as a config file.  This
> >    factory returns *another* factory (typically a class) that accepts
> >    "the next" application in the pipeline chain and returns a WSGI
> >    application instance.  This pattern is necessary to support
> >    argument currying across a declaratively configured pipeline,
> >    because the WSGI spec doesn't allow for it.  This is more contract
> >    than currently exists in the WSGI specification but it would be
> >    trivial to change existing WSGI components to adapt to this
> >    pattern.  Or we could adopt a pattern/convention that removed one
> >    of the factories, passing both the "next" application and the
> >    config file into a single factory function.  Whatever.  In any
> >    case, in order to do declarative pipeline configuration, some
> >    convention will need to be adopted.  The convention I'm advocating
> >    above seems to have already been adopted by the current crop of middleware
> >    components (using a factory which accepts the application as the
> >    first argument).
> 
> I hate the proliferation of configuration files this implies.  I 
> consider the filters an implementation detail; if they each have 
> partitioned configuration then they become a highly exposed piece of the 
> architecture.
> 
> It's also a lot of management overhead.  Typical middleware takes 0-5 
> configuration parameters.  For instance, paste.profilemiddleware is 
> perfectly usable with no configuration at all, and only has two parameters.

True.  The config file param should be optional.  Apps might use the
environment to configure themselves.
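Here's a rough sketch of the two-level factory convention with the
config file made optional.  Everything in it is hypothetical: the
component names echo the sample1/sample2 example, and passing None as
the "next" application to the endpoint is just one possible convention.
The single-factory alternative would collapse each pair of calls into
factory(next_app, config_file).

```python
def factory1(config_file=None):
    """Outer factory: receives the (optional) config filename and returns
    an inner factory (here a class) that accepts the next application."""
    class GreetingFilter:
        def __init__(self, next_app):
            self.next_app = next_app
            self.config_file = config_file  # a real filter would parse this

        def __call__(self, environ, start_response):
            environ["sample.greeting"] = "hello"
            return self.next_app(environ, start_response)
    return GreetingFilter


def factory2(config_file=None):
    """End-point component: its inner factory ignores next_app (None)."""
    def make_app(next_app):
        def app(environ, start_response):
            body = environ.get("sample.greeting", "no greeting").encode()
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [body]
        return app
    return make_app


def build_pipeline(components):
    """components: (outer_factory, config_file) pairs in request order,
    endpoint last.  Curry the config first, then wire each piece to the
    application that follows it."""
    next_app = None
    for outer_factory, config_file in reversed(components):
        next_app = outer_factory(config_file)(next_app)
    return next_app
```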

> But this is reasonably easy to resolve -- there's a perfectly good 
> configuration section sitting there, waiting to be used:
> 
>    [filter:profile]
>    factory = paste.profilemiddleware.ProfileMiddleware
>    # Show top 50 functions:
>    limit = 50
> 
> This in no way precludes 'config', which is just a special case of this 
> general configuration.  The only real problem is a possible conflict if 
> we wanted to add new special names to the configuration, i.e., 
> meta-filter-configuration.
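For comparison, the inline-settings idea might be implemented roughly
like this.  It's a sketch under assumptions: make_filter and load_object
are hypothetical helpers, the factory is assumed to take the next
application as its first positional argument, and every other key
arrives as a string keyword argument (so "limit = 50" arrives as
limit="50").  Whether paste.profilemiddleware.ProfileMiddleware accepts
such a keyword is not something this sketch checks.

```python
import configparser
import importlib


def load_object(dotted_name):
    """Resolve 'package.module.attr' to the object it names."""
    module_name, _, attr = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def make_filter(parser: configparser.ConfigParser, section, next_app,
                resolve=load_object):
    """Build a filter from a [filter:...] section: 'factory' names the
    callable, and every other key is passed through as a string kwarg."""
    options = dict(parser.items(section))
    factory = resolve(options.pop("factory"))
    return factory(next_app, **options)
```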

I think I'd rather see configuration settings for apps that don't
require much configuration come in as environment variables (not
necessarily in the "environ" dict passed to the WSGI callable, but in
os.environ).  Envvars are uncontroversial, so they don't cost us any
coding time, PEP time, or brain cycles.

But if you really do want a lot of config to happen in the pipeline
deployment file itself (being able to inspect it all visually in one
place would definitely be nice), maybe there could be one optional
section in the pipeline deployment config file that sets keys and
values in os.environ before any application instances are created:

[environment]
app1.hosed = true
app2.disabled = false

... apps could just look for these keys and values in os.environ within
their factories and configure themselves appropriately.  If you didn't
want this, you could leave the section out and just do:

$ env app1.hosed=true app2.disabled=false ./wsgi-configurator.py \
    myapplication.conf

(env is needed because names containing dots aren't valid shell
variable names, so neither a plain "name=value command" prefix nor
"export" will accept them.)  Or you could run a wrapper shell script
that sets these before running the configurator.
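A sketch of both halves of that convention, assuming configparser;
apply_environment and env_flag are hypothetical names.  ConfigParser
lower-cases option names by default, so you'd need to set
parser.optionxform = str before reading if anyone wants mixed-case
variable names.

```python
import configparser
import os


def apply_environment(parser: configparser.ConfigParser, environ=None):
    """Copy every key in the optional [environment] section into the
    process environment before any application factory runs."""
    if environ is None:
        environ = os.environ
    if not parser.has_section("environment"):
        return  # nothing to do; the shell may have set the variables
    for key in parser.options("environment"):
        environ[key] = parser.get("environment", key)


def env_flag(name, default=False, environ=None):
    """What an app factory might call to read e.g. 'app1.hosed'."""
    if environ is None:
        environ = os.environ
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```

Since os.environ is process-wide, every app sees every setting; the
"appname." prefix is the only thing keeping them apart.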

> Another option is indirection like:
> 
>    [filter:profile]
>    factory = paste.profilemiddleware.ProfileMiddleware
> 
>    [config:profile]
>    limit = 50
> 
> If we do something like this, the interface for these factories does 
> become larger, as we're passing in objects that are more complex than 
> strings.

Sure.  If this were a democracy, I'd vote to use a single well-known
already-existing namespace (os.environ) as a config namespace for all
apps that don't require their own config files instead of baking the
idea of configuration sections for the apps themselves into the
configurator logic.  But I'd like to hear what others besides you and me
think.

> Another thing this could allow is recursive configuration, like:
> 
> [application:urlmap]
> factory = paste.urlmap.URLMapBuilder
> app1 = blog
> app1.url = /
> app2 = statview
> app2.url = /stats
> app3 = cms
> app3.host = dev.*
> 
> [application:blog]
> factory = leonardo.wsgifactory
> config = myblog.conf
> 
> [application:statview]
> factory = statview
> log_location = /var/logs/apache2
> 
> [application:cms]
> factory = proxy
> location = http://localhost:8080
> map = / /cms.php
> 
> [pipeline]
> app = urlmap
> 
> 
> So URLMapBuilder needs the entire configuration file passed in, along 
> with the name of the section it is building.  It then reads some keys, 
> and builds some named applications, and creates an application that 
> delegates based on patterns.  That's the kind of configuration file I 
> could really use.
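For reference, here's roughly what that builder might look like.  It's
a hypothetical sketch, not paste.urlmap: build_app(name) stands in for
whatever the configurator uses to construct a named section, host
patterns are matched as shell-style globs, and unlike a real URL mapper
it doesn't move the matched prefix from PATH_INFO to SCRIPT_NAME.

```python
import configparser
import fnmatch


def url_map_builder(parser: configparser.ConfigParser, section, build_app):
    """Read app1, app1.url, app1.host, ... from `section`, build each named
    application with build_app(name), and return a WSGI application that
    dispatches on host glob and/or path prefix."""
    routes = []
    for key in parser.options(section):
        if not key.startswith("app") or "." in key:
            continue  # skip 'factory' and the appN.url / appN.host keys
        prefix = parser.get(section, key + ".url", fallback="/").rstrip("/")
        host_glob = parser.get(section, key + ".host", fallback=None)
        routes.append((host_glob, prefix, build_app(parser.get(section, key))))
    # Host-restricted routes first, then longest path prefix first,
    # so that "/stats" is tried before the catch-all "/".
    routes.sort(key=lambda r: (r[0] is not None, len(r[1])), reverse=True)

    def dispatch(environ, start_response):
        path = environ.get("PATH_INFO", "")
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        for host_glob, prefix, app in routes:
            if host_glob and not fnmatch.fnmatchcase(host, host_glob.lower()):
                continue
            if not prefix or path == prefix or path.startswith(prefix + "/"):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no application is mapped to this URL\n"]

    return dispatch
```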

Maybe another (less flexible, but declaratively configurable and
simpler to code) way to do this would be to canonize the idea of
"decision middleware": allowing one component in an otherwise static
pipeline to decide which component is "next" by executing a Python
expression in a context that exposes the WSGI environment.

[application:blog]
factory = leonardo.wsgifactory
config = myblog.conf

[application:statview]
factory = statview

[application:cms]
factory = proxy

[decision:urlmapper]
cms = environ['PATH_INFO'].startswith('/cms')
statview = environ['PATH_INFO'].startswith('/statview')
blog = environ['PATH_INFO'].startswith('/blog')

[environment]
statview.log_location = /var/logs/apache2
cms.location = http://localhost:8080
cms.map = / /cms.php

[pipeline]
apps = urlmapper
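A sketch of what a decision component could look like, assuming
configparser and a hypothetical DecisionMiddleware class that receives
the already-built applications keyed by name.  configparser keeps keys
in file order, so the rules are tried top to bottom and the first true
expression wins; falling back to a 404 when none match is my assumption.

```python
import configparser


class DecisionMiddleware:
    """Each key in a [decision:...] section names an application; each
    value is a Python expression evaluated with only `environ` in scope."""

    def __init__(self, parser: configparser.ConfigParser, section, apps):
        self.rules = []
        for name in parser.options(section):
            source = parser.get(section, name, raw=True)  # no %-interpolation
            code = compile(source, f"[{section}] {name}", "eval")
            self.rules.append((code, apps[name]))

    def __call__(self, environ, start_response):
        for code, app in self.rules:
            # The expressions come from the trusted deployment file; hiding
            # builtins limits them but is not a security sandbox.
            if eval(code, {"__builtins__": {}}, {"environ": environ}):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no decision rule matched\n"]
```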

> Of course, if I really wanted this I could implement:
> 
> [application:configurable]
> factory = paste.configurable_pipeline
> conf = abetterconffile.conf
> 
> But then the configuration file becomes a dummy configuration, and no 
> one else gets to use my fancier middleware with the normal configuration 
> file.

> >  - Pipeline deployment configuration should be used only to configure
> >    essential information about pipeline and individual pipeline
> >    components.  Where complex service data configuration is necessary,
> >    the component which implements a service should provide its own
> >    external configuration mechanism.  For example, if an XSL service
> >    is implemented as a WSGI component, and it needs configuration
> >    knobs of some kind, these knobs should not live within the WSGI
> >    pipeline deployment file.  Instead, each component should have its
> >    own configuration file.  This is the purpose (undemonstrated above)
> >    of allowing an [application] section to specify a config filename.
> 
> The intelligent finding of files is important to me with any references 
> to filenames.  Working directory is, IMHO, fragile and unreliable. 
> Absolute paths are reliable but fragile.

Yup.  

> In some cases module names are a more robust way of locating resources, 
> if those modules are self-describing applications.  Mostly because 
> there's a search path.  Several projects encourage this kind of system, 
> though I'm not particularly fond of it because it mixes 
> installation-specific files with code.
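One way to get that kind of file finding without depending on the
working directory might be the following; the "package:path" syntax and
resolve_config_path are both hypothetical.  Absolute paths pass through,
package-qualified references use the import search path, and anything
else is resolved relative to the deployment file rather than pwd.

```python
import importlib.util
import os


def resolve_config_path(reference, deployment_file):
    """Resolve a 'config =' value:
      /abs/path.conf         -> used as-is
      package.name:rel/path  -> relative to that package's directory
      rel/path.conf          -> relative to the deployment file's directory
    """
    if os.path.isabs(reference):
        return reference
    if ":" in reference:
        package, _, relative = reference.partition(":")
        spec = importlib.util.find_spec(package)
        if spec is None or not spec.submodule_search_locations:
            raise ValueError(f"{package!r} is not an importable package")
        return os.path.join(list(spec.submodule_search_locations)[0], relative)
    base = os.path.dirname(os.path.abspath(deployment_file))
    return os.path.join(base, reference)
```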
> 
> >  - Some people seem to be arguing that there should be a single
> >    configuration format across all WSGI applications and gateways to
> >    configure everything about those components.  I don't think this is
> >    workable.  I think the only thing that is workable is to recommend
> >    to WSGI component authors that they make their components
> >    configurable using some configuration file or other type of path
> >    (URL, perhaps).  The composition, storage, and format of all other
> >    configuration data for the component should be chosen by the
> >    author.
> 
> While I appreciate the difficulty of agreeing on a configuration format, 
> the way this proposal avoids that is by underpowering the deployment 
> file so that authors are forced to create other configuration files.

I *think* promoting a convention of using environment variables to do
configuration and allowing envvars to be set in the main deployment file
solves this for apps that don't actually need their own config file.

> >  - There were a few mentions of being able to configure/create a WSGI
> >    application at request time by passing name/value string pairs
> >    "through the pipeline" that would ostensibly be used to create a
> >    new application instance (thereby dynamically extending or
> >    modifying the pipeline).  I think it's fine if a particular
> >    component does this, but I'm suggesting that a canonization of the
> >    mechanism used to do this is not necessary and that it's useful to
> >    have the ability to define static pipelines for deployment.
> 
> It does concern me that we allow for dynamic systems.  A dynamic system 
> allows for more levels of abstraction in deployment, meaning more 
> potential for automation.

Yes.  OTOH, when a certain level of dynamism is reached, it's no
longer possible to configure things declaratively, because it becomes
a programming task.  This proposal is (so far) just about being able
to configure things declaratively, so I think we need some sort of
compromise.

> I think this can be achieved simply by defining a standard based on the 
> object interface, where the configuration file itself is a reference 
> implementation (that we expect people will usually use).  Semantics from 
> the configuration file will leak through, but it's a lot easier to deal 
> with (for example) a system that can only support string configuration 
> values, than a system based on concrete files in a specific format.

Sorry, I can't parse that paragraph.

> >  - If elements in the pipeline depend on "services" (ala
> >    Paste-as-not-a-chain-of-middleware-components), it may be
> >    advantageous to create a "service manager" instead of deploying
> >    each service as middleware.  The "service manager" idea is not a
> >    part of the deployment spec.  The service manager would itself
> >    likely be implemented as a piece of middleware or perhaps just a
> >    library.
> 
> That might be best.  It's also quite possible for the factory to 
> instantiate more middleware.

Which factory?

Thanks,

- C