ConfigParser shootout, preliminary entry
A few weeks ago, the suggestion was made on Python-Dev that it might be time to consider replacing the ConfigParser module and that we should hold a "shootout" (i.e. ask for implementations and see what we get).

Since then I've been playing around with this... not the parsing part (which so far I have completely ignored) but the programmer interface. There needs to be a well-thought-out data model for the information stored, and the user interface needs to be very easy to use, yet not so "magical" that it becomes difficult to understand.

I have put together what I think is probably my best proposal. It is based on a superset of ini config files and Java .property files. There is a convenient access mechanism ("config.my_app.some_value") as well as more general approaches ("config.values['my_app.serviceByPort.80']"). I have tried to consider issues like unicode (I permit fairly lenient mixing of unicode and str) and unit testing ("... call config.clear_all() in the tearDown() method of any unittests that use the config module..."). I have even considered carefully what to leave OUT (converting to non-string data types, interpolating values, things like that).

I think that I am now at the point where I could really use some input from others, so I'd like to invite people to review my design and send me your suggestions. I'm not presenting this as a *useful* module yet (it doesn't yet parse files!), but it seemed like a good stage at which to ask for feedback.

I'm attaching two files, config.py and configTest.py; they are also available from these URLs:

http://www.mcherm.com/publish/2004-10-17/config.py
http://www.mcherm.com/publish/2004-10-17/configTest.py

Thanks in advance for reviewing this.

-- Michael Chermside
On Sun, 17 Oct 2004 15:10:44 -0700, Michael Chermside <mcherm@mcherm.com> wrote:
A few weeks ago, the suggestion was made on Python-Dev that it might be time to consider replacing the ConfigParser module and that we should hold a "shootout" (ie ask for implementations and see what we get).
Now that we are in 'shootout mode', let me plug my own solution :-) I've written a generic data structure templating package, and one of the samples is an ini-file reader. The syntax is elegant and simple. It allows one to define the expected INI file structure using a class definition. The class is then able to read itself from the INI file. I've not yet implemented the write code, but it should be a snap. Check this sample:

class WebServerIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')

Attributes may be 'generic' (untyped) or 'typed', and are stored in the order they are declared in the class statement. The value provided in the definition above is the 'default value'. Simple attributes may also be provided, but won't be type checked or kept in the original order. To read the ini file, just do this:

inifile = WebServerIni()
inifile.load(<optional-file-name>)

If not provided, it will read the file named '<classname>.ini'. It's a pretty simple and clean interface; it's quite easy to provide default arguments; it's more readable than a dict-based configuration; and it works fine, at least for me :-).

-- Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com
Attributes may be 'generic' (untyped) or 'typed', and are stored in the order they are declared in the class statement. The value provided in the definition above is the 'default value'. Simple attributes may also be provided, but won't be type checked or kept in the original order. To read the ini file, just do this:
I think the syntax looks good, but as per a thread in python-list, you cannot discover the order of class variables by any solution (metaclass or otherwise), because they become part of the class dictionary, which is arbitrarily unordered. If ordering is important to a user, one could have an optional __order__ attribute that gives the list of items in order.
class WebServerIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')
One nice thing about your solution is that one could pull out docstrings to provide per-section documentation in the INI file, though per-item docstrings (like WebServerIni.session.storageType) would be a bit more difficult. - Josiah
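Here is a minimal, self-contained sketch of the optional __order__ attribute suggested above (the IniSection-style names are only illustrative, not Carlos' actual API):

```python
# A hand-written __order__ list restores the declaration order that the
# (unordered, in 2004-era Python) class dict throws away.
class IniSection:
    pass

class ServerSection(IniSection):
    __order__ = ["socketPort", "threadPool"]   # maintained by the class author
    socketPort = 8080
    threadPool = 10

def ordered_items(section):
    """Yield (name, value) pairs in the order given by __order__."""
    for name in section.__order__:
        yield name, getattr(section, name)

print(list(ordered_items(ServerSection)))
```

The obvious drawback, noted later in the thread, is that __order__ must be kept in sync with the attributes by hand.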
On Oct 18, 2004, at 13:46, Josiah Carlson wrote:
Attributes may be 'generic' (untyped) or 'typed', and are stored in the order they are declared in the class statement. The value provided in the definition above is the 'default value'. Simple attributes may also be provided, but won't be type checked or kept in the original order. To read the ini file, just do this:
I think the syntax looks good, but as per a thread in python-list, you cannot discover the order of class variables by any solution (metaclass or otherwise), due to the fact that they become part of the class dictionary; which is arbitrarily unordered.
If ordering is important to a user, one could have an optional __order__ attribute that gives the list of items in-order.
That's not quite true. TypedAttribute instances and iniSection's __new__ could have serial numbers. -bob
I think the syntax looks good, but as per a thread in python-list, you cannot discover the order of class variables by any solution (metaclass or otherwise), due to the fact that they become part of the class dictionary; which is arbitrarily unordered.
If ordering is important to a user, one could have an optional __order__ attribute that gives the list of items in-order.
That's not quite true. TypedAttribute instances and iniSection's __new__ could have serial numbers.
I'm not saying that they can't be numbered, I'm saying that one cannot discover the ordering of assignment of attr1 and attr2 in the following:

class foo:
    attr1 = value1
    attr2 = value2

If there is a mechanism for discovering the original ordering of those assignments, there are a group of users in c.l.py who would like to know, and Carlos' seemingly non-existent implementation could also use it. Please advise,

- Josiah
On Oct 18, 2004, at 15:20, Josiah Carlson wrote:
I think the syntax looks good, but as per a thread in python-list, you cannot discover the order of class variables by any solution (metaclass or otherwise), due to the fact that they become part of the class dictionary; which is arbitrarily unordered.
If ordering is important to a user, one could have an optional __order__ attribute that gives the list of items in-order.
That's not quite true. TypedAttribute instances and iniSection's __new__ could have serial numbers.
I'm not saying that they can't be numbered, I'm saying that one cannot discover the ordering of assignment of attr1 and attr2 in the following:
class foo:
    attr1 = value1
    attr2 = value2
If there is a mechanism for discovering the original ordering of those assignments, there are a group of users in c.l.py who would like to know, and Carlos' seemingly non-existent implementation could also use it.
That is true, but you do know that the expression value1 is evaluated before the expression value2, so it is possible to sort later for clever enough choices of value1 and value2. Since his proposed syntax invokes something for each attribute, this trick can certainly be used. Here's a small demonstration:

from itertools import count

class Value(object):
    def __init__(self, value, serial=count()):
        self.serial = serial.next()
        self.value = value

class Container(object):
    class __metaclass__(type):
        def __new__(cls, name, bases, dct):
            sorted = filter(lambda (k,v): isinstance(v, Value), dct.iteritems())
            sorted.sort(lambda (ka,va),(kb,vb): cmp(va.serial, vb.serial))
            dct['sorted'] = sorted
            return type.__new__(cls, name, bases, dct)

class MyContainer(Container):
    p = Value(2)
    y = Value(0)
    t = Value(-1)
    h = Value(20)
    o = Value('x')
    n = Value('z')

print ''.join([k for k,v in MyContainer.sorted])

python

-bob
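For readers on modern Python: since 3.6 the class-body namespace preserves definition order, so a metaclass can recover the ordering straight from the dict, without serial numbers. A rough equivalent of Bob's demonstration:

```python
# In Python 3.6+ the class-body namespace keeps definition order,
# so the metaclass just filters the dict as-is.
class Value:
    def __init__(self, value):
        self.value = value

class ContainerMeta(type):
    def __new__(mcls, name, bases, dct):
        # dct.items() already reflects the order of the class statement
        dct["sorted"] = [(k, v) for k, v in dct.items() if isinstance(v, Value)]
        return super().__new__(mcls, name, bases, dct)

class Container(metaclass=ContainerMeta):
    pass

class MyContainer(Container):
    p = Value(2)
    y = Value(0)
    t = Value(-1)
    h = Value(20)
    o = Value("x")
    n = Value("z")

print("".join(k for k, _ in MyContainer.sorted))  # → python
```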
On Mon, 18 Oct 2004 15:52:03 -0400, Bob Ippolito <bob@redivi.com> wrote:
On Oct 18, 2004, at 15:20, Josiah Carlson wrote:
[snip]
I'm not saying that they can't be numbered, I'm saying that one cannot discover the ordering of assignment of attr1 and attr2 in the following:
class foo:
    attr1 = value1
    attr2 = value2
If there is a mechanism for discovering the original ordering of those assignments, there are a group of users in c.l.py who would like to know, and Carlos' seemingly non-existent implementation could also use it.
That is true, but you do know that the expression value1 is evaluated before the expression value2, so it is possible to sort later for clever enough choices of value1 and value2. Since his proposed syntax invokes something for each attribute, then this trick can certainly be used.. here's a small demonstration:
from itertools import count
class Value(object):
    def __init__(self, value, serial=count()):
        self.serial = serial.next()
        self.value = value
class Container(object):
    class __metaclass__(type):
        def __new__(cls, name, bases, dct):
            sorted = filter(lambda (k,v): isinstance(v, Value), dct.iteritems())
            sorted.sort(lambda (ka,va),(kb,vb): cmp(va.serial, vb.serial))
            dct['sorted'] = sorted
            return type.__new__(cls, name, bases, dct)
class MyContainer(Container):
    p = Value(2)
    y = Value(0)
    t = Value(-1)
    h = Value(20)
    o = Value('x')
    n = Value('z')
This breaks down for the perfectly reasonable case of:

hostname = Value("foo")

class First(Container):
    port = Value(64)
    hostname = hostname
    username = Value("zoop")

class Second(Container):
    username = Value("pooz")
    hostname = hostname
    port = Value(24)

i.e., it breaks down as soon as you try to re-use anything, which is quite surprising to the unsuspecting user, and pretty unfortunate even once you do understand why.

Jp
This breaks down for the perfectly reasonable case of:
hostname = Value("foo")
class First(Container):
    port = Value(64)
    hostname = hostname
    username = Value("zoop")
class Second(Container):
    username = Value("pooz")
    hostname = hostname
    port = Value(24)
i.e., it breaks down as soon as you try to re-use anything, which is quite surprising to the unsuspecting user, and pretty unfortunate even once you do understand why.
So you use...

hostname = "foo"

class First(Container):
    port = Value(36)
    hostname = Value(hostname)
    username = Value("zoop")

...

As long as the documentation describes the proper semantics if one cares about orderings (create a new TypedAttribute instance for every value), then when anyone asks, a quick "read the documentation page" would be sufficient. One could also explain why the hoops are to be jumped through in the first place (dict non-ordering during class instantiation).

- Josiah
On Oct 18, 2004, at 17:39, <exarkun@divmod.com> wrote:
On Mon, 18 Oct 2004 15:52:03 -0400, Bob Ippolito <bob@redivi.com> wrote:
On Oct 18, 2004, at 15:20, Josiah Carlson wrote:
[snip]
I'm not saying that they can't be numbered, I'm saying that one cannot discover the ordering of assignment of attr1 and attr2 in the following:
class foo:
    attr1 = value1
    attr2 = value2
If there is a mechanism for discovering the original ordering of those assignments, there are a group of users in c.l.py who would like to know, and Carlos' seemingly non-existent implementation could also use it.
That is true, but you do know that the expression value1 is evaluated before the expression value2, so it is possible to sort later for clever enough choices of value1 and value2. Since his proposed syntax invokes something for each attribute, then this trick can certainly be used.. here's a small demonstration:
from itertools import count
class Value(object):
    def __init__(self, value, serial=count()):
        self.serial = serial.next()
        self.value = value
class Container(object):
    class __metaclass__(type):
        def __new__(cls, name, bases, dct):
            sorted = filter(lambda (k,v): isinstance(v, Value), dct.iteritems())
            sorted.sort(lambda (ka,va),(kb,vb): cmp(va.serial, vb.serial))
            dct['sorted'] = sorted
            return type.__new__(cls, name, bases, dct)
class MyContainer(Container):
    p = Value(2)
    y = Value(0)
    t = Value(-1)
    h = Value(20)
    o = Value('x')
    n = Value('z')
This breaks down for the perfectly reasonable case of:
hostname = Value("foo")
class First(Container):
    port = Value(64)
    hostname = hostname
    username = Value("zoop")
class Second(Container):
    username = Value("pooz")
    hostname = hostname
    port = Value(24)
i.e., it breaks down as soon as you try to re-use anything, which is quite surprising to the unsuspecting user, and pretty unfortunate even once you do understand why.
Yes, it breaks down in the general case if you do things like that. The answer is just to not do things like that. You can make Value instances raise an exception the second time they're attached to a class and tell the user to subclass Value instead if they want to provide some default values. -bob
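The guard Bob describes is straightforward to add. A sketch using the __set_name__ hook from Python 3.6+ (a modern convenience not available in 2004; all names are illustrative):

```python
class Value:
    """Declaration marker that refuses to be bound to more than one class."""
    def __init__(self, value):
        self.value = value
        self._owner = None

    def __set_name__(self, owner, name):
        # Called once per class binding (Python 3.6+); reject re-use.
        if self._owner is not None:
            raise TypeError("Value already bound to %s; create a fresh "
                            "Value (or subclass) instead" % self._owner.__name__)
        self._owner = owner

class First:
    port = Value(64)

reused = First.port   # the Value instance itself (no __get__ defined)

caught = False
try:
    class Second:
        port = reused   # second binding of the same instance
except (TypeError, RuntimeError):   # Python < 3.12 wraps the error in RuntimeError
    caught = True
print("re-use rejected:", caught)
```

This turns Jp's silently wrong ordering into a loud error at class-definition time.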
On Monday 18 October 2004 05:39 pm, exarkun@divmod.com wrote:
This breaks down for the perfectly reasonable case of: ... ie, it breaks down as soon as you try to re-use anything, which is quite surprising to the unsuspecting user, and pretty unfortunate even once you do understand why.
This approach is used to advantage for Zope 3 schema. If you don't want to confuse the ordering, you don't re-use field objects (the things that go in schema). If you don't care about the ordering (because you're going to control ordering some other way), you don't need to worry about it. This is still Python; use what makes sense for your application, but know what you're doing. The "consenting adults" playground expects consenting adults to educate themselves. Good documentation comes in handy with frameworks such as these. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
Let me suggest two variations that I have used successfully in my day job (which is also my night job :).

1. For parsing .ini files, I wrote a wrapper around ConfigParser. The Python-level API looks like this (anonymized and idealized):

from XXX import MyConfigWrapper, optstr, optint, optbool, optfloat

class Config(MyConfigWrapper):
    poll_time = optfloat("network-parameters", "poll-time")
    use_ssl = optbool("network-parameters", "use-ssl")
    window_title = optstr("ui-parameters", "window-title")
    # etc.

This allows the Python names for variables to differ from the names used in the .ini file, and abstracts away the section names completely from the Python API. This makes it possible to rename variables and sections in the config file without having to touch the Python code that uses them. This will save my butt when our marketing team comes up with the *real* name for our product, since currently the section names all have the provisional product name in them, and the config file is considered "user-facing", so references to the old product name have to be expunged. Also, I don't see much point in having to use longer references in my Python code -- the total number of config parameters and their uses are such that I can easily come up with a single unique name for every option, and yet in the .ini file I'd like to have more than one section.

Note that optstr etc. construct full properties that allow me to set and delete the parameter values as well, and then ConfigParser can be told to write back the modified .ini file. This loses the ordering and comments, but I don't care (although I wish ConfigParser would order things alphabetically rather than per dictionary hash). I don't think I have to explain that optint returns a Python int, etc.

2. We're handling modest amounts of XML, all using home-grown DTDs and with no specific requirements to interface to other apps or XML tools. I wrote a metaclass which lets me specify the DTD using Python syntax.
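The opt* factories from variation 1 aren't shown; here is a rough, runnable guess at what they might look like, built on today's configparser module (MyConfigWrapper's real behaviour may differ):

```python
import configparser

def _opt(section, key, getter_name):
    """Build a property that proxies one .ini option through ConfigParser."""
    def fget(self):
        return getattr(self._parser, getter_name)(section, key)
    def fset(self, value):
        self._parser.set(section, key, str(value))
    def fdel(self):
        self._parser.remove_option(section, key)
    return property(fget, fset, fdel)

def optstr(section, key):
    return _opt(section, key, "get")

def optint(section, key):
    return _opt(section, key, "getint")

def optfloat(section, key):
    return _opt(section, key, "getfloat")

def optbool(section, key):
    return _opt(section, key, "getboolean")

class MyConfigWrapper:
    def __init__(self, text):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

class Config(MyConfigWrapper):
    poll_time = optfloat("network-parameters", "poll-time")
    use_ssl = optbool("network-parameters", "use-ssl")
    window_title = optstr("ui-parameters", "window-title")

cfg = Config("""
[network-parameters]
poll-time = 2.5
use-ssl = yes
[ui-parameters]
window-title = Demo
""")
print(cfg.poll_time, cfg.use_ssl, cfg.window_title)
```

Because each opt* builds a full property, assignment (cfg.window_title = "New") writes through to the parser, which can then be asked to write the file back out.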
Again, my approach is slightly lower-level than previous proposals here, but has the advantage of letting you be explicit about the mapping between Python and XML names, both for attributes and for subelements. The metaclass handles reading and writing. It supports elements containing text (is that CDATA? I never know) or sub-elements, but not both. For sub-elements, it supports cases where one element has any number of sub-elements of a certain type, which are then collected in a list, so you can refer to them using Python sequence indexing/slicing notation. It also supports elements that have zero or one sub-element of a certain type; absence is indicated by setting the corresponding attribute to None. I don't support namespaces, although I expect it would be easy enough to add them. I don't support unrecognized elements or attributes: while everything can be omitted (and defaults to None), unrecognized attributes or elements are always rejected. (I suppose that could be fixed too if desired.) Here's an example:

from XXX import ElementMetaClass, String, Integer, Float, Boolean, Date

class Inner(ElementMetaClass):
    "Definition for <inner>"
    __element__ = "inner"
    __attributes__ = [("count", Integer),
                      ("name", String),
                      ("expiration-date", Date),
                      ("date-created", Date),
                      ("special", Boolean)]
    __characters__ = "text"   # CDATA (?) is stored as self.text

class Outer(ElementMetaClass):
    "Definition for <outer>"
    __element__ = "outer"
    __attributes__ = [("name", String)]
    __children__ = [("innerElements[]", Inner)]

Note that for attributes, the name given is used both as the Python name and as the XML name, except that hyphens in XML are translated into underscores in Python, and vice versa. For sub-elements, the __element__ attribute of the class determines the element name, and the name given in the list of __children__ determines the Python name; if this ends in "[]" it is a repeatable element.
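The real ElementMetaClass isn't shown; a toy approximation of just the attribute mapping, built on xml.etree.ElementTree (class names and details here are guesses, not Guido's code):

```python
import xml.etree.ElementTree as ET

class Element:
    """Toy declarative XML mapping: __attributes__ lists (python_name, type)
    pairs; hyphens in XML attribute names map to underscores in Python."""
    __element__ = None
    __attributes__ = []

    @classmethod
    def parse(cls, text):
        node = ET.fromstring(text)
        obj = cls()
        for name, typ in cls.__attributes__:
            raw = node.get(name.replace("_", "-"))
            # absent attributes become None, as described above
            setattr(obj, name, None if raw is None else typ(raw))
        obj.text = node.text
        return obj

class Inner(Element):
    __element__ = "inner"
    # note: a bare bool here would mistranslate "false" -- see the later
    # discussion of BooleanType in this thread
    __attributes__ = [("count", int), ("name", str), ("special", bool)]

inner = Inner.parse('<inner count="3" name="demo">hello</inner>')
print(inner.count, inner.name, inner.special, inner.text)
```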
I'm undecided on whether I like the approach with lists of (name, type) tuples better than the approach with property factories like in the first example; the list approach allows me to order the attributes and sub-elements consistently upon rendering, but I'm not particularly keen on typing string quotes around Python identifiers. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, 19 Oct 2004 08:00:23 -0700, Guido van Rossum <gvanrossum@gmail.com> wrote:
Let me suggest two variations that I have used successfully in my day job (which is also my night job :).
1. For parsing .ini files, I wrote a wrapper around ConfigParser. The Python-level API looks like this (anonymized and idealized):
from XXX import MyConfigWrapper, optstr, optint, optbool, optfloat
class Config(MyConfigWrapper):
    poll_time = optfloat("network-parameters", "poll-time")
    use_ssl = optbool("network-parameters", "use-ssl")
    window_title = optstr("ui-parameters", "window-title")
    # etc.
The opt* property builders could be substituted by a single opt which took the default value as an argument:

opt(section_name, key_name, default_value)

The default_value type would be used for conversion on reading & writing:

class Config(MyConfigWrapper):
    poll_time = opt("network-parameters", "poll-time", 1.0)
    use_ssl = opt("network-parameters", "use-ssl", False)
    window_title = opt("ui-parameters", "window-title", '')

It solves the type checking and the default value requirements with a single parameter. It's also extensible -- any other type that knows how to construct itself out of a string could be used as an option type.
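A runnable sketch of this single-factory idea, with all names assumed and built on the modern configparser module (the missing-key behaviour, falling back to the default, is a guess at the intent):

```python
import configparser

def opt(section, key, default):
    """One factory for all option types: the default's type drives conversion."""
    typ = type(default)
    def fget(self):
        if not self._parser.has_option(section, key):
            return default
        if typ is bool:   # bool('False') is True, so go through the parser
            return self._parser.getboolean(section, key)
        return typ(self._parser.get(section, key))
    def fset(self, value):
        self._parser.set(section, key, str(value))
    return property(fget, fset)

class Config:
    def __init__(self, text=""):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

    poll_time = opt("network-parameters", "poll-time", 1.0)
    use_ssl = opt("network-parameters", "use-ssl", False)
    window_title = opt("ui-parameters", "window-title", '')

cfg = Config("[network-parameters]\npoll-time = 0.5\n")
print(cfg.poll_time, cfg.use_ssl, cfg.window_title)
```

Note the special case for bool, which foreshadows the adapt() discussion later in the thread.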
This allows the Python names for variables to differ from the names used in the .ini file, and abstracts away the section names completely from the Python API. This makes it possible to rename variables and sections in the config file without having to touch the Python code that uses them. This will save my butt when our marketing team comes up with the *real* name for our product, since currently the section names all have the provisional product name in it, and the config file is considered "user-facing" so references to the old product name have to be expunged. Also, I don't see much point in having to use longer references in my Python code -- the total number of config parameters and their uses are such that I can easily come up with a single unique name for every option, and yet in the .ini file I'd like to have more than one section.
It works well as long as the option names are unique. In some cases, there are sections with a similar internal structure, and in that case the use of nested structures to represent sections is better. I had one such situation in a communications program of mine a few years ago: each interface (com1:, com2:, etc.) had its own section in the config file, with the same parameters being stated for each section.
Note that optstr etc. construct full properties that allow me to set and delete the parameter values as well, and then ConfigParser can be told to write back the modified .ini file. This loses the ordering and comments, but I don't care (although I wish ConfigParser would order things alphabetically rather than per dictionary hash). I don't think I have to explain that optint returns a Python int, etc.
One of the discussions we've had was about what ordering was desired: arbitrary (hash ordering), alphabetical, or declaration ordering. Alphabetical is better than arbitrary, and current implementations can solve this by sorting the key names before writing the data. I personally like to have my config files ordered as per the original declaration ordering. For example: a webserver's most relevant parameter for a quick-and-dirty setup is the port number; if stored in alphabetical ordering, it may be hidden down the list. The use of the original source ordering allows the developer to write the parameters in the order that makes the most sense (from a usability standpoint) for the users that may need to read or edit them, without the need to provide an explicit ordering list. The problem with the list is that it's easy to forget to update it as one adds new parameters, especially if this is done using inheritance.

-- Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com
The opt* property builders could be substituted by a single opt which took the default value as an argument:
opt(section_name, key_name, default_value)
What if I want the default to be None? Sometimes absence of a value is useful info. If you want a single factory function (not sure why) then maybe the type should be passed in (perhaps optional, only if no default is given).
It works well as long as the option names are unique. In some cases, there are sections with a similar internal structure, and in that case the use of nested structures to represent sections is better.
"Better" is subjective. With my approach, you can give them different prefixes in the Python API independent from their names as seen by the user, and you only have to do this when there are actual conflicts (which are rare except when certain styles are used). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, 20 Oct 2004 08:25:34 -0700, Guido van Rossum <gvanrossum@gmail.com> wrote:
The opt* property builders could be substituted by a single opt which took the default value as an argument:
opt(section_name, key_name, default_value)
What if I want the default to be None? Sometimes absence of a value is useful info. If you want a single factory function (not sure why) then maybe the type should be passed in (perhaps optional, only if no default is given).
I like the optional type idea:

opt(section_name, key_name, default_value, type)

So it can be written this way:

opt('section', 'key', None, StringType)

...but -- to answer your question -- the point here isn't really the 'singleness' of the factory function, but the fact that it is type-independent, which (in principle) would allow it to be extended to handle arbitrary types by delegating some functionality to the types themselves. IMHO, it's just a different way to design it: instead of writing a specialized property handler for each supported type, use a standard interface provided by the types themselves to handle the conversion to & from the string representation that is used in the config file.

Unfortunately, this generalization is not as practical as I thought it would be. In my implementation, I used standard Python type constructors to convert string values to the desired object type. It works for str, int and float [see note #1], which solved nearly all cases in my own experience. However, this trick doesn't work for bool(), which is an important type. To solve it, I had to write my own adapt() method, which just handles str, int and float using the builtin types, and treats bool() as a particular case. User-defined types have to expose special conversion methods (as they should anyway). As it is, it's a general solution, but it ended up being more confusing and convoluted than I would like it to be.
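A minimal sketch of such an adapt() helper (the from_config_string hook is an invented name standing in for the "special conversion methods" mentioned):

```python
def adapt(raw, typ):
    """Convert a raw config string to typ. bool needs its own rule because
    bool('False') is True -- any non-empty string is true."""
    if typ is bool:
        lowered = raw.strip().lower()
        if lowered in ("1", "yes", "true", "on"):
            return True
        if lowered in ("0", "no", "false", "off"):
            return False
        raise ValueError("not a boolean: %r" % raw)
    if hasattr(typ, "from_config_string"):   # hook for user-defined types
        return typ.from_config_string(raw)
    return typ(raw)   # works for str, int, float

print(adapt("012", int), adapt("-1e+12", float), adapt("False", bool))
```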
It works well as far as the option names are unique. In some cases, there are sections with a similar internal structure, and in this case the use of nested structures to represent sections is better.
"Better" is subjective. With my approach, you can give them different prefixes in the Python API independent from their names as seen by the user, and you only have to do this when there are actual conflicts (which are rare except when certain styles are used).
Agreed. It depends a lot on the particular application, and to some extent, on personal taste. My own bias comes from some of the applications that I have written, where I feared that the use of prefixes would end up cluttering the namespace. Besides the above-mentioned communications software example (where I had several communication ports, each one stored in its own section), another real scenario is the configuration of user interface parameters for several UI elements -- for example, font selection and background colors for specific forms. The same parameters ('font', 'font-color', 'background-color', etc.) are repeated for several sections. But surely, my own experience is limited, and I can't speak for other people on this point.

--------
[1] As an aside: I was *really* surprised when I first saw that this code worked:
>>> IntType('12')
12
>>> FloatType('-1e+12')
-1000000000000.0
The following test also worked; I expected it to convert the value using an octal representation, but it didn't:
>>> IntType('012')
12
And finally, the bool test:
>>> BooleanType('False')
True
Which obviously doesn't work, because the 'False' string converts to True, as any other non-empty string does :-P. But it seemed a little bit weird when I first saw it, especially when compared to FloatType and IntType.

curiousness-killed-the-cat'ly-yours,

-- Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com
I like the optional type idea:
opt(section_name, key_name, default_value, type)
So it can be written this way:
opt('section', 'key', None, StringType)
...but -- to answer your question -- the point here isn't really the 'singleness' of the factory function, but the fact that it is type-independent, which (in principle) would allow it to be extended to handle arbitrary types by delegating some functionality to the types themselves.
This is all a nice generalization, but I don't see that much use for it. There's only a handful of types that are worth supporting here. So the cost of the generalization isn't worth the benefits.
IMHO, it's just a different way to design it: instead of writing a specialized property handler for each supported type, use a standard interface provided by types themselves to handle the conversion to & from the string representation that is used in the config file. Unfortunately, this generalization is not as practical as I thought it would be.
Right, it pretty much only works for int, str, float, and for custom types that were designed with this usage in mind. So again I'm not sure that the generality is worth having.
Agreed. It depends a lot on the particular application, and to some extent, on personal taste. My own bias come from some of the applications that I have written, where I feared that the use of prefixes would end up cluttering the namespace.
But your "solution" is to introduce a mandatory extra namespace, which adds just as much typing/clutter to the code for your example (x.foo.bar instead of x.foo_bar) while forcing others who don't need it to use the extra namespace.
Besides the above mentioned communications software example (where I had several communication ports, each one stored into a section), another real scenario is the configuration of user interface parameters for several UI elements -- for example, font selection and background colors for specific forms. The same parameters ('font', 'font-color', 'background-color', etc.) are repeated for several sections. But surely, my own experience is limited, and I can't speak for other people on this point.
Most likely, you're hinting at a different use case, where there is an *arbitrary* set of sections, and you can't fix their number or names beforehand. In that case, using namespaces of the x.foo.bar form typically doesn't work well, because in your Python code, you can't hardcode foo -- rather, you have a variable whose contents loops over the different values of foo, and with this approach you'd end up doing getattr(x, sectionname).bar. In that case, I'd much rather have an alternative API where I can write x[sectionname].bar, x.foo.bar now being equivalent to x["foo"].bar (note string quotes). As long as your section names aren't too weird (__getitem__ would be a bad section name :-) this should work well.
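A small sketch of such a dual attribute/item API (all names hypothetical):

```python
class Section:
    """Attribute-style access to one section's options."""
    def __init__(self, options):
        self.__dict__.update(options)

class Config:
    """Sections reachable both as x.foo.bar and as x["foo"].bar."""
    def __init__(self, sections):
        self._sections = {name: Section(opts) for name, opts in sections.items()}

    def __getattr__(self, name):
        # only called for names not found normally, so _sections is safe
        try:
            return self._sections[name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, name):
        return self._sections[name]

x = Config({"foo": {"bar": 1}, "com1": {"speed": 9600}})
print(x.foo.bar, x["foo"].bar, x["com1"].speed)
```

The item form handles the arbitrary-section loop (for name in ...: x[name].bar); the attribute form stays available for the fixed, well-known sections.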
--------
[1] As an aside: I was *really* surprised when I first saw that this code worked:
>>> IntType('12')
12
>>> FloatType('-1e+12')
-1000000000000.0
Normally we write that as int('12') and float('-1e+12'). It was a new feature in Python 2.2; before that int wasn't the same thing as IntType.
The following test also worked, but I expected it to convert the value using an octal representation, but it didn't:
>>> IntType('012')
12
That's intentional; the int() constructor is often used to parse input from non-programming humans, who might cut and paste a number with leading zeros and not expect the zeros to turn the whole thing into octal. If you want this behavior, use int('012', 0).
And finally, the bool test:
>>> BooleanType('False')
True
Which obviously doesn't work, because the 'False' string converts to True, as any other non-empty string does :-P. But it seemed a little bit weird when I first saw it, especially when compared to FloatType and IntType.
The saying "a foolish consistency is the hobgoblin of little minds" applies to this kind of hypergeneralization.
curiousness-killed-the-cat'ly-yours,
Fortunately cats have nine lives. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I like the optional type idea:
opt(section_name, key_name, default_value, type)
So it can be written this way:
opt('section', 'key', None, StringType)
...but -- to answer your question -- the point here isn't really the 'singleness' of the factory function, but the fact that it is type-independent, which (in principle) would allow it to be extended to handle arbitrary types by delegating some functionality to the types themselves.
This is all a nice generalization, but I don't see that much use for it. There's only a handful of types that are worth supporting here. So the cost of the generalization isn't worth the benefits.
I definitely disagree. A common case is a constrained type, where only a limited number of strings are allowed. Or an IP address, or domain name, or an internationalized boolean converter ("si" -> True), or a color specification, or a valid CSS class name, or... well, the list goes on forever.

The advantage of putting this in the parser is that you could have better error messages when the values are malformed. If the parser doesn't do the conversion, you are likely to have lost the location information by the time you try to do the conversion. Good error messages are one of the primary visible features for people who use configuration files.

An additional complication, though: if you plan to make the configuration file writable, these types shouldn't just support converting from a string to a Python type, but the other direction as well -- so that ambiguous Python types (like a boolean, easily confused with an integer) can be converted in specific ways to a configuration string. I don't think __repr__ or __str__ of the value to be converted are necessarily appropriate.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
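A sketch of one such constrained-type converter carrying location information into its error messages (all names hypothetical):

```python
# Converters invoked by the parser itself, so error messages can cite the
# line number where the bad value appeared.
class ConfigError(ValueError):
    pass

def one_of(*allowed):
    """Build a constrained-string converter accepting only the listed values."""
    def convert(raw, lineno):
        if raw not in allowed:
            raise ConfigError("line %d: %r is not one of: %s"
                              % (lineno, raw, ", ".join(allowed)))
        return raw
    return convert

storage = one_of("ram", "file", "db")
print(storage("ram", 7))
try:
    storage("tape", 9)
except ConfigError as exc:
    print(exc)
```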
...but -- to answer your question -- the point here isn't really the 'singleness' of the factory function, but the fact that it is type-independent, which (in principle) would allow it to be extended to handle arbitrary types by delegating some functionality to the types themselves.
This is all a nice generalization, but I don't see that much use for it. There's only a handful of types that are worth supporting here. So the cost of the generalization isn't worth the benefits.
I definitely disagree. A common case is a constrained type, where only a limited number of strings are allowed. Or an IP address, or domain name, or an internationalized boolean converter ("si" -> True), or a color specification, or a valid CSS class name, or... well, the list goes on forever.
The advantage of putting this in the parser is that you could have better error messages when the values were malformed. If the parser doesn't do the conversion, you are likely to have lost the location information by the time you try to do the conversion. Good error messages are one of the primary visible features for people who use the configuration files.
Sure, I agree with all of that. But my original (optint, optstr, optbool, optfloat) proposal can easily be extended the same way; in fact it is in some sense easier than an API that expects a type object. (Unless you have an adaptation framework in place; until we have a general one, inventing one just for this purpose definitely feels like overkill.)
An additional complication, though; if you plan to make the configuration file writable, these types shouldn't just support converting from a string to a Python type, but the other direction -- so that ambiguous Python types (like a boolean; easily confused as an integer) can be converted in specific ways to a configuration string. I don't think __repr__ or __str__ of the value to be converted are necessarily appropriate.
Actually, repr() or str() probably *is* the right answer for this, even if calling the constructor with a string argument isn't the answer for parsing and validation. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
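To make Guido's point concrete: the opt* family really can be extended with new converters without a type-object API. The opt base class below is a guessed-at shape for illustration, not the actual code from his wrapper.

```python
class opt:
    """Base for per-option converters; section/key identify the option."""
    def __init__(self, section, key, default=None):
        self.section, self.key, self.default = section, key, default

    def convert(self, raw):
        raise NotImplementedError

class optint(opt):
    def convert(self, raw):
        return int(raw)

class optipaddress(opt):
    """One of Ian's examples: validate a dotted-quad IPv4 address."""
    def convert(self, raw):
        parts = raw.split(".")
        if len(parts) != 4 or not all(p.isdigit() and 0 <= int(p) <= 255
                                      for p in parts):
            raise ValueError("%s.%s: %r is not a valid IPv4 address"
                             % (self.section, self.key, raw))
        return raw

addr = optipaddress("network", "listen-address")
assert addr.convert("192.168.0.1") == "192.168.0.1"
```

Adding a new constrained type is one small subclass; no adaptation machinery is needed.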
Guido van Rossum wrote:
...but -- to answer your question -- the point here isn't really the 'singleness' of the factory function, but the fact that it is type-independent, which (in principle) would allow it to be extended to handle arbitrary types by delegating some functionality to the types themselves.
This is all a nice generalization, but I don't see that much use for it. There's only a handful of types that are worth supporting here. So the cost of the generalization isn't worth the benefits.
I definitely disagree. A common case is a constrained type, where only a limited number of strings are allowed. Or an IP address, or domain name, or an internationalized boolean converter ("si" -> True), or a color specification, or a valid CSS class name, or... well, the list goes on forever.
The advantage of putting this in the parser is that you could have better error messages when the values were malformed. If the parser doesn't do the conversion, you are likely to have lost the location information by the time you try to do the conversion. Good error messages are one of the primary visible features for people who use the configuration files.
Sure, I agree with all of that. But my original (optint, optstr, optbool, optfloat) proposal can easily be extended the same way; in fact it is in some sense easier than an API that expects a type object. (Unless you have an adaptation framework in place; until we have a general one, inventing one just for this purpose definitely feels like overkill.)
OK. I guess you could subclass opt* to get a new type; I wasn't thinking of that. I shy away from subclassing, but it might be appropriate here. It makes it easier to hang different parameters onto the type as well, like not_empty (for strings), max, min, etc. It would even be easier to hang serialization onto it. I don't think adaptation fits well here, since adaptation seems to generally be context-insensitive, and this conversion process is done in the context of a specific type declaration.
An additional complication, though; if you plan to make the configuration file writable, these types shouldn't just support converting from a string to a Python type, but the other direction -- so that ambiguous Python types (like a boolean; easily confused as an integer) can be converted in specific ways to a configuration string. I don't think __repr__ or __str__ of the value to be converted are necessarily appropriate.
Actually, repr() or str() probably *is* the right answer for this, even if calling the constructor with a string argument isn't the answer for parsing and validation.
In my experience, this stops working as the types become more complex. For instance, consider a converter that takes a string that has comma-separated names and creates a list of strings; there is a specific way to convert back to that representation (','.join(value)), and both repr() and str() will be incorrect. Potentially you could create a list subclass that has the right repr(), but that seems prone to error. repr() only gives an estimated Python representation of an object -- it is neither reliable (since obj == eval(repr(obj)) isn't true for a large number of objects), nor is it appropriate, since we're trying to generate configuration expressions that are tightly bound to a context, not Python expressions. In the case of generating a config file, if the conversion isn't reliable or is ambiguous it should be an error (which doesn't happen with repr() or str()). -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
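Ian's comma-separated-list case can be sketched as a converter with an explicit reverse direction, rather than relying on repr()/str(). The CommaList class and its from_string/to_string names are hypothetical.

```python
class CommaList:
    """Convert between 'a, b, c' in a config file and a Python list."""
    def from_string(self, text):
        return [name.strip() for name in text.split(",") if name.strip()]

    def to_string(self, value):
        # The one specific, unambiguous way back to config-file syntax.
        return ", ".join(value)

conv = CommaList()
names = conv.from_string("alice, bob, carol")
assert names == ["alice", "bob", "carol"]
assert conv.to_string(names) == "alice, bob, carol"
# repr(names) would give "['alice', 'bob', 'carol']" -- valid Python,
# but not valid config-file syntax, which is Ian's objection.
```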
On Thu, 21 Oct 2004 12:26:24 -0500, Ian Bicking <ianb@colorstudy.com> wrote:
Sure, I agree with all of that. But my original (optint, optstr, optbool, optfloat) proposal can easily be extended the same way; in fact it is in some sense easier than an API that expects a type object. (Unless you have an adaptation framework in place; until we have a general one, inventing one just for this purpose definitely feels like overkill.)
OK. I guess you could subclass opt* to get a new type; I wasn't thinking of that. I shy away from subclassing, but it might be appropriate here. It makes it easier to hang different parameters onto the type as well, like not_empty (for strings), max, min, etc. It would even be easier to hang serialization onto it.
It may be a matter of style; I also tend to shy away from subclassing, because often it leads to a 'parallel' class hierarchy to handle things that could be represented by different interfaces, adaptations or aspects of the original classes. In this case:

IntType -> optint
IPAddressType -> optipaddress
CustomBooleanType -> optcustomboolean

Of course, over-generalization in this case can lead to overly complex classes that try to do too much stuff for every possible situation. But simple interfaces -- such as the ones required for configuration support -- don't add that much complexity to the native object anyway. In fact, conversion to and from strings is such a useful extension that I tend to provide it for many of my classes, albeit not with a true standard interface. (In the examples above, there is another caveat -- some types, such as BooleanType, can't be subclassed, and this makes things more difficult to generalize under my proposal.)
Actually, repr() or str() probably *is* the right answer for this, even if calling the constructor with a string argument isn't the answer for parsing and validation.
In my experience, this stops working as the types become more complex. For instance, consider a converter that takes a string that has comma-separated names and creates a list of strings; there is a specific way to convert back to that representation (','.join(value)), and both repr() and str() will be incorrect.
Potentially you could create a list subclass that has the right repr(), but that seems prone to error. repr() only gives an estimated Python representation of an object -- it is neither reliable (since obj == eval(repr(obj)) isn't true for a large number of objects), nor is it appropriate, since we're trying to generate configuration expressions that are tightly bound to a context, not Python expressions. In the case of generating a config file, if the conversion isn't reliable or is ambiguous it should be an error (which doesn't happen with repr() or str()).
This is one of the situations where 'practicality beats purity', IMHO, because the standard Python prompt prints the repr() of the object. If all objects _always_ returned full reversible representations from repr() -- which should be possible, given enough care -- then the command prompt would be unusable for complex objects (just calling a method that returns a complex object would cause a lot of stuff to be printed out). Perhaps the standard prompt could be changed to print str() instead of repr(), so repr() could _really_ be used for the generic representation case... but it's probably too late to change it. -- Carlos Ribeiro Consultoria em Projetos blog: http://rascunhosrotos.blogspot.com blog: http://pythonnotes.blogspot.com mail: carribeiro@gmail.com mail: carribeiro@yahoo.com
On Tue, Oct 19, 2004 at 08:00:23AM -0700, Guido van Rossum wrote:
Let me suggest two variations that I have used successfully in my day job (which is also my night job :).
1. For parsing .ini files, I wrote a wrapper around ConfigParser. The Python-level API looks like this (anonymized and idealized):
from XXX import MyConfigWrapper, optstr, optint, optbool, optfloat
class Config(MyConfigWrapper):
    poll_time = optfloat("network-parameters", "poll-time")
    use_ssl = optbool("network-parameters", "use-ssl")
    window_title = optstr("ui-parameters", "window-title")
    # etc.
I'm surprised no one has mentioned optparse yet. It already has all the features you use in this example. Maybe a similar API for configuration file parsing would be nice, if only for the sake of consistency:

parser = ConfigParser()
parser.add_option("network-parameters", "poll-time", type="float", dest="poll_time")
parser.add_option("network-parameters", "use-ssl", type="bool", dest="use_ssl")
parser.add_option("ui-parameters", "window-title", type="string", dest="window_title")
options = parser.parse_file('foo.conf')

print options.window_title
print options.use_ssl
# etc.

Bonus points if the implementation allows me to specify a command-line option and configuration file option with one call, as in docutils [1]. More bonus points for reusing optparse code. Cheers, Johannes [1] http://cvs.sourceforge.net/viewcvs.py/docutils/docutils/docutils/frontend.py?rev=HEAD&view=auto
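For reference, the opt* wrapper API quoted above could plausibly be built with descriptors over the standard library's ConfigParser (spelled configparser in Python 3). This is a guess at the shape; MyConfigWrapper and the opt* factories are reconstructed here, not Guido's actual code.

```python
import configparser

class optfloat:
    """Descriptor that fetches a float option from the wrapped parser."""
    def __init__(self, section, key):
        self.section, self.key = section, key

    def __get__(self, obj, objtype=None):
        return obj.parser.getfloat(self.section, self.key)

class optbool:
    def __init__(self, section, key):
        self.section, self.key = section, key

    def __get__(self, obj, objtype=None):
        return obj.parser.getboolean(self.section, self.key)

class MyConfigWrapper:
    def __init__(self, parser):
        self.parser = parser

class Config(MyConfigWrapper):
    poll_time = optfloat("network-parameters", "poll-time")
    use_ssl = optbool("network-parameters", "use-ssl")

parser = configparser.ConfigParser()
parser.read_string("[network-parameters]\npoll-time = 2.5\nuse-ssl = yes\n")
config = Config(parser)
assert config.poll_time == 2.5
assert config.use_ssl is True
```

The type conversion lives entirely in the descriptor, so misspelled attribute names fail loudly at class-definition review time rather than as silent string lookups scattered through the code.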
On Thu, 2004-10-21 at 16:47, Johannes Gijsbers wrote:
I'm surprised no one has mentioned optparse yet. It already has all the features you use in this example. Maybe a similar API for configuration file parsing would be nice, if only for the sake of consistency:
parser = ConfigParser()
parser.add_option("network-parameters", "poll-time", type="float", dest="poll_time")
parser.add_option("network-parameters", "use-ssl", type="bool", dest="use_ssl")
parser.add_option("ui-parameters", "window-title", type="string", dest="window_title")
options = parser.parse_file('foo.conf')

print options.window_title
print options.use_ssl
# etc.
Bonus points if the implementation allows me to specify a command-line option and configuration file option with one call, as in docutils [1]. More bonus points for reusing optparse code.
This is a very intriguing idea. -Barry
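Johannes's optparse-style proposal can be sketched on top of the stdlib configparser in a few lines. add_option and the Options holder are invented here to mirror optparse's API; parse_string stands in for the proposed parse_file so the example is self-contained.

```python
import configparser

class Options:
    """Bare attribute holder, like optparse's Values."""

class OptionConfigParser:
    _types = {"string": str, "int": int, "float": float,
              "bool": lambda s: s.lower() in ("1", "yes", "true", "on")}

    def __init__(self):
        self._options = []

    def add_option(self, section, key, type="string", dest=None):
        # dest defaults to the key with dashes mapped to underscores,
        # as optparse does for long option names.
        self._options.append((section, key, self._types[type],
                              dest or key.replace("-", "_")))

    def parse_string(self, text):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        options = Options()
        for section, key, convert, dest in self._options:
            setattr(options, dest, convert(parser.get(section, key)))
        return options

p = OptionConfigParser()
p.add_option("ui-parameters", "window-title")
p.add_option("network-parameters", "use-ssl", type="bool")
options = p.parse_string("[ui-parameters]\nwindow-title = Demo\n"
                         "[network-parameters]\nuse-ssl = yes\n")
assert options.window_title == "Demo"
assert options.use_ssl is True
```

Sharing one declaration between command-line and config-file options, as docutils does, would mean add_option also accepting optparse-style flags; that part is left out of this sketch.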
On Tue, 2004-10-19 at 11:00, Guido van Rossum wrote:
2. We're handling modest amounts of XML, all using home-grown DTDs and with no specific requirements to interface to other apps or XML tools. I wrote a metaclass which lets me specify the DTD using Python syntax.
Sounds like my recent situation. I've done enough custom XML-ing lately that I've been thinking along similar lines as you. Note that most of what I've written lately uses minidom, although I do have one particular application that uses sax. Both are powerful enough to do the job, but neither are that intuitive, IMO.
Again, my approach is slightly lower-level than previous proposals here but has the advantage of letting you be explicit about the mapping between Python and XML names, both for attributes and for subelements. The metaclass handles reading and writing. It supports elements containing text (is that CDATA? I never know)
I'm no XML guru, but I think they're different. In the one case you have something like:

<node>text for the node</node>

and in the other you have:

<node><![CDATA[cdata, er, um, data]]></node>

The difference is that the CDATA stuff shows up in a subnode of <node> and has fewer restrictions on what data can be included within the delimiters. My applications use both.
or sub-elements, but not both. For sub-elements, it supports cases where one element has any number of sub-elements of a certain type, which are then collected in a list, so you can refer to them using Python sequence indexing/slicing notation. It also supports elements that have zero or one sub-element of a certain type; absence is indicated by setting the corresponding attribute to None. I don't support namespaces, although I expect it would be easy enough to add them. I don't support unrecognized elements or attributes: while everything can be omitted (and defaults to None), unrecognized attributes or elements are always rejected. (I suppose that could be fixed too if desired.)
I have use cases for both behaviors. OTOH, I generally want to reject unknown elements or attributes, reject duplicate elements where my "DTD" doesn't allow them, etc. In at least one case I'm doing something that's probably evil, where sub-elements name email headers and the text inside provides the data for the header. I'm sure XML experts would cringe at that and suggest I use something like: <header name="to">value</header> or somesuch instead.
Here's an example:
[deleted] That actually doesn't look too bad. Do you think you'll be able to release your stuff? I don't have anything generic enough to be useful yet, but I probably could release stuff if/when I do.
I'm undecided on whether I like the approach with lists of (name, type) tuples better than the approach with property factories like in the first example; the list approach allows me to order the attributes and sub-elements consistently upon rendering, but I'm not particularly keen on typing string quotes around Python identifiers.
The property factories are nice, and I have the same aversion to string quoting Python identifiers. I personally have not had a use case for retaining sub-element order. I may play with my own implementation of your spec and see how far I can get. I definitely would like to see /something/ at a higher abstraction than minidom though. -Barry
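A rough sketch of the declarative XML mapping the thread describes, using minidom. The attrs declaration and from_element classmethod are a guess at the shape of Guido's metaclass approach (mapping Python names to XML names explicitly, with absent attributes defaulting to None), not his actual code.

```python
from xml.dom import minidom

class XmlMapped:
    # each entry: (python_name, xml_attribute_name)
    attrs = []

    @classmethod
    def from_element(cls, element):
        obj = cls()
        for py_name, xml_name in cls.attrs:
            # minidom returns "" for missing attributes; map that to None,
            # matching the "everything can be omitted" rule in the thread.
            setattr(obj, py_name, element.getAttribute(xml_name) or None)
        return obj

class Header(XmlMapped):
    # Barry's <header name="to">...</header> shape, as attributes
    attrs = [("name", "name"), ("value", "value")]

doc = minidom.parseString('<header name="to" value="guido@python.org"/>')
h = Header.from_element(doc.documentElement)
assert h.name == "to"
assert h.value == "guido@python.org"
```

A real version would also need the reverse (rendering back to XML in declaration order), which is exactly why the list-of-tuples form beats keyword arguments: it preserves ordering for output.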
participants (10)
- Barry Warsaw
- Bob Ippolito
- Carlos Ribeiro
- exarkun@divmod.com
- Fred L. Drake, Jr.
- Guido van Rossum
- Ian Bicking
- Johannes Gijsbers
- Josiah Carlson
- Michael Chermside