Proposal to add new built-in struct (was: Add kwargs to built-in function object)

This is a proposal to add a new built-in named struct:
struct(**kwargs)
    Return a struct object which has the attributes given in kwargs.
The name is really unimportant, and I'm open to other ideas, but I feel that many people have a good idea of what a struct is, and accessing an object returned from struct would look identical to the access of a struct in C, so it seems appropriate to me.
The rationale:
It is often helpful to package related information together in order to make passing the information to various functions more convenient. The easiest ways (currently) to package such information are to put it into a tuple (the easiest) or a dict.
Putting the information into a tuple may be easy initially, but it has high costs later on as it becomes hard to remember what order the information is in, and to someone reading the code, the intent of the code is far from clear.
Putting the information in a dict is a bit harder than a tuple, but since it is more readable later on than a tuple, this method is often used. Still, the access pattern is more cumbersome than it could be; foo["bar"] is more cumbersome than, say, foo.bar. This is especially the case if you have a dict or list of foos, where you then have to use foos[i]["bar"].
Both tuple and dict solutions suffer down the line when the information gets to be complicated enough to warrant a class of its own. It involves changing both the spot in the code where the information is created (to use the new class constructor), as well as changing every single field access in the code (changing every foo[0] or foo["bar"] to foo.bar).
An alternative is to use NamedTuple, NamedDict, NamedList, or to create your own class. As long as these are more complicated to use than a tuple or a dict, however, they are not likely to be used for this purpose. Another problem is that all of these methods require you to go to the trouble of thinking of a name for your class, and if you later decide to add more information to your packaged object, you have to make two changes (in the list of attributes / constructor and in the place where you instantiate your object).
Enter struct. Using struct is intended to be just as easy as using a dict (actually, easier when the number of fields is more than two or three), and not much harder than using a tuple. To declare a struct foo with attribute bar, you simply use:
foo = struct(bar="barvalue")
Access becomes very easy and readable:
foo.bar
Adding new fields is as easy as changing the initial instantiation, in one place:
foo = struct(bar="barvalue", baz="bazvalue")
Later on down the line, when you decide that you are doing more with foo than a struct should be doing, you can easily define a class Foo which inherits from struct, and since accesses to foo already look like foo.bar, you only have one spot in the code to change:
foo = Foo(bar="barvalue", baz="bazvalue")
and the rest "just works" with no changes.
The implementation:
class struct (object):

    def __init__ (self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__ (self):
        """
        Using self.__class__.__name__ allows classes to inherit from
        struct and automatically have a nice __repr__ method.
        """
        # use .items() instead of .iteritems() in Python 3K
        return "%s(%s)" % (self.__class__.__name__,
                           ", ".join("%s=%s" % (attr, repr(val))
                                     for attr, val
                                     in self.__dict__.iteritems()),)

    def __str__ (self):
        return self.__repr__()

    def __eq__ (self, other):
        """
        Implements comparison operation mirroring that of a C struct.
        """
        return self.__dict__ == other.__dict__

    def __ne__ (self, other):
        """
        See note for __eq__.
        """
        return not self.__eq__(other)

    def __setattr__ (self, name, value):
        """
        I think it makes the most sense for a struct to have immutable
        fields. As soon as you start to add more fields, you should be
        using something other than a struct.
        """
        if name in self.__dict__:
            self.__dict__[name] = value
        else:
            raise(AttributeError("'struct' object has no attribute '%s'" \
                                 % (name,)))

    def __len__ (self):
        """
        I'm not sure that it's really necessary to include this, but I
        could see where it might be helpful in some instances.
        """
        return len(self.__dict__)

    def __iter__ (self):
        """
        See note for __len__
        """
        # use .values() instead of .itervalues() in Python 3K
        return self.__dict__.itervalues()
Sample usage:
>>> a = struct(one=1, two=2)
>>> a
struct(two=2, one=1)
>>> a.one
1
>>> a.two
2
>>> a.three
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'struct' object has no attribute 'three'
>>> a.one = "one"
>>> a.one
'one'
>>> a.three = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "struct.py", line 39, in __setattr__
    % (name,)))
AttributeError: 'struct' object has no attribute 'three'
>>> b = struct(one=1, two=2)
>>> b
struct(two=2, one=1)
>>> a == b
False
>>> a.one = 1
>>> a == b
True
>>> len(a)
2
>>> print ", ".join(str(v) for v in a)
2, 1
>>> 1 in a
True
>>> "one" in a
False
Ideas or feedback, anyone?
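
For readers on Python 3, the implementation in this message could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposal's code: the capitalised name Struct, the isinstance check in __eq__, and the subclass Foo with its describe() method are all made up for the example.

```python
class Struct:
    """Python 3 sketch of the proposed struct (the name is an assumption)."""

    def __init__(self, **kwargs):
        # Write straight into __dict__ so the __setattr__ guard is bypassed.
        self.__dict__.update(kwargs)

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in self.__dict__.items())
        return f"{type(self).__name__}({fields})"

    def __eq__(self, other):
        # Assumption: unlike the original, refuse to compare with non-Structs.
        if not isinstance(other, Struct):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __setattr__(self, name, value):
        # Existing fields may be rebound; new fields are rejected.
        if name not in self.__dict__:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        self.__dict__[name] = value

    def __len__(self):
        return len(self.__dict__)

    def __iter__(self):
        return iter(self.__dict__.values())


class Foo(Struct):
    """Hypothetical subclass, as in the 'inherits from struct' step."""

    def describe(self):
        return f"{self.bar}/{self.baz}"


foo = Foo(bar="barvalue", baz="bazvalue")
print(foo)
print(foo.describe())
```

Because the error message and the repr use type(self).__name__, a subclass such as Foo reports its own name, which addresses the inheritance problem with the hard-coded 'struct' in the error message.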

On Thu, May 22, 2008 at 5:42 PM, Brandon Mintern bmintern@gmail.com wrote:
> This is a proposal to add a new built-in named struct:
One thing I forgot to mention... this is mainly intended to be used where it makes sense to quickly build an anonymous object. It would be considered bad practice to have more than one place in the code which creates a struct having the same fields; in that case a NamedTuple (or some equivalent) would be more appropriate.
In other words, this would be used in a similar situation to that of lambda. You would not use lambda in several different places to define the same function; as soon as you start to write it a second time, you would do much better to def the function and use it in both places. In the same way, creating the same struct in two places is a good indication that a NamedTuple or an actual class (possibly inheriting from struct and using a non-kwargs constructor) is more appropriate.

On Thu, May 22, 2008 at 2:42 PM, Brandon Mintern bmintern@gmail.com wrote:
> This is a proposal to add a new built-in named struct:
> struct(**kwargs)
>     Return a struct object which has the attributes given in kwargs.
> The name is really unimportant, and I'm open to other ideas, but I feel that many people have a good idea of what a struct is, and accessing an object returned from struct would look identical to the access of a struct in C, so it seems appropriate to me.
> The rationale:
> It is often helpful to package related information together in order to make passing the information to various functions more convenient. The easiest ways (currently) to package such information are to put it into a tuple (the easiest) or a dict.
> Putting the information into a tuple may be easy initially, but it has high costs later on as it becomes hard to remember what order the information is in, and to someone reading the code, the intent of the code is far from clear.
> Putting the information in a dict is a bit harder than a tuple, but since it is more readable later on than a tuple, this method is often used. Still, the access pattern is more cumbersome than it could be; foo["bar"] is more cumbersome than, say, foo.bar. This is especially the case if you have a dict or list of foos, where you then have to use foos[i]["bar"].
So you save three characters? I don't call that cumbersome.
> Both tuple and dict solutions suffer down the line when the information gets to be complicated enough to warrant a class of its own. It involves changing both the spot in the code where the information is created (to use the new class constructor), as well as changing every single field access in the code (changing every foo[0] or foo["bar"] to foo.bar).
> An alternative is to use NamedTuple, NamedDict, NamedList, or to create your own class. As long as these are more complicated to use than a tuple or a dict, however, they are not likely to be used for this purpose. Another problem is that all of these methods require you to go to the trouble of thinking of a name for your class, and if you later decide to add more information to your packaged object, you have to make two changes (in the list of attributes / constructor and in the place where you instantiate your object).
Thinking of a name for your class is not difficult, especially if you keep it private to the module, class, function, etc.
This does not strike me as useful enough to have as a built-in. It would be better placed in the stdlib.
-Brett

On Thu, May 22, 2008 at 6:10 PM, Brett Cannon brett@python.org wrote:
> So you save three characters? I don't call that cumbersome.
If foo["bar"] is not cumbersome, it is at least less elegant and the intent is less clear than foo.bar. Moreover, as I stated in the next paragraph, it does become cumbersome down the line when you decide that you should have used a class after all, and now you have to change all of those foo["bar"] lines to foo.bar. Note that simple search-and-replace wouldn't help if you are passing foo to various functions.
> Thinking of a name for your class is not difficult, especially if you keep it private to the module, class, function, etc.
It may not be difficult, but when the name is unnecessary, simply needing to declare it seems silly.
> This does not strike me as useful enough to have as a built-in. It would be better placed in the stdlib.
I would be happy with it at least becoming part of collections or some other module, but then I wonder how many new-ish Python programmers would persist in using a tuple or a dict instead of a more elegant struct solution for lack of knowing about it. At least if it was in Python somewhere, though, searching for "python struct" would be more likely to return what the programmer is looking for.
Ouch... it seems that struct is already the name of a module. If enough people like my idea, perhaps that module could be renamed to "cstruct". Then again, if my idea did become a part of collections (rather than a built-in), collections.struct and the struct module would be able to co-exist, albeit somewhat confusingly.
Brandon

On Thu, May 22, 2008 at 3:24 PM, Brandon Mintern bmintern@gmail.com wrote:
> On Thu, May 22, 2008 at 6:10 PM, Brett Cannon brett@python.org wrote:
> > So you save three characters? I don't call that cumbersome.
> If foo["bar"] is not cumbersome, it is at least less elegant and the intent is less clear than foo.bar. Moreover, as I stated in the next paragraph, it does become cumbersome down the line when you decide that you should have used a class after all, and now you have to change all of those foo["bar"] lines to foo.bar. Note that simple search-and-replace wouldn't help if you are passing foo to various functions.
But you are suggesting that people think far enough ahead to think that a sequence or mapping will be cumbersome and thus something with attribute access should be used instead.
> > Thinking of a name for your class is not difficult, especially if you keep it private to the module, class, function, etc.
> It may not be difficult, but when the name is unnecessary, simply needing to declare it seems silly.
Well, we almost ditched lambda and were going to require people to define a simple function to replace lambda functions, so not everyone thinks it is silly.
> > This does not strike me as useful enough to have as a built-in. It would be better placed in the stdlib.
> I would be happy with it at least becoming part of collections or some other module, but then I wonder how many new-ish Python programmers would persist in using a tuple or a dict instead of a more elegant struct solution for lack of knowing about it. At least if it was in Python somewhere, though, searching for "python struct" would be more likely to return what the programmer is looking for.
Sticking something in built-ins so that it is easier for newbies to find it is not a good argument. Things only go into builtins if they are frequently used and warrant skipping an import statement.
> Ouch... it seems that struct is already the name of a module. If enough people like my idea, perhaps that module could be renamed to "cstruct". Then again, if my idea did become a part of collections (rather than a built-in), collections.struct and the struct module would be able to co-exist, albeit somewhat confusingly.
I don't agree with that worry. re.compile() exists and people don't worry about conflicting with the built-in function. An import statement makes it clear what object 'struct' maps to in the namespace.
-Brett

Brett Cannon wrote:
> I don't agree with that worry. re.compile() exists and people don't worry about conflicting with the built-in function.
But that's just a matter of two functions with the same name in different places. It's not a case of something being a function or type in one place and a module in another. That would be more confusing, I think.
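
The distinction drawn here can be illustrated with a short sketch. It uses types.SimpleNamespace as a stand-in for the proposed type and the name record for it; both are assumptions made for the example, not anything agreed in the thread.

```python
import re
import struct                      # the existing binary-packing module
from types import SimpleNamespace  # stand-in for the proposed type

# Same name, different namespaces: the builtin compile() and re.compile()
# are distinct objects and never collide.
same_object = re.compile is compile
print(same_object)

# A module and a type, however, cannot both be bound to the name "struct"
# in one namespace. Binding the hypothetical type under another name keeps
# both usable side by side.
record = SimpleNamespace
packed = struct.pack("<H", 513)    # the module is still reachable
point = record(x=1, y=2)
print(packed, point.x, point.y)
```

Had the second binding been written as struct = SimpleNamespace, the struct.pack call would fail, which is the kind of confusion being described.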

On Thu, May 22, 2008 at 7:00 PM, Brett Cannon brett@python.org wrote:
> > If foo["bar"] is not cumbersome, it is at least less elegant and the intent is less clear than foo.bar. Moreover, as I stated in the next paragraph, it does become cumbersome down the line when you decide that you should have used a class after all, and now you have to change all of those foo["bar"] lines to foo.bar. Note that simple search-and-replace wouldn't help if you are passing foo to various functions.
> But you are suggesting that people think far enough ahead to think that a sequence or mapping will be cumbersome and thus something with attribute access should be used instead.
Perhaps a lot of people wouldn't, unless the tutorial reflected the idea of using struct for anonymous, structured data. When I started out, I initially settled on tuples because it was the easiest to build up front, and I didn't see any better alternatives. After the frustration of maintaining tuple-based structures, I began to use dicts instead, but I didn't like the syntax all that much, and it looks weird (in my opinion) to have code that declares and indexes a dictionary when what you really want is something like a C struct.
I guess what I'm suggesting is that if a clear solution is presented that fits the use case well, I would expect a nontrivial number of people to use it. They would not be required to notice that it is then easy to convert to a real class, because good practice is "enforced" by the use of struct.
> > > Thinking of a name for your class is not difficult, especially if you keep it private to the module, class, function, etc.
> > It may not be difficult, but when the name is unnecessary, simply needing to declare it seems silly.
> Well, we almost ditched lambda and were going to require people to define a simple function to replace lambda functions, so not everyone thinks it is silly.
I'm sorry to keep hammering on this issue, but anonymous functions already have standard names from Math: f, g, h, fn, func, etc. In other words,
def f (x, y): return x+y
f
is not much harder than
lambda x,y: x+y
(except for the fact that it cannot be used as an expression, which is why I personally still like lambda).
With class names, there is no such convention. Therefore, something like
c = NamedTuple("C", "one two")
a = c(1, 2)
seems a bit strange. Why "c" and "C"? Why anything? Why should I have to include "C" as an argument only to never use it? I think that
a = struct(one=1, two=2)
clearly beats out the NamedTuple solution if all you really need is a struct.
> > > This does not strike me as useful enough to have as a built-in. It would be better placed in the stdlib.
> > I would be happy with it at least becoming part of collections or some other module, but then I wonder how many new-ish Python programmers would persist in using a tuple or a dict instead of a more elegant struct solution for lack of knowing about it. At least if it was in Python somewhere, though, searching for "python struct" would be more likely to return what the programmer is looking for.
> Sticking something in built-ins so that it is easier for newbies to find it is not a good argument. Things only go into builtins if they are frequently used and warrant skipping an import statement.
Fair enough. It would make sense to initially put it into collections, make sure people know about it, and then see how often it's used.
> > Ouch... it seems that struct is already the name of a module. If enough people like my idea, perhaps that module could be renamed to "cstruct". Then again, if my idea did become a part of collections (rather than a built-in), collections.struct and the struct module would be able to co-exist, albeit somewhat confusingly.
> I don't agree with that worry. re.compile() exists and people don't worry about conflicting with the built-in function. An import statement makes it clear what object 'struct' maps to in the namespace.
Good, because I much prefer using the name "struct" since it maps so closely to a C struct.
Brandon
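
The comparison in this message can be tried with what ships in current Python. In the sketch below, collections.namedtuple plays the role of the NamedTuple mentioned above, and types.SimpleNamespace stands in for the proposed struct. That substitution is an assumption: SimpleNamespace is keyword-built like the proposal but, unlike it, allows new attributes to be added later.

```python
from collections import namedtuple
from types import SimpleNamespace

# namedtuple: the type name "C" must be supplied, and the field names are
# declared separately from the values.
C = namedtuple("C", "one two")
a = C(1, 2)

# Keyword-built object: names and values appear together, no type name.
b = SimpleNamespace(one=1, two=2)

print(a.one, a.two)   # attribute access works on both
print(b.one, b.two)
print(a)              # the otherwise unused name "C" shows up in the repr
```

The namedtuple instance is still a tuple, so it keeps positional indexing and unpacking, which may or may not be wanted for a struct-like object.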

Brandon Mintern wrote:
> This is a proposal to add a new built-in named struct:
> struct(**kwargs)
>     Return a struct object which has the attributes given in kwargs.
I think I'd prefer 'record', to avoid any potential confusion with the struct module, which does something quite different.
Also, my Pascal background makes the term 'record' seem more high-level and therefore Pythonic to me.

Greg Ewing wrote:
> Brandon Mintern wrote:
> > This is a proposal to add a new built-in named struct:
> > struct(**kwargs)
> >     Return a struct object which has the attributes given in kwargs.
> I think I'd prefer 'record', to avoid any potential confusion with the struct module, which does something quite different.
> Also, my Pascal background makes the term 'record' seem more high-level and therefore Pythonic to me.
In the past, people have also suggested 'namespace' for the same concept.
Georg

"Greg Ewing" greg.ewing@canterbury.ac.nz wrote in message news:4835FBA4.8020206@canterbury.ac.nz...
| Brandon Mintern wrote:
| > This is a proposal to add a new built-in named struct:
| >
| > struct(**kwargs)
| > Return a struct object which has the attributes given in kwargs.
|
| I think I'd prefer 'record', to avoid any potential
| confusion with the struct module, which does something
| quite different.
I agree, perhaps even Record, but in any case in the collections module. Something like this has been the subject of enough c.l.p posts to make a case for something in the stdlib, but not in builtins. An implementation in Python also serves as a model for variations.

On Thu, 22 May 2008, Brandon Mintern wrote:
> The implementation:
> class struct (object):
>     <things here>
>     def __setattr__ (self, name, value):
>         """
>         I think it makes the most sense for a struct to have immutable
>         fields. As soon as you start to add more fields, you should be
>         using something other than a struct.
>         """
>         if name in self.__dict__:
>             self.__dict__[name] = value
>         else:
>             raise(AttributeError("'struct' object has no attribute '%s'" \
>                                  % (name,)))
I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability. Someone more well-versed in the python object model should determine if this is actually a good idea.

On Thu, May 22, 2008 at 7:15 PM, Leif Walsh adlaiff6@gmail.com wrote:
> On Thu, 22 May 2008, Brandon Mintern wrote:
> > The implementation:
> > class struct (object):
> >     <things here>
> >     def __setattr__ (self, name, value):
> >         """
> >         I think it makes the most sense for a struct to have immutable
> >         fields. As soon as you start to add more fields, you should be
> >         using something other than a struct.
> >         """
> >         if name in self.__dict__:
> >             self.__dict__[name] = value
> >         else:
> >             raise(AttributeError("'struct' object has no attribute '%s'" \
> >                                  % (name,)))
> I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability. Someone more well-versed in the python object model should determine if this is actually a good idea.
> --
> Cheers, Leif
Whoops... I should have said "An implementation." I intended the implementation to be a specification for behavior and not the one-and-only implementation. Personally, I'm not especially familiar with using __slots__ in anything but standard usage, so I found it easier to show the code using __dict__. Certainly if __slots__ can be used, it avoids the need to write __setattr__ and to explicitly raise the AttributeError exception (which I already see is wrong because it doesn't support inheritance).

On Thu, 22 May 2008, Brandon Mintern wrote:
> > I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability. Someone more well-versed in the python object model should determine if this is actually a good idea.
> > --
> > Cheers, Leif
> Whoops... I should have said "An implementation." I intended the implementation to be a specification for behavior and not the one-and-only implementation. Personally, I'm not especially familiar with using __slots__ in anything but standard usage, so I found it easier to show the code using __dict__. Certainly if __slots__ can be used, it avoids the need to write __setattr__ and to explicitly raise the AttributeError exception (which I already see is wrong because it doesn't support inheritance).
Don't take my word as gospel. I've only used __slots__ once, and it was at the suggestion of another programmer, in a kind of "I think this does what we want" way, and we are still not entirely sure if it's doing what we think it's doing.

Leif Walsh wrote:
> I've only used __slots__ once, and it was at the suggestion of another programmer, in a kind of "I think this does what we want" way, and we are still not entirely sure if it's doing what we think it's doing.
Another thing to keep in mind about __slots__ is that its primary purpose is to save memory. It currently has the side effect of preventing other attributes from being added, but it may not remain that way forever.
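
A small sketch of the behaviour described here, as it can be observed on current CPython. The class name Point and its fields are made up for the illustration.

```python
class Point:
    __slots__ = ("x", "y")   # instances get fixed storage and no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10                     # rebinding a declared slot is allowed

try:
    p.z = 3                  # "z" is not a declared slot
except AttributeError as exc:
    error = str(exc)
    print("rejected:", error)

has_dict = hasattr(p, "__dict__")
print("instance has __dict__:", has_dict)
```

Note that a subclass which does not declare its own __slots__ gets a __dict__ again, so the restriction does not automatically carry over to subclasses.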

On Thu, May 22, 2008 at 7:15 PM, Leif Walsh adlaiff6@gmail.com wrote:
> I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability.
That wouldn't work, because the slots that a class has are fixed when the class is defined. Changing __slots__ subsequent to that doesn't affect anything.
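
The point can be checked with a few lines. This is a minimal sketch with a made-up class Pair; it shows that rebinding __slots__ on an existing class does not add storage for a new attribute.

```python
class Pair:
    __slots__ = ("x",)


# Rebinding the class attribute succeeds, but no new slot descriptor is
# created, so instances still only have storage for "x".
Pair.__slots__ = ("x", "y")

p = Pair()
p.x = 1
try:
    p.y = 2
    y_accepted = True
except AttributeError:
    y_accepted = False

print("y accepted after changing __slots__:", y_accepted)
```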

On Thu, May 22, 2008 at 9:59 PM, Greg Ewing greg.ewing@canterbury.ac.nz wrote:
> On Thu, May 22, 2008 at 7:15 PM, Leif Walsh adlaiff6@gmail.com wrote:
> > I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability.
> That wouldn't work, because the slots that a class has are fixed when the class is defined. Changing __slots__ subsequent to that doesn't affect anything.
Unless you generate the class on the fly with a class factory, like namedtuple. Actually I prefer a metaclass factory, to avoid the repetition of the class name:
>>> class Foo(object):
...     __metaclass__ = RecordType('x y z is_on',
...                                field_defaults={'is_on':False}, default=0.0)
>>> a = Foo()
>>> a
Foo(x=0.0, y=0.0, z=0.0, is_on=False)
>>> a.__slots__
('x', 'y', 'z', 'is_on')
>>> a.__dict__
Traceback (most recent call last):
AttributeError: 'Foo' object has no attribute '__dict__'
George
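
RecordType itself is not shown in the thread, so the following is not that code. It is a rough sketch of the simpler class-factory variant, using the three-argument form of type() to build a slotted class on the fly. The name make_record and its signature are hypothetical.

```python
def make_record(name, field_names, **defaults):
    """Hypothetical factory; not the RecordType from the message above."""
    fields = tuple(field_names.split())

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(fields)
        if unknown:
            raise TypeError(f"unexpected fields: {sorted(unknown)}")
        for field in fields:
            # Explicit keyword wins, then a per-field default, then None.
            setattr(self, field, kwargs.get(field, defaults.get(field)))

    def __repr__(self):
        body = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({body})"

    namespace = {"__slots__": fields, "__init__": __init__,
                 "__repr__": __repr__}
    # The slots are fixed here, at class creation time.
    return type(name, (object,), namespace)


Foo = make_record("Foo", "x y z is_on", is_on=False)
a = Foo(x=0.0, y=0.0, z=0.0)
print(a)
print(Foo.__slots__)
```

Unlike the metaclass approach, this factory still repeats the class name, once as the string argument and once as the variable it is bound to.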

On Fri, May 23, 2008 at 12:51 AM, George Sakkis george.sakkis@gmail.com wrote:
> On Thu, May 22, 2008 at 9:59 PM, Greg Ewing greg.ewing@canterbury.ac.nz wrote:
> > On Thu, May 22, 2008 at 7:15 PM, Leif Walsh adlaiff6@gmail.com wrote:
> > > I think it makes the most sense, if this construct is adopted, to use __slots__ to control mutability.
> > That wouldn't work, because the slots that a class has are fixed when the class is defined. Changing __slots__ subsequent to that doesn't affect anything.
> Unless you generate the class on the fly with a class factory, like namedtuple. Actually I prefer a metaclass factory, to avoid the repetition of the class name:
> >>> class Foo(object):
> ...     __metaclass__ = RecordType('x y z is_on',
> ...                                field_defaults={'is_on':False}, default=0.0)
> >>> a = Foo()
> >>> a
> Foo(x=0.0, y=0.0, z=0.0, is_on=False)
> >>> a.__slots__
> ('x', 'y', 'z', 'is_on')
> >>> a.__dict__
> Traceback (most recent call last):
> AttributeError: 'Foo' object has no attribute '__dict__'
The recipe here does roughly that (though without the defaulting extras):
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237
Note however that use of __slots__ is still controversial. (Because it's really only appropriate if you plan to make many of these objects and you want to keep memory consumption down.)
Steve

Steven Bethard wrote:
> Note however that use of __slots__ is still controversial. (Because it's really only appropriate if you plan to make many of these objects and you want to keep memory consumption down.)
Also, if you were going to go down the route of creating classes on the fly, you would want a factory function to create a class suitable for the purpose at hand, and then make instances of that class. Otherwise, you would be creating a new class for every *instance* of your struct, which would be extremely wasteful (type objects are *big*!)
And if you're doing that, you might as well just have some built-in or library class that you can subclass.
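
One way to avoid a new class per instance is to build the class once per set of field names and reuse it. The sketch below does that with functools.lru_cache; the helper name record_class and the generic class name "Record" are assumptions for illustration only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def record_class(*field_names):
    """Hypothetical helper: build (once) a slotted class for these fields."""

    def __init__(self, *values):
        if len(values) != len(field_names):
            raise TypeError(f"expected {len(field_names)} values")
        for field, value in zip(field_names, values):
            setattr(self, field, value)

    return type("Record", (object,),
                {"__slots__": field_names, "__init__": __init__})


PointClass = record_class("x", "y")
p1 = PointClass(1, 2)
p2 = record_class("x", "y")(3, 4)   # cache hit: the same class is reused

print(type(p1) is type(p2))
```

This keeps the number of type objects proportional to the number of distinct field sets rather than the number of instances, though a plain library class to subclass, as suggested above, would be simpler still.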
participants (8)
- Brandon Mintern
- Brett Cannon
- Georg Brandl
- George Sakkis
- Greg Ewing
- Leif Walsh
- Steven Bethard
- Terry Reedy