[Python-ideas] Proposal to add new built-in struct (was: Add kwargs to built-in function object)

23 May 2008

This is a proposal to add a new built-in named struct:

struct(**kwargs)
    Return a struct object which has the attributes given in kwargs.

The name is really unimportant, and I'm open to other ideas, but I
feel that many people have a good idea of what a struct is, and
accessing an object returned from struct would look identical to the
access of a struct in C, so it seems appropriate to me.

The rationale:

It is often helpful to package related information together in order
to make passing the information to various functions more convenient.
The easiest ways (currently) to package such information are to put it
into a tuple (the easiest) or a dict.

Putting the information into a tuple may be easy initially, but it has
high costs later on as it becomes hard to remember what order the
information is in, and to someone reading the code, the intent of the
code is far from clear.

Putting the information in a dict is a bit harder than a tuple, but
since it is more readable later on than a tuple, this method is often
used. Still, the access pattern is more cumbersome than it could be;
foo["bar"] is more cumbersome than, say, foo.bar. This is especially
the case if you have a dict or list of foos, where you then have to
use foos[i]["bar"].
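
A small sketch of the three access patterns being compared. The field
values are illustrative only, and types.SimpleNamespace (standard
library, Python 3.3+) is used purely as a stand-in for "some object
with attribute access"; it is not part of this proposal.

```python
from types import SimpleNamespace

# The same record packaged three ways (hypothetical values).
foo_tuple = ("barvalue", "bazvalue")
foo_dict = {"bar": "barvalue", "baz": "bazvalue"}
foo_obj = SimpleNamespace(bar="barvalue", baz="bazvalue")

bar_from_tuple = foo_tuple[0]     # reader must remember that index 0 is bar
bar_from_dict = foo_dict["bar"]   # self-describing, but brackets and quotes
bar_from_obj = foo_obj.bar        # plain attribute access

# The difference grows once the records live in a list.
foos_as_dicts = [foo_dict, {"bar": "other", "baz": "thing"}]
foos_as_objs = [foo_obj, SimpleNamespace(bar="other", baz="thing")]

second_bar_dict = foos_as_dicts[1]["bar"]
second_bar_obj = foos_as_objs[1].bar
```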

Both tuple and dict solutions suffer down the line when the
information gets to be complicated enough to warrant a class of its
own. Making that switch means changing both the spot in the code where
the information is created (to use the new class constructor) and
every single field access in the code (changing every foo[0] or
foo["bar"] to foo.bar).

An alternative is to use NamedTuple, NamedDict, NamedList, or to
create your own class. As long as these are more complicated to use
than a tuple or a dict, however, they are not likely to be used for
this purpose. Another problem is that all of these methods require you
to go to the trouble of thinking of a name for your class, and if you
later decide to add more information to your packaged object, you have
to make two changes (in the list of attributes / constructor and in
the place where you instantiate your object).
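
To illustrate the "two changes" point, here is a sketch that assumes
collections.namedtuple from the standard library as a stand-in for the
NamedTuple approach mentioned above (which implementation was meant is
an assumption). The class names Foo and FooWithBaz are hypothetical.

```python
from collections import namedtuple

# First you have to think of a name and declare the field list...
Foo = namedtuple("Foo", ["bar"])
# ...and only then can you instantiate it.
foo = Foo(bar="barvalue")

# Adding a field later touches both places: the declaration...
FooWithBaz = namedtuple("FooWithBaz", ["bar", "baz"])
# ...and the instantiation.
foo2 = FooWithBaz(bar="barvalue", baz="bazvalue")
```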

Enter struct. Using struct is intended to be just as easy as using a
dict (actually, easier when the number of fields is more than two or
three), and not much harder than using a tuple. To declare a struct
foo with attribute bar, you simply use:

foo = struct(bar="barvalue")

Access becomes very easy and readable:

foo.bar

Adding new fields is as easy as changing the initial instantiation, in
one place:

foo = struct(bar="barvalue", baz="bazvalue")

Later on down the line, when you decide that you are doing more with
foo than a struct should be doing, you can easily define a class Foo
which inherits from struct, and since accesses to foo already look
like foo.bar, you only have one spot in the code to change:

foo = Foo(bar="barvalue", baz="bazvalue")

and the rest "just works" with no changes.
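
A sketch of that migration path. The struct class here is a trimmed
stand-in for the proposed built-in (only the constructor is
reproduced), and the describe method is a hypothetical example of
behaviour that might be added to Foo.

```python
class struct(object):
    # Trimmed stand-in for the proposed built-in: constructor only.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class Foo(struct):
    # Hypothetical extra behaviour once foo outgrows a plain struct.
    def describe(self):
        return "%s and %s" % (self.bar, self.baz)


# The only line that changes: struct(...) becomes Foo(...).
foo = Foo(bar="barvalue", baz="bazvalue")

# Existing field accesses keep working unchanged.
print(foo.bar)
print(foo.describe())
```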

The implementation:

class struct (object):

    def __init__ (self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__ (self):
        """
        Using self.__class__.__name__ allows classes to inherit from
        struct and automatically have a nice __repr__ method.
        """
        return "%s(%s)" % (self.__class__.__name__,
                           ", ".join("%s=%s" % (attr, repr(val))
                                     for attr, val
                                     in self.__dict__.iteritems()),)
                                                # or .items()
                                                # in Python 3K

    def __str__ (self):
        return self.__repr__()

    def __eq__ (self, other):
        """
        Compares two structs field by field, like the member-wise
        comparison one would write by hand for a C struct.
        """
        return self.__dict__ == other.__dict__

    def __ne__ (self, other):
        """
        See note for __eq__.
        """
        return not self.__eq__(other)

    def __setattr__ (self, name, value):
        """
        I think it makes the most sense for a struct to have a fixed
        set of fields: existing fields can be reassigned, but new ones
        cannot be added. As soon as you start to add more fields, you
        should be using something other than a struct.
        """
        if name in self.__dict__:
            self.__dict__[name] = value
        else:
            raise(AttributeError("'struct' object has no attribute '%s'" \
                                 % (name,)))

    def __len__ (self):
        """
        I'm not sure that it's really necessary to include this, but I
        could see where it might be helpful in some instances.
        """
        return len(self.__dict__)

    def __iter__ (self):
        """
        See note for __len__
        """
        return self.__dict__.itervalues()
                       # or .values() in Python 3K
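
For readers on Python 3, the class above might be adapted roughly as
follows. This is a sketch, not the author's code, and it deviates in
three labelled ways: the error message uses the actual class name,
__eq__ returns NotImplemented for non-struct operands, and __ne__ is
omitted because Python 3 derives it from __eq__.

```python
class struct:
    def __init__(self, **kwargs):
        # Writing to __dict__ directly bypasses __setattr__ below.
        self.__dict__.update(kwargs)

    def __repr__(self):
        fields = ", ".join(
            "%s=%r" % (attr, val) for attr, val in self.__dict__.items()
        )
        return "%s(%s)" % (type(self).__name__, fields)

    def __eq__(self, other):
        # Assumption: operands that are not structs are left to the
        # other side to compare, rather than raising AttributeError.
        if not isinstance(other, struct):
            return NotImplemented
        return self.__dict__ == other.__dict__

    # Note: in Python 3, defining __eq__ without __hash__ makes
    # instances unhashable.

    def __setattr__(self, name, value):
        # Existing fields may be reassigned; new ones may not be added.
        if name not in self.__dict__:
            raise AttributeError(
                "'%s' object has no attribute '%s'"
                % (type(self).__name__, name)
            )
        self.__dict__[name] = value

    def __len__(self):
        return len(self.__dict__)

    def __iter__(self):
        # Iterates over field values, not field names.
        return iter(self.__dict__.values())


a = struct(one=1, two=2)
b = struct(one=1, two=2)
```

On Python 3.7 and later, keyword arguments and dicts preserve insertion
order, so repr(a) lists the fields in the order they were passed.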

Sample usage:
...
...
...
>>> a = struct(one=1, two=2)
>>> a
struct(two=2, one=1)
>>> a.one
1
>>> a.two
2
>>> a.three
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'struct' object has no attribute 'three'
>>> a.one = "one"
>>> a.one
'one'
>>> a.three = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "struct.py", line 39, in __setattr__
    % (name,)))
AttributeError: 'struct' object has no attribute 'three'
>>> b = struct(one=1, two=2)
>>> b
struct(two=2, one=1)
>>> a == b
False
>>> a.one = 1
>>> a == b
True
>>> len(a)
2
>>> print ", ".join(str(v) for v in a)
2, 1
>>> 1 in a
True
>>> "one" in a
False

Ideas or feedback, anyone?

Brandon Mintern