[Tutor] Storing information as attributes or in a dictionary

Wed Sep 19 06:48:46 CEST 2012

On Tue, Sep 18, 2012 at 07:14:26AM -0700, Michiel de Hoon wrote:
> Dear all,
> 
> Suppose I have a parser that parses information stored in e.g. an XML file.

You mean like the XML parsers that already come with Python?

http://docs.python.org/library/markup.html
http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/

Or powerful third-party libraries that already exist?

http://lxml.de/index.html

Please don't waste your time re-inventing the wheel :)
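Here is a minimal sketch of the ElementTree route. The XML snippet, the <record> element name and its attributes are all made up for illustration. The standard library can turn each element's XML attributes straight into a plain dict:

```python
import xml.etree.ElementTree as ET

# Made-up XML standing in for the real file; <record> and its
# attributes are hypothetical.
XML_TEXT = """<records>
  <record id="1" north="10" south="20"/>
  <record id="2" east="30"/>
</records>"""


def parse_records(text):
    """Return one plain dict per <record> element, keyed by XML attribute name."""
    root = ET.fromstring(text)
    # elem.attrib is already a dict of the element's attributes; copy it
    # so callers can modify the result without touching the tree.
    return [dict(elem.attrib) for elem in root.iter("record")]


records = parse_records(XML_TEXT)
print(records[0]["north"])      # 10
print(records[1].get("south"))  # None: this record has no "south"
```

For a real file, ET.parse(path).getroot() would take the place of ET.fromstring(text).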

> I would like to design a Python class to store the information 
> contained in this XML file.
> 
> One option is to create a class like this:
> 
> class Record(object):
>     pass
> 
> and store the information in the XML file as attributes of objects of 
> this class

That is perfectly fine if you have a known set of attribute names, and 
none of them clash with Python reserved words (like "class", "del", 
etc.) or are otherwise illegal identifiers (e.g. "2or3").
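You can check which names work with the obj.name syntax using the standard keyword module and str.isidentifier. This is a sketch in Python 3, and the helper name usable_as_attribute is made up:

```python
import keyword


def usable_as_attribute(name):
    """True if `name` could be written as obj.name in source code."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["north", "class", "2or3", "field name"]:
    print(repr(name), usable_as_attribute(name))
# 'north' True
# 'class' False
# '2or3' False
# 'field name' False


class Record(object):
    pass


r = Record()
# setattr accepts any string at runtime, but r.class would be a SyntaxError,
# so such a field can only be reached through getattr.
setattr(r, "class", "A")
print(getattr(r, "class"))  # A
```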

In general, I prefer to use a record-like object if and only if I have a 
pre-defined set of field names, in which case I prefer to use 
namedtuple:

py> from collections import namedtuple as nt
py> Record = nt("Record", "north south east west")
py> x = Record(1, 2, 3, 4)
py> print(x)
Record(north=1, south=2, east=3, west=4)
py> x.east
3
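A few more namedtuple features matter when the records come from a parser. Here is a sketch in Python 3 syntax that reuses the same field names:

```python
from collections import namedtuple

Record = namedtuple("Record", "north south east west")
x = Record(1, 2, 3, 4)

# Fields can also be read by position, and the field names are introspectable.
print(x[2])            # 3, same as x.east
print(Record._fields)  # ('north', 'south', 'east', 'west')

# namedtuples are immutable; _replace returns a modified copy.
y = x._replace(east=30)
print(y)               # Record(north=1, south=2, east=30, west=4)

# Building one from a dict only works if the keys match the fields exactly.
z = Record(**{"north": 5, "south": 6, "east": 7, "west": 8})
print(z._asdict()["west"])  # 8

try:
    Record(north=1)    # the other three fields are missing
except TypeError as exc:
    print("missing fields:", exc)
```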

> Alternatively I could subclass the dictionary class:
> 
> class Record(dict):
>     pass

Why bother subclassing it? You don't add any functionality. Just return 
a plain dict; it will be lighter-weight and faster.
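One concrete source of the extra weight is that instances of a dict subclass get a per-instance __dict__, which a plain dict does not have. This sketch only illustrates the memory side of the argument, not the speed claim:

```python
class Record(dict):
    pass


plain = {"north": 1}
sub = Record(north=1)

# As mappings they behave the same...
print(plain == sub)                # True

# ...but only the subclass instance carries an extra attribute dict.
print(hasattr(plain, "__dict__"))  # False
print(hasattr(sub, "__dict__"))    # True
```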

> I can see some advantage to using a dictionary, because it allows me 
> to use the same strings as keys in the dictionary as are used in the 
> XML file itself. But are there some general guidelines for when to use 
> a dictionary-like class, 

Yes. You should prefer a dictionary when you have one or more of these:

- your field names could be illegal as identifiers 
  (e.g. "field name", "foo!", etc.)

- you have an unknown and potentially unlimited number of field names

- each record could have a different set of field names

- some fields may be missing

- you expect to be programmatically inspecting field names that aren't 
  known until runtime, e.g.:

  name = get_name_of_field()
  value = record[name] # is cleaner than getattr(record, name)

- you expect to iterate over all field names
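Several of the points above can be sketched with plain dicts. The records are made-up data, and get_name_of_field is replaced by a hard-coded name:

```python
# Hypothetical records, e.g. as produced by an XML parser: each one has a
# different set of fields, and some names are not valid identifiers.
records = [
    {"north": 1, "south": 2, "field name": "a"},
    {"north": 3, "foo!": "b"},
]

# Missing fields: .get() with a default instead of catching errors.
souths = [rec.get("south", 0) for rec in records]
print(souths)  # [2, 0]

# Field name known only at runtime: plain subscripting.
name = "north"  # stand-in for get_name_of_field()
norths = [rec[name] for rec in records]
print(norths)  # [1, 3]

# Iterating over every field name across all records.
all_names = set()
for rec in records:
    all_names.update(rec)
print(sorted(all_names))  # ['field name', 'foo!', 'north', 'south']
```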

You might prefer to use attributes of a class if you have one or more 
of these:

- all field names are guaranteed to be legal identifiers

- you have a fixed set of field names, known ahead of time

- you value the convenience of writing record.field instead of
  record['field']
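If the field set really is fixed, one option not mentioned above is a small class with __slots__, which turns a misspelled field name into an error. Here is a minimal sketch that reuses the compass field names from the namedtuple example:

```python
class Record(object):
    # __slots__ fixes the allowed attribute names up front, so a typo such
    # as rec.esat = 3 raises AttributeError instead of adding a new field.
    __slots__ = ("north", "south", "east", "west")

    def __init__(self, north, south, east, west):
        self.north = north
        self.south = south
        self.east = east
        self.west = west


rec = Record(1, 2, 3, 4)
rec.east = 30    # existing fields stay mutable, unlike a namedtuple
print(rec.east)  # 30

try:
    rec.esat = 3  # misspelled field name
except AttributeError as exc:
    print("rejected:", exc)
```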

> and when to use attributes to store 
> information? In particular, are there any situations where there is 
> some advantage in using attributes?

Not so much. Attributes are convenient, because you save three 
characters:

obj.spam
obj['spam']

but otherwise attributes are just a more limited version of dict keys. 
Anything that can be done with attributes can be done with a dict, since 
attributes are usually implemented with a dict.
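For an ordinary class (one without __slots__), vars() shows the instance attributes as a dict, and getattr with a default plays the role of dict.get. A small sketch:

```python
class Record(object):
    pass


rec = Record()
rec.spam = 1
rec.eggs = 2

# The instance attributes live in a dict...
print(vars(rec))                  # {'spam': 1, 'eggs': 2}
print(rec.__dict__ is vars(rec))  # True

# ...so dict-style operations are available, just more verbosely.
print(getattr(rec, "spam"))       # 1
print(getattr(rec, "ham", None))  # None, like dict.get
print(sorted(vars(rec)))          # ['eggs', 'spam']
```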

-- 
Steven