The trouble with "dynamic attributes".

Thu Sep 16 19:27:08 EDT 2010

On 16/09/2010 22:46, John Nagle wrote:
>   There's a tendency to use "dynamic attributes" in Python when
> trying to encapsulate objects from other systems. It almost
> works. But it's usually a headache in the end, and should be
> discouraged. Here's why.
>
> Some parsers, like BeautifulSoup, try to encapsulate HTML tag
> fields as Python attributes. This gives trouble for several reasons.
> First, the syntax for Python attributes and HTML tags is different.
> Some strings won't convert to attributes. You can crash BeautifulSoup
> (which is supposed to be robust against bad HTML) by using a non-ASCII
> character in a tag in an HTML document it is parsing.
>
> Then there's the reserved word problem. "class" is a valid field
> name in HTML, and a reserved word in Python. So there has to be a
> workaround for reserved words.
>
> There's also the problem that user-created attributes go into the
> same namespace as other object attributes. This creates a vulnerability
> comparable to SQL injection. If an attacker controls the input
> being parsed, they may be able to induce a store into something
> they shouldn't be able to access.
>
> This problem shows up again in "suds", the module for writing
> SOAP RPC clients. This module tries to use attributes for
> XML structures, and it almost works. It tends to founder when
> the XML data model has strings that aren't valid attributes.
> ("-" appears frequently in XML fields, but is not valid in an
> attribute name.)
>
> Using a dictionary, or inheriting an object from "dict", doesn't
> create these problems. The data items live in their own dictionary,
> and can't clash with anything else. Of course, you have to write
>
> tag['a']
>
> instead of
>
> tag.a
>
> but then, at least you know what to do when you need
>
> tag['class']
>
> "suds", incidentally, tries to do both. They accept both
>
> item.fieldname
>
> and
>
> item['fieldname']
>
> But they are faking a dictionary, and it doesn't quite work right.
>
> 'fieldname' in item
>
> works correctly, but the form to get None when the field is missing,
>
> item.get('fieldname',None)
>
> isn't implemented.
>
> Much of the code that uses objects as dictionaries either dates from
> the days when you couldn't inherit from "dict", or was written by
> JavaScript programmers. (In JavaScript, an object and a dictionary
> are the same thing. In Python, they're not.) In new code, it's
> better to inherit from "dict". It eliminates the special cases.
>
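A rough sketch of the namespace clash and the dict alternative described in the quoted message. AttrTag and DictTag are made-up classes for illustration only; they are not how BeautifulSoup or suds are actually implemented.

```python
class AttrTag:
    """Hypothetical tag object that stores parsed fields as attributes."""

    def __init__(self, fields):
        for key, value in fields.items():
            # The input chooses the attribute name, so it can shadow a method.
            setattr(self, key, value)

    def render(self):
        return "<tag>"


class DictTag(dict):
    """Hypothetical tag object that keeps parsed fields as dictionary items."""

    def render(self):
        return "<tag>"


# Field names as they might arrive from a document someone else controls.
untrusted = {"class": "big", "data-id": "7", "render": "oops"}

attr_tag = AttrTag(untrusted)
dict_tag = DictTag(untrusted)

# The attribute version has lost its method: render is now the string "oops".
attr_render_clobbered = not callable(attr_tag.render)

# The dict version keeps items and methods in separate namespaces.
dict_render_intact = callable(dict_tag.render)

# Reserved words, hyphens and missing keys need no special handling.
class_value = dict_tag["class"]
has_data_id = "data-id" in dict_tag
missing_value = dict_tag.get("missing")   # None, because dict.get is inherited
```

Because DictTag really is a dict, get(), "in" and the rest behave the way dictionary code expects, with nothing to fake.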
For the work on updating the re module there was a discussion about
whether named capture groups should be available as attributes of the
match object or via subscripting (or both?). Subscripting seemed
preferable to me because:

1. Adding attributes looks too much like 'magic'.

2. What should happen if a group name conflicts with a normal attribute?

3. What should happen if a group name conflicts with a reserved word?
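
Points 2 and 3 can be sketched with a made-up AttrMatch class. This is purely hypothetical and is not the match object of either re or regex; it only shows what attribute access would run into.

```python
class AttrMatch:
    """Hypothetical match-like object exposing named groups as attributes."""

    def __init__(self, groups):
        self._groups = dict(groups)

    def start(self):
        # Stand-in for an ordinary method such as the real match.start().
        return 0

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        try:
            return self._groups[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        return self._groups[name]


m = AttrMatch({"year": "2010", "start": "09:00", "class": "urgent"})

# An unremarkable group name works as an attribute.
year_value = m.year

# Point 2: a group called "start" is hidden by the method of the same name.
start_is_method = callable(m.start)
start_value = m["start"]

# Point 3: "m.class" is a syntax error, so getattr() is the only
# attribute-style spelling; subscripting needs no workaround.
class_via_getattr = getattr(m, "class")
class_via_item = m["class"]
```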

For those reasons the new regex module uses subscripting. It's more
Pythonic, IMHO.
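
A small sketch of the subscripting style. It uses the standard re module rather than regex, on the assumption of Python 3.6 or later, where re match objects also accept subscripts. The pattern and the group names are invented for the example.

```python
import re

# Group names chosen to collide with a match method and a reserved word.
pattern = re.compile(r"(?P<start>\d{2}:\d{2}) (?P<class>\w+)")
m = pattern.match("09:00 urgent")

# Subscripting reaches both groups without any special cases.
start_group = m["start"]
class_group = m["class"]

# The ordinary method of the same name is untouched.
start_offset = m.start()

# group() is the older equivalent spelling, and index 0 is the whole match.
same_as_group = m.group("class")
whole_match = m[0]
```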