[Tutor] object representation

Steven D'Aprano steve at pearwood.info
Fri Mar 5 02:45:59 CET 2010


On Thu, 4 Mar 2010 06:47:04 pm spir wrote:
> Hello,
>
> In python like in most languages, I guess, objects (at least
> composite ones -- I don't know about ints, for instance -- someone
> knows?) are internally represented as associative arrays. 

No. 

You can consider a Python object to be something like a C struct, or a 
Pascal record. The actual structure of the object depends on the type 
of the object (ints, strings, lists, etc will be slightly different). 
See Dave Angel's post for a good description.

And of course this is implementation dependent: CPython, being written 
in C, uses C structs to implement objects. Other Pythons, written in 
other languages, will use whatever data structure makes sense for their 
language. That's almost certainly going to be a struct-like structure, 
since that is a pretty fundamental data type, but it could be 
different.

If you want to know exactly how objects are represented internally by 
Python, I'm afraid you will probably need to read the source code. But 
a good start is here:

http://effbot.org/zone/python-objects.htm



> Python 
> associative arrays are dicts, which in turn are implemented as hash
> tables. Correct? Does this mean that the associative arrays
> representing objects are implemented like python dicts, thus hash
> tables?

"Associate array" is a generic term for a data structure that maps keys 
to items. There are lots of ways of implementing such an associative 
array: at least two different sorts of hash tables, about a dozen 
different types of binary trees, and so on.

In Python, associate arrays are called "dicts", and they are implemented 
in CPython as hash tables with chaining. But objects are not hash 
tables. *Some* objects INCLUDE a hash table as one of its fields, but 
not all. For example:

>>> int.__dict__
<dictproxy object at 0xb7f09674>
>>> (2).__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute '__dict__'
>>> class C: pass
...
>>> C.__dict__
{'__module__': '__main__', '__doc__': None}


That is why you can't add attributes to arbitrary built-in objects:

>>> {}.x = None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'x'

Because the dict instance has no __dict__, there is nowhere to insert 
the attribute.


> I was wondering about the question because I guess the constraints
> are quite different: * Dict keys are of any type, including
> heterogeneous (mixed). Object keys are names, ie a subset of strings.

CPython's implementation of hash tables is highly optimized for speed. 
The biggest bottleneck is the hash function, and that is tuned to be 
extremely efficient for strings and ints.

[...]
> So, I guess the best implementations for objects and dicts may be
> quite different. i wonder about alternatives for objects, in
> particuliar trie and variants: http://en.wikipedia.org/wiki/Trie,
> because they are specialised for associative arrays which keys are
> strings.

*shrug*

Maybe, maybe not. Tries are a more complicated data structure, which 
means bigger code and more bugs. They don't fit in very well with 
CPython's memory management system. And they use a large number of 
pointers, which can be wasteful.

E.g. a trie needs six pointers just to represent the single 
key "python":

'' -> 'p' -> 'y' -> 't' -> 'h' -> 'o' -> 'n'

while a hash table uses just one:

-> 'python'




-- 
Steven D'Aprano


More information about the Tutor mailing list