[Tutor] object representation
Steven D'Aprano
steve at pearwood.info
Fri Mar 5 02:45:59 CET 2010
On Thu, 4 Mar 2010 06:47:04 pm spir wrote:
> Hello,
>
> In python like in most languages, I guess, objects (at least
> composite ones -- I don't know about ints, for instance -- someone
> knows?) are internally represented as associative arrays.
No.
You can consider a Python object to be something like a C struct, or a
Pascal record. The actual structure of the object depends on the type
of the object (ints, strings, lists, etc will be slightly different).
See Dave Angel's post for a good description.
And of course this is implementation dependent: CPython, being written
in C, uses C structs to implement objects. Other Pythons, written in
other languages, will use whatever data structure makes sense for their
language. That's almost certainly going to be a struct-like structure,
since that is a pretty fundamental data type, but it could be
different.
If you want to know exactly how objects are represented internally by
Python, I'm afraid you will probably need to read the source code. But
a good start is here:
http://effbot.org/zone/python-objects.htm
> Python
> associative arrays are dicts, which in turn are implemented as hash
> tables. Correct? Does this mean that the associative arrays
> representing objects are implemented like python dicts, thus hash
> tables?
"Associate array" is a generic term for a data structure that maps keys
to items. There are lots of ways of implementing such an associative
array: at least two different sorts of hash tables, about a dozen
different types of binary trees, and so on.
In Python, associate arrays are called "dicts", and they are implemented
in CPython as hash tables with chaining. But objects are not hash
tables. *Some* objects INCLUDE a hash table as one of its fields, but
not all. For example:
>>> int.__dict__
<dictproxy object at 0xb7f09674>
>>> (2).__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute '__dict__'
>>> class C: pass
...
>>> C.__dict__
{'__module__': '__main__', '__doc__': None}
That is why you can't add attributes to arbitrary built-in objects:
>>> {}.x = None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'x'
Because the dict instance has no __dict__, there is nowhere to insert
the attribute.
> I was wondering about the question because I guess the constraints
> are quite different: * Dict keys are of any type, including
> heterogeneous (mixed). Object keys are names, ie a subset of strings.
CPython's implementation of hash tables is highly optimized for speed.
The biggest bottleneck is the hash function, and that is tuned to be
extremely efficient for strings and ints.
[...]
> So, I guess the best implementations for objects and dicts may be
> quite different. i wonder about alternatives for objects, in
> particuliar trie and variants: http://en.wikipedia.org/wiki/Trie,
> because they are specialised for associative arrays which keys are
> strings.
*shrug*
Maybe, maybe not. Tries are a more complicated data structure, which
means bigger code and more bugs. They don't fit in very well with
CPython's memory management system. And they use a large number of
pointers, which can be wasteful.
E.g. a trie needs six pointers just to represent the single
key "python":
'' -> 'p' -> 'y' -> 't' -> 'h' -> 'o' -> 'n'
while a hash table uses just one:
-> 'python'
--
Steven D'Aprano
More information about the Tutor
mailing list