[Tutor] Conditional attribute access / key access
Steven D'Aprano
steve at pearwood.info
Tue Aug 31 11:47:13 CEST 2010
On Tue, 31 Aug 2010 12:44:08 am Knacktus wrote:
> Hey everyone,
>
> I have a huge number of data items coming from a database.
Huge?
Later in this thread, you mentioned 200,000 items overall. That might
be "huge" to you, but it isn't to Python. Here's an example:
class K(object):
    def __init__(self):
        self.info = {"id": id(self),
                     "name": "root " + str(id(self)),
                     "children_ids": [2*id(self), 3*id(self)+1]}
And the size:
>>> import sys
>>> k = K()
>>> sys.getsizeof(k)
28
>>> sys.getsizeof(k.info)
136
>>> L = [K() for _ in xrange(200000)]
>>> sys.getsizeof(L)
835896
The sizes given are in bytes. So 200,000 instances of this class, plus
the list to hold them, would take approximately 34 megabytes (not
counting the strings and lists stored inside each dict, which
sys.getsizeof doesn't include). An entry-level PC these days has 1000
megabytes of memory. "Huge"? Not even close.
Optimizing with __slots__ is premature. Perhaps if you had 1000 times
that many instances, then it might be worthwhile.
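If you want to check the arithmetic on your own machine, here is a rough sketch that adds up the same three numbers for every instance. The function name estimate_bytes is mine, not anything standard, and I have used range so it runs on Python 3. Note that sys.getsizeof is shallow: it doesn't count the strings and lists inside each dict, and the exact figures depend on your Python version and platform, so treat the result as a lower bound.

```python
import sys


class K(object):
    def __init__(self):
        self.info = {"id": id(self),
                     "name": "root " + str(id(self)),
                     "children_ids": [2*id(self), 3*id(self)+1]}


def estimate_bytes(n):
    """Shallow size of n instances, their info dicts, and the holding list."""
    items = [K() for _ in range(n)]
    per_item = sum(sys.getsizeof(k) + sys.getsizeof(k.info) for k in items)
    return sys.getsizeof(items) + per_item


total = estimate_bytes(200000)
print("roughly %.1f megabytes" % (total / 1e6))
```

Whatever number you get, compare it to the memory you actually have before reaching for __slots__.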
> So far
> there're no restrictions about how to model the items. They can be
> dicts, objects of a custom class (preferable with __slots__) or
> namedTuple.
>
> Those items have references to each other using ids.
That approach sounds slow and ponderous to me. Why don't you just give
items direct references to each other, instead of indirect ones via ids?
I presume you're doing something like this:
ids = {0: None} # Map IDs to objects.
a = Part(0)
ids[1] = a
b = Part(1) # b is linked to a via its ID 1.
ids[2] = b
c = Part(2) # c is linked to b via its ID 2.
ids[3] = c
(only presumably less painfully).
If that's what you're doing, you should dump the ids and just do this:
a = Part(None)
b = Part(a)
c = Part(b)
Storing references to objects in Python is cheap -- it's only a pointer.
Using indirection via an ID you manage yourself is a pessimation, not
an optimization: it means more code, slower execution, and more memory
too (because the integer IDs themselves are pointers to 12-byte
objects, not 4-byte ints).
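To make the direct-reference version concrete, here is a minimal sketch. I am guessing at what your Part class looks like: the parent attribute and the ancestors helper are my own inventions for illustration, so adapt them to your real data.

```python
class Part(object):
    def __init__(self, parent):
        # Hold the parent object itself, not an ID that must be looked up.
        self.parent = parent


def ancestors(part):
    """Follow the direct references up to the root, nearest parent first."""
    chain = []
    while part.parent is not None:
        part = part.parent
        chain.append(part)
    return chain


a = Part(None)
b = Part(a)
c = Part(b)
assert ancestors(c) == [b, a]
```

There is no ids dict to keep in sync, and following a link is a plain attribute access rather than a dictionary lookup.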
If you *need* indirection, say because you are keeping the data in a
database and you want to only lazily load it when needed, rather than
all at once, then the right approach is probably a proxy object:
class PartProxy(object):
    def __init__(self, database_id):
        self._info = None
        self.database_id = database_id
    @property
    def info(self):
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info
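Here is the same proxy as a runnable sketch, so you can see the lazy loading happen. I don't know what your database layer looks like, so get_from_database below is a stand-in that reads from a plain dict (FAKE_DATABASE) and records each call in load_calls; both names are made up for this example.

```python
# Stand-in for a real database: maps database IDs to info dicts.
FAKE_DATABASE = {
    1: {"id": 1, "name": "root", "children_ids": [2, 3]},
}

# Records every ID that was actually fetched, to show when loading happens.
load_calls = []


def get_from_database(database_id):
    """Hypothetical loader; replace with your real database query."""
    load_calls.append(database_id)
    return FAKE_DATABASE[database_id]


class PartProxy(object):
    def __init__(self, database_id):
        self._info = None
        self.database_id = database_id

    @property
    def info(self):
        # Fetch on first access only, then reuse the cached result.
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info


proxy = PartProxy(1)
assert load_calls == []            # nothing loaded yet
name = proxy.info["name"]          # first access triggers the load
children = proxy.info["children_ids"]
assert load_calls == [1]           # the second access used the cache
```

Creating 200,000 proxies this way costs almost nothing up front, and you only pay for the items you actually touch.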
--
Steven D'Aprano