Large Data Sets: Use base variables or classes? And some binding questions

Carl Banks pavlovevidence at gmail.com
Fri Sep 26 17:54:36 EDT 2008


On Sep 26, 11:39 am, Patrick  Sullivan <psu... at gmail.com> wrote:
> Hello.

Hi, I have a couple suggestions.


> I will be using some large data sets ("points" from 2 to 12 variables)
> and would like to use one class for each point rather than a list or
> dictionary.

Ok, point of terminology.  It's not really a nit-pick, either, since
it affects some of your questions below.  When you say you want to use
one class for each point, you apparently mean you would like to use
one class instance, or one object, for each point.

One class for each point would be terribly inefficient; one instance,
perhaps not.


> I imagine this is terribly inefficient, but how much?

You say large data sets, which suggests that the __slots__ mechanism
could be useful to you.

class A(object):
    # Only these attribute names are allowed; no per-instance dict.
    __slots__ = ['var1', 'var2', 'var3']

Normally, each class instance has an associated dict which stores the
attributes, but if you define __slots__ then the variables will be
stored in fixed memory locations and no dict will be created.
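To make the difference concrete, here is a minimal sketch.  The class
names PointDict and PointSlots and the attribute names x, y, z are
made up for illustration; they are not from your code.

```python
class PointDict(object):
    """Ordinary class: each instance carries its own attribute dict."""
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class PointSlots(object):
    """Same data, but stored in fixed slots with no per-instance dict."""
    __slots__ = ['x', 'y', 'z']

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


p = PointDict(1.0, 2.0, 3.0)
q = PointSlots(1.0, 2.0, 3.0)

has_dict_p = hasattr(p, '__dict__')  # ordinary instance has a dict
has_dict_q = hasattr(q, '__dict__')  # slotted instance does not

# A slotted instance rejects attribute names that are not in __slots__.
try:
    q.w = 4.0
    slot_rejected_new_attr = False
except AttributeError:
    slot_rejected_new_attr = True
```

The trade-off is that a slotted object cannot grow new attributes
later, which should be fine for points that never change after
creation.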

However, it seems from the rest of your comments that speed is your
main concern.  The last time someone reported on this, __slots__
didn't make a big difference in attribute access time, but it would
probably speed up object creation a bit.  Of course, you should
profile it to make sure.
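One way to profile it is with the timeit module.  This is only a
sketch: the classes Plain and Slotted and the repetition count N are
placeholders, and the numbers will vary with machine and Python
version, so no result is claimed here.

```python
import timeit

# Setup code runs once per measurement and is not included in the timing.
setup = """
class Plain(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Slotted(object):
    __slots__ = ['a', 'b', 'c']
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

plain = Plain(1, 2, 3)
slotted = Slotted(1, 2, 3)
"""

N = 100000  # repetitions per measurement (arbitrary choice)

results = {
    'create plain':   timeit.timeit('Plain(1, 2, 3)', setup=setup, number=N),
    'create slotted': timeit.timeit('Slotted(1, 2, 3)', setup=setup, number=N),
    'access plain':   timeit.timeit('plain.a', setup=setup, number=N),
    'access slotted': timeit.timeit('slotted.a', setup=setup, number=N),
}

for label in sorted(results):
    print('%-16s %.4f s' % (label, results[label]))
```

Run it a few times and with your real attribute count before drawing
conclusions.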


> What is the cost of creating a new class?

I'm assuming you want to know the cost of creating a class instance.
Generally speaking, the main cost of this is that you'd be executing
Python code (whereas list and dict are written in C).
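If you want to see how much that costs on your machine, a rough
comparison could look like the following.  The Point class, the
variable names, and N are assumptions for the sake of the example; no
particular outcome is implied.

```python
import timeit

setup = """
class Point(object):
    def __init__(self, x, y, z):   # this __init__ is executed as Python code
        self.x = x
        self.y = y
        self.z = z

x, y, z = 1.0, 2.0, 3.0
"""

N = 100000  # repetitions per measurement (arbitrary choice)

# Build the same three values as a list, a dict, and a class instance.
creation_times = {
    'list':     timeit.timeit('[x, y, z]', setup=setup, number=N),
    'dict':     timeit.timeit('{"x": x, "y": y, "z": z}', setup=setup, number=N),
    'instance': timeit.timeit('Point(x, y, z)', setup=setup, number=N),
}

for kind in sorted(creation_times):
    print('%-8s %.4f s' % (kind, creation_times[kind]))
```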


> What is the cost of referencing a class variable?

I assume you mean an instance variable.


> What is the cost of calling a class method to just return a variable?

Significant penalty.

Even if the method call itself were cheap (and I very much doubt that
it is), the method still has to access the variable, which takes the
same amount of time as accessing the variable directly.  I.e., you
pay the overhead of a method call to do the same thing you could have
done directly.

I strongly recommend against doing this, not only because it's less
efficient, but also because it's considered bad style in Python.
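You can measure the overhead yourself with something like this.  It
is a sketch only: Point, get_x, and N are hypothetical names chosen
for the example, and the timings depend on your setup.

```python
import timeit

setup = """
class Point(object):
    def __init__(self, x):
        self.x = x

    def get_x(self):
        # Trivial getter: does nothing except return the attribute.
        return self.x

p = Point(1.5)
"""

N = 200000  # repetitions per measurement (arbitrary choice)

direct = timeit.timeit('p.x', setup=setup, number=N)
via_method = timeit.timeit('p.get_x()', setup=setup, number=N)

print('direct attribute: %.4f s' % direct)
print('getter method:    %.4f s' % via_method)
```

The getter does the same attribute lookup as the direct access, plus
the lookup and call of the method itself.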


> Key point: The point objects, once created, and essentially non-
> mutable. Static. Is there a way to "bind" a variable to a object
> method in a way that is more efficient than the function calling
> self.variable_name ?

Python 2.6 has a new object type called namedtuple in the collections
module.  (Actually it's a type factory that creates a subclass of
tuple with attribute names mapped to the indices.)  This might be a
perfect fit for your needs.  You have to upgrade to 2.6, though, which
won't be released for a few days.
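A short sketch of how that could look, assuming a three-variable
point (the type name Point and the field names x, y, z are invented
for the example):

```python
from collections import namedtuple

# namedtuple is a factory: it returns a new tuple subclass.
Point = namedtuple('Point', ['x', 'y', 'z'])

p = Point(1.0, 2.0, 3.0)

by_name = p.x      # attribute access, no getter method needed
by_index = p[0]    # still an ordinary tuple underneath
x, y, z = p        # unpacking works as with any tuple

# Instances are immutable, like tuples.
try:
    p.x = 9.0
    mutable = True
except AttributeError:
    mutable = False

# _replace builds a new instance with some fields changed.
moved = p._replace(z=0.0)
```

Since the points are essentially static once created, the
immutability is a feature here rather than a limitation.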


Carl Banks



