Large Data Sets: Use base variables or classes? And some binding questions

malkarouri malkarouri at gmail.com
Fri Sep 26 22:21:24 CEST 2008


On 26 Sep, 16:39, Patrick  Sullivan <psu... at gmail.com> wrote:
> Hello.
>
> I will be using some large data sets ("points" from 2 to 12 variables)
> and would like to use one class for each point rather than a list or
> dictionary. I imagine this is terribly inefficient, but how much?

I can't really get into details here, but I would suggest that you go
ahead and try first. As you know, premature optimization is the root
of all evil.

General points I would suggest:

- Use NumPy/SciPy (http://www.scipy.org). You will get more
efficiency with less effort than if you use plain Python lists, and it
is much easier to optimize later.
- Your questions about referencing classes and variables tell me that
perhaps you are coming from a C background, or Java maybe? Anyway,
as far as I know, it is not standard practice to write a class method
(you meant a normal bound method, right?) just to access a variable.
Use a normal attribute, and if it needs to behave like a method later,
turn it into a property.
- Is the efficiency you are looking for in terms of time or memory?
That difference sometimes leads to different optimization tricks.
- Using NumPy probably gives you another advantage: some efficiency
in the data representation, as a NumPy array stores data, say
integers, without memory overhead per member (point). Just an
array of integers. Of course there is an additional constant amount of
memory per array, which is independent of the number of elements
(points) you are storing.
- Generally try to think in terms of arrays of data rather than single
points. If it helps, think in terms of matrices. That is more or less
the design of Matlab, and Numpy is more or less similar.
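
To make the array-oriented points above concrete, here is a minimal
sketch. It assumes points with three variables and random data; the
names Point, data, n_points and n_vars are made up for illustration and
are not from the original question. It compares one object per point
with a single 2-D NumPy array holding all points.

```python
import math
import numpy as np


class Point:
    """Hypothetical per-point class: one Python object per data point."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


n_points = 10_000  # illustrative size only
n_vars = 3         # the question mentions 2 to 12 variables per point

rng = np.random.default_rng(0)

# One row per point, one column per variable, stored as float64 values.
data = rng.random((n_points, n_vars))

# The same data as a list of Point objects, for comparison.
points = [Point(*row) for row in data.tolist()]

# The array's data buffer holds exactly itemsize bytes per value;
# there is no extra per-point overhead inside the buffer.
buffer_bytes = data.nbytes
assert buffer_bytes == n_points * n_vars * data.itemsize

# Whole-array operation: distance of every point from the origin.
norms_vectorized = np.sqrt((data ** 2).sum(axis=1))

# Equivalent per-object loop over the Point instances.
norms_loop = [math.sqrt(p.x ** 2 + p.y ** 2 + p.z ** 2) for p in points]

# Per-variable statistics are a single call along axis 0.
column_means = data.mean(axis=0)
```

Both computations give the same distances. How much faster or smaller
the array version is depends on your data and machine, so it is worth
measuring (for example with the timeit module) on your real problem.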

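The attribute-versus-property point can be sketched as follows. The
class names and the float conversion are assumptions chosen only for
illustration. The idea is that callers write p.x in both versions, so
you can start with a plain attribute and switch to a property later
without changing the calling code.

```python
class Point:
    """First version: plain attributes, no accessor methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class CheckedPoint:
    """Later version: x became a property, callers still use p.x."""

    def __init__(self, x, y):
        self.x = x  # goes through the property setter below
        self.y = y

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        # Hypothetical extra logic added later: coerce to float.
        self._x = float(value)


p = Point(1, 2)
q = CheckedPoint(1, 2)

# Identical calling code for both classes.
p.x = 5
q.x = "3.5"
```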

Now, if you describe your problem further, I am sure that you will get
better advice from the community here. Don't focus on the details;
the bigger picture will probably help more. Are you working in
graphics? Image processing? Machine learning, statistics, data mining,
etc.?

--
Muhammad Alkarouri


