Using Python for processing of large datasets (convincing management)

Cameron Laird claird at starbase.neosoft.com
Mon Jul 8 00:12:52 EDT 2002


In article <7xwus7n5q4.fsf at ruckus.brouhaha.com>,
Paul Rubin  <phr-n2002b at NOSPAMnightsong.com> wrote:
>Thomas Jensen <spam at ob_scure.dk> writes:
>> I gotta look into this.
>> However I am uncertain as how to structure my program.
>> One of the tasks of the program will be to calculate the standard
>> deviation of rows of daily values (which are the result of another
>> calculation, etc). I was planning on using lists and tuples like this:
>>      [(date, value), (date, value), ...]
>> How well will this perform, I wonder? Since lists and tuples are Python
>> structures, won't they still be "slow" to traverse?
>
>Lists and tuples are fast to traverse (they're just vectors in memory)
>but I don't see their relevance if by "rows" you mean database rows.
>You have to iterate over the rows and compute the SD.  I expect the
>time for that will be mostly taken by database operations.
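For reference, the per-row arithmetic Thomas describes is small. A minimal sketch in pure Python, assuming a `rows` list in the `[(date, value), ...]` shape he mentions and the population (not sample) formula:

```python
import math

# Hypothetical rows in the shape described: [(date, value), ...]
rows = [("2002-07-01", 10.0), ("2002-07-02", 12.0), ("2002-07-03", 11.0)]

values = [v for _, v in rows]
mean = sum(values) / len(values)

# Population standard deviation: sqrt of the mean squared deviation.
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
```

Traversing a list like this is cheap next to the cost of fetching the rows from the database in the first place.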

Me, too.  While I know quite well how difficult
it is to describe any program that's worth
writing, what we've heard of this one puzzles me.
I'll summarize by saying simply that I'm with
Paul:  I *strongly* suspect that database
operations swamp arithmetic operations in elapsed
time, and that attention to the former will be
most rewarding.

You've mentioned once already that you might do
more with your SQL.  I can imagine that much the
greatest returns in performance will come from
writing more of your algorithms in SQL.  That's
likely to be a more scalable and satisfying
approach than the multi-processing complexities
at which you've hinted.
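As one sketch of what "more of your algorithms in SQL" could look like: the standard deviation can be computed inside the database in a single pass.  The table and column names below are assumptions, and SQLite (which lacks a built-in STDDEV aggregate) stands in for whatever database is actually in use; the population SD falls out of AVG(v*v) and AVG(v):

```python
import math
import sqlite3

# Hypothetical table of daily values; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [("2002-07-01", 10.0), ("2002-07-02", 12.0), ("2002-07-03", 11.0)],
)

# One query does the heavy lifting; only two scalars cross into Python.
mean_sq, sq_mean = conn.execute(
    "SELECT AVG(value * value), AVG(value) FROM daily"
).fetchone()

# Population SD = sqrt(E[v^2] - E[v]^2)
sd = math.sqrt(mean_sq - sq_mean * sq_mean)
```

The point is less the particular identity than the shape of the solution: the database scans the rows once and hands back aggregates, instead of shipping every row to Python to be iterated over.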
-- 

Cameron Laird <Cameron at Lairds.com>
Business:  http://www.Phaseit.net
Personal:  http://starbase.neosoft.com/~claird/home.html
