[Baypiggies] Discussion for newbies/beginner night talks - test results

Dennis Reinhardt DennisR at dair.com
Thu Feb 15 21:22:43 CET 2007


At 11:21 PM 2/14/2007, Drew Perttula wrote:
>Nope, the square brackets are being used in a novel way that creates stan 
>structures,

Summary
-------
Under extreme conditions, appending many string fragments is most efficient 
with a dict, in both time and memory.  Text append gives a smaller memory 
footprint than list append.  Under realistic conditions, there is little 
reason to prefer any of text, list, or dict over the others.

Discussion
----------
This discussion and Stan got me thinking.  Today I want to tackle code 
which generates a graphical directory display of icons.  It is possible for 
people to have directories of thousands (maybe even tens of thousands) of 
icons.  Efficiency may matter.

The conventional wisdom, IIRC, is to do list appends rather than string 
appends.  I wrote a program to test this, measuring execution time and 
final memory footprint under Windows.  Here is the code:

import time

def holding():
    # Uncomment exactly one initialization and one matching append line;
    # the dict variant is the one active as shown.
    #l = ""                      # "text" row
    #l = [""]                    # "list+=" and "list_append" rows
    l = {}                       # "dict" row
    for x in range(0, 100000):
        #l += "abcdef"           # "text" row
        #l += ["abcdef"]         # "list+=" row
        #l.append("abcdef")      # "list_append" row
        l[x] = "abcdef"          # "dict" row

time.sleep(1)
print "enter holding"
for a in range(0, 10):
    start = time.time()
    for b in range(0, 10):       # 10 calls to holding() per timed block
        holding()
    print "%3.5f" % (time.time() - start)
    time.sleep(1)
print "exit holding"
time.sleep(2000)                 # keep the process alive to observe memory

The code uses "range" because I want to run it under 2.3, although all the 
results reported here are for 2.5.  Here are the results:

TIME, seconds (10 calls to holding() each)

              100K_iter_1_str  500K_iter_1_str  100K_iter_5_str  1K_iter_5_str
text          0.422            15.656           14.937           0.016
list+=        0.438            2.2              0.421            0.016
list_append   0.219            1.188            0.219            0.016
dict          0.203            1.031            0.204            0

MEM, MB at "exit holding"

              100K_iter_1_str  500K_iter_1_str  100K_iter_5_str  1K_iter_5_str
text          6.196            10.356           5.44             3.548
list+=        7.196            11.92            7.192            3.56
list_append   7.192            11.92            7.196            3.56
dict          5.66             9.548            5.656            3.588

The TIME results are in seconds for 10 calls to holding(), as should be 
clear from the code.  The best- and worst-case timings among the 10 runs 
did not vary much (usually only in the second decimal place), so for 
simplicity I report the worst-case timing.

The MEM results are in megabytes of memory at the end of the run (at "exit 
holding").  At the start of the run, before entering holding(), memory used 
was 3.5 MB.  Clearly, there is systematic growth in memory.  The final 
time.sleep(2000) is there to allow time for garbage collection to reduce 
memory usage.  In one run I waited 10 minutes and saw no change in the 
memory used.  The memory figures fluctuated far more from run to run than 
the timings did.

My expectation for the memory results was that all memory internal to 
holding() would be released, because all of its variables are local.  That 
expectation does not appear to be met.
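
One way to check whether the retained memory is actual uncollected garbage, 
rather than memory the interpreter is simply holding on to after freeing it, 
would be to force a collection explicitly just before the final sleep.  The 
runs above did not do this; the snippet below is only a sketch of the check:

import gc

# Force a full collection; gc.collect() returns the number of unreachable
# objects it found.  A large count would point to genuine garbage awaiting
# collection, while zero (or near zero) would suggest the memory has already
# been freed inside the interpreter but not returned to the OS.
print "gc.collect() found %d unreachable objects" % gc.collect()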

Across the columns, "100K_iter" means the

             for x in range(0, 100000):

statement had an upper limit of 100000 (as shown), while "500K_iter" means 
the upper limit is 5 times greater, and so on.  "1_str" means the string 
used is "abcdef" (as shown), while "5_str" means 
"abcdefabcdefabcdefabcdefabcdef" was used instead.

For the rows, "text" means that the

         #l = ""
         #l += "abcdef"

lines were the ones uncommented instead, while "dict" is the variant left 
uncommented in the code as shown above.  The "list+=" and "list_append" 
rows are defined the same way from their respective commented lines.
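
Restated as separate functions, the four row variants look like this.  This 
is a consolidated sketch of the same logic, not the exact code that was 
timed; N and s stand in for the column parameters (e.g. N = 1000 and 
s = "abcdef" * 5 for 1K_iter_5_str), and the final join/return step, which 
turns each container back into one string for output, was not part of the 
timed loop:

def holding_text(N, s):            # row "text"
    l = ""
    for x in range(0, N):
        l += s
    return l

def holding_list_iadd(N, s):       # row "list+="
    l = [""]
    for x in range(0, N):
        l += [s]
    return "".join(l)

def holding_list_append(N, s):     # row "list_append"
    l = [""]
    for x in range(0, N):
        l.append(s)
    return "".join(l)

def holding_dict(N, s):            # row "dict"
    l = {}
    for x in range(0, N):
        l[x] = s
    return "".join([l[x] for x in range(0, N)])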

Evaluation
----------
I am looking at HTML generation.  A 100K HTML file is *huge*.  Doing 100K 
or 500K appends of a 6- or 30-character string is a "stress test" which I 
simply do not expect to reach.  I would like to know what the behavior is, 
but these levels are not the design point.  The final column, showing 1000 
appends of a 30-character string, represents a typical design point 
(roughly 30 KB of output).

At 1K iterations, the time differences do not matter.  I am using a 
dual-core 2 GHz machine for timing.  Even a machine 20x slower would still 
be acceptable for generating a local HTML page.  If I were writing a 
multiuser server, the timing would matter.  In my app, it simply does not 
matter at my design point.

What matters more to me is memory footprint.  The list append tests show a 
larger memory footprint under all conditions tested.  Under realistic test 
conditions there is little difference in memory footprint, and I would 
expect that running more iterations or changing the test conditions could 
affect the rankings.

Under realistic conditions, there is not much difference in either time or 
memory footprint.
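
For context, the kind of generation loop I have in mind for the icon 
directory page would look roughly like the sketch below.  The markup and 
function name are invented for illustration; this is not the actual 
application code.  The list-plus-join form is used only because it is the 
common idiom; at the 1K design point any of the containers tested above 
would do.

import os

def icon_page(directory):
    # Build the page as a list of fragments and join once at the end.
    # A directory of ~1000 icons lands in the "realistic" column above.
    names = os.listdir(directory)
    names.sort()
    parts = ["<html><body>\n"]
    for name in names:
        parts.append('<img src="%s" alt="%s">\n' % (name, name))
    parts.append("</body></html>\n")
    return "".join(parts)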



  ---------------------------------
| Dennis    | DennisR at dair.com    |
| Reinhardt | http://www.dair.com |
  ---------------------------------


