[Baypiggies] Discussion for newbies/beginner night talks - test results

Daniel Yoo dyoo at cs.wpi.edu
Thu Feb 15 22:44:18 CET 2007


> Appending many string fragments is done most efficiently using a dict in 
> both time and memory under extreme conditions.

Hi Dennis,

I am not sure what you mean by this.


Let's make it clear: we have some concept of a template that whose "holes" 
need to be filled in.  There are a few ways to implement the concept of a 
templated string:


* By appending string fragments together.

     def mad_libs(noun, verb):
         return noun + " was " + verb + "ing along when ..."

   Alternatively,

     def mad_libs(noun, verb):
         buffer = []
         buffer.append(noun)
         buffer.append(" was ")
         ... ## you get the idea
         return ''.join(buffer)


* By using string formatting to interpolate a value.  In this case, we
   have two options to the string formatter, to pass along the values
   through a list, or through a dictionary:

     def mad_lib(noun, verb):
         return '%s was %sing along when' % (noun, verb)



I know what you meant to say earlier with,

> Appending many string fragments is done most efficiently using a dict in 
> both time and memory under extreme conditions.

but we have to be careful here: you don't want to compare the act of doing 
string interpolation to do string concatenation.  Rather, you mean to say 
"Filling in a template is done most efficiently using string formatting 
with a dictionary..."

If you like, think of it in terms of conceptual "types": the left hand 
side has to be an intention, a concept, and the right hand side is the 
implementation to make that intention work.



> This discussion and Stan got me thinking.  Today, I want to tackle code 
> which generates a graphical directory display of icons.  It is possible 
> for people to have directories of thousands (maybe even 10s of 
> thousands) of icons.  Efficiency may matter.
>
> The convention wisdom, IIRC, is to do list appends rather than string 
> appends.

In some cases, we do not necessarily need to do any appends at all, list 
or string.  We can generate the output on-the-fly, writing our output to a 
file-like object.

This part has been a bit ignored in the discussion.  Concretely:

###################################
def mad_libs(name, verb, out_file):
     out_file.write(noun)
     out_file.write(" was ")
     ... ## you get the idea
###################################

We do not have to always work with constructing strings.  In fact, in a 
lot of applications, we must not, or else we'll run out of memory.



> I wrote a program to test this, measuring execution time and final 
> memory footprint under windows.  Here is the code:

I'm cutting out the commented lines.

#############################
def holding():
     l = {}
     for x in range(0,100000):
         l[x] = "abcdef"
#############################

Can you explain what this is trying to test?  As far as I can tell, this 
is testing how fast we can build dictionaries, but there's only one string 
literal value involved here.

The problem is this: there's no string concatenation or string building 
going on here at all, so I'm not sure how this relates to previous 
discussion.


I think this might be related to the confusion earlier that dictionaries 
could be directly used to concatenate strings together.  That's not true: 
you need to go through a string-formatting operation too.


I assume that you had several edited versions of "holding", each 
corresponding to a commented line.  You have to compare apples to apples. 
The version working with dictionaries (as well as the one working with 
lists) must make the same end result, the same content as the version that 
I assume looks like:

############################
def holding():
     l = ""
     for x in range(0,100000):
         l += "abcdef"
#############################



To make this easier, we should change the defintions so that these are 
actual functions that can be compared functionally.  For example:

##############################################
def holding_with_lists():
     l = []
     for x in range(0,100000):
         l.append("abcdef")
     return ''.join(l)

def holding_naive():
     l = ""
     for x in range(0,100000):
         l += "abcdef"
     return l

assert holding_with_lists() == holding_naive()
##############################################

Otherwise, without making sure these different implemenations return the 
same result, we risk comparing apples to oranges.  I suspect that's what's 
happened here.


By the way, if we're going to publish benchmarks, we should make it easy 
to replicate those benchmarks.  The things we're comparing should be 
independent functions that can be tested.  Asking people to comment or 
uncomment lines of code to do the measurements is error prone.


Good luck to you.


More information about the Baypiggies mailing list