[Baypiggies] Discussion for newbies/beginner night talks - test results
Daniel Yoo
dyoo at cs.wpi.edu
Thu Feb 15 22:44:18 CET 2007
> Appending many string fragments is done most efficiently using a dict in
> both time and memory under extreme conditions.
Hi Dennis,
I am not sure what you mean by this.
Let's make it clear: we have some concept of a template that whose "holes"
need to be filled in. There are a few ways to implement the concept of a
templated string:
* By appending string fragments together.
def mad_libs(noun, verb):
return noun + " was " + verb + "ing along when ..."
Alternatively,
def mad_libs(noun, verb):
buffer = []
buffer.append(noun)
buffer.append(" was ")
... ## you get the idea
return ''.join(buffer)
* By using string formatting to interpolate a value. In this case, we
have two options to the string formatter, to pass along the values
through a list, or through a dictionary:
def mad_lib(noun, verb):
return '%s was %sing along when' % (noun, verb)
I know what you meant to say earlier with,
> Appending many string fragments is done most efficiently using a dict in
> both time and memory under extreme conditions.
but we have to be careful here: you don't want to compare the act of doing
string interpolation to do string concatenation. Rather, you mean to say
"Filling in a template is done most efficiently using string formatting
with a dictionary..."
If you like, think of it in terms of conceptual "types": the left hand
side has to be an intention, a concept, and the right hand side is the
implementation to make that intention work.
> This discussion and Stan got me thinking. Today, I want to tackle code
> which generates a graphical directory display of icons. It is possible
> for people to have directories of thousands (maybe even 10s of
> thousands) of icons. Efficiency may matter.
>
> The convention wisdom, IIRC, is to do list appends rather than string
> appends.
In some cases, we do not necessarily need to do any appends at all, list
or string. We can generate the output on-the-fly, writing our output to a
file-like object.
This part has been a bit ignored in the discussion. Concretely:
###################################
def mad_libs(name, verb, out_file):
out_file.write(noun)
out_file.write(" was ")
... ## you get the idea
###################################
We do not have to always work with constructing strings. In fact, in a
lot of applications, we must not, or else we'll run out of memory.
> I wrote a program to test this, measuring execution time and final
> memory footprint under windows. Here is the code:
I'm cutting out the commented lines.
#############################
def holding():
l = {}
for x in range(0,100000):
l[x] = "abcdef"
#############################
Can you explain what this is trying to test? As far as I can tell, this
is testing how fast we can build dictionaries, but there's only one string
literal value involved here.
The problem is this: there's no string concatenation or string building
going on here at all, so I'm not sure how this relates to previous
discussion.
I think this might be related to the confusion earlier that dictionaries
could be directly used to concatenate strings together. That's not true:
you need to go through a string-formatting operation too.
I assume that you had several edited versions of "holding", each
corresponding to a commented line. You have to compare apples to apples.
The version working with dictionaries (as well as the one working with
lists) must make the same end result, the same content as the version that
I assume looks like:
############################
def holding():
l = ""
for x in range(0,100000):
l += "abcdef"
#############################
To make this easier, we should change the defintions so that these are
actual functions that can be compared functionally. For example:
##############################################
def holding_with_lists():
l = []
for x in range(0,100000):
l.append("abcdef")
return ''.join(l)
def holding_naive():
l = ""
for x in range(0,100000):
l += "abcdef"
return l
assert holding_with_lists() == holding_naive()
##############################################
Otherwise, without making sure these different implemenations return the
same result, we risk comparing apples to oranges. I suspect that's what's
happened here.
By the way, if we're going to publish benchmarks, we should make it easy
to replicate those benchmarks. The things we're comparing should be
independent functions that can be tested. Asking people to comment or
uncomment lines of code to do the measurements is error prone.
Good luck to you.
More information about the Baypiggies
mailing list