[XML-SIG] HTML<->UTF-8 'codec'?

Thomas B. Passin tpassin@home.com
Sat, 20 Oct 2001 11:20:06 -0400


[Rich Salz]

> I thought StringIO was also a win.
>
It is, as long as it is cStringIO.  I once ran some tests comparing
list.append()/string.join with cStringIO for this business of building a
string character by character.  My post is dated August 23, 2000 - it should
be in the archives.  Here are the results.  Method 1 was str = str + char,
method 2 was list.append() + string.join(list), and method 3 used cStringIO:
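
For readers who want to see the three methods side by side, here is a
minimal sketch.  It is not the code from the original test: the function
names are made up for illustration, it is written for current Python 3
rather than 1.5.2, and io.StringIO stands in for cStringIO (which no
longer exists as a separate module).

```python
import io


def build_concat(chars):
    # Method 1: repeated concatenation, s = s + c
    s = ""
    for c in chars:
        s = s + c
    return s


def build_join(chars):
    # Method 2: append each piece to a list, join once at the end
    parts = []
    for c in chars:
        parts.append(c)
    return "".join(parts)


def build_stringio(chars):
    # Method 3: write into an in-memory buffer
    # (assumption: io.StringIO used here in place of cStringIO)
    buf = io.StringIO()
    for c in chars:
        buf.write(c)
    return buf.getvalue()
```

All three return the same string; they differ only in how the
intermediate result is held while it grows.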

"The results are dramatic.  Method 1) is as good as or better than anything
until the string length exceeds about 1000 bytes.  Then Method 1 starts
slowing down.  Above about 4000 bytes, it's really getting ssslllooowww.
Here is a table of the results on my system - 450 MHz PIII running Win98,
Python 1.5.2.

Rate of generating output string (char/sec)

Length of input    Method 1    Method 2    Method 3
50-1000            3.3e5       1.8e5       2.3e5
1,200              3.2e5       1.8e5       2.6e5
1,500              1.2e5       1.8e5       2.5e5
2,000              1.2e5       2.7e5       2.6e5
4,000              6.1e4       1.8e5       2.6e5
8,000              3.6e4       1.9e5       2.5e5
15,000             1.7e4       1.4e5       2.5e5
30,000             8.2e3       1.8e5       2.7e5
40,000             6.6e3       1.8e5       2.4e5
60,000             4.5e3       2.1e5       2.2e5
100,000            ---         1.8e5       2.4e5
200,000            ---         1.8e5       2.4e5

These figures include some averaging.  The few numbers that are a little
different - like Method 2 at 60,000 char - probably don't mean anything.
Oh, yes, plain StringIO was definitely slower than cStringIO, as you might
think - I don't have any figures, though."

So cStringIO is faster than list.append(), but it's not a giant difference
(roughly 2.4e5 vs 1.8e5 char/sec for the longer strings).  The nice thing is
that the rate stays essentially constant with string size for methods 2 and
3.  I suppose the details would be different for Python 2.1, but I doubt
that the overall picture is much different.
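
To check how the picture looks on another interpreter, something along
these lines could be used to get a char/sec figure for a given length.
This is only a sketch, not the original harness: chars_per_sec and
join_builder are hypothetical names, the averaging over a fixed number
of repeats is an assumption, and the absolute numbers will of course
differ from the 1.5.2 figures above.

```python
import time


def chars_per_sec(build, length, repeats=5):
    # Time `build` on an input of `length` characters, averaged over
    # `repeats` runs (the repeat count is an arbitrary choice).
    data = "x" * length
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        result = build(data)
        total += time.perf_counter() - start
        if len(result) != length:
            raise ValueError("builder returned a string of the wrong length")
    if total <= 0.0:
        return float("inf")  # timer too coarse to measure this run
    return length * repeats / total


def join_builder(chars):
    # Hypothetical builder: the list.append() + join approach (method 2)
    parts = []
    for c in chars:
        parts.append(c)
    return "".join(parts)


if __name__ == "__main__":
    for n in (1000, 4000, 15000, 60000):
        print(n, "%.2g" % chars_per_sec(join_builder, n))
```

Any function that takes the input characters and returns the built
string can be passed as build, so the same loop covers all three methods.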

It was a post by Bjorn Pettersen on the speed of StringIO that got me
started trying this out.

Cheers,

Tom P

> > as the string gets larger the speedup of using a list can be dramatic,
> > even more than an order of magnitude.  Anything more than a few KB for
> > the final string would probably benefit from the list approach -
> > especially when you add to the string a character at a time.