[Python-3000] characters data type

Josiah Carlson jcarlson at uci.edu
Wed May 3 18:45:00 CEST 2006


"Guido van Rossum" <guido at python.org> wrote:
> On 5/2/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > This is one of the reasons why I'm pushing for some string methods on
> > the bytes object.  Even if bytes resize themselves quickly during
> > 'extension', a single allocation with a single pass copy will be far
> > faster.  It probably won't be quite as convenient as "".join() (if there
> > isn't a literal), but keeping the .join method seems to be a winner (if
> > only because it saves people from having to learn a different method for
> > unicode and bytes).
> 
> I wonder if that's really true. After all you still pay the overhead
> for the list. In fact, here's a challenge for you: implement += on
> bytes to be as fast as the list append + later join; or prove that it
> can't be done.

I don't believe it can be done.  See my sample at the end of this
message.  Note that removing the block[:] copy in the list.append
version only reduces the running time by about 0.07 seconds.
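
For readers who want the two strategies side by side, here is a minimal
sketch.  It is not the code I timed; it assumes Python 3 syntax and uses
bytearray as a stand-in for the mutable bytes object under discussion.
The function names are made up for illustration.

```python
# Hypothetical helpers, Python 3 syntax. bytearray stands in for the
# mutable bytes type discussed in this thread (an assumption).
BLOCK = bytes(1024)  # 1024 zero bytes


def build_with_join(n_blocks, block=BLOCK):
    """Collect the pieces in a list, then copy them once into the result."""
    pieces = []
    for _ in range(n_blocks):
        pieces.append(block)
    # join knows every piece's length, so it can size the result up front.
    return b"".join(pieces)


def build_with_extend(n_blocks, block=BLOCK):
    """Grow a mutable buffer in place, one block at a time."""
    buf = bytearray()
    for _ in range(n_blocks):
        buf += block  # the buffer may be reallocated as it grows
    return bytes(buf)


# Both strategies must produce the same data; only the cost differs.
assert build_with_join(64) == build_with_extend(64)
```

Which one is faster depends on the implementation and the platform, so
treat this as a statement of the question rather than an answer to it.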


> Regarding your writing vs. my reading speed: (a) I hope you know the
> quote about "I apologize for this long letter but I don't have the
> time to make it shorter"; (b) I was referring to the discussion
> between you and MvL; that was definitely going too fast for anyone
> else to read it all. It really isn't necessary to do a point-by-point
> reply of everything the other person said. (And I need to heed this
> advice myself too!)

Indeed, I don't need to respond to every part of the message.  However,
not responding to a valid concern or criticism seems to me like a
head-in-the-sand approach to disagreements and discussions, which
certainly doesn't help anyone.


 - Josiah

>>> import time
>>> import array
>>>
>>> block = 1024*'\0'
>>> block2 = array.array("B", 1024*[0])
>>> desired_size = 16*1024*1024
>>>
>>> t = time.clock()
>>> for i in xrange(100):
...     l = []
...     for j in xrange(0, desired_size, len(block)):
...         l.append(block[:])
...     l = ''.join(l)
...
>>> print time.clock()-t
5.13626178421
>>>
>>> t = time.clock()
>>> for i in xrange(100):
...     x = block
...     for i in xrange(14):
...         x += x
...
>>> print time.clock()-t
7.16065713017
>>>
>>> z = time.clock()
>>> for i in xrange(100):
...     x = array.array("B", block2)
...
>>> z = time.clock()-z
>>>
>>> t = time.clock()
>>> for i in xrange(100):
...     x = array.array("B", block2)
...     for i in xrange(14):
...         x.extend(x)
...
>>> print time.clock()-t-z
7.28453746894
>>>
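
The session above is Python 2 (xrange, the print statement) and relies
on time.clock(), which was removed in Python 3.8.  A rough Python 3
sketch of a similar measurement follows.  The names time_call,
join_blocks, double_bytes and grow_bytearray are hypothetical, the
target size is reduced to 1 MiB so that it runs quickly, and bytearray
is assumed as the mutable type.  Timings will differ by machine and
interpreter, so none are quoted here.

```python
import time

BLOCK = bytes(1024)          # 1024 zero bytes
DESIRED_SIZE = 1024 * 1024   # 1 MiB; the session above used 16 MiB


def time_call(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def join_blocks():
    """list.append of each block, then a single join."""
    pieces = []
    for _ in range(0, DESIRED_SIZE, len(BLOCK)):
        pieces.append(BLOCK)
    return b"".join(pieces)


def double_bytes():
    """Repeated += on an immutable object, doubling each time."""
    x = BLOCK
    while len(x) < DESIRED_SIZE:
        x += x
    return x


def grow_bytearray():
    """+= on a mutable buffer, one block at a time."""
    buf = bytearray()
    for _ in range(0, DESIRED_SIZE, len(BLOCK)):
        buf += BLOCK
    return buf


if __name__ == "__main__":
    for func in (join_blocks, double_bytes, grow_bytearray):
        print(func.__name__, time_call(func))
```

Taking the best of several runs is one common way to reduce noise; the
timeit module is another option.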