[Python-3000] Making more effective use of slice objects in Py3k

Josiah Carlson jcarlson at uci.edu
Mon Aug 28 21:49:39 CEST 2006


"Guido van Rossum" <guido at python.org> wrote:
> 
> Josiah (and other supporters of string views),
> 
> You seem to be utterly convinced of the superior performance of your
> proposal without having done any measurements.
> 
> You appear to have a rather naive view on what makes code execute fast
> or slow (e.g. you don't seem to appreciate the savings due to a string
> object header and its data being consecutive in memory).
> 
> Unless you have serious benchmark data (for realistic Python code) I
> can't continue to participate in this discussion, where you have said
> nothing new in many posts.

Put up or shut up, eh?

I have written a simple extension module using Pyrex (my manual C
extension writing is awful).  Here are some sample interactions showing
that string views are indeed quite fast.  In all of these examples, a
naive implementation using only stringview.partition() was able to beat
Python 2.5 str.partition, str.split, and re.finditer.
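For readers new to the idea: a string view is a (base string, start, stop) triple that points into an existing string instead of copying it. The sketch below is a minimal pure-Python 3 illustration of that idea. It is not the Pyrex module attached to this message. The class name StringView and its methods are hypothetical and only mimic str.partition().

```python
class StringView:
    """Hypothetical pure-Python string view: a window [start, stop) over an
    immutable base str. partition() returns new views and copies no characters."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __bool__(self):
        # An empty view is false, so "while x:" loops work as with str.
        return self._stop > self._start

    def __str__(self):
        # The only place where characters are actually copied.
        return self._base[self._start:self._stop]

    def partition(self, sep):
        if not sep:
            raise ValueError("empty separator")
        # str.find() with start/stop searches inside the window but
        # returns an index into the whole base string.
        idx = self._base.find(sep, self._start, self._stop)
        if idx < 0:
            empty = StringView(self._base, self._stop, self._stop)
            return self, empty, empty
        end = idx + len(sep)
        return (StringView(self._base, self._start, idx),
                StringView(self._base, idx, end),
                StringView(self._base, end, self._stop))
```

Each view holds a reference to its base string, so even a tiny view keeps the whole base string alive. That is the usual trade-off of this design.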

Attached you will find the implementation of stringview I used, along
with enough build scripts to get it working with Python 2.3 and
Pyrex 0.9.3.  Aside from replacing int with Py_ssize_t for 2.5, and
*nix users running dos2unix on the files, it should work unchanged
with the most recent Python and Pyrex versions.

 - Josiah


Using Python 2.3:
    >>> x = stringview(40000*' ')
    >>> if 1:
    ...     t = time.time()
    ...     while x:
    ...             _1, _2, x = x.partition(' ')
    ...     print time.time()-t
    ... 
    0.18700003624
    >>> 

Compared with Python 2.5 beta 2:
    >>> x = 40000*' '
    >>> if 1:
    ...     t = time.time()
    ...     while x:
    ...             _1, _2, x = x.partition(' ')
    ...     print time.time()-t
    ...
    0.625
    >>> 
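Why the plain-str loop suffers here: in CPython, each str.partition() call returns the remainder as a new string object. Consuming n characters one separator at a time therefore copies n-1, n-2, ..., 0 characters, about n(n-1)/2 in total. For n = 40000 that is 799,980,000 characters. A view-based partition only adjusts two offsets. The Python 3 sketch below counts the remainder characters such a loop copies. The helper name is hypothetical.

```python
def chars_copied_by_partition_loop(s, sep):
    """Count the characters a str.partition() loop copies into fresh
    remainder strings while consuming s (hypothetical helper; counts only
    the remainders, not the head or separator pieces)."""
    copied = 0
    rest = s
    while rest:
        _head, _sep, rest = rest.partition(sep)
        copied += len(rest)  # in CPython, rest is a newly built str
    return copied


# Closed form for a string of n separators: n * (n - 1) // 2.
print(chars_copied_by_partition_loop(100 * " ", " "))
```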

But that's about as bad for Python 2.5 as it can get.  What about
something more realistic, like a mail file?  My 21.5 MB archive of py3k
contains 3456 messages, and I wanted to split out all of them.

Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from stringview import *
>>> rest = stringview(open('mail', 'rb').read())
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     while rest:
...         cur, found, rest = rest.partition('\r\n.\r\n')
...         x.append(cur)
...     print time.time()-t, len(x)
...
0.0780000686646 3456
>>> 

What about Python 2.5 using split?  That should be fast...

Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     t = time.time()
...     x = rest.split('\r\n.\r\n')
...     print time.time()-t, len(x)
...
0.109999895096 3457
>>> 
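split() reports 3457 pieces while the partition loop reports 3456. Presumably the archive ends with the separator, so split() yields a trailing empty string that the `while rest:` loop never appends. You can get the same no-copy behavior without a view type by tracking offsets with find() and slicing a memoryview only when a piece is needed. Below is a minimal Python 3 sketch over bytes. split_offsets is a hypothetical helper, not part of the attached module.

```python
def split_offsets(data, sep):
    """Yield (start, end) offsets of the pieces of data between occurrences
    of sep, matching the partition loop above (no trailing empty piece when
    data ends with sep). Hypothetical helper for illustration."""
    if not sep:
        raise ValueError("empty separator")
    pos, n = 0, len(data)
    while pos < n:
        idx = data.find(sep, pos)
        if idx < 0:
            yield pos, n
            return
        yield pos, idx
        pos = idx + len(sep)


data = b"msg1\r\n.\r\nmsg2\r\n.\r\n"
view = memoryview(data)  # slicing a memoryview does not copy the bytes
pieces = [view[s:e] for s, e in split_offsets(data, b"\r\n.\r\n")]
print(len(pieces), len(data.split(b"\r\n.\r\n")))
```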

Hrm...what about using re?
>>> import re
>>> pat = re.compile('\r\n\.\r\n')
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
...     x = []
...     t = time.time()
...     for i in pat.finditer(rest):
...         x.append(i)
...     print time.time()-t, len(x)
...
0.125 3456
>>>

Even that's not as good as Python 2.3 + string views.
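Each number above comes from a single time.time() reading of roughly 0.1 s. time.time() can have coarse resolution on some platforms, notably Windows, so repeated runs make a fairer comparison. Below is a minimal Python 3 harness built on the standard timeit module that compares the three str-based approaches on the same bytes. The function names are hypothetical. A stringview-based version could be added to the table in the same way.

```python
import functools
import re
import timeit

SEP = b"\r\n.\r\n"
PAT = re.compile(rb"\r\n\.\r\n")


def by_split(data):
    return data.split(SEP)


def by_finditer(data):
    return list(PAT.finditer(data))


def by_partition(data):
    # Same loop as in the transcripts; each partition copies the remainder.
    pieces = []
    rest = data
    while rest:
        cur, _sep, rest = rest.partition(SEP)
        pieces.append(cur)
    return pieces


def compare(data, number=5, repeat=3):
    """Return the best per-call time in seconds for each approach."""
    results = {}
    for name, fn in (("split", by_split),
                     ("finditer", by_finditer),
                     ("partition", by_partition)):
        times = timeit.repeat(functools.partial(fn, data),
                              number=number, repeat=repeat)
        results[name] = min(times) / number
    return results
```

Taking the minimum of several repeats, as the timeit documentation suggests, reduces noise from other processes.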

-------------- next part --------------
Attachments (scrubbed by the archive; available at the URLs below):
 - stringview_build.py (654 bytes): http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment.obj
 - stringview.pyx (2639 bytes): http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0001.obj
 - stringview_helper.h (1656 bytes): http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0002.obj
 - _setup.py (255 bytes): http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0003.obj


More information about the Python-3000 mailing list