[Python-3000] Making more effective use of slice objects in Py3k
Josiah Carlson
jcarlson at uci.edu
Mon Aug 28 21:49:39 CEST 2006
"Guido van Rossum" <guido at python.org> wrote:
>
> Josiah (and other supporters of string views),
>
> You seem to be utterly convinced of the superior performance of your
> proposal without having done any measurements.
>
> You appear to have a rather naive view on what makes code execute fast
> or slow (e.g. you don't seem to appreciate the savings due to a string
> object header and its data being consecutive in memory).
>
> Unless you have serious benchmark data (for realistic Python code) I
> can't continue to participate in this discussion, where you have said
> nothing new in many posts.
Put up or shut up, eh?
I have written a simple extension module using Pyrex (my manual C
extension writing is awful). Here are some sample interactions showing
that string views are indeed quite fast. In all of these examples, a
naive implementation using only stringview.partition() was able to beat
Python 2.5 str.partition, str.split, and re.finditer.
Attached you will find the implementation of stringview I used, along
with sufficient build scripts to get it working using Python 2.3 and
Pyrex 0.9.3 . Aside from replacing int usage with Py_ssize_t for 2.5,
and *nix users performing a dos2unix call, it should work without change
with the most recent Python and Pyrex versions.
- Josiah
Using 2.3 :
>>> x = stringview(40000*' ')
>>> if 1:
... t = time.time()
... while x:
... _1, _2, x = x.partition(' ')
... print time.time()-t
...
0.18700003624
>>>
Compared with Python 2.5 beta 2
>>> x = 40000*' '
>>> if 1:
... t = time.time()
... while x:
... _1, _2, x = x.partition(' ')
... print time.time()-t
...
0.625
>>>
But that's about as bad for Python 2.5 as it can get. What about
something else? Like a mail file? In my 21.5 meg archive of py3k,
which contains 3456 messages, I wanted to discover all messages.
Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from stringview import *
>>> rest = stringview(open('mail', 'rb').read())
>>> import time
>>> if 1:
... x = []
... t = time.time()
... while rest:
... cur, found, rest = rest.partition('\r\n.\r\n')
... x.append(cur)
... print time.time()-t, len(x)
...
0.0780000686646 3456
>>>
What about Python 2.5 using split? That should be fast...
Python 2.5b2 (r25b2:50512, Jul 11 2006, 10:16:14) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
... t = time.time()
... x = rest.split('\r\n.\r\n')
... print time.time()-t, len(x)
...
0.109999895096 3457
>>>
Hrm...what about using re?
>>> import re
>>> pat = re.compile('\r\n\.\r\n')
>>> rest = open('mail', 'rb').read()
>>> import time
>>> if 1:
... x = []
... t = time.time()
... for i in pat.finditer(rest):
... x.append(i)
... print time.time()-t, len(x)
...
0.125 3456
>>>
Even that's not as good as Python 2.3 + string views.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_build.py
Type: application/octet-stream
Size: 654 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview.pyx
Type: application/octet-stream
Size: 2639 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringview_helper.h
Type: application/octet-stream
Size: 1656 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: _setup.py
Type: application/octet-stream
Size: 255 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060828/916e8238/attachment-0003.obj
More information about the Python-3000
mailing list