[Patches] [ python-Patches-681780 ] Faster commonprefix (OS independent)
SourceForge.net
noreply@sourceforge.net
Tue, 04 Mar 2003 00:01:56 -0800
Patches item #681780, was opened at 2003-02-06 18:03
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470
Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christos Georgiou (tzot)
Assigned to: Nobody/Anonymous (nobody)
Summary: Faster commonprefix (OS independent)
Initial Comment:
This routine is about 20% faster on a test set of 7 sets
of strings run 100000 times each (I can provide the test
if requested). The longer the common prefix is, the
faster the routine becomes relative to original
commonprefix.
My only worry is that it might get rejected if it is
considered too fancy; therefore I wasn't shy on
commenting.
I think we should also write a commonpathprefix, that
will do what commonprefix should do, being in the
*path.py module. I'll do that if none other does.
The provided patch is for posixpath.py and ntpath.py,
but since it's OS neutral it should work as is. It uses
itertools for speed, though, so it is not backportable, but
it can be if requested by substituting map for imap and a
normal slice for islice.
----------------------------------------------------------------------
Comment By: Sebastien Keim (s_keim)
Date: 2003-03-04 09:01
Message:
Logged In: YES
user_id=498191
I would suggest another possibility. This one use a property
of strings
ordering: if you have a<=b<=c and c.startswith(a) then
b.startswith(a).
I have tested two implementations :
# a 5 lines function with a really straightforward code.
# It can degenerate rather badly in the worst case (large
strings
# with a short common prefix) but is generally quite fast.
def commonprefix1(m):
if not m: return ''
prefix, greater = min(m), max(m)
while not greater.startswith(prefix):
prefix = prefix[:-1]
return prefix
# The second use a bissection to avoid the worst case. This make
# the implementation a little more complex but seems to
provide the
# fastest result.
def commonprefix2(m):
prefix = ''
if m:
low, high = min(m), max(m)
while low:
n = len(low)//2 + 1
l, h = low[:n], high[:n]
if h==l:
prefix += l
low, high = low[n:], high[n:]
else:
low, high = l[:-1], h
return prefix
I personally prefer the commonprefix1 implementation: its
the simplest one and it is probably fast enough for the few
commonprefix use-cases (anyway, it is still faster than the
current implementation).
----------------------------------------------------------------------
Comment By: Christos Georgiou (tzot)
Date: 2003-02-07 12:11
Message:
Logged In: YES
user_id=539787
I did my homework better, and found out that the buffer object
quite probably will be deprecated. So I rewrote the routine
without the buffer object (using str.startswith), which by the
way got another 10% speedup (relative to the latest version
using buffer.)
The commonprefix_nobuf.diff patch applies directly to the
original posixpath.py, ntpath.py. I will try to delete the other
patches, but I don't think I am allowed to do it.
----------------------------------------------------------------------
Comment By: Christos Georgiou (tzot)
Date: 2003-02-06 19:02
Message:
Logged In: YES
user_id=539787
Best case: comparing this to the old version with a list:
['/usr/local/lib/python2.3/posixpath.py']*120, 10000 iterations,
the speed difference is:
old: 319.58 sec
new: 34.43 sec
Since prefix_len always grows in the "while next_bit:" loop,
applying commonprefix2.diff to the *patched* version does a
very minor speedup (comparing smaller buffers in every
iteration); but it is only a matter of overoptimisation (ie it does
not hurt, but it's a trivial one, just 0.1%).
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-06 18:25
Message:
Logged In: YES
user_id=33168
As much as I'd like to blame IE, it's a SF bug AFAIK.
http://sf.net/tracker/?func=detail&atid=200001&aid=675910&group_id=1
----------------------------------------------------------------------
Comment By: Christos Georgiou (tzot)
Date: 2003-02-06 18:04
Message:
Logged In: YES
user_id=539787
For some reason, my IE never uploads the file on the first
attempt.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470