[Patches] [ python-Patches-681780 ] Faster commonprefix (OS independent)

SourceForge.net noreply@sourceforge.net
Tue, 04 Mar 2003 00:01:56 -0800


Patches item #681780, was opened at 2003-02-06 18:03
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christos Georgiou (tzot)
Assigned to: Nobody/Anonymous (nobody)
Summary: Faster commonprefix (OS independent)

Initial Comment:
This routine is about 20% faster on a test set of 7 sets 
of strings run 100000 times each (I can provide the test 
if requested).  The longer the common prefix is, the 
faster the routine becomes relative to original 
commonprefix.

My only worry is that it might get rejected if it is 
considered too fancy; therefore I wasn't shy on 
commenting.

I think we should also write a commonpathprefix, that 
will do what commonprefix should do, being in the 
*path.py module.  I'll do that if none other does.

The provided patch is for posixpath.py and ntpath.py, 
but since it's OS neutral it should work as is.  It uses 
itertools for speed, though, so it is not backportable, but 
it can be if requested by substituting map for imap and a 
normal slice for islice.

----------------------------------------------------------------------

Comment By: Sebastien Keim (s_keim)
Date: 2003-03-04 09:01

Message:
Logged In: YES 
user_id=498191

I would suggest another possibility. This one use a property
of strings 
ordering: if you have a<=b<=c and c.startswith(a) then
b.startswith(a).

I have tested two implementations :

# a 5 lines function with a really straightforward  code.
# It can degenerate rather badly in the worst case (large
strings 
# with a short common prefix) but is generally quite fast.
def commonprefix1(m):
    if not m: return ''
    prefix, greater = min(m), max(m)
    while not greater.startswith(prefix):
	prefix = prefix[:-1]
    return prefix

# The second use a bissection to avoid the worst case. This make
# the implementation a little more complex but seems to
provide the
# fastest result.
def commonprefix2(m):
    prefix = ''
    if m:
	low, high = min(m), max(m)
	while low:
	    n = len(low)//2 + 1
	    l, h = low[:n], high[:n]
	    if h==l:
		prefix += l
		low, high = low[n:], high[n:]
	    else:
		low, high = l[:-1], h
    return prefix
      
I personally prefer the commonprefix1 implementation: its
the simplest one and it is probably fast enough for the few
commonprefix use-cases (anyway, it is still faster than the
current implementation).      

----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-02-07 12:11

Message:
Logged In: YES 
user_id=539787

I did my homework better, and found out that the buffer object 
quite probably will be deprecated.  So I rewrote the routine 
without the buffer object (using str.startswith), which by the 
way got another 10% speedup (relative to the latest version 
using buffer.)
The commonprefix_nobuf.diff patch applies directly to the 
original posixpath.py, ntpath.py.  I will try to delete the other 
patches, but I don't think I am allowed to do it.

----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-02-06 19:02

Message:
Logged In: YES 
user_id=539787

Best case: comparing this to the old version with a list: 
['/usr/local/lib/python2.3/posixpath.py']*120, 10000 iterations, 
the speed difference is:
old: 319.58 sec
new: 34.43 sec

Since prefix_len always grows in the "while next_bit:" loop, 
applying commonprefix2.diff to the *patched* version does a 
very minor speedup (comparing smaller buffers in every 
iteration); but it is only a matter of overoptimisation (ie it does 
not hurt, but it's a trivial one, just 0.1%).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-06 18:25

Message:
Logged In: YES 
user_id=33168

As much as I'd like to blame IE, it's a SF bug AFAIK.  
http://sf.net/tracker/?func=detail&atid=200001&aid=675910&group_id=1


----------------------------------------------------------------------

Comment By: Christos Georgiou (tzot)
Date: 2003-02-06 18:04

Message:
Logged In: YES 
user_id=539787

For some reason, my IE never uploads the file on the first 
attempt.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470