Match beginning of two strings
Bengt Richter
bokr at oz.net
Mon Aug 4 15:27:25 EDT 2003
On Mon, 04 Aug 2003 11:56:04 GMT, Alex Martelli <aleax at aleax.it> wrote:
>Ravi wrote:
>
>> Hi,
>>
>> I have about 200GB of data that I need to go through and extract the
>> common first part of a line. Something like this.
>>
>> >>>a = "abcdefghijklmnopqrstuvwxyz"
>> >>>b = "abcdefghijklmnopBHLHT"
>> >>>c = extract(a,b)
>> >>>print c
>> "abcdefghijklmnop"
>>
>> Here I want to extract the common string "abcdefghijklmnop". Basically I
>> need a fast way to do that for any two given strings. For my situation,
>> the common string will always be at the beginning of both strings. I can
>
>Here's my latest study on this:
>
>*** pexa.py:
>
[...]
JFTHOI, if you have the inclination, I'm curious how this slightly
different 2.3-dependent version would fare in your harness on your
system with the rest:
def commonprefix(s1, s2): # very little tested!
    try:
        for i, c in enumerate(s1):
            if c != s2[i]: return s1[:i]
    except IndexError:
        return s1[:i]
    return s1
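
For anyone who wants to sanity-check a function like this outside a timing harness, here is a minimal sketch for current Python rather than 2.3. The name commonprefix_zip and its zip-based loop are assumptions made for illustration, not code from this thread. os.path.commonprefix is the standard-library function; it compares character by character, so it works on arbitrary strings and makes a handy cross-check.

```python
import os.path


def commonprefix_zip(s1, s2):
    """Return the longest common leading substring of s1 and s2."""
    # Hypothetical variant: zip() stops at the shorter string,
    # so no IndexError handling is needed.
    n = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            break
        n += 1
    return s1[:n]


# The two strings from the original question.
a = "abcdefghijklmnopqrstuvwxyz"
b = "abcdefghijklmnopBHLHT"

print(commonprefix_zip(a, b))        # abcdefghijklmnop
print(os.path.commonprefix([a, b]))  # abcdefghijklmnop
```

The edge cases worth trying are an empty string on either side and one string being a complete prefix of the other.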
[...]
>
>and my measurements give me:
>
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa' \
>> 'pexa.extract("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 2.39 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 2.14 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract2("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>10000 loops, best of 3: 30.2 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract3("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 9.59 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract_pyrex("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>10000 loops, best of 3: 21.8 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract_c("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 1.88 usec per loop
>[alex at lancelot exi]$
>
Interesting, but I think I will have to write a filter so I can
see a little more easily what your timeit.py outputs say ;-)
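
A filter along those lines might look like the rough sketch below. It is only an assumption about what would be useful: the names TIMING, CALL and summarize are made up for illustration, and the patterns are fitted to the output format quoted above ("N loops, best of 3: T usec per loop", preceded by a line containing the pexa.<name>( call). The sample text is copied from two of the runs above.

```python
import re

# Matches a timeit result line such as "10000 loops, best of 3: 30.2 usec per loop".
TIMING = re.compile(r"(\d+) loops, best of (\d+): ([\d.]+) (usec|msec|sec) per loop")
# Matches the function being timed, e.g. "pexa.extract_c(".
CALL = re.compile(r"pexa\.(\w+)\(")


def summarize(text):
    """Return (function name, time, unit) tuples in order of appearance."""
    results = []
    name = None
    for line in text.splitlines():
        m = CALL.search(line)
        if m:
            name = m.group(1)  # remember which call the next timing belongs to
            continue
        m = TIMING.search(line)
        if m and name is not None:
            results.append((name, float(m.group(3)), m.group(4)))
            name = None
    return results


SAMPLE = r"""
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract2("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>10000 loops, best of 3: 30.2 usec per loop
>[alex at lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract_c("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 1.88 usec per loop
"""

# Print fastest first; this assumes all timings share one unit, as they do here.
for name, t, unit in sorted(summarize(SAMPLE), key=lambda r: r[1]):
    print(f"{name:15s} {t:8.2f} {unit}")
```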
Regards,
Bengt Richter