Split a string by length

David MacQuigg dmq at gain.com
Thu Mar 25 10:48:02 EST 2004


On Thu, 25 Mar 2004 23:14:17 +1200, David McNab
<david at rebirthing.co.nz> wrote:

>Yermat wrote:
>>>> How can I simulate something like this:
>>>> 'aabbcc'.split(2) -> ['aa', 'bb', 'cc']
>>>> I tried a slice, but that didn't work:
>>>> [item for item in 'aabbcc'[::2]] -> ['a', 'b', 'c']
>>> [...]
>>  >>> import re
>>  >>> re.findall(".{2}","aabbccdd")
>> ['aa', 'bb', 'cc', 'dd']
>>  >>> re.findall(".{2}","aabbccdde")
>> ['aa', 'bb', 'cc', 'dd']
>
>Now, who's gonna benchmark these n different approaches?

I'll bite.

>>> import timeit
>>> def tm( setup, snip): timeit.main(['-s', setup, snip ])

>>> # Use regular expression - Yermat
>>> setup = 'import re'
>>> snip = 're.findall(".{2}","aabbccdd")'
>>> tm( setup, snip )
100000 loops, best of 3: 6.5 usec per loop

>>> # List comprehension - works on non-strings also - Eyal Lotem
>>> setup = '''
... def divide(seq, size):
...     return [seq[i:i+size] for i in xrange(0, len(seq), size)]
... '''
>>> snip = "divide('aabbcc', 2)"
>>> tm( setup, snip )
100000 loops, best of 3: 4.53 usec per loop
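
One thing the timings don't show: the slicing version keeps a short
final chunk, whereas the ".{2}" pattern silently drops it (as in
Yermat's 'aabbccdde' example above). A minimal sketch, assuming a
Python where range is lazy (Python 3); with the interpreter used for
the timings here you would keep xrange:

```python
import re

def divide(seq, size):
    """Split seq into consecutive slices of at most `size` items."""
    # range() is an assumption for newer Pythons; the timed version used xrange().
    return [seq[i:i + size] for i in range(0, len(seq), size)]

print(divide("aabbccdde", 2))           # the trailing 'e' is kept as a short chunk
print(re.findall(".{2}", "aabbccdde"))  # the trailing 'e' is dropped
print(divide([1, 2, 3, 4, 5], 2))       # slicing works on lists as well
```

If the leftover has to survive with the regex approach too, the pattern
".{1,2}" should do it, though "." does not match newlines unless
re.DOTALL is set.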

>>> # Move re.compile outside the loop - David MacQuigg
>>> setup = '''
... import re
... p = re.compile(".{2}")
... '''
>>> snip = 'p.findall("aabbccdd")'
>>> tm( setup, snip )
100000 loops, best of 3: 3.52 usec per loop

Note: If you run timeit.py from the command line, be careful to pass
the entire setup sequence to -s as a single quoted argument.  The -s
option takes only the one argument that immediately follows it.  Every
remaining argument becomes part of the statement being timed, so it
runs inside the test loop.  Compare the result below to the last
result above.

timeit.py "-s" "import re" "f = re.compile('.{2}')" "f.findall('aabbccddee')"
100000 loops, best of 3: 6.43 usec per loop

This is equivalent to compiling the pattern inside the timed loop:

>>> setup = 'import re'
>>> snip = '''
... p = re.compile(".{2}")
... p.findall("aabbccdd")
... '''
>>> tm( setup, snip )
100000 loops, best of 3: 6.45 usec per loop
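
The same comparison can be made without going through timeit.main,
which makes the split between setup and timed statement explicit. A
rough sketch using timeit.Timer; the helper name usec_per_loop and the
loop count are my own choices for illustration, and the numbers will
differ from machine to machine:

```python
import timeit

def usec_per_loop(stmt, setup="pass", number=20000, repeat=3):
    """Return the best-of-`repeat` time per loop, in microseconds."""
    timer = timeit.Timer(stmt=stmt, setup=setup)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6

# Pattern compiled once, in the setup: only findall is timed.
compiled_in_setup = usec_per_loop(
    stmt='p.findall("aabbccdd")',
    setup='import re\np = re.compile(".{2}")',
)

# Pattern compiled inside the statement: re.compile runs on every pass.
compiled_in_loop = usec_per_loop(
    stmt='p = re.compile(".{2}")\np.findall("aabbccdd")',
    setup='import re',
)

print("compile in setup: %.2f usec per loop" % compiled_in_setup)
print("compile in loop:  %.2f usec per loop" % compiled_in_loop)
```

Anything placed in setup runs once per timing run, while anything in
stmt runs on every pass, which is the same distinction the command
line draws between the -s argument and the remaining arguments.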

-- Dave



