Splitting a string

Thomas Heller theller at ctypes.org
Fri Apr 2 16:06:52 EDT 2010


Patrick Maupin wrote:
> On Apr 2, 6:24 am, Peter Otten <__pete... at web.de> wrote:
>> Thomas Heller wrote:
>> > Maybe I'm just lazy, but what is the fastest way to convert a string
>> > into a tuple containing character sequences and integer numbers, like
>> > this:
>>
>> > 'si_pos_99_rep_1_0.ita'  -> ('si_pos_', 99, '_rep_', 1, '_', 0, '.ita')
>> >>> import re
>> >>> parts = re.compile(r"([+-]?\d+)").split('si_pos_99_rep_1_0.ita')
>> >>> parts[1::2] = map(int, parts[1::2])
>> >>> parts
>>
>> ['si_pos_', 99, '_rep_', 1, '_', 0, '.ita']
>>
>> Peter
> 
> You beat me to it.  re.split() seems underappreciated for some
> reason.  When I first started using it (even though it was faster
> than the alternatives for my tasks), I was really annoyed by the
> empty strings it returned between matches.  Only within the past
> couple of years have I come to appreciate the elegant solutions
> those empty strings allow.  In short, for a large class of problems,
> re.split() is by far the fastest and most elegant way to use the re
> module.
> 
> The only thing I would add to this solution is that, with this
> particular regular expression, if the source string starts or ends
> with digits, the resulting list will have an empty string at the
> beginning or end.  If that is a problem, you will want to check for
> and discard those.
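A minimal sketch of the whole recipe as one function, assuming Python 3 and that the caller wants the tuple asked for in the original question. The name `split_numbers` is hypothetical, not from the thread. It trims only the empty strings that re.split() leaves at the two ends.

```python
import re

# Same pattern as Peter's: the capturing group makes re.split() keep the
# matched numbers (with an optional sign) in the result list.
NUMBER = re.compile(r"([+-]?\d+)")

def split_numbers(s):
    """Split s into text runs and ints, e.g. 'a1b' -> ('a', 1, 'b').

    Hypothetical helper for illustration only.
    """
    parts = NUMBER.split(s)
    # With exactly one capturing group, the numbers sit at the odd indices.
    parts[1::2] = map(int, parts[1::2])
    # A string that starts or ends with a number yields '' at that end.
    if parts and parts[0] == "":
        del parts[0]
    if parts and parts[-1] == "":
        del parts[-1]
    return tuple(parts)
```

For example, `split_numbers('12abc3')` gives `(12, 'abc', 3)`. An empty string between two adjacent numbers (as in `'1-2'`, which gives `(1, '', -2)`) is deliberately kept, so text and numbers still alternate.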

Thanks to all for these code snippets.  Peter's solution is the winner:
it is the most elegant and also the fastest.  With an additional list
comprehension to remove the possible empty strings at the start and at
the end, I get 16 us.  Interestingly, Xavier's solution (which is similar
to some code that I wrote myself) isn't much slower; I get timings of
around 22 us for it.
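The post does not show the list comprehension or the timing code. Below is a minimal sketch of one way to get comparable per-call numbers, assuming the standard `timeit` module. The helper names are hypothetical, and this comprehension drops every empty string, not only those at the ends. Absolute timings depend on the machine and Python version, so they will not reproduce the figures above.

```python
import re
import timeit

NUMBER = re.compile(r"([+-]?\d+)")

def split_numbers_lc(s):
    # Hypothetical variant: convert the odd-index numbers, then use a list
    # comprehension to drop empty strings (an int never equals "").
    parts = NUMBER.split(s)
    parts[1::2] = map(int, parts[1::2])
    return [p for p in parts if p != ""]

def time_per_call_us(func, arg, number=10_000):
    """Return the best average time per call of func(arg), in microseconds."""
    # repeat() runs the timing loop several times; taking the minimum is the
    # usual choice because the fastest run is the least disturbed by other load.
    runs = timeit.repeat(lambda: func(arg), number=number, repeat=5)
    return min(runs) / number * 1e6

if __name__ == "__main__":
    s = "si_pos_99_rep_1_0.ita"
    print(split_numbers_lc(s))
    print(f"{time_per_call_us(split_numbers_lc, s):.2f} us per call")
```

Because this comprehension also removes an empty string between adjacent numbers, text and numbers no longer strictly alternate in such inputs; trim only the ends if that alternation matters.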

-- 
Thanks,
Thomas



