A tough one: split on word length?
Laurent Pointal
laurent.pointal at free.fr
Mon May 16 13:13:35 EDT 2016
DFS wrote:
> Have:
> '584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016
> 13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016
> 13:47:28 -0400'
>
> Want:
> [('584323', 'Fri 13 May 2016 17:37:01 -0000 (UTC)'),
> ('584324', 'Fri 13 May 2016 13:44:40 -0400'),
> ('584325', '13 May 2016 17:45:25 GMT'),
> ('584326', 'Fri 13 May 2016 13:47:28 -0400')]
>
>
> Or maybe split() on space, then run through and add words of 6+ numbers
> to the list, then recombine everything until you hit the next group of
> 6+ numbers, and so on?
>
> The data is guaranteed to contain those 6+ groups of numbers.
Tested with a regexp under Python 3:
>>> import re
>>> s = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
...      '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
...      '13:47:28 -0400')
>>> re.split(r"(\d{6})(.*?)", s)
['', '584323', '', ' Fri 13 May 2016 17:37:01 -0000 (UTC) ', '584324', '',
 ' Fri 13 May 2016 13:44:40 -0400 ', '584325', '',
 ' 13 May 2016 17:45:25 GMT ', '584326', '',
 ' Fri 13 May 2016 13:47:28 -0400']
Drop the empty items, strip the leading and trailing whitespace from each
remaining string, pair each number with the date that follows it, and
that's done.
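
That cleanup step could look something like the following. It is only a
minimal sketch: the names parts, items and pairs are illustrative, and it
assumes the numbers are exactly six digits long, as in the sample data.

```python
import re

s = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
     '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
     '13:47:28 -0400')

# Same split as in the session above.
parts = re.split(r"(\d{6})(.*?)", s)

# Strip surrounding whitespace and drop the items that end up empty.
items = [p.strip() for p in parts if p.strip()]

# items now alternates number, date, number, date, ...
# so the even-indexed and odd-indexed slices can be zipped into pairs.
pairs = list(zip(items[0::2], items[1::2]))

for number, date in pairs:
    print(number, '|', date)
```

The zip over the two slices relies on every number being followed by a
non-empty date; if a date could be missing, the pairing would need an
explicit check.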
A+
Laurent.
Note: regexp experts will probably provide a cleaner solution.
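
One possible tidier variant, offered only as a sketch, uses re.findall with
a lookahead so the pairs come out directly. It assumes the text sits on a
single line (no embedded newlines) and that no date field itself contains a
run of six or more digits; \d{6,} is used here to follow the "6+" wording of
the question rather than the \d{6} above.

```python
import re

s = ('584323 Fri 13 May 2016 17:37:01 -0000 (UTC) 584324 Fri 13 May 2016 '
     '13:44:40 -0400 584325 13 May 2016 17:45:25 GMT 584326 Fri 13 May 2016 '
     '13:47:28 -0400')

# Group 1: the run of 6+ digits.
# Group 2: the text after it, matched lazily, stopping just before the next
# run of 6+ digits or the end of the string (checked by the lookahead).
pattern = r"(\d{6,})\s+(.*?)\s*(?=\d{6,}|$)"

# With two groups in the pattern, findall returns a list of 2-tuples.
pairs = re.findall(pattern, s)

for number, date in pairs:
    print(number, '|', date)
```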