string split without consumption
robert
no-spam at not-existing.invalid
Sat Feb 2 11:16:22 EST 2008
Tim Chase wrote:
>>>> this didn't work elegantly as expected:
>>>>
>>>> >>> ss
>>>> 'owi\nweoifj\nfheu\n'
>>>> >>> re.split(r'(?m)$',ss)
>>>> ['owi\nweoifj\nfheu\n']
>>> Do you have a need to use a regexp?
>> I'd like the general case - split without consumption.
>
> I'm not sure there's a one-pass regex solution to the problem
> using Python's regex engine. If pre-processing was allowed, one
> could do it.
>
I only got partway there with the inverse logic, using findall:
>>> re.findall(r'(?s).*?(?:\n|$)','owi\nweoifj\nfheu\nxx')
['owi\n', 'weoifj\n', 'fheu\n', 'xx', '']
>>> re.findall(r'(?s).*?(?:\n|$)','owi\nweoifj\nfheu\n')
['owi\n', 'weoifj\n', 'fheu\n', '']
>>>
but it's also wrong for a partial last line: after 'xx' it still yields a spurious empty string.
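For comparison, here is one way to avoid the spurious empty match with findall. The pattern accepts either a newline-terminated line or a non-empty unterminated remainder, and neither alternative can match the empty string. This is a minimal sketch, and the helper name split_keep_ends is hypothetical:

```python
import re

# Either a (possibly empty) line terminated by "\n", or a non-empty
# unterminated remainder. Neither alternative can match the empty
# string, so no spurious '' appears at the end of the result.
_LINE = re.compile(r"[^\n]*\n|[^\n]+")

def split_keep_ends(s):
    """Hypothetical helper: split s into lines, keeping each '\\n'."""
    return _LINE.findall(s)

print(split_keep_ends("owi\nweoifj\nfheu\nxx"))
# -> ['owi\n', 'weoifj\n', 'fheu\n', 'xx']
print(split_keep_ends("owi\nweoifj\nfheu\n"))
# -> ['owi\n', 'weoifj\n', 'fheu\n']
```

A partial last line is still recognizable because it lacks the trailing '\n'. However, an empty open part is not represented at all, unlike with ss.split('\n').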
re.split evidently ignores empty matches such as \A, \Z, ^, $, \b, etc.:
>>> re.split(r'\b(?=\n)','owi\nweoifj\nfheu\n\nxx')
['owi\nweoifj\nfheu\n\nxx']
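The session above reflects Python versions of that time. Since Python 3.7, re.split() does split on patterns that can match the empty string, so a zero-width lookbehind for '\n' gives a split without consumption. In that case, the last element is always the open remainder (possibly ''), just as with ss.split('\n'). The following is a sketch that assumes Python 3.7 or later. The function name split_after_newlines is hypothetical:

```python
import re
import sys

# re.split() only splits on empty matches from Python 3.7 onwards;
# older versions returned the input unsplit, as in the session above.
assert sys.version_info >= (3, 7), "needs Python 3.7+"

def split_after_newlines(s):
    """Split s after every '\\n' without consuming it (hypothetical name)."""
    # (?<=\n) is zero-width: it matches just after each newline.
    return re.split(r"(?<=\n)", s)

print(split_after_newlines("owi\nweoifj\nfheu\n"))
# -> ['owi\n', 'weoifj\n', 'fheu\n', '']
print(split_after_newlines("owi\nweoifj\nfheu\n\nxx"))
# -> ['owi\n', 'weoifj\n', 'fheu\n', '\n', 'xx']
```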
>>> >>> ss.splitlines(True)
>>> ['owi\n', 'weoifj\n', 'fheu\n']
>>>
>> Thanks. Yet this doesn't fit "naturally" and consistently into my
>> line-processing algorithm (the further buffering). Compare, e.g.,
>> ss.split('\n').
>
> well, one can do
>
> >>> [line + '\n' for line in ss.splitlines()]
> ['owi\n', 'weoifj\n', 'fheu\n']
> >>> [line + '\n' for line in (ss+'xxx').splitlines()]
> ['owi\n', 'weoifj\n', 'fheu\n', 'xxx\n']
>
> as another try for your edge case. It's understandable and
> natural-looking.
>
Nice for some display purposes, but "wrong" as general logic. In the
general case 'xxx' is not a complete line. It's an (open) partial line
and should appear as such.
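The further buffering in a line-processing loop can be sketched as a generator. It emits only complete lines and carries the open part over to the next chunk. At end of input, any open part is yielded without a newline, so the caller can still tell it apart. This is a minimal sketch. The name iter_lines and the chunked input are illustrative assumptions, not code from this thread:

```python
from typing import Iterable, Iterator

def iter_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete '\\n'-terminated lines from a stream of text chunks.

    The unterminated (open) remainder is buffered until more data arrives;
    if the input ends mid-line, it is yielded last, without a '\\n'.
    """
    pending = ""
    for chunk in chunks:
        # str.split('\n') always returns the open part as the last element
        # ('' when the buffer ends exactly on a newline).
        *complete, pending = (pending + chunk).split("\n")
        for line in complete:
            yield line + "\n"
    if pending:
        yield pending

print(list(iter_lines(["owi\nwe", "oifj\nfheu\n", "xx"])))
# -> ['owi\n', 'weoifj\n', 'fheu\n', 'xx']
```

Concatenating pending + chunk keeps the sketch short. However, if a single line spans many chunks, the unfinished text is copied repeatedly, and a list-based buffer would avoid that.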
Robert