Regular expression bug?
Albert Hopkins
marduk at letterboxes.org
Thu Feb 19 14:26:46 EST 2009
On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
That's how re.split works, same as str.split...
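To make that concrete, here is a small sketch (the variable names are only for illustration). The last call shows that wrapping the pattern in a capturing group makes re.split hand the matched text back as separate items, which still isn't the split you are after:

```python
import re

# str.split drops the separator it matches on.
plain_parts = 'foo,bar,baz'.split(',')

# re.split likewise drops whatever text the pattern matched.
regex_parts = re.split('[a-z][A-Z]', 'fooBarBaz')

# With a capturing group, the matched text is returned as separate
# items between the pieces instead of being discarded.
kept_parts = re.split('([a-z][A-Z])', 'fooBarBaz')

print(plain_parts)
print(regex_parts)
print(kept_parts)
```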
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
Wow!
To tell you the truth, I can't even read that... but one wonders why
you don't just do
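import string

def ccsplit(s):
    """Split a camelCase string into its component words."""
    cclist = []
    current_word = ''
    for char in s:
        if char in string.ascii_uppercase:
            # An uppercase letter starts a new word.
            if current_word:
                cclist.append(current_word)
            current_word = char
        else:
            current_word += char
    # Don't forget the final word.
    if current_word:
        cclist.append(current_word)
    return cclist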
>>> ccsplit('fooBarBaz')
--> ['foo', 'Bar', 'Baz']
This is arguably *much* easier to read than the re example, and it
doesn't require one to look ahead in the string.
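If you would still rather use re, one possible workaround is to let re.sub insert a separator at the zero-width boundary and then split on it. This is only a sketch: the name ccsplit_re is made up for illustration, and it assumes the input contains no whitespace of its own.

```python
import re

def ccsplit_re(s):
    # Insert a space at every position that has a lowercase letter
    # behind it and an uppercase letter ahead of it; the lookarounds
    # match a position, so no characters are consumed.
    spaced = re.sub('(?<=[a-z])(?=[A-Z])', ' ', s)
    # Assumes s has no whitespace of its own, so a plain split()
    # recovers the words.
    return spaced.split()

print(ccsplit_re('fooBarBaz'))
```

This reuses the lookbehind/lookahead pattern from the original question, so it is no easier to read than before.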
-a