Regular expression bug?
marduk at letterboxes.org
Thu Feb 19 20:26:46 CET 2009
On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
That's how re.split works, same as str.split...
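A minimal sketch of that behaviour, using the sample string from the quoted message. As with str.split, the separator is dropped. Unlike str.split, re.split also returns whatever a capturing group in the pattern matched. That keeps the boundary characters, but only as separate items, so it does not by itself fix the CamelCase problem.

```python
import re

s = 'fooBarBaz'

# str.split drops the separator it splits on.
csv_parts = 'a,b,c'.split(',')
print(csv_parts)  # ['a', 'b', 'c']

# re.split without a group also discards the matched separators ('oB', 'rB').
plain = re.split('[a-z][A-Z]', s)
print(plain)  # ['fo', 'a', 'az']

# With a capturing group, the text each separator matched is kept in the result.
kept = re.split('([a-z][A-Z])', s)
print(kept)  # ['fo', 'oB', 'a', 'rB', 'az']
```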
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> However, it does seem to work with findall:
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
To tell you the truth, I can't even read that... but one wonders why
you don't just do:
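import string

s = 'fooBarBaz'
cclist = []
current_word = ''
for char in s:
    if char in string.ascii_uppercase:  # string.uppercase on Python 2 only
        if current_word:  # an uppercase letter starts a new word; save the old one
            cclist.append(current_word)
        current_word = char
    else:
        current_word += char
cclist.append(current_word)  # save the last word
print(cclist)  # --> ['foo', 'Bar', 'Baz']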
This is arguably *much* easier to read than the re example, and it
doesn't require looking ahead in the string.
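For comparison, here is a sketch of two regex approaches. The first assumes Python 3.7 or later. Only from 3.7 does re.split split on empty matches, so the pure lookaround boundary from the quoted message now works. On the Pythons of this thread it left the string unsplit. The outer capturing group from the quoted pattern is dropped here. With it, the empty separators would also appear in the result. The second approach matches the words instead of the boundaries, so it does not depend on that change. Both are only illustrations for simple lowercase-then-Capitalized input like 'fooBarBaz'. They do not handle digits or runs of capitals such as 'HTTPServer'.

```python
import re

s = 'fooBarBaz'

# Python 3.7+: split at the zero-width position between a lowercase
# letter (lookbehind) and an uppercase letter (lookahead); nothing is consumed.
parts = re.split('(?<=[a-z])(?=[A-Z])', s)
print(parts)  # ['foo', 'Bar', 'Baz']

# Keeping the outer capturing group also returns the (empty) separators.
with_groups = re.split('((?<=[a-z])(?=[A-Z]))', s)
print(with_groups)  # ['foo', '', 'Bar', '', 'Baz']

# Alternative: find the words directly, each being an optional
# uppercase letter followed by one or more lowercase letters.
words = re.findall('[A-Z]?[a-z]+', s)
print(words)  # ['foo', 'Bar', 'Baz']
```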