Regular expression bug?

andrew cooke andrew at acooke.org
Thu Feb 19 20:51:06 CET 2009


i wonder what fraction of people posting with "bug?" in their titles here
actually find bugs?

anyway, how about:

re.findall('[A-Z]?[a-z]*', 'fooBarBaz')

or

re.findall('([A-Z][a-z]*|[a-z]+)', 'fooBarBaz')

(you have to specify what you're matching, and lookahead/lookbehind doesn't
do that: those assertions are zero-width, so they match a position rather
than any characters.)
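
As a rough check of the two findall patterns above, here is a minimal sketch. The variable names (text, parts_a, parts_b) are made up for illustration. One thing to watch: every piece of the first pattern is optional, so it can also match the empty string at the end of the input, and the sketch filters those empty matches out.

```python
import re

text = 'fooBarBaz'

# First pattern: both parts are optional, so findall may also return
# empty strings; keep only the non-empty matches.
parts_a = [m for m in re.findall('[A-Z]?[a-z]*', text) if m]

# Second pattern: each alternative requires at least one character,
# so no filtering is needed. findall returns the contents of the group.
parts_b = re.findall('([A-Z][a-z]*|[a-z]+)', text)

print(parts_a)
print(parts_b)
```

Both lists should hold the three components 'foo', 'Bar' and 'Baz'. The second pattern is arguably the safer choice, since it cannot produce empty matches in the first place.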

andrew


Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters.  To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing.  Is this a
> bug in re.split, or am I missing something?
>
> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code.  re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py.  So where does
> this method come from?)
>
> I'm using Python 2.5.
>
> Thanks,
> rg




