python3 raw strings and \u escapes
Thomas Rachel
nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Wed May 30 07:54:52 EDT 2012
On 30.05.2012 08:52, rurpy at yahoo.com wrote:
> This breaks a lot of my code because in python 2
> re.split (ur'[\u3000]', u'A\u3000A') ==> [u'A', u'A']
> but in python 3 (the result of running 2to3),
> re.split (r'[\u3000]', 'A\u3000A' ) ==> ['A\u3000A']
>
> I can remove the "r" prefix from the regex string but then
> if I have other regex backslash symbols in it, I have to
> double all the other backslashes -- the very thing that
> the r-prefix was invented to avoid.
>
> Or I can leave the "r" prefix and replace something like
> r'[ \u3000]' with r'[ ]'. But that is confusing because
> one can't distinguish between the space character and
> the ideographic space character. It is also a problem if a
> reader of the code doesn't have a font that can display
> the character.
>
> Was there a reason for dropping the lexical processing of
> \u escapes in strings in python3 (other than to add another
> annoyance in a long list of python3 annoyances?)
Probably because it is more consistent. Alas, it makes the whole thing
incompatible with Py2.
But if you think about it: why should \u be processed in a raw string
when \r, \n etc. are not?
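To make the point concrete, here is a short Python 3 sketch (not from the original thread). It shows that a raw string keeps \u as literal characters, just as it keeps \n. It also shows the cost of the first option, dropping the r prefix: every regex backslash then has to be doubled. The sample input string is made up for illustration.

```python
import re

# In a Python 3 raw string every backslash is kept literally,
# whether it is followed by 'n' or by 'u'.
assert len('\n') == 1 and len(r'\n') == 2
assert len('\u3000') == 1      # one character: U+3000 IDEOGRAPHIC SPACE
assert len(r'\u3000') == 6     # backslash, 'u', '3', '0', '0', '0'

# Option 1 from the thread: drop the r prefix. '\u3000' now becomes the
# real character, but the regex escape \d must be written as '\\d'.
pattern = '\\d+[ \u3000]'
print(re.split(pattern, 'A1\u3000B22 C'))   # ['A', 'B', 'C']
```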
> And is there no choice for me but to choose between the two
> poor choices I mention above to deal with this problem?
There is a third option: use r'[ ' + '\u3000' + ']'. It is not very nice
to read, but it should do the trick...
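A minimal sketch of that third option in Python 3 (sample input made up for illustration). The second spelling relies on implicit concatenation of adjacent string literals, an alternative not mentioned in the thread. The last line reflects a later change: from Python 3.3 on, the re module itself interprets \uXXXX escapes in a pattern, so the plain raw string works there without any workaround.

```python
import re

# Third option: only the non-raw middle piece goes through escape
# processing, so '\u3000' becomes the real ideographic space character.
pattern = r'[ ' + '\u3000' + ']'

# Same pattern via adjacent literals, concatenated at compile time.
pattern2 = r'[ ' '\u3000' ']'
assert pattern == pattern2 == '[ \u3000]'

print(re.split(pattern, 'A\u3000B C'))   # ['A', 'B', 'C']

# Python 3.3 and later only: re understands \u3000 in the pattern itself.
assert re.split(r'[ \u3000]', 'A\u3000B C') == ['A', 'B', 'C']
```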
Thomas