[Python-3000] Regular expressions, py3k and unicode
Mark Dickinson
dickinsm at gmail.com
Sun Jun 29 13:05:27 CEST 2008
On Sat, Jun 28, 2008 at 9:45 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Wouldn't it be more natural that, at least when the pattern is a str object
> rather than a bytes object, re.UNICODE be implied by default?
Might this have some unintended consequences? For example, one
would then get the following undesirable behaviour from the decimal
module, using inputs with Unicode fullwidth digits.
>>> Decimal('\uff11')
Decimal('1')
>>> Decimal('\uff11') == Decimal('1')
False
There are plenty of easy fixes for this, of course, but I don't know
how many other modules might be similarly affected.
In any case, it seems to me that having something like re.ASCII
would be useful.
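
As a rough illustration of the difference such a flag could make, here is a
minimal sketch. It assumes a flag spelled re.ASCII that restricts \d to
[0-9], which is how later Python 3 releases behave. The two compiled
patterns are hypothetical stand-ins for a digit-matching parser, not the
decimal module's actual regex.

```python
import re

FULLWIDTH_ONE = '\uff11'  # U+FF11 FULLWIDTH DIGIT ONE

# Hypothetical digit patterns; a real number parser would be more elaborate.
unicode_digits = re.compile(r'\d+', re.UNICODE)  # \d matches any Unicode decimal digit
ascii_digits = re.compile(r'\d+', re.ASCII)      # \d matches only 0-9

# With Unicode matching, the fullwidth digit is accepted as a digit.
print(unicode_digits.fullmatch(FULLWIDTH_ONE) is not None)

# With ASCII-only matching, the fullwidth digit is rejected...
print(ascii_digits.fullmatch(FULLWIDTH_ONE) is not None)

# ...while an ordinary ASCII digit is still accepted.
print(ascii_digits.fullmatch('1') is not None)
```

A module that wants only ASCII digits could pass such a flag explicitly
rather than depend on whatever default the re module uses for str patterns.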
Mark