[New-bugs-announce] [issue46410] TypeError when parsing regexp with unicode named character sequence escape

Mon Jan 17 07:31:30 EST 2022

New submission from Jirka Marsik <jiri.marsik at oracle.com>:

re.compile(r"\N{name of a Unicode Named Character Sequence}"), e.g. re.compile(r"\N{KEYCAP NUMBER SIGN}"), raises a TypeError. The regular expression parser relies on 'unicodedata' to look up character names. The 'unicodedata' module recently added support for Unicode Named Character Sequences (https://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt). Using one of these named character sequences in a regular expression leads to a 'TypeError', because the regexp parser calls 'ord' on the looked-up string, whose length is greater than 1.
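
A minimal reproduction sketch of the report above. The helper name `describe` is hypothetical and only for illustration. The outcome of `re.compile` depends on the interpreter version (the report is against Python 3.10), so the sketch records whichever exception is raised rather than assuming one.

```python
import re
import unicodedata

NAME = "KEYCAP NUMBER SIGN"  # the named sequence used in the report


def describe(name):
    """Return (looked-up string, outcome of compiling r"\\N{name}").

    Hypothetical helper, not part of any library.
    """
    # For a named character sequence, lookup() returns a multi-character string.
    seq = unicodedata.lookup(name)
    try:
        re.compile(r"\N{%s}" % name)
        outcome = "compiled"
    except TypeError:
        # The behavior described in this report: ord() applied to a long string.
        outcome = "TypeError"
    except re.error:
        # An interpreter with different handling may reject the name as a pattern error.
        outcome = "re.error"
    return seq, outcome


seq, outcome = describe(NAME)
print(len(seq), [hex(ord(ch)) for ch in seq], outcome)
```

The same failure can be seen without the regex module: calling `ord(unicodedata.lookup("KEYCAP NUMBER SIGN"))` directly raises a `TypeError`, because `ord` accepts only a single character.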

----------
components: Regular Expressions
messages: 410770
nosy: ezio.melotti, jirkamarsik, mrabarnett
priority: normal
severity: normal
status: open
title: TypeError when parsing regexp with unicode named character sequence escape
type: behavior
versions: Python 3.10

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue46410>
_______________________________________