[Python-checkins] [3.9] gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) (GH-91830) (GH-91834)

serhiy-storchaka webhook-mailer at python.org
Fri Apr 22 14:35:13 EDT 2022


https://github.com/python/cpython/commit/97d14e1dfb9347bc8ef581055b2f70cd03e5f622
commit: 97d14e1dfb9347bc8ef581055b2f70cd03e5f622
branch: 3.9
author: Miss Islington (bot) <31488909+miss-islington at users.noreply.github.com>
committer: serhiy-storchaka <storchaka at gmail.com>
date: 2022-04-22T21:34:31+03:00
summary:

[3.9] gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) (GH-91830) (GH-91834)

re.error is now raised instead of TypeError.
(cherry picked from commit 6ccfa31421393910b52936e0447625db06f2a655)
(cherry picked from commit 9c18d783c38fca57a63b61aa778d8a8d18945d95)

Co-authored-by: Serhiy Storchaka <storchaka at gmail.com>
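
The behaviour described in the message above can be checked with a short
script. This is only a sketch: the helper name compile_outcome is
hypothetical and not part of the commit, and which branch is taken depends
on whether the interpreter includes this fix.

```python
import re
import unicodedata

# A Unicode *named sequence* resolves to several code points, not one.
name = "KEYCAP NUMBER SIGN"
seq = unicodedata.lookup(name)
print(len(seq), [hex(ord(ch)) for ch in seq])

# ord() accepts exactly one character, so it rejects the sequence.
try:
    ord(seq)
except TypeError as exc:
    print("ord() failed:", exc)


def compile_outcome(pattern):
    """Hypothetical helper: report how re.compile() reacts to a pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "re.error", str(exc)
    except TypeError as exc:
        # Expected only on interpreters that lack this fix.
        return "TypeError", str(exc)
    return "ok", ""


print(compile_outcome(r"\N{LESS-THAN SIGN}"))      # ordinary named character
print(compile_outcome(r"\N{SPAM}"))                # unknown name
print(compile_outcome(r"\N{KEYCAP NUMBER SIGN}"))  # named sequence
```

On an interpreter that contains the fix, the last call should report
re.error rather than TypeError.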

files:
A Misc/NEWS.d/next/Library/2022-04-18-16-31-33.gh-issue-90568.9kiU7o.rst
M Lib/sre_parse.py
M Lib/test/test_re.py

diff --git a/Lib/sre_parse.py b/Lib/sre_parse.py
index 53706676e9f7b..d3ff196032b30 100644
--- a/Lib/sre_parse.py
+++ b/Lib/sre_parse.py
@@ -330,7 +330,7 @@ def _class_escape(source, escape):
             charname = source.getuntil('}', 'character name')
             try:
                 c = ord(unicodedata.lookup(charname))
-            except KeyError:
+            except (KeyError, TypeError):
                 raise source.error("undefined character name %r" % charname,
                                    len(charname) + len(r'\N{}'))
             return LITERAL, c
@@ -390,7 +390,7 @@ def _escape(source, escape, state):
             charname = source.getuntil('}', 'character name')
             try:
                 c = ord(unicodedata.lookup(charname))
-            except KeyError:
+            except (KeyError, TypeError):
                 raise source.error("undefined character name %r" % charname,
                                    len(charname) + len(r'\N{}'))
             return LITERAL, c
diff --git a/Lib/test/test_re.py b/Lib/test/test_re.py
index 56e98b7aedce7..007064093c4d1 100644
--- a/Lib/test/test_re.py
+++ b/Lib/test/test_re.py
@@ -753,6 +753,10 @@ def test_named_unicode_escapes(self):
                                "undefined character name 'SPAM'", 0)
         self.checkPatternError(r'[\N{SPAM}]',
                                "undefined character name 'SPAM'", 1)
+        self.checkPatternError(r'\N{KEYCAP NUMBER SIGN}',
+                            "undefined character name 'KEYCAP NUMBER SIGN'", 0)
+        self.checkPatternError(r'[\N{KEYCAP NUMBER SIGN}]',
+                            "undefined character name 'KEYCAP NUMBER SIGN'", 1)
         self.checkPatternError(br'\N{LESS-THAN SIGN}', r'bad escape \N', 0)
         self.checkPatternError(br'[\N{LESS-THAN SIGN}]', r'bad escape \N', 1)
 
diff --git a/Misc/NEWS.d/next/Library/2022-04-18-16-31-33.gh-issue-90568.9kiU7o.rst b/Misc/NEWS.d/next/Library/2022-04-18-16-31-33.gh-issue-90568.9kiU7o.rst
new file mode 100644
index 0000000000000..4411c715830e2
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2022-04-18-16-31-33.gh-issue-90568.9kiU7o.rst
@@ -0,0 +1,3 @@
+Parsing ``\N`` escapes of Unicode Named Character Sequences in a
+:mod:`regular expression <re>` raises now :exc:`re.error` instead of
+``TypeError``.
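
The pattern used in the sre_parse.py hunks can be illustrated outside the
parser. The following is a minimal sketch, not the library's code: the names
PatternError and resolve_named_escape are hypothetical stand-ins for
source.error() and the body of the \N{...} branch.

```python
import unicodedata


class PatternError(Exception):
    """Hypothetical stand-in for the error raised via source.error()."""


def resolve_named_escape(charname):
    """Return the code point for a single named character.

    unicodedata.lookup() raises KeyError for an unknown name. For a named
    sequence it returns a multi-character string, so ord() raises TypeError.
    Both cases are reported as the same pattern error.
    """
    try:
        return ord(unicodedata.lookup(charname))
    except (KeyError, TypeError):
        raise PatternError("undefined character name %r" % charname) from None


print(hex(resolve_named_escape("LESS-THAN SIGN")))

for bad_name in ("SPAM", "KEYCAP NUMBER SIGN"):
    try:
        resolve_named_escape(bad_name)
    except PatternError as exc:
        print("rejected:", exc)
```

Catching both exception types in one clause keeps the error message and
position reporting identical for unknown names and named sequences.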


