Python-checkins
October 2025
https://github.com/python/cpython/commit/0bbaf5de9744ae1acea3e2c9ad2257d1cc…
commit: 0bbaf5de9744ae1acea3e2c9ad2257d1cc68e847
branch: 3.9
author: Łukasz Langa <lukasz(a)langa.pl>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T19:40:52+01:00
summary:
Python 3.9.25
files:
A Misc/NEWS.d/3.9.25.rst
D Misc/NEWS.d/next/Core and Builtins/2024-06-10-10-42-48.gh-issue-120298.napREA.rst
D Misc/NEWS.d/next/Core and Builtins/2024-06-13-12-17-52.gh-issue-120384.w1UBGl.rst
D Misc/NEWS.d/next/Library/2021-08-03-05-31-00.bpo-44817.wOW_Qn.rst
D Misc/NEWS.d/next/Library/2022-10-29-03-40-18.gh-issue-98793.WSPB4A.rst
D Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
D Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
D Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
M Include/patchlevel.h
M README.rst
diff --git a/Include/patchlevel.h b/Include/patchlevel.h
index 7781d864119ae8..ee45252aa01a94 100644
--- a/Include/patchlevel.h
+++ b/Include/patchlevel.h
@@ -18,12 +18,12 @@
/*--start constants--*/
#define PY_MAJOR_VERSION 3
#define PY_MINOR_VERSION 9
-#define PY_MICRO_VERSION 24
+#define PY_MICRO_VERSION 25
#define PY_RELEASE_LEVEL PY_RELEASE_LEVEL_FINAL
#define PY_RELEASE_SERIAL 0
/* Version as a string */
-#define PY_VERSION "3.9.24+"
+#define PY_VERSION "3.9.25"
/*--end constants--*/
/* Version as a single 4-byte hex number, e.g. 0x010502B2 == 1.5.2b2.
diff --git a/Misc/NEWS.d/3.9.25.rst b/Misc/NEWS.d/3.9.25.rst
new file mode 100644
index 00000000000000..466ee4f07a2aed
--- /dev/null
+++ b/Misc/NEWS.d/3.9.25.rst
@@ -0,0 +1,70 @@
+.. date: 2025-08-15-23-08-44
+.. gh-issue: 137836
+.. nonce: b55rhh
+.. release date: 2025-10-31
+.. section: Security
+
+Add support of the "plaintext" element, RAWTEXT elements "xmp", "iframe",
+"noembed" and "noframes", and optionally RAWTEXT element "noscript" in
+:class:`html.parser.HTMLParser`.
+
+..
+
+.. date: 2025-06-28-13-23-53
+.. gh-issue: 136063
+.. nonce: aGk0Jv
+.. section: Security
+
+:mod:`email.message`: ensure linear complexity for legacy HTTP parameters
+parsing. Patch by Bénédikt Tran.
+
+..
+
+.. date: 2025-05-30-22-33-27
+.. gh-issue: 136065
+.. nonce: bu337o
+.. section: Security
+
+Fix quadratic complexity in :func:`os.path.expandvars`.
+
+..
+
+.. date: 2022-10-29-03-40-18
+.. gh-issue: 98793
+.. nonce: WSPB4A
+.. section: Library
+
+Fix argument typechecks in :func:`!_overlapped.WSAConnect` and
+:func:`!_overlapped.Overlapped.WSASendTo` functions.
+
+..
+
+.. bpo: 44817
+.. date: 2021-08-03-05-31-00
+.. nonce: wOW_Qn
+.. section: Library
+
+Ignore WinError 53 (ERROR_BAD_NETPATH), 65 (ERROR_NETWORK_ACCESS_DENIED) and
+161 (ERROR_BAD_PATHNAME) when using ntpath.realpath().
+
+..
+
+.. date: 2024-06-13-12-17-52
+.. gh-issue: 120384
+.. nonce: w1UBGl
+.. section: Core and Builtins
+
+Fix an array out of bounds crash in ``list_ass_subscript``, which could be
+invoked via some specificly tailored input: including concurrent
+modification of a list object, where one thread assigns a slice and another
+clears it.
+
+..
+
+.. date: 2024-06-10-10-42-48
+.. gh-issue: 120298
+.. nonce: napREA
+.. section: Core and Builtins
+
+Fix use-after free in ``list_richcompare_impl`` which can be invoked via
+some specificly tailored evil input.
diff --git a/Misc/NEWS.d/next/Core and Builtins/2024-06-10-10-42-48.gh-issue-120298.napREA.rst b/Misc/NEWS.d/next/Core and Builtins/2024-06-10-10-42-48.gh-issue-120298.napREA.rst
deleted file mode 100644
index 531d39517ac423..00000000000000
--- a/Misc/NEWS.d/next/Core and Builtins/2024-06-10-10-42-48.gh-issue-120298.napREA.rst
+++ /dev/null
@@ -1,2 +0,0 @@
-Fix use-after free in ``list_richcompare_impl`` which can be invoked via
-some specificly tailored evil input.
diff --git a/Misc/NEWS.d/next/Core and Builtins/2024-06-13-12-17-52.gh-issue-120384.w1UBGl.rst b/Misc/NEWS.d/next/Core and Builtins/2024-06-13-12-17-52.gh-issue-120384.w1UBGl.rst
deleted file mode 100644
index 4a4db821ce29b8..00000000000000
--- a/Misc/NEWS.d/next/Core and Builtins/2024-06-13-12-17-52.gh-issue-120384.w1UBGl.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-Fix an array out of bounds crash in ``list_ass_subscript``, which could be
-invoked via some specificly tailored input: including concurrent modification
-of a list object, where one thread assigns a slice and another clears it.
diff --git a/Misc/NEWS.d/next/Library/2021-08-03-05-31-00.bpo-44817.wOW_Qn.rst b/Misc/NEWS.d/next/Library/2021-08-03-05-31-00.bpo-44817.wOW_Qn.rst
deleted file mode 100644
index 79f8c506b54f37..00000000000000
--- a/Misc/NEWS.d/next/Library/2021-08-03-05-31-00.bpo-44817.wOW_Qn.rst
+++ /dev/null
@@ -1,2 +0,0 @@
-Ignore WinError 53 (ERROR_BAD_NETPATH), 65 (ERROR_NETWORK_ACCESS_DENIED)
-and 161 (ERROR_BAD_PATHNAME) when using ntpath.realpath().
diff --git a/Misc/NEWS.d/next/Library/2022-10-29-03-40-18.gh-issue-98793.WSPB4A.rst b/Misc/NEWS.d/next/Library/2022-10-29-03-40-18.gh-issue-98793.WSPB4A.rst
deleted file mode 100644
index 7b67af06cf3d17..00000000000000
--- a/Misc/NEWS.d/next/Library/2022-10-29-03-40-18.gh-issue-98793.WSPB4A.rst
+++ /dev/null
@@ -1 +0,0 @@
-Fix argument typechecks in :func:`!_overlapped.WSAConnect` and :func:`!_overlapped.Overlapped.WSASendTo` functions.
diff --git a/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst b/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
deleted file mode 100644
index 1d152bb5318380..00000000000000
--- a/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
+++ /dev/null
@@ -1 +0,0 @@
-Fix quadratic complexity in :func:`os.path.expandvars`.
diff --git a/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst b/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
deleted file mode 100644
index 940a3ad5a72f68..00000000000000
--- a/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
+++ /dev/null
@@ -1,2 +0,0 @@
-:mod:`email.message`: ensure linear complexity for legacy HTTP parameters
-parsing. Patch by Bénédikt Tran.
diff --git a/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
deleted file mode 100644
index c30c9439a76a19..00000000000000
--- a/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-Add support of the "plaintext" element, RAWTEXT elements "xmp", "iframe",
-"noembed" and "noframes", and optionally RAWTEXT element "noscript" in
-:class:`html.parser.HTMLParser`.
diff --git a/README.rst b/README.rst
index 219a6c0ed5af25..7fffa72f27d883 100644
--- a/README.rst
+++ b/README.rst
@@ -1,4 +1,4 @@
-This is Python version 3.9.24
+This is Python version 3.9.25
=============================
.. image:: https://travis-ci.org/python/cpython.svg?branch=3.9
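The `Include/patchlevel.h` hunk in this commit bumps `PY_MICRO_VERSION` to 25. Its context comment notes that the version is also packed into a single 4-byte hex number (e.g. `0x010502B2 == 1.5.2b2`). Below is a minimal sketch of that packing. It assumes the usual release-level nibbles (0xA alpha, 0xB beta, 0xC candidate, 0xF final), and `pack_hexversion` is a hypothetical helper, not a CPython API.

```python
import sys

# Release-level nibbles, mirroring the PY_RELEASE_LEVEL_* constants
# (assumed values: alpha=0xA, beta=0xB, candidate=0xC, final=0xF).
RELEASE_LEVELS = {"alpha": 0xA, "beta": 0xB, "candidate": 0xC, "final": 0xF}


def pack_hexversion(major, minor, micro, level="final", serial=0):
    """Pack a version into the PY_VERSION_HEX layout (hypothetical helper).

    Byte 1: major, byte 2: minor, byte 3: micro,
    byte 4: release level (high nibble) and serial (low nibble).
    """
    return (
        (major << 24)
        | (minor << 16)
        | (micro << 8)
        | (RELEASE_LEVELS[level] << 4)
        | serial
    )


print(hex(pack_hexversion(1, 5, 2, "beta", 2)))  # 0x10502b2 (the 1.5.2b2 example)
print(hex(pack_hexversion(3, 9, 25)))            # 0x30919f0 (3.9.25 final)

# sys.version_info has the same five fields, so this should match sys.hexversion.
print(pack_hexversion(*sys.version_info) == sys.hexversion)
```

A running interpreter exposes the same layout as `sys.hexversion`, which is why the last line compares the two.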
[3.11] gh-136063: fix quadratic-complexity parsing in `email.message._parseparam` (GH-136072) (GH-140830)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/3b7d81da078d48f72d50aa8c2bf06a97d2…
commit: 3b7d81da078d48f72d50aa8c2bf06a97d20bd913
branch: 3.11
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T18:29:53+01:00
summary:
[3.11] gh-136063: fix quadratic-complexity parsing in `email.message._parseparam` (GH-136072) (GH-140830)
(cherry picked from commit 680a5d070f59798bb88a1bb6eb027482b8d85c34)
Co-authored-by: Bénédikt Tran <10796600+picnixz(a)users.noreply.github.com>
Co-authored-by: Łukasz Langa <lukasz(a)langa.pl>
files:
A Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
M Lib/email/message.py
M Lib/test/test_email/test_email.py
diff --git a/Lib/email/message.py b/Lib/email/message.py
index 492a6b9a4309fa..6a9903f9c8e842 100644
--- a/Lib/email/message.py
+++ b/Lib/email/message.py
@@ -74,19 +74,25 @@ def _parseparam(s):
# RDM This might be a Header, so for now stringify it.
s = ';' + str(s)
plist = []
- while s[:1] == ';':
- s = s[1:]
- end = s.find(';')
- while end > 0 and (s.count('"', 0, end) - s.count('\\"', 0, end)) % 2:
- end = s.find(';', end + 1)
+ start = 0
+ while s.find(';', start) == start:
+ start += 1
+ end = s.find(';', start)
+ ind, diff = start, 0
+ while end > 0:
+ diff += s.count('"', ind, end) - s.count('\\"', ind, end)
+ if diff % 2 == 0:
+ break
+ end, ind = ind, s.find(';', end + 1)
if end < 0:
end = len(s)
- f = s[:end]
- if '=' in f:
- i = f.index('=')
- f = f[:i].strip().lower() + '=' + f[i+1:].strip()
+ i = s.find('=', start, end)
+ if i == -1:
+ f = s[start:end]
+ else:
+ f = s[start:i].rstrip().lower() + '=' + s[i+1:end].lstrip()
plist.append(f.strip())
- s = s[end:]
+ start = end
return plist
diff --git a/Lib/test/test_email/test_email.py b/Lib/test/test_email/test_email.py
index ad60ed3a7591c0..431d362718ada7 100644
--- a/Lib/test/test_email/test_email.py
+++ b/Lib/test/test_email/test_email.py
@@ -464,6 +464,27 @@ def test_get_param_with_quotes(self):
"Content-Type: foo; bar*0=\"baz\\\"foobar\"; bar*1=\"\\\"baz\"")
self.assertEqual(msg.get_param('bar'), 'baz"foobar"baz')
+ def test_get_param_linear_complexity(self):
+ # Ensure that email.message._parseparam() is fast.
+ # See https://github.com/python/cpython/issues/136063.
+ N = 100_000
+ for s, r in [
+ ("", ""),
+ ("foo=bar", "foo=bar"),
+ (" FOO = bar ", "foo=bar"),
+ ]:
+ with self.subTest(s=s, r=r, N=N):
+ src = f'{s};' * (N - 1) + s
+ res = email.message._parseparam(src)
+ self.assertEqual(len(res), N)
+ self.assertEqual(len(set(res)), 1)
+ self.assertEqual(res[0], r)
+
+ # This will be considered as a single parameter.
+ malformed = 's="' + ';' * (N - 1)
+ res = email.message._parseparam(malformed)
+ self.assertEqual(res, [malformed])
+
def test_field_containment(self):
msg = email.message_from_string('Header: exists')
self.assertIn('header', msg)
diff --git a/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst b/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
new file mode 100644
index 00000000000000..940a3ad5a72f68
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-06-28-13-23-53.gh-issue-136063.aGk0Jv.rst
@@ -0,0 +1,2 @@
+:mod:`email.message`: ensure linear complexity for legacy HTTP parameters
+parsing. Patch by Bénédikt Tran.
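The old `_parseparam` loop rebuilt the remaining string on every iteration (`s = s[1:]` and `s = s[end:]`), so a header with N parameters cost O(N²) character copies. The patched version keeps the string fixed and advances integer offsets (`start`, `end`) instead. The sketch below isolates only that technique on a simplified splitter that ignores quoting. Both function names are hypothetical, and neither reproduces the quote-parity handling of the real `email.message._parseparam`.

```python
def split_params_slicing(s):
    """Quadratic shape: each iteration copies the rest of the string."""
    parts = []
    s = ';' + s
    while s[:1] == ';':
        s = s[1:]              # copy of the remainder
        end = s.find(';')
        if end < 0:
            end = len(s)
        parts.append(s[:end].strip())
        s = s[end:]            # another copy of the remainder
    return parts


def split_params_offsets(s):
    """Linear shape: walk one string with moving offsets, no remainder copies."""
    parts = []
    s = ';' + s
    start = 0
    # Each pass starts on a ';', so find() returns immediately until the end.
    while s.find(';', start) == start:
        start += 1
        end = s.find(';', start)
        if end < 0:
            end = len(s)
        parts.append(s[start:end].strip())
        start = end
    return parts


print(split_params_offsets('text/plain; charset=utf-8; format=flowed'))
# ['text/plain', 'charset=utf-8', 'format=flowed']
```

Both functions return the same list; only the offset version stays linear as the input grows. In normal use this code path is reached through public methods such as `Message.get_param()`, not by calling `_parseparam` directly.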
[3.11] gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) (GH-140848)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/5dceb93486176e6b4a6d9754491005113e…
commit: 5dceb93486176e6b4a6d9754491005113eb23427
branch: 3.11
author: Łukasz Langa <lukasz(a)langa.pl>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T18:15:08+01:00
summary:
[3.11] gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) (GH-140848)
(cherry picked from commit f029e8db626ddc6e3a3beea4eff511a71aaceb5c)
Co-authored-by: Serhiy Storchaka <storchaka(a)gmail.com>
files:
A Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
M Lib/ntpath.py
M Lib/posixpath.py
M Lib/test/test_genericpath.py
M Lib/test/test_ntpath.py
diff --git a/Lib/ntpath.py b/Lib/ntpath.py
index ebc55eb891082e..8a71f29a32f287 100644
--- a/Lib/ntpath.py
+++ b/Lib/ntpath.py
@@ -378,17 +378,23 @@ def expanduser(path):
# XXX With COMMAND.COM you can use any characters in a variable name,
# XXX except '^|<>='.
+_varpattern = r"'[^']*'?|%(%|[^%]*%?)|\$(\$|[-\w]+|\{[^}]*\}?)"
+_varsub = None
+_varsubb = None
+
def expandvars(path):
"""Expand shell variables of the forms $var, ${var} and %var%.
Unknown variables are left unchanged."""
path = os.fspath(path)
+ global _varsub, _varsubb
if isinstance(path, bytes):
if b'$' not in path and b'%' not in path:
return path
- import string
- varchars = bytes(string.ascii_letters + string.digits + '_-', 'ascii')
- quote = b'\''
+ if not _varsubb:
+ import re
+ _varsubb = re.compile(_varpattern.encode(), re.ASCII).sub
+ sub = _varsubb
percent = b'%'
brace = b'{'
rbrace = b'}'
@@ -397,94 +403,44 @@ def expandvars(path):
else:
if '$' not in path and '%' not in path:
return path
- import string
- varchars = string.ascii_letters + string.digits + '_-'
- quote = '\''
+ if not _varsub:
+ import re
+ _varsub = re.compile(_varpattern, re.ASCII).sub
+ sub = _varsub
percent = '%'
brace = '{'
rbrace = '}'
dollar = '$'
environ = os.environ
- res = path[:0]
- index = 0
- pathlen = len(path)
- while index < pathlen:
- c = path[index:index+1]
- if c == quote: # no expansion within single quotes
- path = path[index + 1:]
- pathlen = len(path)
- try:
- index = path.index(c)
- res += c + path[:index + 1]
- except ValueError:
- res += c + path
- index = pathlen - 1
- elif c == percent: # variable or '%'
- if path[index + 1:index + 2] == percent:
- res += c
- index += 1
- else:
- path = path[index+1:]
- pathlen = len(path)
- try:
- index = path.index(percent)
- except ValueError:
- res += percent + path
- index = pathlen - 1
- else:
- var = path[:index]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = percent + var + percent
- res += value
- elif c == dollar: # variable or '$$'
- if path[index + 1:index + 2] == dollar:
- res += c
- index += 1
- elif path[index + 1:index + 2] == brace:
- path = path[index+2:]
- pathlen = len(path)
- try:
- index = path.index(rbrace)
- except ValueError:
- res += dollar + brace + path
- index = pathlen - 1
- else:
- var = path[:index]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = dollar + brace + var + rbrace
- res += value
- else:
- var = path[:0]
- index += 1
- c = path[index:index + 1]
- while c and c in varchars:
- var += c
- index += 1
- c = path[index:index + 1]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = dollar + var
- res += value
- if c:
- index -= 1
+
+ def repl(m):
+ lastindex = m.lastindex
+ if lastindex is None:
+ return m[0]
+ name = m[lastindex]
+ if lastindex == 1:
+ if name == percent:
+ return name
+ if not name.endswith(percent):
+ return m[0]
+ name = name[:-1]
else:
- res += c
- index += 1
- return res
+ if name == dollar:
+ return name
+ if name.startswith(brace):
+ if not name.endswith(rbrace):
+ return m[0]
+ name = name[1:-1]
+
+ try:
+ if environ is None:
+ return os.fsencode(os.environ[os.fsdecode(name)])
+ else:
+ return environ[name]
+ except KeyError:
+ return m[0]
+
+ return sub(repl, path)
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A\B.
diff --git a/Lib/posixpath.py b/Lib/posixpath.py
index ce71a477b21928..8f300aea745170 100644
--- a/Lib/posixpath.py
+++ b/Lib/posixpath.py
@@ -287,42 +287,41 @@ def expanduser(path):
# This expands the forms $variable and ${variable} only.
# Non-existent variables are left unchanged.
-_varprog = None
-_varprogb = None
+_varpattern = r'\$(\w+|\{[^}]*\}?)'
+_varsub = None
+_varsubb = None
def expandvars(path):
"""Expand shell variables of form $var and ${var}. Unknown variables
are left unchanged."""
path = os.fspath(path)
- global _varprog, _varprogb
+ global _varsub, _varsubb
if isinstance(path, bytes):
if b'$' not in path:
return path
- if not _varprogb:
+ if not _varsubb:
import re
- _varprogb = re.compile(br'\$(\w+|\{[^}]*\})', re.ASCII)
- search = _varprogb.search
+ _varsubb = re.compile(_varpattern.encode(), re.ASCII).sub
+ sub = _varsubb
start = b'{'
end = b'}'
environ = getattr(os, 'environb', None)
else:
if '$' not in path:
return path
- if not _varprog:
+ if not _varsub:
import re
- _varprog = re.compile(r'\$(\w+|\{[^}]*\})', re.ASCII)
- search = _varprog.search
+ _varsub = re.compile(_varpattern, re.ASCII).sub
+ sub = _varsub
start = '{'
end = '}'
environ = os.environ
- i = 0
- while True:
- m = search(path, i)
- if not m:
- break
- i, j = m.span(0)
- name = m.group(1)
- if name.startswith(start) and name.endswith(end):
+
+ def repl(m):
+ name = m[1]
+ if name.startswith(start):
+ if not name.endswith(end):
+ return m[0]
name = name[1:-1]
try:
if environ is None:
@@ -330,13 +329,11 @@ def expandvars(path):
else:
value = environ[name]
except KeyError:
- i = j
+ return m[0]
else:
- tail = path[j:]
- path = path[:i] + value
- i = len(path)
- path += tail
- return path
+ return value
+
+ return sub(repl, path)
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A/B.
diff --git a/Lib/test/test_genericpath.py b/Lib/test/test_genericpath.py
index 4f311c2d498e9f..ce501a94516544 100644
--- a/Lib/test/test_genericpath.py
+++ b/Lib/test/test_genericpath.py
@@ -7,6 +7,7 @@
import sys
import unittest
import warnings
+from test import support
from test.support import is_emscripten
from test.support import os_helper
from test.support import warnings_helper
@@ -434,6 +435,19 @@ def check(value, expected):
os.fsencode('$bar%s bar' % nonascii))
check(b'$spam}bar', os.fsencode('%s}bar' % nonascii))
+ @support.requires_resource('cpu')
+ def test_expandvars_large(self):
+ expandvars = self.pathmodule.expandvars
+ with os_helper.EnvironmentVarGuard() as env:
+ env.clear()
+ env["A"] = "B"
+ n = 100_000
+ self.assertEqual(expandvars('$A'*n), 'B'*n)
+ self.assertEqual(expandvars('${A}'*n), 'B'*n)
+ self.assertEqual(expandvars('$A!'*n), 'B!'*n)
+ self.assertEqual(expandvars('${A}A'*n), 'BA'*n)
+ self.assertEqual(expandvars('${'*10*n), '${'*10*n)
+
def test_abspath(self):
self.assertIn("foo", self.pathmodule.abspath("foo"))
with warnings.catch_warnings():
diff --git a/Lib/test/test_ntpath.py b/Lib/test/test_ntpath.py
index 7d0c0a095bc50a..a55d8022ee3666 100644
--- a/Lib/test/test_ntpath.py
+++ b/Lib/test/test_ntpath.py
@@ -6,8 +6,8 @@
import unittest
import warnings
from ntpath import ALLOW_MISSING
-from test.support import os_helper
-from test.support import TestFailed, is_emscripten
+from test import support
+from test.support import os_helper, is_emscripten
from test.support.os_helper import FakePath
from test import test_genericpath
from tempfile import TemporaryFile
@@ -57,7 +57,7 @@ def tester(fn, wantResult):
fn = fn.replace("\\", "\\\\")
gotResult = eval(fn)
if wantResult != gotResult and _norm(wantResult) != _norm(gotResult):
- raise TestFailed("%s should return: %s but returned: %s" \
+ raise support.TestFailed("%s should return: %s but returned: %s" \
%(str(fn), str(wantResult), str(gotResult)))
# then with bytes
@@ -73,7 +73,7 @@ def tester(fn, wantResult):
warnings.simplefilter("ignore", DeprecationWarning)
gotResult = eval(fn)
if _norm(wantResult) != _norm(gotResult):
- raise TestFailed("%s should return: %s but returned: %s" \
+ raise support.TestFailed("%s should return: %s but returned: %s" \
%(str(fn), str(wantResult), repr(gotResult)))
@@ -820,6 +820,19 @@ def check(value, expected):
check('%spam%bar', '%sbar' % nonascii)
check('%{}%bar'.format(nonascii), 'ham%sbar' % nonascii)
+ @support.requires_resource('cpu')
+ def test_expandvars_large(self):
+ expandvars = ntpath.expandvars
+ with os_helper.EnvironmentVarGuard() as env:
+ env.clear()
+ env["A"] = "B"
+ n = 100_000
+ self.assertEqual(expandvars('%A%'*n), 'B'*n)
+ self.assertEqual(expandvars('%A%A'*n), 'BA'*n)
+ self.assertEqual(expandvars("''"*n + '%%'), "''"*n + '%')
+ self.assertEqual(expandvars("%%"*n), "%"*n)
+ self.assertEqual(expandvars("$$"*n), "$"*n)
+
def test_expanduser(self):
tester('ntpath.expanduser("test")', 'test')
@@ -1090,6 +1103,7 @@ def test_nt_helpers(self):
self.assertIsInstance(b_final_path, bytes)
self.assertGreater(len(b_final_path), 0)
+
class NtCommonTest(test_genericpath.CommonTest, unittest.TestCase):
pathmodule = ntpath
attributes = ['relpath']
diff --git a/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst b/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
new file mode 100644
index 00000000000000..1d152bb5318380
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
@@ -0,0 +1 @@
+Fix quadratic complexity in :func:`os.path.expandvars`.
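The new `posixpath.expandvars` replaces the search-and-splice loop (`path = path[:i] + value` followed by `path += tail` on every match) with a single `re.sub` pass and a replacement callback. The pattern also matches an unterminated `${...` so that it can be returned unchanged in one step. Below is a minimal str-only sketch of that approach. `expand_vars` and its explicit `env` mapping are hypothetical; the real function reads `os.environ` (or `os.environb` for bytes).

```python
import re

# Same shape as the new posixpath._varpattern: $name, ${name}, or an
# unterminated "${..." (matched so it can be left as is).
_VAR_PATTERN = re.compile(r'\$(\w+|\{[^}]*\}?)', re.ASCII)


def expand_vars(text, env):
    """Expand $name and ${name} from *env* in one pass (hypothetical helper).

    Unknown variables and malformed references are left unchanged.
    """
    def repl(m):
        name = m[1]
        if name.startswith('{'):
            if not name.endswith('}'):
                return m[0]          # unterminated "${...": keep verbatim
            name = name[1:-1]
        return env.get(name, m[0])   # unknown variable: keep verbatim

    # re.sub scans the input once and joins the pieces at the end, instead
    # of rebuilding the whole string after every substitution.
    return _VAR_PATTERN.sub(repl, text)


print(expand_vars('$HOME/${SUB}/$missing', {'HOME': '/home/me', 'SUB': 'src'}))
# /home/me/src/$missing
```

The `ntpath` variant in the same commit follows the same design with a larger alternation that also covers `'quoted'` spans, `%var%` and `%%`.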
[3.11] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140852)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/a18b38172ab1d9cd0a25b6977271e062ec…
commit: a18b38172ab1d9cd0a25b6977271e062ecc4f3b0
branch: 3.11
author: Serhiy Storchaka <storchaka(a)gmail.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T18:14:55+01:00
summary:
[3.11] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140852)
(cherry picked from commit a17c57eee5b5cc81390750d07e4800b19c0c3084)
(cherry picked from commit 0329bd11c7e98484727bbb9062d53a8fa53ac7fd)
Co-authored-by: Łukasz Langa <lukasz(a)langa.pl>
files:
A Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
M Doc/library/html.parser.rst
M Lib/html/parser.py
M Lib/test/test_htmlparser.py
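This change widens the set of elements whose content `HTMLParser` passes to `handle_data` unparsed (RAWTEXT mode). Below is a minimal sketch of how to observe that with a small event-recording subclass. `EventLogger` and `parse` are hypothetical names, loosely modelled on the test suite's `EventCollector`. The checks use only `script` and `style`, which are raw text on every supported release. The new `xmp`, `iframe`, `noembed`, `noframes` and `plaintext` handling, and the `scripting` keyword, apply only on releases that include this fix.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record parser callbacks as (event, value) tuples (hypothetical helper)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def parse(markup, **kwargs):
    parser = EventLogger(**kwargs)
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events


# Inside a RAWTEXT element, "comments" and "tags" are just data.
print(parse("<style><!-- x --><b>y</b></style>"))
# The same markup inside an ordinary element produces real events.
print(parse("<div><!-- x --><b>y</b></div>"))
```

On a release with this change, `parse("<noscript><b>x</b></noscript>", scripting=True)` should report `<b>x</b>` as data. On older releases the unknown `scripting` keyword raises `TypeError`.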
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index d35090111e0822..c6020925404667 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -15,14 +15,18 @@
This module defines a class :class:`HTMLParser` which serves as the basis for
parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
-.. class:: HTMLParser(*, convert_charrefs=True)
+.. class:: HTMLParser(*, convert_charrefs=True, scripting=False)
Create a parser instance able to parse invalid markup.
- If *convert_charrefs* is ``True`` (the default), all character
- references (except the ones in ``script``/``style`` elements) are
+ If *convert_charrefs* is true (the default), all character
+ references (except the ones in elements like ``script`` and ``style``) are
automatically converted to the corresponding Unicode characters.
+ If *scripting* is false (the default), the content of the ``noscript``
+ element is parsed normally; if it's true, it's returned as is without
+ being parsed.
+
An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
when start tags, end tags, text, comments, and other markup elements are
encountered. The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
.. versionchanged:: 3.5
The default value for argument *convert_charrefs* is now ``True``.
+ .. versionchanged:: 3.11.15
+ Added the *scripting* parameter.
+
Example HTML Parser Application
-------------------------------
@@ -159,15 +166,15 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
.. method:: HTMLParser.handle_data(data)
This method is called to process arbitrary data (e.g. text nodes and the
- content of ``<script>...</script>`` and ``<style>...</style>``).
+ content of elements like ``script`` and ``style``).
.. method:: HTMLParser.handle_entityref(name)
This method is called to process a named character reference of the form
``&name;`` (e.g. ``>``), where *name* is a general entity reference
- (e.g. ``'gt'``). This method is never called if *convert_charrefs* is
- ``True``.
+ (e.g. ``'gt'``).
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_charref(name)
@@ -175,8 +182,8 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
This method is called to process decimal and hexadecimal numeric character
references of the form :samp:`&#{NNN};` and :samp:`&#x{NNN};`. For example, the decimal
equivalent for ``>`` is ``>``, whereas the hexadecimal is ``>``;
- in this case the method will receive ``'62'`` or ``'x3E'``. This method
- is never called if *convert_charrefs* is ``True``.
+ in this case the method will receive ``'62'`` or ``'x3E'``.
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_comment(data)
@@ -284,8 +291,8 @@ Parsing an element with a few attributes and a title::
Data : Python
End tag : h1
-The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+The content of elements like ``script`` and ``style`` is returned as is,
+without further parsing::
>>> parser.feed('<style type="text/css">#python { color: green }</style>')
Start tag: style
@@ -294,10 +301,10 @@ further parsing::
End tag : style
>>> parser.feed('<script type="text/javascript">'
- ... 'alert("<strong>hello!</strong>");</script>')
+ ... 'alert("<strong>hello! ☺</strong>");</script>')
Start tag: script
attr: ('type', 'text/javascript')
- Data : alert("<strong>hello!</strong>");
+ Data : alert("<strong>hello! ☺</strong>");
End tag : script
Parsing comments::
@@ -317,7 +324,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+if *convert_charrefs* is false::
>>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
... parser.feed(chunk)
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index 8eae9dc55e568c..fb3c13b873f93f 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -109,16 +109,24 @@ class HTMLParser(_markupbase.ParserBase):
argument.
"""
- CDATA_CONTENT_ELEMENTS = ("script", "style")
+ # See the HTML5 specs section "13.4 Parsing HTML fragments".
+ # https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments
+ # CDATA_CONTENT_ELEMENTS are parsed in RAWTEXT mode
+ CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
RCDATA_CONTENT_ELEMENTS = ("textarea", "title")
- def __init__(self, *, convert_charrefs=True):
+ def __init__(self, *, convert_charrefs=True, scripting=False):
"""Initialize and reset this instance.
- If convert_charrefs is True (the default), all character references
+ If convert_charrefs is true (the default), all character references
are automatically converted to the corresponding Unicode characters.
+
+ If *scripting* is false (the default), the content of the
+ ``noscript`` element is parsed normally; if it's true,
+ it's returned as is without being parsed.
"""
self.convert_charrefs = convert_charrefs
+ self.scripting = scripting
self.reset()
def reset(self):
@@ -153,7 +161,9 @@ def get_starttag_text(self):
def set_cdata_mode(self, elem, *, escapable=False):
self.cdata_elem = elem.lower()
self._escapable = escapable
- if escapable and not self.convert_charrefs:
+ if self.cdata_elem == 'plaintext':
+ self.interesting = re.compile(r'\Z')
+ elif escapable and not self.convert_charrefs:
self.interesting = re.compile(r'&|</%s(?=[\t\n\r\f />])' % self.cdata_elem,
re.IGNORECASE|re.ASCII)
else:
@@ -434,8 +444,10 @@ def parse_starttag(self, i):
self.handle_startendtag(tag, attrs)
else:
self.handle_starttag(tag, attrs)
- if tag in self.CDATA_CONTENT_ELEMENTS:
- self.set_cdata_mode(tag)
+ if (tag in self.CDATA_CONTENT_ELEMENTS or
+ (self.scripting and tag == "noscript") or
+ tag == "plaintext"):
+ self.set_cdata_mode(tag, escapable=False)
elif tag in self.RCDATA_CONTENT_ELEMENTS:
self.set_cdata_mode(tag, escapable=True)
return endpos
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
index a7be7a6e20224a..1c1be3ff476886 100644
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -7,6 +7,18 @@
from test import support
+SAMPLE_RCDATA = (
+ '<!-- not a comment -->'
+ "<not a='start tag'>"
+ '<![CDATA[not a cdata]]>'
+ '<!not a bogus comment>'
+ '</not a bogus comment>'
+ '\u2603'
+)
+
+SAMPLE_RAWTEXT = SAMPLE_RCDATA + '&☺'
+
+
class EventCollector(html.parser.HTMLParser):
def __init__(self, *args, autocdata=False, **kw):
@@ -292,30 +304,20 @@ def test_get_starttag_text(self):
'Date().getTime()+\'"><\\/s\'+\'cript>\');\n//]]>'),
'\n<!-- //\nvar foo = 3.14;\n// -->\n',
'<!-- \u2603 -->',
- 'foo = "</ script>"',
- 'foo = "</scripture>"',
- 'foo = "</script\v>"',
- 'foo = "</script\xa0>"',
- 'foo = "</ſcript>"',
- 'foo = "</scrıpt>"',
])
def test_script_content(self, content):
s = f'<script>{content}</script>'
- self._run_check(s, [("starttag", "script", []),
- ("data", content),
- ("endtag", "script")])
+ self._run_check(s, [
+ ("starttag", "script", []),
+ ("data", content),
+ ("endtag", "script"),
+ ])
@support.subTests('content', [
'a::before { content: "<!-- not a comment -->"; }',
'a::before { content: "¬-an-entity-ref;"; }',
'a::before { content: "<not a=\'start tag\'>"; }',
'a::before { content: "\u2603"; }',
- 'a::before { content: "< /style>"; }',
- 'a::before { content: "</ style>"; }',
- 'a::before { content: "</styled>"; }',
- 'a::before { content: "</style\v>"; }',
- 'a::before { content: "</style\xa0>"; }',
- 'a::before { content: "</ſtyle>"; }',
])
def test_style_content(self, content):
s = f'<style>{content}</style>'
@@ -323,47 +325,59 @@ def test_style_content(self, content):
("data", content),
("endtag", "style")])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /title>',
- '</ title>',
- '</titled>',
- '</title\v>',
- '</title\xa0>',
- '</tıtle>',
+ @support.subTests('tag', ['title', 'textarea'])
+ def test_rcdata_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RCDATA}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RCDATA),
+ ("endtag", tag),
])
- def test_title_content(self, content):
- source = f"<title>{content}</title>"
+ source = f"<{tag}>&amp;</{tag}>"
self._run_check(source, [
- ("starttag", "title", []),
- ("data", content),
- ("endtag", "title"),
+ ("starttag", tag, []),
+ ('entityref', 'amp'),
+ ("endtag", tag),
])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /textarea>',
- '</ textarea>',
- '</textareable>',
- '</textarea\v>',
- '</textarea\xa0>',
+ @support.subTests('tag',
+ ['style', 'xmp', 'iframe', 'noembed', 'noframes', 'script'])
+ def test_rawtext_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RAWTEXT}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", tag),
+ ])
+
+ def test_noscript_content(self):
+ source = f"<noscript>{SAMPLE_RAWTEXT}</noscript>"
+ # scripting=False -- normal mode
+ self._run_check(source, [
+ ('starttag', 'noscript', []),
+ ('comment', ' not a comment '),
+ ('starttag', 'not', [('a', 'start tag')]),
+ ('unknown decl', 'CDATA[not a cdata'),
+ ('comment', 'not a bogus comment'),
+ ('endtag', 'not'),
+ ('data', '☃'),
+ ('entityref', 'amp'),
+ ('charref', '9786'),
+ ('endtag', 'noscript'),
])
- def test_textarea_content(self, content):
- source = f"<textarea>{content}</textarea>"
+ # scripting=True -- RAWTEXT mode
+ self._run_check(source, [
+ ("starttag", "noscript", []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", "noscript"),
+ ], collector=EventCollector(scripting=True))
+
+ def test_plaintext_content(self):
+ content = SAMPLE_RAWTEXT + '</plaintext>' # not closing
+ source = f"<plaintext>{content}"
self._run_check(source, [
- ("starttag", "textarea", []),
+ ("starttag", "plaintext", []),
("data", content),
- ("endtag", "textarea"),
])
@support.subTests('endtag', ['script', 'SCRIPT', 'script ', 'script\n',
@@ -380,52 +394,65 @@ def test_script_closing_tag(self, endtag):
("endtag", "script")],
collector=EventCollectorNoNormalize(convert_charrefs=False))
- @support.subTests('endtag', ['style', 'STYLE', 'style ', 'style\n',
- 'style/', 'style foo=bar', 'style foo=">"'])
- def test_style_closing_tag(self, endtag):
- content = """
- b::before { content: "<!-- not a comment -->"; }
- p::before { content: "&not-an-entity-ref;"; }
- a::before { content: "<i>"; }
- a::after { content: "</i>"; }
- """
- s = f'<StyLE>{content}</{endtag}>'
- self._run_check(s, [("starttag", "style", []),
- ("data", content),
- ("endtag", "style")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['title', 'TITLE', 'title ', 'title\n',
- 'title/', 'title foo=bar', 'title foo=">"'])
- def test_title_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TitLe>{content}</{endtag}>'
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['textarea', 'TEXTAREA', 'textarea ', 'textarea\n',
- 'textarea/', 'textarea foo=bar', 'textarea foo=">"'])
- def test_textarea_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TexTarEa>{content}</{endtag}>'
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_closing_tag(self, tag):
+ for endtag in [tag, tag.upper(), f'{tag} ', f'{tag}\n',
+ f'{tag}/', f'{tag} foo=bar', f'{tag} foo=">"']:
+ content = "<!-- not a comment --><i>Spam</i>"
+ s = f'<{tag.upper()}>{content}</{endtag}>'
+ self._run_check(s, [
+ ("starttag", tag, []),
+ ('data', content),
+ ("endtag", tag),
+ ], collector=EventCollectorNoNormalize(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_invalid_closing_tag(self, tag):
+ content = (
+ f'< /{tag}>'
+ f'</ {tag}>'
+ f'</{tag}x>'
+ f'</{tag}\v>'
+ f'</{tag}\xa0>'
+ )
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag,endtag', [
+ ('title', 'tıtle'),
+ ('style', 'ſtyle'),
+ ('style', 'ﬅyle'),
+ ('style', 'ﬆyle'),
+ ('iframe', 'ıframe'),
+ ('noframes', 'noframeſ'),
+ ('noscript', 'noſcript'),
+ ('noscript', 'noscrıpt'),
+ ('script', 'ſcript'),
+ ('script', 'scrıpt'),
+ ])
+ def test_invalid_nonascii_closing_tag(self, tag, endtag):
+ content = f"<br></{endtag}>"
+ source = f"<{tag}>{content}"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
@support.subTests('tail,end', [
('', False),
diff --git a/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
new file mode 100644
index 00000000000000..c30c9439a76a19
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
@@ -0,0 +1,3 @@
+Add support of the "plaintext" element, RAWTEXT elements "xmp", "iframe",
+"noembed" and "noframes", and optionally RAWTEXT element "noscript" in
+:class:`html.parser.HTMLParser`.
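The start-tag dispatch in this change decides how the content after an element's start tag is tokenized. A minimal standalone sketch of that decision, using the element tuples from the `Lib/html/parser.py` hunk of this change; `content_mode` and the mode strings are hypothetical names for illustration, not part of `html.parser`:

```python
# Element groups as listed in the patched Lib/html/parser.py.
CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
RCDATA_CONTENT_ELEMENTS = ("textarea", "title")


def content_mode(tag, scripting=False):
    """Return how the content after <tag> is tokenized (hypothetical helper).

    'plaintext': everything up to the end of input is data; no end tag closes it
    'rawtext':   data up to the matching end tag; no markup, no charrefs
    'rcdata':    data up to the matching end tag; charrefs still recognized
    'normal':    regular markup parsing
    """
    tag = tag.lower()
    if tag == "plaintext":
        return "plaintext"
    # noscript is RAWTEXT only when the parser pretends scripting is enabled.
    if tag in CDATA_CONTENT_ELEMENTS or (scripting and tag == "noscript"):
        return "rawtext"
    if tag in RCDATA_CONTENT_ELEMENTS:
        return "rcdata"
    return "normal"


for tag in ["xmp", "title", "noscript", "plaintext", "div"]:
    print(tag, content_mode(tag), content_mode(tag, scripting=True))
```

Only `noscript` changes mode with the flag, which is why the tests run `noscript` twice: once with the default collector and once with `scripting=True`.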
[3.9] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140857)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/941811fc9d91d6417abe8878e9bfe8e931…
commit: 941811fc9d91d6417abe8878e9bfe8e93143e106
branch: 3.9
author: Serhiy Storchaka <storchaka(a)gmail.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T18:02:38+01:00
summary:
[3.9] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140857)
(cherry picked from commit a17c57eee5b5cc81390750d07e4800b19c0c3084)
(cherry picked from commit 0329bd11c7e98484727bbb9062d53a8fa53ac7fd)
Co-authored-by: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
files:
A Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
M Doc/library/html.parser.rst
M Lib/html/parser.py
M Lib/test/test_htmlparser.py
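In `set_cdata_mode()` the `interesting` regex marks where the current run of data ends; for `plaintext` it becomes `\Z`, which matches only at the end of the buffer. A rough sketch of that effect, assuming the non-plaintext branch uses the same end-tag pattern as the escapable branch minus the `&` alternative; `data_end` is a hypothetical helper, not parser API:

```python
import re


def data_end(rawdata, i, cdata_elem):
    """Return the index where the data run starting at *i* ends (hypothetical)."""
    if cdata_elem == "plaintext":
        # \Z only matches at the very end of the buffer, so nothing,
        # not even "</plaintext>", terminates the data.
        interesting = re.compile(r"\Z")
    else:
        # Assumed simplification of the RAWTEXT end-tag pattern.
        interesting = re.compile(r"</%s(?=[\t\n\r\f />])" % cdata_elem,
                                 re.IGNORECASE | re.ASCII)
    m = interesting.search(rawdata, i)
    return m.start() if m else len(rawdata)


text = "<b>bold?</b></plaintext>tail"
print(data_end(text, 0, "plaintext") == len(text))
print(data_end("a</styled>b</style>", 0, "style"))
```

The lookahead is what rejects `</styled>` while accepting `</style>`, `</style >` and `</style/`.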
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 03aff25ce6117a..848fa774b27852 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -15,14 +15,18 @@
This module defines a class :class:`HTMLParser` which serves as the basis for
parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
-.. class:: HTMLParser(*, convert_charrefs=True)
+.. class:: HTMLParser(*, convert_charrefs=True, scripting=False)
Create a parser instance able to parse invalid markup.
- If *convert_charrefs* is ``True`` (the default), all character
- references (except the ones in ``script``/``style`` elements) are
+ If *convert_charrefs* is true (the default), all character
+ references (except the ones in elements like ``script`` and ``style``) are
automatically converted to the corresponding Unicode characters.
+ If *scripting* is false (the default), the content of the ``noscript``
+ element is parsed normally; if it's true, it's returned as is without
+ being parsed.
+
An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
when start tags, end tags, text, comments, and other markup elements are
encountered. The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
.. versionchanged:: 3.5
The default value for argument *convert_charrefs* is now ``True``.
+ .. versionchanged:: 3.9.25
+ Added the *scripting* parameter.
+
Example HTML Parser Application
-------------------------------
@@ -159,15 +166,15 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
.. method:: HTMLParser.handle_data(data)
This method is called to process arbitrary data (e.g. text nodes and the
- content of ``<script>...</script>`` and ``<style>...</style>``).
+ content of elements like ``script`` and ``style``).
.. method:: HTMLParser.handle_entityref(name)
This method is called to process a named character reference of the form
``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
- (e.g. ``'gt'``). This method is never called if *convert_charrefs* is
- ``True``.
+ (e.g. ``'gt'``).
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_charref(name)
@@ -175,8 +182,8 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
This method is called to process decimal and hexadecimal numeric character
references of the form ``&#NNN;`` and ``&#xNNN;``. For example, the decimal
equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
- in this case the method will receive ``'62'`` or ``'x3E'``. This method
- is never called if *convert_charrefs* is ``True``.
+ in this case the method will receive ``'62'`` or ``'x3E'``.
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_comment(data)
@@ -284,8 +291,8 @@ Parsing an element with a few attributes and a title::
Data : Python
End tag : h1
-The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+The content of elements like ``script`` and ``style`` is returned as is,
+without further parsing::
>>> parser.feed('<style type="text/css">#python { color: green }</style>')
Start tag: style
@@ -294,10 +301,10 @@ further parsing::
End tag : style
>>> parser.feed('<script type="text/javascript">'
- ... 'alert("<strong>hello!</strong>");</script>')
+ ... 'alert("<strong>hello! &#9786;</strong>");</script>')
Start tag: script
attr: ('type', 'text/javascript')
- Data : alert("<strong>hello!</strong>");
+ Data : alert("<strong>hello! &#9786;</strong>");
End tag : script
Parsing comments::
@@ -317,7 +324,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+if *convert_charrefs* is false::
>>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
... parser.feed(chunk)
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index 8724c22f8ff289..62134d376e1654 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -109,16 +109,24 @@ class HTMLParser(_markupbase.ParserBase):
argument.
"""
- CDATA_CONTENT_ELEMENTS = ("script", "style")
+ # See the HTML5 specs section "13.4 Parsing HTML fragments".
+ # https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments
+ # CDATA_CONTENT_ELEMENTS are parsed in RAWTEXT mode
+ CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
RCDATA_CONTENT_ELEMENTS = ("textarea", "title")
- def __init__(self, *, convert_charrefs=True):
+ def __init__(self, *, convert_charrefs=True, scripting=False):
"""Initialize and reset this instance.
- If convert_charrefs is True (the default), all character references
+ If convert_charrefs is true (the default), all character references
are automatically converted to the corresponding Unicode characters.
+
+ If *scripting* is false (the default), the content of the
+ ``noscript`` element is parsed normally; if it's true,
+ it's returned as is without being parsed.
"""
self.convert_charrefs = convert_charrefs
+ self.scripting = scripting
self.reset()
def reset(self):
@@ -153,7 +161,9 @@ def get_starttag_text(self):
def set_cdata_mode(self, elem, *, escapable=False):
self.cdata_elem = elem.lower()
self._escapable = escapable
- if escapable and not self.convert_charrefs:
+ if self.cdata_elem == 'plaintext':
+ self.interesting = re.compile(r'\Z')
+ elif escapable and not self.convert_charrefs:
self.interesting = re.compile(r'&|</%s(?=[\t\n\r\f />])' % self.cdata_elem,
re.IGNORECASE|re.ASCII)
else:
@@ -441,8 +451,10 @@ def parse_starttag(self, i):
self.handle_startendtag(tag, attrs)
else:
self.handle_starttag(tag, attrs)
- if tag in self.CDATA_CONTENT_ELEMENTS:
- self.set_cdata_mode(tag)
+ if (tag in self.CDATA_CONTENT_ELEMENTS or
+ (self.scripting and tag == "noscript") or
+ tag == "plaintext"):
+ self.set_cdata_mode(tag, escapable=False)
elif tag in self.RCDATA_CONTENT_ELEMENTS:
self.set_cdata_mode(tag, escapable=True)
return endpos
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
index 9d4533e3f354ef..ea5eb9fd7fcd81 100644
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -7,6 +7,18 @@
from test import support
+SAMPLE_RCDATA = (
+ '<!-- not a comment -->'
+ "<not a='start tag'>"
+ '<![CDATA[not a cdata]]>'
+ '<!not a bogus comment>'
+ '</not a bogus comment>'
+ '\u2603'
+)
+
+SAMPLE_RAWTEXT = SAMPLE_RCDATA + '&amp;&#9786;'
+
+
class EventCollector(html.parser.HTMLParser):
def __init__(self, *args, autocdata=False, **kw):
@@ -292,30 +304,20 @@ def test_get_starttag_text(self):
'Date().getTime()+\'"><\\/s\'+\'cript>\');\n//]]>'),
'\n<!-- //\nvar foo = 3.14;\n// -->\n',
'<!-- \u2603 -->',
- 'foo = "</ script>"',
- 'foo = "</scripture>"',
- 'foo = "</script\v>"',
- 'foo = "</script\xa0>"',
- 'foo = "</ſcript>"',
- 'foo = "</scrıpt>"',
])
def test_script_content(self, content):
s = f'<script>{content}</script>'
- self._run_check(s, [("starttag", "script", []),
- ("data", content),
- ("endtag", "script")])
+ self._run_check(s, [
+ ("starttag", "script", []),
+ ("data", content),
+ ("endtag", "script"),
+ ])
@support.subTests('content', [
'a::before { content: "<!-- not a comment -->"; }',
'a::before { content: "&not-an-entity-ref;"; }',
'a::before { content: "<not a=\'start tag\'>"; }',
'a::before { content: "\u2603"; }',
- 'a::before { content: "< /style>"; }',
- 'a::before { content: "</ style>"; }',
- 'a::before { content: "</styled>"; }',
- 'a::before { content: "</style\v>"; }',
- 'a::before { content: "</style\xa0>"; }',
- 'a::before { content: "</ſtyle>"; }',
])
def test_style_content(self, content):
s = f'<style>{content}</style>'
@@ -323,47 +325,59 @@ def test_style_content(self, content):
("data", content),
("endtag", "style")])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /title>',
- '</ title>',
- '</titled>',
- '</title\v>',
- '</title\xa0>',
- '</tıtle>',
+ @support.subTests('tag', ['title', 'textarea'])
+ def test_rcdata_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RCDATA}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RCDATA),
+ ("endtag", tag),
])
- def test_title_content(self, content):
- source = f"<title>{content}</title>"
+ source = f"<{tag}>&amp;</{tag}>"
self._run_check(source, [
- ("starttag", "title", []),
- ("data", content),
- ("endtag", "title"),
+ ("starttag", tag, []),
+ ('entityref', 'amp'),
+ ("endtag", tag),
])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /textarea>',
- '</ textarea>',
- '</textareable>',
- '</textarea\v>',
- '</textarea\xa0>',
+ @support.subTests('tag',
+ ['style', 'xmp', 'iframe', 'noembed', 'noframes', 'script'])
+ def test_rawtext_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RAWTEXT}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", tag),
+ ])
+
+ def test_noscript_content(self):
+ source = f"<noscript>{SAMPLE_RAWTEXT}</noscript>"
+ # scripting=False -- normal mode
+ self._run_check(source, [
+ ('starttag', 'noscript', []),
+ ('comment', ' not a comment '),
+ ('starttag', 'not', [('a', 'start tag')]),
+ ('unknown decl', 'CDATA[not a cdata'),
+ ('comment', 'not a bogus comment'),
+ ('endtag', 'not'),
+ ('data', '☃'),
+ ('entityref', 'amp'),
+ ('charref', '9786'),
+ ('endtag', 'noscript'),
])
- def test_textarea_content(self, content):
- source = f"<textarea>{content}</textarea>"
+ # scripting=True -- RAWTEXT mode
+ self._run_check(source, [
+ ("starttag", "noscript", []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", "noscript"),
+ ], collector=EventCollector(scripting=True))
+
+ def test_plaintext_content(self):
+ content = SAMPLE_RAWTEXT + '</plaintext>' # not closing
+ source = f"<plaintext>{content}"
self._run_check(source, [
- ("starttag", "textarea", []),
+ ("starttag", "plaintext", []),
("data", content),
- ("endtag", "textarea"),
])
@support.subTests('endtag', ['script', 'SCRIPT', 'script ', 'script\n',
@@ -380,52 +394,65 @@ def test_script_closing_tag(self, endtag):
("endtag", "script")],
collector=EventCollectorNoNormalize(convert_charrefs=False))
- @support.subTests('endtag', ['style', 'STYLE', 'style ', 'style\n',
- 'style/', 'style foo=bar', 'style foo=">"'])
- def test_style_closing_tag(self, endtag):
- content = """
- b::before { content: "<!-- not a comment -->"; }
- p::before { content: "&not-an-entity-ref;"; }
- a::before { content: "<i>"; }
- a::after { content: "</i>"; }
- """
- s = f'<StyLE>{content}</{endtag}>'
- self._run_check(s, [("starttag", "style", []),
- ("data", content),
- ("endtag", "style")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['title', 'TITLE', 'title ', 'title\n',
- 'title/', 'title foo=bar', 'title foo=">"'])
- def test_title_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TitLe>{content}</{endtag}>'
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['textarea', 'TEXTAREA', 'textarea ', 'textarea\n',
- 'textarea/', 'textarea foo=bar', 'textarea foo=">"'])
- def test_textarea_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TexTarEa>{content}</{endtag}>'
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_closing_tag(self, tag):
+ for endtag in [tag, tag.upper(), f'{tag} ', f'{tag}\n',
+ f'{tag}/', f'{tag} foo=bar', f'{tag} foo=">"']:
+ content = "<!-- not a comment --><i>Spam</i>"
+ s = f'<{tag.upper()}>{content}</{endtag}>'
+ self._run_check(s, [
+ ("starttag", tag, []),
+ ('data', content),
+ ("endtag", tag),
+ ], collector=EventCollectorNoNormalize(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_invalid_closing_tag(self, tag):
+ content = (
+ f'< /{tag}>'
+ f'</ {tag}>'
+ f'</{tag}x>'
+ f'</{tag}\v>'
+ f'</{tag}\xa0>'
+ )
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag,endtag', [
+ ('title', 'tıtle'),
+ ('style', 'ſtyle'),
+ ('style', 'ﬅyle'),
+ ('style', 'ﬆyle'),
+ ('iframe', 'ıframe'),
+ ('noframes', 'noframeſ'),
+ ('noscript', 'noſcript'),
+ ('noscript', 'noscrıpt'),
+ ('script', 'ſcript'),
+ ('script', 'scrıpt'),
+ ])
+ def test_invalid_nonascii_closing_tag(self, tag, endtag):
+ content = f"<br></{endtag}>"
+ source = f"<{tag}>{content}"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
@support.subTests('tail,end', [
('', False),
diff --git a/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
new file mode 100644
index 00000000000000..c30c9439a76a19
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
@@ -0,0 +1,3 @@
+Add support of the "plaintext" element, RAWTEXT elements "xmp", "iframe",
+"noembed" and "noframes", and optionally RAWTEXT element "noscript" in
+:class:`html.parser.HTMLParser`.
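The `re.ASCII` flag on the end-tag regex is what makes `test_invalid_nonascii_closing_tag` pass: in Unicode mode, Python's `re` treats `ſ` (U+017F) as a case variant of `s` and `ı` (U+0131) as a case variant of `i`, so `</ſcript>` would close a `script` element. A small comparison, assuming the same end-tag pattern shape as `set_cdata_mode()`:

```python
import re

# End-tag pattern in the style of set_cdata_mode(), compiled two ways.
END_SCRIPT = r"</script(?=[\t\n\r\f />])"
unicode_ci = re.compile(END_SCRIPT, re.IGNORECASE)           # Unicode case-insensitive
ascii_ci = re.compile(END_SCRIPT, re.IGNORECASE | re.ASCII)  # only A-Z/a-z fold

for candidate in ["</SCRIPT>", "</\u017fcript>", "</scr\u0131pt>", "</script\xa0>"]:
    print(ascii(candidate),
          unicode_ci.match(candidate) is not None,
          ascii_ci.match(candidate) is not None)
```

With `re.ASCII` only the plain `</SCRIPT>` spelling closes the element; the non-breaking space (`\xa0`) is rejected by the lookahead in both modes.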
[3.13] gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) (GH-140845)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/9ab89c026aa9611c4b0b67c288b8303a48…
commit: 9ab89c026aa9611c4b0b67c288b8303a480fe742
branch: 3.13
author: Łukasz Langa <lukasz(a)langa.pl>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T17:58:09+01:00
summary:
[3.13] gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) (GH-140845)
(cherry picked from commit f029e8db626ddc6e3a3beea4eff511a71aaceb5c)
Co-authored-by: Serhiy Storchaka <storchaka(a)gmail.com>
Co-authored-by: Łukasz Langa <lukasz(a)langa.pl>
files:
A Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
M Lib/ntpath.py
M Lib/posixpath.py
M Lib/test/test_genericpath.py
M Lib/test/test_ntpath.py
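The new `ntpath.expandvars()` replaces the character-by-character loop with a single `re.sub()` pass, reading which alternative matched from `Match.lastindex`: `None` for a quoted run, 1 for the `%...%` forms, 2 for the `$...` forms. A simplified sketch of that idea, assuming str input only and a plain dict in place of `os.environ` (so lookups are case-sensitive, unlike on Windows); `nt_expandvars_sketch` is a hypothetical name:

```python
import re

# Pattern from the new Lib/ntpath.py.
_VARPATTERN = r"'[^']*'?|%(%|[^%]*%?)|\$(\$|[-\w]+|\{[^}]*\}?)"
_varsub = re.compile(_VARPATTERN, re.ASCII).sub


def nt_expandvars_sketch(path, environ):
    def repl(m):
        lastindex = m.lastindex
        if lastindex is None:           # '...' quoted run: copied unexpanded
            return m[0]
        name = m[lastindex]
        if lastindex == 1:              # %name% or %%
            if name == '%':             # "%%" -> "%"
                return name
            if not name.endswith('%'):  # unterminated "%name"
                return m[0]
            name = name[:-1]
        else:                           # $name, ${name} or $$
            if name == '$':             # "$$" -> "$"
                return name
            if name.startswith('{'):
                if not name.endswith('}'):
                    return m[0]         # unterminated "${name"
                name = name[1:-1]
        return environ.get(name, m[0])  # unknown variables stay unchanged

    return _varsub(repl, path)


print(nt_expandvars_sketch("'%A%' %A% ${A}$A%X% %%", {"A": "B"}))
```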
diff --git a/Lib/ntpath.py b/Lib/ntpath.py
index 9cdc16480f9afe..01f060e70beed9 100644
--- a/Lib/ntpath.py
+++ b/Lib/ntpath.py
@@ -400,17 +400,23 @@ def expanduser(path):
# XXX With COMMAND.COM you can use any characters in a variable name,
# XXX except '^|<>='.
+_varpattern = r"'[^']*'?|%(%|[^%]*%?)|\$(\$|[-\w]+|\{[^}]*\}?)"
+_varsub = None
+_varsubb = None
+
def expandvars(path):
"""Expand shell variables of the forms $var, ${var} and %var%.
Unknown variables are left unchanged."""
path = os.fspath(path)
+ global _varsub, _varsubb
if isinstance(path, bytes):
if b'$' not in path and b'%' not in path:
return path
- import string
- varchars = bytes(string.ascii_letters + string.digits + '_-', 'ascii')
- quote = b'\''
+ if not _varsubb:
+ import re
+ _varsubb = re.compile(_varpattern.encode(), re.ASCII).sub
+ sub = _varsubb
percent = b'%'
brace = b'{'
rbrace = b'}'
@@ -419,94 +425,44 @@ def expandvars(path):
else:
if '$' not in path and '%' not in path:
return path
- import string
- varchars = string.ascii_letters + string.digits + '_-'
- quote = '\''
+ if not _varsub:
+ import re
+ _varsub = re.compile(_varpattern, re.ASCII).sub
+ sub = _varsub
percent = '%'
brace = '{'
rbrace = '}'
dollar = '$'
environ = os.environ
- res = path[:0]
- index = 0
- pathlen = len(path)
- while index < pathlen:
- c = path[index:index+1]
- if c == quote: # no expansion within single quotes
- path = path[index + 1:]
- pathlen = len(path)
- try:
- index = path.index(c)
- res += c + path[:index + 1]
- except ValueError:
- res += c + path
- index = pathlen - 1
- elif c == percent: # variable or '%'
- if path[index + 1:index + 2] == percent:
- res += c
- index += 1
- else:
- path = path[index+1:]
- pathlen = len(path)
- try:
- index = path.index(percent)
- except ValueError:
- res += percent + path
- index = pathlen - 1
- else:
- var = path[:index]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = percent + var + percent
- res += value
- elif c == dollar: # variable or '$$'
- if path[index + 1:index + 2] == dollar:
- res += c
- index += 1
- elif path[index + 1:index + 2] == brace:
- path = path[index+2:]
- pathlen = len(path)
- try:
- index = path.index(rbrace)
- except ValueError:
- res += dollar + brace + path
- index = pathlen - 1
- else:
- var = path[:index]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = dollar + brace + var + rbrace
- res += value
- else:
- var = path[:0]
- index += 1
- c = path[index:index + 1]
- while c and c in varchars:
- var += c
- index += 1
- c = path[index:index + 1]
- try:
- if environ is None:
- value = os.fsencode(os.environ[os.fsdecode(var)])
- else:
- value = environ[var]
- except KeyError:
- value = dollar + var
- res += value
- if c:
- index -= 1
+
+ def repl(m):
+ lastindex = m.lastindex
+ if lastindex is None:
+ return m[0]
+ name = m[lastindex]
+ if lastindex == 1:
+ if name == percent:
+ return name
+ if not name.endswith(percent):
+ return m[0]
+ name = name[:-1]
else:
- res += c
- index += 1
- return res
+ if name == dollar:
+ return name
+ if name.startswith(brace):
+ if not name.endswith(rbrace):
+ return m[0]
+ name = name[1:-1]
+
+ try:
+ if environ is None:
+ return os.fsencode(os.environ[os.fsdecode(name)])
+ else:
+ return environ[name]
+ except KeyError:
+ return m[0]
+
+ return sub(repl, path)
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A\B.
diff --git a/Lib/posixpath.py b/Lib/posixpath.py
index 80561ae7e52faf..4b3b7880a47a1e 100644
--- a/Lib/posixpath.py
+++ b/Lib/posixpath.py
@@ -284,42 +284,41 @@ def expanduser(path):
# This expands the forms $variable and ${variable} only.
# Non-existent variables are left unchanged.
-_varprog = None
-_varprogb = None
+_varpattern = r'\$(\w+|\{[^}]*\}?)'
+_varsub = None
+_varsubb = None
def expandvars(path):
"""Expand shell variables of form $var and ${var}. Unknown variables
are left unchanged."""
path = os.fspath(path)
- global _varprog, _varprogb
+ global _varsub, _varsubb
if isinstance(path, bytes):
if b'$' not in path:
return path
- if not _varprogb:
+ if not _varsubb:
import re
- _varprogb = re.compile(br'\$(\w+|\{[^}]*\})', re.ASCII)
- search = _varprogb.search
+ _varsubb = re.compile(_varpattern.encode(), re.ASCII).sub
+ sub = _varsubb
start = b'{'
end = b'}'
environ = getattr(os, 'environb', None)
else:
if '$' not in path:
return path
- if not _varprog:
+ if not _varsub:
import re
- _varprog = re.compile(r'\$(\w+|\{[^}]*\})', re.ASCII)
- search = _varprog.search
+ _varsub = re.compile(_varpattern, re.ASCII).sub
+ sub = _varsub
start = '{'
end = '}'
environ = os.environ
- i = 0
- while True:
- m = search(path, i)
- if not m:
- break
- i, j = m.span(0)
- name = m.group(1)
- if name.startswith(start) and name.endswith(end):
+
+ def repl(m):
+ name = m[1]
+ if name.startswith(start):
+ if not name.endswith(end):
+ return m[0]
name = name[1:-1]
try:
if environ is None:
@@ -327,13 +326,11 @@ def expandvars(path):
else:
value = environ[name]
except KeyError:
- i = j
+ return m[0]
else:
- tail = path[j:]
- path = path[:i] + value
- i = len(path)
- path += tail
- return path
+ return value
+
+ return sub(repl, path)
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A/B.
diff --git a/Lib/test/test_genericpath.py b/Lib/test/test_genericpath.py
index 6d2593cb4cf228..480dd4a87793b9 100644
--- a/Lib/test/test_genericpath.py
+++ b/Lib/test/test_genericpath.py
@@ -7,9 +7,9 @@
import sys
import unittest
import warnings
-from test.support import (
- is_apple, is_emscripten, os_helper, warnings_helper
-)
+from test import support
+from test.support import os_helper, is_emscripten
+from test.support import warnings_helper
from test.support.script_helper import assert_python_ok
from test.support.os_helper import FakePath
@@ -446,6 +446,19 @@ def check(value, expected):
os.fsencode('$bar%s bar' % nonascii))
check(b'$spam}bar', os.fsencode('%s}bar' % nonascii))
+ @support.requires_resource('cpu')
+ def test_expandvars_large(self):
+ expandvars = self.pathmodule.expandvars
+ with os_helper.EnvironmentVarGuard() as env:
+ env.clear()
+ env["A"] = "B"
+ n = 100_000
+ self.assertEqual(expandvars('$A'*n), 'B'*n)
+ self.assertEqual(expandvars('${A}'*n), 'B'*n)
+ self.assertEqual(expandvars('$A!'*n), 'B!'*n)
+ self.assertEqual(expandvars('${A}A'*n), 'BA'*n)
+ self.assertEqual(expandvars('${'*10*n), '${'*10*n)
+
def test_abspath(self):
self.assertIn("foo", self.pathmodule.abspath("foo"))
with warnings.catch_warnings():
@@ -503,7 +516,7 @@ def test_nonascii_abspath(self):
# directory (when the bytes name is used).
and sys.platform not in {
"win32", "emscripten", "wasi"
- } and not is_apple
+ } and not support.is_apple
):
name = os_helper.TESTFN_UNDECODABLE
elif os_helper.TESTFN_NONASCII:
diff --git a/Lib/test/test_ntpath.py b/Lib/test/test_ntpath.py
index e1982dfd0bdfd9..2b075871ddf78b 100644
--- a/Lib/test/test_ntpath.py
+++ b/Lib/test/test_ntpath.py
@@ -8,8 +8,7 @@
import warnings
from ntpath import ALLOW_MISSING
from test import support
-from test.support import cpython_only, os_helper
-from test.support import TestFailed, is_emscripten
+from test.support import os_helper, is_emscripten
from test.support.os_helper import FakePath
from test import test_genericpath
from tempfile import TemporaryFile
@@ -59,7 +58,7 @@ def tester(fn, wantResult):
fn = fn.replace("\\", "\\\\")
gotResult = eval(fn)
if wantResult != gotResult and _norm(wantResult) != _norm(gotResult):
- raise TestFailed("%s should return: %s but returned: %s" \
+ raise support.TestFailed("%s should return: %s but returned: %s" \
%(str(fn), str(wantResult), str(gotResult)))
# then with bytes
@@ -75,7 +74,7 @@ def tester(fn, wantResult):
warnings.simplefilter("ignore", DeprecationWarning)
gotResult = eval(fn)
if _norm(wantResult) != _norm(gotResult):
- raise TestFailed("%s should return: %s but returned: %s" \
+ raise support.TestFailed("%s should return: %s but returned: %s" \
%(str(fn), str(wantResult), repr(gotResult)))
@@ -1022,6 +1021,19 @@ def check(value, expected):
check('%spam%bar', '%sbar' % nonascii)
check('%{}%bar'.format(nonascii), 'ham%sbar' % nonascii)
+ @support.requires_resource('cpu')
+ def test_expandvars_large(self):
+ expandvars = ntpath.expandvars
+ with os_helper.EnvironmentVarGuard() as env:
+ env.clear()
+ env["A"] = "B"
+ n = 100_000
+ self.assertEqual(expandvars('%A%'*n), 'B'*n)
+ self.assertEqual(expandvars('%A%A'*n), 'BA'*n)
+ self.assertEqual(expandvars("''"*n + '%%'), "''"*n + '%')
+ self.assertEqual(expandvars("%%"*n), "%"*n)
+ self.assertEqual(expandvars("$$"*n), "$"*n)
+
def test_expanduser(self):
tester('ntpath.expanduser("test")', 'test')
@@ -1440,7 +1452,7 @@ def test_con_device(self):
self.assertTrue(os.path.exists(r"\\.\CON"))
@unittest.skipIf(sys.platform != 'win32', "Fast paths are only for win32")
- @cpython_only
+ @support.cpython_only
def test_fast_paths_in_use(self):
# There are fast paths of these functions implemented in posixmodule.c.
# Confirm that they are being used, and not the Python fallbacks in
diff --git a/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst b/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
new file mode 100644
index 00000000000000..1d152bb5318380
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-05-30-22-33-27.gh-issue-136065.bu337o.rst
@@ -0,0 +1 @@
+Fix quadratic complexity in :func:`os.path.expandvars`.
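The new `test_expandvars_large` runs these inputs at `n = 100_000`, which is where the old quadratic handling showed up. Below is a small-scale sketch of the same cases. It assumes a scratch environment variable `A` that the sketch sets and then restores itself, and `expand_cases` is a hypothetical helper name. The expected results are taken from the test's assertions.

```python
import ntpath
import os


def expand_cases(n: int) -> dict:
    """Run the cases from test_expandvars_large at a small size n.

    Temporarily sets the environment variable A=B and restores the
    previous state afterwards (hypothetical helper, not part of CPython).
    """
    saved = os.environ.get("A")
    os.environ["A"] = "B"
    try:
        return {
            "var": ntpath.expandvars('%A%' * n),           # each %A% -> B
            "var_text": ntpath.expandvars('%A%A' * n),     # %A% -> B, then a literal A
            "quotes": ntpath.expandvars("''" * n + '%%'),  # quoted runs kept, %% -> %
            "percent": ntpath.expandvars('%%' * n),        # %% -> %
            "dollar": ntpath.expandvars('$$' * n),         # $$ -> $
        }
    finally:
        if saved is None:
            del os.environ["A"]
        else:
            os.environ["A"] = saved


print(expand_cases(3))
```

Increasing `n` and timing the calls is one informal way to see whether the work grows linearly on a given interpreter.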
[3.11] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140856)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/20fe1821d7957c112b666ceadbdb1ad67c…
commit: 20fe1821d7957c112b666ceadbdb1ad67c8104c2
branch: 3.11
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T17:57:52+01:00
summary:
[3.11] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140856)
(cherry picked from commit fe694a6db620062f467469bd2bb987315d72fd62)
Co-authored-by: Alex Waygood <Alex.Waygood(a)Gmail.com>
files:
M Tools/clinic/clinic.py
diff --git a/Tools/clinic/clinic.py b/Tools/clinic/clinic.py
index 337ab88f7e7e93..d644b716aec790 100755
--- a/Tools/clinic/clinic.py
+++ b/Tools/clinic/clinic.py
@@ -4664,10 +4664,8 @@ def bad_node(self, node):
c_default = "NULL"
elif (isinstance(expr, ast.BinOp) or
(isinstance(expr, ast.UnaryOp) and
- not (isinstance(expr.operand, ast.Num) or
- (hasattr(ast, 'Constant') and
- isinstance(expr.operand, ast.Constant) and
- type(expr.operand.value) in (int, float, complex)))
+ not (isinstance(expr.operand, ast.Constant) and
+ type(expr.operand.value) in {int, float, complex})
)):
c_default = kwargs.get("c_default")
if not (isinstance(c_default, str) and c_default):
@@ -4769,14 +4767,10 @@ def bad_node(self, node):
self.function.parameters[key] = p
def parse_converter(self, annotation):
- if (hasattr(ast, 'Constant') and
- isinstance(annotation, ast.Constant) and
+ if (isinstance(annotation, ast.Constant) and
type(annotation.value) is str):
return annotation.value, True, {}
- if isinstance(annotation, ast.Str):
- return annotation.s, True, {}
-
if isinstance(annotation, ast.Name):
return annotation.id, False, {}
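After this change, `clinic.py` treats every literal as an `ast.Constant` and reads the literal's kind from `.value` instead of relying on `ast.Num` or `ast.Str`. The sketch below restates the unary-operator check as a standalone predicate. `is_signed_number` is a hypothetical name. `clinic.py` itself uses the negation of this condition to decide when a default needs an explicit `c_default`.

```python
import ast


def is_signed_number(source: str) -> bool:
    """Return True if `source` is a unary operator applied to a numeric literal.

    Mirrors the patched condition: the operand must be an ast.Constant whose
    value is exactly int, float or complex. bool is excluded because
    type(True) is bool, not int.
    """
    expr = ast.parse(source, mode="eval").body
    return (isinstance(expr, ast.UnaryOp)
            and isinstance(expr.operand, ast.Constant)
            and type(expr.operand.value) in {int, float, complex})


for src in ["-1", "-1.5", "-2j", "-x", "-'a'", "not True"]:
    print(f"{src!r}: {is_signed_number(src)}")
```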
[3.12] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140850)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/a183a11db8bc2520c52814635de2df118d…
commit: a183a11db8bc2520c52814635de2df118d2d7e8c
branch: 3.12
author: Serhiy Storchaka <storchaka(a)gmail.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T17:57:28+01:00
summary:
[3.12] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140850)
(cherry picked from commit a17c57eee5b5cc81390750d07e4800b19c0c3084)
(cherry picked from commit 0329bd11c7e98484727bbb9062d53a8fa53ac7fd)
Co-authored-by: Serhiy Storchaka <storchaka(a)gmail.com>
files:
A Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
M Doc/library/html.parser.rst
M Lib/html/parser.py
M Lib/test/test_htmlparser.py
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 6d433b5a04fc4a..606d93639c4eb2 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -15,14 +15,18 @@
This module defines a class :class:`HTMLParser` which serves as the basis for
parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
-.. class:: HTMLParser(*, convert_charrefs=True)
+.. class:: HTMLParser(*, convert_charrefs=True, scripting=False)
Create a parser instance able to parse invalid markup.
- If *convert_charrefs* is ``True`` (the default), all character
- references (except the ones in ``script``/``style`` elements) are
+ If *convert_charrefs* is true (the default), all character
+ references (except the ones in elements like ``script`` and ``style``) are
automatically converted to the corresponding Unicode characters.
+ If *scripting* is false (the default), the content of the ``noscript``
+ element is parsed normally; if it's true, it's returned as is without
+ being parsed.
+
An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
when start tags, end tags, text, comments, and other markup elements are
encountered. The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
.. versionchanged:: 3.5
The default value for argument *convert_charrefs* is now ``True``.
+ .. versionchanged:: 3.12.13
+ Added the *scripting* parameter.
+
Example HTML Parser Application
-------------------------------
@@ -159,15 +166,15 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
.. method:: HTMLParser.handle_data(data)
This method is called to process arbitrary data (e.g. text nodes and the
- content of ``<script>...</script>`` and ``<style>...</style>``).
+ content of elements like ``script`` and ``style``).
.. method:: HTMLParser.handle_entityref(name)
This method is called to process a named character reference of the form
``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
- (e.g. ``'gt'``). This method is never called if *convert_charrefs* is
- ``True``.
+ (e.g. ``'gt'``).
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_charref(name)
@@ -175,8 +182,8 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
This method is called to process decimal and hexadecimal numeric character
references of the form :samp:`&#{NNN};` and :samp:`&#x{NNN};`. For example, the decimal
equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
- in this case the method will receive ``'62'`` or ``'x3E'``. This method
- is never called if *convert_charrefs* is ``True``.
+ in this case the method will receive ``'62'`` or ``'x3E'``.
+ This method is only called if *convert_charrefs* is false.
.. method:: HTMLParser.handle_comment(data)
@@ -284,8 +291,8 @@ Parsing an element with a few attributes and a title::
Data : Python
End tag : h1
-The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+The content of elements like ``script`` and ``style`` is returned as is,
+without further parsing::
>>> parser.feed('<style type="text/css">#python { color: green }</style>')
Start tag: style
@@ -294,10 +301,10 @@ further parsing::
End tag : style
>>> parser.feed('<script type="text/javascript">'
- ... 'alert("<strong>hello!</strong>");</script>')
+ ... 'alert("<strong>hello! &#9786;</strong>");</script>')
Start tag: script
attr: ('type', 'text/javascript')
- Data : alert("<strong>hello!</strong>");
+ Data : alert("<strong>hello! &#9786;</strong>");
End tag : script
Parsing comments::
@@ -317,7 +324,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+if *convert_charrefs* is false::
>>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
... parser.feed(chunk)
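According to the reworded docs, `handle_entityref` and `handle_charref` are called only when *convert_charrefs* is false. The sketch below records the callbacks under both settings. The `EventLog` subclass and the `parse` helper are hypothetical. The sketch uses only the long-standing `convert_charrefs` argument, so it does not depend on the new *scripting* parameter.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical subclass that records handler calls as tuples."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def parse(markup, **kwargs):
    parser = EventLog(**kwargs)
    parser.feed(markup)
    parser.close()
    return parser.events


markup = "<p>&gt; &#62;</p>"
print(parse(markup))                          # references converted into data
print(parse(markup, convert_charrefs=False))  # references reported separately
```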
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index 9b7556592ba473..bfab3e64cd5402 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -109,17 +109,25 @@ class HTMLParser(_markupbase.ParserBase):
argument.
"""
- CDATA_CONTENT_ELEMENTS = ("script", "style")
+ # See the HTML5 specs section "13.4 Parsing HTML fragments".
+ # https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments
+ # CDATA_CONTENT_ELEMENTS are parsed in RAWTEXT mode
+ CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
RCDATA_CONTENT_ELEMENTS = ("textarea", "title")
- def __init__(self, *, convert_charrefs=True):
+ def __init__(self, *, convert_charrefs=True, scripting=False):
"""Initialize and reset this instance.
- If convert_charrefs is True (the default), all character references
+ If convert_charrefs is true (the default), all character references
are automatically converted to the corresponding Unicode characters.
+
+ If *scripting* is false (the default), the content of the
+ ``noscript`` element is parsed normally; if it's true,
+ it's returned as is without being parsed.
"""
super().__init__()
self.convert_charrefs = convert_charrefs
+ self.scripting = scripting
self.reset()
def reset(self):
@@ -154,7 +162,9 @@ def get_starttag_text(self):
def set_cdata_mode(self, elem, *, escapable=False):
self.cdata_elem = elem.lower()
self._escapable = escapable
- if escapable and not self.convert_charrefs:
+ if self.cdata_elem == 'plaintext':
+ self.interesting = re.compile(r'\Z')
+ elif escapable and not self.convert_charrefs:
self.interesting = re.compile(r'&|</%s(?=[\t\n\r\f />])' % self.cdata_elem,
re.IGNORECASE|re.ASCII)
else:
@@ -435,8 +445,10 @@ def parse_starttag(self, i):
self.handle_startendtag(tag, attrs)
else:
self.handle_starttag(tag, attrs)
- if tag in self.CDATA_CONTENT_ELEMENTS:
- self.set_cdata_mode(tag)
+ if (tag in self.CDATA_CONTENT_ELEMENTS or
+ (self.scripting and tag == "noscript") or
+ tag == "plaintext"):
+ self.set_cdata_mode(tag, escapable=False)
elif tag in self.RCDATA_CONTENT_ELEMENTS:
self.set_cdata_mode(tag, escapable=True)
return endpos
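The patched `parse_starttag` now chooses how to tokenize an element's content based on the tag name and the parser's *scripting* flag. That branch can be restated as a pure function. `content_model` and its return labels are hypothetical names used for illustration. The element tuples are copied from the diff.

```python
# Element names copied from the patched Lib/html/parser.py.
CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
RCDATA_CONTENT_ELEMENTS = ("textarea", "title")


def content_model(tag: str, scripting: bool = False) -> str:
    """Classify how content after a start tag is tokenized (hypothetical helper).

    - "plaintext": returned as is until end of input; never closed
    - "rawtext":   returned as is until the matching end tag
    - "rcdata":    markup is not recognized, character references still are
    - "normal":    ordinary markup parsing
    """
    tag = tag.lower()
    if tag == "plaintext":
        return "plaintext"
    if tag in CDATA_CONTENT_ELEMENTS or (scripting and tag == "noscript"):
        return "rawtext"
    if tag in RCDATA_CONTENT_ELEMENTS:
        return "rcdata"
    return "normal"


for tag in ["script", "noscript", "plaintext", "title", "div"]:
    print(tag, content_model(tag), content_model(tag, scripting=True))
```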
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
index 29f48098ae32ba..303c0baa87b026 100644
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -8,6 +8,18 @@
from test import support
+SAMPLE_RCDATA = (
+ '<!-- not a comment -->'
+ "<not a='start tag'>"
+ '<![CDATA[not a cdata]]>'
+ '<!not a bogus comment>'
+ '</not a bogus comment>'
+ '\u2603'
+)
+
+SAMPLE_RAWTEXT = SAMPLE_RCDATA + '&amp;&#9786;'
+
+
class EventCollector(html.parser.HTMLParser):
def __init__(self, *args, autocdata=False, **kw):
@@ -293,30 +305,20 @@ def test_get_starttag_text(self):
'Date().getTime()+\'"><\\/s\'+\'cript>\');\n//]]>'),
'\n<!-- //\nvar foo = 3.14;\n// -->\n',
'<!-- \u2603 -->',
- 'foo = "</ script>"',
- 'foo = "</scripture>"',
- 'foo = "</script\v>"',
- 'foo = "</script\xa0>"',
- 'foo = "</ſcript>"',
- 'foo = "</scrıpt>"',
])
def test_script_content(self, content):
s = f'<script>{content}</script>'
- self._run_check(s, [("starttag", "script", []),
- ("data", content),
- ("endtag", "script")])
+ self._run_check(s, [
+ ("starttag", "script", []),
+ ("data", content),
+ ("endtag", "script"),
+ ])
@support.subTests('content', [
'a::before { content: "<!-- not a comment -->"; }',
'a::before { content: "&not-an-entity-ref;"; }',
'a::before { content: "<not a=\'start tag\'>"; }',
'a::before { content: "\u2603"; }',
- 'a::before { content: "< /style>"; }',
- 'a::before { content: "</ style>"; }',
- 'a::before { content: "</styled>"; }',
- 'a::before { content: "</style\v>"; }',
- 'a::before { content: "</style\xa0>"; }',
- 'a::before { content: "</ſtyle>"; }',
])
def test_style_content(self, content):
s = f'<style>{content}</style>'
@@ -324,47 +326,59 @@ def test_style_content(self, content):
("data", content),
("endtag", "style")])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /title>',
- '</ title>',
- '</titled>',
- '</title\v>',
- '</title\xa0>',
- '</tıtle>',
+ @support.subTests('tag', ['title', 'textarea'])
+ def test_rcdata_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RCDATA}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RCDATA),
+ ("endtag", tag),
])
- def test_title_content(self, content):
- source = f"<title>{content}</title>"
+ source = f"<{tag}>&amp;</{tag}>"
self._run_check(source, [
- ("starttag", "title", []),
- ("data", content),
- ("endtag", "title"),
+ ("starttag", tag, []),
+ ('entityref', 'amp'),
+ ("endtag", tag),
])
- @support.subTests('content', [
- '<!-- not a comment -->',
- "<not a='start tag'>",
- '<![CDATA[not a cdata]]>',
- '<!not a bogus comment>',
- '</not a bogus comment>',
- '\u2603',
- '< /textarea>',
- '</ textarea>',
- '</textareable>',
- '</textarea\v>',
- '</textarea\xa0>',
+ @support.subTests('tag',
+ ['style', 'xmp', 'iframe', 'noembed', 'noframes', 'script'])
+ def test_rawtext_content(self, tag):
+ source = f"<{tag}>{SAMPLE_RAWTEXT}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", tag),
+ ])
+
+ def test_noscript_content(self):
+ source = f"<noscript>{SAMPLE_RAWTEXT}</noscript>"
+ # scripting=False -- normal mode
+ self._run_check(source, [
+ ('starttag', 'noscript', []),
+ ('comment', ' not a comment '),
+ ('starttag', 'not', [('a', 'start tag')]),
+ ('unknown decl', 'CDATA[not a cdata'),
+ ('comment', 'not a bogus comment'),
+ ('endtag', 'not'),
+ ('data', '☃'),
+ ('entityref', 'amp'),
+ ('charref', '9786'),
+ ('endtag', 'noscript'),
])
- def test_textarea_content(self, content):
- source = f"<textarea>{content}</textarea>"
+ # scripting=True -- RAWTEXT mode
+ self._run_check(source, [
+ ("starttag", "noscript", []),
+ ("data", SAMPLE_RAWTEXT),
+ ("endtag", "noscript"),
+ ], collector=EventCollector(scripting=True))
+
+ def test_plaintext_content(self):
+ content = SAMPLE_RAWTEXT + '</plaintext>' # not closing
+ source = f"<plaintext>{content}"
self._run_check(source, [
- ("starttag", "textarea", []),
+ ("starttag", "plaintext", []),
("data", content),
- ("endtag", "textarea"),
])
@support.subTests('endtag', ['script', 'SCRIPT', 'script ', 'script\n',
@@ -381,52 +395,65 @@ def test_script_closing_tag(self, endtag):
("endtag", "script")],
collector=EventCollectorNoNormalize(convert_charrefs=False))
- @support.subTests('endtag', ['style', 'STYLE', 'style ', 'style\n',
- 'style/', 'style foo=bar', 'style foo=">"'])
- def test_style_closing_tag(self, endtag):
- content = """
- b::before { content: "<!-- not a comment -->"; }
- p::before { content: "&not-an-entity-ref;"; }
- a::before { content: "<i>"; }
- a::after { content: "</i>"; }
- """
- s = f'<StyLE>{content}</{endtag}>'
- self._run_check(s, [("starttag", "style", []),
- ("data", content),
- ("endtag", "style")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['title', 'TITLE', 'title ', 'title\n',
- 'title/', 'title foo=bar', 'title foo=">"'])
- def test_title_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TitLe>{content}</{endtag}>'
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "title", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "title")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
-
- @support.subTests('endtag', ['textarea', 'TEXTAREA', 'textarea ', 'textarea\n',
- 'textarea/', 'textarea foo=bar', 'textarea foo=">"'])
- def test_textarea_closing_tag(self, endtag):
- content = "<!-- not a comment --><i>Egg &amp; Spam</i>"
- s = f'<TexTarEa>{content}</{endtag}>'
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg & Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=True))
- self._run_check(s, [("starttag", "textarea", []),
- ('data', '<!-- not a comment --><i>Egg '),
- ('entityref', 'amp'),
- ('data', ' Spam</i>'),
- ("endtag", "textarea")],
- collector=EventCollectorNoNormalize(convert_charrefs=False))
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_closing_tag(self, tag):
+ for endtag in [tag, tag.upper(), f'{tag} ', f'{tag}\n',
+ f'{tag}/', f'{tag} foo=bar', f'{tag} foo=">"']:
+ content = "<!-- not a comment --><i>Spam</i>"
+ s = f'<{tag.upper()}>{content}</{endtag}>'
+ self._run_check(s, [
+ ("starttag", tag, []),
+ ('data', content),
+ ("endtag", tag),
+ ], collector=EventCollectorNoNormalize(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag', [
+ 'script', 'style', 'xmp', 'iframe', 'noembed', 'noframes',
+ 'textarea', 'title', 'noscript',
+ ])
+ def test_invalid_closing_tag(self, tag):
+ content = (
+ f'< /{tag}>'
+ f'</ {tag}>'
+ f'</{tag}x>'
+ f'</{tag}\v>'
+ f'</{tag}\xa0>'
+ )
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+
+ @support.subTests('tag,endtag', [
+ ('title', 'tıtle'),
+ ('style', 'ſtyle'),
+ ('style', 'ﬅyle'),
+ ('style', 'ﬆyle'),
+ ('iframe', 'ıframe'),
+ ('noframes', 'noframeſ'),
+ ('noscript', 'noſcript'),
+ ('noscript', 'noscrıpt'),
+ ('script', 'ſcript'),
+ ('script', 'scrıpt'),
+ ])
+ def test_invalid_nonascii_closing_tag(self, tag, endtag):
+ content = f"<br></{endtag}>"
+ source = f"<{tag}>{content}"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
+ source = f"<{tag}>{content}</{tag}>"
+ self._run_check(source, [
+ ("starttag", tag, []),
+ ("data", content),
+ ("endtag", tag),
+ ], collector=EventCollector(convert_charrefs=False, scripting=True))
@support.subTests('tail,end', [
('', False),
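The reorganized tests check two things: RAWTEXT content reaches `handle_data` without being parsed, and the end tag is matched case-insensitively. The sketch below checks the same property with `script`, which has been a CDATA content element in every version, so it runs on interpreters with or without this change. `RawCollector` is a hypothetical subclass. It merges adjacent data chunks because the docs allow `handle_data` to be called more than once when *convert_charrefs* is false.

```python
from html.parser import HTMLParser


class RawCollector(HTMLParser):
    """Hypothetical subclass: records events, merging adjacent data chunks."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        # Several handle_data calls may cover one run of text; join them.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


# Tags and references inside the script body are not parsed.
body = 'if (a < b) { s = "<i>&amp;</i>"; }'
parser = RawCollector(convert_charrefs=False)
parser.feed(f"<SCRIPT>{body}</Script>")
parser.close()
print(parser.events)
```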
diff --git a/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
new file mode 100644
index 00000000000000..c30c9439a76a19
--- /dev/null
+++ b/Misc/NEWS.d/next/Security/2025-08-15-23-08-44.gh-issue-137836.b55rhh.rst
@@ -0,0 +1,3 @@
+Add support of the "plaintext" element, RAWTEXT elements "xmp", "iframe",
+"noembed" and "noframes", and optionally RAWTEXT element "noscript" in
+:class:`html.parser.HTMLParser`.
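The *scripting* keyword exists only in releases that include this change (the docs diff marks it `versionchanged:: 3.12.13` on this branch). Code that must also run on older interpreters could therefore feature-detect it. Below is a minimal sketch that inspects the constructor signature. `make_parser` is a hypothetical helper, and raising on an unsupported `scripting=True` is just one possible policy.

```python
import inspect
from html.parser import HTMLParser


def make_parser(cls=HTMLParser, *, scripting=False, **kwargs):
    """Create `cls`, passing scripting= only if the constructor accepts it.

    Without the parameter, noscript content is always parsed as normal
    markup, which matches scripting=False; scripting=True cannot be honoured.
    """
    params = inspect.signature(cls).parameters
    if "scripting" in params:
        kwargs["scripting"] = scripting
    elif scripting:
        raise TypeError("this HTMLParser does not support scripting=True")
    return cls(**kwargs)


parser = make_parser(convert_charrefs=False)
print(type(parser).__name__, getattr(parser, "scripting", "n/a"))
```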
[3.9] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140854)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/fde6ac1290ac582b4f7bf95e8d9e28408d…
commit: fde6ac1290ac582b4f7bf95e8d9e28408ddffe15
branch: 3.9
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T17:56:55+01:00
summary:
[3.9] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140854)
(cherry picked from commit fe694a6db620062f467469bd2bb987315d72fd62)
Co-authored-by: Alex Waygood <Alex.Waygood(a)Gmail.com>
files:
M Tools/clinic/clinic.py
diff --git a/Tools/clinic/clinic.py b/Tools/clinic/clinic.py
index c68ee9a232078f..f35f08f010c5c5 100755
--- a/Tools/clinic/clinic.py
+++ b/Tools/clinic/clinic.py
@@ -4518,10 +4518,8 @@ def bad_node(self, node):
c_default = "NULL"
elif (isinstance(expr, ast.BinOp) or
(isinstance(expr, ast.UnaryOp) and
- not (isinstance(expr.operand, ast.Num) or
- (hasattr(ast, 'Constant') and
- isinstance(expr.operand, ast.Constant) and
- type(expr.operand.value) in (int, float, complex)))
+ not (isinstance(expr.operand, ast.Constant) and
+ type(expr.operand.value) in {int, float, complex})
)):
c_default = kwargs.get("c_default")
if not (isinstance(c_default, str) and c_default):
@@ -4613,14 +4611,10 @@ def bad_node(self, node):
self.function.parameters[parameter_name] = p
def parse_converter(self, annotation):
- if (hasattr(ast, 'Constant') and
- isinstance(annotation, ast.Constant) and
+ if (isinstance(annotation, ast.Constant) and
type(annotation.value) is str):
return annotation.value, True, {}
- if isinstance(annotation, ast.Str):
- return annotation.s, True, {}
-
if isinstance(annotation, ast.Name):
return annotation.id, False, {}
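In `parse_converter`, a string annotation is now recognized only as an `ast.Constant` whose value is a `str`, and the `ast.Str` fallback is gone. The sketch below restates the first two branches. `classify_annotation` is a hypothetical name, and the returned tuple shape is copied from the diff: a name, a flag that is true for the string form, and an empty kwargs dict. It returns `None` for shapes those two branches do not handle.

```python
import ast


def classify_annotation(annotation: ast.expr):
    """Mimic the first two branches of the patched parse_converter."""
    if isinstance(annotation, ast.Constant) and type(annotation.value) is str:
        return annotation.value, True, {}
    if isinstance(annotation, ast.Name):
        return annotation.id, False, {}
    return None  # e.g. non-string constants or calls, handled elsewhere


func = ast.parse("def f(a: 'int', b: str, c: 3): pass").body[0]
for arg in func.args.args:
    print(arg.arg, classify_annotation(arg.annotation))
```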
[3.10] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140855)
by ambv Oct. 31, 2025
https://github.com/python/cpython/commit/9524203deefb6d4ea6a502661f855961de…
commit: 9524203deefb6d4ea6a502661f855961dee1af85
branch: 3.10
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ambv <lukasz(a)langa.pl>
date: 2025-10-31T17:56:30+01:00
summary:
[3.10] gh-90953: Don't use deprecated AST nodes in clinic.py (GH-104322) (GH-140855)
(cherry picked from commit fe694a6db620062f467469bd2bb987315d72fd62)
Co-authored-by: Alex Waygood <Alex.Waygood(a)Gmail.com>
files:
M Tools/clinic/clinic.py
diff --git a/Tools/clinic/clinic.py b/Tools/clinic/clinic.py
index b0d1717596f6b1..c9cbf0b2e29f4e 100755
--- a/Tools/clinic/clinic.py
+++ b/Tools/clinic/clinic.py
@@ -4558,10 +4558,8 @@ def bad_node(self, node):
c_default = "NULL"
elif (isinstance(expr, ast.BinOp) or
(isinstance(expr, ast.UnaryOp) and
- not (isinstance(expr.operand, ast.Num) or
- (hasattr(ast, 'Constant') and
- isinstance(expr.operand, ast.Constant) and
- type(expr.operand.value) in (int, float, complex)))
+ not (isinstance(expr.operand, ast.Constant) and
+ type(expr.operand.value) in {int, float, complex})
)):
c_default = kwargs.get("c_default")
if not (isinstance(c_default, str) and c_default):
@@ -4658,14 +4656,10 @@ def bad_node(self, node):
self.function.parameters[key] = p
def parse_converter(self, annotation):
- if (hasattr(ast, 'Constant') and
- isinstance(annotation, ast.Constant) and
+ if (isinstance(annotation, ast.Constant) and
type(annotation.value) is str):
return annotation.value, True, {}
- if isinstance(annotation, ast.Str):
- return annotation.s, True, {}
-
if isinstance(annotation, ast.Name):
return annotation.id, False, {}