gh-118761: Improve the import time of ``gettext`` (#128898)
https://github.com/python/cpython/commit/c9c9fcb8fcc3ef43e1d8bd71ae0ed3d4231...
commit: c9c9fcb8fcc3ef43e1d8bd71ae0ed3d4231a6013
branch: main
author: Eli Schwartz <eschwartz@gentoo.org>
committer: AA-Turner <9087854+AA-Turner@users.noreply.github.com>
date: 2025-01-20T00:01:20Z
summary:

gh-118761: Improve the import time of ``gettext`` (#128898)

``gettext`` is often imported in programs that may not end up translating
anything. In fact, the ``struct`` module already has a delayed import when
parsing ``GNUTranslations`` to speed up the no ``.mo`` files case. The re module
is also used in the same situation, but behind a function chain only
called by ``GNUTranslations``.

Cache the compiled regex globally the first time it is used. The
finditer function is converted to a method call on the compiled
object which is slightly more efficient, and necessary for the
delayed re import.

files:
A Misc/NEWS.d/next/Library/2025-01-15-19-16-50.gh-issue-118761.cbW2ZL.rst
M Lib/gettext.py

diff --git a/Lib/gettext.py b/Lib/gettext.py
index a0d81cf846a05c..4c1f9427459b14 100644
--- a/Lib/gettext.py
+++ b/Lib/gettext.py
@@ -48,7 +48,6 @@

 import operator
 import os
-import re
 import sys


@@ -70,22 +69,26 @@
 # https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
 # http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plura...

-_token_pattern = re.compile(r"""
-        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
-        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
-        (?P<NAME>n\b)                              | # only n is allowed
-        (?P<PARENTHESIS>[()])                      |
-        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
-                                                     # <=, >=, ==, !=, &&, ||,
-                                                     # ? :
-                                                     # unary and bitwise ops
-                                                     # not allowed
-        (?P<INVALID>\w+|.)                           # invalid token
-    """, re.VERBOSE|re.DOTALL)
-
+_token_pattern = None

 def _tokenize(plural):
-    for mo in re.finditer(_token_pattern, plural):
+    global _token_pattern
+    if _token_pattern is None:
+        import re
+        _token_pattern = re.compile(r"""
+                (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
+                (?P<NUMBER>[0-9]+\b)                       | # decimal integer
+                (?P<NAME>n\b)                              | # only n is allowed
+                (?P<PARENTHESIS>[()])                      |
+                (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
+                                                             # <=, >=, ==, !=, &&, ||,
+                                                             # ? :
+                                                             # unary and bitwise ops
+                                                             # not allowed
+                (?P<INVALID>\w+|.)                           # invalid token
+            """, re.VERBOSE|re.DOTALL)
+
+    for mo in _token_pattern.finditer(plural):
         kind = mo.lastgroup
         if kind == 'WHITESPACES':
             continue
diff --git a/Misc/NEWS.d/next/Library/2025-01-15-19-16-50.gh-issue-118761.cbW2ZL.rst b/Misc/NEWS.d/next/Library/2025-01-15-19-16-50.gh-issue-118761.cbW2ZL.rst
new file mode 100644
index 00000000000000..0eef8777512dd8
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2025-01-15-19-16-50.gh-issue-118761.cbW2ZL.rst
@@ -0,0 +1,3 @@
+Reduce import time of :mod:`gettext` by up to ten times, by importing
+:mod:`re` on demand. In particular, ``re`` is no longer implicitly
+exposed as ``gettext.re``. Patch by Eli Schwartz.