[New-bugs-announce] [issue35628] Allow lazy loading of translations in gettext.
s-ball
report at bugs.python.org
Mon Dec 31 08:14:02 EST 2018
New submission from s-ball <s-ball at laposte.net>:
When working on i18n, I realized that msgfmt.py did not generate any hash table. Going one step further, I realized that gettext.py would not have used it anyway, because it unconditionally loads the whole translation file and contains the following TODO message:
TODO:
- Lazy loading of .mo files. Currently the entire catalog is loaded into
memory, but that's probably bad for large translated programs. Instead,
the lexical sort of original strings in GNU .mo files should be exploited
to do binary searches and lazy initializations. Or you might want to use
the undocumented double-hash algorithm for .mo files with hash tables, but
you'll need to study the GNU gettext code to do this.
I have studied the code and found that it should not be too complex to implement in pure Python. I have posted a message on python-ideas about it, and here are my conclusions:
Features:
========
The gettext module should be able to load catalogs lazily from mo
files. This lazy loading should be optional and should use the hash tables
from mo files when they are present, or fall back to a binary search
otherwise. The translation strings should be cached for better performance.
API changes:
============
3 functions from the gettext module will have 2 new optional parameters
named caching and keepopen:
gettext.bindtextdomain(domain, localedir=None) would become
gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False)
gettext.translation(domain, localedir=None, languages=None, class_=None,
fallback=False, codeset=None) would become
gettext.translation(domain, localedir=None, languages=None, class_=None,
fallback=False, codeset=None, caching=None, keepopen=False)
gettext.install(domain, localedir=None, codeset=None, names=None) would
become
gettext.install(domain, localedir=None, codeset=None, names=None,
caching=None, keepopen=False)
The new caching parameter could receive the following values:
caching=None: revert to the previous eager loading of the full catalog.
It will be the default, so that existing applications see no change
caching=1: lazy loading with unlimited cache
caching=n where n is a positive (>=0) integer value: lazy loading with an
LRU cache limited to n strings
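
To make the cache behaviour concrete, here is a minimal sketch of a bounded LRU cache. The class name LRUCache and its maxsize argument are hypothetical and not part of gettext; how the values of the caching parameter map onto maxsize is left to the proposal above.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache: keeps at most maxsize entries, or all if maxsize is None."""

    def __init__(self, maxsize=None):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        try:
            value = self._data[key]
        except KeyError:
            return default
        # A hit makes the entry the most recently used one.
        self._data.move_to_end(key)
        return value

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if self.maxsize is not None:
            # Evict least recently used entries until the bound is respected.
            while len(self._data) > self.maxsize:
                self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)
```

With maxsize=2, storing "a" and "b", reading "a", then storing "c" evicts "b", because "a" was used more recently.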
The keepopen parameter would be a boolean:
keepopen=False (default): the mo file is only opened before loading a
translation string and closed immediately after - it is also opened once
when the GNUTranslations class is initialized to load the file description
keepopen=True: the mo file is kept open during the lifetime of the
GNUTranslations object.
This parameter is ignored if caching is None
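
The difference between the two keepopen modes could look roughly like the sketch below. MoReader and read_at are hypothetical names used only for illustration; nothing like them exists in gettext today.

```python
class MoReader:
    """Hypothetical helper: read byte ranges from a file, optionally keeping it open."""

    def __init__(self, path, keepopen=False):
        self.path = path
        self.keepopen = keepopen
        # keepopen=True: one file object lives as long as the reader.
        self._fp = open(path, "rb") if keepopen else None

    def read_at(self, offset, length):
        if self._fp is not None:
            self._fp.seek(offset)
            return self._fp.read(length)
        # keepopen=False: open, read and close again for every access.
        with open(self.path, "rb") as fp:
            fp.seek(offset)
            return fp.read(length)

    def close(self):
        # After close(), reads fall back to opening the file per access.
        if self._fp is not None:
            self._fp.close()
            self._fp = None
```

Keeping the file open saves an open/close pair per lookup but holds a file descriptor for the lifetime of the object, which is the trade-off the parameter exposes.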
Implementation:
==============
The current GNUTranslations class loads the content of the mo file to
build a dictionary where the original strings are the keys and the
translated strings the values. Plural forms get special processing: the
key is a 2-tuple (singular original string, order), and the value is the
corresponding translated string - order=0 is normally for the singular
translated string.
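
As an illustration of that key layout, here is a small hand-written dictionary with one plain entry and one plural entry. The French strings and the plural rule are example data only, and the ngettext function below is a simplified stand-in for the real method.

```python
# Plain entries use the original string as key; plural entries use
# a (singular original string, order) tuple.
catalog = {
    "File": "Fichier",
    ("%d file", 0): "%d fichier",
    ("%d file", 1): "%d fichiers",
}


def plural(n):
    # Example rule (French style): order 1 is used when n > 1.
    return int(n > 1)


def ngettext(singular, plural_form, n):
    try:
        return catalog[(singular, plural(n))]
    except KeyError:
        # No translation: fall back to the original strings.
        return singular if n == 1 else plural_form
```

A lazy mapping only has to answer lookups on these same keys, which is why it can replace the dictionary without touching the callers.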
The proposed implementation would simply replace this dictionary with a
special mapping subclass when caching is not None. That subclass would
use the same keys as the original dictionary and would:
- first search in its cache
- if not found in cache and the hash table has a non-zero size, search
the original string by hash
- if not found in cache and the hash table has a zero size, search the
original string with a binary search algorithm.
- if a string is found, feed it to the LRU cache, possibly
throwing away the oldest entry (entries)
That should make it possible to implement the new feature with minimal
refactoring of the gettext module.
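
The cache-then-binary-search path described above can be sketched as follows. This is only an illustration under several assumptions: the names LazyCatalog and build_mo are hypothetical, the mo image is held in memory as bytes rather than read from a file, only little-endian files are handled, strings are assumed to be UTF-8, and plural keys and the hash lookup branch are left out.

```python
import struct
from collections import OrderedDict

MAGIC = 0x950412DE  # little-endian .mo magic number


def build_mo(messages):
    """Test helper: build a minimal .mo image with no hash table."""
    items = sorted((k.encode(), v.encode()) for k, v in messages.items())
    n = len(items)
    base = 28 + 16 * n  # header + both index tables
    data, index = b"", []
    for blob in [k for k, _ in items] + [v for _, v in items]:
        index.append(struct.pack("<2I", len(blob), base + len(data)))
        data += blob + b"\0"
    header = struct.pack("<7I", MAGIC, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + b"".join(index) + data


class LazyCatalog:
    """Hypothetical mapping: looks strings up on demand instead of loading all."""

    def __init__(self, image, maxsize=None):
        self._image = image
        magic, _ver, self._n, self._otab, self._ttab, _hsize, _hoff = \
            struct.unpack_from("<7I", image, 0)
        if magic != MAGIC:
            raise ValueError("this sketch only handles little-endian .mo data")
        self._cache = OrderedDict()
        self._maxsize = maxsize

    def _entry(self, table, i):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        length, offset = struct.unpack_from("<2I", self._image, table + 8 * i)
        return self._image[offset:offset + length]

    def _bsearch(self, key):
        # Original strings are sorted bytewise, so bisection applies.
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            orig = self._entry(self._otab, mid)
            if orig == key:
                return mid
            if orig < key:
                lo = mid + 1
            else:
                hi = mid
        return None

    def __getitem__(self, msgid):
        if msgid in self._cache:
            self._cache.move_to_end(msgid)
            return self._cache[msgid]
        i = self._bsearch(msgid.encode())
        if i is None:
            raise KeyError(msgid)
        value = self._entry(self._ttab, i).decode()
        self._cache[msgid] = value
        if self._maxsize is not None and len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # drop the least recently used
        return value
```

A real implementation would read the index entries and strings from the file (kept open or reopened, depending on keepopen) and would try the hash table first when its size is non-zero.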
I also propose to change msgfmt.py to build the hash table. IMHO, this functionality should live in the standard library, probably as a submodule of gettext, so that various Python projects (pybabel, django) can use it directly instead of developing their own.
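
For illustration, building and probing such a table with double hashing could look like the sketch below. It is modelled on my reading of the GNU gettext sources and has not been checked byte for byte against them: the hash function, the table sizing rule (a prime at or above 4/3 of the number of strings) and all function names are assumptions to be verified before any real use.

```python
def hashpjw(data):
    """PJW-style hash over bytes, kept within 32 bits (assumed to match GNU's)."""
    hval = 0
    for byte in data:
        hval = ((hval << 4) + byte) & 0xFFFFFFFF
        g = hval & 0xF0000000
        if g:
            hval ^= g >> 24
            hval ^= g
    return hval


def next_prime(n):
    """Smallest prime >= n, by trial division (fine for catalog sizes)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n


def build_hash_table(sorted_msgids):
    """Return a list of slots; 0 means empty, otherwise string index + 1."""
    size = next_prime(max(3, len(sorted_msgids) * 4 // 3))
    table = [0] * size
    for index, msgid in enumerate(sorted_msgids):
        h = hashpjw(msgid)
        slot = h % size
        step = 1 + h % (size - 2)  # second hash: probe increment
        while table[slot]:
            slot = (slot + step) % size
        table[slot] = index + 1
    return table


def lookup(table, sorted_msgids, msgid):
    """Return the index of msgid in sorted_msgids, or None if absent."""
    size = len(table)
    h = hashpjw(msgid)
    slot = h % size
    step = 1 + h % (size - 2)
    while table[slot]:
        index = table[slot] - 1
        if sorted_msgids[index] == msgid:
            return index
        slot = (slot + step) % size
    return None
```

Because the table size is prime and larger than the number of strings, every probe sequence visits all slots and always reaches an empty one, so both loops terminate.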
I will probably submit a PR in a while, but it will require some time to propose a full implementation with correct test coverage.
----------
components: Library (Lib)
messages: 332815
nosy: s-ball
priority: normal
severity: normal
status: open
title: Allow lazy loading of translations in gettext.
type: enhancement
versions: Python 3.8
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35628>
_______________________________________