Support localization of unicode descriptions
Dear all,
Who might also be interested in setting up a project that supports localization of Unicode block descriptions and character descriptions?
Translations are available from https://github.com/unicode-table/unicode-table-data/tree/master/loc If possible, use a gettext approach similar to https://pypi.org/project/pycountry/
Implementing this feature would allow users to read Unicode descriptions in their own language, rather than only in English.
For example, at the moment only English is possible:
    from unicodedata import name
    print(name('ß'))
    LATIN SMALL LETTER SHARP S
So unicodedata could provide a way to translate LATIN SMALL LETTER SHARP S to e.g. German with:
    from unicodedata import name
    from unicodedata_l10n import LOCALED_DIR
    from gettext import translation
    german = translation('UnicodeData', LOCALED_DIR, languages=['de'])
    german.install()
    print(_(name('ß')))
    LATEINISCHER KLEINBUCHSTABE SCHARFES S
and something similar for unicodedata.category.
Best, Pander
On 10 July 2018 at 08:57, Pander <pander@users.sourceforge.net> wrote:
Dear all,
Who might also be interested in setting up a project that supports localization of Unicode block descriptions and character descriptions?
Translations are available from https://github.com/unicode-table/unicode-table-data/tree/master/loc If possible, use a gettext approach similar to https://pypi.org/project/pycountry/
Implementing this feature would allow users to read Unicode descriptions in their own language, rather than only in English.
Is this a Unicode Consortium standard, or a 3rd party project? The website wasn't completely clear on the matter, but there's nothing I could find on the Unicode website about translations of the standard name (there's also nothing that specifically explains the choice to use English for the standard names...).
If it's not part of the standard, then there's an argument that the Python implementation of this should also be a 3rd party package, rather than being in the stdlib. Is this feature available on PyPI at the moment?
Also, would this not lead to non-English speakers expecting that the localised names would work in "\N{...}" notation?
Paul
On 07/10/2018 10:34 AM, Paul Moore wrote:
... Is this a Unicode Consortium standard, or a 3rd party project? The website wasn't completely clear on the matter, but there's nothing I could find on the Unicode website about translations of the standard name (there's also nothing that specifically explains the choice to use English for the standard names...). If it's not part of the standard, then there's an argument that the Python implementation of this should also be a 3rd party package, rather than being in the stdlib. Is this feature available on PyPI at the moment?
This is a third party initiative. The translations are contributed by volunteers. I have talked with Python core developers and they suggested to post this here, as it is for them out of scope for Python std lib. At the moment there is no implementation yet. That is what I would like to discuss here.
Also, would this not lead to non-English speakers expecting that the localised names would work in "\N{...}" notation?
Don't know. If an implementation has been made, it should be positioned very carefully. Pander
Paul
On 10 July 2018 at 09:45, Pander <pander@users.sourceforge.net> wrote:
On 07/10/2018 10:34 AM, Paul Moore wrote:
... Is this a Unicode Consortium standard, or a 3rd party project? The website wasn't completely clear on the matter, but there's nothing I could find on the Unicode website about translations of the standard name (there's also nothing that specifically explains the choice to use English for the standard names...). If it's not part of the standard, then there's an argument that the Python implementation of this should also be a 3rd party package, rather than being in the stdlib. Is this feature available on PyPI at the moment?
This is a third party initiative. The translations are contributed by volunteers. I have talked with Python core developers and they suggested to post this here, as it is for them out of scope for Python std lib. At the moment there is no implementation yet. That is what I would like to discuss here.
Thanks for the clarification. I'd say that in that case, this should probably be created as a 3rd party project on PyPI in the first instance. If it becomes popular and is useful to a sufficiently large user base, it could then be included in the stdlib. Your example code uses a separate unicodedata_l10n package, and it would be very easy to publish that on PyPI and later move it unchanged to the stdlib if needed.
Also, would this not lead to non-English speakers expecting that the localised names would work in "\N{...}" notation?
Don't know. If an implementation has been made, it should be positioned very carefully.
Having this as a 3rd party project would make it much less likely that users would expect \N support, IMO. Paul
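To illustrate the "\N{...}" concern raised above: both the string escape and unicodedata.lookup() resolve only the names in Python's Unicode database, so a localized name fails unless something maps it back first. The sketch below assumes a small hand-written reverse catalog; LOCALIZED_TO_OFFICIAL and lookup_localized are hypothetical names, not part of any existing package.

```python
import unicodedata

# Hypothetical reverse catalog: localized name -> official English name.
# A real project would generate this from its translation files.
LOCALIZED_TO_OFFICIAL = {
    "LATEINISCHER KLEINBUCHSTABE SCHARFES S": "LATIN SMALL LETTER SHARP S",
}


def lookup_localized(name, catalog):
    """Resolve a localized name by translating it back to the official one."""
    official = catalog.get(name, name)
    return unicodedata.lookup(official)


# The official English name works in both forms.
assert "\N{LATIN SMALL LETTER SHARP S}" == "ß"
assert unicodedata.lookup("LATIN SMALL LETTER SHARP S") == "ß"

# The localized name is unknown to the stdlib and raises KeyError.
try:
    unicodedata.lookup("LATEINISCHER KLEINBUCHSTABE SCHARFES S")
    stdlib_knows_german = True
except KeyError:
    stdlib_knows_german = False

print(stdlib_knows_german)
print(lookup_localized("LATEINISCHER KLEINBUCHSTABE SCHARFES S", LOCALIZED_TO_OFFICIAL))
```

A third party package could offer a helper like this for reverse lookups, while the "\N{...}" escape itself would still accept only the official names.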
On 7/10/2018 4:45 AM, Pander wrote:
This is a third party initiative. The translations are contributed by volunteers. I have talked with Python core developers and they suggested to post this here, as it is for them out of scope for Python std lib.
Python-ideas list is for discussion of python and the stdlib. This is not a place for prolonged discussion of PyPI projects. It *is* a place to discuss adding a hook that can be used to access translations.
There are both official doc translations, accessible from the official doc pages, and others that are independent. The official ones, at least, are discussed on the doc-sig list https://mail.python.org/mailman/listinfo/doc-sig There are currently 7 languages and coordinators listed at https://devguide.python.org/experts/#documentation-translations 4 have progressed far enough to be listed in the drop-down box on https://docs.python.org/3/
I should think that these people should be asked if they want to be involved with unicode description translations. They should certainly have some helpful advice.
The description vocabulary is rather restricted, so a word translation dictionary should be pretty easy. For at least some languages, it should be possible to generate the 200000 description translations from this. The main issues are word order and language-dependent 'word' units. Hence, the three English words "LATIN SMALL LETTER" become two words in German, 'LATEINISCHER KLEINBUCHSTABE', versus three words in Spanish, but in reverse order, 'LETRA PEQUEÑA LATINA'. It is possible that the doc translators already use translation software that deals with these issues.
-- Terry Jan Reedy
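The word-dictionary idea above could be sketched roughly as follows. Translating whole phrases rather than single words is one way to sidestep the word-order and word-unit issues, since each phrase entry carries its own target order. The phrase tables and the translate_name helper are hypothetical illustrations; only the German and Spanish renderings of "LATIN SMALL LETTER" and the German name for ß come from this thread.

```python
# Hypothetical phrase tables: English word tuples -> target-language phrase.
GERMAN_PHRASES = {
    ("LATIN", "SMALL", "LETTER"): "LATEINISCHER KLEINBUCHSTABE",
    ("SHARP", "S"): "SCHARFES S",
}
SPANISH_PHRASES = {
    ("LATIN", "SMALL", "LETTER"): "LETRA PEQUEÑA LATINA",
}


def translate_name(name, phrases):
    """Translate a character name by greedy longest-phrase matching.

    Words without a matching phrase are kept in English.
    """
    words = name.split()
    max_len = max((len(key) for key in phrases), default=1)
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase starts here: keep the English word as-is.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(translate_name("LATIN SMALL LETTER SHARP S", GERMAN_PHRASES))
print(translate_name("LATIN SMALL LETTER A", SPANISH_PHRASES))
```

This handles reordering only within a phrase entry; languages that need reordering across phrase boundaries would need more than a lookup table.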
The problem with non-canonical translations of the Unicode character names is that there is not one unique possible rendering into language X. Equally, I could find synonyms in general English for the names, but one would be official, the others at best informally clarifying.
For informational purposes I think it's great to have a third party project to find out "Unicode character named 'Something In English' is roughly translated as <whatever> in your native language." But it's hard to see how an unofficial loose cross-language dictionary should be part of the standard library.
On 7/10/2018 5:20 PM, David Mertz wrote:
The problem with non-canonical translations of the Unicode character names is that there is not one unique possible rendering into language X. Equally, I could find synonyms in general English for the names, but one would be official, the others at best informally clarifying.
If the Unicode consortium does not provide official translations, then we *might* 'bless' some other source. The first place I would look would be the translators we already trust enough to display their work on the official doc page.
For informational purposes I think it's great to have a third party project to find out "Unicode character named 'Something In English' is roughly translated as <whatever> in your native language." But it's hard to see how an unofficial loose cross-language dictionary should be part of the standard library.
The doc translations are intentionally not in the cpython repository and not in the cpython distribution and not considered part of the stdlib. In general, core devs have no particular expertise, interest, or time to vet translators and review translations.
A proposal to make turtle a package and put translations of turtle commands in a submodule got no traction. The answer was "Put it on PyPI".
I have rejected proposals to put translations of IDLE's menus in an idlelib subdirectory, to be distributed with cpython as 'part' of the stdlib. I am thinking about various ideas to allow users to customize the menu, either by editing a file or processing a download. (For instance, the Japanese doc translation includes the IDLE chapter, which has a list of menu items and descriptions.) But this is a different issue.
The repository for unicode description translations should also be other than the cpython repository.
On 7/11/18 1:14 AM, Terry Reedy wrote:
On 7/10/2018 5:20 PM, David Mertz wrote:
The problem with non-canonical translations of the Unicode character names is that there is not one unique possible rendering into language X. Equally, I could find synonyms in general English for the names, but one would be official, the others at best informally clarifying.
If the Unicode consortium does not provide official translations, then we *might* 'bless' some other source. The first place I would look would be the translators we already trust enough to display their work on the official doc page.
For informational purposes I think it's great to have a third party project to find out "Unicode character named 'Something In English' is roughly translated as <whatever> in your native language." But it's hard to see how an unofficial loose cross-language dictionary should be part of the standard library.
The doc translations are intentionally not in the cpython repository and not in the cpython distribution and not considered part of the stdlib. In general, core devs have no particular expertise, interest, or time to vet translators and review translations.
A proposal to make turtle a package and put translations of turtle commands in a submodule got no traction. The answer was "Put it on PyPI".
I have rejected proposals to put translations of IDLE's menus in an idlelib subdirectory, to be distributed with cpython as 'part' of the stdlib. I am thinking about various ideas to allow users to customize the menu, either by editing a file or processing a download. (For instance, the Japanese doc translation includes the IDLE chapter, which has a list of menu items and descriptions.) But this is a different issue.
The repository for unicode description translations should also be other than the cpython repository.
I have made the following collection of scripts https://github.com/OpenTaal/python-unicodedata_l10n as a start for offering l10n support. At the moment, only a few languages have wide translation coverage, but that is progressing slowly at https://unicode-table.com/ The scripts generate PO and MO files. To package and publish this on e.g. PyPI, I'm looking for someone who has more experience in that area.
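Since coverage is still partial for most languages, callers would probably want untranslated names to fall back to English, which is what gettext does by default. The sketch below imitates that behaviour without any MO files by subclassing gettext.NullTranslations with an in-memory dictionary. DictTranslations, GERMAN and localized_name are hypothetical names for illustration; a packaged version would presumably load the generated MO files with gettext.translation() instead.

```python
import gettext
import unicodedata


class DictTranslations(gettext.NullTranslations):
    """In-memory stand-in for a catalog that would be loaded from an MO file."""

    def __init__(self, catalog):
        super().__init__()
        self._names = dict(catalog)

    def gettext(self, message):
        # Like a real gettext catalog, return the message itself when missing.
        return self._names.get(message, message)


# Hypothetical one-entry German catalog, using the name from this thread.
GERMAN = DictTranslations({
    "LATIN SMALL LETTER SHARP S": "LATEINISCHER KLEINBUCHSTABE SCHARFES S",
})


def localized_name(char, translations):
    """Return the translated character name, or the English one if untranslated."""
    return translations.gettext(unicodedata.name(char))


print(localized_name("ß", GERMAN))
print(localized_name("a", GERMAN))  # not in the catalog, stays English
```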
On 10/07/2018 23.20, David Mertz wrote:
The problem with non-canonical translations of the Unicode character names is that there is not one unique possible rendering into language X. Equally, I could find synonyms in general English for the names, but one would be official, the others at best informally clarifying.
Let's not forget that some official names of unicode symbols are either misleading or entirely wrong, but cannot be changed. See e.g. https://www.unicode.org/notes/tn27/tn27-4.html
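One commonly cited case of an uncorrectable name is U+FE18, whose official name misspells BRACKET as BRAKCET. The sketch below assumes Python 3.3 or later, where unicodedata.lookup() also accepts the formal name aliases published by Unicode; the variable names are illustrative only.

```python
import unicodedata

char = "\ufe18"

# The official name keeps the misspelling, because names are immutable.
official = unicodedata.name(char)
print(official)

# The corrected spelling exists only as a formal alias.
corrected = official.replace("BRAKCET", "BRACKET")
alias_resolves = unicodedata.lookup(corrected) == char
print(alias_resolves)
```

A translation project would need to decide whether to translate the official name literally or the corrected alias.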
participants (5)
- David Mertz
- Pander
- Paul Moore
- Terry Reedy
- Thomas Jollans