i18n and Python tracebacks
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example. For example, using French as my default local language, instead of
>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
André
On Sat, 15 May 2010 11:02:35 -0300 Andre Roberge <andre.roberge@gmail.com> wrote:
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
I'm not sure it's a good idea. The fact that these messages are always in English makes it possible:
- to share them with other developers in order to get help
- to parse them in order to assert certain kinds of errors
These messages are primarily meant for developers, not users.
(as a sidenote, I regularly get annoyed by gcc's "translated" error messages -- especially how crappy the French translation often is. It's always better to get a good English error message than a horrible French one)
Antoine.
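To illustrate the second point, here is a minimal sketch of a test helper that asserts on the text of an error message. The helper name `raises_matching` and the French pattern are made up for illustration; only `int("abc")` raising `ValueError` with an English "invalid literal" message is taken from current CPython behaviour.

```python
import re


def raises_matching(func, exc_type, pattern):
    """Return True if func() raises exc_type with a message matching pattern."""
    try:
        func()
    except exc_type as exc:
        return re.search(pattern, str(exc)) is not None
    return False


# This check relies on the English wording of the interpreter's message.
english_ok = raises_matching(
    lambda: int("abc"), ValueError, r"invalid literal for int\(\)"
)

# A pattern written against a hypothetical French wording does not match,
# which is how translated messages could break existing checks (and vice versa).
french_ok = raises_matching(
    lambda: int("abc"), ValueError, r"littéral invalide"
)

print(english_ok, french_ok)
```

Checks like this are common in test suites and in doctest-style comparisons, so any translation scheme would need to keep the English text reachable.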
On Sat, 15 May 2010 17:04:29 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 15 May 2010 11:02:35 -0300 Andre Roberge <andre.roberge@gmail.com> wrote:
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
I'm not sure it's a good idea. The fact that these messages are always in English makes it possible:
- to share them with other developers in order to get help
- to parse them in order to assert certain kinds of errors
These messages are primarily meant for developers, not users.
(as a sidenote, I regularly get annoyed by gcc's "translated" error messages -- especially how crappy the French translation often is. It's always better to get a good English error message than a horrible French one)
Antoine.
I share this point of view (even though my mother tongue is French as well). We should distinguish the user of the language (the developer) from the end user of the application.
Now, at another level, one could also argue that people should be able to program using their own language; that is fair and good from the point of view of diversity. It may help spread and develop "the art of programming" by removing an important entry barrier. But it's a very big effort, and there should be a reference anyway (*).
Denis
(*) In "the best of all possible worlds", an IAL... http://en.wikipedia.org/wiki/International_auxiliary_language
________________________________
vit esse estrany ☣
spir.wikidot.com
On Sat, May 15, 2010 at 11:02 AM, Andre Roberge <andre.roberge@gmail.com> wrote:
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.
+1
In PyAr (Python Argentina) we are talking about this too: http://mx.grulic.org.ar/lurker/message/20100513.021134.7443563e.es.html
It would be great if there were a standard mechanism, such as PO files, for message translation. In that case, Pootle could be used to produce a collaboratively reviewed translation.
English is an important barrier in some cases, such as teaching Python to non-advanced users who in general don't read or speak English (in our country, the language is Spanish).
Allowing the message language to be changed at runtime may be interesting too (as in PostgreSQL: SET lc_messages TO 'en_US.UTF-8';).
For example, using French as my default local language, instead of
>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
In Spanish a possible translation would be:
Traza de rastreo (llamada más reciente a lo último):
  Archivo "<stdin>", línea 1, en <módulo>
ZeroDivisionError: división entera o módulo por cero
We are setting up a local wiki page to address these issues: http://python.org.ar/pyar/MensajesExcepcionales
Best regards,
Mariano Reingart
http://www.python.org.ar
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com
On 5/15/2010 10:02 AM, Andre Roberge wrote:
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.
If you change the proposal to having a translated version 'in addition to' rather than 'instead of' the English version, I would be in favor of it as an option. Then non-English speakers would gradually learn a bit of English from each error, and people like me could also get a boost on learning the math/comp vocabulary of another language, such as Spanish.
Since a decent translation will not necessarily have substitution fields in the same order, this proposal requires that they be indicated and filled by name rather than position. This is easy with the new 3.x string formatting system, but I have no idea how it is done presently.
Use of unicode as the string type in 3.x, including for identifiers, makes internationalization (whew, no wonder people abbreviate that as i18n) of Python, to whatever level, easier than with 2.x. But I think further steps will require more initiative from the various other-language communities. In other words, more is needed than 'I think it would be a good idea...'. I also think it would be good if they cooperated to not re-invent the wheel (differently) for each language and form something like an International Python Working Group (assuming there is not such now). Current core developers, of necessity, are comfortable enough with the current situation and mostly have other itches to scratch.
Terry Jan Reedy
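The point about filling substitution fields by name rather than position can be sketched with `str.format`. The templates below, including the French wording, are invented for illustration and are not actual CPython messages; `render` is a hypothetical helper.

```python
# Hypothetical message templates keyed by language code.
TEMPLATES = {
    "en": "{func}() takes {expected} arguments ({given} given)",
    # The translation needs the fields in a different order.
    "fr": "{given} arguments donnés à {func}(), qui en attend {expected}",
}


def render(lang, **fields):
    """Fill a template by field name, falling back to English."""
    template = TEMPLATES.get(lang, TEMPLATES["en"])
    return template.format(**fields)


english = render("en", func="f", expected=2, given=3)
french = render("fr", func="f", expected=2, given=3)
print(english)
print(french)
```

With positional `%s`/`%d` fields the French template could not reorder its arguments without changing the calling code; with named fields the caller stays the same for every language.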
I think it's important, as Antoine notes, to preserve the ability for code to read and interpret the error messages. If changes are being made here (after the moratorium, presumably), it would be nice if the changes made it easier to parse errors by explicitly delimiting the error message text rather than requiring ad hoc parsing.
Of course, I'm saying "would be nice" rather than offering a compelling argument here. My thought is that *if* changes are being made here anyway, then it's worth considering.
--- Bruce
http://www.vroospeak.com
http://jarlsberg.appspot.com
On Sat, May 15, 2010 at 9:23 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 5/15/2010 10:02 AM, Andre Roberge wrote:
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.
If you change the proposal to having a translated version 'in addition to' rather than 'instead of' the English version, I would be in favor of it as an option. Then non-English speakers would gradually learn a bit of English from each error, and people like me could also get a boost on learning the math/comp vocabulary of another language, such as Spanish.
Since a decent translation will not necessarily have substitution fields in the same order, this proposal requires that they be indicated and filled by name rather than position. This is easy with the new 3.x string formatting system, but I have no idea how it is done presently.
Use of unicode as the string type in 3.x, including for identifiers, makes internationalization (whew, no wonder people abbreviate that as i18n) of Python, to whatever level, easier than with 2.x. But I think further steps will require more initiative from the various other-language communities. In other words, more is needed than 'I think it would be a good idea...'. I also think it would be good if they cooperated to not re-invent the wheel (differently) for each language and form something like an International Python Working Group (assuming there is not such now). Current core developers, of necessity, are comfortable enough with the current situation and mostly have other itches to scratch.
Terry Jan Reedy
On 5/15/2010 3:36 PM, Bruce Leban wrote:
I think it's important as Antoine notes to preserve the ability for code to read and interpret the error messages.
If the English message were always present, in a known position, and a translation were optionally provided in addition (perhaps bracketed in <<>>, for instance), then that should remain true.
tjr
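A rough sketch of this "English plus optional bracketed translation" layout, assuming the translation is simply appended as one extra `<<...>>` line after the unchanged English traceback. The `TRANSLATIONS` table and `format_bilingual` are hypothetical names, and the exact placement of the extra line is an assumption rather than something specified in this thread.

```python
import traceback

# Hypothetical lookup table from English messages to translations.
TRANSLATIONS = {
    "integer division or modulo by zero":
        "division entière ou modulo par zéro",
}


def format_bilingual(exc):
    """Return the usual English traceback plus an optional <<...>> line."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    translated = TRANSLATIONS.get(str(exc))
    if translated is not None:
        # The English text above is untouched; the translation is extra.
        lines.append("<<{}: {}>>\n".format(type(exc).__name__, translated))
    return "".join(lines)


try:
    # Raised explicitly so the message text does not depend on the
    # interpreter version.
    raise ZeroDivisionError("integer division or modulo by zero")
except ZeroDivisionError as error:
    report = format_bilingual(error)

print(report, end="")
```

Tools that parse the English lines can ignore anything wrapped in `<<` and `>>`, while a human reader sees both versions.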
On Sat, May 15, 2010 at 1:23 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Python, to whatever level, easier than with 2.x. But I think further steps will require more initiative from the various other-language communities. In other words, more is needed than 'I think it would be a good idea...'. I also think it would be good if they cooperated to not re-invent the wheel (differently) for each language and form something like an International Python Working Group (assuming there is not such now). Current core developers, of necessity, are comfortable enough with the current situation and mostly have other itches to scratch.
As we were discussing these issues on our local mailing list too, I went ahead and drafted a proposal with some thoughts: http://python.org.ar/pyar/TracebackInternationalizationProposal
It includes a very early patch against trunk, with some messages translated into Spanish (hardcoded).
Sorry if there is any mistake, I hope the interested people (here in Argentina at least), with more experience in C and Python, would help me to fix/enhance this and/or champion it.
Do you think this is the right way? Any advice will be appreciated, and any help is welcome.
BTW, as you may have noticed, my first language is Spanish, so pardon my English.
Best regards,
Mariano Reingart
Mariano Reingart wrote:
Sorry if there is any mistake, I hope the interested people (here in Argentina at least), with more experience in C and Python, would help me to fix/enhance this and/or champion it.
Do you think this is the right way?
The basic concept appears sound, but you'll want to work against the py3k branch rather than trunk.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Sun, May 16, 2010 at 1:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Mariano Reingart wrote:
Sorry if there is any mistake, I hope the interested people (here in Argentina at least), with more experience in C and Python, would help me to fix/enhance this and/or champion it.
Do you think this is the right way?
The basic concept appears sound, but you'll want to work against the py3k branch rather than trunk.
Done (sorry for the 2-year delay). It implements Py_GETTEXT against py3.3+: http://bugs.python.org/issue16344
Updated proposal: http://python.org.ar/pyar/TracebackInternationalizationProposal
BTW, I've made a patch for a related issue too (utf-8): http://bugs.python.org/issue16343
If this Traceback Internationalization Proposal makes sense, I could present it at the PyCon Argentina 2012 Core-Python Sprint to see if we can advance it: http://ar.pycon.org/2012/projects/index#134
Best regards
Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com
Andre Roberge writes:
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.
If you do this, you really need a way to recover the original message (or a pointer to it) for programs that automatically analyze the tracebacks. A hash code or something like that might do the trick. And, no, running the program again with --trace-lang=C is not good enough; there's no guarantee you can reproduce.
AFAIK this requires an extension to the current localization infrastructure, both gettext for C and in Python.
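One way the "hash code" idea might look, as a sketch only: each translated message carries a short tag derived from the English original, and an analysis tool maps the tag back to that original. The tag format `[msgid:...]`, the 8-character SHA-1 prefix, and the function names are all assumptions made for this example; nothing like this exists in gettext today.

```python
import hashlib
import re

# Maps short hash -> original English message (hypothetical registry).
ORIGINALS = {}


def message_id(original):
    """Short, stable identifier derived from the English message."""
    return hashlib.sha1(original.encode("utf-8")).hexdigest()[:8]


def localized(original, translated):
    """Return the translated text tagged with a pointer to the original."""
    key = message_id(original)
    ORIGINALS[key] = original
    return "{} [msgid:{}]".format(translated, key)


def recover_original(line):
    """Recover the English message from a tagged line, or None."""
    match = re.search(r"\[msgid:([0-9a-f]{8})\]\s*$", line)
    if match is None:
        return None
    return ORIGINALS.get(match.group(1))


english = "integer division or modulo by zero"
shown = localized(english, "division entière ou modulo par zéro")
recovered = recover_original(shown)
print(shown)
print(recovered)
```

The tag travels with the traceback text itself, so the original can be recovered from a pasted report without rerunning the program.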
On 5/16/2010 4:09 AM, Stephen J. Turnbull wrote:
Andre Roberge writes:
I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.
If you do this, you really need a way to recover the original message (or a pointer to it) for programs that automatically analyze the tracebacks. A hash code or something like that might do the trick. And, no, running the program again with --trace-lang=C is not good enough; there's no guarantee you can reproduce.
I already posted the suggestion, from gmane, that translation be in addition to rather than instead of the English original, both for the above reason and for human language learning either way.
AFAIK this requires an extension to the current localization infrastructure, both gettext for C and in Python.
It will probably require the efforts of more than one person, perhaps from multiple language communities (who would most likely have to cooperate in English ;-).
Terry Jan Reedy
Couldn't this be done first as a simple module that wraps a try block around the interactive prompt and changes known error messages to their translated counterparts? It would probably make sense to see if there's any traction for the idea first that way before changing core Python.
Does anyone know what happened to the Chinese Python project? Did that ever get any significant user base?
-- Carl Johnson
Carl M. Johnson wrote:
Couldn't this be done first as a simple module that wraps a try block around the interactive prompt and changes known error messages to their translated counterparts? It would probably make sense to see if there's any traction for the idea first that way before changing core Python.
It would actually be interesting to see just how far someone could get purely with sys.excepthook.
It would be subject to some fairly significant limitations (particularly when it comes to reparsing strings with interpolated values), but the traceback parsing and comparison code in doctest may offer a good starting point.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
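A minimal sketch of the sys.excepthook experiment, under the assumption that only the header line and exactly matching messages are translated. `HEADER_FR`, `MESSAGES_FR` and the extra `stream` parameter are illustrative choices; the frame lines ("File ..., line ...") are left in English here, and messages with interpolated values are not handled.

```python
import io
import sys
import traceback

# Hypothetical translation tables (French used as the example language).
HEADER_FR = "Suivi d'erreur (appel le plus récent en dernier) :\n"
MESSAGES_FR = {
    "integer division or modulo by zero":
        "division entière ou modulo par zéro",
}


def translating_excepthook(exc_type, exc_value, exc_tb, stream=None):
    """Print a traceback with a translated header and final message."""
    out = sys.stderr if stream is None else stream
    if exc_tb is not None:
        out.write(HEADER_FR)
        # Frame lines are emitted unchanged by traceback.format_tb.
        out.writelines(traceback.format_tb(exc_tb))
    message = str(exc_value)
    out.write("{}: {}\n".format(
        exc_type.__name__, MESSAGES_FR.get(message, message)))


# To activate it for uncaught exceptions one would set:
#     sys.excepthook = translating_excepthook
# Here the hook is called directly so the result can be inspected.
buffer = io.StringIO()
try:
    raise ZeroDivisionError("integer division or modulo by zero")
except ZeroDivisionError as error:
    translating_excepthook(type(error), error, error.__traceback__, buffer)

output = buffer.getvalue()
print(output, end="")
```

This only covers exceptions that reach the hook; output produced elsewhere in the interpreter would stay in English.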
Nick Coghlan writes:
It would actually be interesting to see just how far someone could get [on translating tracebacks] purely with sys.excepthook.
It would be subject to some fairly significant limitations (particularly when it comes to reparsing strings with interpolated values), but the traceback parsing and comparison code in doctest may offer a good starting point.
Actually, it shouldn't be too hard to handle the interpolations. In fact the language to be parsed is probably mostly pretty simple, and can be automatically translated to BNF or whatever input your favorite parsing library wants from the .pot file. The generated grammar probably would be on the order of the size of the .pot file, no? It could be stored with the .mos as a "pseudo-translation".
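As a sketch of how message templates might be turned into a recognizer, the following converts printf-style templates into regular expressions (a simpler stand-in for the generated BNF grammar mentioned above) and uses them to recover both the template and the interpolated values from a formatted message. The two templates are simplified, hypothetical msgid entries, and only `%d` and `%s` are handled.

```python
import re

# Simplified, hypothetical msgid entries as they might appear in a .pot file.
TEMPLATES = [
    "invalid literal for int() with base %d: %s",
    "integer division or modulo by zero",
]


def template_to_regex(template):
    """Build a regex that matches messages produced from the template."""
    pattern = ""
    for part in re.split(r"(%[ds])", template):
        if part == "%d":
            pattern += r"(-?\d+)"
        elif part == "%s":
            pattern += r"(.+)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


RECOGNIZERS = [(t, template_to_regex(t)) for t in TEMPLATES]


def identify(message):
    """Return (template, interpolated values) for a message, or None."""
    for template, regex in RECOGNIZERS:
        match = regex.fullmatch(message)
        if match is not None:
            return template, match.groups()
    return None


found = identify("invalid literal for int() with base 10: 'abc'")
print(found)
```

Once the template and values are known, the translated template can be filled with the same values; ambiguous templates (for example two adjacent `%s` fields) are where this approach becomes error-prone.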
On Tue, May 18, 2010 at 12:14 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Nick Coghlan writes:
It would actually be interesting to see just how far someone could get [on translating tracebacks] purely with sys.excepthook.
It would be subject to some fairly significant limitations (particularly when it comes to reparsing strings with interpolated values), but the traceback parsing and comparison code in doctest may offer a good starting point.
Actually, it shouldn't be too hard to handle the interpolations. In fact the language to be parsed is probably mostly pretty simple, and can be automatically translated to BNF or whatever input your favorite parsing library wants from the .pot file. The generated grammar probably would be on the order of the size of the .pot file, no? It could be stored with the .mos as a "pseudo-translation".
Interpolation is not very hard (although it could be error-prone). I tried that with some regexes, but I hit some dead ends because some messages are hard-coded at the interpreter level, so translation cannot be implemented purely with sys.excepthook.
I created a parallel project in case anyone is interested (it would be the pure-Python version, but it would require too much work): http://code.google.com/p/pydiversity/
Maybe I missed something, but the gettext approach seems more consistent and cleaner, and IMHO using gettext is easier than rewriting an interpreter :-)
[sorry for the 2-year delay]
Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com
participants (9)
- Andre Roberge
- Antoine Pitrou
- Bruce Leban
- Carl M. Johnson
- Mariano Reingart
- Nick Coghlan
- spir ☣
- Stephen J. Turnbull
- Terry Reedy