Re: [Python-ideas] Needing help to change the grammar

(moving discussion to Python Ideas) (Context for py-ideas: a teacher in Brazil is working on a Python language variant that uses Portuguese rather than English-based keywords. This is intended for use in teaching introductory programming lessons, not as a professional development tool) Glenn Linderman wrote:
Making that work would actually require something like the file encoding cookie that is detected at the parsing stage. Otherwise the parser and compiler would choke on the unexpected keywords long before the interpreter reached the stage of attempting to import anything. Adjusting the parser to accept different keyword names would be even more difficult though, since changing the details of the grammar definition is a lot more invasive than just changing the encoding of the file being read.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
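[Editor's sketch: Nick's analogy suggests a check that, like the PEP 263 coding cookie, examines only the first two lines of the file, before the parser sees any keywords. The `# -*- lang: pybr -*-` cookie name, the regex, and the function below are hypothetical illustrations, not anything proposed in this thread.]

    import re

    # Hypothetical language cookie, modeled on the PEP 263 coding
    # cookie; only the first two lines are examined, so the decision
    # is made before any keyword reaches the parser.
    LANG_COOKIE = re.compile(r'^[ \t\f]*#.*?lang[:=][ \t]*([-\w.]+)')

    def detect_language(path):
        """Return the declared keyword language, or 'en' by default."""
        with open(path, 'rb') as f:
            for _ in range(2):
                line = f.readline().decode('ascii', 'replace')
                match = LANG_COOKIE.match(line)
                if match:
                    return match.group(1)
        return 'en'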

On Sat, 18 Apr 2009 23:52:49 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
Cheers, Nick.
Maybe I don't really understand the problem, or am overlooking obvious issues. If the question is only to have a national-language variant of Python, there are certainly numerous easier methods than tweaking the parser to make it flexible enough to be natural-language-aware. Why not simply have a preprocessing func that translates back to standard/English Python using a simple dict? For practical everyday work, this may be done by:
* assigning a special extension (e.g. .pybr) to the 'special' source code files,
* associating this extension with the preprocessing program...
* that would pass the back-translated .py source to Python.
[A more general solution would be to introduce a customization layer/interface in a Python-aware editor. Sources would always be stored in standard format. At load time they would be translated according to the currently active config, which would only affect developer input/output (the principle is thus analogous to syntax highlighting).
* Any developer can edit any source according to his/her own preferences.
* Python does not need to care about that.
* Customization can be lexical (keywords, builtins, signs) but can also touch a certain amount of syntax.
The issue here is that the editor's parser (for syntax highlighting and numerous nice features) has to be made flexible enough to cope with this customization.]
Denis
------
la vita e estrany
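[Editor's sketch of the dict-based preprocessing Denis describes, with a purely hypothetical Portuguese keyword table; this is not an actual pybr specification. As Stephen points out later in the thread, a bare textual substitution like this also rewrites matching words inside strings and comments, which is exactly its weakness.]

    import re

    PT_TO_EN = {          # hypothetical mapping, for illustration only
        'se': 'if',
        'senao': 'else',
        'enquanto': 'while',
        'para': 'for',
        'retorne': 'return',
    }

    def naive_translate(source):
        # Whole-word replacement; still unsafe inside strings/comments.
        pattern = re.compile(r'\b(%s)\b' % '|'.join(PT_TO_EN))
        return pattern.sub(lambda m: PT_TO_EN[m.group(1)], source)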

spir wrote:
My original proposal in response to the OP was that language be encoded in the extension: pybr, for instance. That would be noticed before reading the file. Cached modules would still be standard .pyc, interoperable with .pyc compiled from normal Python. I am presuming this would work on all systems.
The OP was proposing to change 'is not' to the equivalent of 'not is'. I am not sure of how critical that would actually be. For the purpose of easing transition to international Python, not messing with statement word order would be a plus.
This might be easier than changing the interpreter. The extension could just as well be read and written by an editor. The problem is the multiplicity of editors. The reason I suggested some support in the core for nationalization is that I think a) it is inevitable, in spite of the associated problem of ghettoization, while b) ghettoization should be discouraged and can be ameliorated with a bit of core support. I am aware, of course, that such support, by removing one barrier to nationalization, will accelerate the development of such versions. Terry Jan Reedy
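[Editor's sketch: Terry's .pybr-extension idea maps naturally onto an import hook. The following is expressed in terms of the modern importlib machinery, which postdates this thread; `translate_keywords` is the assumed translation helper (one possible version appears after Stephen's message below), and a real hook would also have to list the extension-module and bytecode loaders.]

    import sys
    from importlib.machinery import (FileFinder, SourceFileLoader,
                                     SOURCE_SUFFIXES)
    from importlib.util import decode_source

    class PybrLoader(SourceFileLoader):
        # Translate before compiling; byte-compilation and .pyc caching
        # are inherited unchanged from SourceFileLoader, so cached
        # modules are ordinary .pyc files, as Terry suggests.
        def source_to_code(self, data, path):
            source = translate_keywords(decode_source(data))
            return compile(source, path, 'exec', dont_inherit=True)

    loader_details = [
        (PybrLoader, ['.pybr']),
        (SourceFileLoader, SOURCE_SUFFIXES),  # keep normal .py imports working
    ]
    sys.path_hooks.insert(0, FileFinder.path_hook(*loader_details))
    sys.path_importer_cache.clear()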

Terry Reedy writes:
I think this is the right way to go. We currently need, and will need for the medium term, coding cookies for legacy encoding support. I don't see why this shouldn't work the same way.
My original proposal in response to the OP was that language be encoded in the extension: pybr, for instance.
But there are a lot of languages. Once the ice is broken, I think a lot of translations will appear. So I think the variant extension approach is likely to get pretty ugly.
But the grammar is not being changed in the details; it's actually not being changed at all (with the one exception). If it's a one-to-one map at the keyword level, I don't see why there would be a problem. Of course there will be the occasional word order issue, as here with "is not", and that does involve changing the grammar.
Why not simply have a preprocessing func that translates back to standard/english python using a simple dict?
Because it's just not that simple, of course. You need to parse far enough to recognize strings, for example, and leave them alone. Since the parser doesn't detect unbalanced quotation marks in comments, you need to parse those too. You must parse import statements, because the file name might happen to be the equivalent of a keyword, and *not* translate those. There may be other issues, as well.
I don't think that ghettoization is that much more encouraged by this development than by PEP 263. It's always been possible to use non-English identifiers, even with languages normally not written in ASCII (there are several C identifiers in XEmacs that I'm pretty sure are obscenities in Latin and Portuguese, and I wouldn't be surprised if a similar device isn't occasionally used in Python programs<wink>), and of course comments have long been written in practically any ASCII-compatible coding you can name. I think it was Alex Martelli who contributed to the PEP 263 discussion a couple of rather (un)amusing stories about multinational teams where all of one nationality up and quit one day, leaving the rest of the team with copiously but unintelligibly documented code. In fact, AFAICS the fact that it's parsable as Python means that translated keywords aren't a problem at all, since that same parser can be adapted to substitute the English versions for you. That still leaves you with meaningless identifiers and comments, but as I say we already had those.
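[Editor's sketch: Stephen's "parse far enough" requirement is roughly what the stdlib tokenize module provides — strings and comments come through as single tokens and can be left alone. The sketch below reuses the hypothetical PT_TO_EN table from the earlier message; its import handling is the crude heuristic he warns about, shown only to mark where the real work lies.]

    import io
    import tokenize

    def translate_keywords(source, table=PT_TO_EN):
        # PT_TO_EN is the hypothetical table from the earlier sketch.
        result = []
        prev = ''
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            string = tok.string
            # Keywords arrive as NAME tokens at this level; STRING and
            # COMMENT tokens pass through untouched. Names right after
            # an import keyword or a dot are skipped, so module names
            # that collide with keywords are not translated.
            if (tok.type == tokenize.NAME and string in table
                    and prev not in ('import', 'from', '.')):
                string = table[string]
            prev = string
            result.append((tok.type, string))
        # Two-tuples put untokenize() into compatibility mode, which
        # preserves indentation but normalizes other spacing.
        return tokenize.untokenize(result)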

Stephen J. Turnbull wrote:
Would it be possible to use 2to3 for this? It wouldn't be perfect but it might be easier to scale a preprocessor to dozens of languages without freezing those users out of the ability to use standard English Python modules. Also, does anyone know if ChinesePython [1] ever caught on? (Hey, there's one case where you do NOT need to worry about keyword conflicts!) Looking at the homepage, it appears stuck at Python 2.1. But I don't know much Chinese, so I could be wrong. [1]: http://www.chinesepython.org/cgi_bin/cgb.cgi/english/english.html internationally-yrs, -- Carl

participants (5)
- Carl Johnson
- Nick Coghlan
- spir
- Stephen J. Turnbull
- Terry Reedy