Re: [Python-ideas] Needing help to change the grammar

(moving discussion to Python Ideas) (Context for py-ideas: a teacher in Brazil is working on a Python language variant that uses Portuguese rather than English-based keywords. This is intended for use in teaching introductory programming lessons, not as a professional development tool) Glenn Linderman wrote:
Making that work would actually require something like the file encoding cookie that is detected at the parsing stage. Otherwise the parser and compiler would choke on the unexpected keywords long before the interpreter reached the stage of attempting to import anything. Adjusting the parser to accept different keyword names would be even more difficult though, since changing the details of the grammar definition is a lot more invasive than just changing the encoding of the file being read.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
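[Editor's sketch: Nick's analogy suggests a check that, like the PEP 263 coding cookie, examines only the first two lines of the file, before the parser sees any keywords. The `# -*- lang: pybr -*-` cookie name, the regex, and the function below are hypothetical illustrations, not anything proposed in this thread.]

    import re

    # Hypothetical language cookie, modeled on the PEP 263 coding
    # cookie; only the first two lines are examined, so the decision
    # is made before any keyword reaches the parser.
    LANG_COOKIE = re.compile(r'^[ \t\f]*#.*?lang[:=][ \t]*([-\w.]+)')

    def detect_language(path):
        """Return the declared keyword language, or 'en' by default."""
        with open(path, 'rb') as f:
            for _ in range(2):
                line = f.readline().decode('ascii', 'replace')
                match = LANG_COOKIE.match(line)
                if match:
                    return match.group(1)
        return 'en'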

On Sat, 18 Apr 2009 23:52:49 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
Cheers, Nick.
Maybe I don't really understand the problem, or am overlooking obvious issues. If the question is only to have a national-language variant of Python, there are certainly numerous easier methods than tweaking the parser to make it flexible enough to be natural-language-aware. Why not simply have a preprocessing func that translates back to standard/English Python using a simple dict? For practical everyday work, this may be done by:
* assigning a special extension (e.g. .pybr) to the 'special' source code files,
* associating this extension with the preprocessing program...
* that would pass the back-translated .py source to Python.
[A more general solution would be to introduce a customization layer/interface in a Python-aware editor. Sources would always be stored in standard format. At load time they would be translated according to the currently active config, which would only affect developer input/output (the principle is thus analogous to syntax highlighting).
* Any developer can edit any source according to his/her own preferences.
* Python does not need to care about that.
* Customization can be lexical (keywords, builtins, signs) but can also touch a certain amount of syntax.
The issue here is that the editor's parser (for syntax highlighting and numerous nice features) has to be made flexible enough to cope with this customization.]
Denis
------
la vita e estrany
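[Editor's sketch of the dict-based preprocessing Denis describes, with a purely hypothetical Portuguese keyword table; this is not an actual pybr specification. As Stephen points out later in the thread, a bare textual substitution like this also rewrites matching words inside strings and comments, which is exactly its weakness.]

    import re

    PT_TO_EN = {          # hypothetical mapping, for illustration only
        'se': 'if',
        'senao': 'else',
        'enquanto': 'while',
        'para': 'for',
        'retorne': 'return',
    }

    def naive_translate(source):
        # Whole-word replacement; still unsafe inside strings/comments.
        pattern = re.compile(r'\b(%s)\b' % '|'.join(PT_TO_EN))
        return pattern.sub(lambda m: PT_TO_EN[m.group(1)], source)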

spir wrote:
My original proposal in response to the OP was that language be encoded in the extension: pybr, for instance. That would be noticed before reading the file. Cached modules would still be standard .pyc, interoperable with .pyc compiled from normal Python. I am presuming this would work on all systems.
The OP was proposing to change 'is not' to the equivalent of 'not is'. I am not sure of how critical that would actually be. For the purpose of easing transition to international Python, not messing with statement word order would be a plus.
This might be easier than changing the interpreter. The extension could just as well be read and written by an editor. The problem is the multiplicity of editors. The reason I suggested some support in the core for nationalization is that I think a) it is inevitable, in spite of the associated problem of ghettoization, while b) ghettoization should be discouraged and can be ameliorated with a bit of core support. I am aware, of course, that such support, by removing one barrier to nationalization, will accelerate the development of such versions. Terry Jan Reedy
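[Editor's sketch: Terry's .pybr-extension idea maps naturally onto an import hook. The following is expressed in terms of the modern importlib machinery, which postdates this thread; `translate_keywords` is the assumed translation helper (one possible version appears after Stephen's message below), and a real hook would also have to list the extension-module and bytecode loaders.]

    import sys
    from importlib.machinery import (FileFinder, SourceFileLoader,
                                     SOURCE_SUFFIXES)
    from importlib.util import decode_source

    class PybrLoader(SourceFileLoader):
        # Translate before compiling; byte-compilation and .pyc caching
        # are inherited unchanged from SourceFileLoader, so cached
        # modules are ordinary .pyc files, as Terry suggests.
        def source_to_code(self, data, path):
            source = translate_keywords(decode_source(data))
            return compile(source, path, 'exec', dont_inherit=True)

    loader_details = [
        (PybrLoader, ['.pybr']),
        (SourceFileLoader, SOURCE_SUFFIXES),  # keep normal .py imports working
    ]
    sys.path_hooks.insert(0, FileFinder.path_hook(*loader_details))
    sys.path_importer_cache.clear()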

Terry Reedy writes:
I think this is the right way to go. We currently need, and will need for the medium term, coding cookies for legacy encoding support. I don't see why this shouldn't work the same way.
My original proposal in response to the OP was that language be encoded in the extension: pybr, for instance.
But there are a lot of languages. Once the ice is broken, I think a lot of translations will appear. So I think the variant extension approach is likely to get pretty ugly.
But the grammar is not being changed in the details; it's actually not being changed at all (with the one exception). If it's a one-to-one map at the keyword level, I don't see why there would be a problem. Of course there will be the occasional word order issue, as here with "is not", and that does involve changing the grammar.
Why not simply have a preprocessing func that translates back to standard/english python using a simple dict?
Because it's just not that simple, of course. You need to parse far enough to recognize strings, for example, and leave them alone. Since the parser doesn't detect unbalanced quotation marks in comments, you need to parse those too. You must parse import statements, because the file name might happen to be the equivalent of a keyword, and *not* translate those. There may be other issues, as well.
I don't think that ghettoization is that much more encouraged by this development than by PEP 263. It's always been possible to use non-English identifiers, even with languages normally not written in ASCII (there are several C identifiers in XEmacs that I'm pretty sure are obscenities in Latin and Portuguese, and I wouldn't be surprised if a similar device isn't occasionally used in Python programs<wink>), and of course comments have long been written in practically any ASCII-compatible coding you can name. I think it was Alex Martelli who contributed to the PEP 263 discussion a couple of rather (un)amusing stories about multinational teams where all of one nationality up and quit one day, leaving the rest of the team with copiously but unintelligibly documented code. In fact, AFAICS the fact that it's parsable as Python means that translated keywords aren't a problem at all, since that same parser can be adapted to substitute the English versions for you. That still leaves you with meaningless identifiers and comments, but as I say we already had those.
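[Editor's sketch: Stephen's "parse far enough" requirement is roughly what the stdlib tokenize module provides — strings and comments come through as single tokens and can be left alone. The sketch below reuses the hypothetical PT_TO_EN table from the earlier message; its import handling is the crude heuristic he warns about, shown only to mark where the real work lies.]

    import io
    import tokenize

    def translate_keywords(source, table=PT_TO_EN):
        # PT_TO_EN is the hypothetical table from the earlier sketch.
        result = []
        prev = ''
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            string = tok.string
            # Keywords arrive as NAME tokens at this level; STRING and
            # COMMENT tokens pass through untouched. Names right after
            # an import keyword or a dot are skipped, so module names
            # that collide with keywords are not translated.
            if (tok.type == tokenize.NAME and string in table
                    and prev not in ('import', 'from', '.')):
                string = table[string]
            prev = string
            result.append((tok.type, string))
        # Two-tuples put untokenize() into compatibility mode, which
        # preserves indentation but normalizes other spacing.
        return tokenize.untokenize(result)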

Stephen J. Turnbull wrote:
Would it be possible to use 2to3 for this? It wouldn't be perfect but it might be easier to scale a preprocessor to dozens of languages without freezing those users out of the ability to use standard English Python modules. Also, does anyone know if ChinesePython [1] ever caught on? (Hey, there's one case where you do NOT need to worry about keyword conflicts!) Looking at the homepage, it appears stuck at Python 2.1. But I don't know much Chinese, so I could be wrong. [1]: http://www.chinesepython.org/cgi_bin/cgb.cgi/english/english.html internationally-yrs, -- Carl

participants (5)
- Carl Johnson
- Nick Coghlan
- spir
- Stephen J. Turnbull
- Terry Reedy