Python 1.7 tokenization feature request
Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on.
Greg
Greg> Once 1.6 is out the door, would people be willing to consider
Greg> extending Python's token set to make HTML/XML-ish spellings using
Greg> entity references legal? This would make the following 100% legal
Greg> Python:
Greg>     i = 0
Greg>     while i &lt; 10:
Greg>         print i &amp; 1
Greg>         i = i + 1
What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written.
--
Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
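The round trip Skip describes can be sketched as follows. This is only an illustration, assuming modern Python 3: `html.escape` and `html.unescape` stand in for the `cgi.escape` of this thread and its missing inverse, and neither was available in Python 1.6. The `print` call is written in Python 3 form for the same reason.

```python
import html

# The example from the original request, written as ordinary Python 3 source.
source = "i = 0\nwhile i < 10:\n    print(i & 1)\n    i = i + 1\n"

# Escape before embedding in XML (quote=False leaves ' and " untouched).
escaped = html.escape(source, quote=False)

# The inverse step: turn entity references back into plain characters.
restored = html.unescape(escaped)
```

With this approach the escaping lives entirely in the embedding tool, and the Python tokenizer never sees an entity reference.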
gvwilson@nevex.com writes:
Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places when Python and XML syntax collide, as in this contrived example:
<![CDATA[
# Python code starts here
if a[index[1]]>b:
    print ...
Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange.
--
A.M. Kuchling http://starship.python.net/crew/amk/
Art history is the nightmare from which art is struggling to awake. -- Robert Fulford
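One way an XML-emitting tool might handle the `]]>` collision is to split the CDATA section wherever the terminator appears. The sketch below assumes modern Python 3 and its standard `xml.etree.ElementTree` parser; `wrap_in_cdata` is a hypothetical helper name, and the `<python>` element is borrowed from later in this thread purely for illustration.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]>" becomes "]]" + end of section + start of section + ">",
    # so the terminator never appears inside a single section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Legal Python that contains the CDATA terminator.
code = "if a[index[1]]>b:\n    print('bigger')\n"

document = "<python>" + wrap_in_cdata(code) + "</python>"

# The parser joins the adjacent sections back into one text node.
recovered = ET.fromstring(document).text
```

The escaping is done by the tool and undone by the parser, which is the division of labour Andrew argues for.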
Greg Wilson wrote: ...would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal?
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
Skip Montanaro wrote: What makes it difficult to pump your Python code through cgi.escape when embedding it?
Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats. Parsing the standard character entities seemed like a good first step toward catering to this (large) audience.
Andrew Kuchling wrote: I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere?
Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTDs, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start.
Andrew Kuchling also wrote: What about other places when Python and XML syntax collide, as in this contrived example:
<![CDATA[
# Python code starts here
if a[index[1]]>b:
    print ...
Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code.
Yup; that's one of the reasons I'd like to be able to write:
<python>
# Python code starts here
if a[index[1]]>b:
    print ...
</python>
Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange.
I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file.
thanks,
Greg
gvwilson@nevex.com writes:
Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on.
Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02. -- Dave
David M. Beazley wrote:
...and while we're at it, maybe we can add support for C trigraph sequences as well.
I don't know of any mass-market editors that generate C trigraphs.
...I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about.
'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program... I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up. Thanks, Greg
gvwilson@nevex.com writes:
'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines, edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry.
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.
What gives you the idea that Python is behind? What is it playing catch up to? -- Dave
'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program... I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.
But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future:
<class name="Foo">
  <method name="Foo">
    <argumentlist>
      <argument name="self">
      ...
What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA.
Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side.
Strawman Encoding # 2:
- do Strawman 1, AND
- replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors.
--david
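The encoding half of Strawman Encoding # 1 might look roughly like this. It is a sketch under stated assumptions, not a proposed standard: it uses the `tokenize` module of modern Python 3 to tell operators from string contents, `encode_operators` is a hypothetical name, and the decoding side is not shown.

```python
import io
import tokenize

# Replacement table for the two characters named in the strawman.
ENTITY = {"<": "&lt;", ">": "&gt;"}


def encode_operators(source):
    """Replace < and > in operator tokens only, leaving strings alone."""
    lines = source.splitlines(keepends=True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Work backwards so earlier column offsets stay valid after each edit.
    for tok in reversed(tokens):
        if tok.type == tokenize.OP and ("<" in tok.string or ">" in tok.string):
            row, start = tok.start
            end = tok.end[1]
            line = lines[row - 1]
            replacement = "".join(ENTITY.get(ch, ch) for ch in tok.string)
            lines[row - 1] = line[:start] + replacement + line[end:]
    return "".join(lines)


sample = 'if a < b:\n    print("a < b")\n'
encoded = encode_operators(sample)
```

As the strawman is worded, a `<` inside a string literal or comment is left untouched, so the result would still need a CDATA section or further escaping before it could sit in an XML document.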
David Ascher wrote: But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave's response.
Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-) Greg
gvwilson@nevex.com wrote:
David Ascher wrote: But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave's response.
Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-)
This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly... Then you could redirect the compile() arguments to whatever codec you wish (e.g. an SGML entity codec) and the builtin compiler would only see the output of that codec.
Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
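The idea of decoding before the compiler sees the source can be approximated without any hook. The sketch below assumes modern Python 3; `compile_with_entities` is a hypothetical wrapper, not an existing hook in the sys module, and `html.unescape` stands in for the entity codec.

```python
import html


def compile_with_entities(source, filename="<entity-encoded>", mode="exec"):
    """Decode entity references, then hand plain source to compile()."""
    # Caveat: this decodes entities everywhere, including inside string
    # literals, so a literal "&lt;" in the program text would be changed too.
    decoded = html.unescape(source)
    return compile(decoded, filename, mode)


encoded_source = (
    "total = 0\n"
    "for i in range(10):\n"
    "    if i &lt; 5:\n"
    "        total = total + (i &amp; 1)\n"
)

namespace = {}
exec(compile_with_entities(encoded_source), namespace)
```

The builtin compiler only ever sees the decoded text, so the language itself stays unchanged.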
David Ascher wrote: But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave's response.
Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-)
Greg
If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but &lt; is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation).
--david
On Mon, 13 Mar 2000, David Ascher wrote:
If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing.
Indeed. I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu
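A structured representation of this kind can be sketched with tools that postdate this thread. The example assumes modern Python 3 with its `ast` and `xml.etree.ElementTree` modules; `node_to_xml` is a hypothetical helper, and the element and attribute names simply mirror the `ast` class and field names rather than any agreed schema. Only the source-to-XML direction is shown.

```python
import ast
import xml.etree.ElementTree as ET


def node_to_xml(node):
    """Convert an ast node into an ElementTree element, recursively."""
    element = ET.Element(type(node).__name__)
    for name, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node gets a wrapper element named after the field.
            ET.SubElement(element, name).append(node_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(element, name)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(node_to_xml(item))
                else:
                    ET.SubElement(wrapper, "value").text = repr(item)
        elif value is not None:
            # Plain values (names, constants) become attributes.
            element.set(name, repr(value))
    return element


tree = ast.parse("while i < 10:\n    i = i + 1\n")
root = node_to_xml(tree)
xml_text = ET.tostring(root, encoding="unicode")
```

In this form the comparison is an `Lt` element rather than a `<` character, so nothing in the program collides with XML syntax.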
Greg:
Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-)
well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework. </F>
Greg wrote:
...I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about.
'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program... I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.
I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing? </F>
gvwilson@nevex.com wrote:
'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program... I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.
Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days. The XMetaL competitor, Documentor, has an API specifically designed to make this sort of thing easy. Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax.
--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant
You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example.
<MACRO name="Revert To Saved" lang="JScript" id="90" desc="Opens last saved version of the current document">
<![CDATA[
if (!ActiveDocument.Saved) {
  retVal = Application.Confirm("If you continue you will lose changes to this document.\nDo you want to revert to the last-saved version?");
  if (retVal) {
    ActiveDocument.Reload();
  }
}
]]></MACRO>
This is in "journalist.mcr" in the "Macros" folder of XMetaL. This already works fine for Python. You change lang="Python" and thanks to the benevolence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling!
--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant
On Mon, 13 Mar 2000 gvwilson@nevex.com wrote:
Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on.
Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so
the Python that comes out of the XML parser is quite all right. Why change
Python to do an XML parser job?
--
Moshe Zadka
On Tue, 14 Mar 2000, Moshe Zadka wrote:
On Mon, 13 Mar 2000 gvwilson@nevex.com wrote:
legal? This would make the following 100% legal Python:
    i = 0
    while i &lt; 10:
        print i &amp; 1
        i = i + 1
Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job?
I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu
participants (10)
- Andrew M. Kuchling
- David Ascher
- David M. Beazley
- Fredrik Lundh
- gvwilson@nevex.com
- Ka-Ping Yee
- M.-A. Lemburg
- Moshe Zadka
- Paul Prescod
- Skip Montanaro