Enabling access to the AST for Python code

Hi Python Ideas folks,

(I previously posted a similar message on Python-Dev, but it's a better fit for this list. See that thread here: https://mail.python.org/pipermail/python-dev/2015-May/140063.html)

Enabling access to the AST for compiled code would make some cool things possible (C# LINQ-style ORMs, for example), and not knowing too much about this part of Python internals, I'm wondering how possible and practical this would be.

Context: PonyORM (http://ponyorm.com/) allows you to write regular Python generator expressions like this:

    select(c for c in Customer if sum(c.orders.price) > 1000)

which compile into and run SQL like this:

    SELECT "c"."id"
    FROM "Customer" "c"
    LEFT JOIN "Order" "order-1" ON "c"."id" = "order-1"."customer"
    GROUP BY "c"."id"
    HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

I think the Pythonic syntax here is beautiful. But the tricks PonyORM has to go through to get it are ... not quite so beautiful. Because the AST is not available, PonyORM decompiles Python bytecode into an AST first, and then converts that to SQL. (More details in the author's EuroPython talk at http://pyvideo.org/video/2968)

PonyORM needs the AST just for generator expressions and lambda functions, but obviously if this kind of AST access feature were in Python it'd probably be more general. I believe C#'s LINQ provides something similar, where if you're developing a LINQ converter library (say LINQ to SQL), you essentially get the AST of the code ("expression tree") and the library can do what it wants with that.

(I know that there's the "ast" module and ast.parse(), which can give you an AST given a *source string*, but that's not very convenient here.)

What would it take to enable this kind of AST access in Python? Is it possible? Is it a good idea?

-Ben
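(For concreteness, here's a minimal sketch of what ast.parse() produces for that query when you do have it as a source string -- which is exactly what PonyORM doesn't have at runtime, hence the bytecode decompilation. The select/Customer names are just the example above, not a real API.)

    import ast

    # The query as a plain source string, purely for illustration --
    # at runtime PonyORM only has the already-compiled generator object.
    source = "select(c for c in Customer if sum(c.orders.price) > 1000)"
    tree = ast.parse(source, mode="eval")

    # The generator expression is the first argument of the select() call.
    genexp = tree.body.args[0]
    print(type(genexp).__name__)   # GeneratorExp
    print(ast.dump(genexp.elt))    # Name(id='c', ctx=Load())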

On Thu, May 21, 2015 at 6:18 PM, Ben Hoyt <benhoyt@gmail.com> wrote:
What concretely are you imagining? I can imagine lots of possibilities with pretty different properties... e.g., one could have an '.ast' attribute attached to every code object, which always tracks the source that the code was compiled from. Or one could add a new (quasi)quoting syntax, like 'select(! c for c in Customer if sum(c.orders.price) > 1000)' where ! is a low-priority operator that simply returns the AST of whatever is written to the right of it. Or... lots of things, probably. -n -- Nathaniel J. Smith -- http://vorpus.org

Not knowing too much about interpreter internals, I guess I was fishing somewhat for the range of possibilities. :-) But I was definitely thinking more along the lines of a "co_ast" attribute on code objects. The new syntax approach might be fun, but I'd think it's a lot more challenging and problematic to add new syntax. -Ben On Thu, May 21, 2015 at 9:40 PM, Nathaniel Smith <njs@pobox.com> wrote:

On 22/05/2015 1:57 p.m., Ben Hoyt wrote:
Advantages of new syntax:
* More flexible: Any expression can be made into an AST, not just lambdas or genexps.
* More efficient: No need to carry an AST around with every code object, the vast majority of which will never be used.

Disadvantages of new syntax:
* All the disadvantages of new syntax.

-- Greg

On May 21, 2015, at 18:18, Ben Hoyt <benhoyt@gmail.com> wrote:
Why not? Python modules are distributed as source. You can pretty easily write an import hook to intercept module loading at the AST level and transform it however you want. Or just use MacroPy, which wraps up all the hard stuff (especially 2.x compatibility) and provides a huge framework of useful tools. What do you want to do that can't be done that way? For many uses, you don't even have to go that far--code objects remember their source file and line number, which you can usually use to retrieve the text and regenerate the AST.

On Thu, May 21, 2015 at 9:22 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
* MacroPy looks interesting * PonyORM -> SQLAlchemy http://dask.pydata.org/en/latest/array-blaze.html#why-use-blaze """These different projects (Blaze -> dask.array -> NumPy -> Numba) act as
... http://continuum.io/blog/blaze """Once a graph is evaluated, Blaze attempts to gather all available type

On 3 July 2015 at 10:55, Wes Turner <wes.turner@gmail.com> wrote:
* MacroPy looks interesting * PonyORM -> SQLAlchemy
Wes, it would be helpful if you could provide some context and rationale for links and cryptic bullet points when you post them, rather than expecting us all to follow precisely how you believe they relate to the topic of discussion. In this particular case, I think I can *personally* guess what you meant, but I'm also already familiar with all of the projects you mentioned. For folks without that background, such brief notes are going to be much harder to interpret :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Huh, interesting idea. I've never used import hooks. Looks like the relevant macropy source code is here: https://github.com/lihaoyi/macropy/blob/master/macropy/core/import_hooks.py

So basically you would do the following:
1) intercept the import
2) find the source code file yourself and read it
3) call ast.parse() on the source string
4) do anything you want to the AST, for example turn the "select(c for c in Customer if sum(c.orders.price) > 1000)" into whatever SQL or other function calls
5) pass the massaged AST to compile(), execute it and return the module

Hmmm, yeah, I think you're basically suggesting macro-like processing of the AST. Pretty cool, but not quite what I was thinking of ... I was thinking select() would get an AST object at runtime and do stuff with it. -Ben

On Thu, May 21, 2015 at 9:51 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
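(As a rough, hedged sketch of steps 3-5 above -- not what MacroPy itself does, and the load_transformed name is made up -- the parse/transform/compile/exec pipeline looks something like this, with the import-hook plumbing of steps 1-2 left out:)

    import ast
    import types

    def load_transformed(path, transformer):
        # Steps 2-3: read the source file and parse it into an AST.
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        # Step 4: let an ast.NodeTransformer rewrite the tree however it wants.
        tree = ast.fix_missing_locations(transformer.visit(tree))
        # Step 5: compile the massaged AST and execute it as a module.
        module = types.ModuleType("transformed")
        exec(compile(tree, path, "exec"), module.__dict__)
        return module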

Why would it require "a lot of extra memory"? A program's text size is measured in megabytes, and the AST is typically more compact than the code as text. A few megabytes is nothing. Best, Neil

On 3 July 2015 at 06:25, Neil Girdhar <mistersheik@gmail.com> wrote:
It's more complicated than that. What happens when we multiply that "nothing" by 10,000 concurrent processes across multiple servers? Is it still nothing? How about 10,000,000?

What does keeping the extra data around do to our CPU level cache efficiency? Is there a key data structure we're adding a new pointer to? What does *that* do to our performance?

Where are the AST objects being kept? Do they become part of the serialised form of the affected object? If yes, what does that do to the wire protocol overhead for inter-process communication, or to the size of cached bytecode files? If no, does that mean these objects may be missing the AST data when deserialised?

When we're talking about sufficiently central data structures, a few *bytes* can end up counting as "a lot". Code and function objects aren't quite *that* central (unlike, say, tuple instances), but adding things to them can still have a significant impact (hence the ability to avoid creating docstrings).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Jul 3, 2015 at 6:20 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess we find a way to share data between the processes?
Why would a few megabytes of data affect your CPU level cache? If I have a Python program that generates a data structure that's a few megabytes, does it slow down the rest of the program?
When do you send code objects on the wire? I'm not even sure if pickle supports that yet.
Thanks, I'm interested in learning more about this. There are a lot of messages in this discussion. Was there a final consensus about how the AST for a given code object should be calculated? Was it re-parsing the source? Was it an import hook? Something else? I want to do this with a personal project. I realize we may not get the AST by default, but it would be nice to know how I should best determine it myself.

On Jul 3, 2015, at 13:42, Neil Girdhar <mistersheik@gmail.com> wrote:
There are a lot of messages in this discussion. Was there a final consensus about how the AST for a given code object should be calculated?
I think it depends on what exactly you're trying to do, but using an import hook means you can call compile or ast.parse once and keep it around as well as using it for the compile, so that seems like it should be a good solution for most uses.

On Sat, Jul 4, 2015 at 3:46 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
Yes, the MacroPy library uses this approach (import hooks to get and modify the AST): https://github.com/lihaoyi/macropy -Ben

As I remember, the proposal is or would have to be to give code objects a new attribute -- co_ast. This would require an addition to marshal to compress and uncompress asts. It would expand both on-disk .pyc files and even more, in-memory code objects. On 7/2/2015 4:25 PM, Neil Girdhar wrote:
Why do you think that? Each text token becomes a node object that is a minimum of 56 bytes (on my 64-bit Win7 3.5). For instance, a one-byte '+' (in all-ascii code) balloons to at least 56 bytes in the ast and is compiled back down to 1 byte in the byte code. I expect the uncompressed in-memory size of asts to be several times the current size of corresponding code objects. -- Terry Jan Reedy
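(If you want to check those numbers on your own build -- they vary by platform and Python version, and sys.getsizeof doesn't count each node's attribute dict -- something like this gives a rough idea:)

    import ast
    import sys

    tree = ast.parse("a + 1", mode="eval")
    for node in ast.walk(tree):
        print(type(node).__name__, sys.getsizeof(node))
    # Expression, BinOp, Name, Add, Constant/Num (depending on version),
    # each several dozen bytes before counting child nodes and attributes.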

Oh wait, macropy already has this exact thing. They call it PINQ (kinda Python LINQ), and they're macro-compiling it to SQLAlchemy calls. https://github.com/lihaoyi/macropy#pinq-to-sqlalchemy Wow. -Ben On Thu, May 21, 2015 at 10:10 PM, Ben Hoyt <benhoyt@gmail.com> wrote:

On May 21, 2015, at 19:15, Ben Hoyt <benhoyt@gmail.com> wrote:
I didn't even realize he'd included this when suggesting MacroPy. :) Anyway, most of his macros are pretty easy to read as sample code, so even if what he's done isn't exactly what you wanted, it should be a good foundation.
If you wanted to do this yourself, and only need to support 3.4+, it's a lot easier than the way MacroPy does it. But of course it's even easier to just use MacroPy.
If you really want to, you can build a trivial import hook that just attaches the ASTs (to everything, or only to specific code) and then ignore the code and process the AST at runtime. If you actually need to use runtime information in the processing, that might be worth it, but otherwise it seems like you're just wasting time transforming and compiling the AST on every request. Of course you could build in a cache if the information isn't really dynamic, but in that case, using the code object and .pyc as a cache is a lot simpler and probably more efficient.

On Thu, May 21, 2015 at 10:10 PM Ben Hoyt <benhoyt@gmail.com> wrote:
Depending on what version of Python you are targeting, it's actually simpler than that even to get it into the import system:

1. Subclass importlib.machinery.SourceFileLoader <https://docs.python.org/3/library/importlib.html#importlib.machinery.SourceF...> and override source_to_code() <https://docs.python.org/3/library/importlib.html#importlib.abc.InspectLoader...> to do your AST transformation and return your changed code object (basically your steps 3-5 above)
2. Set a path hook that uses an instance of importlib.machinery.FileFinder <https://docs.python.org/3/library/importlib.html#importlib.machinery.FileFin...> which utilizes your custom loader
3. There is no step 3

I know this isn't what you're after, but I just wanted to let you know importlib has made this sort of thing fairly trivial to implement.

-Brett
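(A hedged sketch of those two steps, assuming Python 3.4+; the no-op AST rewrite is a placeholder and TransformingLoader is just an illustrative name:)

    import ast
    import sys
    from importlib.machinery import FileFinder, SourceFileLoader, SOURCE_SUFFIXES

    class TransformingLoader(SourceFileLoader):
        def source_to_code(self, data, path, *, _optimize=-1):
            # Source -> AST -> (rewritten) AST -> code object.
            tree = ast.parse(data, filename=path)
            # ... transform `tree` here; this sketch leaves it untouched ...
            return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)

    # Install a path hook using a FileFinder that knows about our loader.
    loader_details = (TransformingLoader, SOURCE_SUFFIXES)
    sys.path_hooks.insert(0, FileFinder.path_hook(loader_details))
    sys.path_importer_cache.clear()   # drop finders cached before the hook existed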

On Thu, May 21, 2015 at 06:51:34PM -0700, Andrew Barnert via Python-ideas wrote:
*Some* Python modules are distributed as source. Don't forget that byte-code only modules are officially supported. Functions may also be constructed dynamically, at runtime. Closures may have source code available for them, but functions and methods constructed with exec (such as those in namedtuples) do not. Also, the interactive interpreter is a very powerful tool, but it doesn't record the source code of functions you type into it. So there are at least three examples where the source is not available at all.

Ben also talks about *convenience*: `func.ast` will always be more convenient than:

    import ast
    import inspect
    ast.parse(inspect.getsource(func))

not to mention the wastefulness of parsing something which has already been parsed before. On the other hand, keeping the ast around even when it's not used wastes memory, so this is a classic time/space trade off.
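(As a runnable illustration of both halves of that -- the re-parsing route, and a case where there is simply no source to re-parse; f and g are throwaway examples, and textwrap.dedent is only needed when the function was defined inside a class or another function:)

    import ast
    import inspect
    import textwrap

    def f(x):
        return x + 1

    tree = ast.parse(textwrap.dedent(inspect.getsource(f)))
    print(type(tree.body[0]).__name__)   # FunctionDef

    ns = {}
    exec("def g(x): return x - 1", ns)
    try:
        inspect.getsource(ns["g"])       # no source file behind exec'd code
    except OSError:
        print("no source available for g")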
Let's have a look at yours then, that ought to only take a minute or three :-) (That's my definition of "pretty easily".) I think that the majority of Python programmers have no idea that you can even write an import hook at all, let alone how to do it. -- Steve

On May 21, 2015, at 19:44, Steven D'Aprano <steve@pearwood.info> wrote:
By comparison, code objects that don't carry around their AST include everything running in any version of Python except maybe a future version that'll be out in a year and a half, if this idea gets accepted, and probably only in CPython. Plus, I'm pretty sure people would demand the ability to not waste memory and disk space on ASTs when they don't need them, so they still wouldn't be always available.
Does "import macropy" count? That only took me a second or three. :) Certainly a _lot_ easier than hacking the CPython source, even for something as trivial as adding a new member to the code object and finding all the places to attach the AST.
Sure, because they have no need to do so. But it's very easy to learn. Especially after the changes in 3.3 and again in 3.4. During the discussion on Unicode operators that turned into a discussion on a Unicode empty set literal, I suggested an import hook, someone (possibly you?) challenged me to write one if it was so easy, and it took me under half an hour to learn the 3.4 system and implement one. (I'm sure it would be a lot faster this time. But probably not on my phone...) All that work to improve the import system really did pay off. By comparison, hacking in new syntax to CPython to play with operator sectioning yesterday took me about four hours. And of course anyone can download and use my import hook to get the empty set literal in any standard Python 3.4 or later, but anyone who wants to use my operator sectioning hacks has to clone my fork and build and install a new interpreter.

redirecting py-dev thread here On 05/21/2015 07:06 PM, Greg wrote:
On 22/05/2015 1:33 p.m., Ethan Furman wrote:
Ah, I think I see -- that 'select' isn't really doing anything is it? The 'if' clause is acting as the 'select' in the gen-exp. But then `sum(c.orders.price)` isn't really Python semantics either, is it... although it could be if it was souped up -- `c.orders` would have to return a customer-based object that was smart enough to return a list of whatever attribute was asked for. That'd be cool. -- ~Ethan~
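(Just to illustrate that last idea -- a made-up sketch, not how PonyORM actually works; AttributeList and Order are invented names:)

    class AttributeList(list):
        # Asking the list for an attribute answers with the list of that
        # attribute across its members, e.g. orders.price -> [o.price for o in orders].
        def __getattr__(self, name):
            return AttributeList(getattr(item, name) for item in self)

    class Order:
        def __init__(self, price):
            self.price = price

    orders = AttributeList([Order(300), Order(800)])
    print(sum(orders.price))   # 1100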

This sounds like a cool feature, though I'm not sure if exposing the AST directly on the code object is the best approach. Attaching the AST to the code object implies serializing (and deserializing into nicely sparse heap allocations) it via .pyc files, since code objects are marshalled there.

What about improving the parser so that exact start/end positions are recorded for function bodies? This might be represented as two cheap integers in RAM, allowing a helper function in the compiler or inspect modules (inspect.ast()?) to handle the grunt work. Implementations like MicroPython could just stub out those fields with -1 or whatever else if desired.

One upside to direct attachment would be that a function returned by e.g. eval() with no underlying source file would still have its AST attached, without the caller having to keep hold of the unparsed string, but the downside of RAM/disk/potentially hefty deserialization performance seems to outweigh that.

I also wish there was a nicer way of introducing an expression that was to be represented as an AST, but I think that would involve adding another language keyword, and simply overloading the meaning of generators slightly seems preferable to that. :)

David

On Thu, May 21, 2015 at 09:18:24PM -0400, Ben Hoyt wrote:

On 22/05/2015 02:18, Ben Hoyt wrote:
Hi Python Ideas folks,
[snipped to death]
-Ben
You might find this interesting https://github.com/Psycojoker/baron The introduction states "Baron is a Full Syntax Tree (FST) library for Python. By opposition to an AST which drops some syntax information in the process of its creation (like empty lines, comments, formatting), a FST keeps everything and guarantees the operation fst_to_code(code_to_fst(source_code)) == source_code.". -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
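(A tiny sketch of that round-trip guarantee, assuming Baron's parse/dumps entry points as described in its README -- untested here, and Baron has to be installed separately:)

    import baron

    source = "x = 1  # keep this comment\n"
    fst = baron.parse(source)          # full syntax tree, comments and all
    assert baron.dumps(fst) == source  # the round-trip guarantee quoted above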

participants (15)
- Andrew Barnert
- Ben Hoyt
- Brett Cannon
- David Wilson
- Ethan Furman
- Greg
- Guido van Rossum
- Mark Lawrence
- Nathaniel Smith
- Neil Girdhar
- Nick Coghlan
- Steven D'Aprano
- Terry Reedy
- Wes Turner
- Yury Selivanov