[Python-ideas] Enabling access to the AST for Python code

Neil Girdhar mistersheik at gmail.com
Fri Jul 3 22:42:55 CEST 2015

On Fri, Jul 3, 2015 at 6:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 3 July 2015 at 06:25, Neil Girdhar <mistersheik at gmail.com> wrote:
> > Why would it require "a lot of extra memory"?  A program text size is
> > measured in megabytes, and the AST is typically more compact than the
> > code as text.  A few megabytes is nothing.
> It's more complicated than that.
> What happens when we multiply that "nothing" by 10,000 concurrent
> processes across multiple servers? Is it still nothing? How about
> 10,000,000?

I guess we find a way to share data between the processes?

> What does keeping the extra data around do to our CPU level cache
> efficiency? Is there a key data structure we're adding a new pointer
> to? What does *that* do to our performance?

Why would a few megabytes of data affect your CPU level cache?  If I have a
Python program that generates a data structure that's a few megabytes, does
it slow down the rest of the program?

> Where are the AST objects being kept? Do they become part of the
> serialised form of the affected object? If yes, what does that do to
> the wire protocol overhead for inter-process communication, or to the
> size of cached bytecode files? If no, does that mean these objects may
> be missing the AST data when deserialised?

When do you send code objects on the wire?  I'm not even sure if pickle
supports that yet.
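
A quick way to check the pickle question is sketched below. It assumes stock
CPython 3 with no custom copyreg reducers installed. There, pickle rejects
code objects with a TypeError, while marshal (the format used for cached
bytecode files) serialises them. The names add_one and pickle_supports_code
are made up for illustration.

```python
import marshal
import pickle
import types


def add_one(x):
    return x + 1


def pickle_supports_code(code_obj):
    """Return True if pickle can serialise the given code object."""
    try:
        pickle.dumps(code_obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


code = add_one.__code__

# marshal handles code objects; this is what .pyc files rely on.
payload = marshal.dumps(code)

# Rebuild a callable from the round-tripped code object.
restored = types.FunctionType(marshal.loads(payload), globals())

print("pickle supports code objects:", pickle_supports_code(code))
print("restored(41) =", restored(41))
```

Note that marshal data is specific to the Python version that wrote it, so
this is only suitable between identical interpreters.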

> When we're talking about sufficiently central data structures, a few
> *bytes* can end up counting as "a lot". Code and function objects
> aren't quite *that* central (unlike, say, tuple instances), but adding
> things to them can still have a significant impact (hence the ability
> to avoid creating docstrings).

Thanks, I'm interested in learning more about this.

There are a lot of messages in this discussion.  Was there a final
consensus about how the AST for a given code object should be calculated?
Was it re-parsing the source?  Was it an import hook?  Something else?  I
want to do this with a personal project.  I realize we may not get the AST
by default, but it would be nice to know how I should best determine it.
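
For reference, the first option listed above (re-parsing the source) can be
sketched as follows. This is an illustration under assumptions, not the
consensus of the thread: it assumes the source file is still on disk so that
inspect.getsource can find it, and ast_for is a hypothetical helper name.

```python
import ast
import inspect
import textwrap


def ast_for(obj):
    """Re-parse the source of a function or code object into an AST.

    inspect.getsource raises OSError when the source cannot be found
    and TypeError for built-in objects.
    """
    source = inspect.getsource(obj)
    # Methods and nested functions are indented; dedent so ast.parse accepts them.
    return ast.parse(textwrap.dedent(source))


# Demo on a pure-Python stdlib function, whose source ships with CPython.
tree = ast_for(textwrap.dedent)
func_node = tree.body[0]
print(type(func_node).__name__, func_node.name)
```

inspect.getsource also accepts a code object directly, so
ast_for(some_function.__code__) should work the same way. Lambdas and code
created with exec or compile from a string are likely to need different
handling, since there is no clean source block to recover.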

> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia