On Fri, Jul 3, 2015 at 6:20 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 July 2015 at 06:25, Neil Girdhar <mistersheik@gmail.com> wrote:
> Why would it require "a lot of extra memory"?  A program text size is
> measured in megabytes, and the AST is typically more compact than the code
> as text.  A few megabytes is nothing.

It's more complicated than that.

What happens when we multiply that "nothing" by 10,000 concurrent
processes across multiple servers? Is it still nothing? How about
10,000,000?

I guess we find a way to share data between the processes?
 

What does keeping the extra data around do to our CPU level cache
efficiency? Is there a key data structure we're adding a new pointer
to? What does *that* do to our performance?

Why would a few megabytes of data affect your CPU-level cache?  If I have a Python program that generates a data structure that's a few megabytes, does it slow down the rest of the program?
 

Where are the AST objects being kept? Do they become part of the
serialised form of the affected object? If yes, what does that do to
the wire protocol overhead for inter-process communication, or to the
size of cached bytecode files? If no, does that mean these objects may
be missing the AST data when deserialised?

When do you send code objects on the wire?  I'm not even sure if pickle supports that yet.
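
One quick way to check the pickle question is sketched below. It assumes CPython 3 with no custom copyreg reducers registered; the function greet is made up for illustration. As far as I know, marshal (the format used for cached bytecode files) round-trips code objects, while pickle is expected to refuse them by default.

```python
import marshal
import pickle
import types


def greet(name):
    return "hello " + name


code = greet.__code__

# marshal is the serialisation used for .pyc files, so it handles code objects.
payload = marshal.dumps(code)
restored = marshal.loads(payload)
marshal_ok = restored.co_name == "greet"

# Rebuild a callable from the restored code object to show it is usable.
rebuilt = types.FunctionType(restored, globals())

# pickle has no default reducer for code objects, so this is expected to fail.
try:
    pickle.dumps(code)
    pickle_ok = True
except (TypeError, pickle.PicklingError):
    pickle_ok = False

print("marshal round-trip:", marshal_ok, "| pickle:", pickle_ok)
```

If that holds, anything attached to a code object would show up in marshal output (and so in cached bytecode files) before it shows up in pickled data.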

When we're talking about sufficiently central data structures, a few
*bytes* can end up counting as "a lot". Code and function objects
aren't quite *that* central (unlike, say, tuple instances), but adding
things to them can still have a significant impact (hence the ability
to avoid creating docstrings).
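
As a minimal sketch of the docstring point, assuming CPython 3: compiling with optimize=2 (the equivalent of running with -OO) drops docstrings, so they never get attached to the function. The source string and the helper name docstring_after_compile are made up for illustration.

```python
source = '''
def f():
    """A docstring that costs memory for every loaded copy of f."""
    return 1
'''


def docstring_after_compile(optimize):
    # Compile and run the module source at the given optimisation level,
    # then report what ended up as f.__doc__.
    namespace = {}
    exec(compile(source, "<example>", "exec", optimize=optimize), namespace)
    return namespace["f"].__doc__


kept = docstring_after_compile(0)      # default: docstring retained
stripped = docstring_after_compile(2)  # like -OO: docstring discarded

print(repr(kept))
print(repr(stripped))
```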

Thanks, I'm interested in learning more about this.  

There are a lot of messages in this discussion.  Was there a final consensus about how the AST for a given code object should be calculated?  Was it re-parsing the source?  Was it an import hook?  Something else?  I want to do this with a personal project.  I realize we may not get the AST by default, but it would be nice to know how I should best determine it myself.
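
For the re-parsing option, here is a rough sketch of what I have in mind. It is not the thread's conclusion, just one of the approaches mentioned. It assumes the source file is still available on disk (inspect.getsource raises OSError otherwise), and ast_for_code is a hypothetical helper name. It matches on co_name, so it will not find lambdas or comprehensions.

```python
import ast
import inspect
import textwrap


def ast_for_code(code):
    """Re-parse the source behind a code object and return its def node."""
    # getsource accepts code objects; dedent handles methods and nested defs.
    source = textwrap.dedent(inspect.getsource(code))
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_def and node.name == code.co_name:
            return node
    return None


# Example: recover the AST for a standard-library function's code object.
node = ast_for_code(textwrap.dedent.__code__)
print(type(node).__name__, node.name)
```

Line numbers in the returned node are relative to the extracted snippet rather than the original file, so they would need offsetting by co_firstlineno if that matters.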
 

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia