[Python-ideas] Enabling access to the AST for Python code

Nick Coghlan ncoghlan at gmail.com
Fri Jul 3 12:20:14 CEST 2015


On 3 July 2015 at 06:25, Neil Girdhar <mistersheik at gmail.com> wrote:
> Why would it require "a lot of extra memory"?  A program's text size is
> measured in megabytes, and the AST is typically more compact than the
> code as text.  A few megabytes is nothing.

It's more complicated than that.

What happens when we multiply that "nothing" by 10,000 concurrent
processes across multiple servers? Is it still nothing? How about
10,000,000?
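
For a rough sense of the baseline numbers, here's a sketch of how one
might compare a module's source text with its AST footprint. It's only
approximate: the recursive sys.getsizeof walk counts the node objects
but not their field values, so it understates the real cost, and
"some_module.py" is just a placeholder path:

    import ast, sys

    def ast_size(node):
        # Rough: counts the node objects themselves, not their field values
        return sys.getsizeof(node) + sum(
            ast_size(child) for child in ast.iter_child_nodes(node)
        )

    source = open("some_module.py").read()   # placeholder path
    tree = ast.parse(source)
    print("source text:", sys.getsizeof(source), "bytes")
    print("AST nodes:  ", ast_size(tree), "bytes (plus field values)")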

What does keeping the extra data around do to our CPU-level cache
efficiency? Is there a key data structure we're adding a new pointer
to? What does *that* do to our performance?

Where are the AST objects being kept? Do they become part of the
serialised form of the affected object? If yes, what does that do to
the wire protocol overhead for inter-process communication, or to the
size of cached bytecode files? If no, does that mean these objects may
be missing the AST data when deserialised?
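
For reference, the body of a cached bytecode file is essentially just
the marshalled top-level code object, so anything new attached to code
objects would show up directly in that payload. A quick sketch for
eyeballing the current sizes ("some_module.py" is again a placeholder):

    import marshal

    source = open("some_module.py").read()            # placeholder path
    code = compile(source, "some_module.py", "exec")
    payload = marshal.dumps(code)                     # roughly the .pyc body
    print("marshalled code object:", len(payload), "bytes")
    print("source text:           ", len(source), "bytes")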

When we're talking about sufficiently central data structures, a few
*bytes* can end up counting as "a lot". Code and function objects
aren't quite *that* central (unlike, say, tuple instances), but adding
things to them can still have a significant impact (hence the ability
to avoid creating docstrings).
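
The docstring case is the existing precedent: compiling at optimization
level 2 (what "python -OO" gives you) simply never creates them. A
minimal illustration:

    src = 'def f():\n    "A docstring that takes up space."\n    return 1\n'

    normal   = compile(src, "<demo>", "exec")
    stripped = compile(src, "<demo>", "exec", optimize=2)   # what -OO does

    ns = {}
    exec(stripped, ns)
    print(ns["f"].__doc__)    # None: the docstring was never created

    ns = {}
    exec(normal, ns)
    print(ns["f"].__doc__)    # present at the default optimization level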

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

