[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Wed Jun 27 11:12:03 EDT 2018

The OP says "crash" (implying some kind of segfault) but here the
snippet raises a mere exception:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
s_push: parser stack overflow
MemoryError
>>> 
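
To check from a script whether a given nesting depth is accepted, something like the following might do. It is a minimal sketch, assuming Python 3; `nested_call` and `compiles` are hypothetical helper names, and which exception is raised depends on the interpreter version (the session above shows MemoryError on 2.7, other versions may raise SyntaxError or RecursionError instead).

```python
def nested_call(depth):
    """Build the reproducer shape: `depth` levels of A([...]) around A(None)."""
    return "A([" * depth + "A(None)" + "])" * depth


def compiles(source):
    """Return True if `source` compiles, False if the parser rejects it.

    compile() only parses and compiles the text; it never calls A, so A does
    not need to exist.
    """
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, MemoryError, RecursionError):
        # The exact exception type varies between interpreter versions.
        return False
    return True


print(compiles(nested_call(1)))     # shallow nesting
print(compiles(nested_call(1000)))  # far deeper than the reproducer
```

A code generator could run such a check on its own output before shipping it, although that only tells you about the interpreter version you tested with.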

Regards

Antoine.

On Wed, 27 Jun 2018 08:04:06 -0700
Guido van Rossum <guido at python.org> wrote:
> I consider this a bug -- a violation of Python's (informal) promise to
> the user that when CPython segfaults it is not the user's fault.
> 
> Given typical Python usage patterns, I don't consider this an important
> bug, but maybe someone is interested in trying to fix it.
> 
> As far as your application is concerned, I'm not sure that generating code
> like that is the right approach. Why don't you generate a data structure
> and a little engine that walks the data structure?
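
One possible reading of the "data structure plus a little engine" suggestion, as a rough sketch only. The node layout, the field names (`feature`, `threshold`, `low`, `high`, `leaf`) and the `evaluate` function are all assumptions made up for illustration; nothing in the thread specifies them. Nodes are kept in a flat list and refer to their children by index, so neither the serialized tree nor the walker nests or recurses with tree depth.

```python
import json

# Hypothetical flat layout: one list entry per node, children referenced by
# index. The serialized text never nests deeper than a list of objects.
TREE_JSON = """
[
  {"feature": "size", "threshold": 100, "low": 1, "high": 2},
  {"leaf": "accept"},
  {"feature": "rate", "threshold": 5, "low": 1, "high": 3},
  {"leaf": "drop"}
]
"""


def evaluate(nodes, record):
    """Walk the tree iteratively from node 0 until a leaf is reached."""
    index = 0
    while "leaf" not in nodes[index]:
        node = nodes[index]
        # Values below the threshold follow "low", all others follow "high".
        branch = "low" if record[node["feature"]] < node["threshold"] else "high"
        index = node[branch]
    return nodes[index]["leaf"]


nodes = json.loads(TREE_JSON)
print(evaluate(nodes, {"size": 200, "rate": 9}))
```

Decision elements that need custom code could be handled by storing a predicate name in the node and looking it up in a dict of functions on the sensor node, instead of the fixed threshold comparison used here.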
> 
> On Wed, Jun 27, 2018 at 12:05 AM Fiedler Roman <Roman.Fiedler at ait.ac.at>
> wrote:
> 
> > Hello List,
> >
> > Context: we are conducting machine learning experiments that generate some
> > kind of nested decision trees. As the tree includes specific decision
> > elements (which require custom code to evaluate), we decided to store the
> > decision tree (result of the analysis) as generated Python code. Thus the
> > decision tree can be transferred to sensor nodes (detectors) that will then
> > filter data according to the decision tree when executing the given code.
> >
> > Tracking down a crash when executing that generated code, we arrived at
> > the following simplified reproducer, which causes the interpreter to crash
> > (on Python 2/3) when loading the code, before execution has started:
> >
> > #!/usr/bin/python2 -BEsStt
> >
> > A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
> >
> > The error message is:
> >
> > s_push: parser stack overflow
> > MemoryError
> >
> > Despite the machine having 16GB of RAM, the code cannot be loaded.
> > Splitting it into two lines using an intermediate variable is the current
> > workaround to get it running after manual adaptation.
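
The intermediate-variable workaround can also be applied by the generator itself. A minimal sketch, assuming the generated expression has the same shape as the reproducer; `emit_flat`, the `v0`, `v1`, ... variable names and the stand-in `A` below are hypothetical and only illustrate the idea. Each nesting level becomes its own assignment, so no single statement is deeply nested.

```python
def emit_flat(depth, call="A"):
    """Emit one assignment per nesting level instead of one nested expression."""
    lines = ["v0 = %s(None)" % call]
    for i in range(1, depth + 1):
        # Each line wraps the previous result exactly once.
        lines.append("v%d = %s([v%d])" % (i, call, i - 1))
    return "\n".join(lines) + "\n"


def A(arg):
    # Stand-in for the real node constructor: it just reports nesting depth.
    return 0 if arg is None else arg[0] + 1


namespace = {"A": A}
exec(compile(emit_flat(1000), "<generated>", "exec"), namespace)
print(namespace["v1000"])
```

The emitted module is longer, but its nesting per statement stays constant however deep the tree gets.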
> >
> > As discussed on the Python security list, crashes when loading such decision
> > trees or mathematical formulas (see bug report [1]) should not be a
> > security problem. Even though it is not directly covered in the Python
> > security model documentation [2], this case comes too close to "arbitrary
> > code execution", where Python does not attempt to provide any protection.
> > There might be only some border cases of affected software, e.g. Python
> > sandbox systems like Zope/Plone or maybe even Python-based smart contract
> > blockchains like Ethereum (I do not know if/where they use work derived from
> > the default Python interpreter). But in both cases they would also come
> > too close to violating the security model, so no changes to Python are
> > required from this side. Thus Python security suggested that the discussion
> > should be continued on this list.
> >
> >
> > Even when no security problem is involved, the crash is still quite an
> > annoyance. Developing code generators can be a tedious task. It is
> > then somewhat frustrating when your generated code is not accepted by the
> > interpreter, even though you do not feel like you are getting close to any
> > system-relevant limits, e.g. 50 elements in a line like above on a 16GB
> > machine. You may adapt the generator, but as the error does not say
> > which limit you really violated (number of brackets, function
> > calls, list definitions?), you can only experiment or read the Python
> > compiler code to figure that out. Even when you fix it, you have no
> > guarantee that you will not hit some other obscure limit the next day, or
> > that those limits will not change from one Python minor version to the
> > next, causing regressions.
> >
> > Questions:
> >
> > * Do you deem it possible/sensible to even attempt to write a Python
> > language code generator that will produce non-malicious, syntactically
> > valid decision tree code/mathematical formulas and still have a
> > sufficiently high probability that the Python interpreter will also run
> > that code now and in the near future (regressions)?
> >
> > * Assuming yes to the question above, when generating code, what is
> > the maximal nesting depth a code generator can always expect to compile
> > on Python 2.7 and on 3.5 onward? Are there any other similar restrictions
> > that need to be considered by the code generator? Or is generating code
> > that way not the preferred solution anyway, and the code generator should
> > instead generate e.g. binary Python code directly? Note: in the end the
> > exact same logic will run as a Python process; it seems it is only about
> > how it is loaded into the Python interpreter.
> >
> > * If not possible/recommended/sensible, we might generate Java bytecode or
> > native x86 code instead, where the likelihood of the (virtual) CPU really
> > executing code that complies with the language specification (even with
> > CPU errata like the FDIV bug et al.) might be orders of magnitude higher
> > than with the Python interpreter.
> >
> > Any feedback appreciated!
> >
> > Roman
> >
> > [1] https://bugs.python.org/issue3971
> > [2] http://python-security.readthedocs.io/security.html#security-model