[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Wed Jun 27 11:04:06 EDT 2018

I consider this a bug -- a violation of Python's (informal) promise to
the user that when CPython segfaults it is not the user's fault.

Given typical Python usage patterns, I don't consider this an important
bug, but maybe someone is interested in trying to fix it.

As far as your application is concerned, I'm not sure that generating code
like that is the right approach. Why don't you generate a data structure
and a little engine that walks the data structure?
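
A minimal sketch of what such a data structure plus engine could look like.
The node layout, the PREDICATES registry, and the helper names are all
hypothetical, chosen only for illustration; your custom decision elements
would be registered as ordinary functions. The walker is a plain loop, so
the depth of the tree is not limited by the parser or by the recursion
limit:

```python
# Registry of custom decision elements (hypothetical names and signatures).
PREDICATES = {
    "below": lambda sample, key, limit: sample[key] < limit,
    "equals": lambda sample, key, value: sample[key] == value,
}


def leaf(result):
    """Terminal node carrying the final decision."""
    return {"result": result}


def branch(test, args, yes, no):
    """Inner node: run PREDICATES[test] and follow 'yes' or 'no'."""
    return {"test": test, "args": args, "yes": yes, "no": no}


def evaluate(tree, sample):
    """Walk the tree iteratively until a leaf is reached."""
    node = tree
    while "result" not in node:
        predicate = PREDICATES[node["test"]]
        node = node["yes"] if predicate(sample, *node["args"]) else node["no"]
    return node["result"]


# Build a chain far deeper than the 50 levels in the reproducer.
# The root tests level 0; each 'no' edge leads one level deeper.
tree = leaf("drop")
for level in reversed(range(5000)):
    tree = branch("below", ("size", level), leaf("keep"), tree)

print(evaluate(tree, {"size": 4000}))
print(evaluate(tree, {"size": 10000}))
```

Because the tree is plain data, it could also be shipped to the sensor nodes
in whatever serialization you prefer instead of as Python source.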

On Wed, Jun 27, 2018 at 12:05 AM Fiedler Roman <Roman.Fiedler at ait.ac.at>
wrote:

> Hello List,
>
> Context: we are conducting machine learning experiments that generate some
> kind of nested decision trees. As the tree includes specific decision
> elements (which require custom code to evaluate), we decided to store the
> decision tree (result of the analysis) as generated Python code. Thus the
> decision tree can be transferred to sensor nodes (detectors) that will then
> filter data according to the decision tree when executing the given code.
>
> Tracking down a crash when executing that generated code, we arrived at
> the following simplified reproducer, which causes the interpreter to crash
> (on Python 2/3) while loading the code, before execution has started:
>
> #!/usr/bin/python2 -BEsStt
>
> A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
>
> The error message is:
>
> s_push: parser stack overflow
> MemoryError
>
> Despite the machine having 16GB of RAM, the code cannot be loaded.
> Splitting it into two lines using an intermediate variable is the current
> workaround; it gets the code running, but only after manual adaptation.
>
> As discussed on Python security list, crashes when loading such decision
> trees or also mathematical formulas (see bug report [1]) should not be a
> security problem. Even when not directly covered in the Python security
> model documentation [2], this case comes too close to "arbitrary code
> execution", where Python does not attempt to provide any protection. There
> might be only some border cases of affected software, e.g. Python sandbox
> systems like Zope/Plone or maybe even Python-based smart contract
> blockchains like Ethereum (I do not know if or where they use the default
> Python interpreter or a derived work). But in both cases they would also
> come too close to violating the security model, so no changes to Python
> are required from this side. Python security therefore suggested that the
> discussion be continued on this list.
>
>
> Even when no security problem is involved, the crash is still quite an
> annoyance. Development of code generators can be a tedious task. It is
> then somewhat frustrating when your generated code is not accepted by the
> interpreter, even when you do not feel like you are getting close to any
> system-relevant limits, e.g. 50 elements in a line like above on a 16GB
> machine. You may adapt the generator, but as the error does not include any
> information about which limit you really violated (number of brackets,
> function calls, list definitions?) you can only do experiments or look at
> the Python compiler code to figure that out. Even when you fix it, you have
> no guarantee that you will not hit some other obscure limit the next day,
> or that those limits will not change from one Python minor version to the
> next, causing regressions.
>
> Questions:
>
> * Do you deem it possible/sensible to even attempt to write a Python
> language code generator that will produce non-malicious, syntactically
> valid decision tree code/mathematical formulas and still have a
> sufficiently high probability that the Python interpreter will also run
> that code now and in the near future (regressions)?
>
> * Assuming yes to the question above, what is the maximal nesting depth
> that a code generator can always expect to compile on Python 2.7 and on
> 3.5 and later? Are there any other similar restrictions that need to be
> considered by the code generator? Or is generating code that way not the
> preferred solution anyway, and the code generator should e.g. emit binary
> Python code directly? Note: in the end the exact same logic will run as a
> Python process; it seems to be only about how it is loaded into the
> Python interpreter.
>
> * If not possible/recommended/sensible, we might generate Java bytecode or
> native x86 code instead, where the likelihood of the (virtual) CPU really
> executing code that is compliant with the language specification (even with
> CPU errata like the FDIV bug et al.) might be orders of magnitude higher
> than with the Python interpreter.
>
> Any feedback appreciated!
>
> Roman
>
> [1] https://bugs.python.org/issue3971
> [2] http://python-security.readthedocs.io/security.html#security-model

-- 
--Guido van Rossum (python.org/~guido)