On 2014-07-01 19:16, Nick Coghlan wrote:
> On 1 July 2014 10:33, Steven D'Aprano <steve@pearwood.info> wrote:
>> [Aside: does Python do any sort of verification of the bytecode before executing it, as Java does?]
> Nope, it will happily attempt to execute invalid bytecode. That's actually one of the reasons executing untrusted bytecode is even less safe than executing untrusted source code - it's likely to be possible to trigger segfaults that way.
> There's an initial attempt at a bytecode verifier on PyPI (https://pypi.python.org/pypi/Python-Bytecode-Verifier/), and I have a vague recollection that Google have a bytecode verifier kicking around somewhere, but there's nothing built in to the CPython runtime.
The re module also uses a kind of bytecode that's generated by the Python front end and verified by the C back end. The bytecode contains things like offsets; for example, the bytecode that starts a repeated sequence has an offset to the corresponding bytecode that ends it, and vice versa.
The problem with that approach is that the structure (i.e. the nesting) is only implicit in the offsets, so misnested structures are more difficult to spot.
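To illustrate the misnesting problem, here's a rough sketch in Python. The opcode names, their numbering and the fixed two-word instruction layout are all made up for the example; this is not the re module's actual encoding. Each REPEAT carries a forward offset to its END_REPEAT, and each END_REPEAT carries a backward offset to its REPEAT. Checking that each pair points at each other isn't enough on its own; the verifier also has to track the nesting itself:

```python
# Hypothetical opcodes for illustration only; NOT the re module's real format.
# Every instruction is two words: (opcode, argument).
REPEAT, END_REPEAT, LITERAL = 0, 1, 2

def verify_offsets(code):
    """Return True if every REPEAT/END_REPEAT pair agrees and nests properly."""
    pending = []  # (start, expected_end) for each REPEAT not yet closed
    i = 0
    while i < len(code):
        if i + 1 >= len(code):
            return False  # argument word is missing
        op, arg = code[i], code[i + 1]
        if op == REPEAT:
            # arg is the forward offset to the matching END_REPEAT
            pending.append((i, i + arg))
        elif op == END_REPEAT:
            if not pending:
                return False  # END_REPEAT with nothing open
            start, expected_end = pending.pop()
            # The innermost open REPEAT must end here, and the backward
            # offset must lead back to that same REPEAT.
            if expected_end != i or i - arg != start:
                return False
        elif op != LITERAL:
            return False  # unknown opcode
        i += 2
    return not pending  # anything still open was never closed

nested = [REPEAT, 8, REPEAT, 4, LITERAL, 97, END_REPEAT, 4, END_REPEAT, 8]
# Each pair below points at each other (0<->6 and 2<->8), but they overlap.
crossed = [REPEAT, 6, REPEAT, 6, LITERAL, 97, END_REPEAT, 6, END_REPEAT, 6]
```

In the crossed case each offset looks fine when checked in isolation; it's only the stack of open repeats that catches the overlap.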
For the regex module, I decided that it would be easier to verify if I kept the structure explicit by using bytecodes to indicate the start and end of the structures. For example, a repeated sequence could be indicated by having a structure like GREEDY_REPEAT min_count max_count ... END.
The C back end could then build the internal representation that's actually interpreted.
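As a minimal sketch of that idea, written in Python rather than C and with made-up opcode numbers and a made-up tree representation (none of this is the regex module's actual code), verifying the explicit structure and building a nested representation can be done in a single pass:

```python
# Hypothetical opcode numbers for illustration; not the regex module's values.
GREEDY_REPEAT, END, LITERAL = 0, 1, 2

def build(code):
    """Verify a flat opcode list and return it as a nested list of nodes.

    Raises ValueError if the structure is malformed.
    """
    root = []
    stack = [root]  # stack[-1] is the body currently being filled
    i = 0
    while i < len(code):
        op = code[i]
        if op == GREEDY_REPEAT:
            if i + 2 >= len(code):
                raise ValueError("truncated GREEDY_REPEAT")
            min_count, max_count = code[i + 1], code[i + 2]
            if not 0 <= min_count <= max_count:
                raise ValueError("bad repeat counts")
            body = []
            stack[-1].append(("repeat", min_count, max_count, body))
            stack.append(body)  # following opcodes go into the repeat's body
            i += 3
        elif op == END:
            if len(stack) == 1:
                raise ValueError("END without GREEDY_REPEAT")
            stack.pop()
            i += 1
        elif op == LITERAL:
            if i + 1 >= len(code):
                raise ValueError("truncated LITERAL")
            stack[-1].append(("literal", code[i + 1]))
            i += 2
        else:
            raise ValueError("unknown opcode %r" % (op,))
    if len(stack) != 1:
        raise ValueError("missing END")
    return root

tree = build([GREEDY_REPEAT, 1, 3, LITERAL, 97, END])
```

Because the start and end markers are explicit, a structure can't overlap another one; the only ways to get it wrong are a missing END or a stray END, and both are easy to detect.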