[Python-ideas] .pyu nicode syntax symbols (was Re: Empty set, Empty dict)

MRAB python at mrabarnett.plus.com
Tue Jul 1 20:59:21 CEST 2014


On 2014-07-01 19:16, Nick Coghlan wrote:
> On 1 July 2014 10:33, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> [Aside: does Python do any sort of verification of the bytecode
>> before executing it, as Java does?]
>
> Nope, it will happily attempt to execute invalid bytecode. That's
> actually one of the reasons executing untrusted bytecode is even less
> safe than executing untrusted source code - it's likely to be
> possible to trigger segfaults that way.
>
> There's an initial attempt at a bytecode verifier on PyPI
> (https://pypi.python.org/pypi/Python-Bytecode-Verifier/), and I have
> a vague recollection that Google have a bytecode verifier kicking
> around somewhere, but there's nothing built in to the CPython
> runtime.
>
The re module also uses a kind of bytecode that's generated by the
Python front end and verified by the C back end. The bytecode contains
things like offsets; for example, the bytecode that starts a repeated
sequence has an offset to the corresponding bytecode that ends it, and
vice versa.

The problem with offsets is that the structure (i.e. the nesting) is no
longer explicit, so misnested structures are more difficult to spot.
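
As a rough illustration of that difficulty, here is a minimal sketch using a toy offset-based encoding. The encoding, the opcode names (REPEAT, END_REPEAT, LITERAL) and both function names are assumptions made up for this example; this is not the re module's real bytecode format. It shows that checking each offset against its partner is not enough, and that nesting needs a separate pass.

```python
# Toy offset-based bytecode (hypothetical; NOT the re module's real format).
# Each instruction is an (opcode, operand) pair. REPEAT carries a forward
# offset to its END_REPEAT, and END_REPEAT carries the matching backward offset.
REPEAT, END_REPEAT, LITERAL = "REPEAT", "END_REPEAT", "LITERAL"


def offsets_pair_up(code):
    """Check that every REPEAT/END_REPEAT offset points at its partner."""
    for pos, (op, arg) in enumerate(code):
        if op == REPEAT:
            target = pos + arg
            if not (pos < target < len(code)):
                return False
            if code[target] != (END_REPEAT, -arg):
                return False
        elif op == END_REPEAT:
            target = pos + arg
            if not (0 <= target < pos):
                return False
            if code[target] != (REPEAT, -arg):
                return False
    return True


def spans_nest(code):
    """Separate pass: check that no two repeat spans partially overlap."""
    open_ends = []  # end positions of the repeats that are currently open
    for pos, (op, arg) in enumerate(code):
        if op == REPEAT:
            end = pos + arg
            if open_ends and end > open_ends[-1]:
                return False  # would end after the enclosing repeat ends
            open_ends.append(end)
        elif op == END_REPEAT:
            if not open_ends or open_ends[-1] != pos:
                return False
            open_ends.pop()
    return not open_ends


well_nested = [(REPEAT, 4), (REPEAT, 2), (LITERAL, 97),
               (END_REPEAT, -2), (END_REPEAT, -4)]
# Every offset here points at a valid partner, yet the two spans overlap.
misnested = [(REPEAT, 2), (REPEAT, 2), (END_REPEAT, -2), (END_REPEAT, -2)]
```

In this toy, misnested passes offsets_pair_up but fails spans_nest, which is the kind of error that is easy to miss when only the offsets are checked.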

For the regex module, I decided that it would be easier to verify if I
kept the structure explicit by using bytecodes to indicate the start and
end of the structures. For example, a repeated sequence could be
indicated by having a structure like GREEDY_REPEAT min_count max_count
... END.

The C back end could then build the internal representation that's
actually interpreted.
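
A minimal sketch of that idea in Python, under stated assumptions: only GREEDY_REPEAT, min_count, max_count and END come from the description above, while LITERAL, BytecodeError, build and the tuple-based tree are hypothetical names invented for the example. The regex module's real verifier is written in C and will differ. With explicit start and end markers, verifying the nesting and building the internal representation can happen in one pass over a stack.

```python
# Toy structured bytecode (hypothetical sketch, not the regex module's code).
# GREEDY_REPEAT min_count max_count ... END, as described above;
# LITERAL <code point> is an invented opcode used to give the repeat a body.
GREEDY_REPEAT, END, LITERAL = "GREEDY_REPEAT", "END", "LITERAL"


class BytecodeError(ValueError):
    """Raised when the flat bytecode is malformed."""


def build(code):
    """Verify a flat bytecode list and return it as a nested tree."""
    root = []
    stack = [root]  # stack[-1] is the body of the innermost open structure
    pos = 0
    while pos < len(code):
        op = code[pos]
        if op == GREEDY_REPEAT:
            if pos + 2 >= len(code):
                raise BytecodeError("truncated GREEDY_REPEAT at %d" % pos)
            min_count, max_count = code[pos + 1], code[pos + 2]
            if not (isinstance(min_count, int) and isinstance(max_count, int)
                    and 0 <= min_count <= max_count):
                raise BytecodeError("bad repeat counts at %d" % pos)
            body = []
            stack[-1].append((GREEDY_REPEAT, min_count, max_count, body))
            stack.append(body)  # following opcodes go inside this repeat
            pos += 3
        elif op == END:
            if len(stack) == 1:
                raise BytecodeError("END without open structure at %d" % pos)
            stack.pop()
            pos += 1
        elif op == LITERAL:
            if pos + 1 >= len(code):
                raise BytecodeError("truncated LITERAL at %d" % pos)
            stack[-1].append((LITERAL, code[pos + 1]))
            pos += 2
        else:
            raise BytecodeError("unknown opcode %r at %d" % (op, pos))
    if len(stack) != 1:
        raise BytecodeError("missing END")
    return root


tree = build([GREEDY_REPEAT, 1, 3, LITERAL, 97, END])
```

Here a stray END or a missing END is caught by the stack depth alone, with no offsets to cross-check.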

