[Python-Dev] Regular expression bytecode

Jonathan Goble jcgoble3 at gmail.com
Sun Feb 14 13:49:38 EST 2016


I'm new to Python's mailing lists, so please forgive me if I'm sending
this to the wrong list. :)

I filed http://bugs.python.org/issue26336 a few days ago, but now I
think this list might be a better place to get discussion going.
Basically, I'd like to see the bytecode of a compiled regex object
exposed as a public (probably read-only) attribute of the object.

Currently, although a pattern is compiled in pure Python by the
sre_parse and sre_compile modules, the resulting list of opcodes is
then passed into C and copied into an array in a C struct, without
being publicly exposed in any way. The only way for a user to get an
internal representation of the regex is the re.DEBUG flag. That flag
produces only an intermediate representation rather than the actual
bytecode, and it prints only to stdout, which makes it of little use
to someone who wants to examine it programmatically.
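
To illustrate the stdout limitation: the re.DEBUG output can be captured
today only by redirecting stdout while the pattern compiles. This is a
minimal sketch, not a recommended interface. The helper name debug_dump
is hypothetical, and the text that re.DEBUG prints is not a documented
format and differs between Python versions.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Return whatever re.DEBUG prints while compiling `pattern`.

    debug_dump is a hypothetical helper name, not part of the re module.
    """
    buf = io.StringIO()
    # re.DEBUG writes with print(), so a temporary stdout swap catches it.
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()


dump = debug_dump(r"a+b")
print(dump)
```

The result is still free-form text that would have to be parsed back
into a structure, which is the gap a real attribute would close.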

I'm sure others can think of more use cases, but one in particular is
a debugger that lets a user step through a regex one opcode at a time
to see exactly where it is failing. It might also be nice to have a
public constructor for the regex object type, which would let users
modify the bytecode and create a new regex object directly from it,
similar to what is currently possible through the types.FunctionType
and types.CodeType constructors.
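
For comparison, here is a small sketch of what those constructors
already allow for Python's own code objects. It assumes Python 3.8 or
later for code.replace(), and the function names greet and farewell
are made up for the example. Nothing equivalent exists for regex
objects today.

```python
import types


def greet():
    return "hello"


old_code = greet.__code__

# Swap one constant. code.replace() (Python 3.8+) returns a new
# types.CodeType instance and leaves the original untouched.
new_consts = tuple(
    "goodbye" if const == "hello" else const
    for const in old_code.co_consts
)
new_code = old_code.replace(co_consts=new_consts)

# Wrap the modified code object in a fresh function object.
farewell = types.FunctionType(new_code, greet.__globals__, "farewell")

print(greet())
print(farewell())
```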

In addition to exposing the code in a public attribute, a helper
module written in Python similar to the dis module (which is for
Python's own bytecode) would be very helpful, allowing the code to be
easily disassembled and examined at a higher level.
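
Such a helper would take the flat opcode list as its input. As a rough
sketch of what that input looks like, the list can be rebuilt today
only by calling private CPython helpers. The names _parser.parse() and
_compiler._code() are undocumented internals that may change without
notice, and regex_code is a hypothetical wrapper name.

```python
try:
    # Python 3.11+: the sre modules live inside the re package.
    from re import _compiler, _parser
except ImportError:
    # Older versions: top-level sre_compile / sre_parse modules.
    import sre_compile as _compiler
    import sre_parse as _parser


def regex_code(pattern, flags=0):
    """Return the flat list of integers built for `pattern`.

    regex_code is a hypothetical name. _parser.parse() and
    _compiler._code() are private helpers, not a public API.
    """
    parsed = _parser.parse(pattern, flags)
    # Convert the named integer constants to plain ints for display.
    return [int(word) for word in _compiler._code(parsed, flags)]


code = regex_code(r"a+b")
print(code)
```

The list mixes opcodes with their arguments, so mapping each integer
to an opcode name one by one would give misleading output. That is
the decoding work a dis-like module would take on.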

Is this a good idea, or am I barking up the wrong tree? I think it's a
great idea, but I'm open to being told this is a horrible idea. :) I
welcome any and all comments both here and on the bug tracker.

Jonathan Goble

