[Python-ideas] Exposing regular expression bytecode
Jonathan Goble
jcgoble3 at gmail.com
Mon Feb 15 00:48:24 EST 2016
(This was previously sent to python-dev [1], but it was suggested that
I bring it here first.)
I filed http://bugs.python.org/issue26336 a few days ago, but now I
think this list might be a better place to get discussion going.
Basically, I'd like to see the bytecode of a compiled regex object
exposed as a public (probably read-only) attribute of the object.
Currently, although a pattern is compiled in pure Python by the
sre_parse and sre_compile modules, the resulting list of opcodes is
then passed into C and copied into an array in a C struct, without
being publicly exposed in any way. The only way for a user to get an
internal representation of the regex is the re.DEBUG flag. That flag
only produces an intermediate representation rather than the actual
bytecode, and it only prints to stdout, which makes it useless for
someone who wants to examine it programmatically.
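
To make the current situation concrete, here is a minimal sketch of the two
workarounds available today. The helper names debug_dump and raw_opcodes are
hypothetical and not part of any proposal. The first captures what re.DEBUG
prints by redirecting stdout. The second calls the private, undocumented
helper _code() in the pure-Python compiler, which is an assumption about
interpreter internals and may break in any release.

```python
import contextlib
import io
import re

try:
    # Python 3.11+: the pure-Python compiler lives in private submodules of re.
    from re import _compiler as sre_compile, _parser as sre_parse
except ImportError:
    # Older versions: the top-level modules named in this post.
    import sre_compile
    import sre_parse


def debug_dump(pattern, flags=0):
    """Return whatever re.DEBUG prints for `pattern`, as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


def raw_opcodes(pattern, flags=0):
    """Return the flat list of integers built before the hand-off to C.

    Assumption: relies on the private helper _code(), which is undocumented
    and may change or disappear between Python versions.
    """
    parsed = sre_parse.parse(pattern, flags)
    return list(sre_compile._code(parsed, flags))


if __name__ == "__main__":
    print(debug_dump(r"ab+"))
    print(raw_opcodes(r"ab+"))
```

Both routes are indirect: one parses text meant for humans, the other
reaches into private code. Neither reads anything back from the compiled
pattern object itself.
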
I'm sure others can think of other potential use cases for this, but
one in particular is a debugger that lets a user step through a regex
one opcode at a time to see exactly where it is failing. It would also
perhaps be nice to have a
public constructor for the regex object type, which would enable users
to modify the bytecode and directly create a new regex object from it,
similar to what is currently possible for function bytecode through
the types.FunctionType and types.CodeType constructors. This would
make possible things such as optimizers.
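
For comparison, this is roughly what the existing function-bytecode route
looks like. It is only a sketch: the helper with_replaced_constant and the
example function greet are hypothetical, and it uses code.replace() (Python
3.8+) as a shortcut for calling the types.CodeType constructor with every
field spelled out.

```python
import types


def greet():
    return "hello"


def with_replaced_constant(func, old, new, name):
    """Build a new function whose code object has one constant swapped."""
    code = func.__code__
    new_consts = tuple(new if const == old else const for const in code.co_consts)
    # code.replace() returns a copy of the code object with the given fields
    # changed; the original function and its code object are left untouched.
    new_code = code.replace(co_consts=new_consts, co_name=name)
    return types.FunctionType(new_code, func.__globals__, name)


farewell = with_replaced_constant(greet, "hello", "goodbye", "farewell")

if __name__ == "__main__":
    print(greet())
    print(farewell())
```

A public constructor for the regex object type would allow the same kind of
read, modify, rebuild cycle for regex bytecode.
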
In addition to exposing the code in a public attribute, a helper
module written in Python similar to the dis module (which is for
Python's own bytecode) would be very helpful, allowing the code to be
easily disassembled and examined at a higher level.
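
As a point of reference, the dis module already offers this kind of
programmatic access for Python's own bytecode. The sketch below uses only
the documented dis.get_instructions() function; the helper opcode_names and
the sample function add are made up for illustration. A regex equivalent
might expose something similar, though its exact shape is an open question.

```python
import dis


def add(a, b):
    return a + b


def opcode_names(func):
    """Return the opcode names of a function, in order, as plain strings."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    # Each instruction is a structured object, not just printed text.
    for instruction in dis.get_instructions(add):
        print(instruction.offset, instruction.opname, instruction.argrepr)
```

The exact opcode names vary between Python versions, so code that consumes
them should not hard-code a particular sequence.
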
Is this a good idea, or am I barking up the wrong tree? I think it's a
great idea, but I'm open to being told this is a horrible idea. :) I
welcome any and all comments both here and on the bug tracker.
[1] https://mail.python.org/pipermail/python-dev/2016-February/143355.html