Re: [Python-ideas] Exposing regular expression bytecode
OK, thanks for the comments, everyone. I'm glad to hear that people generally think this is a useful idea. Some specific replies:
On Tue, Feb 16, 2016 at 4:22 AM, Chris Angelico <rosuav@gmail.com> wrote:
For what it's worth, I read your post with interest, but didn't have anything substantive to reply - mainly because I don't use regexes much. But it would be rather cool to be able to decompile a regex. Imagine a regex pretty-printer: compile an arbitrary string, and if it's a valid regex, decompile it to a valid source code form, using re.VERBOSE. That could help _hugely_ with debugging, if the trick can be pulled off.
ChrisA
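For anyone following along: a partial view of what the re module does with a pattern is already available through the documented re.DEBUG flag, which prints the parsed pattern at compile time. The sketch below only captures that output. It is not the decompiler described above, the helper name dump_regex is made up for illustration, and the exact text printed varies between Python versions.

```python
import contextlib
import io
import re


def dump_regex(pattern):
    """Compile `pattern` with re.DEBUG and return what the re module prints.

    `dump_regex` is a hypothetical helper, not part of the stdlib.
    """
    buf = io.StringIO()
    # re.DEBUG writes its dump to stdout, so redirect stdout into a buffer.
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


debug_text = dump_regex(r"a+b")
print(debug_text)
```

A pretty-printer along the lines suggested above would have to go the other way, from the parsed or compiled form back to re.VERBOSE source, which this sketch does not attempt.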
That's exactly the type of tools I envision being made available by third parties. Depending on how much I get invested into this project, I may even write such a tool myself (though that's not guaranteed).
On Tue, Feb 16, 2016 at 4:55 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Sorry. I don't personally have any issue with the proposal, and it sounds like a reasonable idea. I don't think it's likely to be *hugely* controversial - although it will likely need a little care in documenting the feature to ensure that we are clear that there are no guarantees of backward compatibility that we don't want to commit to on the newly exposed data. And we should also ensure that by exposing this information, we don't preclude changes such as the incorporation of the regex module (I don't know if the regex module has a bytecode implementation like the re module does).
The regex implementation is indeed something I would need to investigate here, and will do so before I go too far.
The next step is probably simply to raise a tracker issue for this. I know you said you have little C experience, but the big problem is that it's unlikely that any of the core devs with C experience will have the time or motivation to code up your idea. So without a working patch, and someone willing and able to respond to comments on the patch, it's not likely to progress.
A tracker issue is already filed: http://bugs.python.org/issue26336. I actually filed the issue before I realized that the mailing lists were a better place to discuss it.
But if you are willing to dig into Python's C API yourself (and it sounds like you are) there are definitely people who will help you. You might want to join the core mentorship list (see http://pythonmentors.com/) where you should get plenty of assistance. This proposal sounds like a great "beginner" task, as well - so even if you don't want to implement it yourself, still put it on the tracker, and mark it as an "easy" change, and maybe some other newcomer who wants a task to help them learn the C API will pick it up.
I'll look into the mentorship list; thanks for the link. As for marking it "easy", I don't seem to have the necessary permissions to change the Keywords field; perhaps you or someone else can set that flag for me? If so, I'd appreciate it. :-)
Hope that helps - thanks for the suggestion and sorry if it seems like no-one was interested at first. It's an unfortunate fact of life around here that things *do* take time to get people's interest. You mention patience in one of your messages - that's definitely something you'll need to cultivate, I'm afraid... :-)
Patience is something I've been working on since I was a little kid. I'm 29 years old now, and it still eludes me from time to time. But yes, it's something I'll have to work on. :-P
Also, I received a small patch off-list from Petr Viktorin implementing a getter for the code list (thanks, Petr). I'll need to test it, but from the little I know of the C API it looks like it will get me started in the right direction. Assuming that works, what's left is a public constructor for the regex type (to enable optimizers), a dis-like module, and docs and tests. I don't think this would be major enough to require a PEP, but of course being new here, I'm open to being told I'm wrong. :-)
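To make the "dis-like module" item a bit more concrete, here is a minimal sketch of what such a disassembler could look like for a flat list of integer codes. Everything in it is an assumption for illustration: the opcode table is invented and does not match the real opcodes used by the re module, and TOY_OPCODES and disassemble are hypothetical names, not part of any patch.

```python
# Toy opcode table mapping code -> (name, number of operands).
# These values are made up and do NOT match CPython's real SRE opcodes.
TOY_OPCODES = {
    0: ("FAILURE", 0),
    1: ("SUCCESS", 0),
    2: ("ANY", 0),
    3: ("LITERAL", 1),
    4: ("JUMP", 1),
}


def disassemble(code):
    """Return one readable line per instruction in a flat list of ints."""
    lines = []
    i = 0
    while i < len(code):
        name, nargs = TOY_OPCODES[code[i]]
        args = code[i + 1 : i + 1 + nargs]
        # Offset, opcode name, then any operands; strip the trailing space
        # left behind when an instruction has no operands.
        lines.append(f"{i:4d} {name} {' '.join(map(str, args))}".rstrip())
        i += 1 + nargs
    return lines


listing = disassemble([3, 97, 2, 3, 98, 1])
print("\n".join(listing))
```

A real version would take its table from the re module's own constants rather than hard-coding one, which is part of why the compatibility question matters.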
Executive summary: Very process-meta. Suggestion: good GSoC project?
Jonathan Goble writes:
That's exactly the type of tools I envision being made available by third parties.
Experience shows that such visions mostly remain dreams. Dreaming is good, but Python demands somewhat more for inclusion in the core. Not that much -- Victor Stinner's FAT Python is a good example. (I think that's being discussed on the python-dev list, easy to find in the archives.) What he's *actually* doing is (conceptually, I haven't looked at the actual patch) a somewhat invasive modification of the core compilation process. But along with that he's demonstrated several practical optimizations that are enabled by his change.[1] (Note that he brought the patch with him when proposing his change, too. I guess he wrote the code when he woke up in the morning. :-) That created a certain amount of buzz -- optimization proposals always do :-), and some people see the needed changes as simplifying the whole process, which brought them on board.
Depending on how much I get invested into this project, I may even write such a tool myself (though that's not guaranteed).
This is exactly backwards from the point of view of getting it into the stdlib. What the deafening silence was saying (and what the actual posts say!) is that nobody else is going to do it. Features that aren't going to be exploited fairly soon are complications, and that is against a fundamental design principle (the "Zen of Python", try "python -m this | grep -i comp" if you haven't seen it before).
Which gives me an idea: Victor proposed writing optimizations for FAT Python as a Google Summer of Code project for students. If you can find an experienced developer to mentor with you, the basic change sounds like an easy project for a student, and the tool like something quite advanced but still feasible. If you want to know more, write me off-list, or better yet ask on core-mentorship. I can't mentor, but I can help with the admin details.
FMI: https://wiki.python.org/moin/SummerOfCode/2016
On Tue, Feb 16, 2016 at 4:55 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Sorry. I don't personally have any issue with the proposal, and it sounds like a reasonable idea. I don't think it's likely to be *hugely* controversial
Agreed. The only real risks in exposing an existing internal attribute are (1) complication and (2) future maintenance cost for a little-used feature (e.g., if Python decides to anoint regex). It might languish in the tracker for quite a while if you can't demonstrate real use cases, though. (The educational aspect of being able to merely list the compiled bytecodes readably might be enough, but I would bet against it.)
I don't think this would be major enough to require a PEP,
Definitely, bet against that. This doesn't change the language or violate backward compatibility at all, and the design looks quite straightforward, including the potential tools (ISTR you saying you've seen them for other regexp engines? At least the UI, and maybe the algorithms, can be borrowed).
Footnotes:
[1] They all give 1-10% on microbenchmarks, so individually they're insignificant. But as the apocryphal congressman said about billions of dollars, "1% here, 5% there, and pretty soon you're talking about perceptible speedups", and that gets certain people excited.
On Tue, Feb 16, 2016 at 9:25 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Executive summary: Very process-meta. Suggestion: good GSoC project?
Thanks for the details about Python's processes; I'm learning about them as I go. I now see that this may be a bit harder to get pushed through than I had initially thought, but at least for now, I'll keep slowly pushing toward it.
As for GSoC, that's an interesting idea. I'm not a student myself (29 years old and on disability), but that does seem like a possible way forward if a student wants to take it on. I'll see about putting something together to post on core-mentorship tonight or tomorrow (and of course, I'm still open to doing work on it myself as well).
On 16 February 2016 at 18:29, Jonathan Goble <jcgoble3@gmail.com> wrote:
But if you are willing to dig into Python's C API yourself (and it sounds like you are) there are definitely people who will help you. You might want to join the core mentorship list (see http://pythonmentors.com/) where you should get plenty of assistance. This proposal sounds like a great "beginner" task, as well - so even if you don't want to implement it yourself, still put it on the tracker, and mark it as an "easy" change, and maybe some other newcomer who wants a task to help them learn the C API will pick it up.
I'll look into the mentorship list; thanks for the link. As for marking it "easy", I don't seem to have the necessary permissions to change the Keywords field; perhaps you or someone else can set that flag for me? If so, I'd appreciate it. :-)
Done (although the "easy" keyword is mainly to help others know it's something they could work on, so if you're working on it, it's less relevant :-))
Also, I received a small patch off-list from Petr Viktorin implementing a getter for the code list (thanks, Petr). I'll need to test it, but from the little I know of the C API it looks like it will get me started in the right direction. Assuming that works, what's left is a public constructor for the regex type (to enable optimizers), a dis-like module, and docs and tests. I don't think this would be major enough to require a PEP, but of course being new here, I'm open to being told I'm wrong. :-)
IMO, this shouldn't need a PEP, unless someone feels that exposing the code list implies a compatibility guarantee (personally, I don't - I think that it should be seen in the same light as the dis module, and as a CPython implementation detail).
Paul
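For readers who haven't used it, the comparison with the dis module can be illustrated with a short sketch. The function add below is just an example. The instruction names it produces are a CPython implementation detail and differ between versions, which is the kind of non-guarantee being suggested for the regex code list.

```python
import dis


def add(a, b):
    return a + b


# dis.get_instructions yields one Instruction per bytecode operation.
# The opnames are CPython-specific and are not stable across releases.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Code that inspects such output has to be prepared for the listing to change from one release to the next.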
FWIW, I've decided to shelve this idea for the time being, at least, as I've had some things come up unexpectedly that are going to eat into my available time for the foreseeable future, so I no longer have time to pursue this myself. (Suffice it to say that Real Life always has crappy timing. :-P) Maybe I'll have time to resurrect it in the future; in the meantime, the issue on the bug tracker remains open in the event someone gets bored and decides to take a crack at it.
2016-02-19 11:42 GMT-08:00 Jonathan Goble <jcgoble3@gmail.com>:
FWIW, I've decided to shelve this idea for the time being, at least, as I've had some things come up unexpectedly that are going to eat into my available time for the foreseeable future, so I no longer have time to pursue this myself. (Suffice it to say that Real Life always has crappy timing. :-P)
Maybe I'll have time to resurrect it in the future; in the meantime, the issue on the bug tracker remains open in the event someone gets bored and decides to take a crack at it.
I found the bugtracker issue during the PyCon sprints and decided to implement the feature: http://bugs.python.org/issue26336.
On Sun, Jun 5, 2016 at 3:36 AM, Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
2016-02-19 11:42 GMT-08:00 Jonathan Goble <jcgoble3@gmail.com>:
FWIW, I've decided to shelve this idea for the time being, at least, as I've had some things come up unexpectedly that are going to eat into my available time for the foreseeable future, so I no longer have time to pursue this myself. (Suffice it to say that Real Life always has crappy timing. :-P)
Maybe I'll have time to resurrect it in the future; in the meantime, the issue on the bug tracker remains open in the event someone gets bored and decides to take a crack at it.
I found the bugtracker issue during the PyCon sprints and decided to implement the feature: http://bugs.python.org/issue26336.
Great, thanks! I had actually forgotten about this, as real life is still pretty hectic (I'm currently in the middle of searching for a job). I'm glad someone is interested in working on it.
participants (4): Jelle Zijlstra, Jonathan Goble, Paul Moore, Stephen J. Turnbull