[ python-Feature Requests-1701452 ] Feature request: Comparing regexps
SourceForge.net
noreply at sourceforge.net
Thu Apr 19 08:44:11 CEST 2007
Feature Requests item #1701452, was opened at 2007-04-16 07:28
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Thomas Dybdahl Ahle (thomasda)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: Feature request: Comparing regexps
Initial Comment:
It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.
----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-19 01:44
Message:
Logged In: YES
user_id=80475
Originator: NO
Am closing the feature request because it has very little chance of making
it into the core distribution without being a third-party module first.
The existing implementation is already somewhat difficult to maintain and
we don't want to make that worse. Also, the new functionality is of
interest to only a small subset of the re module users.
If you do write it, respect the GPL license on the SML code and take some
care to make sure you can license your new code in any way you see fit.
----------------------------------------------------------------------
Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-18 09:33
Message:
Logged In: YES
user_id=1304417
Originator: YES
I talked with the guy who wrote the ZZ comparator.
"""I can give you the source code under the GPL if you like. However, I
think it would be difficult to port to python. It's 1100 lines of
very SML-style mostly-uncommented code. Do you know SML?
It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.
I would expect credit for the algorithm. :-) Let me know if you
succeed in porting it."""
I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:
"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)
For comparing regular expressions I take the following additional
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""
It could be cool.
Otherwise I can also try to learn sml and port it.
----------------------------------------------------------------------
Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-17 02:51
Message:
Logged In: YES
user_id=1304417
Originator: YES
I found this page with the function to do so:
http://terpstra.ca/compare.html
I also found this thread:
http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses
how to do this with some canonical formed expressions.
A function like this would really be able to speed some applications up,
which matches a lot of strings with a number of expressions. If you know
which ones are disjoint, you can break the test when one test matches.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-16 15:43
Message:
Logged In: YES
user_id=80475
Originator: NO
Can this be done in the existing implementation by comparing the racetrack
diagrams (character transitions)?
Thomas, do you know of anywhere this have been done (third-party modules
or in other languages)?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
More information about the Python-bugs-list
mailing list