[ python-Feature Requests-1701452 ] Feature request: Comparing regexps

SourceForge.net noreply at sourceforge.net
Thu Apr 19 08:44:11 CEST 2007


Feature Requests item #1701452, was opened at 2007-04-16 07:28
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Thomas Dybdahl Ahle (thomasda)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: Feature request: Comparing regexps

Initial Comment:
It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-19 01:44

Message:
Logged In: YES 
user_id=80475
Originator: NO

Am closing the feature request because it has very little chance of making
it into the core distribution without being a third-party module first. 
The existing implementation is already somewhat difficult to maintain and
we don't want to make that worse.  Also, the new functionality is of
interest to only a small subset of the re module users.

If you do write it, respect the GPL license on the SML code and take some
care to make sure you can license your new code in any way you see fit.



----------------------------------------------------------------------

Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-18 09:33

Message:
Logged In: YES 
user_id=1304417
Originator: YES

I talked with the guy who wrote the ZZ comparator.

"""I can give you the source code under the GPL if you like. However, I  
think it would be difficult to port to python. It's 1100 lines of  
very SML-style mostly-uncommented code. Do you know SML?

It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.  
I would expect credit for the algorithm. :-) Let me know if you  
succeed in porting it."""

I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:

"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)

For comparing regular expressions I take the following additional  
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""

It could be cool.
Otherwise I can also try to learn sml and port it.

----------------------------------------------------------------------

Comment By: Thomas Dybdahl Ahle (thomasda)
Date: 2007-04-17 02:51

Message:
Logged In: YES 
user_id=1304417
Originator: YES

I found this page with the function to do so:
http://terpstra.ca/compare.html

I also found this thread:
http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses
how to do this with some canonical formed expressions. 

A function like this would really be able to speed some applications up,
which matches a lot of strings with a number of expressions. If you know
which ones are disjoint, you can break the test when one test matches.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-16 15:43

Message:
Logged In: YES 
user_id=80475
Originator: NO

Can this be done in the existing implementation by comparing the racetrack
diagrams (character transitions)?

Thomas, do you know of anywhere this have been done (third-party modules
or in other languages)?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470


More information about the Python-bugs-list mailing list