regular expression reverse match?

Ron Adam radam2 at tampabay.rr.com
Wed Oct 29 16:56:37 EST 2003


On Wed, 29 Oct 2003 00:48:46 -0800, "Emile van Sebille"
<emile at fenx.com> wrote:


[snip]

>
>See, I told you it wouldn't work  ;-)    I abused an implementation detail
>that apparently doesn't exist on the version you've got.
>

It's not the first time something doesn't work on a winxp system.  :-)


[snip]

>
>You'll probably want to build a validation routine.  Here's one quick idea:
>
>import re
>
>def oksofar(pattern, test, sample):
>    teststr = "".join([test,sample[len(test):]])
>    return not not re.match(pattern, teststr)
>
>for p,t,s in [
>  (r'^(\d{5})$', '123456', '56789'),
>  (r'^([A-Z]{2}\d(2))$', 'AB4x', 'XY12'),
>  (r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
>             ]:
>    print p,t,s,not not re.match(p, s)
>    for ii in range(len(t)):
>        print ii,t[:ii+1], oksofar(p, t[:ii+1], s)
>
>
>HTH,
>
>Emile van Sebille
>emile at fenx.com
>


Ok,  it took me a while to see what you did.  Using a sample to
complete the partial input is a good Idea,  Thanks!  :-)

This works in the case of regular expressions that are linear and only
have one possible path.  

Doesn't work with expressions that have more than one possible path,
such as r'yes$|no$' .   I think that's what Diez was trying to explain
to me.  That this could become rather complex when grouping and
alternate cases are present.



It seems there are three possible routs to take on this.

1.  Process the input  so it will match the re.  This will require
generating a set of samples, 1 for each re branch.  Then matching with
each sample.  Using the method above.

So now the problem becomes....   <starting to feel as though I fell in
a  rabbit hole> ....  is there a way to generate samples where there
is one of each for each re branch?    

2.  Process the re so it will match a subset of all possible final end
points.  ie... a less than operation. 

	if this were a math problem it might look something like this.

	string  <=  yes$ | no$
	string   =   (y | ye | yes)$ | (n | no)$
	string   =   (y$ | ye$ | yes$) | (n$ | no$)
	string   =   y$ | ye$ | yes$ | n$ | no$

I was hoping there was a way to do this already.  Since there isn't,
it would require building a parser to convert strings of 'abc'  to
'a|ab|abc' and handle distributed operations with ^,$ and other
special characters.  And there are probably exceptions that will
complicate the process.    :-/   

This is method is probably only worth doing if it had wider
applications. <shrug>  I don't know enough (yet) to know.


 3.  Process both the input and the re.  

	1.	string  <=  yes$|no$
 
	2.	string  <= yes$
		string <= no$

	3.	generate a sample list, 1 sample for each re
	4.	test the input using each sample for a possible match

This might be doable....  will have to think on it and do a little
experimenting. 


 





More information about the Python-list mailing list