Programming challenge: wildcard exclusion in cartesian products

Chris F Clark cfc at shell01.TheWorld.com
Mon Mar 27 15:01:35 EST 2006


Yes, there is literature on the generating side of the regular
expression/FSM model.  In fact, the matching problem and the
generating problems are exactly equivalent.  A slight variation of the
definition of how a matcher works, turns it into a generator and vice
versa.  To directly generate (rather than match) from an FSM, one
simply does a walk (e.g. depth first search or breath first search)
over the machine writing out (rather than matching) the symbols used
as labels.  Thus, all the theorems about matching apply to generating
and also in reverse.

You can see this to some extent form the problem posed.  If one
generates Sigma* and subtracts out the elements from some regular
language (say by matching them), that is exactly equivalent (in
strings generated) to generating the complement of the regular
language.  

In fact, it is quite easy (with the correct regular expression tool,
i.e. one that handles regular expression differences) to take the
problems posed and generate provably minimal (i.e. provably maximally
efficient) generation (or matching) programs as FSMs.  The provably
minimal FSM won't go down any paths that don't have some sequence that
generates an "accept" value.  It is worth noting the regular languages
are closed under the difference operator, so that resulting language
from substracting one regular expression from another is still a
regular language, which can be used to prove that the machine is
minimal.

Therefore, while the output can be exponentially larger than the
input, one should expect that implementations should be able to
generate the output in linear time (relative to the size of the
output), since FSMs run in linear time relative to the string they are
processing (whether generating or matching).  Under a reasonable model
of computation, one cannot do better than linear in the size of string
processed.

I'm sure if you asked under comp.theory, you would get tons of other
relevant facts from someone who understands automata theory better
than I.  Note, if one does ask there, one should correct the notation.
The "*" symbol was used as in globbing, not as commonly used in
regular expressions.  The notation ".*" (as someone corrected in one
of their replies) is the normal notation for what the original poster
wanted by "*".

Hope this helps,
-Chris

*****************************************************************************
Chris Clark                    Internet   :  compres at world.std.com
Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres  
23 Bailey Rd                   voice      :  (508) 435-5016
Berlin, MA  01503  USA         fax        :  (978) 838-0263  (24 hours)
------------------------------------------------------------------------------



More information about the Python-list mailing list