How to print all expressions that match a regular expression
Tim Chase
python.list at tim.thechases.com
Sun Feb 7 08:47:17 EST 2010
hzhuo1 at gmail.com wrote:
>>> "Given the function hashlib.sha256, enumerate all the possible inputs
>>> that give the hexadecimal result
>>> 0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
>
> This is a hash collision problem. Nobody has proved that SHA-256 is
> collision free
It's actually pretty easy to prove that it is *not* collision
free. The SHA-256 encodes 512 bits of data. So the the process
of encoding (2**512)+1 distinct inputs incurs a collision in
SHA-256 space as soon as you've hit (2**512)+1 if not earlier.
to start you off:
sha_backmap = {}
for i in xrange((2**512)+2):
hash = sha(str(i))
if hash in sha_backmap:
print "Collision found: %i and %i" % (
i, sha_backmap[hash])
break
sha_backmap[hash] = i
Though it might take a computer the size of the universe, so I'm
guessing that the first collision encountered is with "42". I
leave the actual calculation and hashing of all possible
combinations of 513 bits of data as an exercise to the reader
with a lot of time on their hands or a quantum computer under
their desk ;-)
> It is hard to tell in advance. However, we can add some timing limit
> or counting limit, to make it an algorithm, which can halt. For
> example, whenever the program outputs more than 1000000 expressions
> that match the input regex, we can halt because that exceeds our
> limit. But surely this is not efficient because of the post-decision.
As mentioned, it sounds like you either want a depth-first of the
solution space that raises exceptions on an infinite/unbounded
operator ("*", "+", and "{N,}" as mentioned in another email), or
if you want to handle those operators, do a breadth-first search
of the solution-space and track your depth (or time taken, or
previous number of multi-factor atoms if you desire) to ensure
you don't exceed a certain depth. But you're still talking a
combinatorial number of solutions for even simple regexps.
-tkc
More information about the Python-list
mailing list