How to print all expressions that match a regular expression

Steve Holden steve at holdenweb.com
Sun Feb 7 08:07:30 EST 2010


hzhuo1 at gmail.com wrote:
>> So it seems we both misunderstood the problem.
>>
>> I didn't read the top level article until now, and reading it, I can't make
>> sense of it.
>>
> 
> Seems that you should read the whole thing before making a post, or
> else you cannot know what we are talking about.
> Steven doesn't misunderstand me. We are talking about what I need, and
> he tries to help.
> 
> 
> 
>>> "Given the function hashlib.sha256, enumerate all the possible inputs
>>> that give the hexadecimal result
>>> 0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
>> I tried some "parrot" variants but no dice. :-(
>>
>> [snip]
>>
> 
> This is a hash collision problem. Nobody has proved that SHA-256 is
> collision free, even not in the random oracle model, because people
> always suppose that a random oracle exists, and make hash function its
> substitution. That means it may be broken someday. And any provable
> security based on random oracle model is not secure.
> 
It's very easy to prove that no hash function is collision-free, since
the domain (all possible inputs) is much larger than the range (all
possible outputs). Hence there must be many inputs that map to the same
output.

A *good* hash function is unpredictable enough to make finding two
colliding strings impractical - and even the best hash functions that
cryptographers could devise at the time have been broken. We should
remember that "broken" to a cryptographer means something rather
different than it does in common usage, so a broken scheme need not
necessarily be dropped immediately - one would just stop using it in new
systems.

> 
>>> I'm suggesting that, in general, there's no way to tell in advance which
>>> regexes will be easy and which will be hard, and even when they are easy,
>>> the enumeration will often be infinite.
> 
> It is hard to tell in advance. However, we can add some timing limit
> or counting limit, to make it an algorithm, which can halt. For
> example, whenever the program outputs more than 1000000 expressions
> that match the input regex, we can halt because that exceeds our
> limit. But surely this is not efficient because of the post-decision.
> 
>> Essentially, any regexp that includes '+' or '*' (directly or via e.g. notation
>> that denotes "digit sequence") yields an infinite number of strings.
> 
> Infinity is really relative, not absolute. It is relative to the
> computing speed. For example, the regex '^[0|1]{2048}$' is rather
> simple and doesn't contain '+' or '$', but trying to output all
> expressions that match it has a complexity of 2^2048. If we can do
> that, then we can break RSA-2048.
> We must face the reality .
> 
I have always understood that there's a pretty real distinction between
"finite" and "infinite". Are you telling me I am wrong, or are you
merely saying that some finite cases might just as well be infinite for
practical purposes?

And I really don't see how simple enumeration of range(2^2048) breaks
RSA-2048, since that problem requires you to find two factors which,
when multiplied together, give that specific value.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/




More information about the Python-list mailing list