How to print all expressions that match a regular expression
Alf P. Steinbach
alfps at start.no
Sat Feb 6 23:38:38 EST 2010
* hzhuo1 at gmail.com:
>> So it seems we both misunderstood the problem.
>>
>> I didn't read the top level article until now, and reading it, I can't make
>> sense of it.
>>
>
> [1] Seems that you should read the whole thing before making a post, or
> else you cannot know what we are talking about.
> Steven doesn't misunderstand me. We are talking about what I need, and
> he tries to help.
If you were not misunderstood then you've posted a question for which there's no
practical solution.
>>> "Given the function hashlib.sha256, enumerate all the possible inputs
>>> that give the hexadecimal result
>>> 0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
>> I tried some "parrot" variants but no dice. :-(
>>
>> [snip]
>>
>
> This is a hash collision problem. Nobody has proved that SHA-256 is
> collision free, even not in the random oracle model, because people
> always suppose that a random oracle exists, and make hash function its
> substitution. That means it may be broken someday. And any provable
> security based on random oracle model is not secure.
Stephen's little challenge wasn't about breaking SHA-256 but about guessing his
secret phrase, given his clues.
>>> I'm suggesting that, in general, there's no way to tell in advance which
>>> regexes will be easy and which will be hard, and even when they are easy,
>>> the enumeration will often be infinite.
>
> It is hard to tell in advance.
No, it's trivial.
> [2] However, we can add some timing limit
> or counting limit, to make it an algorithm, which can halt. For
> example, whenever the program outputs more than 1000000 expressions
> that match the input regex, we can halt because that exceeds our
> limit. But surely this is not efficient because of the post-decision.
You don't need to wait for that output to complete. You can calculate the number
of strings up front. Like it appears that you do below:
>> Essentially, any regexp that includes '+' or '*' (directly or via e.g. notation
>> that denotes "digit sequence") yields an infinite number of strings.
>
> Infinity is really relative, not absolute. It is relative to the
> computing speed. For example, the regex '^[0|1]{2048}$' is rather
> simple and doesn't contain '+' or '$', but trying to output all
> expressions that match it has a complexity of 2^2048. If we can do
> that, then we can break RSA-2048.
> We must face the reality .
Here it seems that you have no problem calculating number of combinations, yet
above, at the paragraph marked [2], you talk about waiting for a million strings
to be output before seeing that it's too much, and earlier, at the paragraph
marked [1], you maintain that your original question about generating all such
strings (completely impractical) was what you wanted help with?
I'm sorry but I can't make sense of this; it appears to be meaningless.
Perhaps if you tried to clarify the requirements a bit.
Cheers & hth.,
- Alf
More information about the Python-list
mailing list