re Challenge: More Compact?

John Machin machin_john_888 at hotmail.com
Sun Jul 15 18:29:25 EDT 2001


Tim Daneliuk <tundra at tundraware.com> wrote in message news:<3B51E6B1.47E25214 at tundraware.com>...
> The following re is (I think) the description of a legitimate IP address in
> "quad" format (IPV4).  My question is, can it be made even shorter?
> 
> ipquad   = r"^((\d\d?\d?\.){3}(\d\d?\d?))$"

Is this a trick question? Oh well, fools rush in ...

What about:
ipquad   = r"^((\d{1,3}\.){3}(\d{1,3}))$"
This saves two bytes. It is semantically equivalent.
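
If you'd rather not take my word for it, here is a quick spot check. It is only a sketch: the sample strings are my own picks, and agreeing on a handful of inputs is evidence of equivalence, not a proof.

```python
import re

# The original pattern and the {1,3} rewrite, both still with all the
# capturing parentheses in place.
old = re.compile(r"^((\d\d?\d?\.){3}(\d\d?\d?))$")
new = re.compile(r"^((\d{1,3}\.){3}(\d{1,3}))$")

# Hand-picked samples (an assumption, not an exhaustive set).
samples = ["1.2.3.4", "010.020.030.040", "999.999.999.999", "1.2.3",
           "1234.1.1.1", "1..2.3", "a.b.c.d", "", "1.2.3.4.5"]

# Both patterns should accept or reject every sample identically.
agree = all(bool(old.match(s)) == bool(new.match(s)) for s in samples)

# Difference in pattern length, i.e. the bytes saved.
saved = len(old.pattern) - len(new.pattern)

print(agree, saved)
```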

Now for the next big two-byte saving: throw away the outer capturing
parentheses.
ipquad   = r"^(\d{1,3}\.){3}(\d{1,3})$"
In general, you can still access what matched your entire re by
referring to group 0 when invoking methods of your match object. In
this particular case, you don't even need to do that, as the whole
objective of the exercise is to match the whole input string.

Now for the third big two-byte saving: throw away the rightmost set of
capturing parentheses.
ipquad   = r"^(\d{1,3}\.){3}\d{1,3}$"
Either (a) you have no intention of retrieving parts of the matched
string later, so the "capturing" functionality is of no use to you, or
(b) you haven't considered this: if your input string is
"010.020.030.040", then with your original re, groups 0 and 1 will be
"010.020.030.040", group 2 will be "030." and group 3 will be "040" --
about as useful as a hip pocket on an athletic supporter.
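
To see the group behaviour for yourself, something like the following should do. It is a minimal sketch using the standard re module; the variable names are mine.

```python
import re

# The original pattern, with all three sets of capturing parentheses.
original = re.compile(r"^((\d\d?\d?\.){3}(\d\d?\d?))$")
# The trimmed pattern: only the repeated group is left.
trimmed = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")

sample = "010.020.030.040"

m = original.match(sample)
# A group that is repeated keeps only its last repetition, which is
# why group 2 holds just the third "nnn." chunk.
groups_original = [m.group(i) for i in range(4)]

m2 = trimmed.match(sample)
# Group 0 is always the whole match, parentheses or no parentheses.
whole = m2.group(0)

print(groups_original)
print(whole)
```

If you really do want all four numbers afterwards, sample.split(".") will hand them over with less fuss than any arrangement of parentheses.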

OK, now that we've trimmed the text of the re down a little,
let's revisit the "legitimate IP address" notion. Is 999.999.999.999
now a legitimate IP address? Last I heard, the max for each little
doodad was 255. If so, you may not want to cr[au]ft a complicated re
to try to enforce that. The one-liner freaks would no doubt relish a
challenge to come up with something better than

max(map(int, your_input_string.split("."))) > 255

to use as a rejection test (true means some part is out of range)
*after* the uncomplicated_re.match() is successful.
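
Putting the two steps together might look like this. It is a sketch, not gospel: the function name is_ipquad and the constant name UNCOMPLICATED_RE are made up for illustration.

```python
import re

# Shape check only: four dot-separated runs of one to three digits.
UNCOMPLICATED_RE = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")

def is_ipquad(text):
    """Return True if text looks like a dotted quad with every part <= 255."""
    # Step 1: the uncomplicated re has to match first, so that the
    # int() calls below are guaranteed to see nothing but digits.
    if not UNCOMPLICATED_RE.match(text):
        return False
    # Step 2: the range check, inverted from the rejection test so that
    # True means "acceptable".
    return max(map(int, text.split("."))) <= 255

print(is_ipquad("192.168.0.1"))
print(is_ipquad("999.999.999.999"))
```

One caveat: $ also matches just before a trailing newline, so "1.2.3.4\n" gets through. If that matters to you, anchor the end with \Z instead.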


