greedy match wanted

alexk alexk at mailinator.com
Thu Mar 3 12:31:59 EST 2005


Hi,

I would like to request your help.

My problem is as follows. I want to match URLs, so my regex contains a
long group of valid domain suffixes:

.... (?:com|org|net|biz|info|ac|cc|gs|ms|
			 sh|st|tc|tf|tj|to|vg|ad|ae|af|ag|
			 com\.ag|ai|off\.ai|al|an|ao|aq|
			 com\.ar|net\.ar|org\.ar|as|at|co\.at| ... ) ...

However, for a URL like kuku.com.to it matches only the kuku.com part,
whereas I want it to match the whole kuku.com.to. Note that both "com"
and "com.to" are present in the group above.
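A minimal reproduction, assuming Python's `re` module and a shortened, hypothetical version of the suffix group (the full list above is elided). Alternation in `re` is ordered rather than longest-match: the engine tries the alternatives from left to right and accepts the first one that lets the whole pattern succeed. Because `com` comes before `com\.to`, the longer alternative is never tried.

```python
import re

# Hypothetical, cut-down suffix group; "com" is listed before "com\.to".
pattern = re.compile(r"[a-z]+\.(?:com|org|to|com\.to)")

m = pattern.search("visit kuku.com.to today")
print(m.group())  # kuku.com  ("com" succeeded first, so "com\.to" was never tried)
```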

1. How do I give "com.to" precedence over "com" in the group above?
Maybe I can somehow sort it in lexicographic order and then by length,
or divide it into a set of sub-groups by length?
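Sorting is enough, and no sub-groups are needed. Here is a sketch that assumes the suffixes are kept as a plain Python list (a hypothetical subset is shown) and the group is built at runtime. Ordering the alternatives longest first guarantees that no alternative is listed before a longer one that it is a prefix of. `re.escape` takes care of the dots, so `com\.ag` does not have to be written by hand.

```python
import re

# Hypothetical subset of the suffix list, written as plain strings, not regex.
suffixes = ["com", "org", "net", "to", "ag", "com.ag", "com.to", "co.at", "at"]

# Longest first, so "com.to" is tried before its prefix "com";
# ties are broken alphabetically only to keep the generated pattern stable.
ordered = sorted(suffixes, key=lambda s: (-len(s), s))
suffix_group = "(?:" + "|".join(re.escape(s) for s in ordered) + ")"

# Require a word boundary after the suffix so "kuku.command" is not
# matched as "kuku.com".
pattern = re.compile(r"[a-z0-9-]+\." + suffix_group + r"\b")

print(pattern.search("visit kuku.com.to today").group())  # kuku.com.to
print(pattern.search("example.com").group())              # example.com
```

Plain reverse lexicographic order also works (`sorted(suffixes, reverse=True)`): when one string is a proper prefix of another, it sorts lower, so reversing the order places every string before its own prefixes.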

Thanks for any help,
Alex.




More information about the Python-list mailing list