Help with Regex for domain names
MRAB
python at mrabarnett.plus.com
Thu Jul 30 17:28:55 EDT 2009
Nobody wrote:
> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
>
>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>> You might also want to consider that some country
>> codes such as "co" for Colombia might match more than
>> you want, for example:
>>
>> re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>
>> will match.
>
> ... so put \b at the end, i.e.:
>
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
>
With "co" in the list, it would still match the "www.bbc.co" part of
"www.bbc.co.uk", so you might also need a negative lookahead:
regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')
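
A quick way to compare the three patterns from this thread is to run them against a few hostnames. This is only a sketch: it assumes "co" has been added to the alternation (as in rurpy's example), and the names TLDS, plain, bounded, strict and matched() are made up here for illustration.

```python
import re

# Assumption: "co" is included in the list, as in rurpy's example.
TLDS = r'(?:us|au|de|co)'

# The three variants discussed in the thread.
plain = re.compile(r'[\w\-\.]+\.' + TLDS)                 # original pattern
bounded = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b')       # with \b at the end
strict = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b(?!\.\b)')  # plus the lookahead


def matched(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == '__main__':
    for host in ('foo.boo.com', 'www.bbc.co.uk', 'example.co', 'example.co.'):
        print(host, matched(plain, host), matched(bounded, host),
              matched(strict, host))
```

The lookahead only rejects a dot that is followed by a word character, so a name followed by a sentence-ending full stop ("example.co.") should still match, while "www.bbc.co.uk" should not.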