Help with Regex for domain names

MRAB python at mrabarnett.plus.com
Thu Jul 30 17:28:55 EDT 2009


Nobody wrote:
> On Thu, 30 Jul 2009 10:29:09 -0700, rurpy wrote:
> 
>>> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)')
>> You might also want to consider that some country
>> codes such as "co" for Colombia might match more than
>> you want, for example:
>>
>>   re.match(r'[\w\-\.]+\.(?:us|au|de|co)', 'foo.boo.com')
>>
>> will match.
> 
> ... so put \b at the end, i.e.:
> 
> regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b')
> 
With "co" in the list, it would still match "www.bbc.co.uk" (there is a
word boundary between "co" and the following "."), so you might need:

regex = re.compile(r'[\w\-\.]+\.(?:us|au|de)\b(?!\.\b)')
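
A quick way to compare the three variants is sketched below. It is only an
illustration: the alternation is assumed to include "co" (as in rurpy's
example) so that the "www.bbc.co.uk" case applies, and the names TLDS,
plain, bounded and strict are made up for this sketch.

```python
import re

# Assumed TLD list for illustration; "co" is added so the .co.uk case applies.
TLDS = r'(?:us|au|de|co)'

# Original pattern: no anchoring after the TLD.
plain = re.compile(r'[\w\-\.]+\.' + TLDS)
# With \b: the TLD must end at a word boundary.
bounded = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b')
# With \b(?!\.\b): additionally, the TLD must not be followed by ".<word char>".
strict = re.compile(r'[\w\-\.]+\.' + TLDS + r'\b(?!\.\b)')

for host in ('foo.boo.com', 'www.bbc.co.uk', 'example.co'):
    results = [bool(p.match(host)) for p in (plain, bounded, strict)]
    print(host, results)

# foo.boo.com   [True, False, False]
# www.bbc.co.uk [True, True, False]
# example.co    [True, True, True]
```

The lookahead only rejects a "." that is followed by a word character, so a
name with a trailing dot such as "example.de." should still match.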


