[Tutor] regex advise + email validate
Norman Khine
norman at khine.net
Fri Oct 1 15:39:29 CEST 2010
hi steven, thanks for the in-depth info. yes, i am aware that email
validation is not foolproof until you actually send an email and then
confirm that you get a response. i guess dnspython helps a little in
this respect, as does integration with OpenID and the like. for my
needs i just wanted to find a simple way to include the .travel TLD
in the regex.
i suppose internationalized TLDs would not be supported by the regex
either:
http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#Internationalized_country_code_top-level_domains
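
To make the {2,4} versus {2,7} trade-off concrete, here is a rough sketch. The pastie code is not reproduced here, so STRICT and LOOSE are hypothetical stand-in patterns of the same general shape, and the addresses are made up for illustration.

```python
import re

# Hypothetical stand-ins for the kind of pattern under discussion;
# they are NOT the code from the pastie link.
STRICT = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")
LOOSE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}$")


def matches(pattern, address):
    """Return True if the whole address matches the compiled pattern."""
    return pattern.match(address) is not None


if __name__ == "__main__":
    samples = (
        "norman@example.com",
        "info@example.travel",    # six-letter TLD
        "info@example.museum",    # six-letter TLD
        "user@example.xn--p1ai",  # internationalized ccTLD in punycode form
    )
    for addr in samples:
        print(addr, matches(STRICT, addr), matches(LOOSE, addr))
```

Widening the quantifier lets .travel and .museum through, but the letters-only TLD class still rejects the punycode form of an internationalized TLD, so the pattern has to keep chasing the list of TLDs.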
On Fri, Oct 1, 2010 at 3:20 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, 1 Oct 2010 09:34:01 pm Norman Khine wrote:
>> hello, i have this code
>>
>> http://pastie.org/1193091
>>
>> i would like to extend this so that it validates TLD's such as
>> .travel and .museum, i can do this by changing {2,4} to {2,7} but
>> this sort of defeats the purpose of validating the correct email
>> address.
>
> The only good advice for using regular expressions to validate email
> addresses is...
>
> Don't.
>
> Just don't even try.
>
> The only way to validate an email address is to actually try to send
> email to it and see if it can be delivered. That is the ONLY way to
> know if an address is valid.
>
> First off, even if you could easily detect invalid addresses -- and you
> can't, but for the sake of the argument let's pretend you can -- then
> this doesn't help you at all. fred at example.com is syntactically valid,
> but I guarantee that it will *never* be deliverable.
>
> asgfkagfkdgfkasdfg at hdsgfjdshgfjhsdfg.com is syntactically correct, and
> it *could* be a real address, but if you can actually deliver mail to
> it, I'll eat my hat.
>
> If you absolutely must try to detect syntactically invalid addresses,
> the most you should bother is to check that the string isn't blank. If
> you don't care about local addresses, you can also check that it
> contains at least one @ sign. (A little known fact is that email
> addresses can contain multiple @ signs.) Other than that, leave it up
> to the mail server to validate the address -- which it does by trying
> to deliver mail to it.
>
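
The minimal check described above might look something like this in Python. It is only a sketch: looks_plausible and allow_local are made-up names, not part of any library, and passing the check says nothing about whether mail can actually be delivered.

```python
def looks_plausible(address, allow_local=False):
    """Minimal syntactic check: not blank, and (optionally) contains an @.

    allow_local is a hypothetical switch for callers who also want to
    accept local addresses, which have no @ sign at all.
    """
    # Reject empty or whitespace-only input.
    if not address or not address.strip():
        return False
    # Local addresses need no further checking.
    if allow_local:
        return True
    # Require at least one @; more than one is deliberately allowed.
    return "@" in address


if __name__ == "__main__":
    for candidate in ("", "fred", "fred@example.com", '"a@b"@example.com'):
        print(repr(candidate), looks_plausible(candidate))
```

Everything beyond this is left to the mail server, which finds out the only way anyone can: by trying to deliver.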
> Somebody has created a Perl regex to validate *some* email addresses.
> Even this one doesn't accept all valid addresses, although it comes
> close:
>
> http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
>
> Read it and weep.
>
> See here for more info:
>
> http://northernplanets.blogspot.com/2007/03/how-not-to-validate-email-addresses.html
>
> This is exactly the sort of thing that you should avoid like the plague:
>
> http://www.regular-expressions.info/email.html
>
> This is full of bad advice. This clown arrogantly claims that his
> regex "matches any email address". It doesn't. He then goes on to
> admit "my claim only holds true when one accepts my definition of what
> a valid email address really is". Oh really? What about the RFC that
> *defines* what email addresses are? Shouldn't that count for more than
> the misinformed opinion of somebody who arrogantly dismisses bug
> reports for his regex because it "matches 99% of the email addresses in
> use today"?
>
> 99% sounds like a lot, but if you have 20,000 people use your software,
> that's 200 whose valid email address will be misidentified.
>
> He goes on to admit that his regex wrongly rejects .museum addresses,
> but he considers that acceptable. He seriously suggests that it would
> be a good idea for your program to list all the TLDs, and even all the
> country codes, even though "by the time you read this, the list might
> already be out of date".
>
> This is shonky programming. Avoid it like poison.
>
>
> --
> Steven D'Aprano
--
˙uʍop ǝpısdn p,uɹnʇ pןɹoʍ ǝɥʇ ǝǝs noʎ 'ʇuǝɯɐן sǝɯıʇ ǝɥʇ puɐ 'ʇuǝʇuoɔ
ǝq s,ʇǝן ʇǝʎ
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or
chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )