What is wrong with this regex for matching emails?
alister
alister.ware at ntlworld.com
Tue Dec 19 15:21:52 EST 2017
On Mon, 18 Dec 2017 07:57:27 +1100, Ben Finney wrote:
> Peng Yu <pengyu.ut at gmail.com> writes:
>
>> Hi,
>>
>> I would like to extract "abc@efg.hij.xyz". But it only shows ".hij".
>
> Others have addressed this question. I'll answer a separate one:
>
>> Does anybody see what is wrong with it? Thanks.
>
> One thing that's wrong with it is that it is far too restrictive.
>
>> email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')
>
> This excludes a great many email addresses that are valid. Please don't
> try to restrict a match for email addresses that will exclude actual
> email addresses.
>
> For an authoritative guide to matching email addresses, see RFC 3696 §3
> <URL:https://tools.ietf.org/html/rfc3696#section-3>.
>
> A more correct match would boil down to:
>
> * Match any printable Unicode characters (not just ASCII).
>
> * Locate the *last* ‘@’ character. (An email address may contain more
> than one ‘@’ character; you should allow any printable ASCII character
> in the local part.)
>
> * Match the domain part as the text after the last ‘@’ character. Match
> the local part as anything before that character. Reject an address
> that has either of these empty.
>
> * Validate the domain by DNS request. Your program is not an authority
> for what domains are valid; the only authority for that is the DNS.
>
> * Don't validate the local part at all. Your program is not an authority
> for what local parts are accepted to the destination host; the only
> authority for that is the destination mail host.

At which point you have basically boiled your test down to
<anything>@<anything>.<anything>, which is rather pointless.

There are only two reasons why you would want an email address anyway:

1) Data mining, just to add to your mailing list, in which case even if
it validates you still don't know whether it is a fake address given to
avoid spam, so validating is pointless.

2) It is part of a registration process, in which case if it is
incorrect the registration email will not be received & registration
cannot be completed, so it is self-validating without any effort.
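
For reference, the offline part of the quoted steps comes down to very
little code. A minimal sketch, assuming a hypothetical helper name
split_address (not from the thread) and leaving out the DNS lookup,
which needs network access:

```python
def split_address(address):
    """Return (local, domain) split on the last '@', or None if unusable.

    split_address is a hypothetical name used for illustration only.
    """
    # Only printable characters are accepted (Unicode, not just ASCII).
    if not address.isprintable():
        return None
    # rpartition splits on the *last* '@', so the local part may
    # itself contain '@' characters.
    local, sep, domain = address.rpartition("@")
    # Reject a missing '@', an empty local part, or an empty domain.
    if not sep or not local or not domain:
        return None
    # The local part is deliberately not inspected any further.
    return local, domain
```

With this, "abc@efg.hij.xyz" splits into the local part "abc" and the
domain "efg.hij.xyz". Whether mail actually gets delivered there is
only established by sending it.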