What is wrong with this regex for matching emails?
Ned Batchelder
ned at nedbatchelder.com
Sun Dec 17 12:01:14 EST 2017
On 12/17/17 10:29 AM, Peng Yu wrote:
> Hi,
>
> I would like to extract "abc@efg.hij.xyz". But it only shows ".hij".
> Does anybody see what is wrong with it? Thanks.
>
> $ cat main.py
> #!/usr/bin/env python
> # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
>
> import re
> email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')
> s = 'abc@efg.hij.xyz.'
> for email in re.findall(email_regex, s):
>     print email
>
> $ ./main.py
> .hij
>
There are two problems. First, you have a group at the end to match
.something, but it matches only once; you need to make that 1-or-more
of those, with a +. Second, when the pattern contains a group,
re.findall returns only the text matched by the groups rather than the
whole match, so you need to change your final group to be a
non-capturing group, with (?:...):
    email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')
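
To see both problems side by side, here is a small sketch that runs three
variants of the pattern against the sample string. It assumes Python 3
(print as a function) and raw strings for the patterns; the variable
names original, repeated_capturing and fixed are made up for
illustration and are not from the original script.

```python
import re

s = 'abc@efg.hij.xyz.'

# Original pattern: a single capturing group, matched exactly once.
original = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')

# Only the first fix applied: the group repeats, but it still captures,
# so findall reports the group (its last repetition), not the whole match.
repeated_capturing = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+')

# Both fixes applied: a repeated, non-capturing group.
fixed = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')

original_result = original.findall(s)
repeated_result = repeated_capturing.findall(s)
fixed_result = fixed.findall(s)

print(original_result)  # ['.hij']
print(repeated_result)  # ['.xyz']
print(fixed_result)     # ['abc@efg.hij.xyz']
```

If you would rather keep the capturing group, iterating with
re.finditer and calling .group(0) on each match also gives the whole
matched text.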
--Ned.