Parsing for email addresses

Tim Chase python.list at
Tue Feb 16 01:35:21 CET 2010

Jonathan Gardner wrote:
> On Feb 15, 3:34 pm, galileo228 <mattbar... at> wrote:
>> I'm trying to write python code that will open a textfile and find the
>> email addresses inside it. I then want the code to take just the
>> characters to the left of the "@" symbol, and place them in a list.
>> (So if galileo... at was in the file, 'galileo228' would be
>> added to the list.)
>> Any suggestions would be much appreciated!
> You may want to use regexes for this. For every match, split on '@'
> and take the first bit.
> Note that the actual specification for email addresses is far more
> than a single regex can handle. However, for almost every single case
> out there nowadays, a regex will get what you need.
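
Jonathan's "find each match, split on '@', keep the first bit" approach can be sketched roughly as follows. The pattern and the names EMAIL_RE and local_parts are illustrative assumptions, not something from the thread, and the pattern is deliberately much looser than the real address specification.

```python
import re

# Hypothetical, deliberately loose pattern (not RFC-compliant): a run of
# word characters, dots, "+" or "-", then "@", then a dotted domain name.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def local_parts(text):
    """Return the text left of the "@" for every address found in text."""
    # For every match, split on '@' and keep only the first piece.
    return [match.split('@', 1)[0] for match in EMAIL_RE.findall(text)]

if __name__ == '__main__':
    sample = "Mail galileo228@example.com or jim.bob@mail.example.org today."
    print(local_parts(sample))  # ['galileo228', 'jim.bob']
```

Because the pattern has no capturing groups, findall() returns whole matches, which is why the split is needed; capturing the local part directly in a group avoids that step.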

You can even capture the local part as the regex finds each match.
As Jonathan mentions, finding RFC-compliant email addresses can be
a hairy, if not intractable, problem.  But you can get a pretty
close approximation with something like this:

   import re

   r = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
   #                                        ^
   # if you want to allow local domains like
   #   user@localhost
   # then change the "+" marked with the "^"
   # to a "*" and the "{2,5}" to "+" to lift
   # the limit on TLD length.  This changes the
   # outcome of the last test, "jim@com", to True

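   # The list archive mangled the addresses in these tests; the
   # example.com entries are stand-ins, not the original values.
   for test, expected in (
       ('jim@example.com', True),
       ('jim.bob+tag@mail.example.co.uk', True),
       ('jim@', False),
       ('', False),
       ('@com', False),
       ('jim@com', False),
       ):
     m = r.match(test)
     # XOR flags any case where "matched" disagrees with "expected"
     if bool(m) ^ expected:
       print("Failed: %r should be %s" % (test, expected))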

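   emails = set()
   with open('test.txt') as f:
     for line in f:
       for match in r.finditer(line):
         # group(1) is the local part, the text left of the "@"
         emails.add(match.group(1))
   print("All the local parts: " + ', '.join(sorted(emails)))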
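
To check the comment's claim about loosening the pattern, here is a small side-by-side sketch. STRICT is the regex from the post verbatim; RELAXED applies the two changes the comment describes. The names STRICT, RELAXED and local_part are illustrative, not from the thread.

```python
import re

# The pattern exactly as posted.
STRICT = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)

# The relaxed form from the comment: the "+" after the dotted-label group
# becomes "*" (so a bare host such as "localhost" is accepted) and
# "{2,5}" becomes "+" (no limit on the length of the final label).
RELAXED = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)*(?:\w+)', re.I)

def local_part(pattern, text):
    """Return the captured local part if text starts with an address, else None."""
    m = pattern.match(text)
    return m.group(1) if m else None

if __name__ == '__main__':
    for text in ('user@localhost', 'jim@com', 'jim@example.com'):
        print(text, local_part(STRICT, text), local_part(RELAXED, text))
```

Run as a script, it should print "user@localhost None user", "jim@com None jim" and "jim@example.com jim jim".  Note that neither pattern is anchored at the end, so match() only checks that the string starts with something address-shaped.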


More information about the Python-list mailing list