Parsing for email addresses

Tim Chase python.list at tim.thechases.com
Tue Feb 16 21:15:30 CET 2010


galileo228 wrote:
> [code]
> fileHandle = open('/Users/Matt/Documents/python/results.txt','r')
> names = fileHandle.readlines()
> [/code]
> 
> Now, the 'names' list has values looking like this:
> ['aaa12@domain.com\n', 'bbb34@domain.com\n', etc]. So I ran the following code:
> 
> [code]
> for x in names:
>     st_list.append(x.replace('@domain.com\n',''))
> [/code]
> 
> And that did the trick! 'Names' now has ['aaa12', 'bbb34', etc].
> 
> Obviously this only worked because all of the domain names were the
> same. If they were not then based on your comments and my own
> research, I would've had to use regex and the split(), which looked
> massively complicated to learn.

The complexity stemmed from a couple of missing details that, had 
they been included, would have made the solutions less daunting:

   (a) you mentioned "finding" the email addresses -- this makes 
it sound like there's other junk in the file that has to be 
sifted through to find "things that look like an email address". 
If the sole content of the file is lines containing only email 
addresses, then "find the email address" is a bit like [1]

   (b) you omitted the detail that the domains are all the same. 
Even if they're not the same, knowing that the file holds nothing 
but addresses (a) reduces the problem to a much easier task:

   s = set()
   with open('results.txt') as f:
       for line in f:
           # keep everything before the last '@', lowercased
           s.add(line.rsplit('@', 1)[0].lower())
   print(s)
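
For the other case in (a), where the addresses are buried in other 
junk, a rough sketch might look like the following. The pattern is a 
deliberately loose assumption of mine (nowhere near a full RFC 5322 
matcher), and the sample text and the local_parts() name are made up 
for illustration:

```python
import re

# Deliberately loose pattern (an assumption, not a full RFC 5322 matcher):
# one or more "local" characters, an '@', then dot-separated domain labels.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def local_parts(text):
    """Return the lowercased part before the '@' of each address found."""
    return set(addr.rsplit('@', 1)[0].lower()
               for addr in EMAIL_RE.findall(text))

# Made-up sample text with addresses mixed in among other words
sample = ("Contact AAA12@domain.com or bbb34@other.example.org, thanks.\n"
          "no address on this line\n")
found = local_parts(sample)
print(sorted(found))
```

A loose pattern like this will accept some strings that aren't valid 
addresses and miss some that are, so treat it as a starting point.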

If it was previously a CSV or tab-delimited file, Python offers 
batteries-included processing to make it easy:

   import csv
   s = set()
   # newline='' lets the csv module handle line endings itself
   with open('results.txt', newline='') as f:
       r = csv.DictReader(f)  # CSV
       # r = csv.DictReader(f, delimiter='\t') # tab delim
       for row in r:
           s.add(row['Email'].lower())

or even

   with open(...) as f:
       r = csv.DictReader(...)
       s = set(row['Email'].lower() for row in r)
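
If you want to try the csv approach without touching your real file, 
here's a small self-contained sketch. The data is made up, 
io.StringIO stands in for the open file, and the 'Email' column name 
is the same assumption as in the snippets above; unique_emails() is 
just a name I picked:

```python
import csv
import io

# Made-up tab-delimited data standing in for the real file; the 'Email'
# header is an assumption about what the export's column is called.
data = (
    "Name\tEmail\n"
    "Matt\tAAA12@domain.com\n"
    "Tim\tbbb34@Example.com\n"
    "Matt again\taaa12@domain.com\n"
)

def unique_emails(f, delimiter=','):
    """Collect the lowercased 'Email' column of a delimited file into a set."""
    reader = csv.DictReader(f, delimiter=delimiter)
    return set(row['Email'].lower() for row in reader)

emails = unique_emails(io.StringIO(data), delimiter='\t')
print(sorted(emails))
```

Lowercasing before adding to the set is what collapses the two 
spellings of the first address into one entry.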

Hope this gives you more ideas to work with.

-tkc

[1]
http://jacksmix.files.wordpress.com/2007/05/findx.jpg





