Parsing for email addresses
python.list at tim.thechases.com
Tue Feb 16 21:15:30 CET 2010
> fileHandle = open('/Users/Matt/Documents/python/results.txt','r')
> names = fileHandle.readlines()
> Now, the 'names' list has values looking like this:
> ['aaa12@domain.com\n', 'bbb34@domain.com\n', etc]. So I ran the following code:
> for x in names:
> And that did the trick! 'Names' now has ['aaa12', 'bbb34', etc].
> Obviously this only worked because all of the domain names were the
> same. If they were not then based on your comments and my own
> research, I would've had to use regex and the split(), which looked
> massively complicated to learn.
The complexities stemmed from a couple of missing details that,
had they been included, could have made the solutions less daunting:
(a) you mentioned "finding" the email addresses -- this makes
it sound like there's other junk in the file that has to be
sifted through to find "things that look like an email address".
If the sole content of the file is lines containing only email
addresses, then "find the email address" is a bit like 
(b) you omitted the detail that the domains are all the same.
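Even if they're not the same, (a) reduces the problem to a much
simpler one:
s = set()
for line in file('results.txt'):
    # assumption: keep only the part before the '@', lowercased
    s.add(line.strip().split('@', 1)[0].lower())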
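For the other case described in (a), where the addresses are mixed in with unrelated text, a rough sketch with the `re` module might look like the following. The pattern is a deliberately loose assumption about what "looks like an email address" (it is not a complete address grammar), and the name `find_addresses` and the sample text are made up for illustration.

```python
import re

# Loose, illustrative pattern: a run of name characters, an '@', then a
# dotted domain. This is an assumption, not a complete address grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(text):
    """Return the set of lowercased address-like strings found in text."""
    return set(match.lower() for match in EMAIL_RE.findall(text))

# Hypothetical input with surrounding junk and a case-variant duplicate.
sample = "Contact AAA12@domain.com or bbb34@example.org, not aaa12@domain.com."
found = find_addresses(sample)
```

Collecting into a set and lowercasing first means case variants of the same address count only once.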
If it was previously a CSV or tab-delimited file, Python offers
batteries-included processing to make it easy:
import csv
f = file('results.txt', 'rb')
r = csv.DictReader(f) # CSV
# r = csv.DictReader(f, delimiter='\t') # tab delim
s = set()
for row in r:
    s.add(row['Email'].lower())
or, more compactly:
f = file(...)
r = csv.DictReader(...)
s = set(row['Email'].lower() for row in r)
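On Python 3 the same idea is spelled slightly differently: `file()` is gone, and the `csv` module wants a text-mode file opened with `newline=''` rather than `'rb'`. A minimal sketch, assuming the header row has a column named `Email` as in the snippet above; the helper name `unique_emails` and the in-memory sample data are invented for illustration.

```python
import csv
import io

def unique_emails(fileobj, delimiter=","):
    """Collect the lowercased 'Email' column of a delimited file into a set."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    return set(row["Email"].lower() for row in reader)

# In-memory stand-in for a real file. With a file on disk one would use:
#   with open('results.txt', newline='') as f:
#       emails = unique_emails(f)
sample = io.StringIO(
    "Name,Email\n"
    "A,AAA12@domain.com\n"
    "B,bbb34@domain.com\n"
    "C,aaa12@domain.com\n"
)
emails = unique_emails(sample)
```

For a tab-delimited file, passing `delimiter='\t'` should be the only change needed.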
Hope this gives you more ideas to work with.