Parsing for email addresses
mattbarkan at
Tue Feb 16 19:07:57 EST 2010
Tim -
Thanks for this. I did originally intend to sift through other
junk in the file, but then figured I could just cut and paste the emails
directly from the 'to' field, which made life easier.
Also, in this particular instance, the domain names were the same, and
thus I was able to figure out my solution, but I do need to know how
to handle the same situation when the domain names are different, so
your response was most helpful.
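For reference, the different-domain case can be sketched as below in Python 3. This is only a minimal sketch of the rsplit approach quoted further down: the function name local_parts and the sample addresses are made up for illustration, and it assumes every line holds at most one address.

```python
def local_parts(lines):
    """Return the set of lowercased local parts (the text before the last '@')."""
    result = set()
    for line in lines:
        address = line.strip()          # drop the trailing newline
        if '@' not in address:
            continue                    # skip blank or malformed lines
        # Split once, from the right, so the domain can be anything.
        result.add(address.rsplit('@', 1)[0].lower())
    return result

# Hypothetical sample data standing in for fileHandle.readlines()
sample = ['aaa12@example.com\n', 'BBB34@example.org\n', '\n']
print(sorted(local_parts(sample)))  # ['aaa12', 'bbb34']
```

Because the split is on the last '@' rather than on a fixed domain string, the same code works whether or not the domains match.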
Apologies for leaving out some details.
On Feb 16, 3:15 pm, Tim Chase <python.l... at> wrote:
> galileo228 wrote:
> > [code]
> > fileHandle = open('/Users/Matt/Documents/python/results.txt','r')
> > names = fileHandle.readlines()
> > [/code]
> > Now, the 'names' list has values looking like this: ['aa... at
> > \n', 'bb... at\n', etc]. So I ran the following code:
> > [code]
> > for x in names:
> >     st_list.append(x.replace('... at\n',''))
> > [/code]
> > And that did the trick! 'st_list' now has ['aaa12', 'bbb34', etc].
> > Obviously this only worked because all of the domain names were the
> > same. If they were not then based on your comments and my own
> > research, I would've had to use regex and the split(), which looked
> > massively complicated to learn.
> The complexities stemmed from several factors that, with more
> details, could have made the solutions less daunting:
> (a) you mentioned "finding" the email addresses -- this makes
> it sound like there's other junk in the file that has to be
> sifted through to find "things that look like an email address".
> If the sole content of the file is lines containing only email
> addresses, then "find the email address" is a bit like [1]
> (b) you omitted the detail that the domains are all the same.
> Even if they're not the same, (a) reduces the problem to a much
> easier task:
> s = set()
> for line in file('results.txt'):
>     s.add(line.rsplit('@', 1)[0].lower())
> print s
> If it was previously a CSV or tab-delimited file, Python offers
> batteries-included processing to make it easy:
> import csv
> f = file('results.txt', 'rb')
> r = csv.DictReader(f) # CSV
> # r = csv.DictReader(f, delimiter='\t') # tab delim
> s = set()
> for row in r:
>     s.add(row['Email'].lower())
> f.close()
> or even
> f = file(...)
> r = csv.DictReader(...)
> s = set(row['Email'].lower() for row in r)
> f.close()
> Hope this gives you more ideas to work with.
> -tkc
> [1]
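The csv.DictReader suggestion quoted above was written for Python 2 (file() and 'rb' mode). A rough Python 3 equivalent might look like the sketch below. The 'Email' column name comes from the quoted code; the function name emails_from_csv and the in-memory sample data are assumptions for illustration only.

```python
import csv
import io

def emails_from_csv(f, delimiter=','):
    """Collect the lowercased 'Email' column from an open text-mode file."""
    reader = csv.DictReader(f, delimiter=delimiter)
    return set(row['Email'].lower() for row in reader)

# Hypothetical in-memory data; with a real file one would use
#     with open('results.txt', newline='') as f:
#         emails = emails_from_csv(f)
sample = io.StringIO('Name,Email\nMatt,AAA12@Example.com\nTim,bbb34@example.org\n')
print(sorted(emails_from_csv(sample)))
# ['aaa12@example.com', 'bbb34@example.org']
```

For a tab-delimited file, passing delimiter='\t' mirrors the commented-out line in the quoted code.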