[Tutor] Is there a better way?
Joel Goldstick
joel.goldstick at gmail.com
Wed Jan 11 13:57:26 CET 2012
On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vincenzo at gmail.com> wrote:
> Hello,
>
> I've been slowly teaching myself python, using it for small projects when it
> seems appropriate. In this case, I was handed a list of email addresses for
> a mailing but some of them had been truncated. There are only 21 possible
> email "suffixes" so I planned to just identify which it should be and then
> replace it. However, when I started writing the code I realized that I'd be
> doing a lot of "repeating". Is there a better way to "fix" the suffixes
> without doing each individually? Here's my working code (for 4 colleges):
>
> import re
> with file('c:\python27\mvc\mailing_list.txt', 'r') as infile:
>     outlist = []
>     for line in infile.read().split('\n'):
>         if line.rstrip().lower().endswith('edu'):
>             newline = line + '\n'
>             outlist.append(newline.lower())
>         elif re.search("@bar", line):
>             newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@bcc", line):
>             newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@bmc", line):
>             newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>         elif re.search("@leh", line):
>             newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
>             outlist.append(newline.lower())
>
> with file('c:\python27\mvc\output.txt','w') as outfile:
>     outfile.writelines(outlist)
>
> Thanks,
> Marco
First, look here about reading files:
http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects
I like this better:
    f = open('filename', 'r')
    for line in f:
        # this gives you one line at a time; each line still ends with
        # its newline, so strip it before printing or comparing
        print line.rstrip()
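A small sketch of that loop, using an in-memory io.StringIO object as a stand-in for the real file (the sample addresses are made up for illustration). Iterating over a file yields each line with its newline still attached, so it is worth stripping before you test or compare the text:

```python
import io

# Hypothetical sample data standing in for mailing_list.txt.
# The u'' prefix keeps this working on Python 2.7 as well as Python 3.
sample = io.StringIO(u"alice@bar\nbob@lehman.cuny.edu\n")

raw_lines = []
clean_lines = []
for line in sample:
    raw_lines.append(line)             # newline still attached
    clean_lines.append(line.rstrip())  # trailing whitespace removed

print(clean_lines)
```

A real file object opened with open() can be looped over in the same way.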
Second, make a dictionary with the key being what comes after the @
in your truncated file. The value will be the complete text you want:
    d = {"bar": "baruch.cuny.edu", "bcc": "bcc.cuny.edu"}  # and so on for each college
Third, use line.split('@') to split the line into what comes before
and after the @ sign. It returns a list:
    address_parts = line.split('@')
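To make the split step concrete, here is a minimal sketch with a made-up truncated address. The names local_part, domain_start and key are only illustrative. Note that a line with no @ in it comes back as a one-element list, so indexing [1] on it would raise an IndexError:

```python
line = "jsmith@bar\n"          # hypothetical truncated address

address_parts = line.strip().split('@')
local_part = address_parts[0]      # text before the @
domain_start = address_parts[1]    # whatever survived after the @
key = domain_start[0:3].lower()    # first three characters, for the lookup

# A line without an @ yields a single-element list.
no_at_sign = "not an address".split('@')
print(address_parts, key, len(no_at_sign))
```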
address_parts[0] is what you want to keep as is. I'm guessing that the
3 characters after the @ will be enough to identify what the full
address should look like, so:
    if address_parts[1][0:3] in d:
        result = '@'.join([address_parts[0], d[address_parts[1][0:3]]])
Then write the result to your out file.
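Putting the three steps together might look roughly like the sketch below. It is only one way to arrange it: the function names fix_address and fix_lines are made up, the dictionary holds just the four colleges from the original post, and it assumes (as the original code does) that any address already ending in "edu" is complete and that lines which cannot be matched are dropped.

```python
# Prefix -> full domain, taken from the four colleges in the original post.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes=SUFFIXES):
    """Return the lowercased address with a truncated domain completed.

    Returns None when the line has no single '@' or the prefix is unknown.
    """
    address = line.strip().lower()
    if address.endswith("edu"):
        return address                  # assumed to be complete already
    parts = address.split("@")
    if len(parts) != 2:
        return None
    local_part, domain_start = parts
    full_domain = suffixes.get(domain_start[0:3])
    if full_domain is None:
        return None
    return "@".join([local_part, full_domain])


def fix_lines(lines):
    """Fix every line, skipping blanks and lines that cannot be repaired."""
    fixed = (fix_address(line) for line in lines if line.strip())
    return [address + "\n" for address in fixed if address is not None]
```

Since fix_lines accepts any iterable of lines, an open file object can be passed to it directly and the returned list handed to outfile.writelines(). Adding the remaining colleges is then just a matter of adding entries to the dictionary.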
It's early in the morning for me, and this is untested, but it might
give you some ideas.
--
Joel Goldstick