Unwanted Spaces and Iterative Loop
matt.s.marotta at gmail.com
Sun Jan 26 20:15:20 EST 2014
On Sunday, 26 January 2014 19:40:26 UTC-5, Steven D'Aprano wrote:
> On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:
>
> > I have been working on a python script that separates mailing addresses
> > into different components.
> >
> > Here is my code:
> >
> > inFile = "directory"
> > outFile = "directory"
> > inHandler = open(inFile, 'r')
> > outHandler = open(outFile, 'w')
>
> Are you *really* opening the same file for reading and writing at the
> same time?
>
> Even if your operating system allows that, surely it's not a good idea.
> You might get away with it for small files, but at some point you're
> going to run into weird, hard-to-diagnose bugs.
>
> > outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
>
> This looks like a CSV file using tabs as the separator. You really ought
> to use the csv module.
>
> http://docs.python.org/3/library/csv.html
> http://docs.python.org/2/library/csv.html
>
> http://pymotw.com/2/csv/
>
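A minimal sketch of the csv suggestion on the reading side, assuming Python 3 and that the input really is tab-separated with a `FarmID<TAB>Address` header row. `io.StringIO` stands in for the real input file, the sample row is the one quoted further down, and the name `read_rows` is illustrative.

```python
import csv
import io

# Stand-in for the real input file (assumption: tab-separated, header first).
sample = "FarmID\tAddress\n1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"

def read_rows(handle):
    """Yield each data row as a list of fields, skipping the header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # discard the header; None avoids an error on an empty file
    for row in reader:
        yield row

rows = list(read_rows(io.StringIO(sample)))
print(rows)
# [['1', '1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0']]
```

With a real file, the Python 3 csv documentation recommends opening it with `newline=""`, e.g. `open(inFile, newline="")`.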
> > for line in inHandler:
> >     str = line.replace("FarmID\tAddress", " ")
> >     outHandler.write(str[0:-1])
> >     str = str.replace(" ","\t", 1)
> >     str = str.replace(" Rd,","\tRd\t\t")
> >     str = str.replace(" Rd","\tRd\t")
> >     str = str.replace("Ave,","\tAve\t\t")
> >     str = str.replace("Ave","\tAve\t\t")
> >     str = str.replace("St ","\tSt\t\t")
> >     str = str.replace("St,","\tSt\t\t")
> >     str = str.replace("Dr,","\tDr\t\t")
> [snip additional string manipulations]
> >     str = str.replace(",","\t")
> >     str = str.replace(" ON","ON\t")
> >     outHandler.write(str)
>
> Aiy aiy aiy, what a mess! I get a headache just trying to understand it!
>
> The first question that comes to mind is that you appear to be writing
> each input line *twice*: first after a very minimal set of string
> manipulations (you convert the literal string "FarmID\tAddress" to a
> space, then write the whole line out minus its final character, the
> newline), and a second time after a whole mess of string replacements.
> Why?
>
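To see why the double write matters, here is a small sketch that mimics the two writes using `io.StringIO`. The `tidy()` helper is hypothetical: it stands in for the snipped chain of `replace()` calls and only turns ", " into tabs. Because the first write drops the newline, the processed copy of the same line is glued straight onto the raw copy, so the farm ID lands right after the postal code (`L0S 1J01...`). This may be the "farmID at the end of the postal codes" described in the reply below, though that is an inference from the quoted code, not something the thread confirms.

```python
import io

SAMPLE = ["1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"]

def tidy(text):
    """Hypothetical stand-in for the snipped replace() chain:
    turns ", " into tabs and keeps the trailing newline."""
    return text.replace(", ", "\t")

def write_twice(lines):
    """Mimic the quoted loop: two writes per input line."""
    out = io.StringIO()
    for line in lines:
        out.write(line[0:-1])  # raw line, newline dropped
        out.write(tidy(line))  # processed line, newline kept
    return out.getvalue()

def write_once(lines):
    """The same loop with the first write removed."""
    out = io.StringIO()
    for line in lines:
        out.write(tidy(line))
    return out.getvalue()

print(repr(write_twice(SAMPLE)))  # raw text, then "1\t..." glued onto "L0S 1J0"
print(repr(write_once(SAMPLE)))   # one clean tab-separated line
```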
> If the sample data you show below is accurate, I *think* what you are
> trying to do is simply suppress the header line. The first line in the
> input file is:
>
> FarmID Address
>
> and rather than writing that, you want to write a space. I don't know why
> you want the output file to begin with a space, but this would be better:
>
> for line in inHandler:
>     line = line.strip()  # Remove any leading and trailing whitespace,
>     # including the trailing newline. Later, we'll add a newline
>     # back in.
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue  # And skip to the next line.
>     # Now process the non-header lines.
>
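The same header check, wrapped as a generator so it can be tried on a list of strings instead of a file. The names `HEADER` and `data_lines` are illustrative, and this sketch simply skips the header rather than writing the "mysterious space". The `strip()` call is what makes the comparison work: the raw line read from the file is `"FarmID\tAddress\n"` (or ending in `"\r\n"` on Windows), which never equals `"FarmID\tAddress"`.

```python
HEADER = "FarmID\tAddress"

def data_lines(lines):
    """Yield each line with surrounding whitespace removed, skipping the header."""
    for line in lines:
        line = line.strip()  # drop the trailing newline so the comparison can match
        if line == HEADER:
            continue         # skip the header row
        yield line

lines = ["FarmID\tAddress\n",
         "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"]
print(list(data_lines(lines)))
```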
> Now, as for the non-header lines, you do a whole lot of complex string
> manipulations, replacing chunks of text (with or without tabs or commas)
> with the same text, with or without tabs, in a different order. The logic
> of these manipulations completely escapes me: what are you actually
> trying to do here?
>
> I *strongly* suggest that you don't try to implement your program logic
> in the form of string manipulations. According to your sample data, your
> data looks like this:
>
> 1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
>
> i.e.
>
> farmId TAB address COMMA district COMMA postcode
>
> It is much better to pull the line apart into named components,
> manipulate the components directly, then put it back together in the
> order you want. This makes the code easier to understand, and easier to
> change if your requirements ever change.
>
> for line in inHandler:
>     line = line.strip()
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue
>     # Now process the non-header lines.
>     farmid, address = line.split("\t")
>     farmid = farmid.strip()
>     address, district, postcode = address.split(",")
>     address = address.strip()
>     district = district.strip()
>     postcode = postcode.strip()
>     # Now process the fields however you like.
>     parts_of_address = address.split(" ")
>     street_number = parts_of_address[0]  # first part
>     street_type = parts_of_address[-1]  # last part
>     street_name = parts_of_address[1:-1]  # everything else
>     street_name = " ".join(street_name)
>
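Carrying that code through to the postal code might look like the sketch below. The function name `parse_line` and the dictionary keys are illustrative, and splitting the last field into province and postal code at its first space is an assumption based on the single sample row (`ON L0S 1J0`). Like the quoted code, it does not try to detect a direction (the `Dir` column in the original header).

```python
def parse_line(line):
    """Split one stripped data line into named fields.

    Assumes the layout 'farmId TAB street, city, PROVINCE POSTCODE'
    seen in the sample row; other layouts raise ValueError.
    """
    farmid, address = line.split("\t")
    street, city, prov_post = (part.strip() for part in address.split(","))
    parts = street.split(" ")
    province, postal_code = prov_post.split(" ", 1)  # "ON L0S 1J0" -> "ON", "L0S 1J0"
    return {
        "farmid": farmid.strip(),
        "street_number": parts[0],            # first part
        "street_name": " ".join(parts[1:-1]), # everything in between
        "street_type": parts[-1],             # last part
        "city": city,
        "province": province,
        "postal_code": postal_code,
    }

print(parse_line("1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0"))
```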
> and so on for the post code. Then, at the very end, assemble the parts
> you want to write out, join them with tabs, and write:
>
>     fields = [farmid, street_number, street_name, street_type, ... ]
>     outHandler.write("\t".join(fields))
>     outHandler.write("\n")
>
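A self-contained version of that final step, writing to an `io.StringIO` instead of `outHandler`. The `COLUMNS` list and the `write_record` name are hypothetical; a column with no value (here the direction) is written as an empty field, so every row keeps the same number of tabs.

```python
import io

# Hypothetical output column order, loosely following the header in the original code.
COLUMNS = ["farmid", "street_number", "street_name", "street_type",
           "dir", "city", "province", "postal_code"]

def write_record(handle, record):
    """Write one record as a tab-separated line; missing fields become ''."""
    fields = [record.get(name, "") for name in COLUMNS]
    handle.write("\t".join(fields))
    handle.write("\n")

out = io.StringIO()
write_record(out, {"farmid": "1", "street_number": "1067",
                   "street_name": "Niagara Stone", "street_type": "Rd",
                   "city": "Niagara-On-The-Lake", "province": "ON",
                   "postal_code": "L0S 1J0"})
print(repr(out.getvalue()))
```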
> Or use the csv module to do the actual writing. It will handle escaping
> anything that needs escaping: newlines, tabs, etc.
>
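A short sketch of that escaping behaviour, assuming Python 3. The row containing `"Main\tSt"` is made-up data chosen to show a field that contains the delimiter; with the default `QUOTE_MINIMAL` quoting, csv wraps that field in double quotes and reads it back intact. `lineterminator="\n"` is set here because `csv.writer` defaults to `"\r\n"`; `to_tsv` is an illustrative name.

```python
import csv
import io

def to_tsv(rows):
    """Serialise rows as tab-separated text, letting csv quote fields as needed."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()

text = to_tsv([["FarmID", "StreetName", "City"],
               ["2", "Main\tSt", "Niagara-On-The-Lake"]])  # made-up row with a tab inside a field
print(repr(text))

# Reading it back with the same delimiter recovers the original fields.
print(list(csv.reader(io.StringIO(text), delimiter="\t")))
```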
> --
> Steven
I'm not reading from and writing to the same file; I just replaced the actual paths with "directory" before posting.
This is for a school assignment, and we haven't been taught any of the stuff you're talking about. Although I appreciate your help, everything needs to stay as is, and I just need to create the loop to get rid of the FarmID from the end of the postal codes.