Unwanted Spaces and Iterative Loop
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Jan 26 19:40:26 EST 2014
On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:
> I have been working on a python script that separates mailing addresses
> into different components.
>
> Here is my code:
>
> inFile = "directory"
> outFile = "directory"
> inHandler = open(inFile, 'r')
> outHandler = open(outFile, 'w')
Are you *really* opening the same file for reading and writing at the
same time?
Even if your operating system allows that, surely it's not a good idea.
You might get away with it for small files, but at some point you're
going to run into weird, hard-to-diagnose bugs.
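One way to sidestep the problem is to read from one path and write to a different one, and let `with` close both files for you. This is only a minimal sketch: the function name `copy_lines` and the file names are made up for illustration, and the body just copies lines where your real processing would go.

```python
import os
import tempfile

def copy_lines(in_path, out_path):
    """Read in_path line by line and write each line to out_path."""
    # Refuse to read and write the very same file.
    if os.path.abspath(in_path) == os.path.abspath(out_path):
        raise ValueError("input and output must be different files")
    with open(in_path, "r") as in_handle, open(out_path, "w") as out_handle:
        for line in in_handle:
            out_handle.write(line)  # real per-line processing would go here

# Usage with throwaway files (hypothetical names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "addresses_in.txt")
dst = os.path.join(workdir, "addresses_out.txt")
with open(src, "w") as f:
    f.write("FarmID\tAddress\n")
    f.write("1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n")
copy_lines(src, dst)
```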
> outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
This looks like a CSV file using tabs as the separator. You really ought
to use the csv module.
http://docs.python.org/3/library/csv.html
http://docs.python.org/2/library/csv.html
http://pymotw.com/2/csv/
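For instance, the header row could be written with `csv.writer` and a tab delimiter. A rough sketch, assuming Python 3; the `io.StringIO` buffer merely stands in for the real output file, and `lineterminator="\n"` is my choice rather than the module's default of "\r\n":

```python
import csv
import io

# Column names taken from the header string in the original script.
header = ["FarmID", "Address", "StreetNum", "StreetName", "SufType",
          "Dir", "City", "Province", "PostalCode"]

buf = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(header)  # joins the fields with tabs and adds the newline
header_line = buf.getvalue()
```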
> for line in inHandler:
>     str = line.replace("FarmID\tAddress", " ")
>     outHandler.write(str[0:-1])
>     str = str.replace(" ","\t", 1)
>     str = str.replace(" Rd,","\tRd\t\t")
>     str = str.replace(" Rd","\tRd\t")
>     str = str.replace("Ave,","\tAve\t\t")
>     str = str.replace("Ave","\tAve\t\t")
>     str = str.replace("St ","\tSt\t\t")
>     str = str.replace("St,","\tSt\t\t")
>     str = str.replace("Dr,","\tDr\t\t")
[snip additional string manipulations]
>     str = str.replace(",","\t")
>     str = str.replace(" ON","ON\t")
>     outHandler.write(str)
Aiy aiy aiy, what a mess! I get a headache just trying to understand it!
The first question that comes to mind is that you appear to be writing
each input line *twice*, first after a very minimal set of string
manipulations (you convert the literal string "FarmID\tAddress" to a
space, then write the whole line out), the second time after a whole mess
of string replacements. Why?
If the sample data you show below is accurate, I *think* what you are
trying to do is simply suppress the header line. The first line in the
input file is:
FarmID Address
and rather than write that you want to write a space. I don't know why
you want the output file to begin with a space, but this would be better:
    for line in inHandler:
        # Remove any leading and trailing whitespace, including the
        # trailing newline. Later, we'll add a newline back in.
        line = line.strip()
        if line == "FarmID\tAddress":
            outHandler.write(" ")  # Write a mysterious space.
            continue  # And skip to the next line.
        # Now process the non-header lines.
As for the non-header lines, you do a whole lot of complex string
manipulations, replacing chunks of text (with or without tabs or commas)
with the same text, with or without tabs, in a different order. The logic
of these manipulations completely escapes me: what are you actually trying
to do here?
I *strongly* suggest that you don't try to implement your program logic
in the form of string manipulations. According to your sample data, your
data looks like this:
1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
i.e.
farmId TAB address COMMA district COMMA postcode
It is much better to pull the line apart into named components,
manipulate the components directly, then put it back together in the
order you want. This makes the code more understandable, and easier to
change if you ever need to change things.
    for line in inHandler:
        line = line.strip()
        if line == "FarmID\tAddress":
            outHandler.write(" ")  # Write a mysterious space.
            continue
        # Now process the non-header lines.
        farmid, address = line.split("\t")
        farmid = farmid.strip()
        address, district, postcode = address.split(",")
        address = address.strip()
        district = district.strip()
        postcode = postcode.strip()
        # Now process the fields however you like.
        parts_of_address = address.split(" ")
        street_number = parts_of_address[0]  # first part
        street_type = parts_of_address[-1]  # last part
        street_name = parts_of_address[1:-1]  # everything else
        street_name = " ".join(street_name)
and so on for the post code. Then, at the very end, assemble the parts
you want to write out, join them with tabs, and write:
    fields = [farmid, street_number, street_name, street_type, ... ]
    outHandler.write("\t".join(fields))
    outHandler.write("\n")
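Pulled together into one function, the splitting might look something like the sketch below. The name `split_record` is made up, and the post code handling is a guess on my part: it assumes the last field is always a province, a space, then the postal code, as in "ON L0S 1J0". It also leaves out the Dir column, since the sample line has no direction in it.

```python
def split_record(line):
    """Split one 'farmid TAB address, district, postcode' line into fields."""
    farmid, rest = line.strip().split("\t")
    # Exactly two commas are expected, as in the sample data.
    address, district, postcode = [part.strip() for part in rest.split(",")]
    parts = address.split(" ")
    street_number = parts[0]             # first part
    street_type = parts[-1]              # last part
    street_name = " ".join(parts[1:-1])  # everything else
    # Assumption: postcode looks like "<province> <postal code>".
    province, postal_code = postcode.split(" ", 1)
    return [farmid.strip(), street_number, street_name, street_type,
            district, province, postal_code]

sample = "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
fields = split_record(sample)
output_line = "\t".join(fields) + "\n"
```

A line with more or fewer commas, or no tab, raises ValueError at the unpacking step, which is probably what you want while you are still finding out how regular the data really is.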
Or use the csv module to do the actual writing. It will handle escaping
anything that needs escaping, newlines, tabs, etc.
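To see what the escaping buys you, here is a small sketch (Python 3 assumed) with a hypothetical row whose street name contains a tab. A plain "\t".join would silently produce an extra column; csv.writer quotes the field instead, and csv.reader gets the original fields back.

```python
import csv
import io

# Hypothetical row: the third field contains the delimiter itself.
row = ["1", "1067", "Niagara\tStone", "Rd"]

# Naive joining turns four fields into five when split again.
naive_fields = "\t".join(row).split("\t")

buf = io.StringIO()  # stand-in for the real output file
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
written = buf.getvalue()

# Reading the written text back recovers the original four fields.
recovered = next(csv.reader(io.StringIO(written), delimiter="\t"))
```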
--
Steven