Separate Address number and name
Anders Wegge Keller
wegge at wegge.dk
Tue Jan 21 20:04:49 EST 2014
Shane Konings <shane.konings at gmail.com> writes:
...
> The following is a sample of the data. There are hundreds of lines
> that need to have an automated process of splitting the strings into
> headings to be imported into excel with these headings
> ID Address StreetNum StreetName SufType Dir City Province PostalCode
>
>
> 1 1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0
> 2 4260 Mountainview Rd, Lincoln, ON L0R 1B2
> 3 25 Hunter Rd, Grimsby, E, ON L3M 4A3
> 4 1091 Hutchinson Rd, Haldimand, ON N0A 1K0
> 5 5172 Green Lane Rd, Lincoln, ON L0R 1B3
> 6 500 Glenridge Ave, East, St. Catharines, ON L2S 3A1
> 7 471 Foss Rd, Pelham, ON L0S 1C0
> 8 758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 9 3836 Main St, North, Lincoln, ON L0R 1S0
> 10 1025 York Rd, W, Niagara-On-The-Lake, ON L0S 1P0
The input doesn't look consistent to me. Is Dir supposed to be an
optional value? If that is the only optional, it can be worked
around. But if the missing direction (I'm guessing) is due to
malformed input data, you have a hell of a job in front of you.
What do you want to do with incomplete or malformed data? Try to
parse it as a "best effort", or simply spew out an error message for
an operator to look at?
In the latter case, I suggest a stepwise approach:
* Split input by ',' -> res0 (strip the whitespace around each part)
* Split the first result by ' ' -> res
    -> Id = res[0]
    -> Address = res[1:]
    -> StreetNum = res[1]
    -> StreetName = res[2:-1]
    -> SufType = res[-1]
* Check if res0[1] looks like a cardinal direction
    If so, Dir = res0[1]
    Otherwise, croak or use the default direction. Insert an element in
    the list, so the remainder is shifted to match the following steps.
* City = res0[2]
* Split res0[3] by ' ' -> respp
    -> Province = respp[0]
    -> Postcode = respp[1:]
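
A minimal Python sketch of the steps above. The names parse_line and
DIRECTIONS, the set of accepted direction spellings, and the empty-string
default direction are illustrative assumptions, not something given in
the original data description.

```python
# Accepted spellings of a direction (an assumption; extend as needed).
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def parse_line(line, default_dir=""):
    """Split one input line into the requested headings.

    Raises ValueError when the line does not fit the expected layout,
    so an operator can look at it.
    """
    # Split by ',' and strip the blanks left around each part.
    res0 = [part.strip() for part in line.split(",")]

    # First part: ID, street number, street name words, suffix type.
    res = res0[0].split()
    if len(res) < 4:
        raise ValueError("cannot split address part: %r" % line)
    record = {
        "ID": res[0],
        "Address": " ".join(res[1:]),
        "StreetNum": res[1],
        "StreetName": " ".join(res[2:-1]),
        "SufType": res[-1],
    }

    # Optional direction: if it is missing, insert the default so the
    # remaining parts line up with the indexes used below.
    if len(res0) < 2 or res0[1].upper() not in DIRECTIONS:
        res0.insert(1, default_dir)
    if len(res0) != 4:
        raise ValueError("unexpected number of fields: %r" % line)
    record["Dir"] = res0[1]
    record["City"] = res0[2]

    # Last part: province followed by the postal code.
    respp = res0[3].split()
    if len(respp) < 2:
        raise ValueError("cannot split province/postal code: %r" % line)
    record["Province"] = respp[0]
    record["PostalCode"] = " ".join(respp[1:])
    return record
```

With the sample data, line 2 (no direction) gets the default, while
line 3, where the direction follows the city, raises ValueError instead
of being silently misparsed.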
And put in some basic sanity checks on the resulting values before
committing them as a parsed result. Provinces and postal codes should
be easy enough to validate against a fixed list.
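
A possible sketch of such a check in Python. The function name
check_values is hypothetical. The province set holds the standard
two-letter Canadian abbreviations. For postal codes, a real fixed list
is not available here, so a pattern that only checks the "A1A 1A1"
shape stands in for it (an assumption; it does not prove the code
exists).

```python
import re

# Two-letter abbreviations of the Canadian provinces and territories.
PROVINCES = {
    "AB", "BC", "MB", "NB", "NL", "NS", "NT",
    "NU", "ON", "PE", "QC", "SK", "YT",
}

# Shape check only: letter, digit, letter, space, digit, letter, digit.
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d")


def check_values(province, postal_code):
    """Return a list of problems found; an empty list means OK."""
    problems = []
    if province not in PROVINCES:
        problems.append("unknown province: %r" % province)
    if not POSTAL_CODE.fullmatch(postal_code):
        problems.append("malformed postal code: %r" % postal_code)
    return problems
```

Returning a list of problems rather than raising on the first one lets
an operator see everything that is wrong with a line in one go.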
--
/Wegge
Looking for redundant peering of dk.*,linux.debian.*