newbie question: parsing street name from address

John Machin sjmachin at lexicon.net
Thu Jun 21 18:03:39 EDT 2007


On Jun 22, 4:43 am, Eric <ven... at gmail.com> wrote:
> On Jun 21, 9:47 am, cjl <cjl... at gmail.com> wrote:
>
>
>
> > P:
>
> > I am working on a project that requires geocoding, and have written a
> > very simple geocoder that uses the Google service.
>
> > I would like to be able to extract the name of the street from the
> > addresses in my data; however, they vary significantly. Here are some
> > examples:
>
> > 25 Main St
> > 2500 14th St
> > 12 Bennet Pkwy
> > Pearl St
> > Bennet Rd and Main st
> > 19th St
>
> > As you can see, sometimes I have the house number, and sometimes I do
> > not. Sometimes the street name is a number. Sometimes I simply have
> > the names of intersecting streets.
>
> > I would like to be able to parse the above into the following:
>
> > Main St
> > 14th St
> > Bennet Pkwy
> > Pearl St
> > Bennet Rd
> > Main St
> > 19th St
>
> > How might I approach this complex parsing problem?
>
> > -CJL
>
> You might be able to use consistencies in your data to make this
> simpler.  If the examples you have there are representative, it looks
> like what you should do is look for a word like 'St' or 'Rd' and then
> return that word and the previous word.

The OP's data already contains
    [corner|cnr [of]] Foo Rd and|& Bar St
and real world data will contain things like
    1234 John F Kennedy Memorial Drive
    456 Broadway
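
To make that concrete, here is a minimal sketch of the "find the
suffix word and return it with the previous word" heuristic. The
suffix list and the function name are my own assumptions for
illustration, not anything from the OP's code, and a real suffix
list would be far longer:

```python
# Hypothetical, deliberately tiny set of street-type suffixes (lower case).
SUFFIXES = {"st", "rd", "pkwy", "ave", "dr", "drive"}

def naive_street_names(address):
    """Return each 'previous word + suffix word' pair found in address."""
    words = address.split()
    found = []
    for i, word in enumerate(words):
        # A suffix needs a word in front of it, so position 0 is skipped.
        if i > 0 and word.lower().rstrip(".") in SUFFIXES:
            found.append(words[i - 1] + " " + word)
    return found

for addr in ("25 Main St", "2500 14th St", "Bennet Rd and Main st",
             "1234 John F Kennedy Memorial Drive", "456 Broadway"):
    print(repr(addr), "->", naive_street_names(addr))
```

It copes with the OP's samples, but it reduces the Kennedy example
to "Memorial Drive" and finds nothing at all in "456 Broadway".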

As Paul wrote, "Parsing street addresses is a very complex parsing
problem", even when you restrict yourself to one mostly-English-
speaking country. Software written under such restrictions rapidly
breaks down elsewhere (Rue de la Paix, Wilhelmstrasse, Avenida 9 de
Julio, etc.) and blows up altogether when street names aren't used in
postal addresses (e.g. Japan).



