Stuck on a three word street name regex

Paul Rubin no.email at nospam.invalid
Thu Jan 28 01:35:27 CET 2010


Brian D <briandenzer at gmail.com> writes:
> I've tackled this kind of problem before by looping through a patterns
> dictionary, but there must be a smarter approach.
>
> Two addresses. Note that the first has incorrectly transposed the
> direction and street name. ....

If you're really serious about it (e.g. you are the post office trying
to program automatic mail sorting machines) there is no simple regex
trick anything like what you want.  A lot of addresses will be
ambiguous.  You have to use all the info you have about your entire address
corpus (e.g. you need a complete street directory of the whole US) and
do a bunch of Bayesian inference.  As a very simple example, for an
address like "1000 RAMPART S ST" you'd use the zip code to identify the
address's geographic neighborhood, and then use your street directory to
find candidate correct addresses within that zip code.  The USPS does
an amazing job of delivering mail to completely mangled addresses
based on methods like that.
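
To make the directory-lookup step concrete, here is a minimal sketch in
Python.  It covers only the candidate search within a zip code, not the
Bayesian part.  The directory contents, the zip codes, and the names
STREET_DIRECTORY and candidate_streets are all made up for illustration;
a real system would load a full street directory and would also have to
rank candidates and handle misspellings.

```python
from itertools import permutations

# Hypothetical street directory: zip code -> canonical street names.
# Both the zip codes and the street lists are invented for this example.
STREET_DIRECTORY = {
    "00001": {"S RAMPART ST", "N RAMPART ST", "ST ANN ST"},
    "00002": {"CANAL ST"},
}


def candidate_streets(address, zip_code, directory=STREET_DIRECTORY):
    """Return (house number, matching streets) for an address.

    A street matches when it is listed under zip_code and consists of
    exactly the same words as the address's street part, in any order.
    """
    number, *words = address.upper().split()
    known = directory.get(zip_code, set())
    # Every reordering of the street words, e.g. "RAMPART S ST",
    # "S RAMPART ST", "ST S RAMPART", ...
    reorderings = {" ".join(p) for p in permutations(words)}
    return number, sorted(reorderings & known)


print(candidate_streets("1000 RAMPART S ST", "00001"))
```

With this toy directory the transposed "RAMPART S ST" resolves to the
single candidate "S RAMPART ST".  When several candidates come back, or
none, is where the rest of the corpus and the inference have to take
over.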


