[Tutor] (de)serialization questions

Albert-Jan Roskam fomcl at yahoo.com
Sun Oct 3 21:57:12 CEST 2010

Hi Lee, Alan and Steven,

Thank you very much for your replies!

First, Lee:
>> That does not seem like it will work. What happens when
>> 2 addresses have the same zip code?

--> Sorry I didn't answer that before. When the zipcode is known, that's not a 
problem. The data typist simply has to enter the zip code and the street number 
and voilà, the street name and city name appear. A big time saver. When the 
zipcode is UNknown, I do indeed need street name, apt number, and city to get 
the right zip code. Without the street number, I might end up with a list of zip 
codes. But having no street number would automatically invalidate the given 
address. We couldn't possibly mail a letter without having the apt. number!

I just ordered a book on sqlite this morning.
It indeed seems like the way to go, also in the wider context of the program. 
It makes much more sense to maintain one database table instead of 3 csv files 
for the three data typists' output.

Alan: I forwarded your book to my office address. I'll print and read it!
Btw, your private website is nice too. Nice pictures! Do you recognize where 
this was taken: http://yfrog.com/n0scotland046j ? You're lucky to live in a 
beautiful place like Scotland.


All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have the 
Romans ever done for us?

From: Lee Harr <missive at hotmail.com>
To: tutor at python.org
Sent: Sat, October 2, 2010 12:56:21 AM
Subject: Re: [Tutor] (de)serialization questions

>>> I have data about zip codes, street and city names (and perhaps later also 
>>> street numbers). I made a dictionary of the form {zipcode: (street, city)}
>> One dictionary with all of the data?
>> That does not seem like it will work. What happens when
>> 2 addresses have the same zip code?

You did not answer this question.

Did you think about it?

> Maybe my main question is as follows: what permanent object is most suitable 
> to store a large amount of entries (maybe too many to fit into the computer's
> memory), which can be looked up very fast.

One thing about Python is that you don't normally need to
think about how your objects are stored (memory management).

It's an advantage in the normal case -- you just use the most
convenient object, and if it's fast enough and small enough
you're good to go.

Of course, that means that if it is not fast enough, or not
small enough, then you've got a bit more work to do.

> Eventually, I want to create two objects:
> 1-one to look up street name and city using zip code

So... you want to have a function like:

def addresses_by_zip(zipcode):
    '''returns list of all addresses in the given zipcode'''

> 2-one to look up zip code using street name, apartment number and city

and another one like:

def zip_by_address(street_name, apt, city):
    '''returns the zipcode for the given street name, apartment, and city'''

To me, it sounds like a job for a database (at least behind the scenes),
but you could try just creating a custom Python object that holds
these things:

class Address(object):
    street_number = '345'
    street_name = 'Main St'
    apt = 'B'
    city = 'Springfield'
    zipcode = '99999'

Then create another object that holds a collection of these addresses
and has methods addresses_by_zip(self, zipcode) and
zip_by_address(self, street_number, street_name, apt, city)
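A minimal sketch of what such a collection object might look like, assuming
everything fits in memory. The names AddressRecord and AddressBook are made up
for illustration, and the sample addresses are dummy data. Each lookup
direction gets its own dictionary, so neither one needs a linear search:

```python
from collections import defaultdict, namedtuple

# One immutable record per address; the fields mirror the Address class above.
AddressRecord = namedtuple(
    "AddressRecord", "street_number street_name apt city zipcode")


class AddressBook(object):
    """Hypothetical in-memory collection with one dict per lookup direction."""

    def __init__(self):
        self._by_zip = defaultdict(list)  # zipcode -> [AddressRecord, ...]
        self._by_address = {}  # (number, street, apt, city) -> zipcode

    def add(self, record):
        """Store the record under both lookup keys."""
        self._by_zip[record.zipcode].append(record)
        key = (record.street_number, record.street_name,
               record.apt, record.city)
        self._by_address[key] = record.zipcode

    def addresses_by_zip(self, zipcode):
        """Return a list of all addresses in the given zipcode."""
        return list(self._by_zip.get(zipcode, []))

    def zip_by_address(self, street_number, street_name, apt, city):
        """Return the zipcode for the given address, or None if unknown."""
        return self._by_address.get((street_number, street_name, apt, city))


# Dummy data, just to show the two lookups.
book = AddressBook()
book.add(AddressRecord("345", "Main St", "B", "Springfield", "99999"))
book.add(AddressRecord("347", "Main St", "", "Springfield", "99999"))
```

Because addresses_by_zip returns a list, two addresses sharing one zip code
are no longer a problem.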

> I stored object1 in a marshalled dictionary. Its length is about 450.000 (I live 
> in Holland, not THAT many streets). Look-ups are incredibly fast (it has to be,
> because it's part of an autocompletion feature of a data entry program). I
> haven't got the street number data needed for object2 yet, but it's going to be 
> much larger. Many streets have different zip codes for odd or even numbers, or
> the zip codes are divided into street number ranges (for long streets).

Remember that you don't want to try to optimize too soon.

Build a simple working system and see what happens. If it
is too slow or takes up too much memory, fix it.

> You suggest to simply use a file. I like simple solutions, but doesn't that, 
> by definition, require a slow, linear search?

You could create an index, but then any database will already have
an indexing function built in.
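
For what it's worth, here is a rough sketch of the database route using the
sqlite3 module from the standard library. The table name, column names, index
names and sample rows are all assumptions for illustration, not your real
data. The two CREATE INDEX statements are what spare you the linear search:

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file name instead
# to keep the data on disk between runs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        street_number TEXT, street_name TEXT, apt TEXT,
        city TEXT, zipcode TEXT
    )
""")
# One index per lookup direction.
conn.execute("CREATE INDEX idx_zip ON address (zipcode)")
conn.execute(
    "CREATE INDEX idx_addr ON address "
    "(street_name, street_number, apt, city)")

# Dummy rows, purely for illustration.
rows = [
    ("345", "Main St", "B", "Springfield", "99999"),
    ("347", "Main St", "", "Springfield", "99999"),
    ("12", "Elm St", "", "Shelbyville", "88888"),
]
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def addresses_by_zip(zipcode):
    '''returns list of all addresses in the given zipcode'''
    cur = conn.execute(
        "SELECT street_number, street_name, apt, city FROM address "
        "WHERE zipcode = ? ORDER BY street_name, street_number",
        (zipcode,))
    return cur.fetchall()


def zip_by_address(street_number, street_name, apt, city):
    '''returns the zipcode for the given address, or None if not found'''
    cur = conn.execute(
        "SELECT zipcode FROM address WHERE street_name = ? "
        "AND street_number = ? AND apt = ? AND city = ?",
        (street_name, street_number, apt, city))
    row = cur.fetchone()
    return row[0] if row else None
```

The ? placeholders let sqlite3 handle the quoting of whatever the data
typists enter.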

I'm not saying that rolling your own custom database is a bad idea,
but if you are trying to get some work done (and not just playing around
and learning Python) then it's probably better to use something that is
already proven to work.

If you have some code you are trying out, but are not sure you
are going the right way, post it and let people take a look at it.
