[Tutor] (de)serialization questions

Lee Harr missive at hotmail.com
Sat Oct 2 00:56:21 CEST 2010


>>> I have data about zip codes, street and city names (and perhaps later also of
>>> street numbers). I made a dictionary of the form {zipcode: (street, city)}
>>
>> One dictionary with all of the data?
>>
>> That does not seem like it will work. What happens when
>> 2 addresses have the same zip code?

You did not answer this question.

Did you think about it?
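
To make the problem concrete, here is a small sketch. The zip codes and
street names are made up for illustration. It shows a plain
{zipcode: (street, city)} dictionary silently keeping only the last
address for a zip code, and one possible fix: map each zip code to a list.

```python
from collections import defaultdict

# Made-up sample data: two streets share the zip code '1234AB'.
records = [
    ('1234AB', 'Main St', 'Springfield'),
    ('1234AB', 'Oak Ave', 'Springfield'),
    ('5678CD', 'Elm St', 'Shelbyville'),
]

# {zipcode: (street, city)} -- a later entry overwrites an earlier one.
single = {}
for zipcode, street, city in records:
    single[zipcode] = (street, city)

# {zipcode: [(street, city), ...]} -- every entry is kept.
multi = defaultdict(list)
for zipcode, street, city in records:
    multi[zipcode].append((street, city))

print(single['1234AB'])  # only the last street stored for this zip code
print(multi['1234AB'])   # both streets stored for this zip code
```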


> Maybe my main question is as follows: what permanent object is most suitable to
> store a large amount of entries (maybe too many to fit into the computer's
> memory), which can be looked up very fast.

One thing about Python is that you don't normally need to
think about how your objects are stored (memory management).

It's an advantage in the normal case -- you just use the most
convenient object, and if it's fast enough and small enough
you're good to go.

Of course, that means that if it is not fast enough, or not
small enough, then you've got a bit more work to do.


> Eventually, I want to create two objects:
> 1-one to look up street name and city using zip code

So... you want to have a function like:

def addresses_by_zip(zipcode):
    '''returns list of all addresses in the given zipcode'''
    ...


> 2-one to look up zip code using street name, apartment number and city

and another one like:

def zip_by_address(street_name, apt, city):
    '''returns the zipcode for the given street name, apartment, and city'''
    ...


To me, it sounds like a job for a database (at least behind the scenes),
but you could try just creating a custom Python object that holds
these things:

class Address(object):
    street_number = '345'
    street_name = 'Main St'
    apt = 'B'
    city = 'Springfield'
    zipcode = '99999'

Then create another object that holds a collection of these addresses
and has methods addresses_by_zip(self, zipcode) and
zip_by_address(self, street_number, street_name, apt, city)
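
A minimal sketch of what that could look like, assuming everything fits
in memory. The name AddressBook, the add() method and the two internal
dictionaries are only illustrative choices, not the one right way to do
it. Address here takes its values in __init__ so that each instance can
hold different data.

```python
from collections import defaultdict


class Address:
    def __init__(self, street_number, street_name, apt, city, zipcode):
        self.street_number = street_number
        self.street_name = street_name
        self.apt = apt
        self.city = city
        self.zipcode = zipcode


class AddressBook:
    '''Hypothetical in-memory collection of Address objects.'''

    def __init__(self):
        # zipcode -> list of Address objects
        self._by_zip = defaultdict(list)
        # (street_number, street_name, apt, city) -> zipcode
        self._by_address = {}

    def add(self, address):
        self._by_zip[address.zipcode].append(address)
        key = (address.street_number, address.street_name,
               address.apt, address.city)
        self._by_address[key] = address.zipcode

    def addresses_by_zip(self, zipcode):
        '''returns list of all addresses in the given zipcode'''
        return list(self._by_zip.get(zipcode, []))

    def zip_by_address(self, street_number, street_name, apt, city):
        '''returns the zipcode for the given address, or None if unknown'''
        return self._by_address.get((street_number, street_name, apt, city))


book = AddressBook()
book.add(Address('345', 'Main St', 'B', 'Springfield', '99999'))
book.add(Address('347', 'Main St', '', 'Springfield', '99999'))

print(len(book.addresses_by_zip('99999')))
print(book.zip_by_address('345', 'Main St', 'B', 'Springfield'))
```

Both look-ups are plain dictionary accesses, so they should stay fast as
the collection grows, at the cost of keeping two indexes in memory.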


> I stored object1 in a marshalled dictionary. Its length is about 450.000 (I live
> in Holland, not THAT many streets). Look-ups are incredibly fast (they have to
> be, because it's part of an autocompletion feature of a data entry program). I
> haven't got the street number data needed for object2 yet, but it's going to be
> much larger. Many streets have different zip codes for odd or even numbers, or
> the zip codes are divided into street number ranges (for long streets).

Remember that you don't want to try to optimize too soon.

Build a simple working system and see what happens. If it
is too slow or takes up too much memory, fix it.


> You suggest to simply use a file. I like simple solutions, but doesn't that, by
> definition, require a slow, linear search?

You could create an index, but then any database will already have
an indexing function built in.

I'm not saying that rolling your own custom database is a bad idea,
but if you are trying to get some work done (and not just playing around
and learning Python) then it's probably better to use something that is
already proven to work.
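
As one example of "something already proven", here is a rough sketch
using the sqlite3 module from the standard library. The table name,
column names and index names are my own assumptions, and the sample rows
are made up. It uses an in-memory database so it can be run as-is; pass
a filename to sqlite3.connect() instead to keep the data on disk.

```python
import sqlite3

# ':memory:' keeps this example self-contained; use a filename for real data.
conn = sqlite3.connect(':memory:')

conn.execute('''
    CREATE TABLE address (
        street_number TEXT,
        street_name   TEXT,
        apt           TEXT,
        city          TEXT,
        zipcode       TEXT
    )
''')

# Indexes let the database avoid a linear scan for both kinds of look-up.
conn.execute('CREATE INDEX idx_zip ON address (zipcode)')
conn.execute('CREATE INDEX idx_addr '
             'ON address (street_name, street_number, apt, city)')

# Made-up sample rows.
rows = [
    ('345', 'Main St', 'B', 'Springfield', '99999'),
    ('347', 'Main St', '', 'Springfield', '99999'),
    ('12', 'Elm St', '', 'Shelbyville', '88888'),
]
conn.executemany('INSERT INTO address VALUES (?, ?, ?, ?, ?)', rows)
conn.commit()


def addresses_by_zip(zipcode):
    '''returns list of all addresses in the given zipcode'''
    cur = conn.execute(
        'SELECT street_number, street_name, apt, city FROM address '
        'WHERE zipcode = ? ORDER BY street_name, street_number',
        (zipcode,))
    return cur.fetchall()


def zip_by_address(street_number, street_name, apt, city):
    '''returns the zipcode for the given address, or None if not found'''
    cur = conn.execute(
        'SELECT zipcode FROM address WHERE street_number = ? '
        'AND street_name = ? AND apt = ? AND city = ?',
        (street_number, street_name, apt, city))
    row = cur.fetchone()
    return row[0] if row else None


print(addresses_by_zip('99999'))
print(zip_by_address('12', 'Elm St', '', 'Shelbyville'))
```

The data lives in the database rather than in a Python dictionary, so
the whole set does not need to fit in memory at once.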


If you have some code you are trying out, but are not sure you
are going the right way, post it and let people take a look at it.


