list vs. dict

John Machin sjmachin at
Wed Feb 27 19:11:10 EST 2002

Beda Kosata <kosatab at> wrote in message news:<3C7CF0EE.3010009 at>...
> Hi,
> for a piece of code I need to store relatively large amount of records 
> (100-1000).

Sorry, but 1000 records is a relatively tiny amount.

> Each record will contain few (10-20) pieces of information 
> (mostly strings or integers).
> For convenience it would be better for me to make this records as dicts, 
> so I don't have to remember what data has what position in the list. 

There are *two* levels to think about: how you store the fields in
each record, and how you store the collection(s) of records.

Fields in each record: for *convenience*, contemplate using a class,
not a dict.

>>> class Person(object):
...    __slots__ = ("name", "salary", "title", "empno")
>>> p1 = Person() # new record
>>> p1.name = "Attila the Hun"
>>> p1.salary = 1000000
>>> p1.title = "CEO"
>>> p1.empno = 666
>>> p2 = Person()
>>> p2.name = "Marmaduke Murgatroyd"
>>> p2.salary = 20000
>>> p2.tittle = "Clerk" # __slots__ gives you typo-checking
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'Person' object has no attribute 'tittle'
>>> p2.title = "Clerk"
>>> p2.empno = 1
>>> for x in (p1, p2):
...   print x.name, x.salary, x.title
Attila the Hun 1000000 CEO
Marmaduke Murgatroyd 20000 Clerk
>>> print ["false", "true"][p1.salary < p2.salary]
false

Now, how you store your collection(s) of records depends on many
things, but speed is *not* one of those things when you have only 1000
records.
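
If you would rather measure than take that on trust, here is a rough
timing sketch. It is written for current Python 3, not the 2.2-era syntax
used elsewhere in this post, and everything in it is made up for
illustration: the generated records and the helper names scan_list and
lookup_dict. Timings vary by machine and interpreter, so no figures are
quoted.

```python
import timeit

# Assumption: 1000 made-up records, each a small dict with a unique empno.
records = [{"empno": n, "name": "employee %d" % n} for n in range(1000)]
by_empno = dict((r["empno"], r) for r in records)


def scan_list(empno):
    """Linear search through the list of records."""
    for r in records:
        if r["empno"] == empno:
            return r
    return None


def lookup_dict(empno):
    """Direct lookup by key; returns None when the key is absent."""
    return by_empno.get(empno)


# Worst case for the list scan: the record sits at the very end.
list_secs = timeit.timeit(lambda: scan_list(999), number=1000)
dict_secs = timeit.timeit(lambda: lookup_dict(999), number=1000)
print("list scan:   %.6f s per 1000 lookups" % list_secs)
print("dict lookup: %.6f s per 1000 lookups" % dict_secs)
```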

If the record has a unique key, such as an employee number, and you
need to access records by employee number, then you can set up a dict
with employee number as the key and the class instance as the value:
employee_dict[p1.empno] = p1
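
A minimal sketch of keyed access, written for current Python 3 and not
the 2.2-era syntax of the session above. The Person class mirrors the one
shown earlier. The __init__ method and the helper find_employee are
additions for brevity, not part of the original example.

```python
class Person(object):
    __slots__ = ("name", "salary", "title", "empno")

    # Assumption: a constructor added here only to keep the sketch short.
    def __init__(self, name, salary, title, empno):
        self.name = name
        self.salary = salary
        self.title = title
        self.empno = empno


employee_dict = {}
for p in (Person("Attila the Hun", 1000000, "CEO", 666),
          Person("Marmaduke Murgatroyd", 20000, "Clerk", 1)):
    # A dict silently overwrites on a repeated key, so check first.
    if p.empno in employee_dict:
        raise ValueError("duplicate empno: %r" % (p.empno,))
    employee_dict[p.empno] = p


def find_employee(empno):
    """Hypothetical helper: return the matching Person, or None."""
    return employee_dict.get(empno)


print(find_employee(666).name)   # Attila the Hun
print(find_employee(42))         # None
```

Using .get() returns None for an unknown employee number; plain
employee_dict[empno] would raise KeyError instead, which may suit you
better if a missing record is a genuine error.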

If you need merely to access the records sequentially, you can use a
list.

employee_list = []
employee_dict = {}
for buffer in file(""):
   fld = buffer.rstrip().split("~") # assuming data separated by "~"
   p = Person()
   p.name = fld[0]
   p.salary = int(fld[1])
   p.title = fld[2]
   p.empno = int(fld[3])
   # The above is where you do have to "remember" the
   # correspondence between positions and names
   # but this could be automated using the __slots__ and
   # a parallel sequence of functions to apply:
   # conv_funcs = (None, int, None, int)
   employee_list.append(p)
   employee_dict[p.empno] = p
# note -- above needs much error checking and exception handling
# to make it robust -- e.g. salary or empno not an int, 
# empno not unique, too few/many fields in input file, ...
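
The automation hinted at in the comments could look something like the
sketch below, which also adds a few of the checks mentioned. It is
written for current Python 3, and the function names parse_record and
load_records, the sample lines, and the choice to raise ValueError are
all assumptions for illustration. It reads from any iterable of lines, so
no file name is assumed.

```python
class Person(object):
    __slots__ = ("name", "salary", "title", "empno")


# One converter per slot, in the same order; None means "keep the string".
conv_funcs = (None, int, None, int)


def parse_record(line, sep="~"):
    """Build a Person from one separator-delimited line of text."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != len(Person.__slots__):
        raise ValueError("expected %d fields, got %d: %r"
                         % (len(Person.__slots__), len(fields), line))
    p = Person()
    for attr, conv, raw in zip(Person.__slots__, conv_funcs, fields):
        # int() raises ValueError for a non-numeric salary or empno.
        setattr(p, attr, raw if conv is None else conv(raw))
    return p


def load_records(lines):
    """Return (list, dict keyed by empno) built from an iterable of lines."""
    employee_list = []
    employee_dict = {}
    for line in lines:
        p = parse_record(line)
        if p.empno in employee_dict:
            raise ValueError("duplicate empno: %d" % p.empno)
        employee_list.append(p)
        employee_dict[p.empno] = p
    return employee_list, employee_dict


# Made-up sample data standing in for the input file.
sample = ["Attila the Hun~1000000~CEO~666\n",
          "Marmaduke Murgatroyd~20000~Clerk~1\n"]
employee_list, employee_dict = load_records(sample)
print(employee_dict[666].name, employee_dict[666].salary)
```

Keeping conv_funcs parallel to __slots__ means the position-to-name
correspondence lives in one place, so adding a field is a two-line
change.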

> However the most important for me is speed

Speed of what? Speed of running? Speed of implementing a robust
functional application? Speed of maintenance when bugs surface or
requirements change?

Hope this helps,
