[Tutor] Careful Dictionary Building
Michael Langford
mlangford.cs03 at gtalumni.org
Fri Dec 28 18:49:00 CET 2007
This functionality already exists in the ever-so-useful defaultdict class.
You pass a factory callable (such as list) to the defaultdict constructor,
and whenever a missing key is looked up, it calls that factory to create the new value:
from collections import defaultdict

mydict = defaultdict(list)
for record in mylist:
    mydict[ record[0] ].append( record )
defaultdict has usually been fast enough for the datasets I've used it with.
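
For comparison, here is a minimal, self-contained sketch of the same grouping, written both with defaultdict and with a plain dict that tests membership directly on the dict rather than on a key list. The function names and the sample data are made up for illustration; they are not from the original post.

```python
from collections import defaultdict


def group_by_first_field(records):
    """Group records (lists) by their first element using defaultdict."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for record in records:
        grouped[record[0]].append(record)
    return grouped


def group_with_plain_dict(records):
    """Same grouping with a plain dict and a direct membership test."""
    grouped = {}
    for record in records:
        key = record[0]
        if key in grouped:  # hash lookup on the dict itself, no key list built
            grouped[key].append(record)
        else:
            grouped[key] = [record]
    return grouped


# Hypothetical sample data: two records share line number 1.
sample = [[1, "alpha"], [2, "beta"], [1, "gamma"]]
by_line = group_by_first_field(sample)
```

One thing to keep in mind: indexing a defaultdict with a missing key inserts that key, so use "key in by_line" or by_line.get(key) when you only want to check for a key.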
--Michael
On 12/28/07, doug shawhan <doug.shawhan at gmail.com> wrote:
>
> *sigh* Ignore, folks. I had forgotten about .has_key().
>
>
>
> On Dec 28, 2007 11:22 AM, doug shawhan <doug.shawhan at gmail.com> wrote:
>
> > I'm building a dictionary from a list with ~ 1M records.
> >
> > Each record in the list is itself a list.
> > Each record in the list has a line number, (index 0) which I wish to use
> > as a dictionary key.
> >
> > The problem: It is possible for two different records in the list to
> > share this line number. If they do, I want to append the record to the value
> > in the dictionary.
> >
> > The obvious (lazy) method of searching for doubled lines requires
> > building and parsing a key list for every record. There must be a better
> > way!
> >
> > dict = {}
> > for record in list:
> >     if record[0] in dict.keys():
> >         dict[ record[0] ].append( record )
> >     else:
> >         dict[ record[0] ] = [record]
> >
> > Once you get ~ 80,000 records it starts slowing down pretty badly (I
> > would too ...).
> >
> > Here's hoping there is a really fast, pythonic way of doing this!
> >
>
>
--
Michael Langford
Phone: 404-386-0495
Consulting: http://www.RowdyLabs.com