My Big Dict.

Wed Jul 2 05:54:57 EDT 2003

> "Christophe Delord" <christophe.delord at free.fr> wrote in message
> news:20030702073735.40293ba2.christophe.delord at free.fr...
> > Hello,
> >
> > On Wed, 2 Jul 2003 00:13:26 -0400, Xavier wrote:
> >
> > > Greetings,
> > >
> > > (do excuse the possibly comical subject text)
> > >
> > > I need advice on how I can convert a text db into a dict.  Here is an
> > > example of what I need done.
> > >
> > > some example data lines in the text db goes as follows:
> > >
> > > CODE1!DATA1 DATA2, DATA3
> > > CODE2!DATA1, DATA2 DATA3
> > >
> > > As you can see, the lines are dynamic and the data are not alike, they
> > > change in permission values (but that's obvious in any similar
> > > situation)
> > >
> > > Any idea on how I can convert 20,000+ lines of the above into the
> > > following protocol for use in my code?:
> > >
> > > TXTDB = {'CODE1': 'DATA1 DATA2, DATA3', 'CODE2': 'DATA1, DATA2 DATA3'}
> > >
> >
> > If your data is in a string you can use a regular expression to parse
> > each line, then the findall method returns a list of tuples containing
> > the key and the value of each item. Finally the dict class can turn this
> > list into a dict. For example:
>
> and you can kill a fly with a sledgehammer.  why not
>
> f = open('somefile.txt')
> d = {}
> l = f.readlines()
> for i in l:
>     a,b = i.split('!')
>     d[a] = b.strip()
>
> or am i missing something obvious? (b/t/w the above parsed 20000+ lines on a
> celeron 500 in less than a second.)

Your code looks good Christophe.  Just two little things to be aware of:
1) if you use split like this, then each line must contain one and only one
'!', which means (in particular) that empty lines will bomb, and also data
must not contain any '!', or else you'll get an exception such as
"ValueError: unpack list of wrong size".   If your data may contain '!',
then consider slicing up each line in a different way, for example by
splitting on the first '!' only.
2) if your file is really huge, then you may want to fill up your dictionary
as you're reading the file, instead of reading everything into a list and
then building your dictionary (which keeps both the list of lines and the
dictionary in memory at the same time).
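
Both points could be handled roughly as in the sketch below. This is only an illustration in present-day Python, not the code from the thread: the name load_txtdb is made up, and silently skipping blank or malformed lines is an assumption (you may prefer to raise an error instead).

```python
def load_txtdb(lines):
    """Build a dict from an iterable of 'CODE!DATA' lines (hypothetical helper)."""
    db = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line: skip it instead of failing on unpacking
        # partition() cuts at the first '!' only, so the data part may
        # itself contain '!' characters.
        code, sep, data = line.partition('!')
        if not sep:
            continue  # no '!' at all: skipped here (assumption)
        db[code] = data.strip()
    return db


# Small in-memory example; the second data value contains an extra '!'.
sample = [
    "CODE1!DATA1 DATA2, DATA3\n",
    "\n",
    "CODE2!DATA1, DATA2! DATA3\n",
]
TXTDB = load_txtdb(sample)
```

With a real file, passing the open file object itself (for example load_txtdb(f) inside "with open('somefile.txt') as f:") reads one line at a time, so the whole file is never held in a list.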

But apart from these details, I agree with Christophe that this is the way
to go.

Aurélien