My Big Dict.
Christophe Delord
christophe.delord at free.fr
Wed Jul 2 01:37:35 EDT 2003
Hello,
On Wed, 2 Jul 2003 00:13:26 -0400, Xavier wrote:
> Greetings,
>
> (do excuse the possibly comical subject text)
>
> I need advice on how I can convert a text db into a dict. Here is an
> example of what I need done.
>
> some example data lines in the text db goes as follows:
>
> CODE1!DATA1 DATA2, DATA3
> CODE2!DATA1, DATA2 DATA3
>
> As you can see, the lines are dynamic and the data are not alike, they
> change in permission values (but that's obvious in any similar
> situation)
>
> Any idea on how I can convert 20,000+ lines of the above into the
> following protocol for use in my code?:
>
> TXTDB = {'CODE1': 'DATA1 DATA2, DATA3', 'CODE2': 'DATA1, DATA2 DATA3'}
>
> I was thinking of using AWK or something to the similar liking but I
> just wanted to check up with the list for any faster/sufficient hacks
> in python to do such a task.
If your data is in a string, you can use a regular expression to parse
each line: the findall method returns a list of (key, value) tuples,
one per item, and the dict class can then turn that list into a dict.
For example:
import re

data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
bigdict = dict(data_re.findall(data))
On my computer the second line takes between 7 and 8 seconds to parse
100000 lines.
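With the two example lines from your message, this gives exactly the
dict you asked for:

```python
import re

# The two sample lines from the question, joined into one string
data = "CODE1!DATA1 DATA2, DATA3\nCODE2!DATA1, DATA2 DATA3\n"

# ^(\w+)!  captures the code at the start of each line,
# (.*)     captures the rest of the line (it stops at the newline)
data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
TXTDB = dict(data_re.findall(data))
# TXTDB == {'CODE1': 'DATA1 DATA2, DATA3', 'CODE2': 'DATA1, DATA2 DATA3'}
```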
Try this:
------------------------------
import re
import time
N = 100000
print "Initialisation..."
data = "".join(["CODE%d!DATA%d_1, DATA%d_2, DATA%d_3\n"%(i,i,i,i) for i in range(N)])
data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
print "Parsing..."
start = time.time()
bigdict = dict(data_re.findall(data))
stop = time.time()
print "%s items parsed in %s seconds"%(len(bigdict), stop-start)
------------------------------
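By the way, if the key is always everything before the first '!', a
plain split does the same job without a regular expression. A minimal
sketch (the parse_db name is just for illustration, and I have not
benchmarked it against the regexp version):

```python
def parse_db(text):
    # Build the dict by splitting each line at the first '!' only,
    # so a '!' inside the data part is left alone.
    d = {}
    for line in text.splitlines():
        if "!" in line:
            code, value = line.split("!", 1)
            d[code] = value
    return d

TXTDB = parse_db("CODE1!DATA1 DATA2, DATA3\nCODE2!DATA1, DATA2 DATA3\n")
```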
>
> Thanks.
>
> -- Xavier.
>
> oderint dum mutuant
>
>
>
--
(o_ Christophe Delord __o
//\ http://christophe.delord.free.fr/ _`\<,_
V_/_ mailto:christophe.delord at free.fr (_)/ (_)