My Big Dict.
Christophe Delord
christophe.delord at free.fr
Wed Jul 2 01:37:35 EDT 2003
Hello,
On Wed, 2 Jul 2003 00:13:26 -0400, Xavier wrote:
> Greetings,
>
> (do excuse the possibly comical subject text)
>
> I need advice on how I can convert a text db into a dict. Here is an
> example of what I need done.
>
> some example data lines in the text db goes as follows:
>
> CODE1!DATA1 DATA2, DATA3
> CODE2!DATA1, DATA2 DATA3
>
> As you can see, the lines are dynamic and the data are not alike, they
> change in permission values (but that's obvious in any similar
> situation)
>
> Any idea on how I can convert 20,000+ lines of the above into the
> following protocol for use in my code?:
>
> TXTDB = {'CODE1': 'DATA1 DATA2, DATA3', 'CODE2': 'DATA1, DATA2 DATA3'}
>
> I was thinking of using AWK or something to the similar liking but I
> just wanted to check up with the list for any faster/sufficient hacks
> in python to do such a task.
If your data is in a string, you can use a regular expression to parse
each line: the findall method returns a list of (key, value) tuples,
one per item, and the dict class can then turn that list into a dict.
For example:
import re

data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
bigdict = dict(data_re.findall(data))
On my computer the second line takes between 7 and 8 seconds to parse
100000 lines.
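With the two example lines from your message, this gives exactly the
dict you asked for:

```python
import re

# The two sample lines from the question, joined into one string
data = "CODE1!DATA1 DATA2, DATA3\nCODE2!DATA1, DATA2 DATA3\n"

# ^(\w+)!  captures the code at the start of each line,
# (.*)     captures the rest of the line (it stops at the newline)
data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
TXTDB = dict(data_re.findall(data))
# TXTDB == {'CODE1': 'DATA1 DATA2, DATA3', 'CODE2': 'DATA1, DATA2 DATA3'}
```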
Try this:
------------------------------
import re
import time
N = 100000
print "Initialisation..."
data = "".join(["CODE%d!DATA%d_1, DATA%d_2, DATA%d_3\n"%(i,i,i,i) for i in range(N)])
data_re = re.compile(r"^(\w+)!(.*)", re.MULTILINE)
print "Parsing..."
start = time.time()
bigdict = dict(data_re.findall(data))
stop = time.time()
print "%s items parsed in %s seconds"%(len(bigdict), stop-start)
------------------------------
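By the way, if the key is always everything before the first '!', a
plain split does the same job without a regular expression. A minimal
sketch (the parse_db name is just for illustration, and I have not
benchmarked it against the regexp version):

```python
def parse_db(text):
    # Build the dict by splitting each line at the first '!' only,
    # so a '!' inside the data part is left alone.
    d = {}
    for line in text.splitlines():
        if "!" in line:
            code, value = line.split("!", 1)
            d[code] = value
    return d

TXTDB = parse_db("CODE1!DATA1 DATA2, DATA3\nCODE2!DATA1, DATA2 DATA3\n")
```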
>
> Thanks.
>
> -- Xavier.
>
> oderint dum mutuant
>
>
>
--
(o_ Christophe Delord __o
//\ http://christophe.delord.free.fr/ _`\<,_
V_/_ mailto:christophe.delord at free.fr (_)/ (_)