[Tutor] how best to store and process variable amounts of paired
data
Brian van den Broek
bvande at po-box.mcgill.ca
Thu Apr 22 13:34:25 EDT 2004
Hi all,
I'm starting a project to write a bunch of functions for parsing the
datafiles of a particular application I use. Thanks to help from the group
I now understand how to work with files :-) but I have a question about
efficient storage of the information I extract.
I want to extract from the files two types of lines that always come in
pairs where one is a unique numerical id and the other a title string. The
id numbers are unique, but not necessarily in numerical order. The
associated strings are not necessarily unique. The lines are separated by
variable amounts of data, but in each case, I am sure I know how to
extract only the information of interest. :-) After extraction I want to
do something with those pairs. My tasks will leave the pairs unchanged in
the extracted data I store, so an immutable type is fine. There can be
anywhere from a small number of pairs to tens of thousands, depending on
the particular datafile in question.
I can think of three main methods for storing and using the extracted data.
1) Iterate over the file and build a dictionary as I go, using the
identified numerical ids as the keys, and the title strings as the values.
Then work by iteration over the dictionary keys.
2) Iterate over the file and build two lists, one of the ids and one of the
title strings. Then:
a) make a dictionary from the lists and work with the dictionary, or
b) Just work from the lists themselves, iterating over the indices.
3) Parse the file, building tuples (id, string title) as I go and putting
them in a list. Then iterate over the list, and read each tuple value as
needed.
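To make the three options concrete, here is a minimal sketch. It assumes the pairs have already been pulled out of the datafile; the `extracted` list and all the names in it are made up for illustration and are not from the real application.

```python
# Hypothetical sample standing in for the (id, title) pairs pulled from a datafile.
extracted = [(1042, "Introduction"), (7, "Notes"), (365, "Notes")]

# Method 1: build a dictionary directly, mapping id -> title.
titles_by_id = {}
for id_number, title in extracted:
    titles_by_id[id_number] = title

# Method 2: build two parallel lists ...
ids = [id_number for id_number, _ in extracted]
titles = [title for _, title in extracted]
# 2a) ... then zip them into a dictionary,
titles_from_lists = dict(zip(ids, titles))
# 2b) ... or walk both lists by index.
pairs_by_index = [(ids[i], titles[i]) for i in range(len(ids))]

# Method 3: a list of (id, title) tuples, kept in file order.
pairs = list(extracted)

for id_number, title in pairs:
    print(id_number, title)
```

All three hold the same information. The dictionary allows lookup by id, while the list of tuples keeps the pairs in the order they appeared in the file.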
So, since there may be thousands of (id, title) pairs, I want to choose
the best method -- best here being defined as some compromise between high
speed and small memory footprint.
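One rough way to compare speed and memory would be something like the sketch below. The data is invented (20,000 made-up pairs), and the numbers it prints will vary by machine and Python version, so it is only a starting point for measuring, not a verdict.

```python
import sys
import timeit

# Hypothetical data: 20,000 made-up pairs, not taken from a real datafile.
pairs = [(n, "title %d" % n) for n in range(20000)]
titles_by_id = dict(pairs)

# sys.getsizeof counts only the object itself, not what it refers to,
# so the tuples are added in by hand. The ids and title strings are
# shared by both structures and are left out of both totals.
list_bytes = sys.getsizeof(pairs) + sum(sys.getsizeof(p) for p in pairs)
dict_bytes = sys.getsizeof(titles_by_id)

def walk_list():
    # Iterate over the list of tuples, unpacking each pair.
    for id_number, title in pairs:
        pass

def walk_dict():
    # Iterate over the dictionary keys, looking up each title.
    for id_number in titles_by_id:
        title = titles_by_id[id_number]

# Time 20 full passes over each structure.
list_seconds = timeit.timeit(walk_list, number=20)
dict_seconds = timeit.timeit(walk_dict, number=20)

print("list of tuples:", list_bytes, "bytes,", list_seconds, "s")
print("dictionary:    ", dict_bytes, "bytes,", dict_seconds, "s")
```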
Pointers as to which method here listed would be the way to go? Or some
other way I've overlooked? Or am I worrying about something that doesn't
really matter?
Perhaps my problem is, last name notwithstanding, I'm not Dutch:
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
;-)
Thanks and best to all,
Brian vdB