How to efficiently extract information from structured text file
Rhodri James
rhodri at wildebst.demon.co.uk
Tue Feb 16 19:29:44 EST 2010
On Tue, 16 Feb 2010 23:48:17 -0000, Imaginationworks <xiajunyi at gmail.com>
wrote:
> Hi,
>
> I am trying to read object information from a text file (approx.
> 30,000 lines) with the following format, each line corresponds to a
> line in the text file. Currently, the whole file was read into a
> string list using readlines(), then use for loop to search the "= {"
> and "};" to determine the Object, SubObject,and SubSubObject. My
> questions are
>
> 1) Is there any efficient method that I can search the whole string
> list to find the location of the tokens(such as '= {' or '};'
The usual idiom is to process a line at a time, which avoids the memory
overhead of reading the entire file in, creating the list, and so on.
Assuming your input file is laid out as neatly as you said, that's
straightforward to do:
    for line in myfile:
        if "= {" in line:
            start_a_new_object(line)
        elif "};" in line:
            end_current_object(line)
        else:
            add_stuff_to_current_object(line)
You probably want more robust tests than I used there, but that depends on
how well-defined your input file is. If it can be edited by hand, you'll
need to be more defensive!
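
As a rough illustration of "more robust tests", here is a minimal sketch that
classifies a line with regular expressions instead of a bare substring check.
The exact file layout isn't shown here, so this assumes opening lines look
like "Name = {" and closing lines contain only "};". The names OPEN_RE,
CLOSE_RE and classify are made up for the example.

```python
import re

# Assumed layout: "Name = {" opens an object, "};" alone on a line closes it.
OPEN_RE = re.compile(r"^\s*(?P<name>\S.*?)\s*=\s*\{\s*$")
CLOSE_RE = re.compile(r"^\s*\}\s*;\s*$")

def classify(line):
    """Return a (kind, payload) pair for one line of input.

    kind is "open", "close" or "data" (hypothetical labels).
    """
    match = OPEN_RE.match(line)
    if match:
        # Payload is the object name with surrounding whitespace removed.
        return ("open", match.group("name"))
    if CLOSE_RE.match(line):
        return ("close", None)
    # Anything else is treated as content of the current object.
    return ("data", line.strip())
```

Anchoring the patterns to the whole line means a stray "= {" inside a value
is not mistaken for the start of a new object, which the plain "in" test
would do.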
> 2) Is there any efficient ways to extract the object information you
> may suggest?
That depends on what you mean by "extract the object information". If you
mean "get the object name", just split the line at the "=" and strip off
the whitespace you don't want. If you mean "track how objects are
connected to one another", have each object keep a list of its immediate
sub-objects (which will have lists of their immediate sub-objects, and so
on); it's fairly easy to keep track of which objects are current using a
list as a stack. If you mean something else, sorry but my crystal ball is
cloudy tonight.
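
To make the list-as-a-stack idea concrete, here is a minimal sketch under the
same guess about the layout ("Name = {" opens an object, "};" closes it).
The dict keys and the function name parse are invented for the example, not
anything from the original file format.

```python
def parse(lines):
    """Build a tree of nested objects from an iterable of lines."""
    # A nameless root holds the top-level objects.
    root = {"name": None, "lines": [], "children": []}
    stack = [root]  # stack[-1] is always the current object
    for lineno, line in enumerate(lines, 1):
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if text.endswith("{") and "=" in text:
            # Object name: split at the first "=" and strip the whitespace.
            name = text.split("=", 1)[0].strip()
            obj = {"name": name, "lines": [], "children": []}
            stack[-1]["children"].append(obj)  # attach to its parent
            stack.append(obj)                  # it is now the current object
        elif text == "};":
            if len(stack) == 1:
                raise ValueError("line %d: unmatched '};'" % lineno)
            stack.pop()  # back to the enclosing object
        else:
            stack[-1]["lines"].append(text)
    if len(stack) != 1:
        raise ValueError("unclosed object %r" % stack[-1]["name"])
    return root
```

Since any iterable of lines will do, you can pass the open file object
straight in, e.g. tree = parse(open("objects.txt")) with a hypothetical file
name, and the file is still only read a line at a time.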
--
Rhodri James *-* Wildebeeste Herder to the Masses