[Tutor] help with refactoring needed -- which approach is more Pythonic?
Brian van den Broek
bvande at po-box.mcgill.ca
Wed Feb 9 23:05:44 CET 2005
Hi all,
I have data files with a format that can be schematized as:
File Header Contents
. . .
File Header End Tag
Node Header Contents
. . .
Node Header End Tag
Node Contents
. . .
Node End Tag
[Repeat Node elements until end of file]
I'm refactoring the heck out of a file conversion utility I wrote for
this format back when I knew even less than I do now =:-0
The main change in refactoring is moving it to OOP. I have a method
that serves as the entry point for parsing the files. It separates the
file header content and the nodes (or body content), sending them each
to appropriate methods to be processed.
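A minimal sketch of that header/body separation follows. It assumes the header ends at the first line equal to a file header end tag. The function name split_header_and_body and all tag strings are hypothetical, because the real tags are not given here:

```python
# Hypothetical tag value; the real file header end tag is not given in the post.
FILE_HEADER_END_TAG = "END FILE HEADER\n"


def split_header_and_body(lines, header_end_tag=FILE_HEADER_END_TAG):
    """Return (header_lines, body_lines) for a list of file lines.

    The header end tag stays with the header; every later line is
    body (node) content.
    """
    for index, line in enumerate(lines):
        if line == header_end_tag:
            return lines[:index + 1], lines[index + 1:]
    raise ValueError("file header end tag not found")


# Example input in the schematized format (all tag text hypothetical).
sample = [
    "format v1\n",
    "END FILE HEADER\n",
    "node A header\n",
    "END NODE HEADER\n",
    "node A data\n",
    "END NODE\n",
]
header, body = split_header_and_body(sample)
# header == ["format v1\n", "END FILE HEADER\n"]
# body == the four node lines that follow
```

Lines read with readlines() keep their trailing '\n'. That means comparing whole lines against a tag that includes '\n' works directly.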
I want the body parser to accept a list of lines corresponding to the
nodes portions of my file, separate out each node (everything between
node end tags, the bottommost end tag included in the node), and
send each node's contents to a further method for processing. What I
have now works and is a big improvement on what I had before. But, I
know that I tend to employ while loops more than I perhaps ought, and
much of the style of OOP has yet to sink in. So, any suggestions on
how to make this method more Pythonic would be most welcome.
(body_contents is a list of file lines, with all file header lines
removed.)
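The splitting behaviour described above can also be sketched as a standalone generator. This is a sketch, not the utility's actual code. The name iter_nodes, passing node_end_tag as a parameter, and the tag value are all hypothetical:

```python
def iter_nodes(body_lines, node_end_tag):
    """Yield each node as a list of lines, end tag included.

    A node is every line up to and including the next line equal to
    node_end_tag. Any trailing lines with no end tag are yielded as a
    final node, which is also what body_parser does with them.
    """
    current = []
    for line in body_lines:
        current.append(line)
        if line == node_end_tag:
            yield current
            current = []  # start a fresh list; the yielded one is left alone
    if current:
        yield current


NODE_END_TAG = "END NODE\n"  # hypothetical tag value
body = ["a\n", NODE_END_TAG, "b\n", "c\n", NODE_END_TAG]
nodes = list(iter_nodes(body, NODE_END_TAG))
# nodes == [["a\n", "END NODE\n"], ["b\n", "c\n", "END NODE\n"]]
```

With this in place, body_parser reduces to a for loop over iter_nodes(body_contents, node_end_tag) that calls self.node_parser on each node. It makes one pass over the lines. The while-loop version copies the remaining list on every body_contents[count:] slice; this generator avoids that.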
. def body_parser(self, body_contents):
.
.     while body_contents:
.
.         count = 0
.         current_node_contents = []
.
.         for line in body_contents:
.             current_node_contents.append(line)
.             count += 1
.             if line == node_end_tag:  # node_end_tag elsewhere
.                 break                 # defined and includes '\n'
.
.         self.node_parser(current_node_contents)
.         body_contents = body_contents[count:]
Another alternative has occurred to me, but seems to compensate for
the avoidance of while by being ugly. Untested code:
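. def alt_body_parser(self, body_contents):
.
.     body_contents = ''.join(body_contents)
.     body_contents = body_contents.split(node_end_tag)
.     # If the text ends with node_end_tag, split leaves a trailing
.     # empty string; drop it so it is not treated as a node.
.     if not body_contents[-1]:
.         body_contents.pop()
.
.     # ugly lives here -- having removed node_end_tag's
.     # with split, I need to put them back on:
.     count = 0
.     for i in body_contents:
.         body_contents[count] = i + node_end_tag
.         count += 1
.     # (The sub-alternative of having the node_parser method
.     # put them back, while easier, also seems a dangerous
.     # separation of responsibility for the integrity of the data
.     # format.)
.
.     for i in body_contents:
.         # splitlines(True) keeps the line endings, so node_parser
.         # gets a list of lines, as it does from body_parser
.         self.node_parser(i.splitlines(True))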
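The split-based alternative runs into two behaviours of str.split, and a few lines are enough to show both. The tag value below is hypothetical:

```python
node_end_tag = "END NODE\n"  # hypothetical tag value

text = "a\nEND NODE\nb\nEND NODE\n"
pieces = text.split(node_end_tag)
# pieces == ["a\n", "b\n", ""]   (split leaves a trailing empty string)

# Drop only that trailing empty string, then re-attach the tag with a
# list comprehension instead of an index counter.
if pieces[-1] == "":
    pieces.pop()
nodes = [piece + node_end_tag for piece in pieces]
# nodes == ["a\nEND NODE\n", "b\nEND NODE\n"]

# split matches substrings, not whole lines: a line that merely ends
# with the tag text is also treated as a node boundary.
tricky = "xEND NODE\ny\n".split(node_end_tag)
# tricky == ["x", "y\n"]
```

body_parser's whole-line test (line == node_end_tag) does not have the substring problem. This matters if node contents can ever end with the tag text.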
So, which of these 2 (and a half) ways seems most Pythonic to the more
experienced? Any better ways I've overlooked?
Thanks, and best to all,
Brian vdB