[Tutor] help with refactoring needed -- which approach is more Pythonic?

Brian van den Broek bvande at po-box.mcgill.ca
Wed Feb 9 23:05:44 CET 2005


Hi all,

I have data files with a format that can be schematized as:

File Header Contents
. . .
File Header End Tag
Node Header Contents
. . .
Node Header End Tag
Node Contents
. . .
Node End Tag
[Repeat Node elements until end of file]

I'm refactoring the heck out of a file conversion utility I wrote for 
this format back when I knew even less than I do now =:-0

The main change in refactoring is moving it to OOP. I have a method 
that serves as the entry point for parsing the files. It separates the 
file header content and the nodes (or body content), sending them each 
to appropriate methods to be processed.
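
For concreteness, a minimal sketch of that header/body split. The tag
value and the function name are made up for illustration; the real tag
is whatever the file format defines, and the end tag is assumed to sit
on a line of its own:

```python
# Hypothetical tag value; the real one comes from the file format.
FILE_HEADER_END_TAG = "<end file header>\n"

def split_header_and_body(lines, header_end_tag=FILE_HEADER_END_TAG):
    """Return (header_lines, body_lines); the end tag stays with the header."""
    # list.index raises ValueError if the tag is missing, which rejects
    # a file that has no header end tag at all.
    cut = lines.index(header_end_tag) + 1
    return lines[:cut], lines[cut:]

lines = ["title\n", "<end file header>\n", "node 1\n", "<end node>\n"]
header, body = split_header_and_body(lines)
```

Here body is what the post below calls body_contents.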

I want the body parser to accept a list of lines corresponding to the 
node portions of my file, separate out each node (everything between 
node end tags, with the closing end tag included in the node), and 
send each node's contents to a further method for processing. What I 
have now works and is a big improvement on what I had before. But, I 
know that I tend to employ while loops more than I perhaps ought, and 
much of the style of OOP has yet to sink in. So, any suggestions on 
how to make this method more Pythonic would be most welcome.

(body_contents is a list of file lines, with all file header lines 
removed.)

.    def body_parser(self, body_contents):
.
.        while body_contents:
.
.            count = 0
.            current_node_contents = []
.
.            for line in body_contents:
.                current_node_contents.append(line)
.                count += 1
.                if line == node_end_tag:  # node_end_tag elsewhere
.                    break                 # defined and includes '\n'
.
.            self.node_parser(current_node_contents)
.            body_contents = body_contents[count:]

Another alternative has occurred to me, but seems to compensate for 
the avoidance of while by being ugly. Untested code:

.    def alt_body_parser(self, body_contents):
.
.        body_contents = ''.join(body_contents)
.        body_contents = body_contents.split(node_end_tag)
.
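.        # ugly lives here -- having removed node_end_tag's
.        # with split, I need to put them back on. split also leaves
.        # a trailing '' after the final end tag, so drop that first:
.        if body_contents[-1] == '':
.            del body_contents[-1]
.        count = 0
.        for i in body_contents:
.            body_contents[count] = i + node_end_tag
.            count += 1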
.        # (The sub-alternative of having the node_parser method
.        # put them back, while easier, also seems a dangerous
.        # separation of responsibility for the integrity of the data
.        # format.)
.
.        for i in body_contents:
.            self.node_parser(i)
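
Two things to watch with the join/split route: str.split returns a
trailing empty string when the text ends with the separator, and each
node arrives as one string rather than a list of lines. It would also
match the tag text at the end of a longer line, which the line-by-line
comparison would not. A hedged sketch that handles the first two
(split_nodes and the tag value are illustrative names):

```python
NODE_END_TAG = "<end node>\n"  # hypothetical value; includes '\n'

def split_nodes(body_contents, node_end_tag=NODE_END_TAG):
    """Return a list of nodes, each a list of lines with its end tag."""
    chunks = "".join(body_contents).split(node_end_tag)
    # The last chunk is whatever follows the final end tag: '' when the
    # body ends on a tag, otherwise an unterminated trailing node.
    tail = chunks.pop()
    # splitlines(True) keeps the line endings, so each node is again a
    # list of lines like the ones body_parser passes to node_parser.
    nodes = [(chunk + node_end_tag).splitlines(True) for chunk in chunks]
    if tail:
        nodes.append(tail.splitlines(True))
    return nodes

body = ["a\n", "<end node>\n", "b\n", "c\n", "<end node>\n"]
nodes = split_nodes(body)
```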

So, which of these 2 (and a half) ways seems most Pythonic to the more 
experienced? Any better ways I've overlooked?

Thanks, and best to all,

Brian vdB


