Parse bug text file

Terry Reedy tjreedy at udel.edu
Sun Jul 27 21:15:56 CEST 2014


On 7/27/2014 2:08 PM, CM wrote:
> I have a big text file of bugs that I want to use Python to parse
> such that the bugs can be neatly filed into a database. I can bumble
> toward a solution with looping but feel this is a classic example of
> reinventing the wheel, and yet I'm finding it hard to Google for.
>
> Basically the file is structured like this (silly examples, of
> course), with each of these three lets call a "bug block":
>
>
> - BUG 2.13.14  When you wear a purple hat, the application locks up.
> If you sing the theme to "The Love Boat", the application becomes
> available again.
>
> - ISSUE 2.13.14  During thunderstorms, the application runs
> backwards.
>
> - BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
> That's too bad.
>
>
> Generally, every bug block starts with a "-" as the first character,

I will assume 'always'

> then some words in all caps, a date in that format, and then the
> descriptive text. There is always a blank line in between bug blocks,
> but sometimes there may be a blank line within the bug description as
> well.
>
> The goal is to grab each bug block, clean up that text (there are CRs
> in it, etc., but I can do that), and dump it into a database record
> (the db stuff I can do).  Grabbing the date along the way would be
> wonderful as well.
>
> I can go through it with opening the text file and reading in the
> lines, and if the first character is a "-" then count that as the
> start of a bug block, but I am not sure how to find the last line of
> a bug block...it would be the line before the first line of the next
> bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in
> Python, and I'm requesting pointers toward that standard way (or what
> this type of task is usually called).  Thanks.

Split the processing into two phases: generating individual bugs and 
processing each bug. Here is a prototype.

with open(bugfile) as f:
    for bug in bugs(f):
        process(bug)

Here are two examples of the first phase. Use the second (the generator)
for a big file, since it does not need the whole text in memory at once.
(If individual bugs are more than a few lines, I would collect the lines
in a list inside the generator and use ''.join(<list>).)

bugtext = '''\
- BUG 2.13.14  When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.

- ISSUE 2.13.14  During thunderstorms, the application runs backwards.

- BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
That's too bad
'''

# Join continuation lines with a space so sentences do not run together.
buglist1 = [bug.strip().replace('\n', ' ')
            for bug in bugtext[1:].split('\n-')]
for bug in buglist1: print(bug)

def bugs(lines):
    lines = iter(lines)
    bug = next(lines)[1:]          # first line, minus the leading '-'
    for line in lines:
        if line[:1] != '-':
            bug += ' ' + line      # continuation (or blank) line
        else:
            yield bug.strip()      # a new '-' line ends the previous bug
            bug = line[1:]
    yield bug.strip()              # the last bug has no following '-' line


buglist2 = list(bugs(bugtext.splitlines()))
for bug in buglist2: print(bug)
print(buglist1 == buglist2)

 >>>
BUG 2.13.14  When you wear a purple hat, the application locks up. If you sing the theme to "The Love Boat", the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow. That's too bad
BUG 2.13.14  When you wear a purple hat, the application locks up. If you sing the theme to "The Love Boat", the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow. That's too bad
True
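
The list-collecting variant mentioned above might look roughly like the
following. This is only a sketch: the name bugs_joined is made up for
illustration, and it assumes (as above) that a line starting with '-'
always begins a new bug. It joins the pieces with a single space, drops
blank lines inside a description, and ignores anything before the first
'-' line.

```python
def bugs_joined(lines):
    """Yield one cleaned-up string per bug block.

    Hypothetical variant of bugs(): collects the lines of a block in a
    list and joins them once, instead of repeated string concatenation.
    """
    block = None                      # no bug block started yet
    for line in lines:
        if line.startswith('-'):
            if block is not None:
                yield ' '.join(block)
            block = [line[1:].strip()]
        else:
            text = line.strip()
            if text and block is not None:
                block.append(text)    # skip blank lines, keep the rest
    if block is not None:
        yield ' '.join(block)         # the last block in the input


sample = '''\
- BUG 2.13.14  When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",

the application becomes available again.

- ISSUE 2.13.14  During thunderstorms, the application runs backwards.
'''

for bug in bugs_joined(sample.splitlines()):
    print(bug)
```

Because each line is stripped, this works the same whether it is fed
bugtext.splitlines() or an open file object whose lines still end in
'\n'.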

Now write process(bug)
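
As a possible starting point, here is one way process(bug) could pull
the pieces apart with a regular expression. Everything in it is an
assumption based on the three sample bugs: the pattern name BUG_HEADER,
the dictionary keys, reading the date as month.day.year with a
two-digit year in the 2000s, and the optional ':' after the date. It
returns a dict instead of writing to a database, since the db part is
already covered.

```python
import re
from datetime import date

# Assumed layout: ALL-CAPS words (possibly joined by '/'), then M.D.YY,
# an optional ':', and the free-form description.
BUG_HEADER = re.compile(
    r'([A-Z][A-Z/ ]*?)\s+(\d{1,2})\.(\d{1,2})\.(\d{2}):?\s*(.*)',
    re.DOTALL)

def process(bug):
    """Split one bug string into kinds, date and text.

    Returns None when the string does not match the assumed layout.
    """
    m = BUG_HEADER.match(bug)
    if m is None:
        return None
    kinds, month, day, year, text = m.groups()
    return {
        'kinds': kinds.split('/'),    # 'BUG/OPTIMIZE' -> ['BUG', 'OPTIMIZE']
        'date': date(2000 + int(year), int(month), int(day)),  # assumes 20YY
        'text': text,
    }


record = process('BUG/OPTIMIZE 11.12.12:  Sometimes the application is '
                 "really slow. That's too bad")
print(record)
```

If the file turns out to contain an impossible date (say month 13),
date() raises ValueError, so real data may need a try/except around
that call.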

-- 
Terry Jan Reedy



