Parse bug text file
Terry Reedy
tjreedy at udel.edu
Sun Jul 27 15:15:56 EDT 2014
On 7/27/2014 2:08 PM, CM wrote:
> I have a big text file of bugs that I want to use Python to parse
> such that the bugs can be neatly filed into a database. I can bumble
> toward a solution with looping but feel this is a classic example of
> reinventing the wheel, and yet I'm finding it hard to Google for.
>
> Basically the file is structured like this (silly examples, of
> course), with each of these three lets call a "bug block":
>
>
> - BUG 2.13.14 When you wear a purple hat, the application locks up.
> If you sing the theme to "The Love Boat", the application becomes
> available again.
>
> - ISSUE 2.13.14 During thunderstorms, the application runs
> backwards.
>
> - BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.
> That's too bad.
>
>
> Generally, every bug block starts with a "-" as the first character,
I will assume 'always'
> then some words in all caps, a date in that format, and then the
> descriptive text. There is always a blank line in between bug blocks,
> but sometimes there may be a blank line within the bug description as
> well.
>
> The goal is to grab each bug block, clean up that text (there are CRs
> in it, etc., but I can do that), and dump it into a database record
> (the db stuff I can do). Grabbing the date along the way would be
> wonderful as well.
>
> I can go through it with opening the text file and reading in the
> lines, and if the first character is a "-" then count that as the
> start of a bug block, but I am not sure how to find the last line of
> a bug block...it would be the line before the first line of the next
> bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in
> Python, and I'm requesting pointers toward that standard way (or what
> this type of task is usually called). Thanks.
Split the processing into two phases: generating individual bugs and
processing each bug. Here is a prototype.
with open(bugfile) as f:
    for bug in bugs(f):
        process(bug)
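Here are two examples of the first phase. The first splits the whole
text at once, so it needs the entire file in memory; the second is a
generator that works line by line, so use it for a big file.
(If individual bugs are more than a few lines, I would collect the lines
in a list inside the generator and use ''.join(<list>) instead of
repeated string concatenation.)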
bugtext = '''\
- BUG 2.13.14 When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.
- ISSUE 2.13.14 During thunderstorms, the application runs backwards.
- BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.
That's too bad
'''
buglist1 = [bug.strip().replace('\n', '') for bug in
bugtext[1:].split('\n-')]
for bug in buglist1: print(bug)
def bugs(lines):
    lines = iter(lines)
    bug = next(lines)[1:]
    for line in lines:
        if line[:1] != '-':
            bug += line
        else:
            yield bug.strip()
            bug = line[1:]
    yield bug.strip()
buglist2 = [bug for bug in bugs(bugtext.splitlines())]
for bug in buglist2: print(bug)
print(buglist1 == buglist2)
>>>
BUG 2.13.14 When you wear a purple hat, the application locks up.If you
sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14 During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.That's
too bad
BUG 2.13.14 When you wear a purple hat, the application locks up.If you
sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14 During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.That's
too bad
True
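The list-collecting variant mentioned in the parenthetical above might
look roughly like the sketch below. The name bugs_joined is made up for
illustration. Unlike bugs(), it skips any lines that come before the
first "-" line and yields nothing for empty input; both are choices of
this sketch rather than part of the prototype.

```python
def bugs_joined(lines):
    """Yield one string per bug block; a block starts at a line beginning with '-'."""
    parts = []                      # lines of the block being collected
    for line in lines:
        if line[:1] == '-':
            if parts:               # the previous block is complete
                yield ''.join(parts).strip()
            parts = [line[1:]]      # start a new block, dropping the '-'
        elif parts:                 # continuation line (may be blank)
            parts.append(line)
    if parts:                       # flush the last block
        yield ''.join(parts).strip()


sample = [
    '- BUG 2.13.14 When you wear a purple hat, the application locks up.',
    'If you sing the theme to "The Love Boat",',
    '- ISSUE 2.13.14 During thunderstorms, the application runs backwards.',
]
for bug in bugs_joined(sample):
    print(bug)
```

With lines read from an open file, each line keeps its trailing newline,
so the joined text keeps its line breaks until process() cleans them up.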
Now write process(bug)
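As a starting point only, here is one way process() could pull the
labels and the date out of a bug string. It assumes the header is always
slash-separated capitals (BUG, ISSUE, BUG/OPTIMIZE), followed by a date
in month.day.two-digit-year order and an optional colon; those are
guesses based on the three sample blocks, and BUG_RE is a name invented
for this sketch. It returns the pieces instead of writing a database
record.

```python
import re
from datetime import datetime

# Assumed header shape: LABEL[/LABEL...] M.D.YY[:] description
BUG_RE = re.compile(r'([A-Z/]+)\s+(\d{1,2}\.\d{1,2}\.\d{2}):?\s*(.*)', re.DOTALL)


def process(bug):
    m = BUG_RE.match(bug)
    if m is None:
        raise ValueError('unrecognized bug header: %r' % bug[:40])
    labels, date_text, description = m.groups()
    # Assumption: dates are month.day.year, as '2.13.14' suggests.
    when = datetime.strptime(date_text, '%m.%d.%y').date()
    # Collapse CRs, newlines and repeated spaces into single spaces.
    return labels.split('/'), when, ' '.join(description.split())


print(process('BUG/OPTIMIZE 11.12.12: Sometimes the application is really slow.'))
```

A block whose header does not match raises ValueError, which makes
irregular entries in the file easy to spot before they reach the
database.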
--
Terry Jan Reedy