Parse bug text file

wxjmfauth at gmail.com
Sun Jul 27 22:55:01 CEST 2014


On Sunday, 27 July 2014 20:08:06 UTC+2, CM wrote:
> I have a big text file of bugs that I want to use Python to parse such that the bugs can be neatly filed into a database. I can bumble toward a solution with looping but feel this is a classic example of reinventing the wheel, and yet I'm finding it hard to Google for.
>
> Basically the file is structured like this (silly examples, of course); let's call each of these three entries a "bug block":
>
> - BUG 2.13.14  When you wear a purple hat, the application locks up.  If you sing the theme to "The Love Boat", the application becomes available again.
>
> - ISSUE 2.13.14  During thunderstorms, the application runs backwards.
>
> - BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.  That's too bad.
>
> Generally, every bug block starts with a "-" as the first character, then some words in all caps, a date in that format, and then the descriptive text. There is always a blank line in between bug blocks, but sometimes there may be a blank line within the bug description as well.
>
> The goal is to grab each bug block, clean up that text (there are CRs in it, etc., but I can do that), and dump it into a database record (the db stuff I can do).  Grabbing the date along the way would be wonderful as well.
>
> I can go through it by opening the text file and reading in the lines, and if the first character is a "-" then count that as the start of a bug block, but I am not sure how to find the last line of a bug block... it would be the line before the first line of the next bug block, but I am not sure of the best way to go about it.
>
> There must be a rather standard way to do something like this in Python, and I'm requesting pointers toward that standard way (or what this type of task is usually called).  Thanks.
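
The line-by-line approach described in the quoted message can be sketched
roughly as follows. This is only an illustration: the name split_blocks and
the sample text are made up here, and it assumes that a "-" as the very first
character of a line always starts a new bug block. The end of a block is then
found implicitly: a block is closed when the next one opens, or at end of input.

```python
def split_blocks(lines):
    """Group lines into blocks; a line starting with '-' opens a new block.

    Hypothetical helper: any text before the first '-' line is returned
    as a block of its own.
    """
    blocks = []
    current = []
    for line in lines:
        # A new header line closes the block collected so far.
        if line.startswith('-') and current:
            blocks.append(''.join(current))
            current = []
        current.append(line)
    if current:
        blocks.append(''.join(current))  # flush the last block
    return blocks


# Made-up sample in the shape described in the quoted message.
text = (
    "- BUG 2.13.14  When you wear a purple hat,\n"
    "the application locks up.\n"
    "\n"
    "- ISSUE 2.13.14  During thunderstorms, it runs backwards.\n"
)
blocks = split_blocks(text.splitlines(keepends=True))
for block in blocks:
    print(repr(block))
```

Blank lines inside a description stay with their block, since only a leading
"-" opens a new one.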

The real question: how to open and close a block given a
delimiter?

>>> s = """\
... - BUG 2.13.14  When you wear
... available again.
... 
...    - ISSUE 2.13.14  Duringthunderstorms
... 
... - BUG/OPTIMIZE 11.12.12:  Sometimes
... 
... - aaa -bbb
... 
... -
... """
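>>> def z(s):
...     # Split s into blocks; every '-' character opens a new block.
...     r = []           # completed blocks
...     inblock = False  # True once the first '-' has been seen
...     t = ''           # text of the block being built
...     i = 0
...     while i < len(s):
...         if s[i] == '-':
...             if inblock:
...                 # A new delimiter closes the current block and opens the next.
...                 r.append(t)
...                 t = s[i]
...             else:
...                 # First delimiter: keep any leading text and open a block.
...                 t = t + s[i]
...                 inblock = not inblock
...         else:
...             t = t + s[i]
...         i = i + 1
...     r.append(t)  # flush the last block
...     return r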
...     
>>> r = z(s)
>>> for e in r:
...     print(e)
...     
- BUG 2.13.14  When you wear
available again.

   
- ISSUE 2.13.14  Duringthunderstorms


- BUG/OPTIMIZE 11.12.12:  Sometimes


- aaa 
-bbb


-

>>> ''.join(r) == s
True
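
Note that the session above opens a new block at every "-" character, which
is why "- aaa -bbb" comes out as two blocks. A possible variant, given only as
a sketch, treats "-" as a delimiter only at the start of a line and also grabs
the label and the date. The names HEADER and parse_bugs are invented for this
example, and the header shape (all-caps label, dotted date, optional colon) is
an assumption taken from the three sample bug blocks. The date is kept as a
plain string.

```python
import re

# Assumed header shape: "-" at the start of a line, an all-caps label
# (possibly containing "/"), a dotted date, and an optional ":".
HEADER = re.compile(
    r'^-\s*([A-Z/]+)\s+(\d{1,2}\.\d{1,2}\.\d{1,2}):?\s*',
    re.MULTILINE,
)


def parse_bugs(text):
    """Return a list of (label, date, description) tuples."""
    matches = list(HEADER.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # A block ends where the next header starts, or at end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks and runs of spaces in the description.
        description = ' '.join(text[m.end():end].split())
        records.append((m.group(1), m.group(2), description))
    return records


sample = (
    "- BUG 2.13.14  When you wear\n"
    "available again.\n"
    "\n"
    "- ISSUE 2.13.14  During thunderstorms\n"
    "\n"
    "- BUG/OPTIMIZE 11.12.12:  Sometimes - slow\n"
)
records = parse_bugs(sample)
for record in records:
    print(record)
```

A "-" in the middle of a line, as in the last sample block, does not start a
new record here because the pattern is anchored to the start of a line.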

jmf


