[Tutor] Regular expression on python

Mon Apr 13 20:42:03 CEST 2015

On 13/04/15 13:29, jarod_v6 at libero.it wrote:

> Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) Forward Only Surviving: 17315 (0.81%) Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)

It's not clear where the tabs are in this line.
But if they are after the numbers, like so:

Input Read Pairs: 2127436 \t
Both Surviving: 1795091 (84.38%) \t
Forward Only Surviving: 17315 (0.81%) \t
Reverse Only Surviving: 6413 (0.30%) \t
Dropped: 308617 (14.51%)

Then you may not need to use regular expressions.
Simply split the line by tab, then split each field by ':'.
And if the 'number' contains parens, split it again by space
and keep the first part.
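
As a rough sketch of that approach, assuming the tabs really do fall
after each number as guessed above (the sample line is rebuilt from your
post, and the parse_counts name is just made up for illustration):

```python
# Sample line, assuming a tab separates each field (a guess, see above).
line = ("Input Read Pairs: 2127436 \t"
        "Both Surviving: 1795091 (84.38%) \t"
        "Forward Only Surviving: 17315 (0.81%) \t"
        "Reverse Only Surviving: 6413 (0.30%) \t"
        "Dropped: 308617 (14.51%)\n")

def parse_counts(line):
    """Return a dict mapping each label to its integer count."""
    counts = {}
    for field in line.strip().split("\t"):
        label, value = field.split(":")
        # value may look like ' 1795091 (84.38%) '; keep only the count.
        counts[label.strip()] = int(value.split()[0])
    return counts

counts = parse_counts(line)
print(counts["Input Read Pairs"])
print(counts["Dropped"])
```

If the real file uses a different separator, only the split("\t") call
should need to change.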

>   with open("255.trim.log","r") as p:
>      for i in p:
>          lines= i.strip("\t")

lines is a bad name here since it's only a single line. In fact I'd lose
the 'i' variable and just use

for line in p:

>          if lines.startswith("Input"):
>              tp = lines.split("\t")
>              print re.findall("Input\d",str(tp))

In your data 'Input' is followed by a space, not a digit, so the
pattern "Input\d" can never match. You need a more powerful pattern.
Which is why I recommend trying to solve it as far as possible
without using regex.
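
If you do want a regex, here is one possible pattern, offered as a sketch
rather than the only way to do it. It assumes the count is the first run
of digits after 'Input'; the variable names are mine:

```python
import re

line = "Input Read Pairs: 2127436\tBoth Surviving: 1795091 (84.38%)"

# 'Input\d' needs a digit directly after 'Input', but a space follows it.
no_match = re.findall(r"Input\d", line)

# Skip any non-digit characters after 'Input', then capture the digits.
input_count = re.findall(r"Input\D*(\d+)", line)

print(no_match)
print(input_count)
```

The first findall() returns an empty list and the second returns a list
holding the count as a string, which you can pass to int().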

> So I started to find ":" from the row:
>   with open("255.trim.log","r") as p:
>      for i in p:
>          lines= i.strip("\t")
>          if lines.startswith("Input"):
>              tp = lines.split("\t")
>              print re.findall(":",str(tp[0]))

Does finding the colons really help much?
Or at least, does it help any more than splitting by colon would?
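
To show the difference, a small comparison using the first field of your
line (typed in by hand here as an assumption about what tp[0] holds):

```python
import re

field = "Input Read Pairs: 2127436"

# findall(":") only hands back the colons it matched, nothing else.
colons = re.findall(":", field)

# split(":") hands back the text on either side of the colon.
label, value = field.split(":")

print(colons)
print(label)
print(int(value))  # int() ignores the leading space
```

findall() tells you a colon is there; split() gives you the label and
the number, which is what you are actually after.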

> And I'm able to find, but when I try to take the number using \d not work.
> Someone can explain why?

Because your pattern doesn't match the string.

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos