[Tutor] Regular expression re.search() object. Please help

kumar s ps_python at yahoo.com
Thu Jan 13 23:57:24 CET 2005


Hello group,
Thank you for the suggestions. It worked for me using
this expression:

if not line.startswith('Name='):
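
For reference, the same test can drive a list comprehension that builds a new list without the Name= lines. This is a minimal sketch in Python 3, assuming the lines are already in a list named probe_pairs; the sample data is abridged from the listing quoted further down, with the whitespace simplified:

```python
# Sample lines abridged from the probe_pairs listing (whitespace simplified)
probe_pairs = [
    "Name=AFFX-BioB-5_at",
    "Cell1=96 369 N control AFFX-BioB-5_at",
    "Cell2=96 370 N control AFFX-BioB-5_at",
    "Name=223473_at",
    "Cell1=307 87 N control 223473_at",
]

# Keep every line that does not start with 'Name='; plain string methods suffice here
kept = [line for line in probe_pairs if not line.startswith("Name=")]

for line in kept:
    print(line)
```

Building a new list avoids deleting items from a list while looping over it.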


I have been practising regular expression problems, and I
always stumble over one simple thing. After obtaining
either a search object or a match object, I do not know
which methods to apply to these objects to extract the
text I need.

I have looked in many books, including my favourites
(Learning Python and Alan Gauld's Learn to Program Using
Python), but I did not find the answer to a basic
question: how do I get what I want out of the match
object returned by re.search() or re.match()?
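
The Match object returned by re.search() or re.match() is where the matched text lives; when nothing matches, you get None instead, so test for that first. A minimal sketch in Python 3 of the basic methods group(), start(), end() and span(), using the header line from the example that follows:

```python
import re

line = ">probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649; Antisense;"

m = re.search(r"Position=\d+", line)   # a Match object, or None if nothing matched
if m is None:
    print("no match")
else:
    print(m.group())                  # the whole matched text: Position=6649
    print(m.start(), m.end())         # index where the match starts, and one past where it ends
    print(m.span())                   # the same two indexes as a tuple
    print(line[m.start():m.end()])    # slicing with them gives the matched text again
```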

For example:

I have a simple list like the following:

>>> seq
['>probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649; Antisense;',
 'TCATGGCTGACAACCCATCTTGGGA']


Now I want to extract particular patterns and write
them to another list, say desired[].

What I want to extract:
1. The first pattern is 164:623, which always comes
   after _at: and ends with ;
2. The second pattern/number is 6649, which always
   comes after Position=.

How I want to store them in desired[]:

>>> desired
['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']
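
One way to do this, sketched in Python 3: put parentheses (a capture group) around the part you want, then read it back with group(1). The two patterns are assumptions based on the rules stated above (after _at: up to ;, and digits after Position=), and the else branch assumes that non-header lines should be copied unchanged:

```python
import re

seq = [
    ">probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649; Antisense;",
    "TCATGGCTGACAACCCATCTTGGGA",
]

# Parentheses create capture groups; group(1) returns only the text inside them
coord_pat = re.compile(r"_at:(\d+:\d+);")   # digits:digits between '_at:' and ';'
pos_pat = re.compile(r"Position=(\d+)")     # digits after 'Position='

desired = []
for line in seq:
    coord = coord_pat.search(line)
    pos = pos_pat.search(line)
    if coord and pos:
        # Header line: build '>164:623|6649' from the two captured groups
        desired.append(">%s|%s" % (coord.group(1), pos.group(1)))
    else:
        # Sequence line (no match): copy it unchanged
        desired.append(line)

print(desired)
```

Compiling the patterns once with re.compile() is optional; calling re.search(pattern, line) inside the loop works the same way.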

I wrote these patterns:


pat = '[0-9]*[:][0-9]*'
pat1 = '[_Position][=][0-9]*'

>>> for line in seq:
	pat = '[0-9]*[:][0-9]*'
	pat1 = '[_Position][=][0-9]*'
	print (re.search(pat,line) and re.search(pat1,line))

	
<_sre.SRE_Match object at 0x163CAF00>
None
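
Why the output looks like this, as far as I can tell: [0-9]* may match zero digits, so pat is satisfied by the very first colon in the line; [_Position] is a character class that matches a single character from the set _ P o s i t n, not the word Position; and `a and b` returns b when a is true, so the loop prints only the pat1 match for the header line, and None for the sequence line, where pat finds no colon. A small sketch in Python 3 that inspects the two matches:

```python
import re

line = ">probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649; Antisense;"

pat = '[0-9]*[:][0-9]*'
pat1 = '[_Position][=][0-9]*'

m = re.search(pat, line)
m1 = re.search(pat1, line)

# [0-9]* may match zero digits, so the first ':' in the line (right after 'probe') is enough
print(repr(m.group()))    # ':'
# [_Position] matches ONE character from the set _ P o s i t n, then '=' and the digits
print(repr(m1.group()))   # 'n=6649'

# 'a and b' evaluates to b when a is truthy: the loop printed the pat1 match only
print((m and m1) is m1)   # True

# The sequence line contains no ':' at all, so pat gives None and 'and' stops there
print(re.search(pat, "TCATGGCTGACAACCCATCTTGGGA"))   # None
```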


Now I know that I have a hit in the seq list, as shown
by <_sre.SRE_Match object at 0x163CAF00>.


Here is the black box:

What operations can I perform on this object to get
those two matches, 164:623 and 6649?
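
Beyond group(), a Match object can hand back all of its groups at once. A sketch in Python 3 that captures both numbers with a single pattern; the group names coords and pos are illustrative choices, not anything from the original code:

```python
import re

line = ">probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649; Antisense;"

# One pattern with two named groups; '.*?' lazily skips the text in between
pat = re.compile(r"_at:(?P<coords>\d+:\d+);.*?Position=(?P<pos>\d+)")

m = pat.search(line)
if m:
    print(m.groups())               # ('164:623', '6649'): every group, as a tuple
    print(m.group("coords"))        # 164:623
    print(m.group(1), m.group(2))   # numbered access works as well
    print(m.groupdict())            # {'coords': '164:623', 'pos': '6649'}
```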


I read 
http://www.python.org/doc/2.2.3/lib/re-objects.html


This did not help me progress further. Could the tutors
give a short note explaining this? In Alan Gauld's book,
most of the explanations stop at the
<_sre.SRE_Match object at 0x163CAF00> level; after that
there is no example that performs operations on these
objects. If I am wrong, I may have missed it. Apologies
for that.

Thank you very much in advance. 

K

--- Liam Clarke <cyresse at gmail.com> wrote:

> ...as do I.
> 
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
> 
> openFile.close()
> 
> indexesToRemove=[]
> 
> for lineIndex in range(len(probe_pairs)):
>     if probe_pairs[lineIndex].startswith("Name="):
>         indexesToRemove.append(lineIndex)
> 
> for index in indexesToRemove:
>     probe_pairs[index]=''
> 
> Could just be
> 
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
> 
> openFile.close()
> 
> indexesToRemove=[]
> 
> for lineIndex in range(len(probe_pairs)):
>     if probe_pairs[lineIndex].startswith("Name="):
>         probe_pairs[lineIndex]=''
> 
> 
> 
> 
> 
> On Fri, 14 Jan 2005 09:38:17 +1300, Liam Clarke <cyresse at gmail.com> wrote:
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         key
> > >
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > 
> > 
> > You are overwriting key each time you iterate. key.group() gives the
> > matched characters in that object, not a group of objects!!!
> > 
> > You want
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> keys=[]
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         keys.append(key)
> > 
> > >>> print keys
> > 
> > > 'Name='
> > >
> > > 1. My aim:
> > > To remove those Name=**** lines from my probe_pairs
> > > list
> > 
> > Why are you deleting the object key?
> > 
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         del key
> > >         print probe_pairs[i]
> > 
> > Here's the easy way, assuming that probe_pairs is
> > stored in a file called probe_pairs.txt:
> > 
> > openFile=file("probe_pairs.txt","r")
> > probe_pairs=openFile.readlines()
> > 
> > openFile.close()
> > 
> > indexesToRemove=[]
> > 
> > for lineIndex in range(len(probe_pairs)):
> >     if probe_pairs[lineIndex].startswith("Name="):
> >         indexesToRemove.append(lineIndex)
> > 
> > for index in indexesToRemove:
> >     probe_pairs[index]=''
> > 
> > Try that.
> > 
> > Argh, my head. You do some strange things to Python.
> > 
> > Liam Clarke
> > 
> > On Thu, 13 Jan 2005 10:56:00 -0800 (PST), kumar s <ps_python at yahoo.com> wrote:
> > > Dear group:
> > >
> > > My list looks like this: List name = probe_pairs
> > > Name=AFFX-BioB-5_at
> > > Cell1=96        369     N       control AFFX-BioB-5_at
> > > Cell2=96        370     N       control AFFX-BioB-5_at
> > > Cell3=441       3       N       control AFFX-BioB-5_at
> > > Cell4=441       4       N       control AFFX-BioB-5_at
> > > Name=223473_at
> > > Cell1=307       87      N       control 223473_at
> > > Cell2=307       88      N       control 223473_at
> > > Cell3=367       84      N       control 223473_at
> > >
> > > My Script:
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         key
> > >
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > ..................................... (cont. 10K lines)
> > >
> > > Here it prints a bunch of reg.match objects. However,
> > > when I call group() it prints only one object. Why?
> > >
> > > Alternatively:
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         key.group()
> > >
> > > 'Name='
> > >
> > > 1. My aim:
> > > To remove those Name=**** lines from my probe_pairs
> > > list
> > >
> > > With name1 as the pattern, I used the re.match()
> > > method to identify the lines, intending to remove them
> > > with the re.sub(pat,'',string) method. I want to
> > > substitute each Name=*** line with an empty string.
> > >
> > > After I get the reg.match object, I tried to remove
> > > that match object like this:
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         del key
> > >         print probe_pairs[i]
> > >
> > > Name=AFFX-BioB-5_at
> > > Cell1=96        369     N       control AFFX-BioB-5_at
> > > Cell2=96        370     N       control AFFX-BioB-5_at
> > > Cell3=441       3       N       control AFFX-BioB-5_at
> > >
> > > The result shows that the Name** line has not been
> > > deleted.
> > >
> > > Is the way I am doing this a good one? Could you
> > > please suggest a good, simple method?
> > >
> > > Thanks in advance
> > > K
> > >
> > >
> 
=== message truncated ===
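
For the earlier Name= question in the quoted thread, here is a minimal sketch in Python 3 that pulls the points together. Two things in it are inferences rather than facts stated in the thread: the second loop most likely stopped with an AttributeError on the first Cell line, because re.match() returns None there and None has no group() method (no traceback is shown), which would explain why 'Name=' appears only once; and `del key` deletes only the variable key, never the string stored in probe_pairs. The re.sub() step assumes the whole file is held in one string (for example, from read()), and the sample data is abridged from the listing above with the whitespace simplified:

```python
import re

# Sample data abridged from the probe_pairs listing (whitespace simplified)
text = """Name=AFFX-BioB-5_at
Cell1=96 369 N control AFFX-BioB-5_at
Cell2=96 370 N control AFFX-BioB-5_at
Name=223473_at
Cell1=307 87 N control 223473_at
"""
probe_pairs = text.splitlines()

# 1. re.match() returns None for the Cell lines, so check before calling .group()
names = []
for line in probe_pairs:
    key = re.match(r"Name=(.*)", line)   # same 'Name=' prefix as '[N][a][m][e][=]'
    if key is None:
        continue                          # None has no .group() method
    names.append(key.group(1))            # the text captured after 'Name='
print(names)   # ['AFFX-BioB-5_at', '223473_at']

# 2. 'del key' only unbinds the name; the strings in probe_pairs are unchanged
key = re.match("Name=", probe_pairs[0])
del key
print(probe_pairs[0])   # still prints Name=AFFX-BioB-5_at

# 3. re.sub() on the whole text: with re.MULTILINE, ^ and $ apply at every line,
#    and '\n?' also removes the newline so no empty line is left behind
cleaned = re.sub(r"^Name=.*$\n?", "", text, flags=re.MULTILINE)
print(cleaned)
```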



		