[Tutor] Regular expression re.search() object . Please help
kumar s
ps_python at yahoo.com
Thu Jan 13 23:57:24 CET 2005
Hello group:
thank you for the suggestions. It worked for me using
if not line.startswith('Name='): expression.
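For reference, a minimal sketch of that startswith() filter. The sample lines are copied from the earlier message; the name kept is only an illustration, and the real data would come from the file rather than a hard-coded list.

```python
# Hypothetical sample standing in for the lines read from the file.
probe_pairs = [
    "Name=AFFX-BioB-5_at",
    "Cell1=96 369 N control AFFX-BioB-5_at",
    "Cell2=96 370 N control AFFX-BioB-5_at",
    "Name=223473_at",
    "Cell1=307 87 N control 223473_at",
]

# Keep every line that does not begin with 'Name='.
kept = [line for line in probe_pairs if not line.startswith("Name=")]

for line in kept:
    print(line)
```

Building a new list avoids changing probe_pairs while looping over it.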
I have been practising regular expression problems, and I
always stumble over one simple thing. After obtaining
either a search object or a match object, I am unable
to apply methods on these objects to get the matched
text out.
I have looked in many books, including my favourites
(Learning Python and Alan Gauld's Learn to Program Using
Python), but I did not find the answer to a basic question:
how do I get what I want out of the match object returned
by search() or match()?
For example:
I have a simple list like the following:
>>> seq
['>probe:HG-U133B:200000_s_at:164:623;
Interrogation_Position=6649; Antisense;',
'TCATGGCTGACAACCCATCTTGGGA']
Now I intend to extract particular pattern and write
to another list say: desired[]
What I want to extract:
1. I want to extract 164:623,
which always comes after _at: and ends with ;
2. The second pattern/number I want to extract is
6649,
which always comes after Position=.
How I want to put to desired[]:
>>> desired
['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']
I write a pattern:
pat = '[0-9]*[:][0-9]*'
pat1 = '[_Position][=][0-9]*'
>>> for line in seq:
    pat = '[0-9]*[:][0-9]*'
    pat1 = '[_Position][=][0-9]*'
    print (re.search(pat,line) and re.search(pat1,line))
<_sre.SRE_Match object at 0x163CAF00>
None
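To see what is inside such an object, group() can be called on it. The sketch below reuses the two patterns above on the header line and assumes the two wrapped pieces of that line are joined by a single space. The names m, m1 and both are only illustrative.

```python
import re

line = ('>probe:HG-U133B:200000_s_at:164:623; '
        'Interrogation_Position=6649; Antisense;')

pat = '[0-9]*[:][0-9]*'
pat1 = '[_Position][=][0-9]*'

m = re.search(pat, line)
m1 = re.search(pat1, line)

# group() returns the text the pattern actually matched.
# [0-9]* may match zero digits, so the first fit is the bare ':' after 'probe'.
print(repr(m.group()))    # ':'
# [_Position] is a character class: ONE character out of that set, then '=', then digits.
print(repr(m1.group()))   # 'n=6649'

# 'a and b' evaluates to b when a is truthy, so only the second match object is shown.
both = m and m1
print(both is m1)         # True
```

So the single object printed above is the result of the second search, and neither pattern matches exactly the intended text yet.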
Now I know that I have a hit in the seq list evident
by <_sre.SRE_Match object at 0x163CAF00>.
Here is the black box:
What kind of operations can I do on this to get those
two matches:
164:623 and 6649.
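One possible way to get those two values is to put parentheses (capturing groups) around the wanted parts and read them back with group(1). This is only a sketch under the assumptions stated above: the first value sits between _at: and ;, and the second follows Position=. The names coord_pat and pos_pat are made up for the example.

```python
import re

seq = ['>probe:HG-U133B:200000_s_at:164:623; '
       'Interrogation_Position=6649; Antisense;',
       'TCATGGCTGACAACCCATCTTGGGA']

# Parentheses mark the part of each match to pull out.
coord_pat = re.compile(r'_at:(\d+:\d+);')
pos_pat = re.compile(r'Position=(\d+)')

desired = []
for line in seq:
    coord = coord_pat.search(line)
    pos = pos_pat.search(line)
    if coord and pos:
        # group(1) is the text captured by the first pair of parentheses.
        desired.append('>%s|%s' % (coord.group(1), pos.group(1)))
    else:
        # Lines without both patterns (the sequence itself) pass through unchanged.
        desired.append(line)

print(desired)
```

With the sample list this prints ['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']. group(0), or group() with no argument, would give the whole matched text instead, including the _at: and Position= prefixes.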
I read
http://www.python.org/doc/2.2.3/lib/re-objects.html
This did not help me to progress further. May I
request tutors to give a small note explaining things.
In Alan Gauld's book, most of the explanation stopped
at the
<_sre.SRE_Match object at 0x163CAF00> level.
After that there is no example where he performs any
operations on these objects. If I am wrong, I may
have skipped or missed it. Apologies for that.
Thank you very much in advance.
K
--- Liam Clarke <cyresse at gmail.com> wrote:
> ...as do I.
>
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
>
> openFile.close()
>
> indexesToRemove=[]
>
> for lineIndex in range(len(probe_pairs)):
>
>     if probe_pairs[lineIndex].startswith("Name="):
>
>         indexesToRemove.append(lineIndex)
>
> for index in indexesToRemove:
>     probe_pairs[index]=''
>
> Could just be
>
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
>
> openFile.close()
>
> indexesToRemove=[]
>
> for lineIndex in range(len(probe_pairs)):
>
>     if probe_pairs[lineIndex].startswith("Name="):
>         probe_pairs[lineIndex]=''
>
>
>
>
>
> On Fri, 14 Jan 2005 09:38:17 +1300, Liam Clarke
> <cyresse at gmail.com> wrote:
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     key
> > >
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> >
> >
> > You are overwriting key each time you iterate. key.group() gives the
> > matched characters in that object, not a group of objects!!!
> >
> > You want
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> keys=[]
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     keys.append(key)
> >
> > >>> print keys
> >
> > > 'Name='
> > >
> > > 1. My aim:
> > > To remove those Name=**** lines from my probe_pairs
> > > list
> >
> > Why are you deleting the object key?
> >
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     del key
> > >     print probe_pairs[i]
> >
> > Here's the easy way. Assuming that probe_pairs is stored in a file
> > called probe_pairs.txt
> >
> > openFile=file("probe_pairs.txt","r")
> > probe_pairs=openFile.readlines()
> >
> > openFile.close()
> >
> > indexesToRemove=[]
> >
> > for lineIndex in range(len(probe_pairs)):
> >
> >     if probe_pairs[lineIndex].startswith("Name="):
> >
> >         indexesToRemove.append(lineIndex)
> >
> > for index in indexesToRemove:
> >     probe_pairs[index]=''
> >
> > Try that.
> >
> > Argh, my head. You do some strange things to Python.
> >
> > Liam Clarke
> >
> > On Thu, 13 Jan 2005 10:56:00 -0800 (PST), kumar s
> <ps_python at yahoo.com> wrote:
> > > Dear group:
> > >
> > > My list looks like this: List name = probe_pairs
> > > Name=AFFX-BioB-5_at
> > > Cell1=96 369 N control AFFX-BioB-5_at
> > > Cell2=96 370 N control AFFX-BioB-5_at
> > > Cell3=441 3 N control AFFX-BioB-5_at
> > > Cell4=441 4 N control AFFX-BioB-5_at
> > > Name=223473_at
> > > Cell1=307 87 N control 223473_at
> > > Cell2=307 88 N control 223473_at
> > > Cell3=367 84 N control 223473_at
> > >
> > > My Script:
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     key
> > >
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > ..................................... (cont. 10K
> > > lines)
> > >
> > > Here it prints a bunch of reg.match objects. However,
> > > when I say group() it prints only one object. Why?
> > >
> > > Alternatively:
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     key.group()
> > >
> > > 'Name='
> > >
> > > 1. My aim:
> > > To remove those Name=**** lines from my probe_pairs
> > > list
> > >
> > > With name1 as the pattern, I used the re.match()
> > > method to identify the lines and then tried to remove
> > > them with the re.sub(pat,'',string) method. I want to
> > > replace each Name=*** line with an empty string.
> > >
> > > After I get the reg.match object, I tried to remove
> > > that match object like this:
> > > >>> for i in range(len(probe_pairs)):
> > >     key = re.match(name1,probe_pairs[i])
> > >     del key
> > >     print probe_pairs[i]
> > >
> > > Name=AFFX-BioB-5_at
> > > Cell1=96 369 N control AFFX-BioB-5_at
> > > Cell2=96 370 N control AFFX-BioB-5_at
> > > Cell3=441 3 N control AFFX-BioB-5_at
> > >
> > > Result shows that that Name** line has not been
> > > deleted.
> > >
> > > Is the way I am doing it a good one? Could you please
> > > suggest a good simple method?
> > >
> > > Thanks in advance
> > > K
>
=== message truncated ===