[Tutor] Regular expression re.search() object . Please help

jfouhy at paradise.net.nz jfouhy at paradise.net.nz
Fri Jan 14 00:17:46 CET 2005


Quoting kumar s <ps_python at yahoo.com>:

> For example:
> 
> I have a simple list like the following:
> 
> >>> seq
> ['>probe:HG-U133B:200000_s_at:164:623;
> Interrogation_Position=6649 ; Antisense;',
> 'TCATGGCTGACAACCCATCTTGGGA']
> 
> 
> Now I intend to extract particular pattern and write
> to another list say: desired[]
> 
> What I want to extract:
> I want to extract 164:623:
> Which always comes after _at: and ends with ;
> 2. The second pattern/number I want to extract is
> 6649:
> This always comes after position=.
> 
> How I want to put to desired[]:
> 
> >>> desired
> ['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']

You need to look into groups, or (even better) named groups.  Both are described
in the regular expression syntax section of the re module documentation.

Example:

>>> import re
>>> s = ('probe:HG-U133B:200000_s_at:164:623;\n'
...      'Interrogation_Position=6649 ; Antisense;')
>>> regEx = r'_at:(?P<num1>\d+):(?P<num2>\d+);.*?_Position=(?P<pos>\d+)'
>>> m = re.search(regEx, s, re.DOTALL)
>>> m.group('num1'), m.group('num2'), m.group('pos')
('164', '623', '6649')

The (?P<foo>bar) syntax creates a group that matches the regular expression
'bar' and gives it the name 'foo'.
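If you want all the named groups at once, the match object also has a
groupdict() method.  A minimal sketch, assuming the header looks like the one in
your example (the names pattern and fields are just mine, for illustration):

```python
import re

# Sample header modelled on the one in the question; the newline in the
# middle is an assumption about how the real data is laid out.
s = ('probe:HG-U133B:200000_s_at:164:623;\n'
     'Interrogation_Position=6649 ; Antisense;')

# re.DOTALL lets '.' match the newline between the two parts of the header.
pattern = re.compile(
    r'_at:(?P<num1>\d+):(?P<num2>\d+);.*?_Position=(?P<pos>\d+)',
    re.DOTALL)

m = pattern.search(s)
fields = {}
if m is not None:          # search() returns None when nothing matches
    fields = m.groupdict() # maps each group name to the text it matched
print(fields)
```

Checking for None first matters, because calling .group() on a failed search
raises an AttributeError.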

A simpler-looking regular expression would be:

>>> regEx = r'_at:(\d+):(\d+);.*?_Position=(\d+)'

The parentheses still create groups, but now you have to access them using their
index (from left to right, counting from 1), e.g. m.group(1).  But I think named
groups are nicer in terms of self-documenting code :-)
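To get from there to your desired list, something along these lines might do.
This is only a sketch: build_desired and HEADER_RE are names I made up, and I am
assuming that any item the pattern does not match (such as the sequence itself)
should be copied across unchanged.

```python
import re

# Numbered-group version of the pattern; DOTALL in case the header
# contains a newline.
HEADER_RE = re.compile(r'_at:(\d+):(\d+);.*?_Position=(\d+)', re.DOTALL)

def build_desired(seq):
    """Return a new list with header lines rewritten as '>num1:num2|pos'."""
    desired = []
    for item in seq:
        m = HEADER_RE.search(item)
        if m is None:
            # Not a header (e.g. the sequence itself): keep it as-is.
            desired.append(item)
        else:
            num1, num2, pos = m.groups()  # all three groups, in order
            desired.append('>%s:%s|%s' % (num1, num2, pos))
    return desired

seq = ['>probe:HG-U133B:200000_s_at:164:623; '
       'Interrogation_Position=6649 ; Antisense;',
       'TCATGGCTGACAACCCATCTTGGGA']
desired = build_desired(seq)
print(desired)
```

m.groups() returns all the groups as a tuple, which saves calling m.group(1),
m.group(2) and m.group(3) separately.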

-- 
John.

