[Tutor] Regular expression re.search() object . Please help
jfouhy at paradise.net.nz
Fri Jan 14 00:17:46 CET 2005
Quoting kumar s <ps_python at yahoo.com>:
> For example:
>
> I have a simple list like the following:
>
> >>> seq
> ['>probe:HG-U133B:200000_s_at:164:623;
> Interrogation_Position=6649 ; Antisense;',
> 'TCATGGCTGACAACCCATCTTGGGA']
>
>
> Now I intend to extract particular pattern and write
> to another list say: desired[]
>
> What I want to extract:
> I want to extract 164:623:
> Which always comes after _at: and ends with ;
> 2. The second pattern/number I want to extract is
> 6649:
> This always comes after position=.
>
> How I want to put to desired[]:
>
> >>> desired
> ['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']
You need to look into groups, or (even better) named groups. Look at the re syntax.
Example:
>>> import re
>>> s = 'probe:HG-U133B:200000_s_at:164:623;\nInterrogation_Position=6649 ; Antisense;'
>>> regEx = r'_at:(?P<num1>\d+):(?P<num2>\d+);.*?_Position=(?P<pos>\d+)'
>>> m = re.search(regEx, s, re.DOTALL)
>>> m.group('num1'), m.group('num2'), m.group('pos')
('164', '623', '6649')
The (?P<foo>bar) syntax creates a group which will match the regular expression
'bar', and then give it the name 'foo'.
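To get from the match to the desired[] list in the question, something like the following might work. It is only a sketch: it assumes the header is the first item of seq and is a single string, and the helper name reformat_header is made up for illustration. A header that does not match is passed through unchanged.

```python
import re

# Input as in the original question; the header is assumed to be one string.
seq = ['>probe:HG-U133B:200000_s_at:164:623; Interrogation_Position=6649 ; Antisense;',
       'TCATGGCTGACAACCCATCTTGGGA']

# Same pattern as above, compiled once so it can be reused for many headers.
pattern = re.compile(
    r'_at:(?P<num1>\d+):(?P<num2>\d+);.*?_Position=(?P<pos>\d+)',
    re.DOTALL)

def reformat_header(header):
    """Return '>num1:num2|pos' for a matching header (hypothetical helper)."""
    m = pattern.search(header)
    if m is None:
        # re.search() returns None when nothing matches; keep the header as-is.
        return header
    return '>%s:%s|%s' % (m.group('num1'), m.group('num2'), m.group('pos'))

desired = [reformat_header(seq[0]), seq[1]]
print(desired)
```

Checking for None before calling m.group() matters, because a non-matching header would otherwise raise an AttributeError.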
A simpler-looking regular expression would be:
>>> regEx = r'_at:(\d+):(\d+);.*?_Position=(\d+)'
The parentheses still create groups, but now you have to access them using their
index (from left to right, counting from 1). But I think named groups are nicer
in terms of self-documenting code :-)
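For comparison, here is a short sketch of index-based access with the unnamed version of the pattern. The variable names first and all_groups are just for illustration.

```python
import re

s = 'probe:HG-U133B:200000_s_at:164:623;\nInterrogation_Position=6649 ; Antisense;'
m = re.search(r'_at:(\d+):(\d+);.*?_Position=(\d+)', s, re.DOTALL)

# Groups are numbered from 1, left to right; group(0) is the whole match.
first = m.group(1)

# groups() returns all captured groups at once as a tuple.
all_groups = m.groups()
print(first, all_groups)
```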
--
John.