[Tutor] How to substitute an element of a list as a pattern for re.compile()
Kent Johnson
kent37 at tds.net
Thu Dec 30 19:28:16 CET 2004
kumar s wrote:
> My situation:
>
> I have a list of numbers that I have to match in
> another list and write them to a new file:
>
> List 1: range_cors
>
> >>> range_cors[1:5]
>
> ['161:378', '334:3', '334:4', '65:436']
>
> List 2: seq
>
> >>> seq[0:2]
>
> ['>probe:HG-U133A_2:1007_s_at:416:177;
> Interrogation_Position=3330; Antisense;',
> 'CACCCAGCTGGTCCTGTGGATGGGA']
>
>
> A slow method:
>
> >>> sequences = []
> >>> for elem1 in range_cors:
> ...     for index, elem2 in enumerate(seq):
> ...         if elem1 in elem2:
> ...             sequences.append(elem2)
> ...             sequences.append(seq[index+1])
>
> This process is very slow and it is taking a lot of
> time. I am not happy.
It looks like you really only want to search every other element of seq. You could speed your loop
up by using an explicit iterator:
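sequences = []
for elem1 in range_cors:
    i = iter(seq)
    try:
        while True:
            # Take two items at a time: a tag line and its data line
            tag, data = next(i), next(i)
            if elem1 in tag:
                sequences.append(tag)
                sequences.append(data)
    except StopIteration:
        # Raised by next() when seq is exhausted; ends the inner loop
        pass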
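
For comparison, here is a minimal sketch of the same tag/data pairing written with zip() over a single iterator instead of try/except. It assumes seq strictly alternates tag line, data line. The function name match_pairs and the second tag/data pair in the sample are made up for illustration. Note that it checks all coordinates per tag, so results come out in seq order and a tag is appended at most once, which differs slightly from the nested loop above.

```python
def match_pairs(range_cors, seq):
    """Return tag/data lines whose tag contains any of the coordinates."""
    sequences = []
    it = iter(seq)
    # zip(it, it) pulls two items per step, so each pair is (tag, data).
    for tag, data in zip(it, it):
        if any(cor in tag for cor in range_cors):
            sequences.append(tag)
            sequences.append(data)
    return sequences


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    '>probe:HG-U133A_2:hypothetical_at:161:378; Antisense;',  # made-up tag line
    'ACGTACGTACGTACGTACGTACGTA',                               # made-up data line
]
result = match_pairs(['161:378', '334:3'], seq)
```

Only the second pair should end up in result, since the first tag contains neither coordinate.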
You don't say how long the sequences are. If range_cors is short enough you can use a single regex
to do the search. (I don't actually know how short range_cors has to be or how this will break down
if it is too long; this will probably work with 100 items in range_cors; it may only be limited by
available memory; it may become slow to compile the regex when range_cors gets too big...) This will
eliminate your outer loop entirely and I expect a substantial speedup. The code would look like this:
 >>> import re
 >>> range_cors = ['161:378', '334:3', '334:4', '65:436']
Make a pattern by escaping the special characters in each search string and joining the results with '|':
>>> pat = '|'.join(map(re.escape, range_cors))
>>> pat
'161\\:378|334\\:3|334\\:4|65\\:436'
>>> pat = re.compile(pat)
Now you can use pat.search() to find matches:
>>> pat.search('123:456')
>>> pat.search('aaa161:378')
<_sre.SRE_Match object at 0x008DC8E0>
The complete search loop would look like this:
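sequences = []
i = iter(seq)
try:
    while True:
        # Take two items at a time: a tag line and its data line
        tag, data = next(i), next(i)
        if pat.search(tag):
            sequences.append(tag)
            sequences.append(data)
except StopIteration:
    # Raised by next() when seq is exhausted; ends the loop
    pass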
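
Putting the pieces together, a self-contained version might look like the sketch below. The function name find_probes and the second tag/data pair in the sample are made up for illustration, and the code assumes seq strictly alternates tag line, data line. The guard for an empty range_cors is there because joining an empty list gives the pattern '', which matches every tag.

```python
import re


def find_probes(range_cors, seq):
    """Collect tag/data pairs whose tag matches any coordinate string."""
    if not range_cors:
        # An empty alternation would compile to '' and match every tag.
        return []
    pat = re.compile('|'.join(map(re.escape, range_cors)))
    sequences = []
    pairs = iter(seq)
    for tag, data in zip(pairs, pairs):  # (tag, data), two items per step
        if pat.search(tag):
            sequences.extend([tag, data])
    return sequences


range_cors = ['161:378', '334:3', '334:4', '65:436']
seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    '>probe:HG-U133A_2:example_at:65:436; Antisense;',  # made-up tag line
    'ACGTACGTACGTACGTACGTACGTA',                        # made-up data line
]
matches = find_probes(range_cors, seq)
```

Compiling the pattern once outside the loop is the point of the exercise: each tag is scanned a single time no matter how many coordinates are in range_cors.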
Kent
>
>
>
> A faster method (probably):
>
>
> >>> for i in range(len(range_cors)):
> ...     for index, m in enumerate(seq):
> ...         pat = re.compile(i)
> ...         if re.search(pat, seq[m]):
> ...             p.append(seq[m])
> ...             p.append(seq[index+1])
>
>
> I am getting errors, because I am trying to create an
> element as a pattern in re.compile().
>
>
> Questions:
>
> 1. Is it possible to do this. If so, how can I do
> this.
>
> Can any one help correcting my piece of code and
> suggesting where I went wrong.
>
> Thank you in advance.
>
>
> -K
>
>