[Tutor] How to substitute an element of a list as a pattern for re.compile()

Kent Johnson kent37 at tds.net
Thu Dec 30 19:28:16 CET 2004


kumar s wrote:
> My situation:
> 
> I have a list of numbers that I have to match in
> another list and write them to a new file:
> 
> List 1: range_cors 
> 
>>>>range_cors[1:5]
> 
> ['161:378', '334:3', '334:4', '65:436']
> 
> List 2: seq
> 
>>>>seq[0:2]
> 
> ['>probe:HG-U133A_2:1007_s_at:416:177;
> Interrogation_Position=3330; Antisense;',
> 'CACCCAGCTGGTCCTGTGGATGGGA']
> 
> 
> A slow method:
> 
>>>>sequences = []
>>>>for elem1 in range_cors:
> 
> 	for index,elem2 in enumerate(seq):
> 		if elem1 in elem2:
> 			sequences.append(elem2)
> 			sequences.append(seq[index+1])
> 
> This process is very slow and it is taking a lot of
> time. I am not happy.

It looks like you really only want to search every other element of seq. You could speed your loop 
up by using an explicit iterator:
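sequences = []
for elem1 in range_cors:
   i = iter(seq)
   try:
     # Take the elements two at a time (tag line, then data line)
     # until the iterator is exhausted
     while True:
       tag, data = next(i), next(i)
       if elem1 in tag:
         sequences.append(tag)
         sequences.append(data)
   except StopIteration:
     pass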
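
As a self-contained sketch of the same pairing idea: the helper name find_pairs and the second
tag/data pair in the sample data are made up for illustration, and the code assumes seq strictly
alternates tag line, data line. zip(i, i) draws two consecutive items from one iterator per step:

```python
def find_pairs(range_cors, seq):
    """Collect (tag, data) lines whose tag contains one of range_cors.

    Hypothetical helper; assumes seq alternates tag line, data line.
    """
    sequences = []
    for elem1 in range_cors:
        i = iter(seq)
        # Both arguments are the same iterator, so each step
        # yields the next two elements as (tag, data).
        for tag, data in zip(i, i):
            if elem1 in tag:
                sequences.append(tag)
                sequences.append(data)
    return sequences


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    '>probe:EXAMPLE:0000_at:161:378; Antisense;',  # made-up tag line
    'ACGTACGTACGTACGTACGTACGTA',                   # made-up data line
]
result = find_pairs(['161:378', '65:436'], seq)
```

Only the tag lines are searched, so this does half the substring tests of the original loop.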

You don't say how long the lists are. If range_cors is short enough you can use a single regex 
to do the search. (I don't actually know how short range_cors has to be or how this breaks down 
when it gets too long. It will probably work with 100 items in range_cors; it may be limited only by 
available memory; the regex may become slow to compile when range_cors gets too big.) This will 
eliminate your outer loop entirely, and I expect a substantial speedup. The code would look like this:

  >>> import re
  >>> range_cors = ['161:378', '334:3', '334:4', '65:436']

Make a pattern by escaping special characters in each search string and joining the results with '|':
  >>> pat = '|'.join(map(re.escape, range_cors))
  >>> pat
'161\\:378|334\\:3|334\\:4|65\\:436'
  >>> pat = re.compile(pat)

Now you can use pat.search() to find matches:
  >>> pat.search('123:456')
  >>> pat.search('aaa161:378')
<_sre.SRE_Match object at 0x008DC8E0>
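
The same steps as a small script, offered as a sketch rather than a definitive recipe. The test
strings are made up. Note that the combined pattern is still a substring search, just like
"elem1 in elem2", so '334:3' will also match inside '334:36':

```python
import re

range_cors = ['161:378', '334:3', '334:4', '65:436']

# Escape each search string, then join them as alternatives.
pat = re.compile('|'.join(map(re.escape, range_cors)))

no_match = pat.search('123:456')      # no alternative occurs: None
match = pat.search('aaa161:378')      # a match object
found = match.group(0)                # the alternative that matched

# Substring semantics: '334:3' is found inside '334:36'.
partial = pat.search('334:36').group(0)
```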

The complete search loop would look like this:

   sequences = []
   i = iter(seq)
   try:
     # Walk seq two elements at a time: tag line, then data line
     while True:
       tag, data = next(i), next(i)
       if pat.search(tag):
         sequences.append(tag)
         sequences.append(data)
   except StopIteration:
     pass
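
Putting the pieces together, here is a minimal sketch of the single-pass version. The function
name find_pairs_regex and the second tag/data pair in the sample data are hypothetical, and seq is
assumed to alternate tag line, data line:

```python
import re


def find_pairs_regex(range_cors, seq):
    """Single pass over seq using one combined regex (hypothetical helper)."""
    if not range_cors:
        # An empty pattern would match every tag, so bail out early.
        return []
    pat = re.compile('|'.join(map(re.escape, range_cors)))
    sequences = []
    i = iter(seq)
    # zip(i, i) yields consecutive (tag, data) pairs from one iterator.
    for tag, data in zip(i, i):
        if pat.search(tag):
            sequences.append(tag)
            sequences.append(data)
    return sequences


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    '>probe:EXAMPLE:0000_at:161:378; Antisense;',  # made-up tag line
    'ACGTACGTACGTACGTACGTACGTA',                   # made-up data line
]
result = find_pairs_regex(['161:378', '334:3', '334:4', '65:436'], seq)
```

One difference from the nested-loop version: a tag that contains several of the search strings is
appended once here, not once per matching string, and the results come out in seq order rather
than range_cors order.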

Kent

> 
> 
> 
> A faster method (probably):
> 
> 
>>>>for i in range(len(range_cors)):
> 
> 	for index,m in enumerate(seq):
> 		pat = re.compile(i)
> 		if re.search(pat,seq[m]):
> 			p.append(seq[m])
> 			p.append(seq[index+1])
> 
> 
> I am getting errors, because I am trying to create an
> element as a pattern in re.compile(). 
> 
> 
> Questions:
> 
> 1. Is it possible to do this. If so, how can I do
> this. 
> 
> Can any one help correcting my piece of code and
> suggesting where I went wrong. 
> 
> Thank you in advance. 
> 
> 
> -K