[Tutor] How to substitute an element of a list as a pattern for re.compile()
Rich Krauter
rmkrauter at yahoo.com
Thu Dec 30 07:06:10 CET 2004
kumar s wrote:
> I have Question:
> How can I substitute an object as a pattern in making
> a pattern.
>
>>>> x = 30
>>>> pattern = re.compile(x)
>
Kumar,
You can use string interpolation (or simply str(x)) to turn x into a
string, which can then be compiled into a pattern:
import re
x = 30
pat = re.compile('%s' % x)
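The underlying problem is that re.compile() wants a string (or an already-compiled pattern), so an int such as 30 raises a TypeError. Here is a minimal sketch of turning any list element into a literal pattern. compile_element is a hypothetical helper name, and the re.escape() call is an extra precaution I'm assuming you'd want, in case an element ever contains regex metacharacters such as '.' or '+':

```python
import re

def compile_element(x):
    """Hypothetical helper: build a pattern that matches x literally."""
    # str() converts ints (or any other object) to text, and
    # re.escape() makes characters like '.', '+' or '(' match themselves.
    return re.compile(re.escape(str(x)))

pat = compile_element(30)
print(pat.search('position 30 here') is not None)  # True

try:
    re.compile(30)  # passing the int directly
except TypeError:
    print('re.compile(30) raises TypeError')
```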
I really doubt regular expressions will speed up your current searching
algorithm. You probably need to reconsider the data structures you are
using to represent your data.
> I have a list of numbers that I have to match in
> another list and write them to a new file:
>
> List 1: range_cors
>>>> range_cors[1:5]
> ['161:378', '334:3', '334:4', '65:436']
>
> List 2: seq
>>>> seq[0:2]
> ['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
> 'CACCCAGCTGGTCCTGTGGATGGGA']
>
>
Can you re-process your second list? One option might be to store that
list instead as a dict, where the keys are what you want to search by
(maybe a string like '12:34' or a tuple like (12,34)).
Maybe something like the following:
>>> range_cors = ['12:34', '34:56']
>>> seq = {'12:34': ['some 12:34 data'],
...        '34:56': ['some 34:56 data', 'more 34:56 data']}
>>> for item in range_cors:
...     print(seq[item])
...
['some 12:34 data']
['some 34:56 data', 'more 34:56 data']
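To get from the flat seq list to a dict like the one above, something along these lines might work. It's a sketch under two assumptions you should check against your real data: that seq alternates header line / sequence line, and that the lookup key is the last two colon-separated fields before the first ';' in the header (e.g. '416:177' in your sample). build_index is a hypothetical name:

```python
def build_index(seq):
    """Hypothetical helper: map 'x:y' coordinates to (header, sequence) pairs.

    Assumes seq alternates header, sequence, header, sequence, ...
    and that headers look like
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=...'.
    """
    index = {}
    # Walk the list two items at a time as (header, sequence) pairs.
    for header, sequence in zip(seq[0::2], seq[1::2]):
        probe_id = header.split(';')[0]           # '>probe:...:416:177'
        key = ':'.join(probe_id.split(':')[-2:])  # '416:177'
        # A coordinate might occur more than once, so keep a list.
        index.setdefault(key, []).append((header, sequence))
    return index

seq = ['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
       'CACCCAGCTGGTCCTGTGGATGGGA']
index = build_index(seq)
print(index['416:177'][0][1])  # CACCCAGCTGGTCCTGTGGATGGGA
```

Once the index exists, each entry in range_cors is a single index.get(cor, []) away.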
Why is this better?
If you have m lines of data and n patterns to search for, then with
either of your methods you perform n searches per line, totalling
approx. m*n operations, whether you use the string-searching version
or the re-searching version.
If you pre-process the data so that it can be stored in and retrieved
from a dict, pre-processing to get your data into that dict costs you
roughly m operations, but since each dict lookup takes roughly constant
time on average, your n pattern lookups into that dict cost you only
about n operations, so you only have to complete approx. m+n operations.
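One rough way to see the m*n versus m+n difference is to count the work each approach does. In this sketch both functions are hypothetical, they use exact key matching rather than substring matching, and the data is made up purely for illustration (1000 lines, 3 patterns):

```python
def nested_search(lines, patterns):
    """Check every pattern against every line: about m*n comparisons."""
    found, ops = [], 0
    for pat in patterns:            # n patterns
        for key, record in lines:   # m lines
            ops += 1
            if key == pat:
                found.append(record)
    return found, ops

def dict_search(lines, patterns):
    """Index once (about m steps), then look up each pattern (n steps)."""
    index, ops = {}, 0
    for key, record in lines:
        ops += 1
        index.setdefault(key, []).append(record)
    found = []
    for pat in patterns:
        ops += 1
        found.extend(index.get(pat, []))
    return found, ops

lines = [('%d:%d' % (i, i + 1), 'record %d' % i) for i in range(1000)]
patterns = ['5:6', '10:11', '999:1000']
nested_found, nested_ops = nested_search(lines, patterns)
fast_found, fast_ops = dict_search(lines, patterns)
print(nested_found == fast_found)  # True
print(nested_ops)                  # 3000
print(fast_ops)                    # 1003
```

Both return the same records; the difference only grows as m and n get larger.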
> A slow method:
>>>> sequences = []
>>>> for elem1 in range_cors:
> ...     for index,elem2 in enumerate(seq):
> ...         if elem1 in elem2:
> ...             sequences.append(elem2)
> ...             sequences.append(seq[index+1])
>
> A faster method (probably):
>
>>>> for i in range(len(range_cors)):
> ...     for index,m in enumerate(seq):
> ...         pat = re.compile(i)
> ...         if re.search(pat,seq[m]):
> ...             p.append(seq[m])
> ...             p.append(seq[index+1])
>
> I am getting errors, because I am trying to create an
> element as a pattern in re.compile().
>
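pat = re.compile('%s'%i) would probably get rid of the error message,
but that's probably still not what you want: i is an index into
range_cors (0, 1, 2, ...), not one of its elements, so you'd be
searching for '0', '1', ... rather than '161:378'. Also, m in your
inner loop is a string from seq, not an index, so seq[m] will raise
a TypeError.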
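For completeness, here is roughly what the regex loop seems to be aiming for, with the index/element mix-up fixed: compile each element of range_cors rather than its index, and step through seq in header/sequence pairs instead of indexing it with a string. It's a sketch assuming the same header layout as your sample (coordinate right before the first ';'); regex_match is a hypothetical name and the second record below is made up. Note that it still does about m*n searches, so I wouldn't expect it to be faster:

```python
import re

def regex_match(range_cors, seq):
    """Return header/sequence lines whose header contains a coordinate."""
    p = []
    for cor in range_cors:  # use the element itself, not its index
        # ':' before and ';' after keep '334:3' from matching '334:36'
        # (assumes the coordinate sits right before the first ';').
        pat = re.compile(':' + re.escape(cor) + ';')
        # zip() pairs each header with the sequence line after it.
        for header, sequence in zip(seq[0::2], seq[1::2]):
            if pat.search(header):
                p.append(header)
                p.append(sequence)
    return p

seq = ['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
       'CACCCAGCTGGTCCTGTGGATGGGA',
       '>probe:HG-U133A_2:1007_s_at:334:36; Interrogation_Position=3331; Antisense;',
       'GGGGGGGGGGGGGGGGGGGGGGGGG']
print(regex_match(['416:177', '334:3'], seq))
```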
>
> Questions:
>
> 1. Is it possible to do this. If so, how can I do this.
You can try, but I doubt regular expressions will help; that approach
will probably be even slower.
> Can any one help correcting my piece of code and
> suggesting where I went wrong.
I would scrap what you have and try using a better data structure. I
don't know enough about your data to make more specific processing
recommendations; but you can probably avoid those nested loops with some
careful data pre-processing.
You'll likely get better suggestions if you post a more representative
sample of your data, and explain exactly what you want as output.
Good luck.
Rich