Need a specific sort of string modification. Can someone help?
Nick Mellor
thebalancepro at gmail.com
Mon Jan 7 00:28:45 EST 2013
Note that the multi-digit version above tolerates missing digits: if the number is missing after the '+/-' it doesn't skip any letters.
Brief explanation of the multi-digit version:
+/- are converted to spaces and used to split the string into sections. The split process effectively swallows the +/- characters.
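As a rough illustration of that split step, here is a minimal sketch. It assumes Python 3, where str.maketrans takes the place of the Python 2 string.maketrans used in the quoted code; the name split_sections is made up for this example.

```python
# Python 3 sketch: str.maketrans replaces the Python 2 string.maketrans.
# Both '+' and '-' are mapped to a space, so the table needs two spaces.
TRANTAB = str.maketrans('+-', '  ')

def split_sections(s):
    """Split s at every '+' or '-'; the signs themselves disappear."""
    return s.translate(TRANTAB).split()

print(split_sections(".+3ACG.+5CAACG.+3ACG.+3ACG"))
# ['.', '3ACG.', '5CAACG.', '3ACG.', '3ACG']
```

Because .split() with no argument splits on any whitespace, this approach also assumes the input contains no spaces of its own.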
The complication of multi-digits is that you need to skip the (possibly multiple) digits, which adds another stage to the calculation. In:
+3ACG. -> .
you skip 1 + 3 characters, 1 for the digit, 3 for the following letters as specified by the digit 3. In:
-11ACGACGACGACG. -> G.
you skip 2 + 11 characters, 2 digits in "11" and 11 letters following. And incidentally in:
+ACG. -> ACG.
there's no digit, so you skip 0 digits + 0 letters.
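The three cases above can be checked with a small sketch of the skip arithmetic. The helper name skip_length is hypothetical, and the plain loop is used here only to make the counting explicit; it is not the method used in the posted code.

```python
def skip_length(section):
    """Return how many leading characters to drop from one section.

    A section is the text that followed a '+' or '-' sign.
    """
    n_digits = 0
    # Count the leading digits (possibly none).
    while n_digits < len(section) and section[n_digits].isdigit():
        n_digits += 1
    # The digits spell out how many letters to skip; no digits means 0.
    count = int(section[:n_digits]) if n_digits else 0
    return n_digits + count

for section in ("3ACG.", "11ACGACGACGACG.", "ACG."):
    skip = skip_length(section)
    print(section, "-> skip", skip, "->", section[skip:])
# 3ACG. -> skip 4 -> .
# 11ACGACGACGACG. -> skip 13 -> G.
# ACG. -> skip 0 -> ACG.
```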
Having split on +/- using .translate() and .split(), I use takewhile to separate the zero or more leading digits from the following letters. If takewhile doesn't find any digits at the start of a section, list(takewhile(...)) gives the empty list []. The 'if r else 0' test in 'numbers' maps an empty list to a count of 0 (int('') on its own would raise ValueError), so takewhile and that test cover the no-digit case between them. If a lack of digits is a data error then it would be easy to test for: just look for an empty list in 'digits' (ignoring the first section when the string doesn't start with a sign).
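A short sketch of how takewhile behaves on a section with and without leading digits. The helper names leading_digits and count_after_sign are assumptions made for this example, and str.isdigit is passed directly in place of the is_digit wrapper.

```python
from itertools import takewhile

def leading_digits(section):
    # takewhile returns a lazy iterator, so wrap it in list()
    # to get the digit characters as a list.
    return list(takewhile(str.isdigit, section))

def count_after_sign(section):
    digits = leading_digits(section)
    # ''.join([]) is '', and int('') raises ValueError,
    # so the empty case needs an explicit fallback to 0.
    return int(''.join(digits)) if digits else 0

print(leading_digits("12ACG"))    # ['1', '2']
print(leading_digits("ACG"))      # []
print(count_after_sign("12ACG"))  # 12
print(count_after_sign("ACG"))    # 0
```

An empty result from leading_digits is also the signal to look for if a missing number should be treated as a data error.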
I was pleasantly surprised to find that using list comprehensions, zip, join (all highly optimised in Python) and several intermediate lists still works at a fairly decent speed, despite using more stages to handle multi-digits. But it is about 4x slower than the less flexible 1-digit version on my hardware (about 25,000 per second.)
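For anyone who wants to repeat the comparison, here is one possible timing harness. It is a sketch only: it assumes Python 3 (so str.maketrans instead of string.maketrans), uses plain functions rather than the redux class, and the function names and the choice of 20000 calls are arbitrary. The figures it prints depend on the machine and interpreter, so no output is shown.

```python
import timeit
from itertools import takewhile

TRANTAB = str.maketrans('+-', '  ')  # '+' and '-' each become a space

def reduce_single_digit(s):
    # Skip one digit plus the number of letters that digit names.
    return ''.join(r[int(r[0]) + 1:] if r[0].isdigit() else r
                   for r in s.translate(TRANTAB).split())

def reduce_multi_digit(s):
    sections = s.translate(TRANTAB).split()
    digits = [list(takewhile(str.isdigit, r)) for r in sections]
    numbers = [int(''.join(d)) if d else 0 for d in digits]
    # Skip the digits themselves plus the letters they count.
    skips = [len(d) + n for d, n in zip(digits, numbers)]
    return ''.join(r[k:] for k, r in zip(skips, sections))

SAMPLE = ".+3ACG.+5CAACG.+3ACG.+3ACG"
CALLS = 20000

for func in (reduce_single_digit, reduce_multi_digit):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=CALLS)
    print(f"{func.__name__}: {CALLS / seconds:,.0f} calls per second")
```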
Nick
On Monday, 7 January 2013 14:40:02 UTC+11, Nick Mellor wrote:
> Hi Sia,
>
> Find a multi-digit method in this version:
>
> # Python 2 (string.maketrans and the print statement)
> from string import maketrans
> from itertools import takewhile
>
> def is_digit(s): return s.isdigit()
>
> class redux:
>
>     def __init__(self):
>         intab = '+-'
>         outtab = '  '  # two spaces: maketrans needs equal-length strings
>         self.trantab = maketrans(intab, outtab)
>
>     def reduce_plusminus(self, s):
>         # single-digit version: skip the digit plus that many letters
>         list_form = [r[int(r[0]) + 1:] if r[0].isdigit() else r
>                      for r
>                      in s.translate(self.trantab).split()]
>         return ''.join(list_form)
>
>     def reduce_plusminus_multi_digit(self, s):
>         spl = s.translate(self.trantab).split()
>         # leading digit characters of each section (may be empty)
>         digits = [list(takewhile(is_digit, r))
>                   for r
>                   in spl]
>         numbers = [int(''.join(r)) if r else 0
>                    for r
>                    in digits]
>         # skip the digits themselves plus the letters they count
>         skips = [len(dig) + num for dig, num in zip(digits, numbers)]
>         return ''.join([sec[skip:] for skip, sec in zip(skips, spl)])
>
> if __name__ == "__main__":
>     p = redux()
>     print p.reduce_plusminus(".+3ACG.+5CAACG.+3ACG.+3ACG")
>     print p.reduce_plusminus("tA.-2AG.-2AG,-2ag")
>     print 'multi-digit...'
>     print p.reduce_plusminus_multi_digit(".+3ACG.+5CAACG.+3ACG.+3ACG")
>     print p.reduce_plusminus_multi_digit(".+12ACGACGACGACG.+5CAACG.+3ACG.+3ACG")
>
> HTH,
>
> Nick
>
> On Saturday, 5 January 2013 19:35:26 UTC+11, Sia wrote:
> > I have strings such as:
> >
> > tA.-2AG.-2AG,-2ag
> > or
> > .+3ACG.+5CAACG.+3ACG.+3ACG
> >
> > The plus and minus signs are always followed by a number (say, i). I want python to find each single plus or minus, remove the sign, the number after it and remove i characters after that. So the two strings above become:
> >
> > tA..,
> > and
> > ...
> >
> > How can I do that?
> >
> > Thanks.