[Tutor] Finding all locations of a sequence

Lauren laurenb01 at gmail.com
Wed Jun 27 23:31:57 CEST 2007


Firstly, I'd like to thank everyone for their help. I ended up throwing
something together using dictionaries (because I understood those best out
of what I had). It was a lot faster than my initial attempt, but I have run
into a different problem that I was hoping for help with. What I have
is all the subsequences that I was looking for as separate keys in a
dictionary, with the locations where each one is found as the value. If a
subsequence binds to more than one other item, I want to have the locations
of those items all together in one list.
The closest I've been able to get to what I want is this:

dict_of_bond_location = {}
dict1 = {'AAA':['UUU'], 'AAU':['UUG', 'UUA'], 'AAC':['UUG'], 'AAG':['UUC',
'UUU'], 'CCC':['GGG']}
dict2 = {'AAA':[1], 'AAU':[2], 'AAC':[3], 'AAG':[0, 4], 'GGG':[10]}
dict3 = {'UUU':[3, 5], 'UUG':[0], 'UUA':[1], 'UUC':[2], 'GGG':[14]}


for key in dict2:
    if key in dict1:
        matching_subseq = dict1.get(key)
        for item in matching_subseq:
            if item in dict3:
                location = dict3.get(item)
                dict_of_bond_location.setdefault(key, []).append(location)
print dict_of_bond_location

which gives this:
{'AAU': [[0], [1]], 'AAG': [[2], [3, 5]], 'AAA': [[3, 5]], 'AAC': [[0]]}

but what I want is
'AAU':[0, 1], 'AAG':[2, 3, 5], 'AAA':[3, 5], 'AAC':[0]

The setdefault(key, []).append(location) call sort of does what I want, but
I don't want the result to be a list of lists, just one big list. Producing
a new dictionary is not necessary, but it made sense to me a few hours ago.
Anyway, is there a quick and dirty way to add lists together if the lists
are not named? (I think that's essentially what I want.)
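
For reference, a minimal sketch of one way to get flat lists from the sample
data above: it swaps append() for list.extend(), which adds the elements of a
list one at a time instead of adding the list itself as a single item. This is
only an illustration on the example dictionaries, not a tested solution for
the full data set, and it uses print() with parentheses so that it should run
on either Python 2 or Python 3.

```python
# Sample data copied from the message above.
dict1 = {'AAA': ['UUU'], 'AAU': ['UUG', 'UUA'], 'AAC': ['UUG'],
         'AAG': ['UUC', 'UUU'], 'CCC': ['GGG']}
dict2 = {'AAA': [1], 'AAU': [2], 'AAC': [3], 'AAG': [0, 4], 'GGG': [10]}
dict3 = {'UUU': [3, 5], 'UUG': [0], 'UUA': [1], 'UUC': [2], 'GGG': [14]}

dict_of_bond_location = {}
for key in dict2:
    # dict1.get(key, []) yields an empty list for keys missing from dict1,
    # so the inner loop is simply skipped for them.
    for item in dict1.get(key, []):
        if item in dict3:
            # extend() merges the locations into the existing list;
            # append() would nest the whole list as one element.
            dict_of_bond_location.setdefault(key, []).extend(dict3[item])

print(dict_of_bond_location)
```

The same effect can be had with "+=" on the list returned by setdefault();
extend() just makes the intent a little more explicit.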

Thanks again,

Lauren



On 19/06/07, tutor-request at python.org <tutor-request at python.org> wrote:
>
> Send Tutor mailing list submissions to
>         tutor at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mail.python.org/mailman/listinfo/tutor
> or, via email, send a message with subject or body 'help' to
>         tutor-request at python.org
>
> You can reach the person managing the list at
>         tutor-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Tutor digest..."
>
>
> Today's Topics:
>
>    1. Re: iterating over a sequence question.. (Luke Paireepinart)
>    2. Re: Help converting base32 to base16 (Alan Gauld)
>    3. Re: Finding all locations of a sequence (fwd) (Danny Yoo)
>    4. Re: sockets (Linus Nordström)
>    5. Re: sockets (Alan Gauld)
>    6. Re: Finding all locations of a sequence (fwd) (Alan Gauld)
>    7. Re: cannot pickle instancemethod objects (hok kakada)
>    8. Re: Python and XSI (Vishal Jain)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 18 Jun 2007 13:37:21 -0500
> From: "Luke Paireepinart" <rabidpoobear at gmail.com>
> Subject: Re: [Tutor] iterating over a sequence question..
> To: "Simon Hooper" <simon at partex.co.uk>, tutor at python.org
> Message-ID:
>         <dfeb4470706181137x737f84b9l358a84c8c2043b16 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On 6/18/07, Simon Hooper <simon at partex.co.uk> wrote:
> >
> > Hi Luke,
> >
> > * On 17/06/07, Luke Paireepinart wrote:
> > > a more expanded version that accounts for either list being the longer
> > > one, or both being the same length, would be:
> > >
> > >  >>> if len(t) > len(l): x = len(t)
> > > else: x = len(l)
> > >  >>> print [(l[i%len(l)],t[i%len(t)]) for i in range(x)]
> > > [(1, 'r'), (2, 'g'), (3, 'b'), (4, 'r'), (5, 'g')]
> >
> > Being the duffer that I am, I'm very pleased with myself that I came up
> > with a similar solution (albeit as a function rather than a list
> > comprehension) :)
> >
> > You do not need the if statement either,
>
>
> Yeah, I never knew about the max() function!
> I noticed someone else used it in one of their solutions.
> I'm pretty sure I've seen it a lot before, just didn't remember it.
> -Luke
>
> ------------------------------
>
> Message: 2
> Date: Mon, 18 Jun 2007 21:12:02 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] Help converting base32 to base16
> To: tutor at python.org
> Message-ID: <f56oul$lah$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>         reply-type=original
>
>
> "Jason Massey" <jason.massey at gmail.com> wrote
>
> >  Nice entry at wikipedia:
> >
> > http://en.wikipedia.org/wiki/Base_32
> >
>
> Thanks for the link, I should have thought of looking there!
> I've heard of Base64 for encoding email but never come
> across Base32 - any of the versions!
>
> Alan G.
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 18 Jun 2007 16:54:53 -0400 (EDT)
> From: Danny Yoo <dyoo at cs.wpi.edu>
> Subject: Re: [Tutor] Finding all locations of a sequence (fwd)
> To: tutor at python.org
> Message-ID: <Pine.LNX.4.63.0706181653280.18230 at cs.wpi.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> Hi everyone,
>
> Can someone help Lauren?  I apologize for this, but I am very time
> constrained at this moment, and won't be able to reply back for at least a
> few hours.  His question is below.  Thanks!
>
>
> ---------- Forwarded message ----------
> Date: Mon, 18 Jun 2007 16:41:39 -0400
> From: Lauren <laurenb01 at gmail.com>
> To: Danny Yoo <dyoo at cs.wpi.edu>
> Subject: Re: [Tutor] Finding all locations of a sequence
>
> Ok, these may be useful. I do have a potentially embarrassing problem;
> however, I will preface this by saying I'm practically computer illiterate.
> Ok, I downloaded the one you wrote (which I think I understand better than
> the one listed previously...famous last words, I'm sure), but when I ask to
> import the ahocorasick module, it says it can't find it :( Is it possible to
> get step-by-step instructions on how to open the module? Do I need something
> other than the latest download for it?
> Again, I'm not good at this programming thing, so I'm sure I'm missing
> something obvious.
>
> Thank you for your help,
>
> Lauren
>
> On 18/06/07, Danny Yoo <dyoo at cs.wpi.edu> wrote:
> >
> >
> >
> >> Ok, what I have is an RNA sequence (about 5 million nucleotides
> >> [characters] long) and 4100 subsequences (from another sequence),
> >> each 6 characters long, that I want to find in it.
> >
> > Hi Lauren,
> >
> > I am positive that this problem has been tackled before.  Have you talked
> > to any of your fellow bioinformaticists?  It might not be effective to ask
> > on Python-tutor because, for typical problems, regexes are sufficient.
> > For your particular scenario, regexes may not be, so you may want to ask
> > specialists like those on the Biopython mailing lists:
> >
> >      http://biopython.org/wiki/Mailing_lists
> >
> > It may seem like a simple extension of regular expression search, but as
> > you may have noticed, feeding 4100 regex searches across that RNA sequence
> > is going to take some time.  And trying to feed all of those subsequences
> > as a single regular expression (using ORs) is probably not going to work
> > too well either: the engine has limitations on how large the pattern can
> > be before it starts behaving very badly.
> >
> >
> > To scale to this problem, we need to do something different.  What you're
> > probably looking for is more typically known as the keyword matching
> > problem, and there are algorithms specifically used for keyword matching.
> > For example, take a look at Nicolas Lehuen's Pytst package, which
> > implements ternary search trees:
> >
> >      http://nicolas.lehuen.com/index.php/category/Pytst
> >
> > I've also written a different package that uses the "Aho-Corasick"
> > automata matcher, but to tell the truth I've let the code go stale a bit,
> > and can't support it (too busy with other work).  But if you're interested
> > in tinkering with it, it's here:
> >
> >      http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
> >
> >
> > If you're going to do more string-matching stuff, I'd recommend a look
> > into "Algorithms on Strings, Trees, and Sequences".
> >
> >      http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521585198
> >
> > It's one of the better books on bioinformatics string algorithms, and it
> > specifically covers this class of sequence-searching problems.
> >
>
>
>
> --
> Lauren
>
> Laurenb01 at gmail.com
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 19 Jun 2007 01:19:45 +0200
> From: "Linus Nordström" <linusno at gmail.com>
> Subject: Re: [Tutor] sockets
> To: Tutor at python.org
> Message-ID:
>         <1eb3a0e10706181619r762bdacdqf64424148d0749dd at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I guess I'll use this thread as my own little help thread.. :)
>
> I'm having a problem with recv. It will not break the loop if there is
> nothing more to receive. It does recv all that it should, but then it goes
> another round in the loop and gets stuck on recv, as far as print
> debugging has shown. I don't really know what other information I could
> write down to make my problem any clearer, so ask if anything is
> unclear :)
>
> And on another note, is there a way to use the self. less? It might be
> just me, but it looks rather ugly :)
>
> def recive(self):
>          all_data = []
>          while 1:
>              self.buf = self.s.recv(4096)
>              if not self.buf: break
>              all_data.append(self.buf)
>          return all_data
>
>
> On 6/18/07, Linus Nordström <linus at linusnordstrom.com> wrote:
> > Hello
> > I'm trying to rewrite a chat program I did in school this spring in
> > Python; the school program was in Java. All this to learn Python.
> >
> > Anyway, I'm trying to send a message using UDP to a server; the message
> > contains 3 0 0 0, and it has to be in network byte order and
> > unsigned. I have tried to send it as a string "3000" but get an
> > error message from the server saying that it did receive 4 bytes, but
> > that it was an unknown message.
> >
> > So what I am wondering is if there are any special byte arrays or
> > something I should use, or if there is any other way to send
> > the message than as a string.
> >
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 19 Jun 2007 00:34:28 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] sockets
> To: tutor at python.org
> Message-ID: <f574q7$tlr$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>         reply-type=original
>
> "Linus Nordström" <linusno at gmail.com> wrote
>
> > I'm having a problem with recv. It will not break the loop if there is
> > nothing more to receive.
>
> Take a look at the client side of the address book
> example in my network programming topic. It shows
> an example of breaking out of a recv loop.
>
> Another option is to use the select module services.
>
> > And on another note, is there a way to use the self. less,
> > it might be just me but it looks rather ugly :)
>
> No, it's good practice: it distinguishes between class-level
> variables and local variables. The use of self/this etc. is usually
> required in industrial coding standards for C++ and Java
> (e.g. Dept of Defense/Government etc.) for the same reason,
> even though those languages don't require it. As in many
> things, Python is simply enforcing what is already considered
> good practice elsewhere.
>
> HTH,
>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 19 Jun 2007 00:37:46 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] Finding all locations of a sequence (fwd)
> To: tutor at python.org
> Message-ID: <f5750e$u4o$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>         reply-type=original
>
>
> > From: Lauren <laurenb01 at gmail.com>
> > To: Danny Yoo <dyoo at cs.wpi.edu>
> > Subject: Re: [Tutor] Finding all locations of a sequence
> >
> > after I download the one you wrote (which I think I understand better
> > than the one listed previous...famous last words, I'm sure), but when I
> > ask to import the ahocorasick module, it says it can't find it :(
>
> You will need to locate and download the module too.
> Either store it along with your program or in the site-packages
> folder in your Python installation.
>
> > get step by step instructions on how to open the module?
>
> You just import it.
> You can read more about that in my modules and functions topic
> if that helps.
>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 19 Jun 2007 12:19:31 +0700
> From: hok kakada <hokkakada at khmeros.info>
> Subject: Re: [Tutor] cannot pickle instancemethod objects
> To: tutor-python <tutor at python.org>
> Message-ID: <200706191219.32263.hokkakada at khmeros.info>
> Content-Type: text/plain;  charset="utf-8"
>
> ?????? ??? 13 ?????? 2007 19:09, Kent Johnson ??????????????
> > hok kakada wrote:
> > >> What kind of object is matcher? Does it have any attributes that are
> > >> functions? (Not methods you defined for the class, but functions or
> > >> methods that you assign to attributes of self.)
> > >
> > > Actually, I use the translate-toolkit from
> > > http://translate.sourceforge.net/snapshots/translate-toolkit-1.0.1rc1/
> > > in translate/search/match.py:
> > >         if comparer is None:
> > >             comparer = lshtein.LevenshteinComparer(max_length)
> > >
> > >         self.comparer = comparer
> > >
> > > I just found the problem: it is because of the LevenshteinComparer.
> > > Once I assign self.comparer = None, then I can dump the matcher
> > > successfully. However, I still don't understand what could be wrong
> > > with LevenshteinComparer.
> >
> > I think the problem is this code in LevenshteinComparer.__init__():
> >
> >          if Levenshtein:
> >              self.distance = self.native_distance
> >          else:
> >              self.distance = self.python_distance
> >
> > which assigns an instance method to an instance attribute; this is the
> > instancemethod that can't be pickled.
> Ok...but how can we deal with it?
>
> Kind Regards,
> da
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 19 Jun 2007 12:08:36 +0400
> From: "Vishal Jain" <vishal.thebigv at gmail.com>
> Subject: Re: [Tutor] Python and XSI
> To: tutor at python.org
> Message-ID:
>         <5f3b7ddd0706190108y7ebe22ax2e859fb3760d1ee2 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Yes, XSI is a 3D package from Avid, also known as Softimage|XSI.
>
> By "everything failed" I mean that XSI isn't registering Python, so I
> can't select Python as the scripting language inside XSI. After running the
> said script, the command prompt says "Python:Registered" with no error
> messages.
>
> Yes, XSI supports only ActiveX scripts.
>
> If everything goes well, I expect to be able to select Python as the
> scripting language inside XSI and start scripting. Apparently everything
> went well as per the manuals and docs, except for my luck :(
>
> I did search Google but will search more as per your guidelines. And I am a
> beginner in Python too. I never coded before.
>
> Thanks a lot for your suggestions and concern.
>
> By the way, are you the same Alan Gauld who posts in XSI Base and CG Talk?
>
> Vishal
>
> Message: 6
> Date: Mon, 18 Jun 2007 16:09:35 +0100
> From: "Alan Gauld" <alan.gauld at btinternet.com>
> Subject: Re: [Tutor] Python and XSI
> To: tutor at python.org
> Message-ID: <f5677j$e1f$1 at sea.gmane.org>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>        reply-type=original
>
> "Vishal Jain" <vishal.thebigv at gmail.com> wrote
>
> > I am trying to get Python registered with XSI but
> > everything described in docs failed.
>
> OK, I'm assuming that XSI is the 3D Graphics software from Avid?
>
> Can you tell us which docs and what exactly 'failed' means?
> What did you do and what actually happened?
> Any error messages?
>
> > C:\Python25\Lib\site-packages\win32comext\axscript\client\pyscript.py
>
> ISTR that as being the script for registering Python as an
> Active Scripting language for WSH and the IE ActiveX engine.
>
> I assume from that, that XSI uses active scripting?
>
> What did you expect to happen after running it?
> And what did happen?
>
> > still no luck. Is anybody else also facing the same situation? I am
> > not very sure how many people here use Python for XSI but still hoping
> > to get some
>
> I had to do a Google search...
> You might have more luck on the main comp.lang.python list or
> maybe even in the pyGame list. This group is for beginners to Python
> and XSI looks a tad advanced for most beginners, going by the couple
> of pages I browsed. OTOH there is usually somebody here with an
> interest in most things...
>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
> ------------------------------
>
> End of Tutor Digest, Vol 40, Issue 46
> *************************************
>



-- 
Lauren

Laurenb01 at gmail.com