Line segments, overlap, and bits
Istvan Albert
istvan.albert at gmail.com
Thu Mar 27 11:54:58 EDT 2008
On Mar 26, 5:28 pm, Sean Davis <seand... at gmail.com> wrote:
> I am working with genomic data. Basically, it consists of many tuples
> of (start,end) on a line. I would like to convert these tuples of
> (start,end) to a string of bits where a bit is 1 if it is covered by
> any of the regions described by the (start,end) tuples and 0 if it is
> not. I then want to do set operations on multiple bit strings (AND,
> OR, NOT, etc.). Any suggestions on how to (1) set up the bit string
> and (2) operate on 1 or more of them? Java has a BitSet class that
> keeps this kind of thing pretty clean and high-level, but I haven't
> seen anything like it for python.
The solution depends on the size of the genomes you want to work with.
There is a BitVector class that could probably do what you want, but it
may not scale well to large inputs because it is pure Python.
http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html
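For modest coordinate ranges, a plain Python integer can also act as a
bit set with no extra package. This is only a minimal sketch and not the
BitVector API: the helper names (regions_to_bits, bits_to_string, invert)
are made up for illustration, and it assumes 0-based, half-open
(start, end) intervals, which the original question does not specify.

```python
def regions_to_bits(regions):
    """Return an int whose bit i is 1 if position i is covered by any region.

    Assumes 0-based, half-open (start, end) intervals (an assumption,
    not something stated in the original question).
    """
    bits = 0
    for start, end in regions:
        if end > start:
            # (1 << n) - 1 is a run of n ones; shift it up to `start`.
            bits |= ((1 << (end - start)) - 1) << start
    return bits


def bits_to_string(bits, length):
    """Render the low `length` bits as text, position 0 first."""
    return "".join("1" if (bits >> i) & 1 else "0" for i in range(length))


def invert(bits, length):
    """NOT limited to `length` positions, since Python ints are unbounded."""
    return ~bits & ((1 << length) - 1)


length = 12
a = regions_to_bits([(1, 4), (6, 9)])
b = regions_to_bits([(3, 7)])

print(bits_to_string(a, length))                   # 011100111000
print(bits_to_string(a & b, length))               # 000100100000
print(bits_to_string(a | b, length))               # 011111111000
print(bits_to_string(invert(a, length), length))   # 100011000111
```

AND, OR and XOR are then just &, | and ^ on the integers; NOT needs an
explicit length because Python integers have no fixed width.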
If you want high-speed code (implemented in C and Pyrex) that works
for large-scale genomic data analysis, the bx-python package might do
what you need (and even things that you don't yet know you really
want to do)
http://bx-python.trac.bx.psu.edu/
but of course this one is a lot more complicated.
i.