[Chicago] Perl Follow-up

Jonathan Hayward christos.jonathan.hayward at gmail.com
Sat Mar 13 17:27:26 CET 2010


One other comment in regard to generators...

If I remember my biology, the Y chromosome is a short one, though not the
very shortest human chromosome (chromosome 21 is smaller), and the file still
weighs in at about 60MB (conceivably this could be compressed considerably by
straightforwardly using two bits rather than one byte per nucleotide). You may
have a lot of memory, but you could conceivably want to do things with
chromosomes that cannot easily be loaded into memory, especially when holding
two copies, one before the translation and one afterwards.
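To make the two-bits-per-nucleotide aside concrete, here is a minimal Python 3 sketch that packs four bases into each byte, a 4x saving over one byte per base. It assumes the input contains only the uppercase letters A, C, G and T; real chromosome files also contain runs of N and lowercase (soft-masked) bases, which this sketch does not handle. The names `pack` and `unpack` are hypothetical, not part of any library.

```python
# Two-bit code per base. Assumption: input holds only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so each base keeps the same bit slot.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from bytes produced by pack()."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

The original length has to be stored alongside the packed bytes, since a final partial byte is padded; with it, `unpack(pack(seq), len(seq))` returns `seq`.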

The following code is not optimized, but it takes a file object (opened for
reading) and, piece by piece, yields the transformed result, using a fixed
(i.e. O(1)) amount of memory regardless of the size of the data. That is, you
should theoretically be able to buy a low-memory computer *at a garage sale*
and run code on this principle over the entire human genome without memory
constraints being an issue:

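# Python 3: str.maketrans replaces Python 2's string.maketrans.
def transform(filehandle, block_size=1024):
    # Complement each base (T<->A, C<->G), uppercase and lowercase alike.
    translation = str.maketrans("TACGtacg", "ATGCatgc")
    while True:
        block = filehandle.read(block_size)
        if not block:  # an empty read means end of file
            return
        yield block.translate(translation)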
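The generator above complements the sequence but does not reverse it, and the reverse complement computed in the quoted thread needs both. One way to get both in fixed memory is to read blocks from the end of the file toward the start, reversing and complementing each block. This is a minimal Python 3 sketch under stated assumptions: the file is seekable, opened in binary mode, and holds bases only (no FASTA header line, no newlines). The name `reverse_complement_blocks` is hypothetical.

```python
import io

# Complement table for bytes; lowercase is included because soft-masked
# regions of files like chrY.fa are lowercase. Other bytes (e.g. N) pass through.
COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def reverse_complement_blocks(fileobj, block_size=1024):
    """Yield the reverse complement of a seekable binary file, block by block.

    Blocks are read from the end of the file toward the start, so memory use
    depends on block_size, not on the file size. Assumes the file holds bases
    only: no FASTA header line and no newlines.
    """
    end = fileobj.seek(0, io.SEEK_END)  # seek() returns the new offset
    while end > 0:
        start = max(0, end - block_size)
        fileobj.seek(start)
        block = fileobj.read(end - start)
        # Reverse this block, then complement it.
        yield block[::-1].translate(COMPLEMENT)
        end = start


if __name__ == "__main__":
    demo = io.BytesIO(b"AGTCAAGTA")
    print(b"".join(reverse_complement_blocks(demo, block_size=4)))  # prints b'TACTTGACT'
```

For a real file, one would open the input with `"rb"`, open an output file with `"wb"`, and write each yielded block as it arrives; FASTA input would first need its header line and newlines removed.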

Are you familiar with generators? Basically, a generator is a function that,
instead of returning once, keeps on yielding results. An obvious use case is
a problem that could be solved with a large list (using O(n) memory, a bit
problematic if you're handling encyclopedias, chromosomes, etc.), where a
generator allows another approach that uses a small, fixed amount of memory.
Nothing in your program's specification requires it to store a whole
chromosome in memory at once; it's just that the most obvious solutions do
that, and generators let you transform an arbitrarily large piece of
information while using a fixed (and, perhaps, quite low) amount of memory.
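A small Python 3 sketch of the list-versus-generator contrast described above. The helper `fake_blocks` is hypothetical, standing in for a file read block by block, and its block contents are made up for illustration. The list version holds every result at once; the generator computes each result only when the caller asks for the next one.

```python
import sys

TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def fake_blocks(n):
    """Hypothetical stand-in for reading n blocks from a file."""
    for _ in range(n):
        yield "ACGTacgt"


def complement_all(blocks):
    """List approach: every transformed block is kept, so memory is O(n)."""
    return [block.translate(TABLE) for block in blocks]


def complement_lazily(blocks):
    """Generator approach: blocks are produced one at a time, on request."""
    for block in blocks:
        yield block.translate(TABLE)


as_list = complement_all(fake_blocks(100_000))
as_gen = complement_lazily(fake_blocks(100_000))

# sys.getsizeof measures only the container object, not the strings it
# refers to; even so, the list's size grows with n while the generator's
# does not, because the generator has not computed anything yet.
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))

# Consuming the generator yields the same data as the list.
print("".join(as_gen) == "".join(as_list))  # prints True
```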

On Sat, Mar 13, 2010 at 7:46 AM, Jonathan Hayward <
christos.jonathan.hayward at gmail.com> wrote:

> EXTERIOR: DAGOBAH--DAY
>            With Yoda strapped to his back, Luke climbs up one of the
>         many thick vines that grow in the swamp until he reaches the
>         Dagobah statistics lab. Panting heavily, he continues his
>         exercises--grepping, installing new packages, logging in as
>         root, and writing replacements for two-year-old shell scripts
>         in Python.
>
>
> YODA: Code!  Yes.  A programmer's strength flows from code maintainability.
>       But beware of Perl.  Terse syntax... more than one way to do it...
>       default variables.  The dark side of code maintainability are they.
>       Easily they flow, quick to join you when code you write.  If once
>       you start down the dark path, forever will it dominate your destiny,
>       consume you it will.
>
>
> LUKE: Is Perl better than Python?
>
>
> YODA: No... no... no.  Quicker, easier, more seductive.
>
>
> LUKE: But how will I know why Python is better than Perl?
>
>
> YODA: You will know.  When your code you try to read six months from
>       now.
>
> [From rec.humor.funny, reprinted in an O'Reilly title]
>
>
> On Fri, Mar 12, 2010 at 11:45 PM, Clyde Forrester <
> clydeforrester at gmail.com> wrote:
>
>> And the results are in. I have a Perl program and a Python program, each
>> of which read a 60MB human Y chromosome, and compute the reverse complement.
>> The Perl program takes about 15 seconds, and the Python program does it in
>> about 3 seconds. The Python program also tends to be slightly more compact.
>>
>> # revcomp.py
>>
>> import string
>>
>> chrY_line_list = []
>> for line in open('chrY.fa','r'):
>>     if line[0] == '>':
>>         continue
>>     else:
>>         chrY_line_list.append(line.strip('\n'))
>> chrY_string = ''.join(chrY_line_list)
>> print len(chrY_string)
>> print chrY_string[10000:10020]
>>
>> chrY_revcomp = chrY_string[::-1]
>> trans = string.maketrans("ACGTacgt", "TGCAtgca")
>> chrY_revcomp = chrY_revcomp.translate(trans)
>> print len(chrY_revcomp)
>> print chrY_revcomp[-10020:-10000]
>>
>>
>> # revcomp.pl
>>
>> use strict;
>> use warnings;
>>
>> my $infile = 'chrY.fa';
>> open (my $in,'<',$infile) or die "Can't read $infile: $!\n";
>> chomp(my @dna = <$in>);
>> close ($in);
>>
>> while(substr($dna[0],0,1) eq '>') {
>>     shift(@dna);
>> }
>>
>> my $dna = join('', @dna);
>> my $length = length($dna);
>> print "DNA length: $length\n";
>> my $substr = substr($dna,10000,20);
>> print "$substr\n";
>>
>> my $revcomp = reverse $dna;
>> $revcomp =~ tr/ACGTacgt/TGCAtgca/;
>> $length = length($revcomp);
>> print "revcomp length: $length\n";
>> $substr = substr($revcomp,-10020,20);
>> print "$substr\n";
>>
>>
>>
>> Alex Gaynor wrote:
>>
>>> On Fri, Mar 12, 2010 at 12:54 PM, Clyde Forrester
>>> <clydeforrester at gmail.com> wrote:
>>>
>>>> I raised some issues about Perl vs. Python, and I'd like to invite some
>>>> comment and advice.
>>>>
>>>> First, can anyone recommend a properly Pythonic way of doing
>>>> translations?
>>>>
>>>> One example of such translations would be complementing DNA sequences.
>>>> Translating T to A, A to T, C to G, and G to C.
>>>>
>>>>
>>> >>> import string
>>> >>> trans = string.maketrans("TACG", "ATGC")
>>> >>> my_dna = "agtcaagta".upper()
>>> >>> my_dna.translate(trans)
>>>
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
>>
>
>
>
> --
> → Jonathan Hayward, a Senior Web Developer who cares deeply about usability
> → www.linkedin.com/in/jonathanhayward • jonathan.hayward at pobox.com
> → Ajax, CGI, CMS, CSS, HTML, IA, JSON, JavaScript, LAMP, Linux, Perl, PHP,
> Python, SQL, UI, Unix, Usability, UX, XHTML, XML
> → With a good interest in the human side of computing and making software
> and websites a joy to use
>


