[Tutor] serial to parallel

Dave Angel d at davea.name
Mon Nov 5 14:44:17 CET 2012


On 11/05/2012 06:53 AM, Bala subramanian wrote:
> Friends,
> In the previous mail there was a mistake I was not aware of, so
> please don't get upset.
>
> For frame in trajectory-A:
>>         cunt= str(frame.time)
> It should be count = str(frame.time), a counter to find the frame number.
>
> Thanks Joel for letting me know.
>
> Bala
>
> On Mon, Nov 5, 2012 at 11:46 AM, Bala subramanian
> <bala.biophysics at gmail.com> wrote:
>> Friends,
>> I use a Python package to analyse molecular trajectories. For those
>> not familiar, I have described the problem below.
>> I have two trajectories A,B. Each trajectory has a collection of
>> frames. A frame is a numpy array.
>> For frame in trajectory-A:
>>         cunt= str(frame.time)
>>         function(trajectoryB, frame, outfile=cunt+'.txt')
>> process all .txt files
>>
>> The function is defined in the package that I use. It also has a
>> built-in counter for each frame.
>> I want to convert this to parallel code in the following way: each
>> process takes one frame from trajectory-A, applies the function,
>> and writes the corresponding output file.
>> This is the first time I am trying such parallelism. I would
>> appreciate your guidance on how I can do it. The original code is
>> pasted below.
>> -----------------------
>> #!/usr/bin/env python
>> import MDAnalysis
>> from MDAnalysis.analysis.align import rmsd, fasta2select, rms_fit_trj
>> import argparse
>> import numpy as np
>>
>> parser = argparse.ArgumentParser(description=info)
>> # a series of parser.add_argument definitions
>> args = parser.parse_args()
>>
>> U1 = MDAnalysis.Universe(args.rtop, args.rtrj)   # open trajectory-A
>> U2 = MDAnalysis.Universe(args.ttop, args.ttrj)   # open trajectory-B
>>
>>
>> for fr in U1.trajectory:
>>         nd = '%05d' % fr.frame
>>         rms_fit_trj(U2, U1.selectAtoms('all'), rmsdfile=nd + '.rmsd')
>>
>> Thanks in advance,
>> Bala
>
>

Before you spend too much energy on this, I'd suggest that you'll
probably see a substantial slowdown trying to write the output files in
parallel, unless the calculations that actually format the data for
writing are extensive.

On the other hand, if the calculations dominate the problem, then you
probably want to use multiprocessing to get them to happen in parallel.
See the recent thread "using multiprocessing efficiently to process
large data file".

Just be sure to do some measuring before spending substantial energy
optimizing.
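
For the measuring, plain wall-clock timing around your existing serial
loop is enough to show how long the frames take (a sketch reusing the
U1 and U2 from your script):

-----------------------
import time

start = time.time()
for fr in U1.trajectory:
    nd = '%05d' % fr.frame
    rms_fit_trj(U2, U1.selectAtoms('all'), rmsdfile=nd + '.rmsd')
elapsed = time.time() - start
print('%d frames in %.1f seconds' % (len(U1.trajectory), elapsed))
-----------------------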

-- 

DaveA


