"Python for Bioinformatics" available and in stock

Tue Oct 20 22:58:38 EDT 2009

On Mon, Oct 19, 2009 at 5:43 AM, Bearophile <bearophileHUGS at lycos.com> wrote:
> A more pythonic code is:
...
> Note the use of xrange and names_with_underscores. In Python names are
> usually lower case and their parts are separated by underscores.

Regarding underscore (and code notation in general) I wrote in the
book (page 6):

Some code in the book will not follow accepted coding styles for the
following reasons:

* There are some instances where the most didactic way to show a
particular piece of code conflicts with the style guide. On those few
occasions, I choose to deviate from the style guide in favor of
clarity.
* Due to size limitation in a printed book, some names were shortened
and other minor drifts from the coding styles have been introduced.
* To show there is more than one way to write the same code. A coding
style is a guideline, so some programmers don't follow it. You
should be able to read "bad" code, since sooner or later you will
have to read other people's code.

> From #6:
....
> If you want to limit the space in the book then you can pack those
> lines in a single line, but it's better to keep the underscores.

#5 to #9 are very basic programs whose only purpose is to show the
standard flow control structures (if-elif-else-for-while). I think
that at this level, packing several lines into one is not the best
option for learning.

> From #18:
> prop = 100.*cp/len(AAseq)
> return (charge,prop)
> ==>
> prop = 100.0 * cp / len(aa_seq)
> return (charge, prop)

> Adding spaces between operators and after a comma, and a zero after
> the point improves readability.

Yes, you are right.
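
For context, here is a minimal sketch of how a function ending in
those two lines could look with the suggested spacing. This is not
listing #18 from the book: the function name, the SIMPLE_CHARGE table
and its integer charges are assumptions made up for illustration.

```python
# Hypothetical, simplified charge table (an assumption for this sketch):
# acidic residues count as -1, basic residues as +1.
SIMPLE_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_and_prop(aa_seq):
    """Return (net charge, percentage of charged residues) for aa_seq."""
    charge = 0
    cp = 0  # number of charged residues seen so far
    for aa in aa_seq.upper():
        if aa in SIMPLE_CHARGE:
            charge += SIMPLE_CHARGE[aa]
            cp += 1
    # Spaces around operators and "100.0" instead of "100."
    prop = 100.0 * cp / len(aa_seq)
    return (charge, prop)


result = charge_and_prop("MKDE")
```

Writing 100.0 also makes it obvious that the division is a float
division, whatever the type of cp.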

> From #35:
> import re
> pattern = "[LIVM]{2}.RL[DE].{4}RLE"
> ...
> rgx = re.compile(pattern)
> When the pattern gets more complex it's better to show readers to use
> a re.VERBOSE pattern, to split it on more lines, indent those lines as
> a program, and add #comments to those lines.

This is a very nice suggestion. I will consider it for the next
edition, but the book is about 600 pages now, so I have to think very
carefully before adding new material.
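
For readers of this thread, a rough sketch of what the re.VERBOSE form
of the pattern from #35 might look like. The comments are my reading
of each part of the pattern, and the test sequence is made up for this
example; it is not taken from the book.

```python
import re

# Same pattern as "[LIVM]{2}.RL[DE].{4}RLE", split over several lines.
# With re.VERBOSE, whitespace and #comments inside the pattern are ignored.
pattern = r"""
    [LIVM]{2}   # two residues, each one of L, I, V or M
    .           # any single residue
    RL          # R followed by L
    [DE]        # D or E
    .{4}        # any four residues
    RLE         # R, L and E
"""
rgx = re.compile(pattern, re.VERBOSE)

# Made-up sequence containing one occurrence of the motif
seq = "AALVARLDAAAARLEKK"
match = rgx.search(seq)
if match:
    print(match.group())
```

The compiled expression matches exactly what the one-line version
matches; only the way it is written changes.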

> The #51 is missing.

Thank you, it is corrected now. It was an HTML file instead of a .py
file so the script I use didn't notice the original file.

> I like Python and I think Python is fit for bioinformatics purposes,
> but 3/4 of the purposes of a book like this are to teach
> bioinformatics first and computer science and Python second. And

This book does not teach bioinformatics. Let me copy the "Who Should
Read This Book" section:

"This book is for the life science researcher who wants to learn how
to program. He may have previous exposure to computer programming, but
this is not necessary to understand this book (although it surely
helps).

This book is designed to be useful to several separate but related
audiences: students, graduates, postdocs, and staff scientists, since
all of them can benefit from knowing how to program.

Exposing students to programming at early stages in their career helps
to boost their creativity and logical thinking, and both skills can be
applied in research. In order to ease the learning process for
students, all subjects are introduced with the minimal prerequisites.
There are also questions at the end of each chapter. They can be used
for self-assessing how much you've learnt. The answers are available
to teachers in a separate guide.

Graduates and staff scientists having actual programming needs should
find its several real world examples and abundant reference material
extremely valuable.

What You Should Already Know

Since this book is called "Python for Bioinformatics", it has been
written with the following assumptions in mind:

* The reader should know how to use a computer. No programming
knowledge is assumed, but the reader is required to have minimum
computer proficiency to be able to use a text editor and handle basic
tasks in their operating system (OS). Since Python is multi-platform,
most instructions from this book will apply to the most common
operating systems (Windows, Mac OSX and Linux); when there is a
command or a procedure that applies only to a specific OS, it will be
clearly noted.
* The reader should be working (or at least planning to work) with
bioinformatics tools. Even small-scale manual tasks, such as using
NCBI BLAST to identify a sequence, aligning proteins, primer
searching, or estimating a phylogenetic tree, will be useful for
following the examples. The more familiar the reader is with
bioinformatics the better he will be able to apply the concepts
learned in this book."

> sometimes a dynamic language isn't fast enough for bioinformatics
> purposes, so a book about this topic probably has to contain some

That depends on which bioinformatics application you are working on.
I think that 3D molecular modeling is a field suited to a low-level
language like C or Fortran, but most bioinformatics applications, like
sequence annotation, primer design, sequence processing and curating
biological databases, are handled fine with a scripting language like
Python or Perl (btw, Perl is still the most used language in
bioinformatics).

If you are looking for an introduction to bioinformatics, I don't
think this is a suitable book. But if you want to learn Python for
use in bioinformatics, you should give it a try.

Best,
SB.