[SciPy-user] Pros and Cons of Python versus other array environments
John Hunter
jdhunter at ace.bsd.uchicago.edu
Thu Sep 28 22:48:22 EDT 2006
>>>>> "Rob" == Rob Hetland <hetland at tamu.edu> writes:
Rob> All of the arguments made *for* PyLab are true -- you think
Rob> so too, or you wouldn't be reading this. I have been a huge
Rob> proponent of PyLab, and have taught seminars on it here at
Rob> Texas A&M and Woods Hole to people who primarily use MATLAB.
Rob> I have heard a number of objections or excuses that it all
Rob> looks good, but.....
Rob> - it's hard to install
Rob> - I already know how to use MATLAB, and it works fine for me
Rob> - when do I find a week (or month or semester) to learn a new
Rob>   programming language
Rob> - I already have so many m-files that I would need to rewrite
The first thing this thread makes me think is: why does wikipedia work
but wikis for scientific python do not? If we followed Travis' lead and
aggregated the collective wisdom on this thread into the wiki page, we
would have something enduring for the masses. As it is, only geeks
like us who read mailing lists or archives will benefit from it.
Maybe this points to the problem: the primary users and developers of
scientific computing in python are sufficiently technologically
literate that they not only overcome the additional complexity, they
need it and crave it.
I was a huge matlab user for almost a decade; I tried to write a book
about matlab (see http://matplotlib.sf.net/matlab_cookbook.pdf,
unfortunately as incomplete as the mpl cookbook and other
documentation). At some point I "hit the wall" and could no longer be
productive in matlab. The extra overhead of managing complex data
structures, developing complex GUIs, and working with networked data
and databases was consuming most of my programming energy. Yes,
matlab provides you a simple, comprehensive interface, and a fairly
complete set of numerical libs, but when you want to work with complex
data in a realistic networked environment, you hit the limits of the
language and environment pretty hard. Then you rewrite what you like
about matlab in python and get on with it.
matlab is a great tool for beginners and intermediates. For experts,
it has limitations which are hard to overcome. My advice to students:
if you aspire to be an expert, bite the bullet now and build a set of
tools that can scale with you on your ascent. Also, realize that The
Mathworks is like the crack dealer on the street: the first hit is
free; once you are addicted it becomes quite expensive. An academic
license or a student version sells for under $100. If you are a
business and need the important toolkits, you are looking at $50K per
year. If you are an entrepreneurial student and dream of starting
your own business once you graduate, ask yourself what you could do
with the extra cash saved from a single site license. If your
fledgling business grows, ask yourself what you can do with the cash
saved from 50 site licenses (hint, that is 2.5 million dollars a
year). If you are ready to spend the 2.5 million dollars, fine, but
first try the following exercises in matlab and python:
* download and parse a CSV file from a web server, eg
http://ichart.finance.yahoo.com/table.csv?s=INTC&d=8&e=29&f=2006&g=d&a=6&b=9&c=1986&ignore=.csv
(for a python implementation, see the matplotlib.finance module)
* fill out a web CGI form in matlab (hint: you can do it with the
embedded JVM, a virtual machine running in a virtual machine)
* query a mysql database on linux, win32, and OS X with the same
script and populate an array with the results
Now how much would you pay?
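For what it's worth, the first and third exercises take only a few lines in python. The following is a minimal sketch using just the standard library, not the matplotlib.finance implementation. The helper names (parse_csv_columns, closing_prices, fetch_csv, query_to_rows), the sample data, and the "Close" column name are assumptions made for illustration, and an in-memory sqlite3 database stands in for mysql, since both are driven through the same DB-API cursor interface.

```python
import csv
import io
import sqlite3
from urllib.request import urlopen


def fetch_csv(url):
    """Download CSV text from a web server (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_csv_columns(text):
    """Parse CSV text with a header row into a dict of column lists."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def closing_prices(text):
    """Return the 'Close' column as floats (column name is an assumption)."""
    return [float(value) for value in parse_csv_columns(text)["Close"]]


def query_to_rows(connection, sql):
    """Run a query on any DB-API connection; return rows as lists of floats."""
    cursor = connection.cursor()
    cursor.execute(sql)
    return [[float(x) for x in row] for row in cursor.fetchall()]


# Made-up sample data; the column layout is assumed, not taken from a real feed.
SAMPLE = """Date,Open,High,Low,Close,Volume
2006-09-28,20.50,20.75,20.40,20.60,1000
2006-09-27,20.30,20.55,20.20,20.50,1200
"""

# Exercise 1 (offline): parse the CSV text into a list of closing prices.
prices = closing_prices(SAMPLE)

# Exercise 3 (stand-in): store the prices in a database, then read them back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (day INTEGER, close REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?)", list(enumerate(prices)))
rows = query_to_rows(conn, "SELECT day, close FROM quotes ORDER BY day")
conn.close()
```

To run against a real server, pass a CSV URL to fetch_csv and feed the result to closing_prices. To run against mysql, swap the sqlite3 connection for one from a DB-API driver such as MySQLdb; query_to_rows itself should not need to change, and the nested list it returns can be handed to an array constructor.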
PS: it's been a while since I looked at that matlab cookbook I was
working on. I find the following sections of the matlab PDF linked
above fun in a historical light::
Alternatives to matlab
I am a devotee of open source software. I (almost exclusively) use
linux as an operating system, emacs as an integrated development
environment, python for small and large scale programming, C++ for
numerics, and so on. Matlab is the only commercial piece of software I
use regularly.
I really don't want to use it, mainly because it is so expensive. I
work in an academic environment, where site licenses go for the
incredibly cheap price of $75 per year, toolboxes included. Check
out the commercial price list to get an idea of just how expensive
it is outside of academia. I'll give you a hint. About as much as
a new Lexus sport utility vehicle.
So aside from my support for GNU and linux and open source software, I
don't want to wake up some day outside the folds of academia having to
pay for matlab. Every day I use matlab is another set of plotting and
analysis functions that I come to rely on, which makes it increasingly
hard to go cold turkey. Every once in a while I make an aborted attempt
to give it up (I know it's not good for me) but I always find
myself coming back. The main reason is the graphics -- the ease with
which I can make publication quality figures, an ease I just haven't found
in competing, open source, free as in Richard Stallman
(http://www.gnu.org/philosophy/free-sw.html), solutions.
Free alternatives
* python -- python is the one true language. I have written
extensively in perl, C++, FORTRAN, BASIC, and yes matlab, and in
python I have found the one true language. I say that with tongue
in cheek -- there is no one true language, because the strengths
of a language often imply its weaknesses: consider the classic trade-offs
between user friendliness and power, expressiveness and
readability, development time and execution time. python solves
all these problems for me because it is so clear syntactically,
has so many great libraries built in, and so many great external
libraries. In the final category, relevant to this discussion, is
numpy (http://www.pfdubois.com/numpy) and its recent
successor scipy (http://www.scipy.org).
These libraries provide efficient C/C++/FORTRAN libraries, all wrapped
in python, that give you a huge array of highly tested, optimized,
numerical libraries, for free. And you can read and modify the source
code at will, in large part obviating the classic problem of closed
source (matlab) libraries: that in a few years, when another platform
is dominant, your solution of today is no longer supported. With open
source, your solution is supported as long as users continue to use it
and support it. SGI was the proprietary platform of choice for high
performance graphics software 5 years ago. Today, support and
maintenance have become increasingly difficult and expensive.
And while numerous graphics packages for scipy exist, none compare to
the breadth, ease of use, generality and quality of the matlab
libraries. Yet. As a general rule, open source solutions follow
excellent closed source solutions with a short time lag (witness the
gimp, an excellent drop-in replacement for Photoshop). So keep your eye
on python for standardized, excellent graphics solutions in the near
future.
If you want to split the difference, python does support an
interface to matlab called pymat
(http://claymore.engineer.gvsu.edu/~steriana/Python/pymat.html),
so you can do your number crunching in numpy, and pass the results
off to matlab for plotting, thus minimizing your dependence on
matlab until the final step of producing graphical output.
* octave (http://www.octave.org) -- octave is an open source clone of
matlab. Many m-files will run in octave without changes. But
when you start to make plots, you'll hit incompatibilities.
octave uses gnuplot for plotting, and the support, particularly
for handle graphics, is limited, as is the quality of the graphics
produced.
JDH