Python vs. Perl, which is better to learn?

James J. Besemer jb at cascade-sys.com
Wed May 8 05:14:23 EDT 2002


Peter Hansen wrote:

> "James J. Besemer" wrote:
> >
> > A lot of real-world applications require performance approximating the
> > underlying CPU speeds
>
> Disagree.  A lot of real-world applications may say that, but in fact
> they simply need some fixed but often unspecified performance and
> they might have chosen a slower CPU than they should have.  Few projects
> are based on adequate analysis of the relative costs of a bit of
> faster hardware versus a longer development cycle.

I have seen that mistake made, so I agree it does happen.  However, I don't believe it's
nearly as universal as you make it out to be.

In the printer prototype example I cited we had to upgrade to increasingly faster computers
several times during the project.  Our C code (with critical sections hand coded in
assembly) simply could not keep up with the increasing custom hardware speeds.  The
algorithm is highly proprietary and I can't go into any detail but it is the most complex
transformation of bits I've ever encountered.  In the end we had to deploy multiple CPUs in
parallel to keep up.  There is absolutely no friggin' way Python could have come close to
helping solve the problem.

There were some ancillary tasks in this project which I did code in Python.  E.g.,
compressing and decompressing raw image data per several local standard encoding schemes.
It worked OK for my original small test case, but for large images Python took something
like 20 minutes, vs. a C app that could do the job in 10-15 seconds.  There was no point in
a Python/C++ hybrid solution, as once the main compression algorithm was done there was
little else left to do.  And the algorithm itself was prosaic enough that there was no
significant productivity gain in using Python.  I had imagined that performance would not be
an issue, and I was completely wrong.
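
Just to give a flavor of the kind of loop I mean, here is a made-up run-length encoder over
raw image bytes (NOT the actual encoding scheme, which I can't show).  Every trip around
that inner while loop is interpreted bytecode, and on a multi-megabyte image that's
millions of trips:

    def rle_encode(data):
        """Run-length encode a string of raw image bytes."""
        out = []
        i, n = 0, len(data)
        while i < n:
            byte = data[i]
            run = 1
            # count how many times this byte repeats (cap runs at 255)
            while i + run < n and run < 255 and data[i + run] == byte:
                run = run + 1
            out.append(chr(run) + byte)
            i = i + run
        return ''.join(out)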

I did successfully use Python to extract some statistics from image data.  Images were
large and pixels in a given plane were packed 2 per byte.  The runtimes, although orders of
magnitude slower than if I had used C++, were marginally tolerable for this occasional
task.  However, if I hadn't been so attracted to Python, in all honesty I probably would
have done better by my client NOT dinking around with Python and simply writing the damn
thing in C (but that wouldn't have been as much FUN!)
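
For the curious, the statistics job was basically nibble-picking, something like this
(hypothetical layout: two 4-bit pixels packed per byte, high nibble first):

    def plane_histogram(raw):
        """Count occurrences of each 4-bit pixel value in one packed plane."""
        counts = [0] * 16
        for byte in map(ord, raw):
            counts[byte >> 4] = counts[byte >> 4] + 1      # high nibble
            counts[byte & 0x0F] = counts[byte & 0x0F] + 1  # low nibble
        return counts

Multiply that loop body by tens of millions of bytes and you see where the time goes.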

I've done a lot of work in past lives with image processing.  This is another large class
of applications that tends to be so labor intensive that Python would certainly not excel,
if it worked out to be suitable at all.

E.g., there's a popular algorithm that takes multispectral image data (N floats per pixel)
and classifies the data points into distinct groups of similar color characteristics (M
groups in N space).  The algorithm IIRC consists of computing each point's geometric
distance from the center of mass of each existing group, then reclassifying the points into
whichever group they're closest to.  So with images several thousand pixels on a side (by N
planes deep) this is a hell of a lot of number crunching.  Then, the algorithm iterates
until no points change classification.  There's a bunch of mathematics that shows this
always converges and works well.  In Earth resource images where a swimming pool is 1
pixel, it can properly pick out the pool.  (In one case, on a popular image taken directly
overhead, they were temporarily puzzled when the system said the pool was concrete instead
of water.  But they checked the records, and yes, the pool had been drained that day.)
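
To make the cost concrete, the reassignment step is roughly this (names and structure are
invented for illustration; the real code obviously wasn't Python):

    def reassign(pixels, centers):
        """For each N-component pixel, find the index of the nearest center."""
        labels = []
        for p in pixels:                       # millions of pixels
            best, best_d = 0, None
            for k in range(len(centers)):      # M candidate groups
                c = centers[k]
                # squared Euclidean distance in N-dimensional color space
                d = 0.0
                for j in range(len(p)):
                    d = d + (p[j] - c[j]) ** 2
                if best_d is None or d < best_d:
                    best, best_d = k, d
            labels.append(best)
        return labels

That's pixels x groups x planes multiply-adds per pass, and the whole thing repeats until
nothing changes classification.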

Another algorithm is the two-dimensional FFT.  You do a one-dimensional FFT on each row and
column of each plane of your image, apply various filtering, and then reverse the
transform.  Very expensive, and if you did it in Python you'd very soon be reaching for C or
some other 'faster' language.
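
The structure is simple enough to sketch; fft1d below is just a stand-in for whatever
one-dimensional transform you use, and 'image' is one plane as a list of rows:

    def fft2d(image, fft1d):
        """Apply a 1-D transform to every row, then to every column."""
        rows = [fft1d(row) for row in image]
        cols = [fft1d(list(col)) for col in zip(*rows)]   # transpose, then transform
        return [list(row) for row in zip(*cols)]          # transpose back

For an R x C image that's R + C one-dimensional FFTs per plane, with all the inner
arithmetic done element by element in the interpreter unless you hand it off to C.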

These are but two examples of computations unsuitable for Python out of an entire
application domain where I expect Python would have a lot of trouble.  And this isn't even
real time image processing!

[Of course, now somebody is going to point me to some Python C extension for processing
image data....  ;o}  But my main point still holds unless the entire library is written in
Python.  ;o]

> For mass market hardware, clearly cost of hardware rules.

Exactly.  And in the commercial world, this case arises more frequently than you
seem willing to admit.

In my printer example, the manufacturing cost drives the entire business model.  When you
are manufacturing millions of units in a highly competitive market, every dollar saved here
and there adds up, well, to millions of dollars, annually.  The one-time cost of programmer
time, in contrast, is absolutely INSIGNIFICANT.  They basically develop and prototype and
debug the entire application, and only when it's debugged and working 100% does the
development team rewrite everything in efficient C for the version that runs in ROM.  (They
actually start earlier with the parts that are unlikely to change.)  Even though the
software development cost is measured in dozens of person-years, it still pales in
comparison to a slight increase in manufacturing cost.  And this is an environment where
very sophisticated custom hardware already does a lot of the truly labor intensive work.

I don't think most people begin to fathom the complexity of modern color printers.  Fact of
the matter is, the software cost is even more insignificant in the context of the overall
development cost.  There are IC designers, ink jet specialists and armies of mechanical
engineers.  Entire labs full of chemists focus on just one ink color.   Engineering labs
all over the world collaborate on a single commercial print head.  There's a huge staff of
specialists who maintain the vast knowledge base of all the 'tricks' they have to employ to
get good color pictures out of what are essentially 4-bit pixels.  Thousands of engineers and
scientists collaborate for years on that $199 color printer you buy that prints amazingly
good images (at least at first ;o)...

In this case, the hardware budget for software RAM, ROM and cycles is very carefully
estimated based on many past projects.  They have a high degree of confidence that the
software will fit in the allocated space and be able to do its job with just enough reserve
to be safe.  This is another reason that the choice of hardware is far from arbitrary, as
they cannot blithely abandon their experience base, tools, etc.  The outcome of
experimentation with the prototype was that yet another layer of the most labor intensive
bit diddling would move from software into custom hardware.  The marginal manufacturing
cost of the custom hardware was ZERO, since virtually all the electronics already fit into
2 or 3 custom chips.  A few more square millimeters of silicon doesn't matter a bit.  A
slight savings in one-time software development was traded off for a big cost in one-time
HW development in order to get the overall application to fit in the existing hardware
architecture.  A marvel of modern engineering.

> For custom/one-off projects
> <warning reason="blanket statement"> much faster hardware combined with
> Python would be a far better choice than slower/cheaper hardware
> and a solution in C </warning>.

In some cases I would agree but like all generalizations it's ultimately false.

I myself HAVE found Python to be suitable for some of these applications.  E.g., we used
Python as a printer driver to test serial and USB I/O in a variety of configurations and
operations.  Of course, it was more economical to implement a number of the low level
functions in a DLL.  But Python told the DLL what to do and everything progressed to a
happy ending (and subsequent repeat business).
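
The shape of it was roughly the following.  I'm using ctypes here purely as a stand-in for
the bridge we actually used, and the DLL and function names below are made up:

    import ctypes

    io = ctypes.cdll.LoadLibrary("printerio.dll")     # hypothetical DLL

    def send_block(port, data):
        """Hand a block of data to the low-level DLL and check the result."""
        rc = io.usb_write(port, data, len(data))      # hypothetical entry point
        if rc != 0:
            raise IOError("usb_write failed with code %d" % rc)

Python handled the configuration, sequencing and reporting; the DLL handled the bits that
had to be fast.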

I myself am presently working on another application where Python will control a variety of
external devices in real time.  In this case I am confident that performance will not be a
problem as the frequency of events is sufficiently low and the real time constraints are
loose enough that signals can be propagated via TCP/IP over a LAN.
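
At those rates the control code can be as plain as this sort of thing (the host, port and
one-line command protocol are assumptions for illustration):

    import socket

    def send_command(host, command, port=9100):
        """Deliver one control message; a LAN round trip is plenty fast
        when events only arrive a few times per second."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((host, port))
            s.sendall((command + "\n").encode("ascii"))
        finally:
            s.close()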

> Well, with a slow enough CPU, any language may be inadequate.  Contrariwise,
> with a fast enough CPU, Python can certainly be fast enough for the job.

(a) Certainly there are CPU speeds below which any language may become inadequate.  HOWEVER,
an interpretive language like Python will run out of gas an order of magnitude (or two or
three, depending on the application) sooner than, say, C.

(b) Your second part may hold in SOME circumstances IFF cost is no object, which frequently
is NOT the case.

(c) I submit that there are many applications where Python would be inadequate with even
the fastest CPU possible.  And for the foreseeable future, no matter how fast CPUs become,
there always will be some interest in preferring faster languages to get the job done
faster or with less hardware.

E.g., there are many hugely CPU-intensive applications, such as meteorological modeling,
cracking encryption codes, atomic bomb simulations, solving complex aerodynamics and
hydrodynamics problems, modeling complex analog or digital circuit behavior, real time
image compression and decompression, floating point emulators, etc., etc.

While for argument's sake I'll stipulate that there's no reason Python could not perform
any of these tasks, there are always going to be "many" applications like this where there
are economic or other practical reasons to instead choose a language that is more runtime
efficient.

> What project have you seen where the requirements said "must run as
> fast as can possibly be achieved, at all costs, regardless of choice of
> implementation language, duration of project, or other concerns."

Nobody ever said that.

And I thought you gentle folks were supposed to eschew SARCASM.

I come from a commercial background, not academic.  In my career I have seen MANY examples
where cost or available hardware was insufficient to allow an interpretive language to be
fast enough.  I also have seen cases where the problem arguably was that inadequate
hardware was selected in the first place.  But usually by the time the problem is known
(or finally acknowledged by upper management) it's too late to correct and software people
have to live with it anyway.  I've seen it happen TOO MANY times.  Although arguably the
wrong way to do things, this type of SNAFU still counts on my side of the argument.

The real world is not a particularly pretty place; it's far from perfect.

> I'm sure there are some close to that, but they are not widespread
> and don't justify the "there are many situations" claim.

Your experience clearly is different from mine.

> Well please, let's stick to reality-based discussions.

There you go again, with that nasty SARCASM.

Actually, this is kind of fun.  In a way I'm being REALLY sarcastic without really being
sarcastic at all.  ;o)

> Obviously there
> are places where Python simply won't fit, so discussing whether it
> is fast enough (that was the topic) in those applications is pretty
> meaningless.

My point was that IF it would fit it would have still been too slow to do the job in this
case.  This was another example where cost AND space constraints (for the computer) did not
allow an arbitrarily powerful CPU and hence Python was not 'fast enough' because it could
not perform at all.  Incidentally the example is doubly poignant since there IS a C
compiler for this particular little 8 bit chip and most of the APP was done in C (including
a 48 bit floating point math package).

But if you wish to discount this example, the same client had another example....

These probes all fed into a single PC that had to accept these thousands of readings per
second (several hundred bytes per sample) from up to 10 probes.  Additionally, it had to
transform the raw readings into meaningful data for a complex physical phenomenon (multiple
scale factors, multiple table lookups, etc. etc.).  Of course it had to do all this without
ever lagging behind and without dropping a single bit.  We were just barely able to achieve
this spec on a 1.2 GHz PC with a modern C++-like language that shall remain nameless.  We
started out with slower machines that did not work and were forced to scramble to find
successively faster machines until the HW+SW could keep up.  We worked some on the software
but savings were marginal and comparatively costly.  Here again is yet another application
where I submit Python, at 1-2 orders of magnitude slower, simply would not have been able
to keep up.  Maybe when there are 5 and 10 GHz PCs readily available, but not today.  As
it was, the more expensive PC (which was part of the deliverable) ate noticeably into the
$5K product price vs. the 600 MHz one we originally targeted.  For the record, I came late
to the project to help clean up the mess; I was not involved when the original specs and HW
assumptions were casually considered.
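
For the record, the per-sample work looked roughly like this (the field layout, names and
table sizes are invented for illustration):

    def transform_sample(raw, scale, tables):
        """Convert one raw probe sample (a sequence of ints) to engineering units."""
        out = []
        for i in range(len(raw)):
            v = raw[i] * scale[i]                              # per-channel scale factor
            idx = max(0, min(int(v), len(tables[i]) - 1))      # clamp into the lookup table
            out.append(tables[i][idx])                         # per-channel calibration lookup
        return out

At thousands of samples per second from up to 10 probes, even that trivial loop body runs
hundreds of thousands of times a second, which is exactly the regime where interpreter
overhead eats you alive.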

I'm certainly not complaining or damning Python.  It's simply a reality that many apps push
the envelope into that region where the difference between running approximately at machine
speed vs. several times slower makes a difference either in economics or in the ability
even to accomplish the task.

> I have, however, used Python in a number of other areas including:
>
>  - web applications (Zope, with an Intranet running on basic hardware serving
>    a company of 100 people), where response time is important

>  - automated factory test (controlling RF equipment via GPIB, serial, and
>    CAN interfaces and providing a GUI and an output stream in XML), where
>    total test time is critical to factory production rates
>
>  - embedded control and monitoring (running on 386 and 486 chips on PC104
>    boards with from 1MB to 32MB RAM and flash memory, monitoring and
>    controlling dozens of external devices via a triple CAN interface),
>    where scan rates are important (soft realtime) and bandwidth matters
>
>  - basic sys admin utilities (wow! so I do have some experience there...
>    go figure), where generally speed is quite unimportant
>
>  - probably a couple of other areas I've forgotten.

Impressive resume.  Seriously.

I would submit that these are all "less time critical applications," for which I stipulated
up front that Python can work just fine.  Both of your "real time" examples seem to be
"soft real time" not balls to the wall stuff.  In fact, a lot of so called "real time"
applications may have critical timing requirements but they do not require a lot of CPU
time.  A significant portion of the time the CPU is waiting for the right moment to trigger or
measure a time critical phenomenon.  All of which is meet and proper.  If you don't do it
this way you get into trouble.  But sometimes there is no choice.

Anyway, I am not and never was disputing that your personal experience includes no such
examples.  I certainly don't mean to criticize your work, as you clearly have delivered some
substantial product.  I was only saying your experience must be "limited" if it does not
include examples where Python isn't fast enough.

> Perhaps this /is/ limited experience from which to draw conclusions...

Well I would respectfully and humbly suggest that your experience is "limited" BY
DEFINITION if in your 2 years with Python you never encountered any apps where Python would
be too slow.  Meanwhile in my 3 years I encountered "many" and furthermore can readily
remember/predict quite a few others besides.

> Ohh... you must have misread my comments.  Please go back to the top
> and find where I said that I objected to the claim there were "many
> situations where Python is too slow", not what you say I said, which
> is apparently more like "Python is never too slow".

> Just because you THINK
> I said something doesn't mean I ACTUALLY said it.

First off, here you misinterpreted MY comment.  I was offering the well intentioned generic
advice that

IF your principal argument is of the form

    It never happened to me (or I never saw it), ergo it cannot possibly be true

THEN you're probably on the wrong side of the argument.

It's a fundamental logical error (and arrogant) to extrapolate your own private reality to
the rest of the universe.

Second off, I did not misrepresent your words in any material way.

Perhaps I did misunderstand you when you said,

>> PH:  I have not yet found python to be 'too slow'.  I believe this is a common myth or
>> nasty rumor, without strong basis in reality.
>

However, if I did, the misunderstanding does not appear to be my fault.

From your very words one could only reasonably conclude that "this" in the second sentence
meant "python to be too slow".  (Else you'd merely be contradicting your first sentence.)
So you clearly appear to be saying (I take the bold liberty to paraphrase) that you believe
python being too slow is a common MYTH or nasty RUMOR.  Myth = traditional story or belief
not founded in fact.  Rumor = information spread by word of mouth but not known to be
true.  And we're not merely talking rumor, we're talking "nasty rumor" which further
suggests it's completely scandalous and untrue.  Python too slow is a "MYTH."  The strength
of your statement DID seem to state very clearly that python being too slow practically
NEVER happens.

If you are saying that you MEANT to say something else, that is perfectly reasonable, because
what you said seems to me to be demonstrably untrue.  But you can't deny your own words or
accuse me of distorting what you said.

In any case, to be fair, I took the overall meaning of your statement to mean that you
merely object to "many" and that is the only argument I have presented all along.  You are
saying "not many" i.e., "hardly any", not "never" but "almost never."  I am broadening your
horizon by informing you that, based on MY humble experience, yes, there are in fact MANY
examples of applications where Python would be too slow.

> I just haven't encountered any myself (as I said)

> and I object to the
> statement that it will /often/ be the case that it is found to be too slow.

> I believe quite the opposite, if it wasn't clear: it will /rarely/
> be the case that Python is found to be too slow.

(a) Nobody ever questioned or criticized your experience.  I only suggested that it was not
representative of all applications.

(b) Neither Patrick W nor I ever said or implied that Python "often" would be too slow.

(c) I don't think I misunderstood or misrepresented your position.  You're saying the cases
are insignificant in number and I respectfully am trying to point out your error.  I don't
see how you can disagree.

(d) I am arguing "many" which means a significant number of cases but not necessarily even
the majority of cases.  My position is consistent with your backing off to "most of the
time it's NOT too slow but, yes, 'many' times it can be."  Perhaps that's a fair place to
leave off on this argument?

> 'Nuff said?

Only if you concede all the points I make in my reply, above [as any logical and reasonable
person would do ;o]...

Regards

--jb

--
James J. Besemer  503-280-0838 voice
http://cascade-sys.com  503-280-0375 fax
mailto:jb at cascade-sys.com






