One quick question: Why does the MA module have an average function,
but not Numeric? And what is the equivalent in numarray?
Magnus Lie Hetland The Anygui Project
Please excuse me for dropping a feature request here
as I'm new to the list and don't have the 'feel' of
this list yet.
Should feature requests be submitted to the bug tracker?
Anyway, I installed Numarray on a SuSE/Linux box,
following the Numarray PDF manual's directions.
Having installed Python packages (like, ehm, Numeric)
before, here are a few impressions:
1. When running 'python setup.py' and 'python setup.py --help'
I was surprised to see that source generation already happens
at that point:
Using EXTRA_COMPILE_ARGS =
generating new version of Src/_convmodule.c
generating new version of Src/_ufuncComplex64module.c
Normally, you would expect that at build/install time.
2. Because I'm running two versions of Python (because Zope
and a lot of Zope/C products depend on a particular version)
the 'development' Python is installed in /usr/local/bin
(whereas SuSE's python is in /usr/bin).
It probably wouldn't do any harm if the manual included
a hint about the '--prefix' option and mentioned an alternative
Python installation, like:
/usr/local/bin/python ./setup.py install --prefix=/usr/local
3. After installation, I usually test the success of a library's
import by looking at version info (especially with multiple
installations, see point 2 above). However, numarray does not
seem to have version info:
Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.version
'2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
>>> sys.version_info
(2, 2, 1, 'final', 0)
>>> import Numeric
>>> import numarray
>>> numarray.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute '__version__'
>>> numarray.version
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'version'
The __doc__ string:
'numarray: The big enchilada numeric module\n\n
$Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n'
does not give a hint at the version (i.e. 0.3.4), either.
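The probing pattern itself can be written so that a missing attribute never raises. A minimal sketch (my own hypothetical helper, not anything shipped by numarray), using getattr with a default and trying the common attribute names:

```python
import math
import types

# Hypothetical helper: probe a module-like object for version info
# without triggering an AttributeError, trying common attribute names.
def probe_version(module):
    for attr in ('__version__', 'version'):
        value = getattr(module, attr, None)
        if value is not None:
            return value
    return 'unknown'

print(probe_version(math))                                        # 'unknown'
print(probe_version(types.SimpleNamespace(__version__='0.3.4')))  # '0.3.4'
```

With something like this in a test script, a library that grows a `__version__` attribute later is picked up automatically.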
Well, enough nitpicking for now I guess.
Thanks to the Numarray developers for this project,
it's much appreciated.
Eric Maryniak <e.maryniak(a)pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.
An error in the premise will appear in the conclusion.
In the "Copy on demand" discussion, the differences between ravel and flat
were discussed with regards to contiguous/non-contiguous arrays. I want to
experiment, but after looking/researching I can't figure it out: How is a
non-contiguous array created?
Space Data Corporation
Impressions so far on various issues raised regarding numarray
1) We are mostly persuaded that rank-0 arrays are the way to go.
We will pursue the issue of whether it is possible to have
Python accept these as indices for sequence objects with python-dev.
2) We are still mulling over the axis order issue. Regardless of
which convention we choose, we are almost certainly going to
make it consistent (always the same axis as default). A compatibility
module will be provided to replicate Numeric defaults.
3) repr. Finally, a consensus! Even unanimity.
4) Complex comparisons. Implement equality, inequality, and predictable
sorting. Make >, <, >=, <= illegal.
5) Copy vs view. Open to more input (but no delayed copying or such).
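For what point 4 would look like in practice, here is a sketch using modern NumPy (whose complex arrays support equality and a documented lexicographic sort order, real part first, then imaginary; whether ordering comparisons raise depends on the version, so only the first two behaviours are shown):

```python
import numpy as np

z = np.array([2+1j, 1+5j, 1+2j])
print(np.array_equal(z, z.copy()))  # True: element-wise equality is well defined
print(np.sort(z))                   # predictable lexicographic order:
                                    # [1.+2.j, 1.+5.j, 2.+1.j]
```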
Thanks for the input on k-means clustering, but the main question was
actually this... If I have the following:
for i in xrange(k):
    w[i] = average(compress(C == i, V, 0))
... can that be expressed without the Python for loop? (I.e. without
using compress etc.) I want w[i] to be the average of the vectors in
V[x] for which C[x] == i...
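One loop-free formulation, sketched here with modern NumPy rather than Numeric (the indicator matrix `M` is my own construction, not from the original post): build a k-by-n one-hot membership matrix from C, then a single matrix product gives the per-cluster sums, which divided by the cluster sizes are the averages.

```python
import numpy as np

V = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])  # data vectors
C = np.array([0, 1, 0, 1])                              # cluster index per vector
k = 2

M = (C == np.arange(k)[:, None]).astype(float)  # M[i, x] == 1 iff vector x is in cluster i
counts = M.sum(axis=1)[:, None]                 # cluster sizes (assumed nonzero)
w = (M @ V) / counts                            # row i = average of vectors in cluster i
print(w)  # [[3. 4.]
          #  [5. 6.]]
```

This replaces the whole compress/average loop with one matrix product, at the cost of materializing the k-by-n matrix M.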
Magnus Lie Hetland The Anygui Project
Python for Scientific Computing Workshop
CalTech, Pasadena, CA
September 5-6, 2002
This workshop provides a unique opportunity to learn and affect what is
happening in the realm of scientific computing with Python. Attendees will
have the opportunity to review the available tools and how they apply to
specific problems. By providing a forum for developers to share their Python
expertise with the wider industrial, academic, and research communities,
this workshop will foster collaboration and facilitate the sharing of
software components, techniques, and a vision for high-level language use
in scientific computing.
The two-day workshop will be a mix of invited talks and training sessions in
the morning. The afternoons will be breakout sessions with the intent of
getting standardization of tools and interfaces.
The cost of the workshop is $50.00 and includes 2 breakfasts and 2 lunches
on Sept. 5th and 6th, one dinner on Sept. 5th, and snacks during breaks.
There is a limit of 50 attendees. Should we exceed the limit of 50
registrants, the 50 persons selected to attend will be invited individually
by the organizers.
Discussion about the conference may be directed to the SciPy-user mailing list.
The National Biomedical Computation Resource (NBCR, SDSC, San Diego, CA)
The mission of the National Biomedical Computation Resource at the San Diego
Supercomputer Center is to conduct, catalyze, and enable biomedical research
by harnessing advanced computational technology.
The Center for Advanced Computing Research (CACR, CalTech, Pasadena, CA)
CACR is dedicated to the pursuit of excellence in the field of
high-performance computing, communication, and data engineering. Major
activities include carrying out large-scale scientific and engineering
applications on parallel supercomputers and coordinating collaborative
research projects on high-speed network technologies, distributed computing
and database methodologies, and related topics. Our goal is to help further
the state of the art in scientific computing.
Enthought, Inc. (Austin, TX)
Enthought, Inc. provides business and scientific computing solutions through
software development, consulting and training. Enthought also fosters the
development of SciPy (http://scipy.org), an open source library of
scientific tools for Python.
> Wouldn't an (almost) automatic solution be to simply replace (almost) all
> instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual
That would convert all slicing operations, even those working on
strings, lists, and user-defined sequence-type objects.
> cases (like if you heavily mix arrays and lists) you could still
I do, and I don't consider it that unusual. Anyway, even if some
function gets called only with array arguments, I don't see how a code
analyzer could detect that. So it would be...
> autoconvert by inserting ``if type(foo) == ArrayType:...``, although
typechecks for every slicing or indexing operation (plain indexing
generates a view as well for a multidimensional array). Guaranteed to
render most code unreadable, and of course slow down execution.
A further challenge for your code convertor:
f(a, b[2:3], c[-1, 1])
That makes eight type combination cases.
> Well, AFAIK there are actually three mutable sequence types in
> python core and all have copy-slicing behavior: list, UserList and
UserList is not an independent type, it is merely a subclassable
wrapper around lists. As for the array module, I haven't seen any code
that uses it.
> I would suppose that in the grand scheme of things numarray.array is intended
> as an eventual replacement for array.array, or not?
In the interest of those who rely on the current array module, I hope not.
> much "lets make it really good (where good is what *I* say) then loads of
> people will adopt it", it was more: "Numeric has a good chance to grow
> considerably in popularity over the next years, so it will be much easier to
> fix things now than later" (for slicing behavior, now is likely to be the last
I agree - except that I think it is already too late.
> The fact that matlab users are used to copy-on-demand and the fact that many
> people, (including you if I understand you correctly) think that copy-slicing
> semantics as such (without backward compatibility concerns) are preferable,
Yes, assuming that views are somehow available. But my preference is
not so strong that I consider it a sufficient reason to break lots of
code. View semantics is not a catastrophe. All of us continue to use
NumPy in spite of it, and I suspect none of us loses any sleep over
it. I have spent perhaps a few hours in total (over six years of using
NumPy) to track down view-related bugs, which makes it a minor problem
on my personal scale.
> I don't think matlab or similar alternatives make legally binding promises
> about backwards compatibility, or do they? I guess it is actually more
Of course not, software providers for the mass market take great care
not to promise anything. But if Matlab did anything as drastic as what
we are discussing, they would lose lots of paying customers.
> But reliability to me also includes the ability for growth -- I not only want
> my old code to work in a couple of years, I also want the tool I wrote it in
> to remain competitive and this can conflict with backwards-compatibility. I
In what way does the current slicing behaviour render your code
> like the balance python strikes here so far -- the language has
Me too. But there haven't been any incompatible changes in the
documented core language, and only very few in the standard library
(the to-be-abandoned re module comes to mind - anything else?).
For a bad example, see the Python XML package(s). Lots of changes,
incompatibilities between parsers, etc. The one decision I really
regret is to have chosen an XML-based solution for documentation. Now
I spend two days at every new release of my stuff to adapt the XML
code to the fashion of the day.
It is almost ironic that I appear here as the great anti-change
advocate, since in many other occasions I have argued for improvement
over excessive compatibility. Basically I favour motivated incompatible
changes, but under the condition that updating of existing code is
manageable. Changing the semantics of a type is about the worst I can
imagine in this respect.
Konrad Hinsen | E-Mail: hinsen(a)cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-22.214.171.124.24
Rue Charles Sadron | Fax: +33-126.96.36.199.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
> Using argmin it should be relatively easy to assign each vector to the
> cluster with the closest representative (using sum((x-y)**2) as the
> distance measure), but how do I calculate the new representatives
> effectively? (The representative of a cluster, e.g., 10, should be the
> average of all vectors currently assigned to that cluster.) I could
> always use a loop and then compress() the data based on cluster
> number, but I'm looking for a way of calculating all the averages
> "simultaneously", to avoid using a Python loop... I'm sure there's a
> simple solution -- I just haven't been able to think of it yet. Any ideas?
Maybe this helps (old code, may contain some suboptimal or otherwise
odd constructs):
from Numeric import *
from RandomArray import randint
return add.outer(sum(X*X,-1),sum(Y*Y,-1))- 2*dot(X,transpose(Y))
epsilon=0.001, debug=0, minit=20):
"""Computes kmeans for DATA with M centers until convergence
in the sense that relative change of the quantization error is less than
the optional RCONV (3rd param). WEGSTEIN (2nd param), by default .2 but always
between 0 and 1, stabilizes the convergence process.
EPSILON is used to guarantee centers are initially all different.
DEBUG causes some intermediate output to appear to stderr.
Returns centers and the average (squared) quantization error.
# Selecting the initial centers has to be done carefully.
# We have to ensure all of them are different, otherwise the
# algorithm below will produce empty classes.
sys.stderr.write("kmeans: Picking centers.\n")
# Pick one data item randomly
if d>epsilon: centers.append(candidate)
# Not like this, you get doubles: centers=take(data,randint(0,N,(M,)))
# Squared distances from data to centers (all pairs)
# Matrix telling which data item is closest to which center
# Compute new centers
centers = ( (1-wegstein)*(dot(transpose(x),data)/sum(x)[...,NewAxis])
# Quantization error
sys.stderr.write("%f %f %i\n" %(qerror,old_qerror,counter))
sys.stderr.write("%f None %i\n" %(qerror,counter))
return centers, qerror
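Since the fragments above lost some lines in transit, here is the same algorithm as a compact, self-contained sketch in modern NumPy (my own rewrite, not the original code; it drops the Wegstein damping and the relative-convergence test in favour of a fixed iteration count):

```python
import numpy as np

def sqdist(X, Y):
    # All-pairs squared Euclidean distances, the same add.outer trick as above:
    # |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
    return np.add.outer((X*X).sum(-1), (Y*Y).sum(-1)) - 2 * X @ Y.T

def kmeans(data, k, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as initial centers, so no class starts empty
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        labels = sqdist(data, centers).argmin(axis=1)  # nearest center per point
        for i in range(k):
            members = data[labels == i]
            if len(members):                           # keep center if class went empty
                centers[i] = members.mean(axis=0)
    qerror = sqdist(data, centers).min(axis=1).mean()  # average squared quantization error
    return centers, qerror
```

A quick usage example: `kmeans(np.array([[0.,0.],[0.,1.],[10.,10.],[10.,11.]]), 2)` recovers the two blob centers [0, 0.5] and [10, 10.5].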
I've been looking for an implementation of k-means clustering in
Python, and haven't really found anything I could use... I believe
there is one in SciPy, but I'd rather keep the required number of
packages as low as possible (already using Numeric/numarray), and
Orange seems a bit hard to install in UNIX... So, I've fiddled with
using Numeric/numarray for the purpose. Has anyone else done something
like this (or some other clustering algorithm for that matter)?
The approach I've been using (but am not completely finished with) is
to use a two-dimensional multiarray for the data (i.e. a "set" of
vectors) and a one-dimensional array with a cluster assignment for
each vector. E.g.
array([1, 2, 3, 4, 5])
array([1, 2, 4, 5, 4])
Here reps is the representative of the cluster.
Using argmin it should be relatively easy to assign each vector to the
cluster with the closest representative (using sum((x-y)**2) as the
distance measure), but how do I calculate the new representatives
effectively? (The representative of a cluster, e.g., 10, should be the
average of all vectors currently assigned to that cluster.) I could
always use a loop and then compress() the data based on cluster
number, but I'm looking for a way of calculating all the averages
"simultaneously", to avoid using a Python loop... I'm sure there's a
simple solution -- I just haven't been able to think of it yet. Any ideas?
Magnus Lie Hetland The Anygui Project
[I thought I replied yesterday, but somehow that apparently vanished.]
<Konrad Hinsen writes>:
> "Perry Greenfield" <perry(a)stsci.edu> writes:
> > Numarray has different coercion rules so that this doesn't
> > happen. Thus one doesn't need c[1,1] to give a rank-0 array.
> What are those coercion rules?
For binary operations between a Python scalar and an array, there is
no coercion performed on the array type if the scalar is of the
same kind as the array (though not necessarily the same size or
precision). For example (assuming ints happen to be 32-bit in this case):
Python Int (Int32) * Int16 array --> Int16 array
Python Float (Float64) * Float32 array --> Float32 array.
But if the Python scalar is of a higher kind, e.g., Python float
scalar with Int array, then the array is coerced to the corresponding
type of the Python scalar.
Python Float (Float64) * Int16 array --> Float64 array.
Python Complex (Complex64) * Float32 array --> Complex64 array.
Numarray basically has the same coercion rules as Numeric when two
arrays are involved, though there are some extra twists such as:
UInt16 array * Int16 array --> Int32 array
since neither input type is a proper subset of the other. (But since
Numeric doesn't (or didn't until Travis changed that) have unsigned
types, that wouldn't have been an issue with Numeric.)
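As it happens, modern NumPy ended up with promotion rules much like the ones described above; a hedged sketch (the exact result dtype of the mixed-kind case has varied between NumPy versions, so only its kind is worth relying on):

```python
import numpy as np

a16 = np.arange(3, dtype=np.int16)
f32 = np.arange(3, dtype=np.float32)
u16 = np.arange(3, dtype=np.uint16)

print((2 * a16).dtype)         # int16: same-kind Python scalar does not upcast
print((2.0 * f32).dtype)       # float32: same-kind float scalar keeps float32
print((2.0 * a16).dtype.kind)  # 'f': higher-kind scalar coerces the int array to float
print((u16 * a16).dtype)       # int32: neither uint16 nor int16 contains the other
```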
> > (if that isn't too hard to implement). Of course you get into
> > backward compatibility issues. But really, to get it right, some
> > incompatibility is necessary if you want to eliminate this particular
> > wart.
> For a big change such as Numarray, I'd accept some incompatibilities.
> For just a new version of NumPy, no. There is a lot of code out there
> that uses NumPy, and I am sure that a good part of it relies on the
> current coercion rules. Moreover, there is no simple way to detect
> code that depends on coercion rules, so adapting existing code would
> be an enormous amount of work.
Certainly. I didn't mean to minimize that. But the current coercion
rules have produced a demand for solutions to the problem of upcasting,
and I consider those solutions to be less than ideal (savespace and
rank-0 arrays). If people really are troubled by these warts, I'm
arguing that the real solution is in changing the coercion behavior.
(Yes, it would be easiest to deal with if Python had all these types,
but I think that will never happen, nor should it happen.)