From charlesr.harris at gmail.com Thu Oct 1 02:16:13 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Oct 2009 00:16:13 -0600 Subject: [Numpy-discussion] More questions on Chebyshev class. Message-ID: The Chebyshev class is now working pretty well, but I would like to settle some things up front. 1) Order in which coefficients are stored/passed/accessed. The current poly1d class ctor is called with the coefficients in high to low order, yet the __getitem__ and __setitem__ methods access them in reverse order. This seems confusing and I think both should go in the same order and my preference would be from low to high. The low to high order also works a bit better for implementation. 2) poly1d allows the size of the coefficient array to be dynamically extended. I have mixed feelings about that and would prefer not to, but there are arguments for that: students might find it easier to fool with. 3) The poly1d class prunes leading (high power) zeros. Because the Cheb class has a fit static method that returns a Cheb object, and because when fitting with Chebyshev polynomials the user often wants to see *all* of the coefficients, even if some of the leading ones are zero, the Cheb class does not automatically prune the zeros, instead there are methods for that. 4) All the attributes of the Cheb class are read/write. The poly1d class attempts to hide some, but the method used breaks the copy module. Python really doesn't have private attributes, so I left all the attributes exposed with the usual Python proviso: if you don't know what it does, don't fool with it. 5) Is Cheb the proper name for the class? Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 1 02:23:45 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Oct 2009 00:23:45 -0600 Subject: [Numpy-discussion] repr and object arrays In-Reply-To: <3d375d730909301952u35fd0475pab12f21ca4baf720@mail.gmail.com> References: <3d375d730909301952u35fd0475pab12f21ca4baf720@mail.gmail.com> Message-ID: On Wed, Sep 30, 2009 at 8:52 PM, Robert Kern wrote: > On Wed, Sep 30, 2009 at 21:45, Charles R Harris > wrote: > > Hi All, > > > > It seems that repr applied to an object array does not provide the info > > needed to recreate it: > > > > In [22]: y = array([Decimal(1)]*2) > > > > In [23]: repr(y) > > Out[23]: 'array([1, 1], dtype=object)' > > > > And of course, there is going to be a problem with arrays of more than > one > > dimension anyway. But I wonder if this should be fixed? > > Using repr() instead of str() for the items would probably be wise. > > OK, I'll open a ticket for it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav+sp at iki.fi Thu Oct 1 02:55:17 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Thu, 1 Oct 2009 06:55:17 +0000 (UTC) Subject: [Numpy-discussion] ufunc and errors References: <4AC36C79.1030202@student.matnat.uio.no> <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> Message-ID: Wed, 30 Sep 2009 10:33:46 -0500, Robert Kern wrote: [clip] >> Also, will the arguments always be named x1, x2, x3, ..., or can I >> somehow give them custom names? > > The only place where names appear is in the docstring. Write whatever > text you like. The first line of the docstring is generated by Numpy and cannot be modified.
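For example, here is a quick way to see the generated line, using frompyfunc to build a throwaway ufunc from a Python function (I'd expect C-level ufuncs to behave the same way):

>>> import numpy as np
>>> myadd = np.frompyfunc(lambda x1, x2: x1 + x2, 2, 1)
>>> print myadd.__doc__.splitlines()[0]   # the auto-generated signature line

Whatever docstring text you supply yourself ends up below that generated line.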
-- Pauli Virtanen From nadavh at visionsense.com Thu Oct 1 06:46:13 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 1 Oct 2009 12:46:13 +0200 Subject: [Numpy-discussion] Weird behaviour of scipy.signal.sepfir2d Message-ID: <710F2847B0018641891D9A21602763605AD196@ex3.envision.co.il> This function often results in incorrect output when the cpu is heavily loaded. I do not know how to trace the bug, since every "single shot" use or step-by-step trace gives the correct answer; also, running scripts under "ipython -pdb" makes the problem go away. System: >>> numpy.__version__ '1.4.0.dev7400' >>> scipy.__version__ '0.8.0.dev5922' Python 2.6.2 (64 bits) numpy/scipy are built with atlas support. OS: Gentoo linux (I use two independent machines, not clones of each other). Sorry for posting it on the numpy list, but this is not the first cross-list post anyway. Nadav From ralf.gommers at googlemail.com Thu Oct 1 12:19:01 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 12:19:01 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Sun, Sep 20, 2009 at 8:59 PM, Ralf Gommers wrote: > > > On Sun, Sep 20, 2009 at 3:49 PM, Ralf Gommers > wrote: > >> Hi, >> >> I'm done reviewing all the improved docstrings for NumPy, they can be >> merged now from the doc editor Patch page. Maybe I'll get around to doing >> the SciPy ones as well this week, but I can't promise that. >> >> Actually, scipy was a lot less work. Please merge that too. > > Sorry to ask again, but it would really be very useful to get those docstrings merged for both scipy and numpy. The scipy docs merge cleanly, anyone with commit access can do it, like this: 1. Go to http://docs.scipy.org/scipy/patch/ and log in. 2. Click on "Select OK to apply" 3. Click on "Generate patch" 4. Select all the text in the browser and save it as a patch. 5. Apply the patch, commit For numpy it is in principle the same procedure, except there are some objects that need the add_newdocs treatment. There are two types of errors; my question (mainly to Pauli) is whether they both need the same treatment or a different one. Errors: 1. source location not known, like: ERROR: numpy.broadcast.next: source location for docstring is not known 2. source location known but failed to find a place to add docstrings, like: ERROR: Source location for numpy.lib.function_base.iterable known, but failed to find a place for the docstring Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Thu Oct 1 12:26:29 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 01 Oct 2009 12:26:29 -0400 Subject: [Numpy-discussion] [SciPy-dev] Deprecate chararray [was Plea for help] In-Reply-To: <45d1ab480909301433q5ff65872o1f2edc12047d0425@mail.gmail.com> References: <857977.74958.qm@web52106.mail.re2.yahoo.com> <4ABCC0A1.30402@stsci.edu> <45d1ab480909250923t2dd716a2s502967fb231dd889@mail.gmail.com> <4AC24A0C.4090005@stsci.edu> <45d1ab480909291248j107047fld5da82eb607d5614@mail.gmail.com> <4AC3571F.6020901@stsci.edu> <4AC36924.7020605@stsci.edu> <45d1ab480909301337s3fef8564uce77ebbbba8978da@mail.gmail.com> <45d1ab480909301433q5ff65872o1f2edc12047d0425@mail.gmail.com> Message-ID: <4AC4D835.4050303@stsci.edu> Thanks. There's a bit of a snag getting my SVN permissions, and the doc editor permissions I assume are pending. Once those things are in place, I can move forward on this and hopefully it will be clear what needs to be done.
Mike David Goldsmith wrote: > On Wed, Sep 30, 2009 at 2:09 PM, Ralf Gommers > > wrote: > > > On Wed, Sep 30, 2009 at 4:37 PM, David Goldsmith > > wrote: > > So, Ralf (or anyone), how, if at all, should we modify the > status of the existing chararray objects/methods in the wiki? > > > Nothing has to be done until *after* Mike has committed his > changes to svn. Please see my previous email for what has to > happen at that point. Since Mike wrote the new docstrings it would > be best if he updated the status of the wiki pages then. > > > OK; Mike: hopefully it will be clear what you have to do to update the > status (it's pretty trivial) but of course don't hesitate to email > (you can do so off-list if you prefer) w/ any questions; > unfortunately, AFAIK, there's no way to update the status of many > docstrings all at once - you'll have to do them each individually (if > you like, let me know when you've committed them and I can help - it > sounds like there will be a lot); the main "silly" thing to remember > is that the option to change the "Review status" only appears if > you're logged in. :-) > > > Assuming you have no problem sharing them with me, Michael, > I could add those docstrings you created for the existing methods, > > > They will show up in the wiki when they get committed to svn > (presumably within a few days), so this is needless effort for the > most part. If there are different changes in the wiki and svn, > that will show up in the "merge" page. > > The ony thing that requires manual effort is if there are changes > in the wiki and the object got moved in svn. > > > And, as above, updating the status in the Wiki. :-) > > DG > > > Cheers, > Ralf > > > > DG > > > On Wed, Sep 30, 2009 at 7:20 AM, Michael Droettboom > > wrote: > > Ralf Gommers wrote: > > > > > > On Wed, Sep 30, 2009 at 9:03 AM, Michael Droettboom > > > >> wrote: > > > > In the source in my working copy. Is that going to > cause problems? I > > wasn't sure if it was possible to document methods > that didn't yet > > exist > > in the code in the wiki. > > > > That is fine. New functions will automatically show up > in the wiki. It > > would be helpful though if you could mark them ready for > review in the > > wiki (if they are) after they show up. Could take up to > 24 hours for > > svn changes to propagate. > Thanks. Will do. > > > > Only if you moved functions around it would be useful if > you pinged > > Pauli after you committed them. This is a temporary > problem, right now > > the wiki creates a new page for a moved object, and the > old content > > (if any) has to be copied over to the new page. > All of the functions that were moved were previously > without docstrings > in SVN, though some had docstrings (that I just now > discovered) in the > wiki. This may cause some hiccups, I suppose, so I'll be > sure to > announce when these things get committed to SVN so I know > how to help > straighten these things out. > > Mike > > > > Cheers, > > Ralf > > > > > > Mike > > > > David Goldsmith wrote: > > > On Tue, Sep 29, 2009 at 10:55 AM, Michael Droettboom > > > > > > > > >>> wrote: > > > > > > 2) Improve documentation > > > > > > Every method now has a docstring, and a new > page of routines > > has been > > > added to the Sphinx tree. > > > > > > > > > Um, where did you do this, 'cause it's not showing > up in the doc > > wiki. 
> > > > > > DG > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > -- > > Michael Droettboom > > Science Software Branch > > Operations and Engineering Division > > Space Telescope Science Institute > > Operated by AURA for NASA > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Michael Droettboom > Science Software Branch > Operations and Engineering Division > Space Telescope Science Institute > Operated by AURA for NASA > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From millman at berkeley.edu Thu Oct 1 12:32:06 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 1 Oct 2009 09:32:06 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 9:19 AM, Ralf Gommers wrote: > Sorry to ask again, but it would really be very useful to get those > docstrings merged for both scipy and numpy. I will do this now. Jarrod From ralf.gommers at googlemail.com Thu Oct 1 12:35:04 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 12:35:04 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 12:32 PM, Jarrod Millman wrote: > On Thu, Oct 1, 2009 at 9:19 AM, Ralf Gommers > wrote: > > Sorry to ask again, but it would really be very useful to get those > > docstrings merged for both scipy and numpy. > > I will do this now. > Jarrod > Thanks a lot! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pfeldman at verizon.net Thu Oct 1 12:55:50 2009 From: pfeldman at verizon.net (Dr. Phillip M. 
Feldman) Date: Thu, 1 Oct 2009 09:55:50 -0700 (PDT) Subject: [Numpy-discussion] difficulty with numpy.where Message-ID: <25702676.post@talk.nabble.com> I've defined the following one-line function that uses numpy.where: def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) When I try to use this function, I get an error message: In [4]: z=linspace(0,2*pi,9) In [5]: sin_half_period(z) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) The truth value of an array with more than one element is ambiguous. Use a.any () or a.all() Any suggestions will be appreciated. -- View this message in context: http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From kwgoodman at gmail.com Thu Oct 1 13:00:00 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 1 Oct 2009 10:00:00 -0700 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <25702676.post@talk.nabble.com> References: <25702676.post@talk.nabble.com> Message-ID: On Thu, Oct 1, 2009 at 9:55 AM, Dr. Phillip M. Feldman wrote: > > I've defined the following one-line function that uses numpy.where: > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > When I try to use this function, I get an error message: > > In [4]: z=linspace(0,2*pi,9) > In [5]: sin_half_period(z) > --------------------------------------------------------------------------- > ValueError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) > > The truth value of an array with more than one element is ambiguous. Use > a.any > () or a.all() > > Any suggestions will be appreciated. Take a look at this thread: http://www.nabble.com/Compound-conditional-indexing-td25686443.html From zachary.pincus at yale.edu Thu Oct 1 13:10:08 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 1 Oct 2009 13:10:08 -0400 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <25702676.post@talk.nabble.com> References: <25702676.post@talk.nabble.com> Message-ID: <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> Hello, a < b < c (or any equivalent expression) is python syntactic sugar for (a < b) and (b < c). Now, for numpy arrays, a < b gives an array with boolean True or False where the elements of a are less than those of b. So this gives us two arrays that python now wants to "and" together. To do this, python tries to convert the array "a < b" to a single True or False value, and the array "b < c" to a single True or False value, which it then knows how to "and" together. Except that "a < b" could contain many True or False elements, so how to convert them to a single one? There's no obvious way to guess -- typically, one uses "any" or "all" to convert a boolean array to a single true or false value, depending, obviously, on what one needs. So this explains the error you see, but has nothing to do with the results you desire... you need to and-together two boolean arrays *element-wise* -- which is something Python doesn't know how to do with the builtin "and" operator (which cannot be overridden). To do this, you need to use the bitwise logic operators: (a < b) & (b < c). So: def sin_half_period(x): return where((0.0 <= x) & (x <= pi), sin(x), 0.0) Zach On Oct 1, 2009, at 12:55 PM, Dr. Phillip M. 
Feldman wrote: > > I've defined the following one-line function that uses numpy.where: > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > When I try to use this function, I get an error message: > > In [4]: z=linspace(0,2*pi,9) > In [5]: sin_half_period(z) > --------------------------------------------------------------------------- > ValueError Traceback (most recent > call last) > > The truth value of an array with more than one element is ambiguous. > Use > a.any > () or a.all() > > Any suggestions will be appreciated. > -- > View this message in context: http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gokhansever at gmail.com Thu Oct 1 13:48:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 1 Oct 2009 12:48:33 -0500 Subject: [Numpy-discussion] difficulty with numpy.where In-Reply-To: <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> References: <25702676.post@talk.nabble.com> <06FE4761-5930-4160-A90A-88AD15FAE71D@yale.edu> Message-ID: <49d6b3500910011048o63a1b593qac1bdea0e69a6733@mail.gmail.com> On Thu, Oct 1, 2009 at 12:10 PM, Zachary Pincus wrote: > Hello, > > a < b < c (or any equivalent expression) is python syntactic sugar for > (a < b) and (b < c). > > Now, for numpy arrays, a < b gives an array with boolean True or False > where the elements of a are less than those of b. So this gives us two > arrays that python now wants to "and" together. To do this, python > tries to convert the array "a < b" to a single True or False value, > and the array "b < c" to a single True or False value, which it then > knows how to "and" together. Except that "a < b" could contain many > True or False elements, so how to convert them to a single one? > There's no obvious way to guess -- typically, one uses "any" or "all" > to convert a boolean array to a single true or false value, depending, > obviously, on what one needs. > > So this explains the error you see, but has nothing to do with the > results you desire... you need to and-together two boolean arrays > *element-wise* -- which is something Python doesn't know how to do > with the builtin "and" operator (which cannot be overridden). To do > this, you need to use the bitwise logic operators: > (a < b) & (b < c). > > So: > > def sin_half_period(x): return where((0.0 <= x) & (x <= pi), sin(x), > 0.0) > > Zach > > > Very well expressed Zach. The reason that I wanted to use this kind of conditional indexing is as follows: I have a dataset with a main time-variable and various other measurement results including some atmospheric data (cloud microphysics in particular). In one instance of this dataset I have 8000 something rows for each of the variables in the file. We wanted to segment the cloud droplet concentration data only for a certain time-window (only if a measurement was done at cloud base conditions.) We have a-priori knowledge of this time-window; the only other thing to do is to conditionally index our cloud drop concentration with this window. Putting it in more technical terms: time = 40000 to 48000 a numpy array, conc = 300 to 500 numpy array with 8000 elements. Say that cloud bases occur between 45000 and 45400, and I am only interested in analysing that portion of the data.
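A toy version of the selection, with made-up numbers just to show the element-wise & mask (hypothetical values, not our real data):

import numpy as np
time = np.linspace(40000, 48000, 8000)       # main time variable
conc = np.random.uniform(300, 500, 8000)     # droplet concentration
cloud_base = conc[(time > 45000) & (time < 45400)]  # keep only the cloud-base window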
Do a boxplot, or even be fancier and make violin plots out of this section :) So I do: conc[(time>45000) & (time<45400)] Voila! > > On Oct 1, 2009, at 12:55 PM, Dr. Phillip M. Feldman wrote: > > > > > I've defined the following one-line function that uses numpy.where: > > > > def sin_half_period(x): return where(0.0 <= x <= pi, sin(x), 0.0) > > > > When I try to use this function, I get an error message: > > > > In [4]: z=linspace(0,2*pi,9) > > In [5]: sin_half_period(z) > > > --------------------------------------------------------------------------- > > ValueError Traceback (most recent > > call last) > > > > The truth value of an array with more than one element is ambiguous. > > Use > > a.any > > () or a.all() > > > > Any suggestions will be appreciated. > > -- > > View this message in context: > http://www.nabble.com/difficulty-with-numpy.where-tp25702676p25702676.html > > Sent from the Numpy-discussion mailing list archive at Nabble.com. > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Thu Oct 1 14:26:56 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Thu, 1 Oct 2009 11:26:56 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: OK, I've checked in the scipy doc improvements: http://projects.scipy.org/scipy/changeset/5954 http://projects.scipy.org/scipy/changeset/5955 Thanks to everyone who contributed! I will merge the numpy docs later today. Is there anything else I need to do on the SciPy documentation editor to indicate that I've merged the changes, or will it update itself? Best, Jarrod From Klaus.Noekel at gmx.de Thu Oct 1 14:43:36 2009 From: Klaus.Noekel at gmx.de (Klaus Noekel) Date: Thu, 01 Oct 2009 20:43:36 +0200 Subject: [Numpy-discussion] Windows 64-bit Message-ID: <4AC4F858.3000108@gmx.de> Hi all, at the end of July David answered my question about future 64-bit Windows support as follows: "There were some discussion about pushing 1.4.0 'early', but instead, I think we let it slipped - one consequence is that there will be enough time for 1.4.0 to be released with proper AMD64 support on windows. The real issue is not numpy per-se, but making scipy work on top of numpy in 64 bits mode. It is hard to give an exact date as to when those issues will be fixed, but it is being worked on." As our project needs 64-bit numpy under Windows quite soon, I am curious about the state of the project: - Is a stable 64-bit Windows installer (with or without numpy 1.4.0) going to be released anytime SOON? - We need only numpy, not scipy. Does that imply that we have a good chance of producing an install ourselves with the current sources? I am a bit concerned, because earlier posts indicated that the issue is not trivial. Or do all the hard aspects have to do with scipy? Thanks for an update!
Cheers, Klaus Nökel From ralf.gommers at googlemail.com Thu Oct 1 15:20:51 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 1 Oct 2009 15:20:51 -0400 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: On Thu, Oct 1, 2009 at 2:26 PM, Jarrod Millman wrote: > OK, I've checked in the scipy doc improvements: > http://projects.scipy.org/scipy/changeset/5954 > http://projects.scipy.org/scipy/changeset/5955 > > Thanks again Jarrod! > Thanks to everyone who contributed! I will merge the numpy docs later > today. > > Is there anything else I need to do on the SciPy documentation editor > to indicate that I've merged the changes, or will it update itself? > That should be all I think. Changes should show up in the wiki within a day, at which point the "diff to svn" is empty and the updated docstrings should disappear from the patch page. Cheers, Ralf > Best, > Jarrod > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rowen at uw.edu Thu Oct 1 15:55:19 2009 From: rowen at uw.edu (Russell E. Owen) Date: Thu, 01 Oct 2009 12:55:19 -0700 Subject: [Numpy-discussion] More questions on Chebyshev class. References: Message-ID: In article , Charles R Harris wrote: > The Chebyshev class is now working pretty well, but I would like to settle > some things up front. > > 1) Order in which coefficients are stored/passed/accessed. > > The current poly1d class ctor is called with the coefficients in high to low > order, yet the __getitem__ and __setitem__ methods access them in reverse > order. This seems confusing and I think both should go in the same order and > my preference would be from low to high. The low to high order also works a > bit better for implementation. This sounds like a very useful change to me. > 2) poly1d allows the size of the coefficient array to be dynamically > extended. I have mixed feelings about that and would prefer not to, but there > are arguments for that: students might find it easier to fool with. If it's easy to make a new instance that copies the old coefficients and allows the user to add new ones or trim some high order terms, then surely that suffices and you need not support resizing the coefficient array? > 3) The poly1d class prunes leading (high power) zeros. Because the Cheb > class has a fit static method that returns a Cheb object, and because when > fitting with Chebyshev polynomials the user often wants to see *all* of the > coefficients, even if some of the leading ones are zero, the Cheb class does > not automatically prune the zeros, instead there are methods for that. Will this be affected if you list the coefficients low to high, as you recommend in (1), or make the coefficient list not resizable as per (2)? Certainly it seems much safer to elide trailing zeros, rather than leading zeros. In any case, I agree with you that manually trimming sounds safer than automatically trimming. > 4) All the attributes of the Cheb class are read/write. The poly1d class > attempts to hide some, but the method used breaks the copy module. Python > really doesn't have private attributes, so I left all the attributes exposed > with the usual Python proviso: if you don't know what it does, don't fool > with it. > > 5) Is Cheb the proper name for the class? I suggest spelling it out: Chebyshev.
Explicit is better than implicit and it doesn't save that much typing. (Failing that, I suggest at least including the Y -- I think Cheby is clearer than Cheb). -- Russell From cournape at gmail.com Thu Oct 1 20:33:32 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 2 Oct 2009 09:33:32 +0900 Subject: [Numpy-discussion] Windows 64-bit In-Reply-To: <4AC4F858.3000108@gmx.de> References: <4AC4F858.3000108@gmx.de> Message-ID: <5b8d13220910011733w221f65ata82381a0bedaaa9@mail.gmail.com> On Fri, Oct 2, 2009 at 3:43 AM, Klaus Noekel wrote: > - We need only numpy, not scipy. Does that imply that we have a good > chance of producing an install ourselves with the current sources? The current sources can be compiled by visual studio in 64 bits mode without problem and should be quite stable - you won't have a fast blas/lapack, though, David From bsouthey at gmail.com Fri Oct 2 10:34:39 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 02 Oct 2009 09:34:39 -0500 Subject: [Numpy-discussion] Question about improving genfromtxt errors In-Reply-To: References: <4AC0E740.60309@noaa.gov> <4AC237BA.9050104@noaa.gov> <4AC24A7F.4020904@gmail.com> <00CD2A47-A721-4A46-8D81-203BE795A6E4@gmail.com> <4AC26FEB.60401@gmail.com> <4AC38DD4.9000609@gmail.com> Message-ID: <4AC60F7F.1050806@gmail.com> On 09/30/2009 12:44 PM, Skipper Seabold wrote: > On Wed, Sep 30, 2009 at 12:56 PM, Bruce Southey wrote: > >> On 09/30/2009 10:22 AM, Skipper Seabold wrote: >> >>> On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey wrote: >>> >>> >>> >>>> Hi, >>>> The first case just has to handle a missing delimiter - actually I expect >>>> that most of my cases would relate to this. So here is simple Python code to >>>> generate an arbitrarily large list with the occasional missing delimiter. >>>> >>>> I set it so it reads the desired number of rows and frequency of bad rows >>>> from the linux command line. >>>> $time python tbig.py 1000000 100000 >>>> >>>> If I comment out the extra prints in io.py that I put in, it takes about 22 >>>> seconds to finish if the delimiters are correct. If I have the missing >>>> delimiter it takes 20.5 seconds to crash. >>>> >>>> >>>> Bruce >>>> >>>> >>>> >>> I think this would actually cover most of the problems I was running >>> into. The only other one I can think of is when I used a converter >>> that I thought would work, but it got unexpected data. For example, >>> >>> from StringIO import StringIO >>> import numpy as np >>> >>> strip_rand = lambda x : float(('r' in x.lower() and x.split()[-1]) or >>> (not 'r' in x.lower() and x.strip() or 0.0)) >>> >>> # Example usage >>> strip_rand('R 40') >>> strip_rand(' ') >>> strip_rand('') >>> strip_rand('40') >>> >>> strip_per = lambda x : float(('%' in x.lower() and x.split()[0]) or >>> (not '%' in x.lower() and x.strip() or 0.0)) >>> >>> # Example usage >>> strip_per('7 %') >>> strip_per('7') >>> strip_per(' ') >>> strip_per('') >>> >>> # Unexpected usage >>> strip_per('R 1') >>> >>> >> Does this work for you? >> I get an: >> ValueError: invalid literal for float(): R 1 >> >> > No, that's the idea. Sorry this was a bit opaque. > > >> >>> s = StringIO('D01N01,10/1/2003 ,1 %,R 75,400,600\r\nL24U05,12/5/2003\ >>> ,2 %,1,300, 150.5\r\nD02N03,10/10/2004 ,R 1,,7,145.55') >>> >>> >> Can you provide the correct line before the bad line? >> It just makes it easy to understand why a line is bad.
>> >> > The idea is that I have a column, which I expect to be percentages, > but these are coded in by different data collectors, so some code a 0 > for 0, some just leave it missing which could just as well be 0, some > use the %. What I didn't expect was that some put in a money amount, > hence the 'R 7', which my converter doesn't catch. > > >>> data = np.genfromtxt(s, converters = {2 : strip_per, 3 : strip_rand}, >>> delimiter=",", dtype=None) >>> >>> I don't have a clean install right now, but I think this returned a >>> converter is locked for upgrading error. I would just like to know >>> where the problem occured (line and column, preferably not >>> zero-indexed), so I can go and have a look at my data. >>> >>> >> I rather limited understanding here. I think the problem is that Python >> is raising a ValueError because your strip_per() is wrong. It is not >> informative to you because _iotools.py is not aware that an invalid >> converter will raise a ValueError. Therefore there needs to be some way >> to test that the converter is correct or not. >> >> > _iotools does catch this I believe, though I don't understand the > upgrading and locking properly. The kludgy fix that I provided in the > first post "I do not report the error from > _iotools.StringConverter...", catches that an error is raised from > _iotools and tells me exactly where the converter fails, so I can go > to, say line 750,000 column 250 (and converter with key 249) instead > of not knowing anything except that one of my ~500 converters failed > somewhere in a 1 million line data file. If you still want to keep > the error messages from _iotools.StringConverter, then they maybe they > could have a (%s, %s) added and then this can be filled in in > genfromtxt when you know (line, column) or something similar as was > kind of suggested in a post in this thread I believe. Then again, > this might not be possible. I haven't tried. > > I added another patch to ticket 1212 http://projects.scipy.org/numpy/ticket/1212 I tried to rework my first patch because I had forgotten that the header of the file that I was using was missing a delimiter. (Something I need to investigate more.) Hopefully it helps towards a better solution. I added a try/except block around the 'converter.upgrade(item)' line which appears to provide the results for your file. While not the best solution. In addition, I modified the loop to enumerate the converter list so I could find which one in the list fails. The output for your example: Row Number: 3 Failed Converter 2 in list of converters [('D01N01', '10/1/2003 ', 1.0, 75.0, 400, 600.0) ('L24U05', '12/5/2003', 2.0, 1.0, 300, 150.5) ('D02N03', '10/10/2004 ', 0.0, 0.0, 7, 145.55000000000001)] >> This this case I think it is the delimiter so checking the column >> numbers should occur before the application of the converter to that row. >> >> > Sometimes it was the case where I had an extra comma in a number 1,000 > say and then the converter tried to work on the wrong column, and > sometimes it was because my converter didn't cover every use case, > because I didn't know it yet. Either way, I just needed a gentle > nudge in the right direction. > > If that doesn't clear up what I was after, I can try to provide a more > detailed code sample. > > Skipper > _______________________________________________ > I do not see how to write code to determine when a delimiter has more than one meaning. 
While there are more columns than expected, it can be very hard to determine which column is incorrect without additional information. We might be able to do that if we associate a format with a column. But then you would have to split columns one by one and check each one as you do so. Probably not hard to do but a lot of work to validate it. For example, I have numerous problems with dates in SAS because you have 2 or 4 digit years, 1 or 2 digit days and months. But any variation from what is expected leads to errors if it expects 2 digit years and gets a 4 digit year. So I usually read dates as strings and then parse them as I want. Bruce From josef.pktd at gmail.com Fri Oct 2 13:08:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 13:08:46 -0400 Subject: [Numpy-discussion] poly class question Message-ID: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Is there a way in numpy (or scipy) to get an infinite expansion for the inverse of a polynomial (for a finite number of terms) np.poly1d([ -0.8, 1])**(-1) application for example the MA representation of an AR(1) and fractional powers np.poly1d([ -1, 1])**0.5 this is useful for fractionally integrated time series, e.g. ARFIMA Until now I did this directly or using scipy.signal, but I thought maybe the polynomial class would handle some of it, both examples raise exceptions. Josef From charlesr.harris at gmail.com Fri Oct 2 13:30:10 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:30:10 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:08 AM, wrote: > Is there a way in numpy (or scipy) to get an infinite expansion for > the inverse of a polynomial (for a finite number of terms) > > np.poly1d([ -0.8, 1])**(-1) > > application for example the MA representation of an AR(1) > > Hmm, I've been working on a chebyshev class and division of a scalar by a chebyshev series is expressly forbidden, but it could be included if a good interface is proposed. Same would go for polynomials. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 2 13:33:00 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:33:00 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:08 AM, wrote: > >> Is there a way in numpy (or scipy) to get an infinite expansion for >> the inverse of a polynomial (for a finite number of terms) >> >> np.poly1d([ -0.8, 1])**(-1) >> >> application for example the MA representation of an AR(1) >> >> > Hmm, I've been working on a chebyshev class and division of a scalar by a > chebyshev series is > expressly forbidden, but it could be included if a good interface is > proposed. Same would go for polynomials. > In fact it isn't hard to get, for poly1d you should be able to multiply the series by a power of x to shift it left, then divide. Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Fri Oct 2 13:35:48 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 11:35:48 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >> >>> Is there a way in numpy (or scipy) to get an infinite expansion for >>> the inverse of a polynomial (for a finite number of terms) >>> >>> np.poly1d([ -0.8, 1])**(-1) >>> >>> application for example the MA representation of an AR(1) >>> >>> >> Hmm, I've been working on a chebyshev class and division of a scalar by a >> chebyshev series is >> expressly forbidden, but it could be included if a good interface is >> proposed. Same would go for polynomials. >> > > In fact is isn't hard to get, for poly1d you should be able to multiply the > series by a power of x to shift it left, then divide. > > That is, divide a power of x by the polynomial. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 2 14:09:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 12:09:32 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>> >>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>> the inverse of a polynomial (for a finite number of terms) >>>> >>>> np.poly1d([ -0.8, 1])**(-1) >>>> >>>> application for example the MA representation of an AR(1) >>>> >>>> >>> Hmm, I've been working on a chebyshev class and division of a scalar by a >>> chebyshev series is >>> expressly forbidden, but it could be included if a good interface is >>> proposed. Same would go for polynomials. >>> >> >> In fact is isn't hard to get, for poly1d you should be able to multiply >> the series by a power of x to shift it left, then divide. >> >> > That is, divide a power of x by the polynomial. > > You will also need to reverse the denominator coefficients...Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Oct 2 14:30:57 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 14:30:57 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> Message-ID: <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris > wrote: >> >> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>>> the inverse of a polynomial (for a finite number of terms) >>>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>>> >>>>> application for example the MA representation of an AR(1) >>>>> >>>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar by >>>> a chebyshev series is >>>> expressly forbidden, but it could be included if a good interface is >>>> proposed. Same would go for polynomials. >>> >>> In fact is isn't hard to get, for poly1d you should be able to multiply >>> the series by a power of x to shift it left, then divide. >>> >> >> That is, divide a power of x by the polynomial. >> > > You will also need to reverse the denominator coefficients...Chuck That's the hint I needed. However the polynomial coefficients are then reversed and not consistent with other polynomial operations, aren't they? >>> from scipy.signal import lfilter >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]), poly1d([ 0.10737418])) >>> lfilter([1], [1,-0.8], [1] + [0]*9) array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), poly1d([-0.00159539, 0.00061389])) >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) What I meant initally doesn't necessarily mean division of a scalar. >>> np.poly1d([1])/np.poly1d([-0.8, 1]) (poly1d([ 0.]), poly1d([ 1.])) I didn't find any polynomial division that does the expansion of the remainder. The same problem, I think is inherited, by the scipy.signal.lti, and it took me a while to find the usefulness of lfilter in this case. If it were possible to extend the methods for the polynomial class to do a longer expansions, it would make them more useful for arma and lti. (in some areas, I'm still trying to figure out whether some functionality is just hidden to me, or actually a limitation of the implementation or a missing feature.) Thanks, Josef > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mjanikas at esri.com Fri Oct 2 15:16:44 2009 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 2 Oct 2009 12:16:44 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure Message-ID: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> Hello All, I was hoping you could help me out with a simple little problem I am having: I am reading data from a database that contains NULL values. 
There is more than one field being read in with equal length, but if any of them are NULL in a row, then I do NOT want to include it in my numpy structure (I.e. no records for that row across fields). As the values from each field are of the same type, I can pre-allocate the space for the entire dataset (if all were not NULL), but there may be less observations after accounting for the NULLS. So, do I use lists and append then create the arrays... Or do I fill up the pre-allocated "empty" arrays and slice off the ends? Thoughts? Thanks much... MJ Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) mjanikas at esri.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Oct 2 15:33:35 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 02 Oct 2009 12:33:35 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> Message-ID: <4AC6558F.8080400@noaa.gov> Mark Janikas wrote: > So, do I use lists and > append then create the arrays? Or do I fill up the pre-allocated ?empty? > arrays and slice off the ends? Thoughts? Thanks much? Either will work. I think the decision would be based on how many Null records you expect -- if it's a small fraction then go ahead and pre-allocate the array, if it's a large fraction, then you might want to go with a list. Note: you may be able to use arr.resize() to chop it off at the end. The list method has the downside of using more memory, and being a bit slower, which may be mitigated if there are lots of null records. See an upcoming email of mine for another option... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Fri Oct 2 15:38:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Oct 2009 13:38:53 -0600 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 12:30 PM, wrote: > On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris > wrote: > > > > > > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris > > wrote: > >> > >> > >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris > >> wrote: > >>> > >>> > >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris > >>> wrote: > >>>> > >>>> > >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: > >>>>> > >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for > >>>>> the inverse of a polynomial (for a finite number of terms) > >>>>> > >>>>> np.poly1d([ -0.8, 1])**(-1) > >>>>> > >>>>> application for example the MA representation of an AR(1) > >>>>> > >>>> > >>>> Hmm, I've been working on a chebyshev class and division of a scalar > by > >>>> a chebyshev series is > >>>> expressly forbidden, but it could be included if a good interface is > >>>> proposed. Same would go for polynomials. > >>> > >>> In fact is isn't hard to get, for poly1d you should be able to multiply > >>> the series by a power of x to shift it left, then divide. 
> >>> > >> > >> That is, divide a power of x by the polynomial. > >> > > > > You will also need to reverse the denominator coefficients...Chuck > > That's the hint I needed. However the polynomial coefficients are then > reversed and not consistent with other polynomial operations, aren't > they? > > >>> from scipy.signal import lfilter > > >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) > (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , > 0.32768 , 0.262144 , 0.2097152 , 0.16777216, > 0.13421773]), poly1d([ 0.10737418])) > > >>> lfilter([1], [1,-0.8], [1] + [0]*9) > array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , > 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) > > >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) > (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , > 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), > poly1d([-0.00159539, 0.00061389])) > >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) > array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , > 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) > > > What I meant initally doesn't necessarily mean division of a scalar. > > >>> np.poly1d([1])/np.poly1d([-0.8, 1]) > (poly1d([ 0.]), poly1d([ 1.])) > > I didn't find any polynomial division that does the expansion of the > remainder. The same problem, I think is inherited, by the > scipy.signal.lti, and it took me a while to find the usefulness of > lfilter in this case. > > If it were possible to extend the methods for the polynomial class to > do a longer expansions, it would make them more useful for arma and > lti. > > (in some areas, I'm still trying to figure out whether some > functionality is just hidden to me, or actually a limitation of the > implementation or a missing feature.) > > Could you describe the sort of problems you want to solve? There are lots of curious things out there we could maybe work with. Covariances, for instance, are closely related to Chebyshev series. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Oct 2 15:56:02 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 02 Oct 2009 22:56:02 +0300 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: References: Message-ID: <1254513362.5712.10.camel@idol> to, 2009-10-01 kello 12:19 -0400, Ralf Gommers kirjoitti: > Sorry to ask again, but it would really be very useful to get those > docstrings merged for both scipy and numpy. [clip] Numpy's new docstrings is are now in SVN too, for the most part. An amazing amount of work was done during the summer, thanks to all who participated! > > For numpy in principle the same procedure, except there are some > objects that need the add_newdocs treatment. There are two types of > errors, my question is (mainly to Pauli) if they both need the same > treatment or a different one. > > Errors: > 1. source location not known, like: > ERROR: numpy.broadcast.next: source location for docstring is not known > 2. source location known but failed to find a place to add docstrings, > like: > ERROR: Source location for numpy.lib.function_base.iterable known, > but failed to find a place for the docstring These I didn't commit yet. Mostly, they can be fixed by adding necessary entries to add_newdocs.py. However, some of these may be objects assigning docstrings to which may be technically difficult and requires larger changes. The second error may also indicate a bug in patch generation. 
-- Pauli Virtanen From d.l.goldsmith at gmail.com Fri Oct 2 16:21:02 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 2 Oct 2009 13:21:02 -0700 Subject: [Numpy-discussion] merging docs from wiki In-Reply-To: <1254513362.5712.10.camel@idol> References: <1254513362.5712.10.camel@idol> Message-ID: <45d1ab480910021321m4bd3346at25bc702f89d38c35@mail.gmail.com> Is there any way to move the existing parts of this thread (i.e., not just future posts, which of course is as simple as posting them there instead) over to scipy-dev, where it really belongs? DG On Fri, Oct 2, 2009 at 12:56 PM, Pauli Virtanen wrote: > to, 2009-10-01 kello 12:19 -0400, Ralf Gommers kirjoitti: > > Sorry to ask again, but it would really be very useful to get those > > docstrings merged for both scipy and numpy. > [clip] > > Numpy's new docstrings is are now in SVN too, for the most part. An > amazing amount of work was done during the summer, thanks to all who > participated! > > > > For numpy in principle the same procedure, except there are some > > objects that need the add_newdocs treatment. There are two types of > > errors, my question is (mainly to Pauli) if they both need the same > > treatment or a different one. > > > > Errors: > > 1. source location not known, like: > > ERROR: numpy.broadcast.next: source location for docstring is not known > > 2. source location known but failed to find a place to add docstrings, > > like: > > ERROR: Source location for numpy.lib.function_base.iterable known, > > but failed to find a place for the docstring > > These I didn't commit yet. Mostly, they can be fixed by adding necessary > entries to add_newdocs.py. However, some of these may be objects > assigning docstrings to which may be technically difficult and requires > larger changes. The second error may also indicate a bug in patch > generation. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Oct 2 16:40:03 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 2 Oct 2009 16:40:03 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> Message-ID: <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris wrote: > > > On Fri, Oct 2, 2009 at 12:30 PM, wrote: >> >> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >> > wrote: >> >> >> >> >> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >> >> wrote: >> >>> >> >>> >> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >> >>> wrote: >> >>>> >> >>>> >> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >> >>>>> >> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >> >>>>> the inverse of a polynomial (for a finite number of terms) >> >>>>> >> >>>>> np.poly1d([ -0.8, 1])**(-1) >> >>>>> >> >>>>> application for example the MA representation of an AR(1) >> >>>>> >> >>>> >> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >> >>>> by >> >>>> a chebyshev series is >> >>>> expressly forbidden, but it could be included if a good interface is >> >>>> proposed. 
Same would go for polynomials. >> >>> >> >>> In fact is isn't hard to get, for poly1d you should be able to >> >>> multiply >> >>> the series by a power of x to shift it left, then divide. >> >>> >> >> >> >> That is, divide a power of x by the polynomial. >> >> >> > >> > You will also need to reverse the denominator coefficients...Chuck >> >> That's the hint I needed. However the polynomial coefficients are then >> reversed and not consistent with other polynomial operations, aren't >> they? >> >> >>> from scipy.signal import lfilter >> >> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >> 0.13421773]), poly1d([ 0.10737418])) >> >> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >> >> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >> poly1d([-0.00159539, 0.00061389])) >> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >> >> >> What I meant initally doesn't necessarily mean division of a scalar. >> >> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >> (poly1d([ 0.]), poly1d([ 1.])) >> >> I didn't find any polynomial division that does the expansion of the >> remainder. The same problem, I think is inherited, by the >> scipy.signal.lti, and it took me a while to find the usefulness of >> lfilter in this case. >> >> If it were possible to extend the methods for the polynomial class to >> do a longer expansions, it would make them more useful for arma and >> lti. >> >> (in some areas, I'm still trying to figure out whether some >> functionality is just hidden to me, or actually a limitation of the >> implementation or a missing feature.) >> > > Could you describe the sort of problems you want to solve? There are lots of > curious things out there we could maybe work with. Covariances, for > instance, are closely related to Chebyshev series. I am working on a discrete time arma process of the form a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) what I just programmed using lfilter is x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function or moving average representation a(L)/b(L) is the autoregressive representation the extension a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also continuous d < 1 (fractional integration) a(L)/b(L), b(L)/a(L), (1-L)^(-d) or (1-L)^d (0 < d < 1) >>> from scipy import signal >>> signal.impulse(([1, -0.8],[1]), N=10) raise ValueError, "Improper transfer function." ValueError: Improper transfer function. while this works >>> signal.impulse(([1],[1, -0.8]), N=10) (It's been a while since I looked inside scipy.signal.lti) A separate issue would be the multivariate version VARMA, or MIMO in system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are 1d arrays evolving in time. But that is a different discussion. I'm not very familiar with Chebyshev polynomials, the last time I wanted to use them I didn't see anything about their use as a base for functions in several variables and gave up.
I've seen papers that use them as base for functions in one variable, but I'm not doing anything like this right now. Thanks, Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mjanikas at esri.com Fri Oct 2 19:31:33 2009 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 2 Oct 2009 16:31:33 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <4AC6558F.8080400@noaa.gov> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> <4AC6558F.8080400@noaa.gov> Message-ID: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> Thanks for the input! I wonder if I can resize my own record array? I.e. one call to truncate... Ill give it a go. But the resize works great as it doesn't make a copy: In [12]: a = NUM.arange(10) In [13]: id(a) Out[13]: 190182896 In [14]: a.resize(5,) In [15]: a Out[15]: array([0, 1, 2, 3, 4]) In [16]: id(a) Out[16]: 190182896 Whereas the slice seems to make a copy/reassign: In [18]: a = a[0:2] In [19]: id(a) Out[19]: 189981184 Pretty Nice. Pre-allocate the full space and count number of good records... then resize. Doesn't seem that much faster than using the lists then creating arrays, but memory should be better. Thanks again, and anything further would be appreciated. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Barker Sent: Friday, October 02, 2009 12:34 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Database with Nulls to Numpy Structure Mark Janikas wrote: > So, do I use lists and > append then create the arrays... Or do I fill up the pre-allocated "empty" > arrays and slice off the ends? Thoughts? Thanks much... Either will work. I think the decision would be based on how many Null records you expect -- if it's a small fraction then go ahead and pre-allocate the array, if it's a large fraction, then you might want to go with a list. Note: you may be able to use arr.resize() to chop it off at the end. The list method has the downside of using more memory, and being a bit slower, which may be mitigated if there are lots of null records. See an upcoming email of mine for another option... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Chris.Barker at noaa.gov Fri Oct 2 23:38:55 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 02 Oct 2009 20:38:55 -0700 Subject: [Numpy-discussion] Database with Nulls to Numpy Structure In-Reply-To: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> References: <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C0@redmx1.esri.com> <4AC6558F.8080400@noaa.gov> <6DF3F8F869B22C4393D67CA19A35AA0E0236E865C3@redmx1.esri.com> Message-ID: <4AC6C74F.1000203@noaa.gov> Mark Janikas wrote: > Thanks for the input! I wonder if I can resize my own record array? I.e. one call to truncate... Ill give it a go. you should be able too, yes. Be careful though, you can't call resize() if there are any other references to the array. 
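For instance, a rough, untested sketch (assuming numpy is imported as NUM,
as in your snippet -- the exact error text may differ by version):

In [20]: a = NUM.arange(10)

In [21]: b = a            # a second name now references the same array

In [22]: a.resize(5,)     # the reference check kicks in here
ValueError: cannot resize an array that has been referenced ...

In [23]: a.resize(5, refcheck=False)  # skips the check, but b may then point at stale memory

More on what's actually going on below.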
> But the resize works great as it doesn't make a copy: Actually, it's not that simple. With numpy arrays, there is the array object itself, and there is the data block that the array points to. Whn you call resize() it may make a copy of the data block (which is why it won't work if there are other references to it), while keeping the same python object. > In [12]: a = NUM.arange(10) > > In [13]: id(a) > Out[13]: 190182896 > > In [14]: a.resize(5,) > > In [15]: a > Out[15]: array([0, 1, 2, 3, 4]) > > In [16]: id(a) > Out[16]: 190182896 So this shows you have the same python object. I think there is a way to get the value of the pointer to the data block, but I dont' know off the top of my head how. > Whereas the slice seems to make a copy/reassign: > > In [18]: a = a[0:2] > > In [19]: id(a) > Out[19]: 189981184 slicing creates a new python object, but it doesn't copy the actual data: In [4]: b = a[2:5] In [5]: a is b Out[5]: False In [6]: a[2:5] = 10 In [7]: a Out[7]: array([ 0, 1, 10, 10, 10, 5, 6, 7, 8, 9]) In [8]: b Out[8]: array([10, 10, 10]) so you can see a and b are different python objects, but they share the same data block. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Sat Oct 3 03:26:49 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 00:26:49 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC6FCB9.5040107@noaa.gov> Hasi all, This idea was inspired by a discussion at SciPY, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. 
* on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of eh array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implimentation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignemt to the buffer - Object conversion -- python lists store python objects, so the python int can jsut go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From robert.kern at gmail.com Sat Oct 3 03:32:13 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 3 Oct 2009 02:32:13 -0500 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC6FCB9.5040107@noaa.gov> References: <4AC6FCB9.5040107@noaa.gov> Message-ID: <3d375d730910030032w2f5639c4p34c6de292d063335@mail.gmail.com> On Sat, Oct 3, 2009 at 02:26, Christopher Barker wrote: > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). Forgot the attachment? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Sat Oct 3 03:38:26 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 00:38:26 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC6FF72.1030308@noaa.gov> (I clicked send too early the last time -- sorry about that!) Hi all, This idea was inspired by a discussion at the SciPy conference, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. 
You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. * on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of the array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implimentation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I'm keeping the buffer a hidden variable, and slicing and __array__ return copies - this is so that it won't get multiple references, and then not be expandable. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignment to the buffer - Object conversion -- python lists store python objects, so the python int can just go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. Though a straight assignment into a pre-allocated array i faster than a list. I think it's still an improvement for memory use. Maybe it would be worth writing in C or Cython to avoid some of this. In particular, it would be nice if you could use it in Cython, and put C types directly it... * This could be pretty useful for things like genfromtxt. What do folks think? is this useful? What would you change, etc? -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Sat Oct 3 04:06:12 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 01:06:12 -0700 Subject: [Numpy-discussion] A numpy accumulator... Message-ID: <4AC705F4.1040702@noaa.gov> OK -- this one I'm intending to send! 
Hi all, This idea was inspired by a discussion at the SciPy conference, in which we spent a LOT of time during the numpy tutorial talking about how to accumulate values in an array when you don't know how big the array needs to be when you start. The "standard practice" is to accumulate in a python list, then convert the final result into an array. This is a good idea because Python lists are standard, well tested, efficient, etc. However, as was pointed out in that lengthy discussion, if what you are doing is accumulating is a whole bunch of numbers (ints, floats, whatever), or particularly if you need to accumulate a data type that plain python doesn't support, there is a lot of overhead involved: a python float type is pretty heavyweight. If performance or memory use is important, it might create issues. You can use and array.array, but it doesn't support all numpy types, particularly custom dtypes. I talked about this on the cython list (as someone asked how to do accumulate in cython), and a few folks thought it would be useful, so I put together a prototype. What I have in mind is very simple. It would be: - Only 1-d - Support append() and extend() methods - support indexing and slicing - Support any valid numpy dtype - which could even get you pseudo n-d arrays... - maybe it would act like an array in other ways, I'm not so sure. - ufuncs, etc. It could take the place of using python lists/arrays when you really want a numpy array, but don't know how big it will be until you've filled it. The implementation I have now uses a regular numpy array as the "buffer". The buffer is re-sized as needed with ndarray.resize(). I've enclosed the class, a bunch of tests (This is the first time I've ever really done test-driven development, though I wouldn't say that this is a complete test suite). A few notes about this implementation: * the name of the class could be better, and so could some of the method names. * on further thought, I think it could handle n-d arrays, as long as you only accumulated along the first index. * It could use a bunch more methods - deleting part of the array - math - probably anything supported by array.array would be good. * Robert pointed me to the array.array implementation to see how it expands the buffer as you append. It did tricks to get it to grow fast when the array is very small, then eventually to add about 1/16 of the used array size to the buffer. I imagine that this would gets used because you were likely to have a big array, so I didn't bother and start with a buffer at 128 elements, then add 1/4 each time you need to expand -- these are both tweakable attributes. * I'm keeping the buffer a hidden variable, and slicing and __array__ return copies - this is so that it won't get multiple references, and then not be expandable. * I did a little simple profiling, and discovered that it's slower than a python list by a factor of more than 2 (for accumulating python ints, anyway). With a bit of experimentation, I think that's because of a couple factors: - an extra function call -- the append() method needs to then do an assignment to the buffer - Object conversion -- python lists store python objects, so the python int can just go right in there. with numpy, it needs to be converted to a C int first -- a bit if extra overhead. Though a straight assignment into a pre-allocated array i faster than a list. I think it's still an improvement for memory use. Maybe it would be worth writing in C or Cython to avoid some of this. 
In particular, it would be nice if you could use it in Cython, and put C types directly it... * This could be pretty useful for things like genfromtxt. What do folks think? is this useful? What would you change, etc? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: accumulator.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_accumulator.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: profile.py URL: From dagss at student.matnat.uio.no Sat Oct 3 04:24:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 03 Oct 2009 10:24:44 +0200 Subject: [Numpy-discussion] ufunc and errors In-Reply-To: <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> References: <4AC36C79.1030202@student.matnat.uio.no> <3d375d730909300833o7b2db961m45d9ad26575955cf@mail.gmail.com> Message-ID: <4AC70A4C.9080405@student.matnat.uio.no> Robert Kern wrote: > On Wed, Sep 30, 2009 at 09:34, Dag Sverre Seljebotn > wrote: >> I looked and looked in the docs, but couldn't find an answer to this: >> When writing a ufunc, is it possible somehow to raise a Python exception >> (by acquiring the GIL first to raise it, set a flag and a callback which >> will be called with the GIL, or otherwise?). > > You cannot acquire the GIL inside the loop. In order to do so, you > would have to have access to the saved PyGILState_STATE which you > don't. I thought I could use PyGILState_Ensure (via Cython's "with gil" primitive): http://docs.python.org/c-api/init.html?PyGILState_Ensure (I've taken the rest of your email to heart, thanks.) > >> Or should one always use >> NaN even if the input does not make any sense (like, herhm, passing >> anything but integers or half-integers to a Wigner 3j symbol). > > You should use a NaN and ideally set the fpstatus to INVALID (creating > the NaN may or may not do this; you will have to experiment). This > will allow people to handle the issue as they wish using > numpy.seterr(). An exception for just one value out of thousands is > often undesirable. > >> I know how I'd to it manually in a wrapper w/ passed in context if not, >> but wanted to see. >> >> Also, will the arguments always be named x1, x2, x3, ..., or can I >> somehow give them custom names? > > The only place where names appear is in the docstring. Write whatever > text you like. > -- Dag Sverre From dagss at student.matnat.uio.no Sat Oct 3 12:06:32 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 03 Oct 2009 18:06:32 +0200 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <4AC77688.2040601@student.matnat.uio.no> Christopher Barker wrote: > OK -- this one I'm intending to send! > > Hi all, > > This idea was inspired by a discussion at the SciPy conference, in which > we spent a LOT of time during the numpy tutorial talking about how to > accumulate values in an array when you don't know how big the array > needs to be when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. 
This is a good idea because Python lists > are standard, well tested, efficient, etc. > > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of the array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implementation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I'm keeping the buffer a hidden variable, and slicing and __array__ > return copies - this is so that it won't get multiple references, and > then not be expandable. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignment to the buffer > - Object conversion -- python lists store python objects, so the > python int can just go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. Though a straight > assignment into a pre-allocated array i faster than a list. > > I think it's still an improvement for memory use. > > Maybe it would be worth writing in C or Cython to avoid some of this. In > particular, it would be nice if you could use it in Cython, and put C > types directly it... > > * This could be pretty useful for things like genfromtxt. > > What do folks think? is this useful? What would you change, etc? 
I'd drop the __getslice__ as it is deprecated (in Python 3 it is removed). Slices will be passed as "slice" objects to __getitem__ if you don't provide __getslice__. One could support myaccumulator[[1,2,3]] as well in __getitem__, although I guess it gets a little hairy as you must seek through the array-like object passed and see to it that no values are too large. -- Dag Sverre From gokhansever at gmail.com Sat Oct 3 12:04:36 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sat, 3 Oct 2009 11:04:36 -0500 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC6FCB9.5040107@noaa.gov> References: <4AC6FCB9.5040107@noaa.gov> Message-ID: <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> On Sat, Oct 3, 2009 at 2:26 AM, Christopher Barker wrote: > Hasi all, > > This idea was inspired by a discussion at SciPY, in which we spent a > LOT of time during the numpy tutorial talking about how to accumulate > values in an array when you don't know how big the array needs to be > when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. This is a good idea because Python lists > are standard, well tested, efficient, etc. > > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > Thanks for working on this. This append() method is a very handy for me, when working with lists. It is exiting to hear that it will be ported to ndarrays as well. Any plans for insert() ? > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of eh array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implimentation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. 
I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignemt to the buffer > - Object conversion -- python lists store python objects, so the > python int can jsut go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Sat Oct 3 18:08:38 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 03 Oct 2009 15:08:38 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> References: <4AC6FCB9.5040107@noaa.gov> <49d6b3500910030904v208b96e2w3b6b61fbc42718ab@mail.gmail.com> Message-ID: <4AC7CB66.3020705@noaa.gov> G?khan Sever wrote: > Thanks for working on this. This append() method is a very handy for me, > when working with lists. It is exiting to hear that it will be ported to > ndarrays as well. not exactly ported -- this will be a special, limited-use class. > Any plans for insert() ? I wouldn't say I have any plans at all -- but yes, insert() would be good. Dag Sverre Seljebotn wrote: > I'd drop the __getslice__ as it is deprecated (in Python 3 it is > removed). Slices will be passed as "slice" objects to __getitem__ if you > don't provide __getslice__. I noticed that, but didn't know about the deprecation -- I'll refactor that. > One could support myaccumulator[[1,2,3]] as well in __getitem__, good idea. > although I guess it gets a little hairy as you must seek through the > array-like object passed and see to it that no values are too large. well, it wouldn't hard, though it might be slow...I'll give it a try and see how it works out. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Sat Oct 3 21:12:29 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Oct 2009 19:12:29 -0600 Subject: [Numpy-discussion] rc1 of the chebyshev module. Message-ID: Attached is the first rc of the chebyshev module. The module documentation is not yet complete and no doubt the rest of the documentation needs to be reviewed. The tests cover basic functionality at this point but need to be extended to cover the Chebyshev object. Nevertheless, the module should be usable. 
Note that the most convenient way to do the least squared fits is with the static method Chebyshev.fit, which will return a Chebyshev object that contains both the resulting Chebyshev series and its domain. Some naming questions remain. ISTM that "lstsq" or "leastsq" might be a better name than fit. Likewise, I have kept the poly1d names "deriv" and "integ", but "der" and "int" might be more appropriate. Operators behave as expected for +, -, and * but there is no truedivision unless both operands can be interpreted as scalars. When division hasn't been imported from __future__, the / and // operators are both floordivision and % returns the remainder. Divmod behaves as expected. Any feedback is welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: chebyshev.zip Type: application/zip Size: 10861 bytes Desc: not available URL: From charlesr.harris at gmail.com Sat Oct 3 22:18:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Oct 2009 20:18:53 -0600 Subject: [Numpy-discussion] rc1 of the chebyshev module. In-Reply-To: References: Message-ID: On Sat, Oct 3, 2009 at 7:12 PM, Charles R Harris wrote: > Attached is the first rc of the chebyshev module. The module documentation > is not yet complete and no doubt the rest of the documentation needs to be > reviewed. The tests cover basic functionality at this point but need to be > extended to cover the Chebyshev object. Nevertheless, the module should be > usable. > > Note that the most convenient way to do the least squared fits is with the > static method Chebyshev.fit, which will return a Chebyshev object that > contains both the resulting Chebyshev series and its domain. > > Some naming questions remain. ISTM that "lstsq" or "leastsq" might be a > better name than fit. Likewise, I have kept the poly1d names "deriv" and > "integ", but "der" and "int" might be more appropriate. > > Operators behave as expected for +, -, and * but there is no truedivision > unless both operands can be interpreted as scalars. When division hasn't > been imported from __future__, the / and // operators are both floordivision > and % returns the remainder. Divmod behaves as expected. > > Any feedback is welcome. > > And an updated test to reflect changes in treatment of leading zeros. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_chebyshev.py Type: text/x-python Size: 6571 bytes Desc: not available URL: From faltet at pytables.org Mon Oct 5 04:53:02 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 5 Oct 2009 10:53:02 +0200 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <200910051053.05259.faltet@pytables.org> A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: > OK -- this one I'm intending to send! > > Hi all, > > This idea was inspired by a discussion at the SciPy conference, in which > we spent a LOT of time during the numpy tutorial talking about how to > accumulate values in an array when you don't know how big the array > needs to be when you start. > > The "standard practice" is to accumulate in a python list, then convert > the final result into an array. This is a good idea because Python lists > are standard, well tested, efficient, etc. 
> > However, as was pointed out in that lengthy discussion, if what you are > doing is accumulating is a whole bunch of numbers (ints, floats, > whatever), or particularly if you need to accumulate a data type that > plain python doesn't support, there is a lot of overhead involved: a > python float type is pretty heavyweight. If performance or memory use is > important, it might create issues. You can use and array.array, but it > doesn't support all numpy types, particularly custom dtypes. > > I talked about this on the cython list (as someone asked how to do > accumulate in cython), and a few folks thought it would be useful, so I > put together a prototype. > > What I have in mind is very simple. It would be: > - Only 1-d > - Support append() and extend() methods > - support indexing and slicing > - Support any valid numpy dtype > - which could even get you pseudo n-d arrays... > - maybe it would act like an array in other ways, I'm not so sure. > - ufuncs, etc. > > It could take the place of using python lists/arrays when you really > want a numpy array, but don't know how big it will be until you've > filled it. > > The implementation I have now uses a regular numpy array as the > "buffer". The buffer is re-sized as needed with ndarray.resize(). I've > enclosed the class, a bunch of tests (This is the first time I've ever > really done test-driven development, though I wouldn't say that this is > a complete test suite). > > A few notes about this implementation: > > * the name of the class could be better, and so could some of the > method names. > > * on further thought, I think it could handle n-d arrays, as long as > you only accumulated along the first index. > > * It could use a bunch more methods > - deleting part of the array > - math > - probably anything supported by array.array would be good. > > * Robert pointed me to the array.array implementation to see how it > expands the buffer as you append. It did tricks to get it to grow fast > when the array is very small, then eventually to add about 1/16 of the > used array size to the buffer. I imagine that this would gets used > because you were likely to have a big array, so I didn't bother and > start with a buffer at 128 elements, then add 1/4 each time you need to > expand -- these are both tweakable attributes. > > * I'm keeping the buffer a hidden variable, and slicing and __array__ > return copies - this is so that it won't get multiple references, and > then not be expandable. > > * I did a little simple profiling, and discovered that it's slower > than a python list by a factor of more than 2 (for accumulating python > ints, anyway). With a bit of experimentation, I think that's because of > a couple factors: > - an extra function call -- the append() method needs to then do an > assignment to the buffer > - Object conversion -- python lists store python objects, so the > python int can just go right in there. with numpy, it needs to be > converted to a C int first -- a bit if extra overhead. Though a straight > assignment into a pre-allocated array i faster than a list. > > I think it's still an improvement for memory use. > > Maybe it would be worth writing in C or Cython to avoid some of this. In > particular, it would be nice if you could use it in Cython, and put C > types directly it... > > * This could be pretty useful for things like genfromtxt. > > What do folks think? is this useful? What would you change, etc? That's interesting. 
I'd normally use the `resize()` method for what you want, but indeed your approach is way more easy-to-use. If you are looking for performance improvements, I'd have a look at the `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). It seems to me that the zero-initialization of added memory can be skipped, allowing for more performance for the `resize()` method (most specially for large size increments). A new parameter (say, ``zero_init=True``) could be added to `resize()` to specify that you don't want the memory initialized. -- Francesc Alted From sebastian.walter at gmail.com Mon Oct 5 05:37:39 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 5 Oct 2009 11:37:39 +0200 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> Message-ID: On Fri, Oct 2, 2009 at 10:40 PM, wrote: > On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris > wrote: >> >> >> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>> >>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>> > wrote: >>> >> >>> >> >>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>> >> wrote: >>> >>> >>> >>> >>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>> >>> wrote: >>> >>>> >>> >>>> >>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>> >>>>> >>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>> >>>>> the inverse of a polynomial (for a finite number of terms) >>> >>>>> >>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>> >>>>> >>> >>>>> application for example the MA representation of an AR(1) >>> >>>>> >>> >>>> >>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >>> >>>> by >>> >>>> a chebyshev series is >>> >>>> expressly forbidden, but it could be included if a good interface is >>> >>>> proposed. Same would go for polynomials. >>> >>> >>> >>> In fact is isn't hard to get, for poly1d you should be able to >>> >>> multiply >>> >>> the series by a power of x to shift it left, then divide. >>> >>> >>> >> >>> >> That is, divide a power of x by the polynomial. >>> >> >>> > >>> > You will also need to reverse the denominator coefficients...Chuck >>> >>> That's the hint I needed. However the polynomial coefficients are then >>> reversed and not consistent with other polynomial operations, aren't >>> they? >>> >>> >>> from scipy.signal import lfilter >>> >>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >>> 0.13421773]), poly1d([ 0.10737418])) >>> >>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>> >>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >>> poly1d([-0.00159539, 0.00061389])) >>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >>> >>> >>> What I meant initally doesn't necessarily mean division of a scalar. 
>>> >>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>> (poly1d([ 0.]), poly1d([ 1.])) >>> >>> I didn't find any polynomial division that does the expansion of the >>> remainder. The same problem, I think is inherited, by the >>> scipy.signal.lti, and it took me a while to find the usefulness of >>> lfilter in this case. >>> >>> If it were possible to extend the methods for the polynomial class to >>> do a longer expansions, it would make them more useful for arma and >>> lti. >>> >>> (in some areas, I'm still trying to figure out whether some >>> functionality is just hidden to me, or actually a limitation of the >>> implementation or a missing feature.) >>> >> >> Could you describe the sort of problems you want to solve? There are lots of >> curious things out there we could maybe work with. Covariances, for >> instance, are closely related to Chebyshev series. > > I am working on a discrete time arma process of the form > > a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) > > what I just programmed using lfilter is > x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function > or moving average representation > > a(L)/b(L) is the autoregressive representation > > the extension > a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also > continuous d <1 (fractional integration) > > a(L)/b(L), b(L)/a(L) (1-L)^(-d) or (1-L)^d (0 dimensional lag polynomials in the general case. > Initially I was looking for an easy way to do these calculation as polynomials. > (The fractional case (1-L)^d (0 and I just looked it up today, but is a popular model class in > econometrics, fractionally integrated arma processes) > > multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) > (with reversed poly coefficient scipy.signal I think) > > the functions in scipy.signal for lti are only for continuous time > processes and use poly1d under the hood, which means for example > >>>> from scipy import signal >>>> signal.impulse(([1, -0.8],[1]), N=10) > raise ValueError, "Improper transfer function." > ValueError: Improper transfer function. > > while this works >>>> signal.impulse(([1],[1, -0.8]), N=10) > > (It's been a while since I looked inside scipy.signal.lti) > > A separate issue would be the multivariate version VARMA, or MIMO in > system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are > 1d arrays evolving in time. > But that is a different discussion. I'm working on something that requires truncated univariate/multivariate operations on scalar, vector and matrix polynomials. I'm pretty sure I'm doing something different than what is used in MIMO. Still, do you have a good reference for implementation/complexity/stability of operations on multivariate (matrix) polynomials? Want to make sure I'm not reinventing the wheel ;) > > I'm not very familiar with Chebychev polynomials, the last time I > wanted to use them I didn't see anything about their use as a base for > functions in several variables and gave up. I've seen papers that use > them as base for functions in one variable, but I'm not doing anything > like this right now. 
> > Thanks, > > Josef > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From denis-bz-py at t-online.de Mon Oct 5 05:55:26 2009 From: denis-bz-py at t-online.de (denis bzowy) Date: Mon, 5 Oct 2009 09:55:26 +0000 (UTC) Subject: [Numpy-discussion] Convert data into rectangular grid References: Message-ID: jah gmail.com> writes: > Thanks all.? Robert, griddata is exactly what I was looking for.? David, I think that should work too.? And Denis, griddata is sufficiently fast that I am not complaining---contouring about 1e6 or 1e7 points typically. > Fyinfo, take a look at http://yt.enzotools.org "YT is an analysis and visualization system written in Python, designed for use with Adaptive Mesh Refinement codes ..." I haven't used it, but the doc and pictures are terrific, top 2 % or better From pearu.peterson at gmail.com Mon Oct 5 06:47:33 2009 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Mon, 05 Oct 2009 13:47:33 +0300 Subject: [Numpy-discussion] ANN: a journal paper about F2PY has been published Message-ID: <4AC9CEC5.6050705@cens.ioc.ee> -------- Original Message -------- Subject: [f2py] ANN: a journal paper about F2PY has been published Date: Mon, 05 Oct 2009 11:52:20 +0300 From: Pearu Peterson Reply-To: For users of the f2py program To: For users of the f2py program Hi, A journal paper about F2PY has been published in International Journal of Computational Science and Engineering: Peterson, P. (2009) 'F2PY: a tool for connecting Fortran and Python programs', Int. J. Computational Science and Engineering. Vol.4, No. 4, pp.296-305. So, if you would like to cite F2PY in a paper or presentation, using this reference is recommended. Interscience Publishers will update their web pages with the new journal number within few weeks. A softcopy of the article available in my homepage: http://cens.ioc.ee/~pearu/papers/IJCSE4.4_Paper_8.pdf Best regards, Pearu _______________________________________________ f2py-users mailing list f2py-users at cens.ioc.ee http://cens.ioc.ee/mailman/listinfo/f2py-users From denis-bz-py at t-online.de Mon Oct 5 08:53:03 2009 From: denis-bz-py at t-online.de (denis bzowy) Date: Mon, 5 Oct 2009 12:53:03 +0000 (UTC) Subject: [Numpy-discussion] numpy-discussion in google groups ? Message-ID: Folks, http://groups.google.com/group/numpy-discussion -> The group named numpy-discussion has been removed because it violated Google's Terms Of Service however scipy-user is there; how come ? I like google groups for its viewer, otherwise don't care much. What mail / group viewer do experts use ? cheers -- denis From paul at rudin.co.uk Mon Oct 5 09:00:39 2009 From: paul at rudin.co.uk (Paul Rudin) Date: Mon, 05 Oct 2009 14:00:39 +0100 Subject: [Numpy-discussion] numpy-discussion in google groups ? References: Message-ID: <87ljjq572g.fsf@rudin.co.uk> denis bzowy writes: > Folks, > http://groups.google.com/group/numpy-discussion > -> > The group named numpy-discussion has been removed because it violated Google's > Terms Of Service > > however scipy-user is there; how come ? > > I like google groups for its viewer, otherwise don't care much. > What mail / group viewer do experts use ? I don't think the "expert" bit applies, but I read via NNTP from gmane. 
From josef.pktd at gmail.com Mon Oct 5 09:44:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Oct 2009 09:44:42 -0400 Subject: [Numpy-discussion] numpy-discussion in google groups ? In-Reply-To: <87ljjq572g.fsf@rudin.co.uk> References: <87ljjq572g.fsf@rudin.co.uk> Message-ID: <1cd32cbb0910050644o26ab84d9w5f5d89c62b5f79f4@mail.gmail.com> On Mon, Oct 5, 2009 at 9:00 AM, Paul Rudin wrote: > > denis bzowy writes: > >> Folks, >> ? http://groups.google.com/group/numpy-discussion >> -> >> The group named numpy-discussion has been removed because it violated Google's >> Terms Of Service >> >> however scipy-user is there; how come ? I also used the google groups for numpy-discussion for a long time. It had disappeared for a while. And after it reappeared, it got hit by some ("adult-material") spam and was removed by Google. I don't know who the administrator for the google groups mirroring is. Now, I'm subscribed to the list and just read it in gmail. Josef >> >> I like google groups for its viewer, otherwise don't care much. >> What mail / group viewer do experts use ? > > I don't think the "expert" bit applies, but I read via NNTP from gmane. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Oct 5 10:52:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Oct 2009 10:52:01 -0400 Subject: [Numpy-discussion] poly class question In-Reply-To: References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> Message-ID: <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> On Mon, Oct 5, 2009 at 5:37 AM, Sebastian Walter wrote: > On Fri, Oct 2, 2009 at 10:40 PM, ? wrote: >> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris >> wrote: >>> >>> >>> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>>> >>>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>>> wrote: >>>> > >>>> > >>>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>>> > wrote: >>>> >> >>>> >> >>>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>>> >> wrote: >>>> >>> >>>> >>> >>>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>>> >>> wrote: >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>> >>>>> >>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>> >>>>> the inverse of a polynomial (for a finite number of terms) >>>> >>>>> >>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>> >>>>> >>>> >>>>> application for example the MA representation of an AR(1) >>>> >>>>> >>>> >>>> >>>> >>>> Hmm, I've been working on a chebyshev class and division of a scalar >>>> >>>> by >>>> >>>> a chebyshev series is >>>> >>>> expressly forbidden, but it could be included if a good interface is >>>> >>>> proposed. Same would go for polynomials. >>>> >>> >>>> >>> In fact is isn't hard to get, for poly1d you should be able to >>>> >>> multiply >>>> >>> the series by a power of x to shift it left, then divide. >>>> >>> >>>> >> >>>> >> That is, divide a power of x by the polynomial. >>>> >> >>>> > >>>> > You will also need to reverse the denominator coefficients...Chuck >>>> >>>> That's the hint I needed. However the polynomial coefficients are then >>>> reversed and not consistent with other polynomial operations, aren't >>>> they? 
>>>> >>>> >>> from scipy.signal import lfilter >>>> >>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>>> (poly1d([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.64 ? ? ?, ?0.512 ? ? , ?0.4096 ? ?, >>>> ? ? ? ?0.32768 ? , ?0.262144 ?, ?0.2097152 , ?0.16777216, >>>> 0.13421773]), poly1d([ 0.10737418])) >>>> >>>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>>> array([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.64 ? ? ?, ?0.512 ? ? , ?0.4096 ? ?, >>>> ? ? ? ?0.32768 ? , ?0.262144 ?, ?0.2097152 , ?0.16777216, ?0.13421773]) >>>> >>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>>> (poly1d([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.44 ? ? ?, ?0.192 ? ? , ?0.0656 ? ?, >>>> ? ? ? ?0.01408 ? , -0.001856 ?, -0.0043008 , -0.00306944]), >>>> poly1d([-0.00159539, ?0.00061389])) >>>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>>> array([ 1. ? ? ? ?, ?0.8 ? ? ? , ?0.44 ? ? ?, ?0.192 ? ? , ?0.0656 ? ?, >>>> ? ? ? ?0.01408 ? , -0.001856 ?, -0.0043008 , -0.00306944, -0.00159539]) >>>> >>>> >>>> What I meant initally doesn't necessarily mean division of a scalar. >>>> >>>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>>> (poly1d([ 0.]), poly1d([ 1.])) >>>> >>>> I didn't find any polynomial division that does the expansion of the >>>> remainder. The same problem, I think is inherited, by the >>>> scipy.signal.lti, and it took me a while to find the usefulness of >>>> lfilter in this case. >>>> >>>> If it were possible to extend the methods for the polynomial class to >>>> do a longer expansions, it would make them more useful for arma and >>>> lti. >>>> >>>> (in some areas, I'm still trying to figure out whether some >>>> functionality is just hidden to me, or actually a limitation of the >>>> implementation or a missing feature.) >>>> >>> >>> Could you describe the sort of problems you want to solve? There are lots of >>> curious things out there we could maybe work with. Covariances, for >>> instance, are closely related to Chebyshev series. >> >> I am working on a discrete time arma process of the form >> >> a(L) x_t = b(L) u_t, ?where L is the lag operator L^k x_t = x_(t-k) >> >> what I just programmed using lfilter is >> x_t = b(L)/a(L) u_t ?where ?b(L)/a(L) is the impulse response function >> or moving average representation >> >> ?a(L)/b(L) ?is the autoregressive representation >> >> the extension >> a(L)(1-L)^d x_t = b(L) u_t, ?where d = 0,1,2,... ?(standard) or ?also >> continuous d <1 (fractional integration) >> >> ?a(L)/b(L), ?b(L)/a(L) ?(1-L)^(-d) ?or (1-L)^d (0> dimensional lag polynomials in the general case. >> Initially I was looking for an easy way to do these calculation as polynomials. >> (The fractional case (1-L)^d (0> and I just looked it up today, but is a popular model class in >> econometrics, fractionally integrated arma processes) >> >> multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) >> (with reversed poly coefficient scipy.signal I think) >> >> the functions in scipy.signal for lti are only for continuous time >> processes and use poly1d under the hood, which means for example >> >>>>> from scipy import signal >>>>> signal.impulse(([1, -0.8],[1]), N=10) >> ? ?raise ValueError, "Improper transfer function." >> ValueError: Improper transfer function. >> >> while this works >>>>> signal.impulse(([1],[1, -0.8]), N=10) >> >> (It's been a while since I looked inside scipy.signal.lti) >> >> A separate issue would be the multivariate version VARMA, or MIMO in >> system modeling. ?a(L), b(L) are matrix polynomials and x_t, u_t are >> 1d arrays evolving in time. >> But that is a different discussion. 
> > I'm working on something that requires truncated > univariate/multivariate operations > on scalar, vector and matrix polynomials. I'm pretty sure I'm doing > something different > than what is used in MIMO. ?Still, do you have a good reference for > implementation/complexity/stability of operations on multivariate > (matrix) polynomials? > Want to make sure I'm not reinventing the wheel ;) Sorry, but I'm no help here. I was hoping to benefit from the knowledge of others. I have used univariate and multivariate polynomials for special cases for function approximation and time series analysis, but I know little about the general numerical issues in this. For MIMO, I only looked at the matlab systems toolbox, which would be an extension of scipy.signal.lti. Since these operations are in my case usually inside an optimization loop in a (supposedly) well behaved problem, I usually cared more about speed and approximation errors in low order polynomials than numerical stability. If you invent the wheel, I would be very glad to use it. Josef > > > >> >> I'm not very familiar with Chebychev polynomials, the last time I >> wanted to use them I didn't see anything about their use as a base for >> functions in several variables and gave up. I've seen papers that use >> them as base for functions in one variable, but I'm not doing anything >> like this right now. >> >> Thanks, >> >> Josef >> >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian.walter at gmail.com Mon Oct 5 11:39:56 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 5 Oct 2009 17:39:56 +0200 Subject: [Numpy-discussion] poly class question In-Reply-To: <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> References: <1cd32cbb0910021008p6cfbd256n977bb72575ebf390@mail.gmail.com> <1cd32cbb0910021130g25bb0dc5ge6f1580e953d84e4@mail.gmail.com> <1cd32cbb0910021340q77c8c592v1d0b2cb7278f737f@mail.gmail.com> <1cd32cbb0910050752m5f8e6bddia1d3c3d71d156cf0@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 4:52 PM, wrote: > On Mon, Oct 5, 2009 at 5:37 AM, Sebastian Walter > wrote: >> On Fri, Oct 2, 2009 at 10:40 PM, wrote: >>> On Fri, Oct 2, 2009 at 3:38 PM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Fri, Oct 2, 2009 at 12:30 PM, wrote: >>>>> >>>>> On Fri, Oct 2, 2009 at 2:09 PM, Charles R Harris >>>>> wrote: >>>>> > >>>>> > >>>>> > On Fri, Oct 2, 2009 at 11:35 AM, Charles R Harris >>>>> > wrote: >>>>> >> >>>>> >> >>>>> >> On Fri, Oct 2, 2009 at 11:33 AM, Charles R Harris >>>>> >> wrote: >>>>> >>> >>>>> >>> >>>>> >>> On Fri, Oct 2, 2009 at 11:30 AM, Charles R Harris >>>>> >>> wrote: >>>>> >>>> >>>>> >>>> >>>>> >>>> On Fri, Oct 2, 2009 at 11:08 AM, wrote: >>>>> >>>>> >>>>> >>>>> Is there a way in numpy (or scipy) to get an infinite expansion for >>>>> >>>>> the inverse of a polynomial (for a finite number of terms) >>>>> >>>>> >>>>> >>>>> np.poly1d([ -0.8, 1])**(-1) >>>>> >>>>> >>>>> >>>>> application for example the MA representation of an AR(1) >>>>> >>>>> >>>>> >>>> >>>>> >>>> Hmm, I've been 
working on a chebyshev class and division of a scalar >>>>> >>>> by >>>>> >>>> a chebyshev series is >>>>> >>>> expressly forbidden, but it could be included if a good interface is >>>>> >>>> proposed. Same would go for polynomials. >>>>> >>> >>>>> >>> In fact is isn't hard to get, for poly1d you should be able to >>>>> >>> multiply >>>>> >>> the series by a power of x to shift it left, then divide. >>>>> >>> >>>>> >> >>>>> >> That is, divide a power of x by the polynomial. >>>>> >> >>>>> > >>>>> > You will also need to reverse the denominator coefficients...Chuck >>>>> >>>>> That's the hint I needed. However the polynomial coefficients are then >>>>> reversed and not consistent with other polynomial operations, aren't >>>>> they? >>>>> >>>>> >>> from scipy.signal import lfilter >>>>> >>>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8]) >>>>> (poly1d([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>>>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, >>>>> 0.13421773]), poly1d([ 0.10737418])) >>>>> >>>>> >>> lfilter([1], [1,-0.8], [1] + [0]*9) >>>>> array([ 1. , 0.8 , 0.64 , 0.512 , 0.4096 , >>>>> 0.32768 , 0.262144 , 0.2097152 , 0.16777216, 0.13421773]) >>>>> >>>>> >>> (np.poly1d([1, 0])**10)/np.poly1d([1, -0.8, 0.2]) >>>>> (poly1d([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>>>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944]), >>>>> poly1d([-0.00159539, 0.00061389])) >>>>> >>> lfilter([1], [1,-0.8, 0.2], [1] + [0]*9) >>>>> array([ 1. , 0.8 , 0.44 , 0.192 , 0.0656 , >>>>> 0.01408 , -0.001856 , -0.0043008 , -0.00306944, -0.00159539]) >>>>> >>>>> >>>>> What I meant initally doesn't necessarily mean division of a scalar. >>>>> >>>>> >>> np.poly1d([1])/np.poly1d([-0.8, 1]) >>>>> (poly1d([ 0.]), poly1d([ 1.])) >>>>> >>>>> I didn't find any polynomial division that does the expansion of the >>>>> remainder. The same problem, I think is inherited, by the >>>>> scipy.signal.lti, and it took me a while to find the usefulness of >>>>> lfilter in this case. >>>>> >>>>> If it were possible to extend the methods for the polynomial class to >>>>> do a longer expansions, it would make them more useful for arma and >>>>> lti. >>>>> >>>>> (in some areas, I'm still trying to figure out whether some >>>>> functionality is just hidden to me, or actually a limitation of the >>>>> implementation or a missing feature.) >>>>> >>>> >>>> Could you describe the sort of problems you want to solve? There are lots of >>>> curious things out there we could maybe work with. Covariances, for >>>> instance, are closely related to Chebyshev series. >>> >>> I am working on a discrete time arma process of the form >>> >>> a(L) x_t = b(L) u_t, where L is the lag operator L^k x_t = x_(t-k) >>> >>> what I just programmed using lfilter is >>> x_t = b(L)/a(L) u_t where b(L)/a(L) is the impulse response function >>> or moving average representation >>> >>> a(L)/b(L) is the autoregressive representation >>> >>> the extension >>> a(L)(1-L)^d x_t = b(L) u_t, where d = 0,1,2,... (standard) or also >>> continuous d <1 (fractional integration) >>> >>> a(L)/b(L), b(L)/a(L) (1-L)^(-d) or (1-L)^d (0>> dimensional lag polynomials in the general case. >>> Initially I was looking for an easy way to do these calculation as polynomials. 
>>> (The fractional case (1-L)^d (0>> and I just looked it up today, but is a popular model class in >>> econometrics, fractionally integrated arma processes) >>> >>> multiplication works well np.poly1d([-1, 1])*np.poly1d([-0.8, 1]) >>> (with reversed poly coefficient scipy.signal I think) >>> >>> the functions in scipy.signal for lti are only for continuous time >>> processes and use poly1d under the hood, which means for example >>> >>>>>> from scipy import signal >>>>>> signal.impulse(([1, -0.8],[1]), N=10) >>> raise ValueError, "Improper transfer function." >>> ValueError: Improper transfer function. >>> >>> while this works >>>>>> signal.impulse(([1],[1, -0.8]), N=10) >>> >>> (It's been a while since I looked inside scipy.signal.lti) >>> >>> A separate issue would be the multivariate version VARMA, or MIMO in >>> system modeling. a(L), b(L) are matrix polynomials and x_t, u_t are >>> 1d arrays evolving in time. >>> But that is a different discussion. >> > > >> I'm working on something that requires truncated >> univariate/multivariate operations >> on scalar, vector and matrix polynomials. I'm pretty sure I'm doing >> something different >> than what is used in MIMO. Still, do you have a good reference for >> implementation/complexity/stability of operations on multivariate >> (matrix) polynomials? >> Want to make sure I'm not reinventing the wheel ;) > > Sorry, but I'm no help here. I was hoping to benefit from the knowledge > of others. I have used univariate and multivariate polynomials for special > cases for function approximation and time series analysis, but I know > little about the general numerical issues in this. For MIMO, I only looked > at the matlab systems toolbox, which would be an extension of scipy.signal.lti. > > Since these operations are in my case usually inside an optimization loop > in a (supposedly) well behaved problem, I usually cared more about > speed and approximation errors in low order polynomials than numerical > stability. > > If you invent the wheel, I would be very glad to use it. Well, I'd be happy to have a user of my code :). However, I'm not sure what I'm doing really helps you: the package ALGOPY I'm working on computes on truncated Taylor polynomials x(t) = x_0 + x_1 t + x_2 t^2 + ... + x_{D-1}t^D I have implemented most common functions like z(t) = x(t)/y(t) z(t) = x(t) * y(t) z(t) = exp(x(t)) z(t) = sin(x(t)) z(t) has always the same degree as x(t) and y(t) (assumed to have the same degree unless it is constant, i.e. z(t) = 1./x(t) works). All this is implemented in a single class called UTPS in http://github.com/b45ch1/algopy/blob/master/algopy/utp/utps.py The actual reason for ALGOPY is the matrix polynomial class UTPM in http://github.com/b45ch1/algopy/blob/master/algopy/utp/utpm.py The rationale is that to compute derivatives of matrix valued functions one should compute on truncated matrix polynomials. One good example is to compute the Jacobian J = [dy/dA ,dy/dx] of y = solve(A,x) where y an (N,) array A (N,N) array x (N,) array The underlying method is to generalize the solve-function to work on polynomials, i.e. y(t) = solve( A(t), x(t)) where y(t) = y_0 + y_1 t + y_2 t^2 + ... A(t) = A_0 + A_1 t + A_2 t^2 + ... x(t) = x_0 + x_1 t + ... 
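To make the prose above a bit more concrete, here is a very small sketch of what arithmetic on truncated Taylor polynomials looks like. This is not the ALGOPY UTPS/UTPM API -- the function names below are made up -- it only shows the convolution-style recurrences that the operations listed above boil down to.

import numpy as np

def utp_mul(x, y):
    """Product of two truncated series, keeping len(x) coefficients."""
    D = len(x)
    z = np.zeros(D)
    for d in range(D):
        z[d] = np.dot(x[:d + 1], y[d::-1])
    return z

def utp_div(x, y):
    """Quotient z with z*y == x up to the truncation order (needs y[0] != 0)."""
    D = len(x)
    z = np.zeros(D)
    for d in range(D):
        z[d] = (x[d] - np.dot(z[:d], y[d:0:-1])) / y[0]
    return z

x = np.array([1.0, 2.0, 3.0])      # 1 + 2t + 3t^2 + O(t^3)
y = np.array([2.0, 1.0, 0.0])      # 2 + t        + O(t^3)
print(utp_mul(x, y))               # [ 2.  5.  8.]
print(utp_mul(utp_div(x, y), y))   # [ 1.  2.  3.], x recovered to that order

Division is just the multiplication recurrence solved for the unknown coefficients, which is why the constant term of the divisor has to be nonzero.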
You can have a look at http://github.com/b45ch1/algopy/blob/master/algopy/utp/utpm.py#L257 I'm afraid the best tutorial I can give is a talk that I've given on a small workshop: http://github.com/b45ch1/algopy/raw/master/documentation/Seventh_EuroAd_Workshop-Sebastian_Walter-Higher_Order_Forward_and_Reverse_Mode_on_Matrices_with_Application_to_Optimal_Experimental_Design.pdf Sebastian > > Josef > >> >> >> >>> >>> I'm not very familiar with Chebychev polynomials, the last time I >>> wanted to use them I didn't see anything about their use as a base for >>> functions in several variables and gave up. I've seen papers that use >>> them as base for functions in one variable, but I'm not doing anything >>> like this right now. >>> >>> Thanks, >>> >>> Josef >>> >>>> >>>> Chuck >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From seb.haase at gmail.com Mon Oct 5 11:56:27 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 17:56:27 +0200 Subject: [Numpy-discussion] numpy.asum ? Message-ID: Hi, Is this a dumb question ? Why is there no np.asum() equivalent to np.sum() - like amax() to max() ? Another question: what does it mean that amax() (and max()) is a "function" while maximum() is a ufunc !? >>> N.max >>> N.maximum >>> N.amax Is there a performance difference connected to this ? Cheers, Sebastian Haase From robert.kern at gmail.com Mon Oct 5 12:04:21 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 11:04:21 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: Message-ID: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> On Mon, Oct 5, 2009 at 10:56, Sebastian Haase wrote: > Hi, > > Is this a dumb question ? > Why is there no np.asum() equivalent to np.sum() ?- like amax() to max() ? Back when Numeric was being written, max() and min() existed as builtins, but sum() did not. In order to support "from Numeric import *", the amax() aliases were added. sum() was added to the builtins later, but no one went back to add an asum() alias. > Another question: what does it mean that amax() (and max()) is a > "function" while maximum() is a ufunc !? > >>>> N.max > >>>> N.maximum > >>>> N.amax > > > Is there a performance difference connected to this ? No. maximum(x,y) is a binary ufunc that takes two arrays and returns an array with the element-wise maximum from between the two inputs. amax(x) is an unary function that returns the maximum value in the array. amax(x) is a convenience for maximum.reduce(x.flat). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Chris.Barker at noaa.gov Mon Oct 5 14:06:27 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 05 Oct 2009 11:06:27 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <200910051053.05259.faltet@pytables.org> References: <4AC705F4.1040702@noaa.gov> <200910051053.05259.faltet@pytables.org> Message-ID: <4ACA35A3.3030707@noaa.gov> Francesc Alted wrote: > A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: >> This idea was inspired by a discussion at the SciPy conference, in which >> we spent a LOT of time during the numpy tutorial talking about how to >> accumulate values in an array when you don't know how big the array >> needs to be when you start. >> What I have in mind is very simple. It would be: >> - Only 1-d >> - Support append() and extend() methods >> - support indexing and slicing >> - Support any valid numpy dtype >> - which could even get you pseudo n-d arrays... >> - maybe it would act like an array in other ways, I'm not so sure. >> - ufuncs, etc. > That's interesting. I'd normally use the `resize()` method for what you want, > but indeed your approach is way more easy-to-use. Of course, this is using resize() under the hood, but giving it an easier interface, but more importantly, it's adding the pre-allocation for you, and the code to deal with that. I suppose I should benchmark it, but I think calling resize(0 with every append would be a lot slower (though maybe not -- might the compiler/os be pre-allocating some extra memory anyway?) I should profile this -- if you can call resize() with every new item, and it's not too slow, then it may not be worth writing this class at all (or I could make it simpler, maybe even an nd-array subclass instead. > If you are looking for performance improvements, I'd have a look at the > `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). It > seems to me that the zero-initialization of added memory can be skipped, > allowing for more performance for the `resize()` method (most specially for > large size increments). I suppose so, but I doubt that's causing any of my performance issues. Another thing to profile. > A new parameter (say, ``zero_init=True``) could be > added to `resize()` to specify that you don't want the memory initialized. That does seem like a good idea, but maybe over my head to implement. Now I need some time to work on this some more... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From seb.haase at gmail.com Mon Oct 5 14:37:16 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 20:37:16 +0200 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> Message-ID: Thanks for the reply. I thought one reason for amax was that from numpy import * would not not import max but only amax. How about sum ? Does "from numpy import *" overwrite the builtin sum ? not to mention the "symmetry" / consistency argument for having "asum" ? More comments ?? --Sebastian Haase On Mon, Oct 5, 2009 at 6:04 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 10:56, Sebastian Haase wrote: >> Hi, >> >> Is this a dumb question ? >> Why is there no np.asum() equivalent to np.sum() ?- like amax() to max() ? 
> > Back when Numeric was being written, max() and min() existed as > builtins, but sum() did not. In order to support "from Numeric import > *", the amax() aliases were added. sum() was added to the builtins > later, but no one went back to add an asum() alias. > >> Another question: what does it mean that amax() (and max()) is a >> "function" while maximum() is a ufunc !? >> >>>>> N.max >> >>>>> N.maximum >> >>>>> N.amax >> >> >> Is there a performance difference connected to this ? > > No. maximum(x,y) is a binary ufunc that takes two arrays and returns > an array with the element-wise maximum from between the two inputs. > amax(x) is an unary function that returns the maximum value in the > array. amax(x) is a convenience for maximum.reduce(x.flat). > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Mon Oct 5 14:43:30 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 13:43:30 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> Message-ID: <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: > Thanks for the reply. > I thought one reason for amax was that > from numpy import * > would not not import max but only amax. I have my timelines confused. Numeric has neither amax() nor max(). I don't actually recall the sequence of events, then. > How about sum ? > Does "from numpy import *" > overwrite the builtin sum ? Try it. > not to mention the "symmetry" / consistency argument for having "asum" ? At this point, I don't care to cater to "from numpy import *" use case. Too much code uses numpy.sum() remove it, or even deprecate it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at gmail.com Mon Oct 5 15:13:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 15:13:40 -0400 Subject: [Numpy-discussion] genfromtxt - the return Message-ID: All, Could you try r7449 ? I introduced some mechanisms to keep track of invalid lines (where the number of columns don't match what's expected). By default, a warning is emitted and these lines are skipped, but an optional argument gives the possibility to raise an exception instead. Now, I need more tests about wrong converters. I'm trying to optimize the upgrade mechanism (there are too many intertwined loops for my taste now), I'll keep you posted. Meanwhile, if you could come with more cases of failure, please send them my way. Cheers P. From charlesr.harris at gmail.com Mon Oct 5 15:54:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 13:54:53 -0600 Subject: [Numpy-discussion] Easy way to test documentation? Message-ID: Hi All, Is there an easy way to test build documentation for a module that is not yet part of numpy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From seb.haase at gmail.com Mon Oct 5 15:55:01 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 5 Oct 2009 21:55:01 +0200 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 8:43 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: >> Thanks for the reply. >> I thought one reason for amax was that >> from numpy import * >> would not not import max but only amax. > > I have my timelines confused. Numeric has neither amax() nor max(). I > don't actually recall the sequence of events, then. > >> How about sum ? >> Does "from numpy import *" >> overwrite the builtin sum ? > > Try it. > >>> sum >>> from numpy import * >>> sum >>> asum Traceback (most recent call last): File "", line 1, in NameError: name 'asum' is not defined >>> N.__version__ '1.3.0' >>> >> not to mention the "symmetry" / consistency argument for having "asum" ? > > At this point, I don't care to cater to "from numpy import *" use > case. Too much code uses numpy.sum() remove it, or even deprecate it. > I did not mean to suggest to remove or deprecate it. I only remember that there was a discussion - long time ago - that "from numpy import *" (still common in many places, like interactive sessions) - should not overwrite builtins .... Personally, I would prefer to write np.amax and np.asum ... do you see my argument for consistency here ? - Sebastian From robert.kern at gmail.com Mon Oct 5 15:58:47 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 14:58:47 -0500 Subject: [Numpy-discussion] numpy.asum ? In-Reply-To: References: <3d375d730910050904v2d91aff6i256330f98dbfc4cb@mail.gmail.com> <3d375d730910051143y9fa3dc1vdca2b7bb10c5ab65@mail.gmail.com> Message-ID: <3d375d730910051258w4ba92dtb9740e3753caaec3@mail.gmail.com> On Mon, Oct 5, 2009 at 14:55, Sebastian Haase wrote: > On Mon, Oct 5, 2009 at 8:43 PM, Robert Kern wrote: >> On Mon, Oct 5, 2009 at 13:37, Sebastian Haase wrote: >>> Thanks for the reply. >>> I thought one reason for amax was that >>> from numpy import * >>> would not not import max but only amax. >> >> I have my timelines confused. Numeric has neither amax() nor max(). I >> don't actually recall the sequence of events, then. >> >>> How about sum ? >>> Does "from numpy import *" >>> overwrite the builtin sum ? >> >> Try it. >> >>>> sum > >>>> from numpy import * >>>> sum > >>>> asum > Traceback (most recent call last): > ?File "", line 1, in > NameError: name 'asum' is not defined >>>> N.__version__ > '1.3.0' >>>> > >>> not to mention the "symmetry" / consistency argument for having "asum" ? >> >> At this point, I don't care to cater to "from numpy import *" use >> case. Too much code uses numpy.sum() remove it, or even deprecate it. >> > > I did not mean to suggest to remove or deprecate it. I only remember > that there was a discussion - long time ago - that "from numpy import > *" (still common in many places, like interactive sessions) - should > not overwrite builtins .... We are not removing sum from numpy.__all__ at this point in time. It's too late. > Personally, I would prefer to write np.amax and np.asum ... do you see > my argument for consistency here ? Yes, but it's not important enough to me to want to introduce more aliases. 
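For anyone skimming this thread for the practical upshot: the ufunc reductions mentioned earlier already give an "import *"-safe spelling for both cases. A small illustration (nothing new is being proposed here; both lines only restate what was said above):

import numpy as np

x = np.array([[1.5, -2.0], [7.25, 3.0]])

# amax() is a convenience wrapper around the maximum ufunc's reduction,
# and add.reduce plays the same role a hypothetical asum() would.
print(np.amax(x), np.maximum.reduce(x.ravel()))   # 7.25 7.25
print(np.sum(x), np.add.reduce(x.ravel()))        # 9.75 9.75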
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Mon Oct 5 16:10:07 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 05 Oct 2009 23:10:07 +0300 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: References: Message-ID: <1254773407.6463.6.camel@idol> ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: > Is there an easy way to test build documentation for a module that is > not yet part of numpy? Make a small Sphinx project for that: $ easy_install numpydoc $ mkdir foo $ cd foo $ sphinx-quickstart ... $ vi conf.py ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... $ cp /some/path/modulename.py modulename.py $ vi index.rst ... add .. automodule:: modulename :members: ... $ make PYTHONPATH=$PWD html Could be automated. -- Pauli Virtanen From elaine.angelino at gmail.com Mon Oct 5 17:22:54 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 17:22:54 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> Message-ID: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Hi there, We are writing to announce the release of "Tabular", a package of Python modules for working with tabular data. Tabular is a package of Python modules for working with tabular data. Its main object is the tabarray class, a data structure for holding and manipulating tabular data. By putting data into a tabarray object, you?ll get a representation of the data that is more flexible and powerful than a native Python representation. More specifically, tabarray provides: -- ultra-fast filtering, selection, and numerical analysis methods, using convenient Matlab-style matrix operation syntax -- spreadsheet-style operations, including row & column operations, 'sort', 'replace', 'aggregate', 'pivot', and 'join' -- flexible load and save methods for a variety of file formats, including delimited text (CSV), binary, and HTML -- helpful inference algorithms for determining formatting parameters and data types of input files -- support for hierarchical groupings of columns, both as data structures and file formats You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or alternatively clone our hg repository from bitbucket ( http://bitbucket.org/elaine/tabular/ ). We also have posted tutorial-style Sphinx documentation ( http://www.parsemydata.com/tabular/). The tabarray object is based on the record arrayobject from the Numerical Python package ( NumPy ), and Tabular is built to interface well with NumPy in general. Our intended audience is two-fold: (1) Python users who, though they may not be familiar with NumPy, are in need of a way to work with tabular data, and (2) NumPy users who would like to do spreadsheet-style operations on top of their more "numerical" work. We hope that some of you find Tabular useful! Best, Elaine and Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Oct 5 17:34:40 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 5 Oct 2009 17:34:40 -0400 Subject: [Numpy-discussion] A numpy accumulator... 
In-Reply-To: <4ACA35A3.3030707@noaa.gov> References: <4AC705F4.1040702@noaa.gov> <200910051053.05259.faltet@pytables.org> <4ACA35A3.3030707@noaa.gov> Message-ID: 2009/10/5 Christopher Barker : > Francesc Alted wrote: >> A Saturday 03 October 2009 10:06:12 Christopher Barker escrigu?: >>> This idea was inspired by a discussion at the SciPy conference, in which >>> we spent a LOT of time during the numpy tutorial talking about how to >>> accumulate values in an array when you don't know how big the array >>> needs to be when you start. > >>> What I have in mind is very simple. It would be: >>> ? ?- Only 1-d >>> ? ?- Support append() and extend() methods >>> ? ?- support indexing and slicing >>> ? ?- Support any valid numpy dtype >>> ? ? ?- which could even get you pseudo n-d arrays... >>> ? ?- maybe it would act like an array in other ways, I'm not so sure. >>> ? ? ?- ufuncs, etc. > >> That's interesting. ?I'd normally use the `resize()` method for what you want, >> but indeed your approach is way more easy-to-use. > > Of course, this is using resize() under the hood, but giving it an > easier interface, but more importantly, it's adding the pre-allocation > for you, and the code to deal with that. I suppose I should benchmark > it, but I think calling resize(0 with every append would be a lot slower > (though maybe not -- might the compiler/os be pre-allocating some extra > memory anyway?) I looked into this at some point, and under Linux, the malloc doesn't allocate substantial extra memory until you get big enough that it's allocating complete memory pages, at which point you get until the end of the page. At this point it's possible that adding more memory onto the end of the malloced region (and maybe even moving the array around in memory) can become really cheap, since it's just requesting more memory from the OS. Also, a friend who's a bare-metal programming wizard pointed out to me that modern malloc implementations really hate realloc, since it tends to put memory blocks in arenas intended for different sizes. I think that's only really an issue for shrinking blocks, since they probably just always allocate a new block when growing (unless they're in the pages-at-a-time regime). In short, I think it's better to have a python-list-like growing scheme. In fact it's maybe more important for arrays than python lists, since in a python list all that needs to be moved are pointers to the actual python objects, only ever a small fraction of the data volume. > I should profile this -- if you can call resize() with every new item, > and it's not too slow, then it may not be worth writing this class at > all (or I could make it simpler, maybe even an nd-array subclass instead. Keep in mind the need for sensible handling of slices, since the underlying array will probably move on every resize. I think there's a need for this code. >> If you are looking for performance improvements, I'd have a look at the >> `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). ?It >> seems to me that the zero-initialization of added memory can be skipped, >> allowing for more performance for the `resize()` method (most specially for >> large size increments). > > I suppose so, but I doubt that's causing any of my performance issues. > Another thing to profile. Probably worth profiling, yes - I wouldn't worry about the time taken writing zeros, but that does mean you have to touch all the allocated memory, which can't be too great for the cache. 
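Since the same design keeps coming up in this thread, a bare-bones sketch of the over-allocate-and-grow idea may be useful. This is not Chris's actual class and the names below are made up; it only shows where the amortized-cost argument and the stale-slice concern come from.

import numpy as np

class Growable(object):
    """Minimal sketch of the 1-d "accumulator" idea discussed here."""

    def __init__(self, dtype=float, capacity=128, growth=2.0):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0
        self._growth = growth

    def append(self, value):
        if self._size == len(self._data):
            # grow by a fixed factor so appends stay cheap on average
            new_cap = int(len(self._data) * self._growth) + 1
            new_data = np.empty(new_cap, dtype=self._data.dtype)
            new_data[:self._size] = self._data[:self._size]
            # rebinding the buffer means slices handed out earlier still
            # point at the old memory -- the slicing concern raised above
            self._data = new_data
        self._data[self._size] = value
        self._size += 1

    def extend(self, values):
        for v in values:
            self.append(v)

    @property
    def array(self):
        return self._data[:self._size]

acc = Growable(dtype=float, capacity=4)
acc.extend(0.5 * i for i in range(10))
print(acc.array)   # [ 0.   0.5  1.  ...  4.5]

With a fixed growth factor the copying cost per append stays constant on average, which is the python-list-like behaviour described above.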
Anne From pgmdevlist at gmail.com Mon Oct 5 17:47:12 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 17:47:12 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Message-ID: <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> Ciao Elaine, I just quickly browsed through your code. Say, what's the reason behind using np.recarrays instead of just standard ndarrays (with flexible dtype). Do you really need the overhead of accessing fields as attributes ? It looks like you're always accessing fields as items... Cheers P. On Oct 5, 2009, at 5:22 PM, Elaine Angelino wrote: > Hi there, > > We are writing to announce the release of "Tabular", a package of > Python modules for working with tabular data. > > Tabular is a package of Python modules for working with tabular > data. Its main object is the tabarray class, a data structure for > holding and manipulating tabular data. By putting data into a > tabarray object, you?ll get a representation of the data that is > more flexible and powerful than a native Python representation. More > specifically, tabarray provides: > > -- ultra-fast filtering, selection, and numerical analysis methods, > using convenient Matlab-style matrix operation syntax > -- spreadsheet-style operations, including row & column operations, > 'sort', 'replace', 'aggregate', 'pivot', and 'join' > -- flexible load and save methods for a variety of file formats, > including delimited text (CSV), binary, and HTML > -- helpful inference algorithms for determining formatting > parameters and data types of input files > -- support for hierarchical groupings of columns, both as data > structures and file formats > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/ > ) or alternatively clone our hg repository from bitbucket (http://bitbucket.org/elaine/tabular/ > ). We also have posted tutorial-style Sphinx documentation (http://www.parsemydata.com/tabular/ > ). > > The tabarray object is based on the record array object from the > Numerical Python package (NumPy), and Tabular is built to interface > well with NumPy in general. Our intended audience is two-fold: (1) > Python users who, though they may not be familiar with NumPy, are in > need of a way to work with tabular data, and (2) NumPy users who > would like to do spreadsheet-style operations on top of their more > "numerical" work. > > We hope that some of you find Tabular useful! > > Best, > > Elaine and Dan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Mon Oct 5 18:03:35 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 5 Oct 2009 18:03:35 -0400 Subject: [Numpy-discussion] What Python version are we supporting ? Message-ID: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> All, What Python version are we supporting in 1.4.0dev ? 2.4 still ? For which version of numpy will we be moving to a more recent one ? Thx in advance P. From robert.kern at gmail.com Mon Oct 5 18:13:12 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:13:12 -0500 Subject: [Numpy-discussion] What Python version are we supporting ? 
In-Reply-To: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> References: <1CDC6C62-CCE9-4F9F-84FB-D29C311ECFC7@gmail.com> Message-ID: <3d375d730910051513i57d74cbcn3ea7769fc22d02ec@mail.gmail.com> On Mon, Oct 5, 2009 at 17:03, Pierre GM wrote: > All, > What Python version are we supporting in 1.4.0dev ? 2.4 still ? Yes. > For > which version of numpy will we be moving to a more recent one ? There is no plan in place to change this requirement. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 18:16:42 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 18:16:42 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> Message-ID: <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> hey pierre -- good question. this is something we debated a while ago (we actually sent a couple of emails over the numpy list about this very topic) when coming up with our design. at the time, there did not seem to be strong opinions either way about using ndarray vs. recarray the main reason we went with the recarray over the ndarray is because the recarray has a couple of useful construction functions (e.g. np.rec.fromrecords and np.rec.fromarrays). not only are these functions convenient to use, they have nice data type inference properties which we'd have to rebuild ourselves if we wanted to avoid recarrays entirely. It would be fairly straightforward to switch from recarray to ndarray if this were really an important thing to do (e.g. if recarray were being deprecated or if most NumPy people have strong feelings about this), and doing so wouldn't modify anything about the tabarray API. elaine On Mon, Oct 5, 2009 at 5:47 PM, Pierre GM wrote: > Ciao Elaine, > I just quickly browsed through your code. Say, what's the reason > behind using np.recarrays instead of just standard ndarrays (with > flexible dtype). Do you really need the overhead of accessing fields > as attributes ? It looks like you're always accessing fields as items... > Cheers > P. > > > > On Oct 5, 2009, at 5:22 PM, Elaine Angelino wrote: > > > Hi there, > > > > We are writing to announce the release of "Tabular", a package of > > Python modules for working with tabular data. > > > > Tabular is a package of Python modules for working with tabular > > data. Its main object is the tabarray class, a data structure for > > holding and manipulating tabular data. By putting data into a > > tabarray object, you?ll get a representation of the data that is > > more flexible and powerful than a native Python representation. 
More > > specifically, tabarray provides: > > > > -- ultra-fast filtering, selection, and numerical analysis methods, > > using convenient Matlab-style matrix operation syntax > > -- spreadsheet-style operations, including row & column operations, > > 'sort', 'replace', 'aggregate', 'pivot', and 'join' > > -- flexible load and save methods for a variety of file formats, > > including delimited text (CSV), binary, and HTML > > -- helpful inference algorithms for determining formatting > > parameters and data types of input files > > -- support for hierarchical groupings of columns, both as data > > structures and file formats > > > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/ > > ) or alternatively clone our hg repository from bitbucket ( > http://bitbucket.org/elaine/tabular/ > > ). We also have posted tutorial-style Sphinx documentation ( > http://www.parsemydata.com/tabular/ > > ). > > > > The tabarray object is based on the record array object from the > > Numerical Python package (NumPy), and Tabular is built to interface > > well with NumPy in general. Our intended audience is two-fold: (1) > > Python users who, though they may not be familiar with NumPy, are in > > need of a way to work with tabular data, and (2) NumPy users who > > would like to do spreadsheet-style operations on top of their more > > "numerical" work. > > > > We hope that some of you find Tabular useful! > > > > Best, > > > > Elaine and Dan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 5 18:36:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:36:11 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> Message-ID: <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> On Mon, Oct 5, 2009 at 17:16, Elaine Angelino wrote: > hey pierre -- good question. this is something we debated a while ago (we > actually sent a couple of emails over the numpy list about this very topic) > when coming up with our design.? at the time, there did not seem to be > strong opinions either way about using ndarray vs. recarray > > the main reason we went with the recarray over the ndarray is because the > recarray has a couple of useful construction functions (e.g. > np.rec.fromrecords and np.rec.fromarrays).? not only are these functions > convenient to use, they have nice data type inference properties which we'd > have to rebuild ourselves if we wanted to avoid recarrays entirely. Try np.rec.fromrecords(...).view(np.ndarray). Most likely, we should have versions of those functions that return plain ndarrays. They are quite useful. Perhaps def fromarrays(..., type=None): ... 
if type is not None: _array = _array.view(type) return _array -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 18:52:47 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 18:52:47 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> Message-ID: <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> On Mon, Oct 5, 2009 at 6:36 PM, Robert Kern wrote: > > > > the main reason we went with the recarray over the ndarray is because the > > recarray has a couple of useful construction functions (e.g. > > np.rec.fromrecords and np.rec.fromarrays). not only are these functions > > convenient to use, they have nice data type inference properties which > we'd > > have to rebuild ourselves if we wanted to avoid recarrays entirely. > > Try np.rec.fromrecords(...).view(np.ndarray). > > Hi Robert, thanks your email. We definitely understand this use of .view(). However, our question is, should we have implemented tabular this way, e.g. in the tabarray constructor, first make a recarray and then view it as an ndarray? (and then of course view it as a tabarray). This would have the effect of eliminating the extra recarray functionality, and some if its overhead as well. Is this the desirable design, or should we stick with recarrays? (Also, is first casting to recarrays and then viewing as ndarrays more expensive than if we went through ndarray directly?) > Most likely, we should have versions of those functions that return > plain ndarrays. They are quite useful. > > Perhaps > > def fromarrays(..., type=None): > ... > if type is not None: > _array = _array.view(type) > return _array > > Yes, we definitely agree with you that there should be plain ndarray versions of the fromarrays and fromrecords constructors. The only reason we didn't include a function like your "fromarrays" function in tabular is that we thought it might be a bit hackish for our package, and seemed like something to be addressed by numpy directly, perhaps at a later time. This was especially given that it didn't seem like people hated recarrays especially. In the event that people really think we should switch "tabular" from using ndarrays to recarrays, we would definitely support a discussion of adding these kinds of constructors directly to ndarrays. Thanks Elaine -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Oct 5 18:58:34 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 17:58:34 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> Message-ID: <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> On Mon, Oct 5, 2009 at 17:52, Elaine Angelino wrote: > On Mon, Oct 5, 2009 at 6:36 PM, Robert Kern wrote: > >> > the main reason we went with the recarray over the ndarray is because >> > the >> > recarray has a couple of useful construction functions (e.g. >> > np.rec.fromrecords and np.rec.fromarrays).? not only are these functions >> > convenient to use, they have nice data type inference properties which >> > we'd >> > have to rebuild ourselves if we wanted to avoid recarrays entirely. >> >> Try np.rec.fromrecords(...).view(np.ndarray). >> > > Hi Robert, thanks your email.? We definitely understand this use of > .view().? However,? our question is,? should we have implemented tabular > this way, e.g. in the tabarray constructor, first make a recarray and then > view it as an ndarray?? (and then of course view it as a tabarray). Do the minimum number of .view()s that you can get away with. > This > would have the effect of eliminating the extra recarray functionality, and > some if its overhead as well. Is this the desirable design, or should we > stick with recarrays? Well, what other recarray functionality are you using? I addressed the from*() functions because you said it was the main reason. What are your other reasons? > (Also, is first casting to recarrays and then viewing as ndarrays more > expensive than if we went through ndarray directly?) The overhead should be miniscule. No data is converted. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From elaine.angelino at gmail.com Mon Oct 5 19:15:44 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Mon, 5 Oct 2009 19:15:44 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> Message-ID: <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> Do the minimum number of .view()s that you can get away with. > > I guess our bottom line is that we're still not 100% clear as to the recommendation of the NumPy community regarding whether we should use recarray or ndarray. It seems like recarray has some advantages (e.g. the nice inference functions/constructors, and the fact that some people like the ability to fields as attributes) as well as some disadvantages (e.g. 
the overhead). it definitely wouldn't be much difficulty to convert tabular to using ndarrays, but is it very desirable? Of course if we were to do this, having recarray-style constructors for ndarrays directly in Numpy would be seem to be a "cleaner" way to do things than either writing our own ndarray versions or casting from recarray to ndarray, but we're happy to do either if changing tabular to ndarray is really desirable. > > > Well, what other recarray functionality are you using? None, in our code. We also thought that since at least some people like using the attribute reference property, perhaps users of tabarrays might too (though we don't personally in our own work) Recarrays still seemed to be being supported by NumPy, so it seemed to make sense to use them. but the only functional thing in our code are those constructors. > > > (Also, is first casting to recarrays and then viewing as ndarrays more > > expensive than if we went through ndarray directly?) > > But if NumPy decided to include ndarray versions of the from*() constructors in the distribution, would this be achieved by first using the recarray constructor and then viewing as ndarray? Or would something more "direct" be done? thanks, e -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 5 19:20:35 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 18:20:35 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> Message-ID: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> On Mon, Oct 5, 2009 at 18:15, Elaine Angelino wrote: >> Well, what other recarray functionality are you using? > > None, in our code.?? We also thought that since at least some people like > using the attribute reference property, perhaps users of tabarrays might too > (though we don't personally in our own work) ? Recarrays still seemed to be > being supported by NumPy, so it seemed to make sense to use them.?? but the > only functional thing in our code are those constructors. Then I would suggest making tabarrays subclass from ndarray. If you like, provide a tabrecarray that subclasses from both recarray and tabarray so that people who like attribute access can .view() to their heart's content. >> > (Also, is first casting to recarrays and then viewing as ndarrays more >> > expensive than if we went through ndarray directly?) >> > > But if NumPy decided to include ndarray versions of the from*() constructors > in the distribution, would this be achieved by first using the recarray > constructor and then viewing as ndarray?? Or would something more "direct" > be done? We would fix the functions to not do any unnecessary .view()s. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From charlesr.harris at gmail.com Mon Oct 5 20:57:27 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 18:57:27 -0600 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: <1254773407.6463.6.camel@idol> References: <1254773407.6463.6.camel@idol> Message-ID: On Mon, Oct 5, 2009 at 2:10 PM, Pauli Virtanen wrote: > ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: > > Is there an easy way to test build documentation for a module that is > > not yet part of numpy? > > Make a small Sphinx project for that: > > $ easy_install numpydoc > $ mkdir foo > $ cd foo > $ sphinx-quickstart > What to choose for math rendering? Defaults for everything else? > ... > $ vi conf.py > ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... > $ cp /some/path/modulename.py modulename.py > $ vi index.rst > index.py, right? > ... > add > .. automodule:: modulename > :members: > ... > $ make PYTHONPATH=$PWD html > > Bombs when it hits the first Parameters section: "Unexpected section title." Could be automated. > > That would be nice. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Oct 5 21:27:40 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 19:27:40 -0600 Subject: [Numpy-discussion] Easy way to test documentation? In-Reply-To: References: <1254773407.6463.6.camel@idol> Message-ID: On Mon, Oct 5, 2009 at 6:57 PM, Charles R Harris wrote: > > > On Mon, Oct 5, 2009 at 2:10 PM, Pauli Virtanen wrote: > >> ma, 2009-10-05 kello 13:54 -0600, Charles R Harris kirjoitti: >> > Is there an easy way to test build documentation for a module that is >> > not yet part of numpy? >> >> Make a small Sphinx project for that: >> >> $ easy_install numpydoc >> $ mkdir foo >> $ cd foo >> $ sphinx-quickstart >> > > What to choose for math rendering? Defaults for everything else? > > >> ... >> $ vi conf.py >> ... add 'sphinx.ext.autodoc', 'numpydoc' to extensions ... >> $ cp /some/path/modulename.py modulename.py >> $ vi index.rst >> > > index.py, right? > > OK, had to choose file type (txt/rst) > ... >> add >> > append > .. automodule:: modulename >> :members: >> ... >> $ make PYTHONPATH=$PWD html >> >> > Seems to work. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej at certik.cz Mon Oct 5 21:40:46 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 18:40:46 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults Message-ID: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Hi, I am getting a segfault in PyArray_SimpleNewFromData in Cython. I am trying to debug it for the last 4 hours, but still absolutely no clue, so I am posting it here, maybe someone knows where the problem is: cdef ndarray array_double_c2numpy(double *A, int len): from numpy import empty print "got len:", len cdef npy_intp dims[10] cdef double X[500] print "1" dims[0] = 3 print "2x" print dims[0], len print X[0], X[1], X[2] cdef npy_intp size cdef ndarray newarr cdef double *arrsource size = 10 arrsource = malloc(sizeof(double) * size) print "still alive" newarr = PyArray_SimpleNewFromData(1, &size, 12, arrsource) print "I am already dead. 
:(" print "3" return empty([len]) Essential is just the line: newarr = PyArray_SimpleNewFromData(1, &size, 12, arrsource) Then I removed all numpy from my computer, downloaded the latest git repository from: http://projects.scipy.org/git/numpy.git applied the following patch: diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors index 3fdded0..777563c 100644 --- a/numpy/core/src/multiarray/ctors.c +++ b/numpy/core/src/multiarray/ctors.c @@ -1318,6 +1318,7 @@ PyArray_NewFromDescr(PyTypeObject *subtype, PyArray_Descr intp *dims, intp *strides, void *data, int flags, PyObject *obj) { + printf("entering PyArray_NewFromDescr\n"); PyArrayObject *self; int i; size_t sd; @@ -1553,6 +1554,7 @@ PyArray_New(PyTypeObject *subtype, int nd, intp *dims, int { PyArray_Descr *descr; PyObject *new; + printf("entering PyArray_New, still kicking\n"); descr = PyArray_DescrFromType(type_num); if (descr == NULL) { then installed with: python setup.py install --home=~/usr and run my cython program. Here is the output: $ ./schroedinger ------------------------------------------- This is Hermes1D - a free ODE solver based on the hp-FEM and Newton's method, developed by the hp-FEM group at UNR and distributed under the BSD license. For more details visit http://hpfem.org/. ------------------------------------------- Importing hermes1d entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_New, still kicking entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_New, still kicking entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr entering PyArray_NewFromDescr Python initialized got len: 39601 1 2x 3 39601 0.0 0.0 0.0 still alive Segmentation fault What puzzles me is that there is no debugging print statement just before the segfault. So like if the PyArray_New was not being called. But looking into numpy/core/include/numpy/ndarrayobject.h, line 1359: #define PyArray_SimpleNewFromData(nd, dims, typenum, data) \ PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, \ data, 0, NPY_CARRAY, NULL) It should be called. Does it segfault in the printf() statement above? Hm. I also tried gdb, but it doesn't step into PyArray_SimpleNewFromData (in the C file), not sure why. So both print statements and gdb failed to bring me to the cause, pretty sad day for me. I am going home now and start with a fresh head, it just can't segfault like this... I guess I'll start by creating a simple cython project to reproduce it (the schroedinger code above is quite involved, it starts a python interpreter inside a C++ program, etc. etc.). 
Ondrej From charlesr.harris at gmail.com Mon Oct 5 22:34:48 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Oct 2009 20:34:48 -0600 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Message-ID: On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: > Hi, > > I am getting a segfault in PyArray_SimpleNewFromData in Cython. I am > trying to debug it for the last 4 hours, but still absolutely no clue, > so I am posting it here, maybe someone knows where the problem is: > > cdef ndarray array_double_c2numpy(double *A, int len): > from numpy import empty > print "got len:", len > cdef npy_intp dims[10] > cdef double X[500] > print "1" > dims[0] = 3 > print "2x" > print dims[0], len > print X[0], X[1], X[2] > cdef npy_intp size > cdef ndarray newarr > cdef double *arrsource > > size = 10 > arrsource = malloc(sizeof(double) * size) > print "still alive" > newarr = PyArray_SimpleNewFromData(1, &size, 12, > arrsource) > print "I am already dead. :(" > print "3" > return empty([len]) > > > > Essential is just the line: > > newarr = PyArray_SimpleNewFromData(1, &size, 12, > arrsource) > > Then I removed all numpy from my computer, downloaded the latest git > repository from: > > http://projects.scipy.org/git/numpy.git > > applied the following patch: > > diff --git a/numpy/core/src/multiarray/ctors.c > b/numpy/core/src/multiarray/ctors > index 3fdded0..777563c 100644 > --- a/numpy/core/src/multiarray/ctors.c > +++ b/numpy/core/src/multiarray/ctors.c > @@ -1318,6 +1318,7 @@ PyArray_NewFromDescr(PyTypeObject *subtype, > PyArray_Descr > intp *dims, intp *strides, void *data, > int flags, PyObject *obj) > { > + printf("entering PyArray_NewFromDescr\n"); > PyArrayObject *self; > int i; > size_t sd; > @@ -1553,6 +1554,7 @@ PyArray_New(PyTypeObject *subtype, int nd, intp > *dims, int > { > PyArray_Descr *descr; > PyObject *new; > + printf("entering PyArray_New, still kicking\n"); > > descr = PyArray_DescrFromType(type_num); > if (descr == NULL) { > > > > then installed with: > > python setup.py install --home=~/usr > > and run my cython program. Here is the output: > > $ ./schroedinger > > ------------------------------------------- > This is Hermes1D - a free ODE solver > based on the hp-FEM and Newton's method, > developed by the hp-FEM group at UNR > and distributed under the BSD license. > For more details visit http://hpfem.org/. 
> ------------------------------------------- > Importing hermes1d > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_New, still kicking > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_New, still kicking > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > entering PyArray_NewFromDescr > Python initialized > got len: 39601 > 1 > 2x > 3 39601 > 0.0 0.0 0.0 > still alive > Segmentation fault > > > > What puzzles me is that there is no debugging print statement just > before the segfault. Maybe you need to flush the buffer. That is a good thing to do when segfaults are about. > So like if the PyArray_New was not being called. > But looking into numpy/core/include/numpy/ndarrayobject.h, line 1359: > > #define PyArray_SimpleNewFromData(nd, dims, typenum, data) > \ > PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, > \ > data, 0, NPY_CARRAY, NULL) > > > It should be called. Does it segfault in the printf() statement above? > Hm. I also tried gdb, but it doesn't step into > PyArray_SimpleNewFromData (in the C file), not sure why. > > So both print statements and gdb failed to bring me to the cause, > pretty sad day for me. I am going home now and start with a fresh > head, it just can't segfault like this... I guess I'll start by > creating a simple cython project to reproduce it the schroedinger > code above is quite involved, it starts a python interpreter inside a > C++ program, etc. etc.). > > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tpk at kraussfamily.org Mon Oct 5 22:49:15 2009 From: tpk at kraussfamily.org (Tom K.) Date: Mon, 5 Oct 2009 19:49:15 -0700 (PDT) Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <4AC705F4.1040702@noaa.gov> References: <4AC705F4.1040702@noaa.gov> Message-ID: <25762136.post@talk.nabble.com> Christopher Barker wrote: > > > What do folks think? is this useful? What would you change, etc? > Chris - I really like this and find it useful. I would change the name to something like "growable" or "ArrayList" - accumulator seems like an object for cumulative summation. I think the right amount to grow is 2x - this provides an amortized O(log n) append. If the array doesn't have to grow, the cost is 1 - no copies - whereas if you have to grow, the cost is n copies. Is 2x optimal? Perhaps the configurable grow ratio is a good thing, although giving a knob means people are going to set it wrong. I would also vote "+1" for an ND version of this (growing only a single dimension). 
Keeping 2x for each of n dimensions, while conceivable, would be 2**n extra memory, and hence probably too costly. Cheers, Tom K. -- View this message in context: http://www.nabble.com/A-numpy-accumulator...-tp25726568p25762136.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From ondrej at certik.cz Mon Oct 5 23:38:18 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 20:38:18 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> Message-ID: <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris wrote: > > > On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: [...] >> still alive >> Segmentation fault >> >> >> >> What puzzles me is that there is no debugging print statement just >> before the segfault. > > Maybe you need to flush the buffer. That is a good thing to do when > segfaults are about. I tried to put "fflush(NULL);" after it, but it didn't help. I have created a super simple demo for anyone to play: $ git clone git://github.com/certik/segfault.git $ cd segfault/ $ vim Makefile # <-- edit the python and numpy include paths $ make $ python test.py I am still alive Segmentation fault where test.py is: $ cat test.py import _hermes1d v = _hermes1d.test() print v and _hermes1d.pyx is: $ cat _hermes1d.pyx def test(): cdef npy_intp size cdef ndarray newarr cdef double *arrsource size = 10 arrsource = malloc(sizeof(double) * size) print "I am still alive" newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) print "I am dead." return newarr So I bet there is something very stupid that I am missing. Still investigating... Ondrej From ondrej at certik.cz Tue Oct 6 00:25:48 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 21:25:48 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> Message-ID: <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> On Mon, Oct 5, 2009 at 8:38 PM, Ondrej Certik wrote: > On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris > wrote: >> >> >> On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: > [...] >>> still alive >>> Segmentation fault >>> >>> >>> >>> What puzzles me is that there is no debugging print statement just >>> before the segfault. >> >> Maybe you need to flush the buffer. That is a good thing to do when >> segfaults are about. > > I tried to put "fflush(NULL);" after it, but it didn't help. I have > created a super simple demo for anyone to play: > > > $ git clone git://github.com/certik/segfault.git > $ cd segfault/ > $ vim Makefile ? ? # <-- edit the python and numpy include paths > $ make > $ python test.py > I am still alive > Segmentation fault > > where test.py is: > > $ cat test.py > import _hermes1d > v = _hermes1d.test() > print v > > > and _hermes1d.pyx is: > > $ cat _hermes1d.pyx > def test(): > ? ?cdef npy_intp size > ? ?cdef ndarray newarr > ? ?cdef double *arrsource > > ? ?size = 10 > ? ?arrsource = malloc(sizeof(double) * size) > ? ?print "I am still alive" > ? ?newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) > ? ?print "I am dead." > > ? ?return newarr > > > So I bet there is something very stupid that I am missing. Still > investigating... 
I didn't call _import_array() ! This patch fixes it: diff --git a/_hermes1d.pxd b/_hermes1d.pxd index 9994c28..f5e8868 100644 --- a/_hermes1d.pxd +++ b/_hermes1d.pxd @@ -54,6 +54,8 @@ cdef extern from "arrayobject.h": object PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, void* data) + void _import_array() + cdef extern from "Python.h": ctypedef void PyObject void Py_INCREF(PyObject *x) diff --git a/_hermes1d.pyx b/_hermes1d.pyx index e542ddc..7a4beec 100644 --- a/_hermes1d.pyx +++ b/_hermes1d.pyx @@ -2,6 +2,7 @@ def test(): cdef npy_intp size cdef ndarray newarr cdef double *arrsource + _import_array() size = 10 arrsource = malloc(sizeof(double) * size) I think I learned something today the hard way. Ondrej From ondrej at certik.cz Tue Oct 6 00:34:40 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 21:34:40 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> Message-ID: <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> On Mon, Oct 5, 2009 at 9:25 PM, Ondrej Certik wrote: > On Mon, Oct 5, 2009 at 8:38 PM, Ondrej Certik wrote: >> On Mon, Oct 5, 2009 at 7:34 PM, Charles R Harris >> wrote: >>> >>> >>> On Mon, Oct 5, 2009 at 7:40 PM, Ondrej Certik wrote: >> [...] >>>> still alive >>>> Segmentation fault >>>> >>>> >>>> >>>> What puzzles me is that there is no debugging print statement just >>>> before the segfault. >>> >>> Maybe you need to flush the buffer. That is a good thing to do when >>> segfaults are about. >> >> I tried to put "fflush(NULL);" after it, but it didn't help. I have >> created a super simple demo for anyone to play: >> >> >> $ git clone git://github.com/certik/segfault.git >> $ cd segfault/ >> $ vim Makefile ? ? # <-- edit the python and numpy include paths >> $ make >> $ python test.py >> I am still alive >> Segmentation fault >> >> where test.py is: >> >> $ cat test.py >> import _hermes1d >> v = _hermes1d.test() >> print v >> >> >> and _hermes1d.pyx is: >> >> $ cat _hermes1d.pyx >> def test(): >> ? ?cdef npy_intp size >> ? ?cdef ndarray newarr >> ? ?cdef double *arrsource >> >> ? ?size = 10 >> ? ?arrsource = malloc(sizeof(double) * size) >> ? ?print "I am still alive" >> ? ?newarr = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, arrsource) >> ? ?print "I am dead." >> >> ? ?return newarr >> >> >> So I bet there is something very stupid that I am missing. Still >> investigating... > > I didn't call _import_array() ?! > > This patch fixes it: > > > diff --git a/_hermes1d.pxd b/_hermes1d.pxd > index 9994c28..f5e8868 100644 > --- a/_hermes1d.pxd > +++ b/_hermes1d.pxd > @@ -54,6 +54,8 @@ cdef extern from "arrayobject.h": > ? ? object PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, > ? ? ? ? ? ? void* data) > > + ? ?void _import_array() > + > ?cdef extern from "Python.h": > ? ? ctypedef void PyObject > ? ? void Py_INCREF(PyObject *x) > diff --git a/_hermes1d.pyx b/_hermes1d.pyx > index e542ddc..7a4beec 100644 > --- a/_hermes1d.pyx > +++ b/_hermes1d.pyx > @@ -2,6 +2,7 @@ def test(): > ? ? cdef npy_intp size > ? ? cdef ndarray newarr > ? ? cdef double *arrsource > + ? ?_import_array() > > ? ? size = 10 > ? ? arrsource = malloc(sizeof(double) * size) > > > > > I think I learned something today the hard way. 
The only mention of the _import_array() in the documentation that I found is here: http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY but I don't understand what it means ---- do I have to just call _import_array() and then I can use numpy CAPI, or do I also have to define those PY_ARRAY_UNIQUE_SYMBOL etc? Btw, to explain my original post for future readers --- the real problem was that PyArray_Type was NULL and thus &PyArray_Type segfaulted. That happened in the definition: #define PyArray_SimpleNewFromData(nd, dims, typenum, data) \ PyArray_New(&PyArray_Type, nd, dims, typenum, NULL, \ data, 0, NPY_CARRAY, NULL) so it is *extremely* confusing, since PyArray_SimpleNewFromData() was being called from my code, but PyArray_New never started, it segfaulted in between. I think now it is clear what is going on. Only I don't understand the intention, but I can now get my job done. Ondrej From robert.kern at gmail.com Tue Oct 6 00:42:16 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 5 Oct 2009 23:42:16 -0500 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> Message-ID: <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> On Mon, Oct 5, 2009 at 23:34, Ondrej Certik wrote: > The only mention of the _import_array() in the documentation that I > found is here: > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY > > > but I don't understand what it means ---- do I have to just call > _import_array() and then I can use numpy CAPI, or do I also have to > define those PY_ARRAY_UNIQUE_SYMBOL etc? Not _import_array() but import_array(). http://docs.scipy.org/doc/numpy/reference/c-api.array.html#importing-the-api You don't have multiple files, so you only use import_array() and not PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT_ARRAY. I'm not really sure what is unclear about the text except that you searched for the wrong spelling and found the wrong entry. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Tue Oct 6 01:15:31 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 05 Oct 2009 22:15:31 -0700 Subject: [Numpy-discussion] A numpy accumulator... In-Reply-To: <25762136.post@talk.nabble.com> References: <4AC705F4.1040702@noaa.gov> <25762136.post@talk.nabble.com> Message-ID: <4ACAD273.4040000@noaa.gov> Tom K. wrote: > Chris - I really like this and find it useful. I would change the name to > something like "growable" or "ArrayList" hmm. I think I like "growable" or maybe "growarray". > I think the right amount to grow is 2x - I think that may be too much.. one if the key advantages of this over python lists is that there should be a memory use advantage -- when you are pushing memory bounds, using twice what you need is a bit much. > Perhaps the configurable grow ratio is a good > thing, although giving a knob means people are going to set it wrong. maybe, but most folk will use the default anyway. I'm certainly going to keep it configurable while under development -- the better to benchmark with. 
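To put rough numbers behind the memory-use point above -- this is only an
illustrative comparison, and the exact sizes depend on the platform and
Python build:

import sys
import numpy as np

n = 100000
py_list = [float(i) for i in range(n)]
np_buf = np.arange(n, dtype=float)

# a list stores a pointer per element plus a separate float object per
# element; the numpy buffer stores 8 packed bytes per element (plus
# whatever over-allocation slack the growth factor leaves)
list_bytes = sys.getsizeof(py_list) + n * sys.getsizeof(1.0)
array_bytes = np_buf.nbytes
print("list: ~%d bytes, array: ~%d bytes" % (list_bytes, array_bytes))

so even before any growth slack, the plain-list version costs several
times more per element.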
> I would also vote "+1" for an ND version of this (growing only a single > dimension). Yes, I think that is a good idea, and would certainly be useful for a common case -- growing a table of data, perhaps when reading a file, etc. > Keeping 2x for each of n dimensions, while conceivable, would > be 2**n extra memory, and hence probably too costly. That, and the fact that you'd have to move a bunch of memory around as it grew -- if you only grow the first dimension (for C order, anyway), you can just tack stuff on the end (which usually necessitates a copy anyway, but it still seems easier. thanks for the feedback, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ondrej at certik.cz Tue Oct 6 02:11:19 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 5 Oct 2009 23:11:19 -0700 Subject: [Numpy-discussion] PyArray_SimpleNewFromData segfaults In-Reply-To: <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> References: <85b5c3130910051840s1793bd9bh722781b90a0e9a7e@mail.gmail.com> <85b5c3130910052038j49af5ddfme904c0cb78ddadad@mail.gmail.com> <85b5c3130910052125p4373c04ueb63d5fc48683a57@mail.gmail.com> <85b5c3130910052134ha25ce0ge45600990dbf5b2d@mail.gmail.com> <3d375d730910052142s4a66164bn534d7e710d99b80b@mail.gmail.com> Message-ID: <85b5c3130910052311w2f5423a8i9376522018dce7c8@mail.gmail.com> On Mon, Oct 5, 2009 at 9:42 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 23:34, Ondrej Certik wrote: > >> The only mention of the _import_array() in the documentation that I >> found is here: >> >> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NO_IMPORT_ARRAY >> >> >> but I don't understand what it means ---- do I have to just call >> _import_array() and then I can use numpy CAPI, or do I also have to >> define those PY_ARRAY_UNIQUE_SYMBOL etc? > > Not _import_array() but import_array(). > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#importing-the-api > > You don't have multiple files, so you only use import_array() and not > PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT_ARRAY. > > I'm not really sure what is unclear about the text except that you > searched for the wrong spelling and found the wrong entry. Ah, that's the way. I was using _import_array() and that worked, so I changed that to import_array() and call it just once at the top of the .pyx file and now everything works very nice. Indeed, it is well documented in there, I didn't realize it written one paragraph above it. Thanks for help, all is fine now. Here is how to use that new code in hermes1d from C++: http://groups.google.com/group/hermes1d/msg/54f90f1aa740e93f one can now easily decide if to copy or not to copy the data when constructing the numpy arrays. Ondrej From stefan at sun.ac.za Tue Oct 6 10:20:51 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 6 Oct 2009 16:20:51 +0200 Subject: [Numpy-discussion] NumPy SVN broken Message-ID: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Hi all, The current SVN HEAD of NumPy is broken and should not be used. Extensions compiled against this version may (will) segfault. Travis, if you could have a look at the side-effects caused by r7050, that would be great. I meant to figure out what was wrong, but seeing that this is a 3000 line patch, I'm not confident I can find the problem easily. Regards St?fan P.S. 
The new functionality is great, but I don't think we're going to be able to convince David to release without documenting and testing those changes to the C API. From elaine.angelino at gmail.com Tue Oct 6 09:33:38 2009 From: elaine.angelino at gmail.com (Elaine Angelino) Date: Tue, 6 Oct 2009 09:33:38 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> Message-ID: <901520e20910060633u23f7d7a3xa7038f5ed4483bd3@mail.gmail.com> On Mon, Oct 5, 2009 at 7:20 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 18:15, Elaine Angelino > wrote: > > > Then I would suggest making tabarrays subclass from ndarray. > Ok, done. We did it using the from*() function design you suggested. In the future, if there are more direct from*() functions working directly on ndarrays we'd want to switch to those of course. While implementing the change, we were reminded of another difference between ndarray and recarray, namely that the constructor of ndarray doesn't accept "names" or "formats" parameters while the recarray constructor does (e.g. you have to specify `dtype` in the ndarray constructor). This feature of the recarray constructor was useful for our purposes, since one of the goals of tabular is providing 'easy' construction methods. We've retained this feature, even though we've switched to subclassing ndarray. There must be a good reason why ndarray does not accept "names" or "formats" parameters and forces the use of the more explicit and unambiguous "dtype". I guess it's "cleaner" in some sense, since the formats parameter is necessarily more limited. It does make sense to have a strongly unambiguous interface for a cornerstone method like np.ndarray.__new__. That said, I think it also makes sense to have more flexible interfaces too, even if they're sometimes more ambiguous (this is part of the purpose of tabular, see http://www.parsemydata.com/tabular/reference/organization.html#design-philosophy ). Thanks for the help, elaine -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Tue Oct 6 09:18:02 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Oct 2009 08:18:02 -0500 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <3C3042A3-502C-4D38-A94C-87D3A330F944@gmail.com> <901520e20910051516t7dc3bad1m4362e7c104f5499b@mail.gmail.com> <3d375d730910051536s3cf3ac53gb45ed1c0aa6361b7@mail.gmail.com> <901520e20910051552m4e0c113cm2d625675fe1b95d1@mail.gmail.com> <3d375d730910051558v245c5b87v7e9220dc3d4c42f5@mail.gmail.com> <901520e20910051615l3c7be3f9oad3903ba570291a2@mail.gmail.com> <3d375d730910051620rc70c224qaa063ca1215e935b@mail.gmail.com> Message-ID: <4ACB438A.5020007@gmail.com> On 10/05/2009 06:20 PM, Robert Kern wrote: > On Mon, Oct 5, 2009 at 18:15, Elaine Angelino wrote: > > >>> Well, what other recarray functionality are you using? >>> >> None, in our code. We also thought that since at least some people like >> using the attribute reference property, perhaps users of tabarrays might too >> (though we don't personally in our own work) Recarrays still seemed to be >> being supported by NumPy, so it seemed to make sense to use them. but the >> only functional thing in our code are those constructors. >> > Then I would suggest making tabarrays subclass from ndarray. If you > like, provide a tabrecarray that subclasses from both recarray and > tabarray so that people who like attribute access can .view() to their > heart's content. > > >>>> (Also, is first casting to recarrays and then viewing as ndarrays more >>>> expensive than if we went through ndarray directly?) >>>> >>> >> But if NumPy decided to include ndarray versions of the from*() constructors >> in the distribution, would this be achieved by first using the recarray >> constructor and then viewing as ndarray? Or would something more "direct" >> be done? >> > We would fix the functions to not do any unnecessary .view()s. > > Hi Elaine, I do want to look more at what you have done as some of the features are very interesting. This discussion raises the question of what do you find missing in numpy that you have included in tabular package? In particular is there a particular set of functions that you think could be added to numpy or even create a 'better' recarray class? There are real advantages of having at least core components in numpy. Bruce From charlesr.harris at gmail.com Tue Oct 6 12:28:54 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 10:28:54 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: 2009/10/6 St?fan van der Walt > Hi all, > > The current SVN HEAD of NumPy is broken and should not be used. > Extensions compiled against this version may (will) segfault. > > Travis, if you could have a look at the side-effects caused by r7050, > that would be great. I meant to figure out what was wrong, but seeing > that this is a 3000 line patch, I'm not confident I can find the > problem easily. > > Regards > St?fan > > P.S. The new functionality is great, but I don't think we're going to > be able to convince David to release without documenting and testing > those changes to the C API. 
> ___ Seeing as the next release process is probably going to start next month and we want things to settle out, it might be advisable delay any intrusive patches to the release after and subject them to review and discussion first. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 6 12:31:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 12:31:52 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> Message-ID: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino wrote: > Hi there, > > We are writing to announce the release of "Tabular", a package of Python > modules for working with tabular data. > > Tabular is a package of Python modules for working with tabular data. Its > main object is the tabarray class, a data structure for holding and > manipulating tabular data. By putting data into a tabarray object, you?ll > get a representation of the data that is more flexible and powerful than a > native Python representation. More specifically, tabarray provides: > > -- ultra-fast filtering, selection, and numerical analysis methods, using > convenient Matlab-style matrix operation syntax > -- spreadsheet-style operations, including row & column operations, 'sort', > 'replace', 'aggregate', 'pivot', and 'join' > -- flexible load and save methods for a variety of file formats, including > delimited text (CSV), binary, and HTML > -- helpful inference algorithms for determining formatting parameters and > data types of input files > -- support for hierarchical groupings of columns, both as data structures > and file formats > > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or > alternatively clone our hg repository from bitbucket > (http://bitbucket.org/elaine/tabular/).? We also have posted tutorial-style > Sphinx documentation (http://www.parsemydata.com/tabular/). > > The tabarray object is based on the record array object from the Numerical > Python package (NumPy), and Tabular is built to interface well with NumPy in > general.? Our intended audience is two-fold: (1) Python users who, though > they may not be familiar with NumPy, are in need of a way to work with > tabular data, and (2) NumPy users who would like to do spreadsheet-style > operations on top of their more "numerical" work. > > We hope that some of you find Tabular useful! > > Best, > > Elaine and Dan I briefly looked at the sphinx docs and the code. Tabular looks pretty useful and the code can be partially read as recipes for working with recarrays or structured arrays. Thanks for the choice of license (it makes looking at the code "legal"). I didn't see any explicit nan handling. Are missing values allowed e.g. in the constructor? I looked a bit closer at function like tabular.fast.recarrayisin since I always have problems with these row operations. Are these function supposed to work with arbitrary structured arrays? The tests are only for a 1d integer arrays. With floats the default string representation doesn't sort correctly. Or am I misreading the function? 
>>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) >>> arr array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), (2.0000000000000002e+025, 3.0), (0.0, 7.0)], dtype=[('f0', '>> np.sort([str(l) for l in arr]) array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', '(5e-015, 1.0)', '(6.0, 1.0)'], dtype='|S30') Being able to do a searchsorted on rows of an array would be a useful feature in numpy. Is there a sortable 1d representation of the rows of a 2d float or mixed type array? Thanks, Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Tue Oct 6 12:36:37 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 10:36:37 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: 2009/10/6 St?fan van der Walt > Hi all, > > The current SVN HEAD of NumPy is broken and should not be used. > Extensions compiled against this version may (will) segfault. > > Can you be more specific? I haven't had any problems running current svn with scipy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Oct 6 12:46:20 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 6 Oct 2009 18:46:20 +0200 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: <9457e7c80910060946y39cf3186r17839125d6b1d20a@mail.gmail.com> 2009/10/6 Charles R Harris : > 2009/10/6 St?fan van der Walt >> >> Hi all, >> >> The current SVN HEAD of NumPy is broken and should not be used. >> Extensions compiled against this version may (will) segfault. >> > > Can you be more specific? I haven't had any problems running current svn > with scipy. Both David and I had segfaults when running scipy compiled off the latest numpy. An example from Kiva: Program received signal SIGSEGV, Segmentation fault. PyArray_INCREF (mp=0x42) at build/scons/numpy/core/src/multiarray/refcount.c:103 103 if (!PyDataType_REFCHK(mp->descr)) { (gdb) bt #0 PyArray_INCREF (mp=0x42) at build/scons/numpy/core/src/multiarray/refcount.c:103 #1 0x00985f67 in agg::pixel_map_as_unowned_array (pix_map=...) at build/src.linux-i686-2.6/enthought/kiva/agg/src/x11/plat_support_wrap.cpp:2909 #2 0x0098795f in _wrap_pixel_map_as_unowned_array (args=0xb7ed032c) at build/src.linux-i686-2.6/enthought/kiva/agg/src/x11/plat_support_wrap.cpp:3341 Via bisection, the source of the problem has been localised to the merge of the datetime branch. Cheers St?fan From cournape at gmail.com Tue Oct 6 12:50:34 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 01:50:34 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> Message-ID: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris wrote: > > > 2009/10/6 St?fan van der Walt >> >> Hi all, >> >> The current SVN HEAD of NumPy is broken and should not be used. >> Extensions compiled against this version may (will) segfault. >> > > Can you be more specific? I haven't had any problems running current svn > with scipy. 
The version itself is fine, but the ABI has been changed in an incompatible way: if you have an extension built against say numpy 1.2.1, and then use a numpy built from sources after the datetime merge, it will segfault right away. It does so for scipy and several custom extensions. The abi breakage was found to be the datetime merge. David From josef.pktd at gmail.com Tue Oct 6 13:01:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 13:01:18 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> Message-ID: <1cd32cbb0910061001k52ca9024r6bf0f47a4909963b@mail.gmail.com> On Tue, Oct 6, 2009 at 12:31 PM, wrote: > On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino > wrote: >> Hi there, >> >> We are writing to announce the release of "Tabular", a package of Python >> modules for working with tabular data. >> >> Tabular is a package of Python modules for working with tabular data. Its >> main object is the tabarray class, a data structure for holding and >> manipulating tabular data. By putting data into a tabarray object, you?ll >> get a representation of the data that is more flexible and powerful than a >> native Python representation. More specifically, tabarray provides: >> >> -- ultra-fast filtering, selection, and numerical analysis methods, using >> convenient Matlab-style matrix operation syntax >> -- spreadsheet-style operations, including row & column operations, 'sort', >> 'replace', 'aggregate', 'pivot', and 'join' >> -- flexible load and save methods for a variety of file formats, including >> delimited text (CSV), binary, and HTML >> -- helpful inference algorithms for determining formatting parameters and >> data types of input files >> -- support for hierarchical groupings of columns, both as data structures >> and file formats >> >> You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or >> alternatively clone our hg repository from bitbucket >> (http://bitbucket.org/elaine/tabular/).? We also have posted tutorial-style >> Sphinx documentation (http://www.parsemydata.com/tabular/). >> >> The tabarray object is based on the record array object from the Numerical >> Python package (NumPy), and Tabular is built to interface well with NumPy in >> general.? Our intended audience is two-fold: (1) Python users who, though >> they may not be familiar with NumPy, are in need of a way to work with >> tabular data, and (2) NumPy users who would like to do spreadsheet-style >> operations on top of their more "numerical" work. >> >> We hope that some of you find Tabular useful! >> >> Best, >> >> Elaine and Dan > > I briefly looked at the sphinx docs and the code. Tabular looks pretty > useful and > the code can be partially read as recipes for working with recarrays > or structured > arrays. Thanks for the choice of license (it makes looking at the code "legal"). > > I didn't see any explicit nan handling. Are missing values allowed > e.g. in the constructor? > > I looked a bit closer at function like tabular.fast.recarrayisin since > I always have problems > with these row operations. > Are these function supposed to work with arbitrary structured arrays? > The tests are only > for a 1d integer arrays. 
> With floats the default string representation doesn't sort correctly. > Or am I misreading the function? > >>>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) >>>> arr > array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), > ? ? ? (2.0000000000000002e+025, 3.0), (0.0, 7.0)], > ? ? ?dtype=[('f0', '>>> np.sort([str(l) for l in arr]) > array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', > ? ? ? '(5e-015, 1.0)', '(6.0, 1.0)'], > ? ? ?dtype='|S30') Maybe this doesn't matter for the purpose of this function. I will download and try the code before I make any more irrelevant comments. Josef > > Being able to do a searchsorted on rows of an array would be a useful feature > in numpy. Is there a sortable 1d representation of the rows of a 2d float or > mixed type array? > > Thanks, > > Josef > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > From charlesr.harris at gmail.com Tue Oct 6 13:04:02 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 11:04:02 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris > wrote: > > > > > > 2009/10/6 St?fan van der Walt > >> > >> Hi all, > >> > >> The current SVN HEAD of NumPy is broken and should not be used. > >> Extensions compiled against this version may (will) segfault. > >> > > > > Can you be more specific? I haven't had any problems running current svn > > with scipy. > > The version itself is fine, but the ABI has been changed in an > incompatible way: if you have an extension built against say numpy > 1.2.1, and then use a numpy built from sources after the datetime > merge, it will segfault right away. It does so for scipy and several > custom extensions. The abi breakage was found to be the datetime > merge. > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing the problem? Maybe the dtype change wasn't made in a compatible way. IIRC, something was added to the dtype? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Oct 6 13:14:47 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 02:14:47 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> On Wed, Oct 7, 2009 at 2:04 AM, Charles R Harris wrote: > > > On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau > wrote: >> >> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris >> wrote: >> > >> > >> > 2009/10/6 St?fan van der Walt >> >> >> >> Hi all, >> >> >> >> The current SVN HEAD of NumPy is broken and should not be used. >> >> Extensions compiled against this version may (will) segfault. >> >> >> > >> > Can you be more specific? I haven't had any problems running current svn >> > with scipy. 
>> >> The version itself is fine, but the ABI has been changed in an >> incompatible way: if you have an extension built against say numpy >> 1.2.1, and then use a numpy built from sources after the datetime >> merge, it will segfault right away. It does so for scipy and several >> custom extensions. The abi breakage was found to be the datetime >> merge. >> > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing the > problem? Maybe the dtype change wasn't made in a compatible way. IIRC, > something was added to the dtype? Yes, but that should not cause trouble. Adding members to structure should be fine. I quickly look at the diff, and some changes in the code generators look suspicious, e.g.: types = ['Generic','Number','Integer','SignedInteger','UnsignedInteger', - 'Inexact', + 'Inexact', 'TimeInteger', 'Floating', 'ComplexFloating', 'Flexible', 'Character', 'Byte','Short','Int', 'Long', 'LongLong', 'UByte', 'UShort', 'UInt', 'ULong', 'ULongLong', 'Float', 'Double', 'LongDouble', 'CFloat', 'CDouble', 'CLongDouble', 'Object', 'String', 'Unicode', - 'Void'] + 'Void', 'Datetime', 'Timedelta'] As the list is used to initialize some values from the API function pointer array, inserts should be avoided. You can see the consequence on the generated files, e.g. part of __multiarray_api.h diff between datetimemerge and just before: < #define PyFloatingArrType_Type (*(PyTypeObject *)PyArray_API[16]) < #define PyComplexFloatingArrType_Type (*(PyTypeObject *)PyArray_API[17]) < #define PyFlexibleArrType_Type (*(PyTypeObject *)PyArray_API[18]) < #define PyCharacterArrType_Type (*(PyTypeObject *)PyArray_API[19]) < #define PyByteArrType_Type (*(PyTypeObject *)PyArray_API[20]) < #define PyShortArrType_Type (*(PyTypeObject *)PyArray_API[21]) < #define PyIntArrType_Type (*(PyTypeObject *)PyArray_API[22]) < #define PyLongArrType_Type (*(PyTypeObject *)PyArray_API[23]) < #define PyLongLongArrType_Type (*(PyTypeObject *)PyArray_API[24]) < #define PyUByteArrType_Type (*(PyTypeObject *)PyArray_API[25]) < #define PyUShortArrType_Type (*(PyTypeObject *)PyArray_API[26]) < #define PyUIntArrType_Type (*(PyTypeObject *)PyArray_API[27]) < #define PyULongArrType_Type (*(PyTypeObject *)PyArray_API[28]) < #define PyULongLongArrType_Type (*(PyTypeObject *)PyArray_API[29]) < #define PyFloatArrType_Type (*(PyTypeObject *)PyArray_API[30]) < #define PyDoubleArrType_Type (*(PyTypeObject *)PyArray_API[31]) < #define PyLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[32]) < #define PyCFloatArrType_Type (*(PyTypeObject *)PyArray_API[33]) < #define PyCDoubleArrType_Type (*(PyTypeObject *)PyArray_API[34]) < #define PyCLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[35]) < #define PyObjectArrType_Type (*(PyTypeObject *)PyArray_API[36]) < #define PyStringArrType_Type (*(PyTypeObject *)PyArray_API[37]) < #define PyUnicodeArrType_Type (*(PyTypeObject *)PyArray_API[38]) < #define PyVoidArrType_Type (*(PyTypeObject *)PyArray_API[39]) --- > #define PyTimeIntegerArrType_Type (*(PyTypeObject *)PyArray_API[16]) > #define PyFloatingArrType_Type (*(PyTypeObject *)PyArray_API[17]) > #define PyComplexFloatingArrType_Type (*(PyTypeObject *)PyArray_API[18]) > #define PyFlexibleArrType_Type (*(PyTypeObject *)PyArray_API[19]) > #define PyCharacterArrType_Type (*(PyTypeObject *)PyArray_API[20]) > #define PyByteArrType_Type (*(PyTypeObject *)PyArray_API[21]) > #define PyShortArrType_Type (*(PyTypeObject *)PyArray_API[22]) > #define PyIntArrType_Type (*(PyTypeObject *)PyArray_API[23]) > #define PyLongArrType_Type 
(*(PyTypeObject *)PyArray_API[24]) > #define PyLongLongArrType_Type (*(PyTypeObject *)PyArray_API[25]) > #define PyUByteArrType_Type (*(PyTypeObject *)PyArray_API[26]) > #define PyUShortArrType_Type (*(PyTypeObject *)PyArray_API[27]) > #define PyUIntArrType_Type (*(PyTypeObject *)PyArray_API[28]) > #define PyULongArrType_Type (*(PyTypeObject *)PyArray_API[29]) > #define PyULongLongArrType_Type (*(PyTypeObject *)PyArray_API[30]) > #define PyFloatArrType_Type (*(PyTypeObject *)PyArray_API[31]) > #define PyDoubleArrType_Type (*(PyTypeObject *)PyArray_API[32]) > #define PyLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[33]) > #define PyCFloatArrType_Type (*(PyTypeObject *)PyArray_API[34]) > #define PyCDoubleArrType_Type (*(PyTypeObject *)PyArray_API[35]) > #define PyCLongDoubleArrType_Type (*(PyTypeObject *)PyArray_API[36]) > #define PyObjectArrType_Type (*(PyTypeObject *)PyArray_API[37]) > #define PyStringArrType_Type (*(PyTypeObject *)PyArray_API[38]) > #define PyUnicodeArrType_Type (*(PyTypeObject *)PyArray_API[39]) > #define PyVoidArrType_Type (*(PyTypeObject *)PyArray_API[40]) > #define PyDatetimeArrType_Type (*(PyTypeObject *)PyArray_API[41]) > #define PyTimedeltaArrType_Type (*(PyTypeObject *)PyArray_API[42]) David From dwf at cs.toronto.edu Tue Oct 6 13:19:37 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 6 Oct 2009 13:19:37 -0400 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> Message-ID: On 6-Oct-09, at 12:50 PM, David Cournapeau wrote: > The version itself is fine, but the ABI has been changed in an > incompatible way: if you have an extension built against say numpy > 1.2.1, and then use a numpy built from sources after the datetime > merge, it will segfault right away. It does so for scipy and several > custom extensions. The abi breakage was found to be the datetime > merge. I experienced something similar recently with both ETS and pytables. Good to know finally what was going on. :) David From charlesr.harris at gmail.com Tue Oct 6 13:31:22 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 6 Oct 2009 11:31:22 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> Message-ID: On Tue, Oct 6, 2009 at 11:14 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:04 AM, Charles R Harris > wrote: > > > > > > On Tue, Oct 6, 2009 at 10:50 AM, David Cournapeau > > wrote: > >> > >> On Wed, Oct 7, 2009 at 1:36 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > 2009/10/6 St?fan van der Walt > >> >> > >> >> Hi all, > >> >> > >> >> The current SVN HEAD of NumPy is broken and should not be used. > >> >> Extensions compiled against this version may (will) segfault. > >> >> > >> > > >> > Can you be more specific? I haven't had any problems running current > svn > >> > with scipy. > >> > >> The version itself is fine, but the ABI has been changed in an > >> incompatible way: if you have an extension built against say numpy > >> 1.2.1, and then use a numpy built from sources after the datetime > >> merge, it will segfault right away. 
It does so for scipy and several > >> custom extensions. The abi breakage was found to be the datetime > >> merge. > >> > > > > Ah... That's a fine kettle of fish. Any idea what ABI calls are causing > the > > problem? Maybe the dtype change wasn't made in a compatible way. IIRC, > > something was added to the dtype? > > Yes, but that should not cause trouble. Adding members to structure > should be fine. > > I quickly look at the diff, and some changes in the code generators > look suspicious, e.g.: > > types = ['Generic','Number','Integer','SignedInteger','UnsignedInteger', > - 'Inexact', > + 'Inexact', 'TimeInteger', > 'Floating', 'ComplexFloating', 'Flexible', 'Character', > 'Byte','Short','Int', 'Long', 'LongLong', 'UByte', 'UShort', > 'UInt', 'ULong', 'ULongLong', 'Float', 'Double', 'LongDouble', > 'CFloat', 'CDouble', 'CLongDouble', 'Object', 'String', 'Unicode', > - 'Void'] > + 'Void', 'Datetime', 'Timedelta'] > > As the list is used to initialize some values from the API function > pointer array, inserts should be avoided. You can see the consequence > on the generated files, e.g. part of __multiarray_api.h diff between > datetimemerge and just before: > > Looks like a clue ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 6 13:49:22 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 13:49:22 -0400 Subject: [Numpy-discussion] tostring() for array rows Message-ID: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> If I have a structured or a regular array, is the use of strides in the following always correct for the length of the row memory? I would like to do tostring() but on each row, by creating a string view of the memory in a 1d array. Thanks, Josef >>> tmp = np.random.randn(4,3) >>> tmp.ravel().view('S'+str(tmp.strides[0])) array(['j\x94gv\xa5\x80\xe6?=\xea\xa3\xcb\xb9W\x05 at 4.\xa2J3\xe2\xee?', '\xe3\x89\x973My\xf7\xbf\xc1\x17\x0f\xff\xe9\x19\xb8\xbf\xdb?\x00\xc9c\xf0\xf9?', '\x1f\xc3,B\x9dQ\xa1?F\x1e\x12\x0f\x02\xfc\xd4\xbfz\xe0\xa5_G.\xd0?', '$#T\x0e\xad\x85\xfb\xbf\xf3S\xa6`\x89\x87\xdc?7]\xd9lt\xb4\xf4?'], dtype='|S24') >>> tmp.tostring() 'j\x94gv\xa5\x80\xe6?=\xea\xa3\xcb\xb9W\x05 at 4.\xa2J3\xe2\xee?\xe3\x89\x973My\xf7\xbf\xc1\x17\x0f\xff\xe9\x19\xb8\xbf\xdb?\x00\xc9c\xf0\xf9?\x1f\xc3,B\x9dQ\xa1?F\x1e\x12\x0f\x02\xfc\xd4\xbfz\xe0\xa5_G.\xd0?$#T\x0e\xad\x85\xfb\xbf\xf3S\xa6`\x89\x87\xdc?7]\xd9lt\xb4\xf4?' >>> tmp array([(4.0, 0, 1), (1.0, 1, 3), (2.0, 2, 4), (4.0, 0, 1)], dtype=[('f0', '>> tmp.view('S'+str(tmp.strides[0])) array(['\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\xf0?\x01\x00\x00\x00\x03', '\x00\x00\x00\x00\x00\x00\x00@\x02\x00\x00\x00\x04', '\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x01'], dtype='|S16') From dyamins at gmail.com Tue Oct 6 14:09:30 2009 From: dyamins at gmail.com (Dan Yamins) Date: Tue, 6 Oct 2009 14:09:30 -0400 Subject: [Numpy-discussion] Tabular data package In-Reply-To: <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> References: <901520e20910051421l37ee4882l23eaf0fb37225d5d@mail.gmail.com> <901520e20910051422m52d06699u25dfe322e672059d@mail.gmail.com> <1cd32cbb0910060931g6326a810xf49033aa90964ca8@mail.gmail.com> Message-ID: <15e4667e0910061109h7ecdc5a5wef94778de3f5cd48@mail.gmail.com> > > I didn't see any explicit nan handling. Are missing values allowed > e.g. in the constructor? > No, this is a valid point. We don't handle this as explicitly as we should. 
Are you mostly talking about nan handling in loading from delimited text files? (Or are you talking about something more general, like integration of masked arrays?) In loading from delimited text files, you can use the "linefixer" and "valuefixer" arguments, which are for more general purposes, and which will get the job done, but slowly. We should do something more specialized for missing values that would be faster. > Are these function supposed to work with arbitrary structured arrays? > Well, they're only really tested for working with strings, floats, and ints (tho only the int tests are included in the test module, we should expand that). I imagine it's possible they'd work with more sophisticated things but I'm not sure. > > >>> arr = > np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2) > >>> arr > array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0), > (2.0000000000000002e+025, 3.0), (0.0, 7.0)], > dtype=[('f0', ' >>> np.sort([str(l) for l in arr]) > array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)', > '(5e-015, 1.0)', '(6.0, 1.0)'], > dtype='|S30') > > Well on this example (as in tests that we did), fast.recarrayisin performed as spec'd. ... But definitely write back again if you think it's failing somewhere. In general, extending a number of the thigns in Tabular (e.g. the loadSV and saveSV) to arbitrary structured dtypes as opposed to more basic types would be great. Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Oct 6 14:42:59 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 06 Oct 2009 13:42:59 -0500 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: Message-ID: <4ACB8FB3.5040706@gmail.com> On 10/05/2009 02:13 PM, Pierre GM wrote: > All, > Could you try r7449 ? I introduced some mechanisms to keep track of > invalid lines (where the number of columns don't match what's > expected). By default, a warning is emitted and these lines are > skipped, but an optional argument gives the possibility to raise an > exception instead. > Now, I need more tests about wrong converters. I'm trying to optimize > the upgrade mechanism (there are too many intertwined loops for my > taste now), I'll keep you posted. > Meanwhile, if you could come with more cases of failure, please send > them my way. > Cheers > P. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi, Excellent as the changes appear to address incorrect number of delimiters. I think that the default invalid_raise should be True. One 'feature' is that there is no way to indicate multiple delimiters when the delimiter is whitespace. A B C D 1 2 3 4 1 4 5 Which I consider a user beware issue when using whitespace as the delimiter especially in Python. Bruce From pgmdevlist at gmail.com Tue Oct 6 15:33:53 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 15:33:53 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACB8FB3.5040706@gmail.com> References: <4ACB8FB3.5040706@gmail.com> Message-ID: <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> On Oct 6, 2009, at 2:42 PM, Bruce Southey wrote: >> > Hi, > Excellent as the changes appear to address incorrect number of > delimiters. They should also give some extra info if there's a problem w/ the converters. > I think that the default invalid_raise should be True. 
Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? > > One 'feature' is that there is no way to indicate multiple delimiters > when the delimiter is whitespace. > A B C D > 1 2 3 4 > 1 4 5 Have you tried using a sequence of integers for the delimiter ? Would you mind sending me some test ? From Chris.Barker at noaa.gov Tue Oct 6 16:39:34 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 06 Oct 2009 13:39:34 -0700 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> Message-ID: <4ACBAB06.4000008@noaa.gov> josef.pktd at gmail.com wrote: > If I have a structured or a regular array, is the use of strides in > the following always correct for the length of the row memory? > > I would like to do tostring() but on each row, by creating a string > view of the memory in a 1d array. Maybe I'm missing what you want, but why not just: In [15]: tmp Out[15]: array([[ 1.07810097, -1.74157351, 0.29740878], [-0.16786436, 0.45752272, -0.8038045 ], [-0.17195028, -1.16753882, 0.04329128], [ 0.45460137, -0.44584955, -0.77140505]]) In [16]: rows = [] In [17]: for r in range(tmp.shape[0]): rows.append(tmp[r,:].tostring()) ....: In [19]: rows Out[19]: ['?\xf1?\xe6\xce\x1f9\xce\xbf\xfb\xdd|.\xc85Z?\xd3\x08\xbe\xd6\xb7\xb6\xe8', '\xbf\xc5|\x94Sx\x92\x18?\xddH\r\\T\xfbT\xbf\xe9\xb8\xc45\xff\x92\xdf', '\xbf\xc6\x02w\x82\x18i\xaf\xbf\xf2\xae=/\xfe\xff\x0b?\xa6*FD\xae\xd1F', '?\xdd\x180Z\xcet\xa5\xbf\xdc\x88\xcc\x8a\x8c\x8b\xe7\xbf\xe8\xafY\xa2\xf8\xac '] in general, you can let numpy worry about the strides, etc. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Tue Oct 6 16:42:51 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 15:42:51 -0500 Subject: [Numpy-discussion] Questions about masked arrays Message-ID: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> Hello, I have a sample masked array data as shown below. 1-) When I list the whole array I see the fill value correctly. However below that line, when I do access the 5th element, fill_value flies upto 1e+20. What might be wrong here? I[5]: c.data['Air_Temp'] O[5]: masked_array(data = [13.1509 13.1309 13.1278 13.1542 -- 13.1539 13.1387 -- -- -- 13.1107 13.1351 13.2073 13.2562 13.3533 13.3889 13.4067 13.2938 13.1962 13.1248 13.0411 12.9534 12.8354 12.7392 12.6725], mask = [False False False False True False False True True True False False False False False False False False False False False False False False False], fill_value = 999999.9999) I[6]: c.data['Air_Temp'][4] O[6]: masked_array(data = --, mask = True, fill_value = 1e+20) 2-) What is wrong with the arccos calculation? Should not that result the same as with cos(d) result? I[9]: d = c.data['Air_Temp'][4] I[11]: cos(d) O[11]: masked_array(data = --, mask = True, fill_value = 1e+20) I[12]: arccos(d) O[12]: masked_array(data = 1.57079632679, mask = False, fill_value = 1e+20) Any ideas? -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Chris.Barker at noaa.gov Tue Oct 6 16:43:58 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 06 Oct 2009 13:43:58 -0700 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> Message-ID: <4ACBAC0E.3070708@noaa.gov> Pierre GM wrote: >> I think that the default invalid_raise should be True. > > Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? yup -- make it +2 -- ignoring erreos and losing data by default is a "bad idea"! >> One 'feature' is that there is no way to indicate multiple delimiters >> when the delimiter is whitespace. >> A B C D >> 1 2 3 4 >> 1 4 5 I'd say someone has made a very poor choice of file formats! Unless this s a fixed width file, in which case it should be processes as such, rather than as a delimited one. I suppose it wouldn't hurt to add that feature to genfromtxt.. or is it there already. Perhaps that's what this means: > Have you tried using a sequence of integers for the delimiter ? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From george.trojan at noaa.gov Tue Oct 6 16:42:22 2009 From: george.trojan at noaa.gov (George Trojan) Date: Tue, 6 Oct 2009 20:42:22 +0000 (UTC) Subject: [Numpy-discussion] vectorize() broken on Python2.6 Message-ID: f2py generated wrappers cannot be vectorized with numpy1.3.0 and Python2.6.2. The reason is change to Python's getargs.c. Vectorize, or rather _get_nargs() defined in lib/function_base.py tries to determine the number of arguments from error message generated while the interpreter parses function invocation without any arguments. The messages (in getargs.c) have changed, for example: Required argument 'a' (pos 1) not found Since the message no longer contains information how many arguments a function takes, a fix is not obvious. Is there a solution coming soon? I posted a message on comp.lang.python few days ago. Sturda Molden generated bug report http://projects.scipy.org/numpy/ticket/1247. However the change he suggests does not fix the problem. I am tempted to apply a temporary workaround for my current needs: The __init__ method in vectorize would accept an additional argument, interface=None. That argument would be a Python code stub, prepared manually (though it could easily be generated by f2py). This stub would be used by _get_nargs() when the original object does not contain attribute 'func_code'. Example: Fortran code integer function f3(a, b, c) integer, intent(in) :: a, b integer, optional, intent(in) :: c if (present(c)) then f3 = a - b else f3 = a + b endif end function f3 Interface def f3_iface(a, b, c=None): pass Call vf3 = numpy.vectorize(ftest.f3, f3_iface) Are there any drawbacks to this approach? 
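For what it's worth, with an explicit interface stub the argument count
could come from plain introspection instead of from parsing an error
message. A rough sketch of the idea (the helper name is made up; this is
not how the current _get_nargs works):

import inspect

def nargs_from_stub(stub):
    # hypothetical helper: count the stub's arguments and how many of
    # them have defaults (i.e. are optional)
    args, varargs, varkw, defaults = inspect.getargspec(stub)
    ndefaults = len(defaults) if defaults is not None else 0
    return len(args), ndefaults

def f3_iface(a, b, c=None):
    pass

print(nargs_from_stub(f3_iface))    # (3, 1): three arguments, one optional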
George From josef.pktd at gmail.com Tue Oct 6 16:47:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 6 Oct 2009 16:47:50 -0400 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <4ACBAB06.4000008@noaa.gov> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> Message-ID: <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> On Tue, Oct 6, 2009 at 4:39 PM, Christopher Barker wrote: > josef.pktd at gmail.com wrote: >> If I have a structured or a regular array, is the use of strides in >> the following always correct for the length of the row memory? >> >> I would like to do tostring() but on each row, by creating a string >> view of the memory in a 1d array. > > Maybe I'm missing what you want, but why not just: > > In [15]: tmp > Out[15]: > array([[ 1.07810097, -1.74157351, ?0.29740878], > ? ? ? ?[-0.16786436, ?0.45752272, -0.8038045 ], > ? ? ? ?[-0.17195028, -1.16753882, ?0.04329128], > ? ? ? ?[ 0.45460137, -0.44584955, -0.77140505]]) > > In [16]: rows = [] > > In [17]: for r in range(tmp.shape[0]): > ? ? ? ? ? ? ?rows.append(tmp[r,:].tostring()) > ? ?....: > > In [19]: rows > Out[19]: > ['?\xf1?\xe6\xce\x1f9\xce\xbf\xfb\xdd|.\xc85Z?\xd3\x08\xbe\xd6\xb7\xb6\xe8', > ?'\xbf\xc5|\x94Sx\x92\x18?\xddH\r\\T\xfbT\xbf\xe9\xb8\xc45\xff\x92\xdf', > ?'\xbf\xc6\x02w\x82\x18i\xaf\xbf\xf2\xae=/\xfe\xff\x0b?\xa6*FD\xae\xd1F', > > '?\xdd\x180Z\xcet\xa5\xbf\xdc\x88\xcc\x8a\x8c\x8b\xe7\xbf\xe8\xafY\xa2\xf8\xac > '] > > > in general, you can let numpy worry about the strides, etc. I wanted to avoid the python loop and thought creating the view will be faster with large arrays. But for this I need to know the memory length of a row of arbitrary types for the conversion to strings, strides was the only thing I could think of. > > -Chris > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Oct 6 17:04:28 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 17:04:28 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACBAC0E.3070708@noaa.gov> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote: > Pierre GM wrote: >>> I think that the default invalid_raise should be True. >> >> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? > > yup -- make it +2 -- ignoring erreos and losing data by default is a > "bad idea"! OK then, that's enough for me: I'll put invalid_raise as True by default. Note that a warning was emitted no matter what. > >>> One 'feature' is that there is no way to indicate multiple >>> delimiters >>> when the delimiter is whitespace. >>> A B C D >>> 1 2 3 4 >>> 1 4 5 > > I'd say someone has made a very poor choice of file formats! > > Unless this s a fixed width file, in which case it should be processes > as such, rather than as a delimited one. I suppose it wouldn't hurt to > add that feature to genfromtxt.. or is it there already. 
Perhaps > that's > what this means: > >> Have you tried using a sequence of integers for the delimiter ? Yes, if you give a sequence of integers as delimiter, it is interpreted as the length of each field. At least, should be. From pgmdevlist at gmail.com Tue Oct 6 17:28:16 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 17:28:16 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> Message-ID: <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> On Oct 6, 2009, at 4:42 PM, G?khan Sever wrote: > Hello, > > I have a sample masked array data as shown below. > > 1-) When I list the whole array I see the fill value correctly. > However below that line, when I do access the 5th element, > fill_value flies upto 1e+20. What might be wrong here? Nothing. Your 5th element is the special constant numpy.ma.masked, which has its own filling_value by default. I'll check whether it's worth inheriting the fill_value from the original array. If you could give me a test case where you'd need that value to keep the original filling_value, that'd help me make up my mind. > 2-) What is wrong with the arccos calculation? Should not that > result the same as with cos(d) result? Mmh, what numpy are you using ? When I try with a recent one, np.arccos does output ma.masked... From gokhansever at gmail.com Tue Oct 6 18:57:05 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 17:57:05 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> Message-ID: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> On Tue, Oct 6, 2009 at 4:28 PM, Pierre GM wrote: > > On Oct 6, 2009, at 4:42 PM, G?khan Sever wrote: > > > Hello, > > > > I have a sample masked array data as shown below. > > > > 1-) When I list the whole array I see the fill value correctly. > > However below that line, when I do access the 5th element, > > fill_value flies upto 1e+20. What might be wrong here? > > Nothing. Your 5th element is the special constant numpy.ma.masked, > which has its own filling_value by default. I'll check whether it's > worth inheriting the fill_value from the original array. If you could > give me a test case where you'd need that value to keep the original > filling_value, that'd help me make up my mind. > Seeing a different filling value is causing confusion. Both for myself, and when I try to demonstrate the usage of masked array to other people. Also say, if I want to replace that one element back to its original state will it use fill_value as 1e+20 or 999999.9999? > > > 2-) What is wrong with the arccos calculation? Should not that > > result the same as with cos(d) result? > I first tested on 1.3.0, and later on my laptop using 1.4dev version which is about an old month built. Once again the results for each arc... 
function:

I[31]: d
O[31]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[26]: arccos(d)
O[26]: masked_array(data = 1.57079632679, mask = False, fill_value = 1e+20)

I[28]: arccosh(d)
O[28]: masked_array(data = nan, mask = False, fill_value = 1e+20)

I[30]: arcsin(d)
O[30]: masked_array(data = 0.0, mask = False, fill_value = 1e+20)

I[32]: arcsinh(d)
O[32]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[33]: arctan(d)
O[33]: masked_array(data = --, mask = True, fill_value = 1e+20)

I[35]: arctanh(d)
O[35]: masked_array(data = 0.0, mask = False, fill_value = 1e+20)

Only arcsinh and arctan return the correct (masked) result.

>
> Mmh, what numpy are you using ? When I try with a recent one,
> np.arccos does output ma.masked...
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Gökhan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pgmdevlist at gmail.com Tue Oct 6 20:38:23 2009
From: pgmdevlist at gmail.com (Pierre GM)
Date: Tue, 6 Oct 2009 20:38:23 -0400
Subject: [Numpy-discussion] Questions about masked arrays
In-Reply-To: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com>
References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com>
Message-ID: <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com>

On Oct 6, 2009, at 6:57 PM, Gökhan Sever wrote:
> Seeing a different filling value is causing confusion. Both for
> myself, and when I try to demonstrate the usage of masked array to
> other people.

Fair enough.
I must admit that `fill_value` is a vestige from the > previous implementation (talking pre 1.2 here), that is no longer > really needed (cf below for more details). > > > Also say, if I want to replace that one element back to its original > > state will it use fill_value as 1e+20 or 999999.9999? > > What do you mean by 'replace back to its original state' ? Using > `filled`, you mean ? > Yes, in more properly stated fashion "filled" :) I[14]: c.data['Air_Temp'][4] O[14]: masked_array(data = --, mask = True, fill_value = 1e+20) I[15]: c.data['Air_Temp'][4].filled() O[15]: array(1e+20) Little buggy, isn't it? It properly fill the whole array: I[13]: c.data['Air_Temp'].filled() O[13]: array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, 1.26725000e+01]) > > > > 2-) What is wrong with the arccos calculation? Should not that > > > result the same as with cos(d) result? > > > > I first tested on 1.3.0, and later on my laptop using 1.4dev version > > which is about an old month built. > > > > Once again the results for each arc... function > > Er, I assume it's np.arccos ? > Sorry too much time spent in ipython -pylab :) I[18]: arccos? Type: ufunc Base Class: String Form: Namespace: Interactive File: /home/gsever/Desktop/python-repo/numpy/numpy/__init__.py > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > Could it be that something went wrng with some ufuncs ? This I don't know :( > I didn't touch > ma since 09/08 (thanks, svn history), so I don't think it comes from > here... Yes, SVN is a very useful invention indeed. I[6]: numpy.__version__ O[6]: '1.4.0.dev' For some reason it doesn't list check-out revision. Doing an ls -l reveals that those are checked-out and installed after August 13 which was a preparation for the SciPy 09 :) Would you mind trying a more recent svn version ? > This is the last resort. I will eventually try this if I don't any other options left. I confirmed the same arccos weirdness in Sage Notebook (www.sagenb.org) where Numpy 1.3.0 is installed there. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Oct 6 22:06:28 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:06:28 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> Message-ID: On Oct 6, 2009, at 6:57 PM, G?khan Sever wrote: > > Seeing a different filling value is causing confusion. Both for > myself, and when I try to demonstrate the usage of masked array to > other people. Also say, if I want to replace that one element back > to its original state will it use fill_value as 1e+20 or 999999.9999? 
I knew I was missing something: when you display a masked entry, you actually display the `masked` constant: it's a 0-shaped float masked array with its own `fill_value`, but more importantly, it's a constant. You can use it to test whether one element is masked. Check this example:

>>> x = ma.array([1,2,3],mask=[0,1,0],dtype=int,fill_value=999)
>>> x
masked_array(data = [1 -- 3], mask = [False True False], fill_value = 999)
>>> x[1] is masked
True
>>> x[1]
masked_array(data = --, mask = True, fill_value = 1e+20)

Now, you can change the fill_value of the masked element to whatever you want, but it'll be propagated:

>>> ma.masked.fill_value = -999.
>>> x[1]
masked_array(data = --, mask = True, fill_value = -999.0)
>>> y = ma.array([3,2,1],mask=[1,0,1])
>>> y[0]
masked_array(data = --, mask = True, fill_value = -999.0)

See ? Now, I understand this behavior is a bit confusing. Unfortunately, we need to keep being able to use (element is masked), which implies that we need to keep this apparent inconsistency. What we could do is to define some specific display for the `masked` constant like `masked`. I'm open to suggestions.

From bsouthey at gmail.com Tue Oct 6 22:08:58 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 6 Oct 2009 21:08:58 -0500
Subject: [Numpy-discussion] genfromtxt - the return
In-Reply-To: 
References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov>
Message-ID: 

On Tue, Oct 6, 2009 at 4:04 PM, Pierre GM wrote:
>
> On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote:
>
>> Pierre GM wrote:
>>>> I think that the default invalid_raise should be True.
>>>
>>> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ?
>>
>> yup -- make it +2 -- ignoring errors and losing data by default is a
>> "bad idea"!
>
> OK then, that's enough for me: I'll put invalid_raise as True by
> default. Note that a warning was emitted no matter what.
>
>>>> One 'feature' is that there is no way to indicate multiple
>>>> delimiters when the delimiter is whitespace.
>>>> A B C D
>>>> 1 2 3 4
>>>> 1     4 5
>>
>> I'd say someone has made a very poor choice of file formats!

No, just seeing what sort of problems I can create. This case is partly based on the fact that if someone is using tab-delimited data, they need to set delimiter='\t', otherwise it gives an error. Also, I often parse text files, so yes, you have to be careful with the delimiters. It also arises because certain programs, like spreadsheets, offer the option to merge delimiters - actually in SAS it is the default (you need to specify the DSD option).

>> Unless this is a fixed width file, in which case it should be processed
>> as such, rather than as a delimited one. I suppose it wouldn't hurt to
>> add that feature to genfromtxt.. or is it there already. Perhaps that's
>> what this means:
>>
>>> Have you tried using a sequence of integers for the delimiter ?
>
> Yes, if you give a sequence of integers as delimiter, it is
> interpreted as the length of each field. At least, should be.

More to learn and test. Anyhow, I am really impressed with how this function works.
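For the record, a small illustration of that fixed-width behaviour (column widths made up, Python 2 StringIO used just to keep it self-contained):

    from StringIO import StringIO
    import numpy as np

    data = StringIO("  1  2  3\n  4  5 67\n")
    # a sequence of integers is interpreted as the width of each field
    np.genfromtxt(data, delimiter=(3, 3, 3))
    # -> array([[  1.,   2.,   3.],
    #           [  4.,   5.,  67.]])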
Bruce From pgmdevlist at gmail.com Tue Oct 6 22:22:26 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:22:26 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> Message-ID: <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> On Oct 6, 2009, at 9:54 PM, G?khan Sever wrote: > > > Also say, if I want to replace that one element back to its original > > state will it use fill_value as 1e+20 or 999999.9999? > > What do you mean by 'replace back to its original state' ? Using > `filled`, you mean ? > > Yes, in more properly stated fashion "filled" :) > I[14]: c.data['Air_Temp'][4] > O[14]: > masked_array(data = --, > mask = True, > fill_value = 1e+20) > > > I[15]: c.data['Air_Temp'][4].filled() > O[15]: array(1e+20) > > Little buggy, isn't it? It properly fill the whole array: > > I[13]: c.data['Air_Temp'].filled() > O[13]: > array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, > 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, > 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, > 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, > 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, > 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, > 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, > 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, > 1.26725000e+01]) Once again, when you access your 5th element, you get the special `masked` constant. If you fill this constant, you'll get something which is probably not what you want. And I would need a *REALLY* compelling reason to change this behavior, as it's gonna break a lot of things (the masked constant has been around for a while) > > > 2-) What is wrong with the arccos calculation? Should not that > > Er, I assume it's np.arccos ? > > Sorry too much time spent in ipython -pylab :) Well, i use ipython -pylab regularly as well, but still have the reflex of using np. ;) > > I[18]: arccos? > Type: ufunc > Base Class: > String Form: > Namespace: Interactive > File: /home/gsever/Desktop/python-repo/numpy/numpy/ > __init__.py > > > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > Could it be that something went wrng with some ufuncs ? > > This I don't know :( > > I didn't touch > ma since 09/08 (thanks, svn history), so I don't think it comes from > here... > > Yes, SVN is a very useful invention indeed. > > I[6]: numpy.__version__ > O[6]: '1.4.0.dev' > > For some reason it doesn't list check-out revision. I know, and it's bugging me as well. if you have a build directory somewhere, check numpy/core/__svn_version__.py > This is the last resort. I will eventually try this if I don't any > other options left. I gonna have difficulties fixing something that I don't see broken... Now, there might be something wrong in my installation. I gonna try to install 1.3.0 somwehere. say, what Python are you using ? 
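Coming back to the filled() point above: the practical rule is to fill first and index afterwards, so that the array's own fill_value is used; indexing first hands you the masked constant and its default. A small sketch (the fill_value here is assumed):

    import numpy.ma as ma

    x = ma.array([13.15, 13.13, 13.12], mask=[0, 1, 0], fill_value=999999.9999)
    x.filled()[1]   # 999999.9999 -- the array's own fill_value
    x[1].filled()   # array(1e+20) -- x[1] is the masked constant, with its default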
From pgmdevlist at gmail.com Tue Oct 6 22:27:12 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 22:27:12 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: > No, just seeing what sort of problems I can create. This case is > partly based on if someone is using tab-delimited then they need to > set the delimiter='\t' otherwise it gives an error. Also I often parse > text files so, yes, you have to be careful of the delimiters. It is > also arises because certain programs like spreadsheets there is the > option to merge delimiters - actually in SAS it is default (you need > to specify the DSD option). Ahah! I get it. Well, I remmbr that we discussed something like that a few months ago when I started working on np.genfromtxt, and the default of *not* merging whitespaces was requested. I gonna check whether we can't put this option somewhere now... > Anyhow, I am really impressed on how this function works. Thx. I hope things haven't been slowed down too much. From jsseabold at gmail.com Tue Oct 6 22:40:50 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Oct 2009 22:40:50 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Tue, Oct 6, 2009 at 10:08 PM, Bruce Southey wrote: > On Tue, Oct 6, 2009 at 4:04 PM, Pierre GM wrote: >> >> On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote: >> >>> Pierre GM wrote: >>>>> I think that the default invalid_raise should be True. >>>> >>>> Mmh, OK, that's a +1/) for invalid_raise=true. Anybody else ? >>> >>> yup -- make it +2 -- ignoring erreos and losing data by default is a >>> "bad idea"! >> >> OK then, that's enough for me: I'll put invalid_raise as True by >> default. Note that a warning was emitted no matter what. >> >> >>> >>>>> One 'feature' is that there is no way to indicate multiple >>>>> delimiters >>>>> when the delimiter is whitespace. >>>>> A B C D >>>>> 1 2 3 4 >>>>> 1 ? ? 4 5 >>> >>> I'd say someone has made a very poor choice of file formats! > > No, just seeing what sort of problems I can create. This case is > partly based on if someone is using tab-delimited then they need to > set the delimiter='\t' otherwise it gives an error. Also I often parse > text files so, yes, you have to be careful of the delimiters. It is > also arises because certain programs like spreadsheets there is the > option to merge delimiters - actually in SAS it is default (you need > to specify the DSD option). > >>> >>> Unless this s a fixed width file, in which case it should be processes >>> as such, rather than as a delimited one. I suppose it wouldn't hurt to >>> add that feature to genfromtxt.. or is it there already. Perhaps >>> that's >>> what this means: >>> >>>> Have you tried using a sequence of integers for the delimiter ? >> >> Yes, if you give a sequence of integers as delimiter, it is >> interpreted as the length of each field. At least, should be. > > More to learn and test. > There's an example on using the fixed-width delimiter here: http://docs.scipy.org/numpy/docs/numpy.lib.io.genfromtxt/ As far as I know, it works fine. > Anyhow, I am really impressed on how this function works. > Agreed. Genfromtxt and the derived are very useful. 
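On the whitespace-merging point earlier in the thread, the difference only becomes visible once the delimiter is explicit; a hedged sketch (expected output, not checked against current SVN):

    from StringIO import StringIO
    import numpy as np

    txt = "1\t2\t3\n4\t\t6\n"
    np.genfromtxt(StringIO(txt), delimiter="\t")
    # -> array([[  1.,   2.,   3.],
    #           [  4.,  nan,   6.]])
    # with the default whitespace delimiter the empty field is swallowed,
    # so the second row no longer has the expected number of columns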
Skipper From gokhansever at gmail.com Tue Oct 6 22:58:26 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 21:58:26 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> Message-ID: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> On Tue, Oct 6, 2009 at 9:22 PM, Pierre GM wrote: > > On Oct 6, 2009, at 9:54 PM, G?khan Sever wrote: > > > > > Also say, if I want to replace that one element back to its original > > > state will it use fill_value as 1e+20 or 999999.9999? > > > > What do you mean by 'replace back to its original state' ? Using > > `filled`, you mean ? > > > > Yes, in more properly stated fashion "filled" :) > > > I[14]: c.data['Air_Temp'][4] > > O[14]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > > > I[15]: c.data['Air_Temp'][4].filled() > > O[15]: array(1e+20) > > > > Little buggy, isn't it? It properly fill the whole array: > > > > I[13]: c.data['Air_Temp'].filled() > > O[13]: > > array([ 1.31509000e+01, 1.31309000e+01, 1.31278000e+01, > > 1.31542000e+01, 1.00000000e+06, 1.31539000e+01, > > 1.31387000e+01, 1.00000000e+06, 1.00000000e+06, > > 1.00000000e+06, 1.31107000e+01, 1.31351000e+01, > > 1.32073000e+01, 1.32562000e+01, 1.33533000e+01, > > 1.33889000e+01, 1.34067000e+01, 1.32938000e+01, > > 1.31962000e+01, 1.31248000e+01, 1.30411000e+01, > > 1.29534000e+01, 1.28354000e+01, 1.27392000e+01, > > 1.26725000e+01]) > > Once again, when you access your 5th element, you get the special > `masked` constant. If you fill this constant, you'll get something > which is probably not what you want. And I would need a *REALLY* > compelling reason to change this behavior, as it's gonna break a lot > of things (the masked constant has been around for a while) > > I see your points. I don't want to give you extra work, don't worry :) It just seem a bit bizarre: I[27]: c.data['Air_Temp'].fill_value O[27]: 999999.99990000005 I[28]: c.data['Air_Temp'][4].fill_value O[28]: 1e+20 As you see, it just returns two different fill_values. I know eventually you will be the one handling this :) it might be good to add this issue to the tracker. > > > > 2-) What is wrong with the arccos calculation? Should not that > > > > Er, I assume it's np.arccos ? > > > > Sorry too much time spent in ipython -pylab :) > > Well, i use ipython -pylab regularly as well, but still have the > reflex of using np. ;) > > > Good reflex. Saves you from making extra explanations. But it works with just typing array why should I type np.array (Ohh my namespacess :) It is just an IPython magic. > > > > > I[18]: arccos? > > Type: ufunc > > Base Class: > > String Form: > > Namespace: Interactive > > File: /home/gsever/Desktop/python-repo/numpy/numpy/ > > __init__.py > > > > > > Anyway, I'm puzzled. Works like a charm here (r7438 for numpy.ma). > > Could it be that something went wrng with some ufuncs ? > > > > This I don't know :( > > > > I didn't touch > > ma since 09/08 (thanks, svn history), so I don't think it comes from > > here... > > > > Yes, SVN is a very useful invention indeed. 
> > > > I[6]: numpy.__version__ > > O[6]: '1.4.0.dev' > > > > For some reason it doesn't list check-out revision. > > I know, and it's bugging me as well. if you have a build directory > somewhere, check numpy/core/__svn_version__.py > > There is build directory but no files that contains svn :( > > This is the last resort. I will eventually try this if I don't any > > other options left. > > I gonna have difficulties fixing something that I don't see broken... > Now, there might be something wrong in my installation. I gonna try to > install 1.3.0 somwehere. say, what Python are you using ? > OK, I use meld to diff my copy of ma/core.py with the latest trunk version. There are lots of differences :) So there is a possibility that I might have built my local numpy before 09/08. I should renew my copy. Do you know the link of svn browser for the numpy? I don't know how you are making separate installations without overriding other package? I either use Sage (if I have extra time) or SPD. They are both shipped with numpy 1.3.0. Let see how it will result with a new build... > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Oct 6 23:01:45 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Oct 2009 23:01:45 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Tue, Oct 6, 2009 at 10:27 PM, Pierre GM wrote: >> Anyhow, I am really impressed on how this function works. > > Thx. I hope things haven't been slowed down too much. In keeping with the making some work for you theme, I filed an enhancement ticket for one change that we discussed and another IMO useful addition. http://projects.scipy.org/numpy/ticket/1238 I think it would be nice if we could do data = np.genfromtxt(SomeFile, dtype=float, names = ['var1', 'var2', 'var3' ...]) So that float is paired with each variable name. Also, the one that came up earlier of data = np.genfromtxt(SomeFile, dtype=(int, int, float), names = ['var1','var2','var3'] I'm not completely convinced on this one though, since dtype = "i8,i8,f8" works. I don't want know how much confusion it would add to have the dtype argument accept a non-valid dtype construction. Skipper PS. Is it bad form for me to go ahead and assign these kinds of tickets to you if you're going to be working on them, or do you get pinged when any ticket is filed? From pgmdevlist at gmail.com Tue Oct 6 23:15:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 23:15:40 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> Message-ID: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > I see your points. 
I don't want to give you extra work, don't > worry :) It just seem a bit bizarre: > > I[27]: c.data['Air_Temp'].fill_value > O[27]: 999999.99990000005 > > I[28]: c.data['Air_Temp'][4].fill_value > O[28]: 1e+20 > > As you see, it just returns two different fill_values. I know, but I hope you see the difference : in the first line, you access the `fill_value` of the array. In the second, you access the `fill_value` of the `masked` constant. Each time you access a masked element of an array with __getitem__, you get the masked constant. We could force the constant to inherit the fill_value of the array that calls __getitem__, but it'd be propagated. > I know eventually you will be the one handling this :) it might be > good to add this issue to the tracker. Go for it, but don't expect anything before the release of 1.4.0 (in the next few months) > > > This is the last resort. I will eventually try this if I don't any > > other options left. > > I gonna have difficulties fixing something that I don't see broken... > Now, there might be something wrong in my installation. I gonna try to > install 1.3.0 somwehere. say, what Python are you using ? > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > version. There are lots of differences :) So there is a possibility > that I might have built my local numpy before 09/08. I should renew > my copy. Do you know the link of svn browser for the numpy? I don't > know how you are making separate installations without overriding > other package? I either use Sage (if I have extra time) or SPD. They > are both shipped with numpy 1.3.0. Make yourself a favor and install virtualenv and virtualenvwrapper. That way, several versions of the same package can coexist without interference. Oh, and install pip till you're at it: http://pypi.python.org/pypi/virtualenv http://www.doughellmann.com/projects/virtualenvwrapper/ http://pypi.python.org/pypi/pip From pgmdevlist at gmail.com Tue Oct 6 23:23:57 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 6 Oct 2009 23:23:57 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: On Oct 6, 2009, at 11:01 PM, Skipper Seabold wrote: > > In keeping with the making some work for you theme, I filed an > enhancement ticket for one change that we discussed and another IMO > useful addition. http://projects.scipy.org/numpy/ticket/1238 > > I think it would be nice if we could do > > data = np.genfromtxt(SomeFile, dtype=float, names = ['var1', 'var2', > 'var3' ...]) > > So that float is paired with each variable name. Also, the one that > came up earlier of > > data = np.genfromtxt(SomeFile, dtype=(int, int, float), names = > ['var1','var2','var3'] > > I'm not completely convinced on this one though, since dtype = > "i8,i8,f8" works. I don't want know how much confusion it would add > to have the dtype argument accept a non-valid dtype construction. Actually, it's rather straightforward. I already have something that supports dtype=(int,int,float) (far easier to handle than "i4,i4,f8"), I need to tweak a couple of things when the names don't match before posting. Pairing the names with the dtype is pretty neat, that would be quite easy to implement > PS. Is it bad form for me to go ahead and assign these kinds of > tickets to you if you're going to be working on them, or do you get > pinged when any ticket is filed? Go for it. 
I'm only notified when a ticket is assigned to me directly. From gokhansever at gmail.com Tue Oct 6 23:47:11 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 22:47:11 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> Message-ID: <49d6b3500910062047u445af52ata01ec68c2e14744f@mail.gmail.com> On Tue, Oct 6, 2009 at 10:15 PM, Pierre GM wrote: > > On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > > > I see your points. I don't want to give you extra work, don't > > worry :) It just seem a bit bizarre: > > > > I[27]: c.data['Air_Temp'].fill_value > > O[27]: 999999.99990000005 > > > > I[28]: c.data['Air_Temp'][4].fill_value > > O[28]: 1e+20 > > > > As you see, it just returns two different fill_values. > > I know, but I hope you see the difference : in the first line, you > access the `fill_value` of the array. In the second, you access the > `fill_value` of the `masked` constant. Each time you access a masked > element of an array with __getitem__, you get the masked constant. We > could force the constant to inherit the fill_value of the array that > calls __getitem__, but it'd be propagated. > > Got these points. Thanks It took a while I had to re-built matplotlib to use ipython -pylab :) I built the numpy again source from the trunk and arccos (as well as other arc functions) problem has disappeared. It all started with trying to calculate great circle navigation equations using masked arrays, and seeing this range_calc function returning some weird results where it was not supposed to do. Further tracing down the error to arccos. def range_calc(lat_r, lat_t, long_r, long_t): range = degrees(arccos(sin(radians(lat_r)) * sin(radians(lat_t)) + cos(radians(lat_r)) * cos(radians(lat_t)) * cos(radians(long_t - long_r)))) * F azimuth = degrees(arccos((sin(radians(lat_t)) - cos(radians(range / F)) * sin(radians(lat_r))) / (sin(radians(range / F)) * cos(radians(lat_r))))) if long_t - long_r < 0: azimuth = 360 - azimuth return range, azimuth Happy now ;) > > I know eventually you will be the one handling this :) it might be > > good to add this issue to the tracker. > > Go for it, but don't expect anything before the release of 1.4.0 (in > the next few months) > > I will do this shortly. > > > > > This is the last resort. I will eventually try this if I don't any > > > other options left. > > > > I gonna have difficulties fixing something that I don't see broken... > > Now, there might be something wrong in my installation. I gonna try to > > install 1.3.0 somwehere. say, what Python are you using ? > > > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > > version. There are lots of differences :) So there is a possibility > > that I might have built my local numpy before 09/08. I should renew > > my copy. Do you know the link of svn browser for the numpy? I don't > > know how you are making separate installations without overriding > > other package? I either use Sage (if I have extra time) or SPD. 
They > > are both shipped with numpy 1.3.0. > > Make yourself a favor and install virtualenv and virtualenvwrapper. > That way, several versions of the same package can coexist without > interference. Oh, and install pip till you're at it: > > http://pypi.python.org/pypi/virtualenv > http://www.doughellmann.com/projects/virtualenvwrapper/ > http://pypi.python.org/pypi/pip > > > "pip" this is the first time I am hearing. Will give these tools a try probably this weekend. Thanks again for your clarifications. Now, I have to update my advisor's numpy to make his code running correctly. In the first place his code was running properly by using manually created masks for numpy arrays. Using the masked arrays we broke it. Now we know what causing the error. It feels good :) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Wed Oct 7 00:10:55 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 6 Oct 2009 23:10:55 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> Message-ID: <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> Created the ticket http://projects.scipy.org/numpy/ticket/1253 Could you tell me briefly what was the source of leak in arccos case? And how do you write a test code for these cases? On Tue, Oct 6, 2009 at 10:15 PM, Pierre GM wrote: > > On Oct 6, 2009, at 10:58 PM, G?khan Sever wrote: > > > > I see your points. I don't want to give you extra work, don't > > worry :) It just seem a bit bizarre: > > > > I[27]: c.data['Air_Temp'].fill_value > > O[27]: 999999.99990000005 > > > > I[28]: c.data['Air_Temp'][4].fill_value > > O[28]: 1e+20 > > > > As you see, it just returns two different fill_values. > > I know, but I hope you see the difference : in the first line, you > access the `fill_value` of the array. In the second, you access the > `fill_value` of the `masked` constant. Each time you access a masked > element of an array with __getitem__, you get the masked constant. We > could force the constant to inherit the fill_value of the array that > calls __getitem__, but it'd be propagated. > > > I know eventually you will be the one handling this :) it might be > > good to add this issue to the tracker. > > Go for it, but don't expect anything before the release of 1.4.0 (in > the next few months) > > > > > > This is the last resort. I will eventually try this if I don't any > > > other options left. > > > > I gonna have difficulties fixing something that I don't see broken... > > Now, there might be something wrong in my installation. I gonna try to > > install 1.3.0 somwehere. say, what Python are you using ? > > > > OK, I use meld to diff my copy of ma/core.py with the latest trunk > > version. 
There are lots of differences :) So there is a possibility > > that I might have built my local numpy before 09/08. I should renew > > my copy. Do you know the link of svn browser for the numpy? I don't > > know how you are making separate installations without overriding > > other package? I either use Sage (if I have extra time) or SPD. They > > are both shipped with numpy 1.3.0. > > Make yourself a favor and install virtualenv and virtualenvwrapper. > That way, several versions of the same package can coexist without > interference. Oh, and install pip till you're at it: > > http://pypi.python.org/pypi/virtualenv > http://www.doughellmann.com/projects/virtualenvwrapper/ > http://pypi.python.org/pypi/pip > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Oct 7 00:33:00 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 00:33:00 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> Message-ID: <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> On Oct 7, 2009, at 12:10 AM, G?khan Sever wrote: > Created the ticket http://projects.scipy.org/numpy/ticket/1253 Want even more confusion ? >>> x = ma.array([1,2,3],mask=[0,1,0], dtype=int) >>> x[0].dtype dtype('int64') >>> x[1].dtype dtype('float64') >>> x[2].dtype dtype('int64') Yet another illustration of the masked constant... The more I think about it, the more I think we should have a specific object ("MaskedConstant") that would do nothing but tell us that it is masked. > Could you tell me briefly what was the source of leak in arccos case? No idea, as I still haven't figured why you were having the problem in the first place > And how do you write a test code for these cases? assert(np.arccos(ma.masked), ma.masked) would be the simplest. 
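A minimal test sketch along those lines (the test that actually lands in numpy.ma.tests may well look different):

    import numpy as np
    import numpy.ma as ma

    def test_arccos_preserves_masked():
        # the masked constant should pass through untouched
        assert np.arccos(ma.masked) is ma.masked
        # and a masked element of an array should stay masked
        x = ma.array([0.5, 0.5], mask=[False, True])
        assert np.arccos(x).mask[1]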
From gokhansever at gmail.com Wed Oct 7 01:12:19 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 00:12:19 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> Message-ID: <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> On Tue, Oct 6, 2009 at 11:33 PM, Pierre GM wrote: > > On Oct 7, 2009, at 12:10 AM, G?khan Sever wrote: > > > Created the ticket http://projects.scipy.org/numpy/ticket/1253 > > Want even more confusion ? > >>> x = ma.array([1,2,3],mask=[0,1,0], dtype=int) > >>> x[0].dtype > dtype('int64') > >>> x[1].dtype > dtype('float64') > >>> x[2].dtype > dtype('int64') > > Yet another illustration of the masked constant... The more I think > about it, the more I think we should have a specific object > ("MaskedConstant") that would do nothing but tell us that it is masked. > Confusing indeed. One more from me: I[1]: a = np.arange(5) I[2]: mask = 999 I[6]: a[3] = 999 I[7]: am = ma.masked_equal(a, mask) I[8]: am O[8]: masked_array(data = [0 1 2 -- 4], mask = [False False False True False], fill_value = 999999) Where does this fill_value come from? To me it is little confusing having a "value" and "fill_value" in masked array method arguments. > > > > Could you tell me briefly what was the source of leak in arccos case? > > No idea, as I still haven't figured why you were having the problem in > the first place > Probably you can pin-point the error by testing a 1.3.0 version numpy. Not too many arc function with masked array users around I guess :) > > > And how do you write a test code for these cases? > > assert(np.arccos(ma.masked), ma.masked) would be the simplest. > Good to know this. The more I spend time with numpy the more I understand the importance of testing the code automatically. This said, I still find the test-driven-development approach somewhat bizarre. Start only by writing test code and keep implementing your code until all the tests are satisfied. Very interesting...These software engineers... > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Wed Oct 7 01:47:53 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 01:47:53 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> Message-ID: <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> On Oct 7, 2009, at 1:12 AM, G?khan Sever wrote: > One more from me: > I[1]: a = np.arange(5) > I[2]: mask = 999 > I[6]: a[3] = 999 > I[7]: am = ma.masked_equal(a, mask) > > I[8]: am > O[8]: > masked_array(data = [0 1 2 -- 4], > mask = [False False False True False], > fill_value = 999999) > > Where does this fill_value come from? To me it is little confusing > having a "value" and "fill_value" in masked array method arguments. Because the two are unrelated. The `fill_value` is the value used to fill the masked elements (that is, the missing entries). When you create a masked array, you get a `fill_value`, whose actual value is defined by default from the dtype of the array: for int, it's 999999, for float, 1e+20, you get the idea. The value you used for masking is different, it's just whatver value you consider invalid. Now, if I follow you, you would expect the value in `masked_equal(array, value)` to be the `fill_value` of the output. That's an idea, would you mind fiilling a ticket/enhancement and assign it to me? So that I don't forget. > Probably you can pin-point the error by testing a 1.3.0 version > numpy. Not too many arc function with masked array users around I > guess :) Will try, but "if it ain't broken, don't fix it"... > assert(np.arccos(ma.masked), ma.masked) would be the simplest. (and in fact, it'd be assert(np.arccos(ma.masked) is ma.masked) in this case). > Good to know this. The more I spend time with numpy the more I > understand the importance of testing the code automatically. This > said, I still find the test-driven-development approach somewhat > bizarre. Start only by writing test code and keep implementing your > code until all the tests are satisfied. Very interesting...These > software engineers... Bah, it's not a rule cast in iron... You can start writing your code but do write the tests at the same time. It's the best way to make sure you're not breaking something later on. 
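In the meantime, the workaround for the masked_equal case discussed above is simply to adopt the sentinel as the fill_value by hand (a sketch):

    import numpy as np
    import numpy.ma as ma

    a = np.arange(5)
    a[3] = 999
    am = ma.masked_equal(a, 999)
    am.fill_value = 999   # what masked_values already does for floats
    am.filled()           # -> array([  0,   1,   2, 999,   4])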
> From gokhansever at gmail.com Wed Oct 7 02:57:01 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 01:57:01 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> Message-ID: <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> On Wed, Oct 7, 2009 at 12:47 AM, Pierre GM wrote: > > On Oct 7, 2009, at 1:12 AM, G?khan Sever wrote: > > One more from me: > > I[1]: a = np.arange(5) > > I[2]: mask = 999 > > I[6]: a[3] = 999 > > I[7]: am = ma.masked_equal(a, mask) > > > > I[8]: am > > O[8]: > > masked_array(data = [0 1 2 -- 4], > > mask = [False False False True False], > > fill_value = 999999) > > > > Where does this fill_value come from? To me it is little confusing > > having a "value" and "fill_value" in masked array method arguments. > > Because the two are unrelated. The `fill_value` is the value used to > fill the masked elements (that is, the missing entries). > When you create a masked array, you get a `fill_value`, whose actual > value is defined by default from the dtype of the array: for int, it's > 999999, for float, 1e+20, you get the idea. > The value you used for masking is different, it's just whatver value > you consider invalid. Now, if I follow you, you would expect the value > in `masked_equal(array, value)` to be the `fill_value` of the output. > That's an idea, would you mind fiilling a ticket/enhancement and > assign it to me? So that I don't forget. > One more example. (I still think the behaviour of fill_value is inconsistent) See below: I[6]: f = np.arange(5, dtype=float) I[7]: mask = 9999.9999 I[8]: f[3] = mask I[9]: fm = ma.masked_equal(f, mask) I[10]: fm O[10]: masked_array(data = [0.0 1.0 2.0 -- 4.0], mask = [False False False True False], fill_value = 1e+20) I[22]: fm2 = ma.masked_values(f, mask) I[23]: fm2 O[23]: masked_array(data = [0.0 1.0 2.0 -- 4.0], mask = [False False False True False], fill_value = 9999.9999) ma.masked_equal(x, value, copy=True) ma.masked_values(x, value, rtol=1.0000000000000001e-05, atol=1e-08, copy=True, shrink=True) Similar function definitions, but different fill_values... Ok, it is almost 2 AM here my understanding might be crawling on the ground. Probably I will re-read your comments and file an issue on the trac. > > > > Probably you can pin-point the error by testing a 1.3.0 version > > numpy. Not too many arc function with masked array users around I > > guess :) > > Will try, but "if it ain't broken, don't fix it"... > Also if it is working don't update (This applies to Fedora updates :) especially if you have an Nvidia display card) > > > assert(np.arccos(ma.masked), ma.masked) would be the simplest. > > (and in fact, it'd be assert(np.arccos(ma.masked) is ma.masked) in > this case). > > > > Good to know this. The more I spend time with numpy the more I > > understand the importance of testing the code automatically. 
This > > said, I still find the test-driven-development approach somewhat > > bizarre. Start only by writing test code and keep implementing your > > code until all the tests are satisfied. Very interesting...These > > software engineers... > > Bah, it's not a rule cast in iron... You can start writing your code > but do write the tests at the same time. It's the best way to make > sure you're not breaking something later on. > > > > That's what I have been thinking, a more reasonable way. The other is way too a reverse thinking. Thanks for the long hours discussion. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.gronqvist at gmail.com Wed Oct 7 03:30:22 2009 From: johan.gronqvist at gmail.com (=?ISO-8859-1?Q?Johan_Gr=F6nqvist?=) Date: Wed, 07 Oct 2009 09:30:22 +0200 Subject: [Numpy-discussion] numpy.linalg.eig memory issue with libatlas? Message-ID: [I am resending this as the previous attempt seems to have failed] Hello List, I am looking at memory errors when using numpy.linalg.eig(). Short version: I had memory errors in numpy.linalg.eig(), and I have reasons (valgrind) to believe these are due to writing to incorrect memory addresses in the diagonalization routine zgeev, called by numpy.linalg.eig(). I realized that I had recently installed atlas, and now had several lapack-like libraries, so I uninstalled atlas, and the issues seemed to go away. My question is: Could it be that some lapack/blas/atlas package I use is incompatible with the numpy I use, and if so, is there a method to diagnose this in a more reliable way? Longer version: The system used is an updated debian testing (squeeze), on amd64. My program uses numpy, matplotlib, and a module compiled using cython. I started getting errors from my program this week. Pdb and print-statements tell me that the errors arise around the point where I call numpy.linalg.eig(), but not every time. The type of error varies. Most frequently a segmentation fault, but sometimes a matrix dimension mismatch, and sometimes a message related to the python GC. Valgrind tells me that something "impossible" happened, and that this is probably due to invalid writes earlier during the program execution. There seems to be two invalid writes after each program crash, and the log looks like this (it only contains two invalid writes): [...] 
==6508== Invalid write of size 8 ==6508== at 0x92D2597: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590) ==6508== by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612) ==6508== by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x4DC991: function_call (funcobject.c:517) ==6508== Address 0x67ab118 is not stack'd, malloc'd or (recently) free'd ==6508== ==6508== Invalid write of size 8 ==6508== at 0x92D25A8: zunmhr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x920A42B: zlaqr3_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x9205D11: zlaqr0_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x91B0C4D: zhseqr_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x911CA15: zgeev_ (in /usr/lib/atlas/liblapack.so.3gf.0) ==6508== by 0x881B81B: lapack_lite_zgeev (lapack_litemodule.c:590) ==6508== by 0x4911D4: PyEval_EvalFrameEx (ceval.c:3612) ==6508== by 0x491CE1: PyEval_EvalFrameEx (ceval.c:3698) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x490F17: PyEval_EvalFrameEx (ceval.c:3708) ==6508== by 0x4924CC: PyEval_EvalCodeEx (ceval.c:2875) ==6508== by 0x4DC991: function_call (funcobject.c:517) ==6508== Address 0x67ab110 is not stack'd, malloc'd or (recently) free'd [...] valgrind: m_mallocfree.c:248 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed. valgrind: Heap block lo/hi size mismatch: lo = 96, hi = 0. This is probably caused by your program erroneously writing past the end of a heap block and corrupting heap metadata. If you fix any invalid writes reported by Memcheck, this assertion failure will probably go away. Please try that before reporting this as a bug. [...] Today I looked in my package installation logs to see what had changed recently, and I noticed that I installed atlas (debian package libatlas3gf-common) recently. I uninstalled that package, and now the same program seems to have no memory errors. The packages I removed from the system today were libarpack2 libfltk1.1 libftgl2 libgraphicsmagick++3 libgraphicsmagick3 libibverbs1 libopenmpi1.3 libqrupdate1 octave3.2-common octave3.2-emacsen libatlas3gf-base octave3.2 My interpretation is that I had several packages available containing the diagonalization functionality, but that they differed subtly in their interfaces. My recent installation of atlas made numpy use (the incompatible) atlas instead of its previous choice, and removal of atlas restored the situation to the state of last week. Now for the questions: Is this a reasonable hypothesis? Is it known? Can it be investigated more precisely by comparing versions somehow? 
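One way to start the version comparison (a sketch; the lapack_lite.so path is system dependent):

    import numpy
    print numpy.__version__
    numpy.show_config()   # shows which blas/lapack numpy was built against

    # and from a shell, check which shared library actually gets loaded, e.g.:
    #   ldd /usr/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so | grep -i lapack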
Regards / johan From pgmdevlist at gmail.com Wed Oct 7 04:05:07 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 04:05:07 -0400 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> Message-ID: <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> On Oct 7, 2009, at 2:57 AM, G?khan Sever wrote: > One more example. (I still think the behaviour of fill_value is > inconsistent) Well, ma.masked_values use `value` to define fill_value, ma.masked_equal does not. So yes, there's an inconsistency here. Once again, please fill an enhancement request ticket. I should be able to deal with this one quite soon. From cournape at gmail.com Wed Oct 7 04:06:09 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 17:06:09 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> Message-ID: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris wrote: > > > Looks like a clue ;) Ok, I fixed it here: http://github.com/cournape/numpy/tree/fix_abi But that's an ugly hack. I think we should consider rewriting how we generate the API: instead of automatically growing the API array of fptr, we should explicitly mark which function name has which index, and hardcode it. It would help quite a bit to avoid changing the ABI unvoluntary. cheers, David From pgmdevlist at gmail.com Wed Oct 7 07:38:54 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 07:38:54 -0400 Subject: [Numpy-discussion] SVN + Python 2.5.4 (32b) + MacOS 10.6.1 Message-ID: All, I need to test the numpy SVN on a 10.6.1 mac, but using Python 2.5.4 (32b) instead of the 2.6.1 (64b). The sources get compiled OK (apparently, find the build here: http://pastebin.com/m147a2909 ) but numpy fails to import: File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ __init__.py", line 130, in import add_newdocs File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ add_newdocs.py", line 9, in from lib import add_newdoc File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ lib/__init__.py", line 4, in from type_check import * File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ lib/type_check.py", line 8, in import numpy.core.numeric as _nx File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ core/__init__.py", line 8, in import numerictypes as nt File ".../.virtualenvs/default25/lib/python2.5/site-packages/numpy/ core/numerictypes.py", line 737, in _typestr[key] = empty((1,),key).dtype.str[1:] ValueError: array is too big. Obviously, I'm messing between 32b and 64b, but can't figure where. 
Any help/hint will be deeply appreciated Cheers P. FYI: Python 2.5.4 (r254:67916, Jul 7 2009, 23:51:24) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin] CFLAGS="-arch i386 -arch x86_64" FFLAGS="-arch i386 -arch x86_64" From charlesr.harris at gmail.com Wed Oct 7 08:31:25 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 06:31:25 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: > > > > > > Looks like a clue ;) > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. It would help quite a bit to avoid changing the ABI > unvoluntary. > > I'm thinking the safest thing to do is to move the new type to the end of the list. I'm not sure what all the ramifications are for compatibility to having it stuck in the middle like that, does it change the type numbers for all the types after? I wonder what the type numbers are internally? No doubt putting it at the end makes the logic for casting more difficult, but that is something that needs fixing anyway. Question - if the new type is simply removed from the list does anything break? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Oct 7 08:37:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Oct 2009 21:37:16 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau wrote: >> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >> wrote: >> > >> > >> > Looks like a clue ;) >> >> Ok, I fixed it here: >> >> http://github.com/cournape/numpy/tree/fix_abi >> >> But that's an ugly hack. I think we should consider rewriting how we >> generate the API: instead of automatically growing the API array of >> fptr, we should explicitly mark which function name has which index, >> and hardcode it. It would help quite a bit to avoid changing the ABI >> unvoluntary. >> > > I'm thinking the safest thing to do is to move the new type to the end of > the list. That's what the above branch does. > I'm not sure what all the ramifications are for compatibility to > having it stuck in the middle like that, does it change the type numbers for > all the types after? Yes, there is no space left between the types declarations and the first functions. Currently, I just put things at the end manually, but that's really error prone. 
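For illustration only -- the names and slot numbers here are invented, not numpy's real API table -- the dict-with-hardcoded-indexes idea mentioned below could look roughly like this:

# Every exported C-API entry gets a hand-assigned slot that is frozen once
# published; new entries may only claim previously unused slots at the end.
multiarray_api_index = {
    'PyArray_Type':          0,
    'PyArrayDescr_Type':     1,
    'PyArray_SetNumericOps': 2,
    # new types/functions are appended with fresh indexes
}

def check_api_index(index):
    slots = sorted(index.values())
    if len(slots) != len(set(slots)):
        raise ValueError("two API entries share the same slot")
    if slots != list(range(len(slots))):
        raise ValueError("hole in the API table")

check_api_index(multiarray_api_index)

With the duplicate and hole checks, an accidental insertion in the middle of the table fails loudly at generation time instead of silently renumbering everything after it.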
I am a bit lazy to fix this for real (I was thinking about using a python dict with hardcoded indexes as an entry instead of the current .txt files, but this requires several changes in the code generator, which is already not the greatest code to begin with). David From charlesr.harris at gmail.com Wed Oct 7 08:59:01 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 06:59:01 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris > wrote: > > > > > > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau > wrote: > >> > >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > Looks like a clue ;) > >> > >> Ok, I fixed it here: > >> > >> http://github.com/cournape/numpy/tree/fix_abi > >> > >> But that's an ugly hack. I think we should consider rewriting how we > >> generate the API: instead of automatically growing the API array of > >> fptr, we should explicitly mark which function name has which index, > >> and hardcode it. It would help quite a bit to avoid changing the ABI > >> unvoluntary. > >> > > > > I'm thinking the safest thing to do is to move the new type to the end of > > the list. > > That's what the above branch does. > > > I'm not sure what all the ramifications are for compatibility to > > having it stuck in the middle like that, does it change the type numbers > for > > all the types after? > > Yes, there is no space left between the types declarations and the > first functions. Currently, I just put things at the end manually, but > that's really error prone. > > I am a bit lazy to fix this for real (I was thinking about using a > python dict with hardcoded indexes as an entry instead of the current > .txt files, but this requires several changes in the code generator, > which is already not the greatest code to begin with). > > What I'm concerned about is that, IIRC, types in the c-code can be referenced by their index in a list of types and that internal mechanism might be exposed to the outside somewhere. That is, what has happened to the order of the enumerated types? If that has changed, and if external code references a type by a hard-wired number, then there is a problem that goes beyond the code generator. The safe(r) thing to do in that case is add the new type to the end of the enumerated types and fix the promotion code so it doesn't try to rely on a linear order. I expect Robert can give the fastest answer to that question. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 7 09:07:18 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 07:07:18 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 6:59 AM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: > >> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau >> wrote: >> >> >> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > Looks like a clue ;) >> >> >> >> Ok, I fixed it here: >> >> >> >> http://github.com/cournape/numpy/tree/fix_abi >> >> >> >> But that's an ugly hack. I think we should consider rewriting how we >> >> generate the API: instead of automatically growing the API array of >> >> fptr, we should explicitly mark which function name has which index, >> >> and hardcode it. It would help quite a bit to avoid changing the ABI >> >> unvoluntary. >> >> >> > >> > I'm thinking the safest thing to do is to move the new type to the end >> of >> > the list. >> >> That's what the above branch does. >> >> > I'm not sure what all the ramifications are for compatibility to >> > having it stuck in the middle like that, does it change the type numbers >> for >> > all the types after? >> >> Yes, there is no space left between the types declarations and the >> first functions. Currently, I just put things at the end manually, but >> that's really error prone. >> >> I am a bit lazy to fix this for real (I was thinking about using a >> python dict with hardcoded indexes as an entry instead of the current >> .txt files, but this requires several changes in the code generator, >> which is already not the greatest code to begin with). >> >> > What I'm concerned about is that, IIRC, types in the c-code can be > referenced by their index in a list of types and that internal mechanism > might be exposed to the outside somewhere. That is, what has happened to the > order of the enumerated types? If that has changed, and if external code > references a type by a hard-wired number, then there is a problem that goes > beyond the code generator. The safe(r) thing to do in that case is add the > new type to the end of the enumerated types and fix the promotion code so it > doesn't try to rely on a linear order. > > Here, for instance: "The various character codes indicating certain types are also part of an enumerated list. References to type characters (should they be needed at all) should always use these enumerations. The form of them is NPY LTR where " So those macros will generate a hard-coded number at compile time, and number that might have changed with the addition of the new types. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Wed Oct 7 10:55:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 09:55:33 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy Message-ID: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> Hello, I checked-out the latest trunk and make a new installation of NumPy. 
My question: Is it a known behaviour that this action will result with re-building other packages that are dependent on NumPy. In my case, I had to re-built matplotlib, and now scipy. Here is the error message that I am getting while I try to import a scipy module: I[1]: run lab4.py --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) RuntimeError: FATAL: module compiled aslittle endian, but detected different endianness at runtime --------------------------------------------------------------------------- ImportError Traceback (most recent call last) /home/gsever/AtSc450/labs/04_thermals/lab4.py in () 2 3 import numpy as np ----> 4 from scipy import stats 5 6 /home/gsever/Desktop/python-repo/scipy/scipy/stats/__init__.py in () 5 from info import __doc__ 6 ----> 7 from stats import * 8 from distributions import * 9 from rv import * /home/gsever/Desktop/python-repo/scipy/scipy/stats/stats.py in () 196 # Scipy imports. 197 from numpy import array, asarray, dot, ma, zeros, sum --> 198 import scipy.special as special 199 import scipy.linalg as linalg 200 import numpy as np /home/gsever/Desktop/python-repo/scipy/scipy/special/__init__.py in () 6 #from special_version import special_version as __version__ 7 ----> 8 from basic import * 9 import specfun 10 import orthogonal /home/gsever/Desktop/python-repo/scipy/scipy/special/basic.py in () 6 7 from numpy import * ----> 8 from _cephes import * 9 import types 10 import specfun ImportError: numpy.core.multiarray failed to import WARNING: Failure executing file: -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 7 11:10:28 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 10:10:28 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> Message-ID: <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> On Wed, Oct 7, 2009 at 09:55, G?khan Sever wrote: > Hello, > > I checked-out the latest trunk and make a new installation of NumPy. My > question: Is it a known behaviour that this action will result with > re-building other packages that are dependent on NumPy. In my case, I had to > re-built matplotlib, and now scipy. Known issue. See the thread "Numpy SVN broken". -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mdroe at stsci.edu Wed Oct 7 11:28:59 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 07 Oct 2009 11:28:59 -0400 Subject: [Numpy-discussion] byteswapping a complex scalar Message-ID: <4ACCB3BB.8010805@stsci.edu> I'm noticing an inconsistency as to how complex numbers are byteswapped as arrays vs. scalars, and wondering if I'm doing something wrong. 
>>> x = np.array([-1j], '>> x.tostring().encode('hex') '00000000000080bf' # This is a little-endian representation, in the order (real, imag) # When I swap the whole array, it swaps each of the (real, imag) parts separately >>> y = x.byteswap() >>> y.tostring().encode('hex') '00000000bf800000' # and this round-trips fine >>> z = np.fromstring(y.tostring(), dtype='>c8') >>> assert z[0] == -1j >>> # When I swap the scalar, it seems to swap the entire 8 bytes >>> y = x[0].byteswap() >>> y.tostring().encode('hex') 'bf80000000000000' # ...and this doesn't round-trip >>> z = np.fromstring(y.tostring(), dtype='>c8') >>> assert z[0] == -1j Traceback (most recent call last): File "", line 1, in AssertionError >>> Any thoughts? Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From gokhansever at gmail.com Wed Oct 7 13:14:30 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:14:30 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <0D005495-02DB-407B-ACC5-1C993C80E15D@gmail.com> <49d6b3500910062110s2a30eb98o35e9e141eddfd0c6@mail.gmail.com> <9F97FF67-08FC-4D61-8A9D-358C4722C41C@gmail.com> <49d6b3500910062212t594aded3u935396f4898c21ab@mail.gmail.com> <61E03928-9A65-449B-8785-E22E52F1C034@gmail.com> <49d6b3500910062357u7c831b3dvb6b4c6c2fd3cb3de@mail.gmail.com> <79D48A64-EAB1-429B-BB73-B0759D1BA060@gmail.com> Message-ID: <49d6b3500910071014n4ca7eed8j5865b46864befa6c@mail.gmail.com> Added as comment in the same entry: http://projects.scipy.org/numpy/ticket/1253#comment:1 Guessing that this one should be easy to fix :) On Wed, Oct 7, 2009 at 3:05 AM, Pierre GM wrote: > > On Oct 7, 2009, at 2:57 AM, G?khan Sever wrote: > > One more example. (I still think the behaviour of fill_value is > > inconsistent) > > Well, ma.masked_values use `value` to define fill_value, > ma.masked_equal does not. So yes, there's an inconsistency here. Once > again, please fill an enhancement request ticket. I should be able to > deal with this one quite soon. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Oct 7 13:20:32 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 10:20:32 -0700 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> Message-ID: <4ACCCDE0.4090002@noaa.gov> josef.pktd at gmail.com wrote: > I wanted to avoid the python loop and thought creating the view will be faster > with large arrays. But for this I need to know the memory length of a > row of arbitrary types for the conversion to strings, ndarray.itemsize might do it. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Oct 7 13:21:40 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 10:21:40 -0700 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> Message-ID: <4ACCCE24.5000001@noaa.gov> G?khan Sever wrote: > > Sorry too much time spent in ipython -pylab :) > Good reflex. Saves you from making extra explanations. But it works with > just typing array why should I type np.array (Ohh my namespacess :) Because it shouldn't work that way! I use -pylab, but I've added: o.pylab_import_all = 0 to my ipy_user_conf.py file, so I don't get the namespace pollution. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Wed Oct 7 13:35:41 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:35:41 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <4ACCCE24.5000001@noaa.gov> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <4ACCCE24.5000001@noaa.gov> Message-ID: <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> On Wed, Oct 7, 2009 at 12:21 PM, Christopher Barker wrote: > G?khan Sever wrote: > > > Sorry too much time spent in ipython -pylab :) > > > Good reflex. Saves you from making extra explanations. But it works with > > just typing array why should I type np.array (Ohh my namespacess :) > > Because it shouldn't work that way! I use -pylab, but I've added: > > o.pylab_import_all = 0 > > to my ipy_user_conf.py file, so I don't get the namespace pollution. > > -Chris > > Yes, I am aware of this fact. Still either from laziness or practicality I prefer typing plot to plt.plot and arange to np.arange while I have write them so many times in one day. Do you know what shortcut name is used for scipy package itself? > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Wed Oct 7 13:38:44 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 12:38:44 -0500 Subject: [Numpy-discussion] Questions about masked arrays In-Reply-To: <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> References: <49d6b3500910061342x612ca8o27e933398b8340df@mail.gmail.com> <955ABA44-457E-4416-B6D7-E209E640461F@gmail.com> <49d6b3500910061557r6ec7bb1dhd0c831183353a544@mail.gmail.com> <1C814530-A9A8-486E-92A2-CF258BB7F723@gmail.com> <49d6b3500910061854j29aa8cfcyfd1048c0cb4937be@mail.gmail.com> <076362CC-0B63-41CA-9C7D-05167BA52011@gmail.com> <49d6b3500910061958x378a842exd16f75ccb139448a@mail.gmail.com> <4ACCCE24.5000001@noaa.gov> <49d6b3500910071035l1ee82ea3n7a63f7ad7ad61832@mail.gmail.com> Message-ID: <3d375d730910071038x6485f3c5k114b0f30f2a7954f@mail.gmail.com> On Wed, Oct 7, 2009 at 12:35, G?khan Sever wrote: > Do you know what shortcut name is used for scipy package itself? I do not recommend using "import scipy" or "import scipy as ...". Import the subpackages directly (e.g. "from scipy import linalg"). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Wed Oct 7 13:39:57 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 7 Oct 2009 12:39:57 -0500 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> Message-ID: <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> I have seen that message, but I wasn't sure these errors were directly connected since he mentions of getting segfaults whereas in my case only gives import errors. Building a new copy of scipy fixed this error. On Wed, Oct 7, 2009 at 10:10 AM, Robert Kern wrote: > On Wed, Oct 7, 2009 at 09:55, G?khan Sever wrote: > > Hello, > > > > I checked-out the latest trunk and make a new installation of NumPy. My > > question: Is it a known behaviour that this action will result with > > re-building other packages that are dependent on NumPy. In my case, I had > to > > re-built matplotlib, and now scipy. > > Known issue. See the thread "Numpy SVN broken". > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 7 13:48:45 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 11:48:45 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <5b8d13220910070537h3900f644tef5055c57e7e708e@mail.gmail.com> Message-ID: On Wed, Oct 7, 2009 at 7:07 AM, Charles R Harris wrote: > > > On Wed, Oct 7, 2009 at 6:59 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Oct 7, 2009 at 6:37 AM, David Cournapeau wrote: >> >>> On Wed, Oct 7, 2009 at 9:31 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Wed, Oct 7, 2009 at 2:06 AM, David Cournapeau >>> wrote: >>> >> >>> >> On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris >>> >> wrote: >>> >> > >>> >> > >>> >> > Looks like a clue ;) >>> >> >>> >> Ok, I fixed it here: >>> >> >>> >> http://github.com/cournape/numpy/tree/fix_abi >>> >> >>> >> But that's an ugly hack. I think we should consider rewriting how we >>> >> generate the API: instead of automatically growing the API array of >>> >> fptr, we should explicitly mark which function name has which index, >>> >> and hardcode it. It would help quite a bit to avoid changing the ABI >>> >> unvoluntary. >>> >> >>> > >>> > I'm thinking the safest thing to do is to move the new type to the end >>> of >>> > the list. >>> >>> That's what the above branch does. >>> >>> > I'm not sure what all the ramifications are for compatibility to >>> > having it stuck in the middle like that, does it change the type >>> numbers for >>> > all the types after? >>> >>> Yes, there is no space left between the types declarations and the >>> first functions. Currently, I just put things at the end manually, but >>> that's really error prone. >>> >>> I am a bit lazy to fix this for real (I was thinking about using a >>> python dict with hardcoded indexes as an entry instead of the current >>> .txt files, but this requires several changes in the code generator, >>> which is already not the greatest code to begin with). >>> >>> >> What I'm concerned about is that, IIRC, types in the c-code can be >> referenced by their index in a list of types and that internal mechanism >> might be exposed to the outside somewhere. That is, what has happened to the >> order of the enumerated types? If that has changed, and if external code >> references a type by a hard-wired number, then there is a problem that goes >> beyond the code generator. The safe(r) thing to do in that case is add the >> new type to the end of the enumerated types and fix the promotion code so it >> doesn't try to rely on a linear order. >> >> > Here, for instance: > > "The various character codes indicating certain types are also part of an > enumerated > list. References to type characters (should they be needed at all) should > always use > these enumerations. The form of them is NPY LTR where " > > So those macros will generate a hard-coded number at compile time, and > number that might have changed with the addition of the new types. > > Nevermind, it looks like the new type number is at the end as it should be. 
In [22]: typecodes Out[22]: {'All': '?bhilqpBHILQPfdgFDGSUVOMm', 'AllFloat': 'fdgFDG', 'AllInteger': 'bBhHiIlLqQpP', 'Character': 'c', 'Complex': 'FDG', 'Datetime': 'Mm', 'Float': 'fdg', 'Integer': 'bhilqp', 'UnsignedInteger': 'BHILQP'} Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Oct 7 15:14:58 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 07 Oct 2009 12:14:58 -0700 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> Message-ID: <4ACCE8B2.1050306@noaa.gov> Pierre GM wrote: > On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: >> option to merge delimiters - actually in SAS it is default Wow! that sure strikes me as a bad choice. > Ahah! I get it. Well, I remember that we discussed something like that a > few months ago when I started working on np.genfromtxt, and the > default of *not* merging whitespaces was requested. I gonna check > whether we can't put this option somewhere now... I'd think you might want to have two options: either "whitespace" which would be any type or amount of whitespace, or a specific delimeter: say "\t" or " " or " " (two spaces), etc. In that case, it would mean "one and only one of these". Of course, this would fail in Bruce's example: >>>> A B C D >>>> 1 2 3 4 >>>> 1 4 5 as there is a space for the delimeter, and one for the data! This looks like fixed-format to me. if it were single-space delimited, it would look more like: when the delimiter is whitespace. A B C D E 1 2 3 4 5 1 4 5 which is the same as: A, B, C, D, E 1, 2, 3, 4, 5 1, , , 4, 5 If something like SAS actually does merge decimeters, which I interpret to mean that if there are a few empty fields and you call for tab-delimited , you only get one tab, then information as simply been lost -- there is no way to recover it! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From bsouthey at gmail.com Wed Oct 7 15:54:51 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 07 Oct 2009 14:54:51 -0500 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACCE8B2.1050306@noaa.gov> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> <4ACCE8B2.1050306@noaa.gov> Message-ID: <4ACCF20B.5040901@gmail.com> On 10/07/2009 02:14 PM, Christopher Barker wrote: > Pierre GM wrote: > >> On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote: >> >>> option to merge delimiters - actually in SAS it is default >>> > Wow! that sure strikes me as a bad choice. > > >> Ahah! I get it. Well, I remember that we discussed something like that a >> few months ago when I started working on np.genfromtxt, and the >> default of *not* merging whitespaces was requested. I gonna check >> whether we can't put this option somewhere now... >> > I'd think you might want to have two options: either "whitespace" which > would be any type or amount of whitespace, or a specific delimeter: say > "\t" or " " or " " (two spaces), etc. In that case, it would mean "one > and only one of these". > > Of course, this would fail in Bruce's example: > > >>>> A B C D > >>>> 1 2 3 4 > >>>> 1 4 5 > > as there is a space for the delimeter, and one for the data! 
This looks > like fixed-format to me. if it were single-space delimited, it would > look more like: > > when the delimiter is whitespace. > A B C D E > 1 2 3 4 5 > 1 4 5 > > which is the same as: > > A, B, C, D, E > 1, 2, 3, 4, 5 > 1, , , 4, 5 > > > If something like SAS actually does merge decimeters, which I interpret > to mean that if there are a few empty fields and you call for > tab-delimited , you only get one tab, then information as simply been > lost -- there is no way to recover it! > > -Chris > > To use fixed length fields you really need nicely formatted data and I usually do not have that. As a default it does not always work for non-whitespace delimiters such as: A,B,C ,,1 1,2,3 There is an option to override that behavior. But it is very useful when you have extra whitespace especially reading in text strings that have different lengths or different levels of whitespace padding. The following is correct in that Python does merge whitespace delimiters by default. This is also what SAS does by default for any delimiter. But it is incorrect if each whitespace character is a delimiter: s = StringIO(''' 1 10 100\r\n 10 1 1000''') np.genfromtxt(s) array([[ 1., 10., 100.], [ 10., 1., 1000.]]) np.genfromtxt(s, delimiter=' ') Traceback (most recent call last): File "", line 1, in File "/usr/lib64/python2.6/site-packages/numpy/lib/io.py", line 1048, in genfromtxt raise IOError('End-of-file reached before encountering data.') IOError: End-of-file reached before encountering data. Anyhow, I do like what genfromtxt is doing so merging multiple delimiters of the same type is not really needed. Bruce From pgmdevlist at gmail.com Wed Oct 7 16:16:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 7 Oct 2009 16:16:23 -0400 Subject: [Numpy-discussion] genfromtxt - the return In-Reply-To: <4ACCF20B.5040901@gmail.com> References: <4ACB8FB3.5040706@gmail.com> <2A449CDC-6F42-4F8D-B4CD-8F9665B3EE5D@gmail.com> <4ACBAC0E.3070708@noaa.gov> <4ACCE8B2.1050306@noaa.gov> <4ACCF20B.5040901@gmail.com> Message-ID: On Oct 7, 2009, at 3:54 PM, Bruce Southey wrote: > > Anyhow, I do like what genfromtxt is doing so merging multiple > delimiters of the same type is not really needed. Thinking about it, merging multiple delimiters of the same type can be tricky: how do you distinguish between, say, "AAA\t\tCCC" where you expect 2 fields and "AAA\t\tCCC" where you expect 3 fields but the second one is missing ? I think 'genfromtxt' works consistently right now (but of course, as soon as I say that we'll find some counter-examples), so let's not break it. Yet. From stefan at sun.ac.za Wed Oct 7 18:35:07 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 8 Oct 2009 00:35:07 +0200 Subject: [Numpy-discussion] Building a new copy of NumPy In-Reply-To: <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> References: <49d6b3500910070755j1047a9s21218e63485608ab@mail.gmail.com> <3d375d730910070810x460f9f69j6711cca72db8cb09@mail.gmail.com> <49d6b3500910071039j76649696p51ed7293ef9ffed2@mail.gmail.com> Message-ID: <9457e7c80910071535k697315begd3d839d86ee2dcfc@mail.gmail.com> You can pull the patches from David's fix_abi branch: http://github.com/cournape/numpy/tree/fix_abi This branch has been hacked to be ABI compatible with previous versions. Cheers St?fan 2009/10/7 G?khan Sever : > I have seen that message, but I wasn't sure these errors were directly > connected since he mentions of getting segfaults whereas in my case only > gives import errors. 
Building a new copy of scipy fixed this error. From oliphant at enthought.com Wed Oct 7 22:39:33 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 7 Oct 2009 21:39:33 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> Message-ID: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> On Oct 7, 2009, at 3:06 AM, David Cournapeau wrote: > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: >> >> >> Looks like a clue ;) > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. It would help quite a bit to avoid changing the ABI > unvoluntary. I apologize for the mis communication that has occurred here. I did not understand that there was a desire to keep ABI compatibility with NumPy 1.3 when NumPy 1.4 was released. The datetime merge was made under that presumption. I had assumed that people would be fine with recompilation of extension modules that depend on the NumPy C-API. There are several things that needed to be done to merge in new fundamental data-types. Why don't we call the next release NumPy 2.0 if that helps things? Personally, I'd prefer that over hacks to keep ABI compatibility. It feels like we are working very hard to track ABI issues that can also be handled with dependency checking and good package management. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Oct 7 22:51:11 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 11:51:11 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant wrote: > > I apologize for the mis communication that has occurred here. No problem >? I did not > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > when NumPy 1.4 was released. ? ?The datetime merge was made under that > presumption. > I had assumed that people would be fine with recompilation of extension > modules that depend on the NumPy C-API. ? ?There are several things that > needed to be done to merge in new fundamental data-types. > Why don't we call the next release NumPy 2.0 if that helps things? > ?Personally, I'd prefer that over hacks to keep ABI compatibility. Keeping ABI compatibility by itself is not an hack - the current workaround is an hack, but that's only because the current way of doing things in code generator is a bit ugly, and I did not want to spend too much time on it. It is purely an implementation issue, the fundamental idea is straightforward. 
If you want a cleaner solution, I can work on it. I think the hour or so that it would take is worth it compared to breaking many people's code. > ? It > feels like we are working very hard to track ABI issues that can also be > handled with dependency checking and good package management. I think ABI issues are mostly orthogonal to versioning - generally, versions are related to API changes (API changes is what should drive ABI changes, at least for projects like numpy). I would prefer passing to "numpy 2.0" when we really need to break ABI and API - at that point, I think we should also think hard about changing our structures and all to make them more robust to those changes (using pimp-like strategies in particular). David From cournape at gmail.com Wed Oct 7 22:55:28 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 11:55:28 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> On Thu, Oct 8, 2009 at 11:51 AM, David Cournapeau wrote: > I would prefer passing to "numpy 2.0" when we really need to break ABI > and API - at that point, I think we should also think hard about > changing our structures and all to make them more robust to those > changes (using pimp-like strategies in particular). Sorry, I mean pimple, not pimp (makes you wonder what goes in my head): David From robert.kern at gmail.com Wed Oct 7 22:57:44 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 21:57:44 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> Message-ID: <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> On Wed, Oct 7, 2009 at 21:55, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 11:51 AM, David Cournapeau wrote: > >> I would prefer passing to "numpy 2.0" when we really need to break ABI >> and API - at that point, I think we should also think hard about >> changing our structures and all to make them more robust to those >> changes (using pimp-like strategies in particular). > > Sorry, I mean pimple, not pimp (makes you wonder what goes in my head): Indeed! (And it's "pimpl".) :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From aisaac at american.edu Wed Oct 7 23:04:28 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 07 Oct 2009 23:04:28 -0400 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <4ACD56BC.9080302@american.edu> On 10/7/2009 10:51 PM, David Cournapeau wrote: > pimp-like strategies Which means ... ? Alan From robert.kern at gmail.com Wed Oct 7 23:08:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 7 Oct 2009 22:08:33 -0500 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <4ACD56BC.9080302@american.edu> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <4ACD56BC.9080302@american.edu> Message-ID: <3d375d730910072008r7a22f661hbee943a57c085839@mail.gmail.com> On Wed, Oct 7, 2009 at 22:04, Alan G Isaac wrote: > On 10/7/2009 10:51 PM, David Cournapeau wrote: >> pimp-like strategies > > > Which means ... ? He meant "pimpl-like". http://en.wikipedia.org/wiki/Opaque_pointer -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From aisaac at american.edu Wed Oct 7 23:09:08 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 07 Oct 2009 23:09:08 -0400 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <5b8d13220910071955r3ae089c6x2b80bdfab7de6fca@mail.gmail.com> <3d375d730910071957u5ea0d6f9o40fde161cde3590@mail.gmail.com> Message-ID: <4ACD57D4.7020607@american.edu> On 10/7/2009 10:57 PM, Robert Kern wrote: > it's "pimpl" OK: http://en.wikipedia.org/wiki/Opaque_pointer Thanks, Alan Isaac From david at ar.media.kyoto-u.ac.jp Wed Oct 7 23:08:42 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 08 Oct 2009 12:08:42 +0900 Subject: [Numpy-discussion] robustness strategies In-Reply-To: <4ACD56BC.9080302@american.edu> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <4ACD56BC.9080302@american.edu> Message-ID: <4ACD57BA.4010607@ar.media.kyoto-u.ac.jp> Alan G Isaac wrote: > On 10/7/2009 10:51 PM, David Cournapeau wrote: > >> pimp-like strategies >> > > > Which means ... ? > The idea is to put one pointer in you struct instead of all members - it is a form of encapsulation, and it is enforced at compile time. I think part of the problem with changing API/ABI in numpy is that the headers show way too much information. I would really like to improve this, but this would clearly break the ABI (and API - a lot of macros would have to go). There is a performance cost of one more indirection (if you have a pointer to a struct, you need to dereference both the struct and the D pointer inside), but for most purpose, that's likely to be negligeable, except for a few special cases (like iterators). cheers, David From charlesr.harris at gmail.com Thu Oct 8 00:01:05 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Oct 2009 22:01:05 -0600 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: On Wed, Oct 7, 2009 at 8:39 PM, Travis Oliphant wrote: > > On Oct 7, 2009, at 3:06 AM, David Cournapeau wrote: > > On Wed, Oct 7, 2009 at 2:31 AM, Charles R Harris > wrote: > > > > Looks like a clue ;) > > > Ok, I fixed it here: > > http://github.com/cournape/numpy/tree/fix_abi > > But that's an ugly hack. I think we should consider rewriting how we > generate the API: instead of automatically growing the API array of > fptr, we should explicitly mark which function name has which index, > and hardcode it. 
It would help quite a bit to avoid changing the ABI > unvoluntary. > > > I apologize for the mis communication that has occurred here. I did not > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > when NumPy 1.4 was released. The datetime merge was made under that > presumption. > > I had assumed that people would be fine with recompilation of extension > modules that depend on the NumPy C-API. There are several things that > needed to be done to merge in new fundamental data-types. > > Why don't we call the next release NumPy 2.0 if that helps things? > Personally, I'd prefer that over hacks to keep ABI compatibility. It > feels like we are working very hard to track ABI issues that can also be > handled with dependency checking and good package management. > > I was that the code generator shifted the API order because it was inserting the new types after the old types but before the other API functions. It's a code generator problem and doesn't call for a jump in major version. We hope ;) I think David's hack, which looks to have been committed by Stefan, should fix things up. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Oct 8 02:37:01 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 8 Oct 2009 08:37:01 +0200 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> Message-ID: <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> 2009/10/8 Charles R Harris : > code generator problem and doesn't call for a jump in major version. We hope > ;) I think David's hack, which looks to have been committed by Stefan, > should fix things up. I accidentally committed some of David's patches, but I reverted them back out. I think David's idea of generating an API from dictionary is much cleaner. We can work on implementing that today. Cheers St?fan From david at ar.media.kyoto-u.ac.jp Thu Oct 8 02:48:47 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 08 Oct 2009 15:48:47 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <9457e7c80910072337n37399ad2l2f16b47d18b3ca98@mail.gmail.com> Message-ID: <4ACD8B4F.8000303@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > We can work on implementing that today. > I am working on it ATM - it is taking me longer than expected, though. 
David From oliphant at enthought.com Thu Oct 8 07:55:10 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 06:55:10 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> Message-ID: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> On Oct 7, 2009, at 9:51 PM, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant > wrote: >> >> I apologize for the mis communication that has occurred here. > > No problem > >> I did not >> understand that there was a desire to keep ABI compatibility with >> NumPy 1.3 >> when NumPy 1.4 was released. The datetime merge was made under >> that >> presumption. >> I had assumed that people would be fine with recompilation of >> extension >> modules that depend on the NumPy C-API. There are several things >> that >> needed to be done to merge in new fundamental data-types. >> Why don't we call the next release NumPy 2.0 if that helps things? >> Personally, I'd prefer that over hacks to keep ABI compatibility. > > Keeping ABI compatibility by itself is not an hack - the current > workaround is an hack, but that's only because the current way of > doing things in code generator is a bit ugly, and I did not want to > spend too much time on it. It is purely an implementation issue, the > fundamental idea is straightforward. > > If you want a cleaner solution, I can work on it. I think the hour or > so that it would take is worth it compared to breaking many people's > code. If that's all it would take, then definitely go for it. I'm not sure "breaking people's code" is the right image, though. It's more like "forcing people to upgrade" to take advantage of new features. Improvements to the encapsulation of the numpy C-API are definitely welcome. They have come a long way from their beginnings in Numeric already due to the efforts of you and David Cooke (and I'm sure others I'm not as aware of). The problem I have with spending time on it though is that there is still more implementation work to finish on the datetime functionality to complete the NEP implementation. Naturally, I'd like to see those improvements made first. But, time-spent is usually a function of how much time it takes to "get-in" to the code, so I won't try to distract you if you have a clear idea about how to proceed. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Thu Oct 8 08:09:44 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:09:44 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACCB3BB.8010805@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <26A93513-988C-4EDB-A717-326BC01DE2EB@enthought.com> On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > I'm noticing an inconsistency as to how complex numbers are > byteswapped > as arrays vs. scalars, and wondering if I'm doing something wrong. 
> >>>> x = np.array([-1j], '>>> x.tostring().encode('hex') > '00000000000080bf' > # This is a little-endian representation, in the order (real, imag) > > # When I swap the whole array, it swaps each of the (real, imag) parts > separately >>>> y = x.byteswap() >>>> y.tostring().encode('hex') > '00000000bf800000' > # and this round-trips fine >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j >>>> > > # When I swap the scalar, it seems to swap the entire 8 bytes >>>> y = x[0].byteswap() >>>> y.tostring().encode('hex') > 'bf80000000000000' > # ...and this doesn't round-trip >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j > Traceback (most recent call last): > File "", line 1, in > AssertionError >>>> > > Any thoughts? I think this is a bug. You should file a ticket and mark it critical. As I look at the scalar implementation (in gentype_byteswap in scalartypes.c.src), it looks like it's basing it just on the size (Hmm.... I don't know why it's not using the copyswap in the descr field....). This works for many types, but not complex numbers which should have real and imaginary parts handled separately. There are two ways to fix this that I can see: 1) fix the gentype implementation to use the copyswap function pointer from the datatype object 2) over-ride the byteswap in the complex scalar Python type (there is a base-class complex scalar type where it could be placed) to do the right thing. I would probably do #1 if I get a chance to work on it (because strings shouldn't be byteswapped either and they currently are, I see...) x = np.array(['abcd']) Compare: x.byteswap()[0] x[0].byteswap() The work around is to byteswap before extraction: x.byteswap()[0] Thanks for the bug-report. -Travis From oliphant at enthought.com Thu Oct 8 08:19:14 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:19:14 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACCB3BB.8010805@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> Message-ID: On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > I'm noticing an inconsistency as to how complex numbers are > byteswapped > as arrays vs. scalars, and wondering if I'm doing something wrong. > >>>> x = np.array([-1j], '>>> x.tostring().encode('hex') > '00000000000080bf' > # This is a little-endian representation, in the order (real, imag) > > # When I swap the whole array, it swaps each of the (real, imag) parts > separately >>>> y = x.byteswap() >>>> y.tostring().encode('hex') > '00000000bf800000' > # and this round-trips fine >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j >>>> > > # When I swap the scalar, it seems to swap the entire 8 bytes >>>> y = x[0].byteswap() >>>> y.tostring().encode('hex') > 'bf80000000000000' > # ...and this doesn't round-trip >>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>> assert z[0] == -1j > Traceback (most recent call last): > File "", line 1, in > AssertionError >>>> > > Any thoughts? I just checked a fix for this into SVN (tests still need to be added though...) I can't currently build SVN on my Mac for some reason (I don't know if it has to do with recent changes or not, but I don't have time to track it down right now....the error I'm getting is something about Datetime array scalar types not being defined which seems related to the work Dave and Stefan have been discussing). It's a small change, though, and should work. 
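The still-missing test could be something along these lines (just a sketch of the round-trip check from the original report, not the test that will actually be committed):

import numpy as np

def test_complex_scalar_byteswap():
    # scalar and array byteswapping of a complex value should give the same
    # bytes, and the swapped scalar should round-trip through fromstring
    x = np.array([-1j], '<c8')
    assert x.byteswap()[0].tostring() == x[0].byteswap().tostring()
    z = np.fromstring(x[0].byteswap().tostring(), dtype='>c8')
    assert z[0] == -1j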
-Travis From oliphant at enthought.com Thu Oct 8 08:25:26 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 07:25:26 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> On Oct 8, 2009, at 7:19 AM, Travis Oliphant wrote: > > I just checked a fix for this into SVN (tests still need to be added > though...) > > I can't currently build SVN on my Mac for some reason (I don't know if > it has to do with recent changes or not, but I don't have time to > track it down right now....the error I'm getting is something about > Datetime array scalar types not being defined which seems related to > the work Dave and Stefan have been discussing). I can build from SVN. The problem is I had to check-out again from SVN (and get rid of the old code-generated files --- sure would be nice if there were the equivalent of "make clean" -Travis From cournape at gmail.com Thu Oct 8 09:47:21 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Oct 2009 22:47:21 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> Message-ID: <5b8d13220910080647r731dba31t62e3ff2f5212af50@mail.gmail.com> On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant wrote: > > On Oct 7, 2009, at 9:51 PM, David Cournapeau wrote: > > On Thu, Oct 8, 2009 at 11:39 AM, Travis Oliphant > wrote: > > I apologize for the mis communication that has occurred here. > > No problem > > ? I did not > > understand that there was a desire to keep ABI compatibility with NumPy 1.3 > > when NumPy 1.4 was released. ? ?The datetime merge was made under that > > presumption. > > I had assumed that people would be fine with recompilation of extension > > modules that depend on the NumPy C-API. ? ?There are several things that > > needed to be done to merge in new fundamental data-types. > > Why don't we call the next release NumPy 2.0 if that helps things? > > ?Personally, I'd prefer that over hacks to keep ABI compatibility. > > Keeping ABI compatibility by itself is not an hack - the current > workaround is an hack, but that's only because the current way of > doing things in code generator is a bit ugly, and I did not want to > spend too much time on it. It is purely an implementation issue, the > fundamental idea is straightforward. > > If you want a cleaner solution, I can work on it. I think the hour or > so that it would take is worth it compared to breaking many people's > code. > > If that's all it would take, then definitely go for it. ? ?I'm not sure > "breaking people's code" is the right image, though. ? It's more like > "forcing people to upgrade" to take advantage of new features. We got several people complaining about segfaults and the like - granted, those could have been avoided by updating the ABI accordingly. > The problem I have with spending time on it though is that there is still > more implementation work to finish on the datetime functionality to complete > the NEP implementation. ? ? 
?Naturally, I'd like to see those improvements > made first. ?But, time-spent is usually a function of how much time it takes > to "get-in" to the code, so I won't try to distract you if you have a clear > idea about how to proceed. I am applying my changes as we speak - it took me much more time than I wished because I tried hard to make sure the ABI was not changed. But at least, the current scheme should be much more robust: the ordering is fixed at one single place, and there are a few checks which ensure we don't screw things up (by putting 'holes' in the api array, or by using twice the same index). cheers, David From cournape at gmail.com Thu Oct 8 11:01:37 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 9 Oct 2009 00:01:37 +0900 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> Message-ID: <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant wrote: > > The problem I have with spending time on it though is that there is still > more implementation work to finish on the datetime functionality to complete > the NEP implementation. ? ? ?Naturally, I'd like to see those improvements > made first. ?But, time-spent is usually a function of how much time it takes > to "get-in" to the code, so I won't try to distract you if you have a clear > idea about how to proceed. Would it be possible to include next changes in small self-contained commits ? It really makes the review easier to follow for me, and tracking regressions is easier as well. Git-svn makes this easy. David From mdroe at stsci.edu Thu Oct 8 13:08:42 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 08 Oct 2009 13:08:42 -0400 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: References: <4ACCB3BB.8010805@stsci.edu> Message-ID: <4ACE1C9A.6030001@stsci.edu> Thanks! I guess I won't file a bug then ;) Mike Travis Oliphant wrote: > On Oct 7, 2009, at 10:28 AM, Michael Droettboom wrote: > > >> I'm noticing an inconsistency as to how complex numbers are >> byteswapped >> as arrays vs. scalars, and wondering if I'm doing something wrong. >> >> >>>>> x = np.array([-1j], '>>>> x.tostring().encode('hex') >>>>> >> '00000000000080bf' >> # This is a little-endian representation, in the order (real, imag) >> >> # When I swap the whole array, it swaps each of the (real, imag) parts >> separately >> >>>>> y = x.byteswap() >>>>> y.tostring().encode('hex') >>>>> >> '00000000bf800000' >> # and this round-trips fine >> >>>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>>> assert z[0] == -1j >>>>> >>>>> >> # When I swap the scalar, it seems to swap the entire 8 bytes >> >>>>> y = x[0].byteswap() >>>>> y.tostring().encode('hex') >>>>> >> 'bf80000000000000' >> # ...and this doesn't round-trip >> >>>>> z = np.fromstring(y.tostring(), dtype='>c8') >>>>> assert z[0] == -1j >>>>> >> Traceback (most recent call last): >> File "", line 1, in >> AssertionError >> >> Any thoughts? >> > > > I just checked a fix for this into SVN (tests still need to be added > though...) 
> > I can't currently build SVN on my Mac for some reason (I don't know if > it has to do with recent changes or not, but I don't have time to > track it down right now....the error I'm getting is something about > Datetime array scalar types not being defined which seems related to > the work Dave and Stefan have been discussing). > > It's a small change, though, and should work. > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From oliphant at enthought.com Thu Oct 8 18:01:53 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 17:01:53 -0500 Subject: [Numpy-discussion] NumPy SVN broken In-Reply-To: <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> References: <9457e7c80910060720g10ed73beqdd4ee57ff3f0e5bb@mail.gmail.com> <5b8d13220910060950x60f86586v8208debc700a8311@mail.gmail.com> <5b8d13220910061014o34b0d6b1oc74c3b79de3a0b6d@mail.gmail.com> <5b8d13220910070106g9eb814di62c8277c52c5ed54@mail.gmail.com> <22A0ABF6-A4D6-4F04-9422-DB125233DC52@enthought.com> <5b8d13220910071951vf1e3049kd4b1071295831c9@mail.gmail.com> <41FD4390-CF8A-41F3-AD4B-5E38946061D7@enthought.com> <5b8d13220910080801m41ada777nd74fbb6cfc921070@mail.gmail.com> Message-ID: On Oct 8, 2009, at 10:01 AM, David Cournapeau wrote: > On Thu, Oct 8, 2009 at 8:55 PM, Travis Oliphant > wrote: >> >> The problem I have with spending time on it though is that there is >> still >> more implementation work to finish on the datetime functionality to >> complete >> the NEP implementation. Naturally, I'd like to see those >> improvements >> made first. But, time-spent is usually a function of how much time >> it takes >> to "get-in" to the code, so I won't try to distract you if you have >> a clear >> idea about how to proceed. > > Would it be possible to include next changes in small self-contained > commits ? It really makes the review easier to follow for me, and > tracking regressions is easier as well. Git-svn makes this easy. > That was the reason for merging to the trunk rather than continuing to work in the branch. I expect that the next changes will be more incremental. -Travis From oliphant at enthought.com Thu Oct 8 18:02:45 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 8 Oct 2009 17:02:45 -0500 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <4ACE1C9A.6030001@stsci.edu> References: <4ACCB3BB.8010805@stsci.edu> <4ACE1C9A.6030001@stsci.edu> Message-ID: On Oct 8, 2009, at 12:08 PM, Michael Droettboom wrote: > Thanks! I guess I won't file a bug then ;) Probably still should, actually: Until the tests get committed, the bug is not really "fixed" -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Thu Oct 8 18:28:32 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Oct 2009 18:28:32 -0400 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython Message-ID: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> I'm trying to use PyArray_FROM_OF from Cython and the generated C code keeps crashing. Dag said on the Cython list that he wasn't sure what was going on, so maybe someone here will have an idea. 
The line that gdb says is crashing is: #0 0x00e48287 in __pyx_pf_3_vq_vq (__pyx_self=0x0, __pyx_args=0xca2d8, __pyx_kwds=0x0) at _vq_rewrite.c:1025 1025 __pyx_t_1 = PyArray_FROM_OF(((PyObject *)__pyx_v_obs), __pyx_v_flags); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 90; __pyx_clineno = __LINE__; goto __pyx_L1_error;} obs and obs_a are both cdef'd np.ndarrays, and the former (obs) is passed in as an argument. I define flags as cdef int flags = np.NPY_CONTIGUOUS | np.NPY_ALIGNED | np.NPY_NOTSWAPPED and then the line that crashes is obs_a = np.PyArray_FROM_OF(obs, flags) Does anyone know what I'm doing wrong? (I know I could use np.ascontiguous, but as far as I can tell this _should_ work) David From robert.kern at gmail.com Thu Oct 8 18:47:31 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 8 Oct 2009 17:47:31 -0500 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> Message-ID: <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley wrote: > I'm trying to use PyArray_FROM_OF from Cython and the generated C code > keeps crashing. ?Dag said on the Cython list that he wasn't sure what > was going on, so maybe someone here will have an idea. You must call import_array() at the top level before you can use any numpy C API functions. http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Thu Oct 8 20:32:14 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 8 Oct 2009 20:32:14 -0400 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> Message-ID: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > On Thu, Oct 8, 2009 at 17:28, David Warde-Farley > wrote: >> I'm trying to use PyArray_FROM_OF from Cython and the generated C >> code >> keeps crashing. Dag said on the Cython list that he wasn't sure what >> was going on, so maybe someone here will have an idea. > > You must call import_array() at the top level before you can use any > numpy C API functions. > > http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI Thanks. One more thing: calling Py_DECREF on arrays that I have acquired from PyArray_FROM_OF seems to cause crashes, am I correct in assuming that Cython is somehow tracking all the PyObjects in the scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? 
David From robert.kern at gmail.com Thu Oct 8 20:59:37 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 8 Oct 2009 19:59:37 -0500 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> Message-ID: <3d375d730910081759j46fb68eeic1748d5c5c44862@mail.gmail.com> On Thu, Oct 8, 2009 at 19:32, David Warde-Farley wrote: > > On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > >> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley >> wrote: >>> I'm trying to use PyArray_FROM_OF from Cython and the generated C >>> code >>> keeps crashing. ?Dag said on the Cython list that he wasn't sure what >>> was going on, so maybe someone here will have an idea. >> >> You must call import_array() at the top level before you can use any >> numpy C API functions. >> >> http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI > > Thanks. One more thing: calling Py_DECREF on arrays that I have > acquired from PyArray_FROM_OF seems to cause crashes, am I correct in > assuming that Cython is somehow tracking all the PyObjects in the > scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? It usually does, yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From cournape at gmail.com Fri Oct 9 01:13:44 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 9 Oct 2009 14:13:44 +0900 Subject: [Numpy-discussion] byteswapping a complex scalar In-Reply-To: <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> References: <4ACCB3BB.8010805@stsci.edu> <295EECAD-5381-491E-8641-9DF53683FB70@enthought.com> Message-ID: <5b8d13220910082213q759d0e74xaa53e232735497c4@mail.gmail.com> On Thu, Oct 8, 2009 at 9:25 PM, Travis Oliphant wrote: > > On Oct 8, 2009, at 7:19 AM, Travis Oliphant wrote: > >> >> I just checked a fix for this into SVN (tests still need to be added >> though...) >> >> I can't currently build SVN on my Mac for some reason (I don't know if >> it has to do with recent changes or not, but I don't have time to >> track it down right now....the error I'm getting is something about >> Datetime array scalar types not being defined which seems related to >> the work Dave and Stefan have been discussing). > > I can build from SVN. ?The problem is I had to check-out again from > SVN (and get rid of the old code-generated files --- sure would be > nice if there were the equivalent of "make clean" Note that git clean will clean your working tree, and numscons generates less junk in the working tree. It can be a time saver and quite convenient, David From david at ar.media.kyoto-u.ac.jp Fri Oct 9 00:56:28 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 09 Oct 2009 13:56:28 +0900 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % Message-ID: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Hi, This branch improves numpy import times quite significantly on slow machines: http://github.com/cournape/numpy/tree/noinspect One of the main culprit is ma, because of inspect (inspect is extremely slow to import; as a data point, python -c "import inspect" takes 67 ms vs python -c "" taking 22 ms, and python -c "import numpy" taking 158 ms on my machine). 
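A rough way to reproduce those numbers on your own machine (the helper below is only an illustration, not part of the branch; absolute timings will of course differ):

import subprocess
import sys
import time

def startup_time(stmt, repeats=5):
    # best wall-clock time for running `python -c stmt` in a fresh process
    best = None
    for _ in range(repeats):
        t0 = time.time()
        subprocess.call([sys.executable, '-c', stmt])
        t = time.time() - t0
        best = t if best is None else min(best, t)
    return best

for stmt in ('pass', 'import inspect', 'import numpy'):
    print('%-16s %6.1f ms' % (stmt, 1000 * startup_time(stmt)))
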
Since inspect is used in quite a few places, and that we only use it to extract arguments from a function, I added a small numpy.lib.inspect module, and change the import in numpy.ma. I copied the inspect module of python 2.4.4 to ensure maximum compatibility. This speed up the import times from 158 ms to 108 ms ~ 30 % speed improvement. On recent machines, the speedup is less impressive, but still in the 20 % range. I think it largely worths it, and will integrate this unless someone is strongly against it or see a problem with the approach, cheers, David From dagss at student.matnat.uio.no Fri Oct 9 04:47:43 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 09 Oct 2009 10:47:43 +0200 Subject: [Numpy-discussion] PyArray_FROM_OF from Cython In-Reply-To: <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> References: <1F7F5A0E-7130-47B6-8FD5-DE445AC3A67D@cs.toronto.edu> <3d375d730910081547t49e32b21tc9ce2549660ecbfd@mail.gmail.com> <354AFC18-A136-40A1-A7DE-536D03407455@cs.toronto.edu> Message-ID: <4ACEF8AF.6060709@student.matnat.uio.no> David Warde-Farley wrote: > On 8-Oct-09, at 6:47 PM, Robert Kern wrote: > >> On Thu, Oct 8, 2009 at 17:28, David Warde-Farley >> wrote: >>> I'm trying to use PyArray_FROM_OF from Cython and the generated C >>> code >>> keeps crashing. Dag said on the Cython list that he wasn't sure what >>> was going on, so maybe someone here will have an idea. >> You must call import_array() at the top level before you can use any >> numpy C API functions. >> >> http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI > > Thanks. One more thing: calling Py_DECREF on arrays that I have > acquired from PyArray_FROM_OF seems to cause crashes, am I correct in > assuming that Cython is somehow tracking all the PyObjects in the > scope (even ones acquired via the NumPy C API) and DECREF'ing it for me? If the function is declared with "object" as return type (or nothing which defaults to the same thing), then Cython will interpret that as the function handing away the reference of the object and make sure it is decref-ed. -- Dag Sverre From Chris.Barker at noaa.gov Fri Oct 9 12:08:59 2009 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri, 09 Oct 2009 09:08:59 -0700 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Message-ID: <4ACF601B.4070903@noaa.gov> David Cournapeau wrote: > This branch improves numpy import times quite significantly on slow > machines: > I think it largely worths it, and will integrate this unless someone is > strongly against it or see a problem with the approach, +1 -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From numpy-discussion at maubp.freeserve.co.uk Fri Oct 9 12:24:56 2009 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Oct 2009 17:24:56 +0100 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> Message-ID: <320fb6e00910090924p3c6af39dvead2d5d8502874e6@mail.gmail.com> On Fri, Oct 9, 2009 at 5:56 AM, David Cournapeau wrote: > > Since inspect is used in quite a few places, and that we only use it to > extract arguments from a function, I added a small numpy.lib.inspect > module, and ... Is numpy.lib intended as a public API? How about numpy.lib._inspect instead of numpy.lib.inspect to make it clear this new module is private? Peter From josef.pktd at gmail.com Fri Oct 9 15:05:36 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 9 Oct 2009 15:05:36 -0400 Subject: [Numpy-discussion] tostring() for array rows In-Reply-To: <4ACCCDE0.4090002@noaa.gov> References: <1cd32cbb0910061049v3e2cb9a7t1822c0d56dc2ceb2@mail.gmail.com> <4ACBAB06.4000008@noaa.gov> <1cd32cbb0910061347r5c12a09di2bd93a4310685045@mail.gmail.com> <4ACCCDE0.4090002@noaa.gov> Message-ID: <1cd32cbb0910091205y26c6850fy568373a4c35f6d60@mail.gmail.com> On Wed, Oct 7, 2009 at 1:20 PM, Christopher Barker wrote: > josef.pktd at gmail.com wrote: > >> I wanted to avoid the python loop and thought creating the view will be faster >> with large arrays. But for this I need to know the memory length of a >> row of arbitrary types for the conversion to strings, > > ndarray.itemsize > > might do it. > > -Chris Thanks, (I forgot to reply), it works and feels less low level than strides. Josef >>> tmps2[0].itemsize * np.size(tmps2[0]) 16 >>> tmp[0].itemsize * np.size(tmp[0]) 24 >>> tmps2.strides[0] 16 >>> tmp.strides[0] 24 >>> tmp array([[-1.414, -1.019, -1.171], [-1.273, 1.639, -0.854], [-1.795, -0.699, 0.595], [-0.865, -1.439, -0.275]]) >>> tmps2 array([(4.0, 0, 1), (1.0, 1, 3), (2.0, 2, 4), (4.0, 0, 1)], dtype=[('f0', '>> > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From amenity at enthought.com Fri Oct 9 17:59:55 2009 From: amenity at enthought.com (Amenity Applewhite) Date: Fri, 9 Oct 2009 16:59:55 -0500 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> Message-ID: Having trouble viewing this email? Click here Friday, October 16: Traits SCIENTIFIC COMPUTING WITH PYTHON WEBINAR Hello! It's already time for our October Scientific Computing with Python webinar! This month we'll be handling Traits, one of our most popular training topics. Traits: Expanding the Power of Attributes An essential component of the open source Enthought Tool Suite, The Traits package is at the center of all development we do at Enthought. 
In fact, it has changed the mental model we use for programming in the already extremely efficient Python programming language. Briefly, a trait is a type definition that can be used for normal Python object attributes, giving the attributes some additional characteristics: initialization, validation, delegation, notification, and (optionally) visualization (GUIs). In this webinar we will provide an introduction to Traits by walking through several examples that show what you can do with Traits. Scientific Computing With Python Webinar: Traits October 16 1pm CDT/6pm UTC Register at GoToMeeting We hope to see you there! Also, don't forget that this free event is open to the public. Use the link at the bottom of this email to forward an invitation to your friends and colleagues. As always, feel free to contact us with questions, concerns, or suggestions for future webinar topics. Have a great weekend, The Enthought Team Enthought, Inc. Quick Links www.enthought.com code.enthought.com Facebook Blog Forward email This email was sent to leah at enthought.com by amenity at enthought.com. Update Profile/Email Address | Instant removal with SafeUnsubscribe? | Privacy Policy. Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Mon Oct 12 03:53:14 2009 From: vs at it.uu.se (Virgil Stokes) Date: Mon, 12 Oct 2009 09:53:14 +0200 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits In-Reply-To: References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> Message-ID: <4AD2E06A.3070807@it.uu.se> An HTML attachment was scrubbed... URL: From perfreem at gmail.com Mon Oct 12 10:18:44 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 10:18:44 -0400 Subject: [Numpy-discussion] finding nonzero elements in list Message-ID: hi all, i'm trying to find nonzero elements in an array, as follows: a = array([[1, 0], [1, 1], [1, 1], [0, 1]]) i want to find all elements that are [1,1]. i tried: nonzero(a == [1,0]) but i cannot interpret the output. the output i get is: (array([0, 0, 1, 2]), array([0, 1, 0, 0])) i simply want to find the indices of the elements that equal [1,0]. how can i do this? thanks. From josef.pktd at gmail.com Mon Oct 12 10:36:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 12 Oct 2009 10:36:09 -0400 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: References: Message-ID: <1cd32cbb0910120736s392280f6r85b2d55eee422335@mail.gmail.com> On Mon, Oct 12, 2009 at 10:18 AM, per freem wrote: > hi all, > > i'm trying to find nonzero elements in an array, as follows: > > a = array([[1, 0], > ? ? ? [1, 1], > ? ? ? [1, 1], > ? ? ? [0, 1]]) > > i want to find all elements that are [1,1]. i tried: nonzero(a == > [1,0]) but i cannot interpret the output. the output i get is: > (array([0, 0, 1, 2]), array([0, 1, 0, 0])) > > i simply want to find the indices of the elements that equal [1,0]. > how can i do this? thanks. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > a == [1,0] does elementwise comparison, you need to aggregate condition for all elements of row >>> a = np.array([[1, 0], [1, 1], [1, 1], [0, 1]]) >>> np.nonzero((a==[1,0]).all(1)) (array([0]),) >>> np.where((a==[1,0]).all(1)) (array([0]),) >>> np.nonzero((a==[1,1]).all(1)) (array([1, 2]),) Josef From gokhansever at gmail.com Mon Oct 12 10:39:59 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 12 Oct 2009 09:39:59 -0500 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: References: Message-ID: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> On Mon, Oct 12, 2009 at 9:18 AM, per freem wrote: > hi all, > > i'm trying to find nonzero elements in an array, as follows: > > a = array([[1, 0], > [1, 1], > [1, 1], > [0, 1]]) > > i want to find all elements that are [1,1]. i tried: nonzero(a == > [1,0]) but i cannot interpret the output. the output i get is: > (array([0, 0, 1, 2]), array([0, 1, 0, 0])) > > i simply want to find the indices of the elements that equal [1,0]. > how can i do this? thanks. > You might simply apply a mask to your array satisfying the condition: I[1]: a = array([[1, 0], ...: [1, 1], ...: [1, 1], ...: [0, 1]]) I[2]: a == [1,0] O[2]: array([[ True, True], [ True, False], [ True, False], [False, False]], dtype=bool) I[3]: a[a==[1,0]] O[3]: array([1, 0, 1, 1]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Mon Oct 12 10:44:04 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 12 Oct 2009 09:44:04 -0500 Subject: [Numpy-discussion] finding nonzero elements in list In-Reply-To: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> References: <49d6b3500910120739k21dcb120p21525134249489f8@mail.gmail.com> Message-ID: <49d6b3500910120744u576bd79fi9b2e2904429356e2@mail.gmail.com> On Mon, Oct 12, 2009 at 9:39 AM, G?khan Sever wrote: > > > On Mon, Oct 12, 2009 at 9:18 AM, per freem wrote: > >> hi all, >> >> i'm trying to find nonzero elements in an array, as follows: >> >> a = array([[1, 0], >> [1, 1], >> [1, 1], >> [0, 1]]) >> >> i want to find all elements that are [1,1]. i tried: nonzero(a == >> [1,0]) but i cannot interpret the output. the output i get is: >> (array([0, 0, 1, 2]), array([0, 1, 0, 0])) >> >> i simply want to find the indices of the elements that equal [1,0]. >> how can i do this? thanks. >> > > > You might simply apply a mask to your array satisfying the condition: > > I[1]: a = array([[1, 0], > ...: [1, 1], > ...: [1, 1], > ...: [0, 1]]) > > I[2]: a == [1,0] > O[2]: > array([[ True, True], > [ True, False], > [ True, False], > [False, False]], dtype=bool) > > I[3]: a[a==[1,0]] > O[3]: array([1, 0, 1, 1]) > Addendum; This might work better since you are looking non-zero elements I[19]: a[a==[1,0]] & a[a==[0,1]] O[19]: array([1, 0, 0, 1]) > > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
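(A side note on the two approaches above: the element-wise mask picks out matching elements, flattened, while the row-wise reduction is what gives the row indices the original question asked for. A quick comparison, assuming the same array a:)

import numpy as np

a = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]])

# element-wise mask: matching *elements*, flattened
print(a[a == [1, 0]])                             # [1 0 1 1]

# row-wise reduction: indices of rows equal to [1, 0] or [1, 1]
print(np.nonzero((a == [1, 0]).all(axis=1))[0])   # [0]
print(np.nonzero((a == [1, 1]).all(axis=1))[0])   # [1 2]
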
URL: From perfreem at gmail.com Mon Oct 12 17:25:39 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 17:25:39 -0400 Subject: [Numpy-discussion] performance of scipy: potential inefficiency in logsumexp and sampling from multinomial Message-ID: hi all, i have a piece of code that relies heavily on sampling from multinomial distributions and using their results to compute log probabilities. my code makes heavy use of 'multinomial' from scipy, and of 'logsumexp'. my code is unusually slow, and profiling it with Python's "cPickle" module reveals that most of the time is spent in the following functions: 479.524 0.000 code.py:211(my_func) 122.682 0.000 /Library/Python/2.5/site-packages/scipy/maxentropy/maxentutils.py:27(logsumexp) 40.645 0.000 /Library/Python/2.5/site-packages/numpy/core/numeric.py:180(asarray) 20.374 0.000 {method 'max' of 'numpy.ndarray' objects} (the first column represents cumulative time, the second is percall time.) my code (listed as 'my_func' above) essentially computes a list of log probabilities, exponentiates them and renormalizes them (using 'logsumexp') and then samples from a multinomial distribution using those probabilities as a parameter. i then check to see which object came up true from the multinomial sample. here's a sketch of the code: def my_func(my_list, n_items) final_list = [] for n in xrange(n_items): prob = my_dict[(my_list(n), n)] final_list.append(prob) final_list = final_list - logsumexp(final_list) sample = multinomial(1, exp(final_list)) sample_index = list(sampled_reassignment).index(1) return sample_index the list 'my_list' usually has around 3 to 5 elements in it, and 'my_dict' has about 500-1000 keys. this function gets called about 1.5 million times in my code, and it takes about 5 minutes, which seems very long relative to these operations. (i'd like to scale this up to a case where the function is called about 10-120 million times.) are there known efficiency issues with logsumexp? it seems like it should be a very cheap operation. also, 'multinomial' ought to be relatively cheap, i believe. does anyone have any ideas on how this can be optimized? any input will be greatly appreciated. i am also open to using cython if that is likely to make a significant improvement in this case. also, what is likely to be the origin of the call to "asarray"? (i am not explicitly calling that function, it must be indirectly via some other function.) thanks very much. From charlesr.harris at gmail.com Mon Oct 12 17:48:51 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Oct 2009 15:48:51 -0600 Subject: [Numpy-discussion] performance of scipy: potential inefficiency in logsumexp and sampling from multinomial In-Reply-To: References: Message-ID: On Mon, Oct 12, 2009 at 3:25 PM, per freem wrote: > hi all, > > i have a piece of code that relies heavily on sampling from > multinomial distributions and using their results to compute log > probabilities. my code makes heavy use of 'multinomial' from scipy, > and of 'logsumexp'. 
> > my code is unusually slow, and profiling it with Python's "cPickle" > module reveals that most of the time is spent in the following > functions: > > 479.524 0.000 code.py:211(my_func) > 122.682 0.000 > > /Library/Python/2.5/site-packages/scipy/maxentropy/maxentutils.py:27(logsumexp) > 40.645 0.000 > /Library/Python/2.5/site-packages/numpy/core/numeric.py:180(asarray) > 20.374 0.000 {method 'max' of 'numpy.ndarray' objects} > > (the first column represents cumulative time, the second is percall time.) > > my code (listed as 'my_func' above) essentially computes a list of log > probabilities, exponentiates them and renormalizes them (using > 'logsumexp') and then samples from a multinomial distribution using > those probabilities as a parameter. i then check to see which object > came up true from the multinomial sample. here's a sketch of the code: > > def my_func(my_list, n_items) > final_list = [] > for n in xrange(n_items): > prob = my_dict[(my_list(n), n)] > final_list.append(prob) > final_list = final_list - logsumexp(final_list) > sample = multinomial(1, exp(final_list)) > sample_index = list(sampled_reassignment).index(1) > return sample_index > > the list 'my_list' usually has around 3 to 5 elements in it, and > 'my_dict' has about 500-1000 keys. > > this function gets called about 1.5 million times in my code, and it > takes about 5 minutes, which seems very long relative to these > operations. (i'd like to scale this up to a case where the function is > called about 10-120 million times.) > > are there known efficiency issues with logsumexp? it seems like it > should be a very cheap operation. also, 'multinomial' ought to be > relatively cheap, i believe. does anyone have any ideas on how this > can be optimized? any input will be greatly appreciated. i am also > open to using cython if that is likely to make a significant > improvement in this case. > > also, what is likely to be the origin of the call to "asarray"? (i am > not explicitly calling that function, it must be indirectly via some > other function.) > > You are going back and forth between lists and ndarrays of pretty small sequences of items of variable size. That is bound to be inefficient and isn't going to get you the benefits of vectorization. Is there any way you can do what you want using the rows in a single big array? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perfreem at gmail.com Mon Oct 12 23:53:05 2009 From: perfreem at gmail.com (per freem) Date: Mon, 12 Oct 2009 23:53:05 -0400 Subject: [Numpy-discussion] simple array multiplication question Message-ID: hi all, i am trying to write a simple product of 3 arrays (as vectorized code) but am having some difficulty. i have three arrays, one is a list containing several lists: p = array([[ 0.2, 0.8], [ 0.5, 0.5], [ 0.3, 0.7]]) each list in the array 'p' is of size N -- in this case N = 2. i have a second array containing a set of numbers, each between 0 and N-1: l = array([0, 0, 1]) and finally an array of the same size as l: s = array([10, 20, 30]) what i want to do is pick the columns l of p, and multiply each one by the numbers in s. the first step, picking columns l of p, is simply: cols = p[arange(3), l] then i want to multiply each one by the numbers in s, and i do it like this: cols * s.reshape(3,1) this seems to work, but i am concerned that it might be inefficient. is there a cleaner way of doing this? is 'arange' operation necessary to reference all the 'l' columns of p? 
also, is the reshape operation expensive? thanks very much. From cournape at gmail.com Tue Oct 13 00:00:35 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 13 Oct 2009 13:00:35 +0900 Subject: [Numpy-discussion] [review] Easy win to improve numpy import times by 30 % In-Reply-To: <5b8d13220910092249w1aa3e7d7i81740a96e44f2e37@mail.gmail.com> References: <4ACEC27C.2070607@ar.media.kyoto-u.ac.jp> <320fb6e00910090924p3c6af39dvead2d5d8502874e6@mail.gmail.com> <5b8d13220910092249w1aa3e7d7i81740a96e44f2e37@mail.gmail.com> Message-ID: <5b8d13220910122100y2f83d807wbec2ba75b9356a08@mail.gmail.com> On Sat, Oct 10, 2009 at 2:49 PM, David Cournapeau wrote: > On Sat, Oct 10, 2009 at 1:24 AM, Peter > wrote: >> On Fri, Oct 9, 2009 at 5:56 AM, David Cournapeau >> wrote: >>> >>> Since inspect is used in quite a few places, and that we only use it to >>> extract arguments from a function, I added a small numpy.lib.inspect >>> module, and ... >> >> Is numpy.lib intended as a public API? How about numpy.lib._inspect >> instead of numpy.lib.inspect to make it clear this new module is private? > > I could see it being used by other packages like scipy for example. > Instead of numpy.lib.inspect, we may choose to have something like > numpy.lib.compat or something, where we could put several potential > cases similar to this one. Ok, I created a new numpy subpackage numpy.compat, and numpy.compat will contain the public API. The implementation is in numpy.compat._inspect. Unless someones objects to it, I will include this within the next day or so, cheers, David From dwf at cs.toronto.edu Tue Oct 13 00:40:18 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 13 Oct 2009 00:40:18 -0400 Subject: [Numpy-discussion] simple array multiplication question In-Reply-To: References: Message-ID: <14427D13-1307-4E8A-B689-78CBB21F773E@cs.toronto.edu> On 12-Oct-09, at 11:53 PM, per freem wrote: > what i want to do is pick the columns l of p, and multiply each one by > the numbers in s. the first step, picking columns l of p, is simply: > > cols = p[arange(3), l] This isn't picking columns of p, this is picking the times at (0, 0), (1, 0), and (2, 1). Is this what you meant? In [36]: p[arange(3), [0,0,1]] Out[36]: array([ 0.2, 0.5, 0.7]) In [37]: p[:, [0,0,1]] Out[37]: array([[ 0.2, 0.2, 0.8], [ 0.5, 0.5, 0.5], [ 0.3, 0.3, 0.7]]) In [38]: p[arange(3), [0,0,1]] * s.reshape(3,1) Out[38]: array([[ 2., 5., 7.], [ 4., 10., 14.], [ 6., 15., 21.]]) In [41]: p[:, [0,0,1]] * s.reshape(3,1) Out[41]: array([[ 2., 2., 8.], [ 10., 10., 10.], [ 9., 9., 21.]]) Notice the difference. > then i want to multiply each one by the numbers in s, and i do it > like this: > > cols * s.reshape(3,1) > > this seems to work, but i am concerned that it might be inefficient. > is there a cleaner way of doing this? is 'arange' operation necessary > to reference all the 'l' columns of p? That's about as efficient as it gets, I think. > also, is the reshape operation > expensive? No. It will return a view, rather than make a copy. You could also do cols * s[:, np.newaxis], equivalently. 
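Putting the two selections side by side may make the distinction clearer (a small sketch; which one you want depends on what "columns l of p" is supposed to mean):

import numpy as np

p = np.array([[0.2, 0.8],
              [0.5, 0.5],
              [0.3, 0.7]])
l = np.array([0, 0, 1])
s = np.array([10, 20, 30])

# one entry per row, (i, l[i]) -- what p[arange(3), l] selects
picked = p[np.arange(3), l]          # [0.2, 0.5, 0.7]

# whole columns l of p -- a different result
columns = p[:, l]                    # shape (3, 3)

# s[:, np.newaxis] is a view, so broadcasting it costs no copy of s
print(picked * s[:, np.newaxis])
print(columns * s[:, np.newaxis])
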
David From dpeterson at enthought.com Tue Oct 13 12:43:53 2009 From: dpeterson at enthought.com (Dave Peterson) Date: Tue, 13 Oct 2009 11:43:53 -0500 Subject: [Numpy-discussion] October 16 Scientific Computing with Python Webinar: Traits In-Reply-To: <4AD2E06A.3070807@it.uu.se> References: <1874882496.1255125323830.JavaMail.root@p2-ws606.ad.prodcc.net> <4AD2E06A.3070807@it.uu.se> Message-ID: <4AD4AE49.5020201@enthought.com> Virgil Stokes wrote: > Amenity Applewhite wrote: >> >> Having trouble viewing this email? Click here >> >> >> SCP Banner October >> >> Friday, October 16: Traits >> SCIENTIFIC COMPUTING WITH PYTHON WEBINAR >> >> Hello! >> >> It's already time for our October Scientific Computing with Python >> webinar! This month we'll be handling Traits >> , >> one of our most popular training topics. >> >> >> >> Traits: Expanding the Power of Attributes >> Enthought Tool Suite >> >> An essential component of the open source Enthought Tool Suite >> , >> The Traits package is at the center of all development we do at >> Enthought. In fact, it has changed the mental model we use for >> programming in the already extremely efficient Python programming >> language. >> >> Briefly, a trait is a type definition that can be used for normal >> Python object attributes, giving the attributes some additional >> characteristics: initialization, validation, delegation, >> notification, and (optionally) visualization (GUIs). In this >> webinar we will provide an introduction to Traits by walking through >> several examples that show what you can do with Traits. >> >> >> >> Scientific Computing With Python Webinar: Traits >> October 16 >> 1pm CDT/6pm UTC >> Register at GoToMeeting >> >> >> >> >> We hope to see you there! Also, don't forget that this free event is >> open to the public. Use the link at the bottom of this email to >> forward an invitation to your friends and colleagues. >> >> As always, feel free to contact us >> with questions, concerns, or suggestions for future webinar topics. >> >> Have a great weekend, >> >> The Enthought Team >> Enthought, Inc. >> >> >> Quick Links >> www.enthought.com >> >> code.enthought.com >> >> Facebook >> >> Blog >> >> >> Enthought Header >> >> >> >> Forward email >> >> Safe Unsubscribe >> >> >> This email was sent to leah at enthought.com >> by amenity at enthought.com . >> Update Profile/Email Address >> >> | Instant removal with SafeUnsubscribe >> ? >> | Privacy Policy >> . >> >> >> Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 >> >> >> ------------------------------------------------------------------------ >> >> > Do participants in this Webinar need to have GoToMeeting software > installed and if yes, do they need to purchase this software? GoToMeeting does require installing some local software, but it's an applet that should be quickly installed when you request to join the meeting. There is no purchase required, the applet is free. -- Dave -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From perfreem at gmail.com Wed Oct 14 00:27:31 2009 From: perfreem at gmail.com (per freem) Date: Wed, 14 Oct 2009 00:27:31 -0400 Subject: [Numpy-discussion] [SciPy-User] vectorized version of 'multinomial' sampling function In-Reply-To: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> References: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> Message-ID: On Tue, Oct 13, 2009 at 7:59 PM, David Warde-Farley wrote: > On 13-Oct-09, at 5:01 PM, per freem wrote: > >> hi all, >> >> i have a series of probability vector that i'd like to feed into >> multinomial to get an array of vector outcomes back. for example, >> given: >> >> p = array([[ 0.9 , ?0.05, ?0.05], >> ? ? ? [ 0.05, ?0.05, ?0.9 ]]) >> >> i'd like to call multinomial like this: >> >> multinomial(1, p) >> >> to get a vector of multinomial samplers, each using the nth list in >> 'p'. something like: >> >> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems >> like 'multinomial' takes only a one dimensional array. i could write >> this as a "for" loop of course but i prefer a vectorized version since >> speed is crucial for me here. >> >> thanks very much. > > Your best bet is probably to copy the pyrex/Cython code for > multinomial in numpy/random/mtrand/mtrand.pyx, and add the > functionality you want there. ?If you do it right (i.e. type your loop > indices) then it should be fast. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Hi David thanks for your reply. i am not sure how to do this though -- is the vectorized version i would write in pyrex/cython simply going to iterate through this vector of vectors and do the operation? will that really be efficient? is there some other library that can do vectorized multinomial like i described? i really am not sure how to write this cython. From robert.kern at gmail.com Wed Oct 14 00:30:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 13 Oct 2009 23:30:33 -0500 Subject: [Numpy-discussion] [SciPy-User] vectorized version of 'multinomial' sampling function In-Reply-To: References: <6E9F4234-F3FD-4E78-BDC4-D0960FE52242@cs.toronto.edu> Message-ID: <3d375d730910132130y204064a9vcd9e4117a7f6baba@mail.gmail.com> On Tue, Oct 13, 2009 at 23:27, per freem wrote: > thanks for your reply. i am not sure how to do this though -- is the > vectorized version i would write in pyrex/cython simply going to > iterate through this vector of vectors and do the operation? will that > really be efficient? Yes because the iteration, if written correctly, will be in C. This is all that "vectorization" means in this context. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From thomas.robitaille at gmail.com Wed Oct 14 09:52:29 2009 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Wed, 14 Oct 2009 09:52:29 -0400 Subject: [Numpy-discussion] rec_append_fields and n-dimensional fields Message-ID: <08437E25-B5F0-46EB-B9E4-DE9DF4FEA700@gmail.com> Hi, I'm interested in constructing a recarray with fields that have two or more dimensions. This can be done from scratch like this: r = np.recarray((10,),dtype=[('c1',float,(3,))]) However, I am interested in appending a field to an existing recarray. 
Rather than repeating existing code I would like to use the numpy.lib.recfunctions.rec_append_fields method, but I am not sure how to specify the dimension of each field, since it doesn't seem to be possible to specify the dtype as a tuple as above. Thanks for any advice, Thomas From gael.varoquaux at normalesup.org Wed Oct 14 11:32:07 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 14 Oct 2009 17:32:07 +0200 Subject: [Numpy-discussion] Speed of np.array versus np.vstack Message-ID: <20091014153207.GI15987@phare.normalesup.org> I tend to use np.array to stack arrays rather than np.vstack, as I find it does what I want with higher dimensional arrays. However, I was quite surprised to see a large speed difference: In [1]: import numpy as np In [2]: N = 1e6 In [3]: M = 10 In [4]: l = [np.random.random(N) for _ in range(M)] In [5]: %timeit np.vstack(l) 10 loops, best of 3: 82.7 ms per loop In [6]: %timeit np.array(l) 10 loops, best of 3: 822 ms per loop I can't find the reasons for this speed difference. Also, I don't see what is the correct way to get the behavior I want without paying the extra speed cost. Cheers, Ga?l From nwagner at iam.uni-stuttgart.de Wed Oct 14 12:52:26 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 14 Oct 2009 18:52:26 +0200 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable Message-ID: >>> numpy.__version__ '1.4.0.dev7528' ====================================================================== ERROR: test_from_unicode (test_defchararray.TestBasic) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", line 68, in test_from_unicode A = np.char.array(u'\u03a3') File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", line 2453, in array obj = unicode(obj) TypeError: 'bool' object is not callable ---------------------------------------------------------------------- Ran 2277 tests in 18.933s FAILED (KNOWNFAIL=1, errors=1) From mdroe at stsci.edu Wed Oct 14 12:59:28 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 14 Oct 2009 12:59:28 -0400 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable In-Reply-To: References: Message-ID: <4AD60370.6040800@stsci.edu> That's my bad. I will commit a fix to SVN shortly. 
Mike Nils Wagner wrote: > >>> numpy.__version__ > '1.4.0.dev7528' > > ====================================================================== > ERROR: test_from_unicode (test_defchararray.TestBasic) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", > line 68, in test_from_unicode > A = np.char.array(u'\u03a3') > File > "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", > line 2453, in array > obj = unicode(obj) > TypeError: 'bool' object is not callable > > ---------------------------------------------------------------------- > Ran 2277 tests in 18.933s > > FAILED (KNOWNFAIL=1, errors=1) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From mdroe at stsci.edu Wed Oct 14 13:02:25 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 14 Oct 2009 13:02:25 -0400 Subject: [Numpy-discussion] TypeError: 'bool' object is not callable In-Reply-To: <4AD60370.6040800@stsci.edu> References: <4AD60370.6040800@stsci.edu> Message-ID: <4AD60421.9040701@stsci.edu> The fix is in SVN r7530. Mike Michael Droettboom wrote: > That's my bad. I will commit a fix to SVN shortly. > > Mike > > Nils Wagner wrote: > >> >>> numpy.__version__ >> '1.4.0.dev7528' >> >> ====================================================================== >> ERROR: test_from_unicode (test_defchararray.TestBasic) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/tests/test_defchararray.py", >> line 68, in test_from_unicode >> A = np.char.array(u'\u03a3') >> File >> "/home/nwagner/local/lib64/python2.6/site-packages/numpy/core/defchararray.py", >> line 2453, in array >> obj = unicode(obj) >> TypeError: 'bool' object is not callable >> >> ---------------------------------------------------------------------- >> Ran 2277 tests in 18.933s >> >> FAILED (KNOWNFAIL=1, errors=1) >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From Ashwin.Kashyap at thomson.net Wed Oct 14 19:04:01 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Wed, 14 Oct 2009 19:04:01 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes Message-ID: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Hello, I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with MKL. 
This is my site.cfg: [mkl] # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t include_dirs = /opt/intel/mkl/10.2.2.025/include lapack_libs = mkl_lapack #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, iomp5, mkl_vml_mc3 mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, mkl_mc3, mkl_def In [2]: numpy.test() Running unit tests for numpy NumPy version 1.3.0 NumPy is installed in /opt/********************* Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] nose version 0.11.0 .......... MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV MKL ERROR: Parameter 4 was incorrect on entry to DGESV .. MKL ERROR: Parameter 4 was incorrect on entry to DGESV FSegmentation fault I am using gcc: gcc -v Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4) Anyone having the same issues? Do I have the mkl_libs correctly configured (this seems like a black art!) Thanks, Ashwin From cournape at gmail.com Wed Oct 14 20:30:37 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 15 Oct 2009 09:30:37 +0900 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Message-ID: <5b8d13220910141730j7294de20tca97402e7dd24f9b@mail.gmail.com> On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin wrote: > Hello, > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > MKL. > This is my site.cfg: > [mkl] > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > include_dirs = /opt/intel/mkl/10.2.2.025/include > lapack_libs = mkl_lapack > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > iomp5, mkl_vml_mc3 > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > mkl_mc3, mkl_def The order does not look right - I don't know the exact order (each version of the MKL changes the libraries), but you should respect the order as given in the MKL manual. > MKL ERROR: Parameter 4 was incorrect on entry to DGESV This suggests an error when passing argument to MKL - I believe your version of MKL uses the gfortran ABI by default, and hardy uses g77 as the default fortran compiler. You should either recompile everything with gfortran, or regenerate the MKL interface libraries with g77 (as indicated in the manual). 
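Once you have rebuilt, a quick smoke test along these lines should tell you whether the expected libraries were picked up and whether the DGESV call is healthy (just a sketch):

import numpy as np

np.show_config()               # shows which BLAS/LAPACK numpy was built against

a = np.random.rand(4, 4)
b = np.random.rand(4)
x = np.linalg.solve(a, b)      # goes through LAPACK *gesv, the routine that failed above
print(np.allclose(np.dot(a, x), b))
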
cheers, David From robince at gmail.com Thu Oct 15 06:37:13 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 11:37:13 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython Message-ID: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Hi, Sent this last week but checking the archives it appears not to have got through. Hopefully this will work... I am looking at moving some of my code to fortran to use with f2py. To get started I used this simple example: SUBROUTINE bincount (x,c,n,m) IMPLICIT NONE INTEGER, INTENT(IN) :: n,m INTEGER, DIMENSION(n), INTENT(IN) :: x INTEGER, DIMENSION(0:m-1), INTENT(OUT) :: c INTEGER :: i DO i = 1, n c(x(i)) = c(x(i)) + 1 END DO END It performs well: In [1]: x = np.random.random_integers(0,1023,1000000).astype(int) In [4]: timeit test.bincount(x,1024) 1000 loops, best of 3: 1.16 ms per loop In [5]: timeit np.bincount(x) 100 loops, best of 3: 4 ms per loop I'm guessing most of the benefit comes from less checking + not having to find the maximum value (which I pass as parameter m). But I have some questions. It seems to work as is, but I don't set c to zeros anywhere. Can I assume arrays created by f2py are zero? Is this the recommended way to use f2py with arrays? (I initially tried using assumed arrays with DIMENSION(:) but it couldn't get it to work). Also I'm quite new to fortran - what would be the advantages, if any, of using a FORALL instead of DO in a situation like this? I guess with 1D arrays it doesn't make a copy since ordering is not a problem, but if it was 2D arrays am I right in thinking that if I passed in a C order array it would automatically make a copy to F order. What about the return - will I get a number array in F order, or will it automatically be copied to C order? (I guess I will see but I haven't got onto trying that yet). What if I wanted to keep all the array creation in numpy - ie call it as the fortran subroutine bincount(x,c) and have c modified in place? Should I be using !f2py comments? I wasn't clear if these are needed - it seems to work as is but could they give any improvement? For comparison I tried the same thing in cython - after a couple of iterations with not typing things properly I ended up with: import numpy as np cimport numpy as np cimport cython @cython.boundscheck(False) def bincount(np.ndarray[np.int_t, ndim=1] x not None,int m): cdef int n = x.shape[0] cdef unsigned int i cdef np.ndarray[np.int_t, ndim=1] c = np.zeros(m,dtype=np.int) for i from 0 <= i < n: c[x[i]] += 1 return c which now performs a bit better than np.bincount, but still significantly slower than the fortran. Is this to be expected or am I missing something in the cython? In [14]: timeit ctest.bincount(x,1024) 100 loops, best of 3: 3.31 ms per loop Cheers Robin From robince at gmail.com Thu Oct 15 08:53:48 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 13:53:48 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Message-ID: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Hi, I have another question about distributing a Python extension which uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy alternatives and just use fortran version if available - but I think that should be OK. 
I'm more worried about distributing binaries on Windows - I think on Mac/Linux it would be ok to have a fortran compiler required and build it - but on Windows I guess one should really distribute binaries. What is the recommended (free) fortran 95 compiler for use with f2py on windows (gfortan with cygwin?) Is it possible to get f2py to build a static library on windows so I can just distribute that? Or will I need to include library files from the compiler? How many different binary versions does one need to support common recent windows setups? I guess I need a different binary for each major python version and 32/64 bits (ie 2.5 32bit, 2.6 32bit, 2.5 64bit, 2.6 64bit). Is this right, or would different binaries be required for XP, Vista, 7 etc. ? Can anyone point me to a smallish Python package that includes fortran code in this way that I could look to for inspiration? Cheers Robin From josef.pktd at gmail.com Thu Oct 15 09:19:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 15 Oct 2009 09:19:12 -0400 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Message-ID: <1cd32cbb0910150619u7aa85e34le22b7c08237584db@mail.gmail.com> On Thu, Oct 15, 2009 at 8:53 AM, Robin wrote: > Hi, > > I have another question about distributing a Python extension which > uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy > alternatives and just use fortran version if available - but I think > that should be OK. > > I'm more worried about distributing binaries on Windows - I think on > Mac/Linux it would be ok to have a fortran compiler required and build > it - but on Windows I guess one should really distribute binaries. > > What is the recommended (free) fortran 95 compiler for use with f2py > on windows (gfortan with cygwin?) > Is it possible to get f2py to build a static library on windows so I > can just distribute that? Or will I need to include library files from > the compiler? > How many different binary versions does one need to support common > recent windows setups? I guess I need a different binary for each > major python version and 32/64 bits (ie 2.5 32bit, 2.6 32bit, 2.5 > 64bit, 2.6 64bit). Is this right, or would different binaries be > required for XP, Vista, 7 etc. ? The same binaries should work on both XP and Vista. > > Can anyone point me to a smallish Python package that includes fortran > code in this way that I could look to for inspiration? I don't know if you can pymc smallish, but it is using a considerable amount of fortran, and distributes only win32-py2.5 binaries. http://code.google.com/p/pymc/ for the rest I have no idea. 
(for numpy/scipy, I'm still using g77 with the official mingw for windows xp, win32 only) Josef > > Cheers > > Robin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Ashwin.Kashyap at thomson.net Thu Oct 15 11:00:40 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Thu, 15 Oct 2009 11:00:40 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> Message-ID: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> I followed the advice given by the Intel MKL link adviser (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) This is my new site.cfg: mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. Now I get these errors on import: Running unit tests for numpy NumPy version 1.3.0 NumPy is installed in /opt/Personalization/lib/python2.5/site-packages/numpy Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] nose version 0.11.0 *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp *** libmkl_def.so *** failed with error : libmkl_def.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so Any hints? Thanks, Ashwin Your message: On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin wrote: > Hello, > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > MKL. > This is my site.cfg: > [mkl] > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > include_dirs = /opt/intel/mkl/10.2.2.025/include > lapack_libs = mkl_lapack > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > iomp5, mkl_vml_mc3 > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > mkl_mc3, mkl_def The order does not look right - I don't know the exact order (each version of the MKL changes the libraries), but you should respect the order as given in the MKL manual. > MKL ERROR: Parameter 4 was incorrect on entry to DGESV This suggests an error when passing argument to MKL - I believe your version of MKL uses the gfortran ABI by default, and hardy uses g77 as the default fortran compiler. You should either recompile everything with gfortran, or regenerate the MKL interface libraries with g77 (as indicated in the manual). cheers, David From matthieu.brucher at gmail.com Thu Oct 15 11:06:05 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 15 Oct 2009 17:06:05 +0200 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> Message-ID: Hi, You need to use the static libraries, are you sure you currently do? Matthieu 2009/10/15 Kashyap Ashwin : > I followed the advice given by the Intel MKL link adviser > (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. 
> Now I get these errors on import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in > /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 > (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined > symbol: mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: >> Hello, >> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) > with >> MKL. >> This is my site.cfg: >> [mkl] >> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >> include_dirs = /opt/intel/mkl/10.2.2.025/include >> lapack_libs = mkl_lapack >> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >> iomp5, mkl_vml_mc3 >> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >> mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > >> MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From HAWRYLA at novachem.com Thu Oct 15 11:10:10 2009 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Thu, 15 Oct 2009 09:10:10 -0600 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> Message-ID: <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of Robin > Sent: 15 Oct 2009 4:37 AM > To: numpy-discussion at scipy.org > Subject: [Numpy-discussion] extension questions: f2py and cython > > Hi, > > Sent this last week but checking the archives it appears not to have > got through. Hopefully this will work... > > I am looking at moving some of my code to fortran to use with f2py. To > get started I used this simple example: ... > But I have some questions. It seems to work as is, but I don't set c to > zeros anywhere. Can I assume arrays created by f2py are zero? As I understand it, uninitialized variables in Fortran are compiler/system-dependent. Some compilers initialize values to zero, many leave the previous contents of the memory in place. It is safest to never use the value of an uninitialized variable. 
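The same caveat holds on the NumPy side, where np.empty returns whatever happened to be in memory; only np.zeros guarantees zero-filled storage:

import numpy as np

c = np.empty(1024, dtype=np.intc)   # contents undefined, like an unset Fortran array
c = np.zeros(1024, dtype=np.intc)   # guaranteed zeros, safe to accumulate counts into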
Andrew From HAWRYLA at novachem.com Thu Oct 15 11:24:03 2009 From: HAWRYLA at novachem.com (Andrew Hawryluk) Date: Thu, 15 Oct 2009 09:24:03 -0600 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <2d5132a50910150553k527ee022hc7af583757cee793@mail.gmail.com> Message-ID: <48C01AE7354EC240A26F19CEB995E943033AF2F9@CHMAILMBX01.novachem.com> > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- > bounces at scipy.org] On Behalf Of Robin > Sent: 15 Oct 2009 6:54 AM > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] extension questions: f2py and cython > > Hi, > > I have another question about distributing a Python extension which > uses f2py wrapped code. Ideally I'd like to keep pure Python/Numpy > alternatives and just use fortran version if available - but I think > that should be OK. > > I'm more worried about distributing binaries on Windows - I think on > Mac/Linux it would be ok to have a fortran compiler required and build > it - but on Windows I guess one should really distribute binaries. > > What is the recommended (free) fortran 95 compiler for use with f2py on > windows (gfortan with cygwin?) Is it possible to get f2py to build a > static library on windows so I can just distribute that? Or will I need > to include library files from the compiler? I am using gfortran, which has a native Windows installer: http://www.scipy.org/F2PY_Windows I have also successfully used g95 with f2py on Windows. When f2py runs this on Windows, it produces a *.pyd file that contains the compiled code. E.g. myfoo.f --> myfoo.pyd. This is imported into python with 'import myfoo'. The recipient of the windows binary needs only the *.pyd file (and your *.py files). Andrew From mdroe at stsci.edu Thu Oct 15 12:40:01 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 15 Oct 2009 12:40:01 -0400 Subject: [Numpy-discussion] object array alignment issues Message-ID: <4AD75061.2020908@stsci.edu> I recently committed a regression test and bugfix for object pointers in record arrays of unaligned size (meaning where each record is not a multiple of sizeof(PyObject **)). For example: a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) a2 = np.zeros((10,), 'S10') # This copying would segfault a1['o'] = a2 http://projects.scipy.org/numpy/ticket/1198 Unfortunately, this unit test has opened up a whole hornet's nest of alignment issues on Solaris. The various reference counting functions (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, for instance. Interestingly, there are comments in there saying "handles misaligned data" (eg. line 190), but in fact it doesn't, and doesn't look to me like it would. But I won't rule out a mistake in building it on my part. So, how to fix this? One obvious workaround is for users to pass "align=True" to the dtype constructor. This works if the dtype descriptor is a dictionary or comma-separated string. Is there a reason it couldn't be made to work with the string-of-tuples form that I'm missing? It would be marginally more convenient from my application, but that's just a finesse issue. However, perhaps we should try to fix the underlying alignment problems? Unfortunately, it's not clear to me how to resolve them without at least some performance penalty. 
You either do an alignment check of the pointer, and then memcpy if unaligned, or just always use memcpy. Not sure which is faster, as memcpy may have a fast path already. These are object arrays anyway, so there's plenty of overhead already, and I don't think this would affect regular numerical arrays. If we choose not to fix it, perhaps we should we try to warn when creating an unaligned recarray on platforms where it matters? I do worry about having something that works perfectly well on one platform fail on another. In the meantime, I'll just mark the new regression test to "skip on Solaris". Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From charlesr.harris at gmail.com Thu Oct 15 13:00:04 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Oct 2009 11:00:04 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD75061.2020908@stsci.edu> References: <4AD75061.2020908@stsci.edu> Message-ID: On Thu, Oct 15, 2009 at 10:40 AM, Michael Droettboom wrote: > I recently committed a regression test and bugfix for object pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. No surprise there. Good unit tests seem to routinely uncover hornet's nests and Solaris is a platform that exercises the alignment part of the code. I think it is great that you are finding these problems. We folks working on Intel don't see them so much. > The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. > > So, how to fix this? > > One obvious workaround is for users to pass "align=True" to the dtype > constructor. This works if the dtype descriptor is a dictionary or > comma-separated string. Is there a reason it couldn't be made to work > with the string-of-tuples form that I'm missing? It would be marginally > more convenient from my application, but that's just a finesse issue. > > However, perhaps we should try to fix the underlying alignment > problems? Unfortunately, it's not clear to me how to resolve them > without at least some performance penalty. You either do an alignment > check of the pointer, and then memcpy if unaligned, or just always use > memcpy. Not sure which is faster, as memcpy may have a fast path > already. These are object arrays anyway, so there's plenty of overhead > already, and I don't think this would affect regular numerical arrays. > > I believe the memcpy approach is used for other unaligned parts of void types. There is an inherent performance penalty there, but I don't see how it can be avoided when using what are essentially packed structures. As to memcpy, it's performance seems to depend on the compiler/compiler version, old versions of gcc had *horrible* implementations of memcpy. I believe the situation has since improved. 
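For reference, the align=True workaround Michael mentions can be spelled with the comma-separated-string form of the dtype constructor; a small sketch (itemsize values are typical for a 64-bit build):

import numpy as np

# packed layout: an 8-byte object pointer followed by a 1-byte char, so each
# 9-byte record leaves the next record's pointer field misaligned
packed = np.dtype([('o', 'O'), ('c', 'c')])

# same fields via the comma-separated-string form with align=True: padding is
# inserted so every record is a multiple of the pointer size
aligned = np.dtype('O,c', align=True)

packed.itemsize, aligned.itemsize   # e.g. (9, 16) on a 64-bit build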
However, I'm not sure we should be coding to compiler issues unless it is unavoidable or the gain is huge. > If we choose not to fix it, perhaps we should we try to warn when > creating an unaligned recarray on platforms where it matters? I do > worry about having something that works perfectly well on one platform > fail on another. > > In the meantime, I'll just mark the new regression test to "skip on > Solaris". > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Thu Oct 15 14:01:43 2009 From: robince at gmail.com (Robin) Date: Thu, 15 Oct 2009 19:01:43 +0100 Subject: [Numpy-discussion] extension questions: f2py and cython In-Reply-To: <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> References: <2d5132a50910150337t175c2ab4y4a7095fea08c0d15@mail.gmail.com> <48C01AE7354EC240A26F19CEB995E943033AF2F8@CHMAILMBX01.novachem.com> Message-ID: <2d5132a50910151101t58e27093j1d775cd68581eb4a@mail.gmail.com> Hi, Thanks On Thu, Oct 15, 2009 at 4:10 PM, Andrew Hawryluk wrote: >> But I have some questions. It seems to work as is, but I don't set c > to >> zeros anywhere. Can I assume arrays created by f2py are zero? > > As I understand it, uninitialized variables in Fortran are > compiler/system-dependent. Some compilers initialize values to zero, > many leave the previous contents of the memory in place. It is safest to > never use the value of an uninitialized variable. But in this case I understood it was initialised or created by the f2py wrapper first and then passed to the fortran subroutine - so I wondered how f2py creates it (I think I traced it to array_from_pyobj() but I couldn't really understand what it was doing or whether it would always be zeros). I guess as you say though it is always safer to initialize explicitly Cheers Robin From Ashwin.Kashyap at thomson.net Thu Oct 15 14:04:58 2009 From: Ashwin.Kashyap at thomson.net (Kashyap Ashwin) Date: Thu, 15 Oct 2009 14:04:58 -0400 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> Message-ID: <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> Matthieu, I am not sure what exactly you mean. I did pass in "static" to the link-adviser and this is the new setup.cfg mkl_libs = mkl_solver_ilp64, mkl_intel_ilp64, mkl_gnu_thread, mkl_core. On import, Numpy complains as usual about the mkl_def and mkl_mc. If I append these libs, then the crashes happen on test() (complains first about the DGES* functions). Also, I have made sure that g77 is not installed and only gfortran is available. I also put in the LD_LIBRARY_PATH=/opt/intel/mkl/10.2.2.025/lib/em64t. Thanks, Ashwin Your message: Hi, You need to use the static libraries, are you sure you currently do? Matthieu 2009/10/15 Kashyap Ashwin : > I followed the advice given by the Intel MKL link adviser > (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. 
> Now I get these errors on import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in > /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 > (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined > symbol: mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: >> Hello, >> I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) > with >> MKL. >> This is my site.cfg: >> [mkl] >> # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ >> library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t >> include_dirs = /opt/intel/mkl/10.2.2.025/include >> lapack_libs = mkl_lapack >> #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, >> iomp5, mkl_vml_mc3 >> mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, >> mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > >> MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -----Original Message----- > From: Kashyap Ashwin > Sent: Thursday, October 15, 2009 11:01 AM > To: 'numpy-discussion at scipy.org' > Subject: RE: MKL with 64bit crashes > > I followed the advice given by the Intel MKL link adviser (http://software.intel.com/en- > us/articles/intel-mkl-link-line-advisor/) > > This is my new site.cfg: > mkl_libs = mkl_intel_ilp64, mkl_gnu_thread, mkl_core > > I also exported CFLAGS="-fopenmp" and built with the --fcompiler=gnu95. Now I get these errors on > import: > Running unit tests for numpy > NumPy version 1.3.0 > NumPy is installed in /opt/Personalization/lib/python2.5/site-packages/numpy > Python version 2.5.2 (r252:60911, Jul 22 2009, 15:33:10) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] > nose version 0.11.0 > > *** libmkl_mc.so *** failed with error : libmkl_mc.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > *** libmkl_def.so *** failed with error : libmkl_def.so: undefined symbol: > mkl_dft_commit_descriptor_s_c2c_md_omp > MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so > > > Any hints? > > Thanks, > Ashwin > > > > Your message: > > On Thu, Oct 15, 2009 at 8:04 AM, Kashyap Ashwin > wrote: > > Hello, > > I compiled numpy-1.3.0 from sources on Ubuntu-hardy, x86-64 (Intel) with > > MKL. 
> > This is my site.cfg: > > [mkl] > > # library_dirs = /opt/intel/mkl/10.0.1.014/lib/32/ > > library_dirs = /opt/intel/mkl/10.2.2.025/lib/em64t > > include_dirs = /opt/intel/mkl/10.2.2.025/include > > lapack_libs = mkl_lapack > > #mkl_libs = mkl_core, guide, mkl_gf_ilp64, mkl_def, mkl_gnu_thread, > > iomp5, mkl_vml_mc3 > > mkl_libs = guide, mkl_core, mkl_gnu_thread, iomp5, mkl_gf_ilp64, > > mkl_mc3, mkl_def > > The order does not look right - I don't know the exact order (each > version of the MKL changes the libraries), but you should respect the > order as given in the MKL manual. > > > MKL ERROR: Parameter 4 was incorrect on entry to DGESV > > This suggests an error when passing argument to MKL - I believe your > version of MKL uses the gfortran ABI by default, and hardy uses g77 as > the default fortran compiler. You should either recompile everything > with gfortran, or regenerate the MKL interface libraries with g77 (as > indicated in the manual). > > cheers, > > David From pgmdevlist at gmail.com Thu Oct 15 19:08:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 15 Oct 2009 19:08:23 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed Message-ID: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> All, Here's a first draft for the documentation of np.genfromtxt. It took me longer than I thought, but that way I uncovered and fix some bugs. Please send me your comments/reviews/etc I count especially on our documentation specialist to let me know where to put it. Thx in advance P. -------------- next part -------------- A non-text attachment was scrubbed... Name: doc_genfromtxt.rst Type: application/octet-stream Size: 20428 bytes Desc: not available URL: From numpy at mspacek.mm.st Thu Oct 15 20:44:42 2009 From: numpy at mspacek.mm.st (Martin Spacek) Date: Fri, 16 Oct 2009 00:44:42 +0000 (UTC) Subject: [Numpy-discussion] intersect1d for N input arrays Message-ID: I have a list of many arrays (in my case each is unique, ie has no repeated elements), and I'd like to extract the intersection of all of them, all in one go. I'm running numpy 1.3.0, but looking at today's rev of numpy.lib.arraysetops (http://svn.scipy.org/svn/numpy/trunk/numpy/lib/arraysetops.py), I see intersect1d has changed. Just a note: the example used in the docstring implies that the two arrays need to be the same length, which isn't the case. Maybe it would be good to change the example to two arrays of different lengths. intersect1d takes exactly 2 arrays. I've modified it a little to take the intersection of any number of 1D arrays (of any length), in a list or tuple. It seems to work fine, but could use more testing. Here it is with most of the docs stripped. Feel free to use it, although I suppose for symmetry, many of the other functions in arraysetops.py would also have to be modified to work with N arrays: def intersect1d(arrays, assume_unique=False): """Find the intersection of any number of 1D arrays. Return the sorted, unique values that are in all of the input arrays. 
Adapted from numpy.lib.arraysetops.intersect1d""" N = len(arrays) arrays = list(arrays) # allow assignment if not assume_unique: for i, arr in enumerate(arrays): arrays[i] = np.unique(arr) aux = np.concatenate(arrays) # one long 1D array aux.sort() # sorted shift = N-1 return aux[aux[shift:] == aux[:-shift]] From david at ar.media.kyoto-u.ac.jp Thu Oct 15 23:25:51 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 16 Oct 2009 12:25:51 +0900 Subject: [Numpy-discussion] MKL with 64bit crashes In-Reply-To: <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> References: <68DF70B3485CC648835655773E92314F9208C2@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C3@prinsmail02.am.thmulti.com> <68DF70B3485CC648835655773E92314F9208C5@prinsmail02.am.thmulti.com> Message-ID: <4AD7E7BF.2010509@ar.media.kyoto-u.ac.jp> Kashyap Ashwin wrote: > Matthieu, > I am not sure what exactly you mean. I did pass in "static" to the > link-adviser and this is the new setup.cfg > mkl_libs = mkl_solver_ilp64, mkl_intel_ilp64, mkl_gnu_thread, mkl_core. > > On import, Numpy complains as usual about the mkl_def and mkl_mc. If I > append these libs, then the crashes happen on test() (complains first > about the DGES* functions). > I remember now that I had the same problem recently - it is a fundamental incompatibility between MKL and Python way of loading shared libraries through dlopen. AFAIK, there is no solution to this problem, except for using the static libraries. David From dwf at cs.toronto.edu Fri Oct 16 00:14:01 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 16 Oct 2009 00:14:01 -0400 Subject: [Numpy-discussion] Google Groups archive? Message-ID: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Does anyone know what happened to the Google Groups archive of this list? when I try to access it, I see: Cannot find numpy-discussion The group named numpy-discussion has been removed because it violated Google's Terms Of Service. This seems exceedingly odd. Does anyone know _how_ we violated the ToS? David From pgmdevlist at gmail.com Fri Oct 16 00:20:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 16 Oct 2009 00:20:48 -0400 Subject: [Numpy-discussion] Google Groups archive? In-Reply-To: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Message-ID: On Oct 16, 2009, at 12:14 AM, David Warde-Farley wrote: > Does anyone know what happened to the Google Groups archive of this > list? when I try to access it, I see: > > Cannot find numpy-discussion > The group named numpy-discussion has been removed because it violated > Google's Terms Of Service. > > This seems exceedingly odd. Does anyone know _how_ we violated the > ToS? Hit by spam-bots, most likely. Was it actively used, actually ? From josef.pktd at gmail.com Fri Oct 16 00:23:34 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 16 Oct 2009 00:23:34 -0400 Subject: [Numpy-discussion] Google Groups archive? In-Reply-To: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> Message-ID: <1cd32cbb0910152123ia15db57y277ec4d0442a2ee6@mail.gmail.com> On Fri, Oct 16, 2009 at 12:14 AM, David Warde-Farley wrote: > Does anyone know what happened to the Google Groups archive of this > list? 
when I try to access it, I see: > > Cannot find numpy-discussion > The group named numpy-discussion has been removed because it violated > Google's Terms Of Service. same question on october 5th > > This seems exceedingly odd. Does anyone know _how_ we violated the ToS? adult material on front page Who's the owner? Creating a new group would require a different name, since the old name is blocked, I tried. Josef > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav+sp at iki.fi Fri Oct 16 03:48:19 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 16 Oct 2009 07:48:19 +0000 (UTC) Subject: [Numpy-discussion] Google Groups archive? References: <96B42BFB-EDD3-42A2-A68C-F4C833349B3C@cs.toronto.edu> <1cd32cbb0910152123ia15db57y277ec4d0442a2ee6@mail.gmail.com> Message-ID: Fri, 16 Oct 2009 00:23:34 -0400, josef.pktd wrote: [clip] >> This seems exceedingly odd. Does anyone know _how_ we violated the ToS? > > adult material on front page > > Who's the owner? Creating a new group would require a different name, > since the old name is blocked, I tried. Maybe it's best just not to use Google Groups. IMO, gmane.org offers an equivalent if not superior service. -- Pauli Virtanen From cimrman3 at ntc.zcu.cz Fri Oct 16 03:56:28 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 16 Oct 2009 09:56:28 +0200 Subject: [Numpy-discussion] intersect1d for N input arrays In-Reply-To: References: Message-ID: <4AD8272C.601@ntc.zcu.cz> Hi Martin, thanks for your ideas and contribution. A few notes: I would let intersect1d as it is, and created a new function with another name for that (any proposals?). Considering that most of arraysetops functions are based on sort, and in particular here that an intersection array is (usually) smaller than each of the input arrays, it might be better just to call intersect1d repeatedly for each array and the result of the previous call, accumulating the intersection. r. Martin Spacek wrote: > I have a list of many arrays (in my case each is unique, ie has no repeated > elements), and I'd like to extract the intersection of all of them, all in one > go. I'm running numpy 1.3.0, but looking at today's rev of numpy.lib.arraysetops > (http://svn.scipy.org/svn/numpy/trunk/numpy/lib/arraysetops.py), I see > intersect1d has changed. Just a note: the example used in the docstring implies > that the two arrays need to be the same length, which isn't the case. Maybe it > would be good to change the example to two arrays of different lengths. > > intersect1d takes exactly 2 arrays. I've modified it a little to take the > intersection of any number of 1D arrays (of any length), in a list or tuple. It > seems to work fine, but could use more testing. Here it is with most of the docs > stripped. Feel free to use it, although I suppose for symmetry, many of the > other functions in arraysetops.py would also have to be modified to work with N > arrays: > > > def intersect1d(arrays, assume_unique=False): > """Find the intersection of any number of 1D arrays. > Return the sorted, unique values that are in all of the input arrays. 
> Adapted from numpy.lib.arraysetops.intersect1d""" > N = len(arrays) > arrays = list(arrays) # allow assignment > if not assume_unique: > for i, arr in enumerate(arrays): > arrays[i] = np.unique(arr) > aux = np.concatenate(arrays) # one long 1D array > aux.sort() # sorted > shift = N-1 > return aux[aux[shift:] == aux[:-shift]] > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From faltet at pytables.org Fri Oct 16 06:07:10 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 16 Oct 2009 12:07:10 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: <200910161207.11024.faltet@pytables.org> A Thursday 15 October 2009 19:00:04 Charles R Harris escrigu?: > > So, how to fix this? > > > > One obvious workaround is for users to pass "align=True" to the dtype > > constructor. This works if the dtype descriptor is a dictionary or > > comma-separated string. Is there a reason it couldn't be made to work > > with the string-of-tuples form that I'm missing? It would be marginally > > more convenient from my application, but that's just a finesse issue. > > > > However, perhaps we should try to fix the underlying alignment > > problems? Unfortunately, it's not clear to me how to resolve them > > without at least some performance penalty. You either do an alignment > > check of the pointer, and then memcpy if unaligned, or just always use > > memcpy. Not sure which is faster, as memcpy may have a fast path > > already. These are object arrays anyway, so there's plenty of overhead > > already, and I don't think this would affect regular numerical arrays. The response is clear: avoid memcpy() if you can. It is true that memcpy() performance has improved quite a lot in latest gcc (it has been quite good in Win versions since many years ago), but working with data in-place (i.e. avoiding a memory copy) is always faster (and most specially for large arrays that don't fit in cache processors). My own experiments says that, with an Intel Core2 processor the typical speed- ups for avoiding memcpy() are 2x. And I've read somewhere that both AMD and Intel are trying to make unaligned operations to go even faster in next architectures (the goal is that there should be no speed difference in accessing aligned or unaligned data). > I believe the memcpy approach is used for other unaligned parts of void > types. There is an inherent performance penalty there, but I don't see how > it can be avoided when using what are essentially packed structures. As to > memcpy, it's performance seems to depend on the compiler/compiler version, > old versions of gcc had *horrible* implementations of memcpy. I believe the > situation has since improved. However, I'm not sure we should be coding to > compiler issues unless it is unavoidable or the gain is huge. IMO, NumPy can be improved for unaligned data handling. For example, Numexpr is using this small snippet: from cpuinfo import cpu if cpu.is_AMD() or cpu.is_Intel(): is_cpu_amd_intel = True else: is_cpu_amd_intel = False for detecting AMD/Intel architectures and allowing the code to avoid memcpy() calls for the unaligned arrays. The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, which is distributed under NumPy, so it should not be too difficult to take advantage of this for avoiding unnecessary copies in this scenario. 
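The unaligned views in question are easy to construct and check from Python; for instance, with a packed record type:

import numpy as np

# packed record: a 1-byte field followed by an 8-byte float, so the float
# field sits at offset 1 and every element of the view is misaligned
r = np.zeros(1000, dtype='i1,f8')
r['f1'].flags.aligned                                 # False

# with align=True the float field is padded out to a natural boundary
ra = np.zeros(1000, dtype=np.dtype('i1,f8', align=True))
ra['f1'].flags.aligned                                # True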
-- Francesc Alted From pav+sp at iki.fi Fri Oct 16 07:53:45 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 16 Oct 2009 11:53:45 +0000 (UTC) Subject: [Numpy-discussion] object array alignment issues References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: [clip] > IMO, NumPy can be improved for unaligned data handling. For example, > Numexpr is using this small snippet: > > from cpuinfo import cpu > if cpu.is_AMD() or cpu.is_Intel(): > is_cpu_amd_intel = True > else: > is_cpu_amd_intel = False > > for detecting AMD/Intel architectures and allowing the code to avoid > memcpy() calls for the unaligned arrays. > > The above code uses the excellent ``cpuinfo.py`` module from Pearu > Peterson, which is distributed under NumPy, so it should not be too > difficult to take advantage of this for avoiding unnecessary copies in > this scenario. I suppose this kind of check is easiest to do at compile-time, and defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for those architectures for which they are not necessary. -- Pauli Virtanen From cournape at gmail.com Fri Oct 16 08:02:03 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 16 Oct 2009 21:02:03 +0900 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen wrote: > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > [clip] >> IMO, NumPy can be improved for unaligned data handling. ?For example, >> Numexpr is using this small snippet: >> >> from cpuinfo import cpu >> if cpu.is_AMD() or cpu.is_Intel(): >> ? ? is_cpu_amd_intel = True >> else: >> ? ? is_cpu_amd_intel = False >> >> for detecting AMD/Intel architectures and allowing the code to avoid >> memcpy() calls for the unaligned arrays. >> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu >> Peterson, which is distributed under NumPy, so it should not be too >> difficult to take advantage of this for avoiding unnecessary copies in >> this scenario. > > I suppose this kind of check is easiest to do at compile-time, and > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > those architectures for which they are not necessary. I wonder whether we could switch at runtime (import time) - it could be useful for testing. That being said, I agree that the cpu checks should be done at compile time - we had quite a few problems with cpuinfo in the past with new cpu/unhandled cpu, I think a compilation-based method is much more robust (and simpler) here. There are things where C is just much easier than python :) David From faltet at pytables.org Fri Oct 16 08:20:05 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 16 Oct 2009 14:20:05 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> References: <4AD75061.2020908@stsci.edu> <5b8d13220910160502x1827f6cdi777e22badcfa975f@mail.gmail.com> Message-ID: <200910161420.05552.faltet@pytables.org> A Friday 16 October 2009 14:02:03 David Cournapeau escrigu?: > On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen wrote: > > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > > [clip] > > > >> IMO, NumPy can be improved for unaligned data handling. 
?For example, > >> Numexpr is using this small snippet: > >> > >> from cpuinfo import cpu > >> if cpu.is_AMD() or cpu.is_Intel(): > >> ? ? is_cpu_amd_intel = True > >> else: > >> ? ? is_cpu_amd_intel = False > >> > >> for detecting AMD/Intel architectures and allowing the code to avoid > >> memcpy() calls for the unaligned arrays. > >> > >> The above code uses the excellent ``cpuinfo.py`` module from Pearu > >> Peterson, which is distributed under NumPy, so it should not be too > >> difficult to take advantage of this for avoiding unnecessary copies in > >> this scenario. > > > > I suppose this kind of check is easiest to do at compile-time, and > > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > > those architectures for which they are not necessary. > > I wonder whether we could switch at runtime (import time) - it could > be useful for testing. > > That being said, I agree that the cpu checks should be done at compile > time - we had quite a few problems with cpuinfo in the past with new > cpu/unhandled cpu, I think a compilation-based method is much more > robust (and simpler) here. There are things where C is just much > easier than python :) Agreed. I'm relaying in ``cpuinfo.py`` just because it provides what I need in an easy way. BTW, the detection of AMD/Intel (just the vendor) processors seems to work flawlessly for the platforms that I've checked (but I suppose that you are talking about other characteristics, like SSE version, etc). -- Francesc Alted From jsseabold at gmail.com Fri Oct 16 08:29:29 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 16 Oct 2009 08:29:29 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed In-Reply-To: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> References: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> Message-ID: On Thu, Oct 15, 2009 at 7:08 PM, Pierre GM wrote: > All, > Here's a first draft for the documentation of np.genfromtxt. > It took me longer than I thought, but that way I uncovered and fix some > bugs. > Please send me your comments/reviews/etc > I count especially on our documentation specialist to let me know where to > put it. > Thx in advance > P. > Great work! I am especially glad to see the better documentation on missing values, as I didn't fully understand how to do this. A few small comments and a small attached diff with a few nitpicking grammatical changes and some of what's proposed below. On the actual function, I am wondering if white space shouldn't be stripped by default, or at least if we have fixed width columns. I ran into a problem recently, where I was reading in a lot of strings that were in a fixed width format and my 4 gb of memory were soon consumed. I also can't think of a case where I'd ever care about leading or trailing white space. I always get confused going back and forth from zero-indexed to non zero-indexed, which might not be a good enough reason to worry about this, but it might be helpful to explicitly say that skip_header is not zero-indexed, though it doesn't raise an exception if you try. data = "junk1,junk2,junk3\n1.2,1.5,1" from StringIO import StringIO import numpy as np d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=0) In [5]: d Out[5]: array([[ NaN, NaN, NaN], [ 1.2, 1.5, 1. ]]) d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=1) In [7]: d Out[7]: array([ 1.2, 1.5, 1. ]) d = np.genfromtxt(StringIO(data), delimiter=",", skip_header=-1) In [9]: d Out[9]: array([[ NaN, NaN, NaN], [ 1.2, 1.5, 1. 
]]) Also, I don't know if this is even something that should be worried about in the io, but recarray names also can't start with a number to preserve attribute names look up, but I thought I would bring it up anyway, since I ran across this recently. data = "1var1,var2,var3\n1.2,1.5,1" d = np.recfromtxt(StringIO(data), dtype=float, delimiter=",", names=True) In [36]: d Out[36]: rec.array((1.2, 1.5, 1.0), dtype=[('1var1', '", line 1 d.1var1 ^ SyntaxError: invalid syntax In [38]: d.var2 Out[38]: array(1.5) In [39]: d['1var1'] Out[39]: array(1.2) I didn't know about being able to specify the dtype as a dict. That might be handy. Is there any way to cross-link to the dtype documentation in rst? I can't remember. That might be helpful to have. I never did figure out what the loose keyword did, but I guess it's not that important to me if I've never needed it. Cheers, Skipper -------------- next part -------------- 57c57 < By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the line is splitted along white-spaces (including tabs) and that consecutive white-spaces are considered as a single white-space. --- > By default, :func:`genfromtxt` assumes ``delimiter=None``, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space. 76c76 < By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or tailing white spaces. --- > By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or trailing white spaces. 129c129 < The values of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed. --- > The values of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed. Note that this is not zero-indexed so that the first line is 1. 147c147 < Acceptable values for the argument are a single integer or a sequence of integers corresponding to the indices of the columns to import. --- > An acceptable values for the argument is a single integer or a sequence of integers corresponding to the indices of the columns to import. 195c195 < This behavior may be changed by modifying the default mapper of the :class:`~numpi.lib._iotools.StringConverter` class --- > This behavior may be changed by modifying the default mapper of the :class:`~numpy.lib._iotools.StringConverter` class 343c343 < .. However, user-defined converters may rapidly become cumbersome to manage when --- > .. However, user-defined converters may rapidly become cumbersome to manage. 389c389 < Each key can be a column index or a column name, and the corresponding value should eb a single object. --- > Each key can be a column index or a column name, and the corresponding value should be a single object. From mdroe at stsci.edu Fri Oct 16 08:31:08 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 16 Oct 2009 08:31:08 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <4AD8678C.2030502@stsci.edu> On 10/16/2009 07:53 AM, Pauli Virtanen wrote: > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote: > [clip] > >> IMO, NumPy can be improved for unaligned data handling. 
For example, >> Numexpr is using this small snippet: >> >> from cpuinfo import cpu >> if cpu.is_AMD() or cpu.is_Intel(): >> is_cpu_amd_intel = True >> else: >> is_cpu_amd_intel = False >> >> for detecting AMD/Intel architectures and allowing the code to avoid >> memcpy() calls for the unaligned arrays. >> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu >> Peterson, which is distributed under NumPy, so it should not be too >> difficult to take advantage of this for avoiding unnecessary copies in >> this scenario. >> > I suppose this kind of check is easiest to do at compile-time, and > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for > those architectures for which they are not necessary. > > That's close to the solution I'm arriving at. I'm thinking of adding a macro "DEREF_UNALIGNED_PYOBJECT_PTR" which would do the right thing depending on the type of architecture. There should be no impact on architectures that handle unaligned pointers, and slightly slower (but correct) performance on other architectures. Mike From sturla at molden.no Fri Oct 16 12:05:05 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 16 Oct 2009 18:05:05 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <200910161207.11024.faltet@pytables.org> References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> Message-ID: <4AD899B1.7010806@molden.no> Francesc Alted skrev: > The response is clear: avoid memcpy() if you can. It is true that memcpy() > performance has improved quite a lot in latest gcc (it has been quite good in > Win versions since many years ago), but working with data in-place (i.e. > avoiding a memory copy) is always faster (and most specially for large arrays > that don't fit in cache processors). > > My own experiments says that, with an Intel Core2 processor the typical speed- > ups for avoiding memcpy() are 2x. If the underlying array is strided, I have seen the opposite as well. "Copy-in copy-out" is a common optimization used by Fortran compilers when working with strided arrays. The catch is that the work array has to fit in cache for this to make any sence. Anyhow, you cannot use memcpy for this kind of optimization - it assumes both buffers are contiguous. But working with arrays directly instead of copies is not always the faster option. S.M. > And I've read somewhere that both AMD and > Intel are trying to make unaligned operations to go even faster in next > architectures (the goal is that there should be no speed difference in > accessing aligned or unaligned data). > > >> I believe the memcpy approach is used for other unaligned parts of void >> types. There is an inherent performance penalty there, but I don't see how >> it can be avoided when using what are essentially packed structures. As to >> memcpy, it's performance seems to depend on the compiler/compiler version, >> old versions of gcc had *horrible* implementations of memcpy. I believe the >> situation has since improved. However, I'm not sure we should be coding to >> compiler issues unless it is unavoidable or the gain is huge. >> > > IMO, NumPy can be improved for unaligned data handling. For example, Numexpr > is using this small snippet: > > from cpuinfo import cpu > if cpu.is_AMD() or cpu.is_Intel(): > is_cpu_amd_intel = True > else: > is_cpu_amd_intel = False > > for detecting AMD/Intel architectures and allowing the code to avoid memcpy() > calls for the unaligned arrays. 
> > The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, > which is distributed under NumPy, so it should not be too difficult to take > advantage of this for avoiding unnecessary copies in this scenario. > > From pgmdevlist at gmail.com Fri Oct 16 17:36:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 16 Oct 2009 17:36:48 -0400 Subject: [Numpy-discussion] genfromtxt documentation : review needed In-Reply-To: References: <31298ED8-7170-41B7-958E-F6E867DAA317@gmail.com> Message-ID: <744355DE-54EE-4AB3-91C4-6A10890A5637@gmail.com> On Oct 16, 2009, at 8:29 AM, Skipper Seabold wrote: > Great work! I am especially glad to see the better documentation on > missing values, as I didn't fully understand how to do this. A few > small comments and a small attached diff with a few nitpicking > grammatical changes and some of what's proposed below. Thanks. I took your modifications into account. > On the actual function, I am wondering if white space shouldn't be > stripped by default, or at least if we have fixed width columns. Well, I'd do the opposite: `autostrip=False` if we work with fixed- length delimiters, `autostrip=True` if we work with character delimiters. > I also can't think of a case where I'd ever care about > leading or trailing white space. having `autostrip=False` when dealing with spaces as delimiters is a feature that was explicitly requested a while ago, when I started working on the function. > I always get confused going back and forth from zero-indexed to non > zero-indexed, which might not be a good enough reason to worry about > this, but it might be helpful to explicitly say that skip_header is > not zero-indexed, though it doesn't raise an exception if you try. Took your comment into account, but I did state that `skip_header` expects a number of lines, not a line index. > Also, I don't know if this is even something that should be worried > about in the io, but recarray names also can't start with a number to > preserve attribute names look up, but I thought I would bring it up > anyway, since I ran across this recently. Good point. I'll patch NameValidator for that. > I didn't know about being able to specify the dtype as a dict. That > might be handy. Is there any way to cross-link to the dtype > documentation in rst? I can't remember. That might be helpful to > have. Hence my call to the doc specialists. > I never did figure out what the loose keyword did, but I guess it's > not that important to me if I've never needed it. Oh yes, this one. Well, a StringConverter can either returns the default if it can't convert the string (loose=True) or raise an exception if it can't convert the string and the string is not part of the missing_values list of this StringConverter (loose=False). I need to add a couple of examples here. From numpy at mspacek.mm.st Fri Oct 16 18:01:54 2009 From: numpy at mspacek.mm.st (Martin Spacek) Date: Fri, 16 Oct 2009 22:01:54 +0000 (UTC) Subject: [Numpy-discussion] intersect1d for N input arrays References: <4AD8272C.601@ntc.zcu.cz> Message-ID: Robert Cimrman ntc.zcu.cz> writes: > > Hi Martin, > > thanks for your ideas and contribution. > > A few notes: I would let intersect1d as it is, and created a new function with another name for that (any > proposals?). 
Considering that most of arraysetops functions are based on sort, and in particular here > that an intersection array is (usually) smaller than each of the input arrays, it might be better just to > call intersect1d repeatedly for each array and the result of the previous call, accumulating the intersection. > > r. Hi Robert, Yeah, I suppose sorting will get progressively slower the more input arrays there are, and the longer each one gets. There's probably some crossover point where the cost of doing a Python loop over the input arrays to accumulate the intersection is less than the cost of doing a big sort. That would take some benchmarking... I forgot to handle the cases where the number of arrays passed is 0 or 1. Here's an updated version: def intersect1d(arrays, assume_unique=False): """Find the intersection of any number of 1D arrays. Return the sorted, unique values that are in all of the input arrays. Adapted from numpy.lib.arraysetops.intersect1d""" N = len(arrays) if N == 0: return np.asarray(arrays) arrays = list(arrays) # allow assignment if not assume_unique: for i, arr in enumerate(arrays): arrays[i] = np.unique(arr) aux = np.concatenate(arrays) # one long 1D array aux.sort() # sorted if N == 1: return aux shift = N-1 return aux[aux[shift:] == aux[:-shift]] From oliphant at enthought.com Fri Oct 16 23:35:13 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 16 Oct 2009 22:35:13 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD75061.2020908@stsci.edu> References: <4AD75061.2020908@stsci.edu> Message-ID: On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > I recently committed a regression test and bugfix for object > pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object > pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. Thanks for this bug report. It would be very helpful if you could provide the line number where the code is giving a bus error and explain why you think the code in question does not handle misaligned data (it still seems like it should to me --- but perhaps I must be missing something --- I don't have a Solaris box to test on). Perhaps, the real problem is elsewhere (such as other places where the mistake of forgetting about striding needing to be aligned also before pursuing the fast alignment path that you pointed out in another place of code). This was the thinking for why the code (that I think is in question) should handle mis-aligned data: 1) pointers that are not aligned to the correct size need to be copied to an aligned memory area before being de-referenced. 2) static variables defined in a function will be aligned by the C compiler. So, what the code in refcnt.c does is to copy the value in the NumPy data-area (i.e. 
pointed to by it->dataptr) to another memory location (the stack variable temp), dereference it and then increment it's reference count. 196: temp = (PyObject **)it->dataptr; 197: Py_XINCREF(*temp); I'm puzzled why this should fail. The stack trace showing where this fails would be very useful in figuring out what to fix. This is all independent of defining a variable to decide whether or not to even care about worrying about un-aligned data (which we could avoid worrying about on Intel and AMD). I'm all in favor of such a flag if it would speed up code, but I don't see it as the central issue here. Any more details about the bug you have found would be greatly appreciated. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 17 00:25:04 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Oct 2009 22:25:04 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: On Fri, Oct 16, 2009 at 9:35 PM, Travis Oliphant wrote: > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > I recently committed a regression test and bugfix for object pointers in > record arrays of unaligned size (meaning where each record is not a > multiple of sizeof(PyObject **)). > > For example: > > a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > a2 = np.zeros((10,), 'S10') > # This copying would segfault > a1['o'] = a2 > > http://projects.scipy.org/numpy/ticket/1198 > > Unfortunately, this unit test has opened up a whole hornet's nest of > alignment issues on Solaris. The various reference counting functions > (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > for instance. Interestingly, there are comments in there saying > "handles misaligned data" (eg. line 190), but in fact it doesn't, and > doesn't look to me like it would. But I won't rule out a mistake in > building it on my part. > > > Thanks for this bug report. It would be very helpful if you could > provide the line number where the code is giving a bus error and explain why > you think the code in question does not handle misaligned data (it still > seems like it should to me --- but perhaps I must be missing something --- I > don't have a Solaris box to test on). Perhaps, the real problem is > elsewhere (such as other places where the mistake of forgetting about > striding needing to be aligned also before pursuing the fast alignment path > that you pointed out in another place of code). > > This was the thinking for why the code (that I think is in question) should > handle mis-aligned data: > > 1) pointers that are not aligned to the correct size need to be copied to > an aligned memory area before being de-referenced. > 2) static variables defined in a function will be aligned by the C > compiler. > > So, what the code in refcnt.c does is to copy the value in the NumPy > data-area (i.e. pointed to by it->dataptr) to another memory location (the > stack variable temp), dereference it and then increment it's reference > count. > > 196: temp = (PyObject **)it->dataptr; > 197: Py_XINCREF(*temp); > Doesn't it->dataptr need to be copied to temp, not just assigned? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at pytables.org Sat Oct 17 07:20:38 2009 From: faltet at pytables.org (Francesc Alted) Date: Sat, 17 Oct 2009 13:20:38 +0200 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4AD899B1.7010806@molden.no> References: <4AD75061.2020908@stsci.edu> <200910161207.11024.faltet@pytables.org> <4AD899B1.7010806@molden.no> Message-ID: <200910171320.38833.faltet@pytables.org> A Friday 16 October 2009 18:05:05 Sturla Molden escrigu?: > Francesc Alted skrev: > > The response is clear: avoid memcpy() if you can. It is true that > > memcpy() performance has improved quite a lot in latest gcc (it has been > > quite good in Win versions since many years ago), but working with data > > in-place (i.e. avoiding a memory copy) is always faster (and most > > specially for large arrays that don't fit in cache processors). > > > > My own experiments says that, with an Intel Core2 processor the typical > > speed- ups for avoiding memcpy() are 2x. > > If the underlying array is strided, I have seen the opposite as well. > "Copy-in copy-out" is a common optimization used by Fortran compilers > when working with strided arrays. The catch is that the work array has > to fit in cache for this to make any sence. Anyhow, you cannot use > memcpy for this kind of optimization - it assumes both buffers are > contiguous. But working with arrays directly instead of copies is not > always the faster option. Mmh, don't know about Fortran (too many years without programming it), but in C it seems evident that performing a memcpy() is always slower, at least with modern CPUs (like the Intel Core2 that I'm using now): In [43]: import numpy as np In [44]: import numexpr as ne In [45]: r = np.zeros(1e6, 'i1,i4,f8') In [46]: f1, f2 = r['f1'], r['f2'] In [47]: f1.flags.aligned, f2.flags.aligned Out[47]: (False, False) In [48]: timeit f1*f2 # NumPy do copies before carrying out operations 100 loops, best of 3: 14.6 ms per loop In [49]: timeit ne.evaluate('f1*f2') # numexpr uses plain unaligned access 100 loops, best of 3: 5.77 ms per loop # 2.5x faster than numpy Using strides, the result is similar: In [50]: f1, f2 = r['f1'][::2], r['f2'][::2] # check with strides In [51]: f1.flags.aligned, f2.flags.aligned Out[51]: (False, False) In [52]: timeit f1*f2 100 loops, best of 3: 7.52 ms per loop In [53]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 3.96 ms per loop # 1.9x faster than numpy And, when using large strides so that the resulting arrays fit in cache: In [54]: f1, f2 = r['f1'][::10], r['f2'][::10] # big stride (fits in cache) In [55]: timeit f1*f2 100 loops, best of 3: 3.51 ms per loop In [56]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 2.61 ms per loop # 34% faster than numpy Which, although not much, still gives an advantage to the direct approach. So, at least in C, operating with unaligned data on (modern) AMD/Intel processors seems to be fastest (at least in this quick-and-dirty benchmark). In fact, performance is very close to contiguous and aligned data: In [58]: f1, f2 = r['f1'].copy(), r['f2'].copy() # aligned and contiguous In [59]: timeit f1*f2 100 loops, best of 3: 5.2 ms per loop In [60]: timeit ne.evaluate('f1*f2') 100 loops, best of 3: 4.74 ms per loop so 5.77 ms (unaligned data, In [49]) is not very far from 4.74 ms (aligned data, In [60]) and close to 'optimal' numpy performance (5.2 ms, In [59]). And, as I said before, the plans of AMD/Intel are to reduce this gap still further. 
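A self-contained sketch of the same aligned-vs-unaligned comparison, using
plain timeit instead of the IPython magic (the exact figures will of course
vary with the machine, the compiler and the NumPy build):

import numpy as np
import timeit

r = np.zeros(1000000, dtype='i1,i4,f8')
f1, f2 = r['f1'], r['f2']            # unaligned views into the records
g1, g2 = f1.copy(), f2.copy()        # aligned, contiguous copies

t_unaligned = timeit.timeit(lambda: f1 * f2, number=100)
t_aligned = timeit.timeit(lambda: g1 * g2, number=100)
print("unaligned: %.4f s   aligned: %.4f s" % (t_unaligned, t_aligned))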
For unaligned arrays that fits in cache the results are even more dramatic: In [61]: r = np.zeros(1e5, 'i1,i4,f8') In [62]: f1, f2 = r['f1'], r['f2'] In [63]: timeit f1*f2 1000 loops, best of 3: 1.37 ms per loop In [64]: timeit ne.evaluate('f1*f2') 1000 loops, best of 3: 293 ?s per loop # 4.7x speedup but not sure why... Cheers, -- Francesc Alted From dsdale24 at gmail.com Sat Oct 17 08:49:11 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sat, 17 Oct 2009 08:49:11 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic Message-ID: numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example is q1 = Quantity(1, 'meter') q2 = Quantity(2, 'meters') numpy.add(q1, q2) # yields Quantity(3, 'meters') At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after output array has been established (and its data is still uninitialized). Consider: q1 = Quantity(1, 'meter') q2 = Quantity(2, 'J') numpy.add(q1, q2, q1) # or equivalently: # q1 += q2 With only __array_wrap__, the attempt to propagate the units happens after q1's data was updated in place, too late to raise an error, the data is now corrupted. __array_prepare__ solves that problem, an exception can be raised in time. Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example: q1 = Quantity(1, 'meter') q2 = Quantity(2, 'feet') numpy.add(q1, q2) In this case, I'd like an opportunity to operate on the input arrays on the way in to the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities, "q1 + q2", by overriding the __add__ method to rescale the second input, but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__. My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise, I'd like to try to try to get it in in time for numpy-1.4. (Has a timeline been established?) I think it will be not too difficult to document this overall scheme: When calling numpy functions: 1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place) 2) the output array is established 3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds 4) the ufunc performs its operations 5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation Comments, criticisms? If PEP 3124^ were already a part of the standard library, that could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes. 
Darren ^ http://www.python.org/dev/peps/pep-3124/ From berthe.loic at gmail.com Sat Oct 17 11:13:57 2009 From: berthe.loic at gmail.com (=?ISO-8859-1?Q?Lo=EFc_BERTHE?=) Date: Sat, 17 Oct 2009 17:13:57 +0200 Subject: [Numpy-discussion] Subclassing record array Message-ID: Hi, I would like to create my own class of record array to deal with units. Here is the code I used, inspired from http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array : [code] from numpy import * class BlocArray(rec.ndarray): """ Recarray with units and pretty print """ fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} def __new__(cls, data, titles=None, units=None): # guess format for each column data2 = [] for line in zip(*data) : try : data2.append(cast[int](line)) # integers except ValueError : try : data2.append(cast[float](line)) # reals except ValueError : data2.append(cast[str](line)) # characters # create the array dt = dtype(zip(titres, [line.dtype for line in data2])) obj = rec.array(data2, dtype=dt).view(cls) # add custom attributes obj.units = units or [] obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' obj._head = "%10s "*len(dt.names) % dt.names +'\n' obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in units) +'\n' # Finally, we must return the newly created object: return obj titles = ['Name', 'Nb', 'Price'] units = ['/', '/', 'Eur'] data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] bloc = BlocArray(data, titles=titles, units=units) In [544]: bloc Out[544]: Name Nb Price (/) (/) (Eur) fish 1 12.25 egg 6 0.85 TV 1 125 [/code] It's almost working, but I have some isues : - I can't access data through indexing In [563]: bloc['Price'] /home/loic/Python/numpy/test.py in ((r,)) 50 51 def __repr__(self): ---> 52 return self._head + ''.join(self._fmt % tuple(r) for r in self) TypeError: 'numpy.float64' object is not iterable So I think that overloading the __repr__ method is not that easy - I can't access data through attributes now : In [564]: bloc.Nb AttributeError: 'BlocArray' object has no attribute 'Nb' - I can't use 'T' as field in theses array as the T method is already here as a shortcut for transpose Have you any hints to make this work ? -- LB From perfreem at gmail.com Sat Oct 17 11:36:26 2009 From: perfreem at gmail.com (per freem) Date: Sat, 17 Oct 2009 11:36:26 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) Message-ID: hi all, in my code, i use the function 'logsumexp' from scipy.maxentropy a lot. as far as i can tell, this function has no vectorized version that works on an m-x-n matrix. i might be doing something wrong here, but i found that this function can run extremely slowly if used as follows: i have an array of log probability vectors, such that each column sums to one. i want to simply iterate over each column and renormalize it, using exp(col - logsumexp(col)). here is the code that i used to profile this operation: from scipy import * from numpy import * from numpy.random.mtrand import dirichlet from scipy.maxentropy import logsumexp import time # build an array of probability vectors. each column represents a probability vector. 
num_vectors = 1000000 log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors))) # now renormalize each column, using logsumexp norm_prob_vectors = [] t1 = time.time() for n in range(num_vectors): norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n])) norm_prob_vectors.append(norm_p) t2 = time.time() norm_prob_vectors = array(norm_prob_vectors) print "logsumexp renormalization (%d many times) took %s seconds." %(num_vectors, str(t2-t1)) i found that even with only 100,000 elements, this code takes about 5 seconds: logsumexp renormalization (100000 many times) took 5.07085394859 seconds. with 1 million elements, it becomes prohibitively slow: logsumexp renormalization (1000000 many times) took 70.7815010548 seconds. is there a way to speed this up? most vectorized operations that work on matrices in numpy/scipy are incredibly fast and it seems like a vectorized version of logsumexp should be near instant on this scale. is there a way to rewrite the above snippet so that it's faster? thanks very much for your help. From kwgoodman at gmail.com Sat Oct 17 11:48:37 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 17 Oct 2009 08:48:37 -0700 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 8:36 AM, per freem wrote: > hi all, > > in my code, i use the function 'logsumexp' from scipy.maxentropy a > lot. as far as i can tell, this function has no vectorized version > that works on an m-x-n matrix. i might be doing something wrong here, > but i found that this function can run extremely slowly if used as > follows: i have an array of log probability vectors, such that each > column sums to one. i want to simply iterate over each column and > renormalize it, using exp(col - logsumexp(col)). here is the code that > i used to profile this operation: > > from scipy import * > from numpy import * > from numpy.random.mtrand import dirichlet > from scipy.maxentropy import logsumexp > import time > > # build an array of probability vectors. ?each column represents a > probability vector. > num_vectors = 1000000 > log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors))) > # now renormalize each column, using logsumexp > norm_prob_vectors = [] > t1 = time.time() > for n in range(num_vectors): > ? ?norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n])) > ? ?norm_prob_vectors.append(norm_p) > t2 = time.time() > norm_prob_vectors = array(norm_prob_vectors) > print "logsumexp renormalization (%d many times) took %s seconds." > %(num_vectors, str(t2-t1)) > > i found that even with only 100,000 elements, this code takes about 5 seconds: > > logsumexp renormalization (100000 many times) took 5.07085394859 seconds. > > with 1 million elements, it becomes prohibitively slow: > > logsumexp renormalization (1000000 many times) took 70.7815010548 seconds. > > is there a way to speed this up? most vectorized operations that work > on matrices in numpy/scipy are incredibly fast and it seems like a > vectorized version of logsumexp should be near instant on this scale. > is there a way to rewrite the above snippet so that it's faster? > > thanks very much for your help. Here's logsumexp from scipy: def logsumexp(a): a = asarray(a) a_max = a.max() return a_max + log((exp(a-a_max)).sum()) Would this work: def logsumexp2(a): a = asarray(a) a_max = a.max(axis=0) return a_max + log((exp(a-a_max)).sum(axis=0)) ? 
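A quick sketch suggests it does work for the 2-D, reduce-over-axis-0 case in
the original post, and that it agrees with np.logaddexp.reduce along the same
axis (the names below are purely illustrative):

import numpy as np

def logsumexp_cols(a):
    # log(sum(exp(a), axis=0)) per column, keeping the max out of the exp
    # so that nothing overflows
    a = np.asarray(a)
    a_max = a.max(axis=0)
    return a_max + np.log(np.exp(a - a_max).sum(axis=0))

x = np.log(np.random.dirichlet([1, 1, 1], 1000).T)   # shape (3, 1000)
looped = np.array([logsumexp_cols(x[:, n:n+1])[0] for n in range(x.shape[1])])
assert np.allclose(logsumexp_cols(x), looped)
assert np.allclose(logsumexp_cols(x), np.logaddexp.reduce(x, axis=0))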
From charlesr.harris at gmail.com Sat Oct 17 13:20:50 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 11:20:50 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: > hi all, > > in my code, i use the function 'logsumexp' from scipy.maxentropy a > lot. as far as i can tell, this function has no vectorized version > that works on an m-x-n matrix. i might be doing something wrong here, > but i found that this function can run extremely slowly if used as > follows: i have an array of log probability vectors, such that each > column sums to one. i want to simply iterate over each column and > renormalize it, using exp(col - logsumexp(col)). here is the code that > i used to profile this operation: > > from scipy import * > from numpy import * > from numpy.random.mtrand import dirichlet > from scipy.maxentropy import logsumexp > import time > > Why aren't you using logaddexp ufunc from numpy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 13:54:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 13:54:52 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: Message-ID: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: >> >> hi all, >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> lot. as far as i can tell, this function has no vectorized version >> that works on an m-x-n matrix. i might be doing something wrong here, >> but i found that this function can run extremely slowly if used as >> follows: i have an array of log probability vectors, such that each >> column sums to one. i want to simply iterate over each column and >> renormalize it, using exp(col - logsumexp(col)). here is the code that >> i used to profile this operation: >> >> from scipy import * >> from numpy import * >> from numpy.random.mtrand import dirichlet >> from scipy.maxentropy import logsumexp >> import time >> > > Why aren't you using logaddexp ufunc from numpy? Maybe because it is difficult to find, it doesn't have its own docs entry. e.g. no link to logaddexp in http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations I have no idea, why it is different from the other ufuncs in the docs (and help file). It shows up correctly in the docs editor, but not in the numpy 1.3 and online docs. Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Sat Oct 17 14:02:37 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:02:37 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 11:54 AM, wrote: > On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris > wrote: > > > > > > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: > >> > >> hi all, > >> > >> in my code, i use the function 'logsumexp' from scipy.maxentropy a > >> lot. 
as far as i can tell, this function has no vectorized version > >> that works on an m-x-n matrix. i might be doing something wrong here, > >> but i found that this function can run extremely slowly if used as > >> follows: i have an array of log probability vectors, such that each > >> column sums to one. i want to simply iterate over each column and > >> renormalize it, using exp(col - logsumexp(col)). here is the code that > >> i used to profile this operation: > >> > >> from scipy import * > >> from numpy import * > >> from numpy.random.mtrand import dirichlet > >> from scipy.maxentropy import logsumexp > >> import time > >> > > > > Why aren't you using logaddexp ufunc from numpy? > > Maybe because it is difficult to find, it doesn't have its own docs entry. > > e.g. no link to logaddexp in > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations > > I have no idea, why it is different from the other ufuncs in the docs > (and help file). > It shows up correctly in the docs editor, but not in the numpy 1.3 and > online docs. > > That's curious, none of the five ufuncs added in 1.3 have links even though they all have documentation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.ginsburg at colorado.edu Sat Oct 17 14:08:18 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 12:08:18 -0600 Subject: [Numpy-discussion] double-precision sqrt? Message-ID: Hi folks, I'm trying to write a ray-tracing code for which high precision is required. I also "need" to use square roots. However, math.sqrt and numpy.sqrt seem to only use single-precision floats. Is there a simple way to make sqrt use higher precision? Alternately, am I simply being obtuse? Thanks, Adam Example code: from scipy.optimize.minpack import fsolve from numpy import sqrt sqrt(float64(1.034324523462345)) # 1.0170174646791199 f=lambda x: x**2-float64(1.034324523462345)**2 f(sqrt(float64(1.034324523462345))) # -0.03550269637326231 fsolve(f,1.01) # 1.0343245234623459 f(fsolve(f,1.01)) # 1.7763568394002505e-15 fsolve(f,1.01) - sqrt(float64(1.034324523462345)) # 0.017307058783226026 From adam.ginsburg at colorado.edu Sat Oct 17 14:17:29 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 12:17:29 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: My code is actually wrong.... but I still have the problem I've identified that sqrt is leading to precision errors. Sorry about the earlier mistake. Adam On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg wrote: > > sqrt(float64(1.034324523462345)) > # 1.0170174646791199 > f=lambda x: x**2-float64(1.034324523462345)**2 should be f=lambda x: x**2-float64(1.034324523462345) so the code I sent was not a legitimate test. From dagss at student.matnat.uio.no Sat Oct 17 14:25:01 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 17 Oct 2009 20:25:01 +0200 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: <4ADA0BFD.2090603@student.matnat.uio.no> Adam Ginsburg wrote: > Hi folks, > I'm trying to write a ray-tracing code for which high precision is > required. I also "need" to use square roots. However, math.sqrt and > numpy.sqrt seem to only use single-precision floats. Is there a > simple way to make sqrt use higher precision? Alternately, am I > simply being obtuse? How are you actually using the results of sqrt? 
When printing the results you may not get the full precision...try e.g. print "%.50f" % np.sqrt(np.float64( 1.034324523462345)) -- Dag Sverre From nadavh at visionsense.com Sat Oct 17 14:27:02 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 17 Oct 2009 20:27:02 +0200 Subject: [Numpy-discussion] double-precision sqrt? References: Message-ID: <710F2847B0018641891D9A21602763605AD1C8@ex3.envision.co.il> The default precision is double unless yue specify otherwise (float32 or long double (float128 or float96)) You can see this from: f(fsolve(f,1.01)) # 1.7763568394002505e-15 The last line should be: >>> fsolve(f,1.01) - float64(1.034324523462345) 8.8817841970012523e-16 Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? Adam Ginsburg ????: ? 17-???????-09 20:08 ??: numpy-discussion at scipy.org ????: [Numpy-discussion] double-precision sqrt? Hi folks, I'm trying to write a ray-tracing code for which high precision is required. I also "need" to use square roots. However, math.sqrt and numpy.sqrt seem to only use single-precision floats. Is there a simple way to make sqrt use higher precision? Alternately, am I simply being obtuse? Thanks, Adam Example code: from scipy.optimize.minpack import fsolve from numpy import sqrt sqrt(float64(1.034324523462345)) # 1.0170174646791199 f=lambda x: x**2-float64(1.034324523462345)**2 f(sqrt(float64(1.034324523462345))) # -0.03550269637326231 fsolve(f,1.01) # 1.0343245234623459 f(fsolve(f,1.01)) # 1.7763568394002505e-15 fsolve(f,1.01) - sqrt(float64(1.034324523462345)) # 0.017307058783226026 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3358 bytes Desc: not available URL: From charlesr.harris at gmail.com Sat Oct 17 14:31:14 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:31:14 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg wrote: > Hi folks, > I'm trying to write a ray-tracing code for which high precision is > required. I also "need" to use square roots. However, math.sqrt and > numpy.sqrt seem to only use single-precision floats. Is there a > simple way to make sqrt use higher precision? Alternately, am I > simply being obtuse? > > Thanks, > Adam > > Example code: > from scipy.optimize.minpack import fsolve > from numpy import sqrt > > sqrt(float64(1.034324523462345)) > # 1.0170174646791199 > f=lambda x: x**2-float64(1.034324523462345)**2 > > f(sqrt(float64(1.034324523462345))) > # -0.03550269637326231 > > fsolve(f,1.01) > # 1.0343245234623459 > > f(fsolve(f,1.01)) > # 1.7763568394002505e-15 > > fsolve(f,1.01) - sqrt(float64(1.034324523462345)) > # 0.017307058783226026 > ____ The routines *are* in double precision, but why are you using fsolve? optimize.zeros.brentq would probably be a better choice. Also, you are using differences with squared terms that will lose you precision. The last time I wrote a ray tracing package was 30 years ago, but I think much will depend on how you represent the curved surfaces. IIRC, there were also a lot of quadratic equations to solve, and using the correct formula for the root you want (the usual formula is poor for at least one of the roots) will also make a difference. 
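For reference, a generic sketch of the cancellation-free way to get both roots
of a*x**2 + b*x + c = 0 (real roots assumed; this is only an illustration, not
code from any particular ray tracer):

import numpy as np

def quadratic_roots(a, b, c):
    # compute the root where b and the discriminant add constructively,
    # then recover the other root from the product of the roots, c/a
    d = np.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + np.copysign(d, b))
    return q / a, c / q

# e.g. x**2 - 1e8*x + 1 = 0: the textbook (-b - sqrt(b*b - 4*a*c))/(2*a)
# loses the small root to cancellation, while c/q keeps it
print(quadratic_roots(1.0, -1e8, 1.0))   # roughly (1e8, 1e-8)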
In other words, you probably need to take a lot of care with how you set up the problem in order to minimize roundoff error. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Sat Oct 17 14:37:18 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sat, 17 Oct 2009 14:37:18 -0400 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: 2009/10/17 Adam Ginsburg : > My code is actually wrong.... but I still have the problem I've > identified that sqrt is leading to precision errors. ?Sorry about the > earlier mistake. I think you'll find that numpy's sqrt is as good as it gets for double precision. You can try using numpy's float96 type, which at least on my machine, does give sa few more significant figures. If you really, really need accuracy, there are arbitrary-precision packages for python, which you could try. But I think you may find that your problem is not solved by higher precision. Something about ray-tracing just leads it to ferret out every little limitation of floating-point computation. For example, you can easily get "surface acne" when shooting shadow rays, where a ray shot from a surface to a light source accidentally intersects that same surface for some pixels but not for others. You can try to fix it by requiring some minimum intersection distance, but then you'll find lots of weird little quirks where your minimum distance causes problems. A better solution is one which takes into account the awkwardness of floating-point; for this particular case one trick is to mark the object you're shooting rays from as not a candidate for intersection. (This doesn't work, of course, if the object can cast shadows on itself...) I have even seen people advocate for using interval analysis inside ray tracers, to avoid this kind of problem. Anne From ndbecker2 at gmail.com Sat Oct 17 14:40:16 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 17 Oct 2009 14:40:16 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: Somewhat offtopic, but is there a generalization of the logsumexp shortcut to more than 2 variables? IIRC, it's this for 2 variables: log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) From charlesr.harris at gmail.com Sat Oct 17 14:59:41 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 12:59:41 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 12:40 PM, Neal Becker wrote: > Somewhat offtopic, but is there a generalization of the logsumexp shortcut > to more than 2 variables? > > IIRC, it's this for 2 variables: > log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) > > logaddexp.reduce will apply it along array rows. The reduce loop could probably be optimized a bit using the methods that Dale used to optimize the reduce case for add. Hmm, the reduce loop would need to be implemented for the generic loops. The logaddexp case could possibly be optimized further by writing a specialized loop for the reduce case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Oct 17 15:02:09 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 13:02:09 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 12:59 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, Oct 17, 2009 at 12:40 PM, Neal Becker wrote: > >> Somewhat offtopic, but is there a generalization of the logsumexp shortcut >> to more than 2 variables? >> >> IIRC, it's this for 2 variables: >> log (exp (a) + exp (b)) = max (a,b) + log (1 + exp (-abs (a-b))) >> >> > logaddexp.reduce will apply it along array rows. The reduce loop could > probably be optimized a bit using the methods that Dale used to optimize the > reduce case for add. Hmm, the reduce loop would need to be implemented for > the generic loops. The logaddexp case could possibly be optimized further by > writing a specialized loop for the reduce case. > > Example: In [1]: x = arange(9).reshape(3,3) In [2]: logaddexp.reduce(x, axis=1) Out[2]: array([ 2.40760596, 5.40760596, 8.40760596]) In [3]: logaddexp.reduce(x, axis=0) Out[3]: array([ 6.05094576, 7.05094576, 8.05094576]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.ginsburg at colorado.edu Sat Oct 17 18:03:08 2009 From: adam.ginsburg at colorado.edu (Adam Ginsburg) Date: Sat, 17 Oct 2009 16:03:08 -0600 Subject: [Numpy-discussion] double-precision sqrt? In-Reply-To: References: Message-ID: Hi again, I apologize, the mistake was entirely my own. Sqrt's do the right thing.... Adam On Sat, Oct 17, 2009 at 12:17 PM, Adam Ginsburg wrote: > My code is actually wrong.... but I still have the problem I've > identified that sqrt is leading to precision errors. ?Sorry about the > earlier mistake. > > Adam > > On Sat, Oct 17, 2009 at 12:08 PM, Adam Ginsburg > wrote: >> >> sqrt(float64(1.034324523462345)) >> # 1.0170174646791199 >> f=lambda x: x**2-float64(1.034324523462345)**2 > > should be > f=lambda x: x**2-float64(1.034324523462345) > > so the code I sent was not a legitimate test. > From charlesr.harris at gmail.com Sat Oct 17 18:45:40 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 16:45:40 -0600 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. 
__array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) > > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. > > This sounds interesting to me, as it would push the use of array wrap down into a common function and make it easier to use. I wonder what the impact would be on the current subclasses of ndarray? On a side note, I wonder if you could look into adding your reduce loop optimizations into the generic loops? It would be interesting to see if that speeded up some common operations. In any case, it can't hurt. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 19:27:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 19:27:55 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> Message-ID: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem wrote: >> >> >> >> hi all, >> >> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> >> lot. as far as i can tell, this function has no vectorized version >> >> that works on an m-x-n matrix. 
i might be doing something wrong here, >> >> but i found that this function can run extremely slowly if used as >> >> follows: i have an array of log probability vectors, such that each >> >> column sums to one. i want to simply iterate over each column and >> >> renormalize it, using exp(col - logsumexp(col)). here is the code that >> >> i used to profile this operation: >> >> >> >> from scipy import * >> >> from numpy import * >> >> from numpy.random.mtrand import dirichlet >> >> from scipy.maxentropy import logsumexp >> >> import time >> >> >> > >> > Why aren't you using logaddexp ufunc from numpy? >> >> Maybe because it is difficult to find, it doesn't have its own docs entry. >> >> e.g. no link to logaddexp in >> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >> >> I have no idea, why it is different from the other ufuncs in the docs >> (and help file). >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >> online docs. >> > > That's curious, none of the five ufuncs added in 1.3 have links even though > they all have documentation. I found that they are missing from routines.math http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ I added logaddexp, logaddexp2 and exp2 What else? Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Sat Oct 17 19:46:05 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 17:46:05 -0600 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: On Sat, Oct 17, 2009 at 5:27 PM, wrote: > On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris > wrote: > > > > > > On Sat, Oct 17, 2009 at 11:54 AM, wrote: > >> > >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem > wrote: > >> >> > >> >> hi all, > >> >> > >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a > >> >> lot. as far as i can tell, this function has no vectorized version > >> >> that works on an m-x-n matrix. i might be doing something wrong here, > >> >> but i found that this function can run extremely slowly if used as > >> >> follows: i have an array of log probability vectors, such that each > >> >> column sums to one. i want to simply iterate over each column and > >> >> renormalize it, using exp(col - logsumexp(col)). here is the code > that > >> >> i used to profile this operation: > >> >> > >> >> from scipy import * > >> >> from numpy import * > >> >> from numpy.random.mtrand import dirichlet > >> >> from scipy.maxentropy import logsumexp > >> >> import time > >> >> > >> > > >> > Why aren't you using logaddexp ufunc from numpy? > >> > >> Maybe because it is difficult to find, it doesn't have its own docs > entry. > >> > >> e.g. no link to logaddexp in > >> > >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations > >> > >> I have no idea, why it is different from the other ufuncs in the docs > >> (and help file). > >> It shows up correctly in the docs editor, but not in the numpy 1.3 and > >> online docs. 
> >> > > > > That's curious, none of the five ufuncs added in 1.3 have links even > though > > they all have documentation. > > I found that they are missing from routines.math > http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ > > I added logaddexp, logaddexp2 and exp2 > > What else? > > Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that something that can be done in svn, or automatically, or does it need to be done on docs site? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 17 20:00:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 20:00:19 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> On Sat, Oct 17, 2009 at 7:46 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 5:27 PM, wrote: >> >> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >> >> >> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem >> >> > wrote: >> >> >> >> >> >> hi all, >> >> >> >> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >> >> >> lot. as far as i can tell, this function has no vectorized version >> >> >> that works on an m-x-n matrix. i might be doing something wrong >> >> >> here, >> >> >> but i found that this function can run extremely slowly if used as >> >> >> follows: i have an array of log probability vectors, such that each >> >> >> column sums to one. i want to simply iterate over each column and >> >> >> renormalize it, using exp(col - logsumexp(col)). here is the code >> >> >> that >> >> >> i used to profile this operation: >> >> >> >> >> >> from scipy import * >> >> >> from numpy import * >> >> >> from numpy.random.mtrand import dirichlet >> >> >> from scipy.maxentropy import logsumexp >> >> >> import time >> >> >> >> >> > >> >> > Why aren't you using logaddexp ufunc from numpy? >> >> >> >> Maybe because it is difficult to find, it doesn't have its own docs >> >> entry. >> >> >> >> e.g. no link to logaddexp in >> >> >> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >> >> >> >> I have no idea, why it is different from the other ufuncs in the docs >> >> (and help file). >> >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >> >> online docs. >> >> >> > >> > That's curious, none of the five ufuncs added in 1.3 have links even >> > though >> > they all have documentation. >> >> I found that they are missing from routines.math >> http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ >> >> I added logaddexp, logaddexp2 and exp2 >> >> What else? >> > > Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that > something that can be done in svn, or automatically, or does it need to be > done on docs site? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > I can do it in the doc editor. I can see from the ufuncs docs where they belong. 
Josef From josef.pktd at gmail.com Sat Oct 17 20:07:15 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 17 Oct 2009 20:07:15 -0400 Subject: [Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy) In-Reply-To: <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <1cd32cbb0910171700xe319af6o767951191b84b14b@mail.gmail.com> Message-ID: <1cd32cbb0910171707v2d526104n93b563da75f353ff@mail.gmail.com> On Sat, Oct 17, 2009 at 8:00 PM, wrote: > On Sat, Oct 17, 2009 at 7:46 PM, Charles R Harris > wrote: >> >> >> On Sat, Oct 17, 2009 at 5:27 PM, wrote: >>> >>> On Sat, Oct 17, 2009 at 2:02 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Sat, Oct 17, 2009 at 11:54 AM, wrote: >>> >> >>> >> On Sat, Oct 17, 2009 at 1:20 PM, Charles R Harris >>> >> wrote: >>> >> > >>> >> > >>> >> > On Sat, Oct 17, 2009 at 9:36 AM, per freem >>> >> > wrote: >>> >> >> >>> >> >> hi all, >>> >> >> >>> >> >> in my code, i use the function 'logsumexp' from scipy.maxentropy a >>> >> >> lot. as far as i can tell, this function has no vectorized version >>> >> >> that works on an m-x-n matrix. i might be doing something wrong >>> >> >> here, >>> >> >> but i found that this function can run extremely slowly if used as >>> >> >> follows: i have an array of log probability vectors, such that each >>> >> >> column sums to one. i want to simply iterate over each column and >>> >> >> renormalize it, using exp(col - logsumexp(col)). here is the code >>> >> >> that >>> >> >> i used to profile this operation: >>> >> >> >>> >> >> from scipy import * >>> >> >> from numpy import * >>> >> >> from numpy.random.mtrand import dirichlet >>> >> >> from scipy.maxentropy import logsumexp >>> >> >> import time >>> >> >> >>> >> > >>> >> > Why aren't you using logaddexp ufunc from numpy? >>> >> >>> >> Maybe because it is difficult to find, it doesn't have its own docs >>> >> entry. >>> >> >>> >> e.g. no link to logaddexp in >>> >> >>> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations >>> >> >>> >> I have no idea, why it is different from the other ufuncs in the docs >>> >> (and help file). >>> >> It shows up correctly in the docs editor, but not in the numpy 1.3 and >>> >> online docs. >>> >> >>> > >>> > That's curious, none of the five ufuncs added in 1.3 have links even >>> > though >>> > they all have documentation. >>> >>> I found that they are missing from routines.math >>> http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/ >>> >>> I added logaddexp, logaddexp2 and exp2 >>> >>> What else? >>> >> >> Thanks. Also deg2rad, rad2deg, trunc, and copysign need to be added. Is that >> something that can be done in svn, or automatically, or does it need to be >> done on docs site? >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > I can do it in the doc editor. I can see from the ufuncs docs where they belong. 
> > Josef > here are the changes, if you wnat to check the location Josef http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.math.rst/diff/svn/cur/ From charlesr.harris at gmail.com Sun Oct 18 00:22:31 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Oct 2009 22:22:31 -0600 Subject: [Numpy-discussion] Subclassing record array In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 9:13 AM, Lo?c BERTHE wrote: > Hi, > > I would like to create my own class of record array to deal with units. > > Here is the code I used, inspired from > > http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array > : > > > [code] > from numpy import * > > class BlocArray(rec.ndarray): > """ Recarray with units and pretty print """ > > fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} > > def __new__(cls, data, titles=None, units=None): > > # guess format for each column > data2 = [] > for line in zip(*data) : > try : data2.append(cast[int](line)) # integers > except ValueError : > try : data2.append(cast[float](line)) # reals > except ValueError : > data2.append(cast[str](line)) # characters > > # create the array > dt = dtype(zip(titres, [line.dtype for line in data2])) > obj = rec.array(data2, dtype=dt).view(cls) > > # add custom attributes > obj.units = units or [] > obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' > obj._head = "%10s "*len(dt.names) % dt.names +'\n' > obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in > units) +'\n' > > # Finally, we must return the newly created object: > return obj > > titles = ['Name', 'Nb', 'Price'] > units = ['/', '/', 'Eur'] > data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] > bloc = BlocArray(data, titles=titles, units=units) > > In [544]: bloc > Out[544]: > Name Nb Price > (/) (/) (Eur) > fish 1 12.25 > egg 6 0.85 > TV 1 125 > [/code] > > It's almost working, but I have some isues : > > - I can't access data through indexing > In [563]: bloc['Price'] > /home/loic/Python/numpy/test.py in ((r,)) > 50 > 51 def __repr__(self): > ---> 52 return self._head + ''.join(self._fmt % tuple(r) for r in > self) > > TypeError: 'numpy.float64' object is not iterable > > So I think that overloading the __repr__ method is not that easy > > - I can't access data through attributes now : > In [564]: bloc.Nb > AttributeError: 'BlocArray' object has no attribute 'Nb' > > - I can't use 'T' as field in theses array as the T method is > already here as a shortcut for transpose > > > Have you any hints to make this work ? > > > On adding units in general, you might want to contact Darren Dale who has been working in that direction also and has added some infrastructure in svn to make it easier. He also gave a short presentation at scipy2009 on that problem, which has been worked on before. No sense in reinventing the wheel here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Oct 18 03:57:32 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 18 Oct 2009 09:57:32 +0200 Subject: [Numpy-discussion] Optimized sum of squares (was: vectorized version of logsumexp? 
(from scipy.maxentropy)) In-Reply-To: <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> Message-ID: <20091018075732.GA31449@phare.normalesup.org> On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: > >> > Why aren't you using logaddexp ufunc from numpy? > >> Maybe because it is difficult to find, it doesn't have its own docs entry. Speaking of which... I thought that there was a readily-written, optimized function (or ufunc) in numpy or scipy that calculated the sum of squares for an array (possibly along an axis). However, I cannot find it. Is there something similar? If not, it is not the end of the world, the operation is trivial to write. Cheers, Ga?l From gruben at bigpond.net.au Sun Oct 18 06:06:15 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 18 Oct 2009 21:06:15 +1100 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <20091018075732.GA31449@phare.normalesup.org> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> Message-ID: <4ADAE897.1070000@bigpond.net.au> Hi Ga?l, If you've got a 1D array/vector called "a", I think the normal idiom is np.dot(a,a) For the more general case, I think np.tensordot(a, a, axes=something_else) should do it, where you should be able to figure out something_else for your particular case. Gary R. Gael Varoquaux wrote: > On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: >>>>> Why aren't you using logaddexp ufunc from numpy? > >>>> Maybe because it is difficult to find, it doesn't have its own docs entry. > > Speaking of which... > > I thought that there was a readily-written, optimized function (or ufunc) > in numpy or scipy that calculated the sum of squares for an array > (possibly along an axis). However, I cannot find it. > > Is there something similar? If not, it is not the end of the world, the > operation is trivial to write. > > Cheers, > > Ga?l From dsdale24 at gmail.com Sun Oct 18 07:48:43 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sun, 18 Oct 2009 07:48:43 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 6:45 PM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale wrote: [...] >> I think it will be not too difficult to document this overall scheme: >> >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. 
The scheme I'm imagining might be sufficient for our >> purposes. >> > > This sounds interesting to me, as it would push the use of array wrap down > into a common function and make it easier to use. Sorry, I don't understand what you mean. > I wonder what the impact > would be on the current subclasses of ndarray? I don't think it will have any impact. The only change would be the addition of __input_prepare__, which by default would simply return the unmodified inputs. > On a side note, I wonder if you could look into adding your reduce loop > optimizations into the generic loops? It would be interesting to see if that > speeded up some common operations. In any case, it can't hurt. I think you are confusing me with someone else. Darren From mdroe at stsci.edu Sun Oct 18 08:04:15 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Sun, 18 Oct 2009 08:04:15 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> Message-ID: <4ADB043F.7060608@stsci.edu> On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > >> I recently committed a regression test and bugfix for object pointers in >> record arrays of unaligned size (meaning where each record is not a >> multiple of sizeof(PyObject **)). >> >> For example: >> >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) >> a2 = np.zeros((10,), 'S10') >> # This copying would segfault >> a1['o'] = a2 >> >> http://projects.scipy.org/numpy/ticket/1198 >> >> Unfortunately, this unit test has opened up a whole hornet's nest of >> alignment issues on Solaris. The various reference counting functions >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, >> for instance. Interestingly, there are comments in there saying >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and >> doesn't look to me like it would. But I won't rule out a mistake in >> building it on my part. > > Thanks for this bug report. It would be very helpful if you could > provide the line number where the code is giving a bus error and > explain why you think the code in question does not handle misaligned > data (it still seems like it should to me --- but perhaps I must be > missing something --- I don't have a Solaris box to test on). > Perhaps, the real problem is elsewhere (such as other places where the > mistake of forgetting about striding needing to be aligned also before > pursuing the fast alignment path that you pointed out in another place > of code). > > This was the thinking for why the code (that I think is in question) > should handle mis-aligned data: > > 1) pointers that are not aligned to the correct size need to be copied > to an aligned memory area before being de-referenced. > 2) static variables defined in a function will be aligned by the C > compiler. > > So, what the code in refcnt.c does is to copy the value in the NumPy > data-area (i.e. pointed to by it->dataptr) to another memory location > (the stack variable temp), dereference it and then increment it's > reference count. > > 196: temp = (PyObject **)it->dataptr; > 197: Py_XINCREF(*temp); This is exactly an instance that fails. Let's say we have a PyObject at an aligned location 0x4000 (PyObjects themselves always seem to be aligned -- I strongly suspect CPython is enforcing that). Then, we can create a recarray such that some of the PyObject*'s in it are at unaligned locations. 
For example, if the dtype is 'O,c', you have a record stride of 5 which creates unaligned PyObject*'s: OOOOcOOOOcOOOOc 0123456789abcde ^ ^ Now in the code above, let's assume that it->dataptr points to an unaligned location, 0x8005. Assigning it to temp puts the same unaligned value in temp, 0x8005. That is: &temp == 0x1000 /* The location of temp *is* on the stack and aligned */ temp == 0x8005 /* But its value as a pointer points to an unaligned memory location */ *temp == 0x4000 /* Dereferencing it should get us back to the original PyObject * pointer, but dereferencing an unaligned memory location fails with a bus error on Solaris */ So the bus error occurs on line 197. Note that something like: PyObject* temp; temp = *(PyObject **)it->dataptr; would also fail. The solution (this is what works for me, though there may be a better way): PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject **) */ /* memcpy works byte-by-byte, so can handle an unaligned assignment */ memcpy(&temp, it->dataptr, sizeof(PyObject *)); Py_XINCREF(temp); I'm proposing adding a macro which on Intel/AMD would be defined as: #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) and on alignment-required platforms as: #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), sizeof(PyObject *)) and it would be used something like: COPY_PYOBJECT_PTR(&temp, it->dataptr); If you agree with this assessment, I'm working on a patch for all of the locations that require this change. All that I've found so far are related to object arrays. It seems that many places where this would be an issue for numeric types are already using this memcpy technique (e.g. *_copyswap in arraytype.c.src:1716). I think this issue shows up in object arrays much more because there are many more places where the unaligned memory is dereferenced (in order to do reference counting). So here's the traceback from: a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c'), ('i', 'i'), ('c2', 'c')]) Unfortunately, I'm having trouble getting line numbers out of the debugger, but "print statement debugging" tells me the inner most frame here is in refcount.c: 275 PyObject **temp; 276 Py_XINCREF(obj); 277 temp = (PyObject **)optr; 278 *temp = obj; /* <-- here */ 279 return; My fix was: Py_XINCREF(obj); memcpy(optr, &obj, sizeof(PyObject*)); return; 0xfeefaf60 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so (gdb) bt #0 0xfeefaf60 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #1 0xfeefaf20 in _fillobject () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #2 0xfeefad40 in PyArray_FillObjectArray () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #3 0xfee90e04 in _zerofill () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #4 0xfeed48c4 in PyArray_Zeros () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #5 0xfef05638 in array_zeros () from /home/mdroe/numpy_clean/build/lib.solaris-2.8-sun4u-2.5/numpy/core/multiarray.so #6 0x37e8c in PyObject_Call () #7 0x9a7e8 in do_call () #8 0x9a264 in call_function () #9 0x9754c in PyEval_EvalFrameEx () #10 0x988d4 in PyEval_EvalCodeEx () #11 0x93d44 in PyEval_EvalCode () #12 0xb9150 in run_mod () #13 0xb9108 in PyRun_FileExFlags () #14 0xb80c4 in PyRun_SimpleFileExFlags () #15 0x3171c in Py_Main () Hope that illustrates the point better. 
Sorry for my vagueness in my initial report. Mike From dsdale24 at gmail.com Sun Oct 18 08:06:52 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Sun, 18 Oct 2009 08:06:52 -0400 Subject: [Numpy-discussion] Subclassing record array In-Reply-To: References: Message-ID: On Sun, Oct 18, 2009 at 12:22 AM, Charles R Harris wrote: > > > On Sat, Oct 17, 2009 at 9:13 AM, Lo?c BERTHE wrote: >> >> ? Hi, >> >> I would like to create my own class of record array to deal with units. >> >> Here is the code I used, inspired from >> >> http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array >> : >> >> >> [code] >> from numpy import * >> >> class BlocArray(rec.ndarray): >> ? ?""" Recarray with units and pretty print """ >> >> ? ?fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'} >> >> ? ?def __new__(cls, data, titles=None, units=None): >> >> ? ? ? ?# guess format for each column >> ? ? ? ?data2 = [] >> ? ? ? ?for line in zip(*data) : >> ? ? ? ? ? ?try : data2.append(cast[int](line)) ? ? ? ? # integers >> ? ? ? ? ? ?except ValueError : >> ? ? ? ? ? ? ? ?try : data2.append(cast[float](line)) ? # reals >> ? ? ? ? ? ? ? ?except ValueError : >> ? ? ? ? ? ? ? ? ? ?data2.append(cast[str](line)) ? ? ? # characters >> >> ? ? ? ?# create the array >> ? ? ? ?dt = dtype(zip(titres, [line.dtype for line in data2])) >> ? ? ? ?obj = rec.array(data2, dtype=dt).view(cls) >> >> ? ? ? ?# add custom attributes >> ? ? ? ?obj.units = units or [] >> ? ? ? ?obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n' >> ? ? ? ?obj._head = "%10s "*len(dt.names) % dt.names +'\n' >> ? ? ? ?obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in >> units) +'\n' >> >> ? ? ? ?# Finally, we must return the newly created object: >> ? ? ? ?return obj >> >> titles = ?['Name', 'Nb', 'Price'] >> units = ['/', '/', 'Eur'] >> data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']] >> bloc = BlocArray(data, titles=titles, units=units) >> >> In [544]: bloc >> Out[544]: >> ? ? ?Name ? ? ? ? Nb ? ? ?Price >> ? ? ? (/) ? ? ? ?(/) ? ? ?(Eur) >> ? ? ?fish ? ? ? ? ?1 ? ? ?12.25 >> ? ? ? egg ? ? ? ? ?6 ? ? ? 0.85 >> ? ? ? ?TV ? ? ? ? ?1 ? ? ? ?125 >> [/code] >> >> It's almost working, but I have some isues : >> >> ? - I can't access data through indexing >> In [563]: bloc['Price'] >> /home/loic/Python/numpy/test.py in ((r,)) >> ? ? 50 >> ? ? 51 ? ? def __repr__(self): >> ---> 52 ? ? ? ? return self._head + ''.join(self._fmt % tuple(r) for r in >> self) >> >> TypeError: 'numpy.float64' object is not iterable >> >> So I think that overloading the __repr__ method is not that easy >> >> ? - I can't access data through attributes now : >> In [564]: bloc.Nb >> AttributeError: 'BlocArray' object has no attribute 'Nb' >> >> ? - I can't use 'T' as field in theses array as the T method is >> already here as a shortcut for transpose >> >> >> Have you any hints to make this work ? >> >> > > On adding units in general, you might want to contact Darren Dale who has > been working in that direction also and has added some infrastructure in svn > to make it easier. He also gave a short presentation at scipy2009 on that > problem, which has been worked on before. No sense in reinventing the wheel > here. The units package I have been working on is called quantities. It is available at the python package index, and the project is hosted at launchpad as python-quantities. If quantities isn't a good fit, please let me know why. 
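Basic usage looks something like the sketch below -- this is from memory, so treat the import path and unit spellings as approximate and check the package documentation:

import numpy
from quantities import Quantity   # top-level import path, from memory

q1 = Quantity(1.0, 'meter')
q2 = Quantity(2.0, 'feet')
print(q1 + q2)             # __add__ rescales the second operand to meters
print(numpy.add(q1, q1))   # ufuncs work via the __array_wrap__/__array_prepare__ hooks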
At least the code can provide some example of how to subclass ndarray. Darren From gael.varoquaux at normalesup.org Sun Oct 18 08:09:27 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 18 Oct 2009 14:09:27 +0200 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <4ADAE897.1070000@bigpond.net.au> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> Message-ID: <20091018120927.GA1113@phare.normalesup.org> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: > Hi Ga?l, > If you've got a 1D array/vector called "a", I think the normal idiom is > np.dot(a,a) > For the more general case, I think > np.tensordot(a, a, axes=something_else) > should do it, where you should be able to figure out something_else for > your particular case. Ha, yes. Good point about the tensordot trick. Thank you Ga?l From charlesr.harris at gmail.com Sun Oct 18 10:27:38 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 18 Oct 2009 08:27:38 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADB043F.7060608@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> Message-ID: On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom wrote: > On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > > >> I recently committed a regression test and bugfix for object pointers in > >> record arrays of unaligned size (meaning where each record is not a > >> multiple of sizeof(PyObject **)). > >> > >> For example: > >> > >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > >> a2 = np.zeros((10,), 'S10') > >> # This copying would segfault > >> a1['o'] = a2 > >> > >> http://projects.scipy.org/numpy/ticket/1198 > >> > >> Unfortunately, this unit test has opened up a whole hornet's nest of > >> alignment issues on Solaris. The various reference counting functions > >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > >> for instance. Interestingly, there are comments in there saying > >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and > >> doesn't look to me like it would. But I won't rule out a mistake in > >> building it on my part. > > > > Thanks for this bug report. It would be very helpful if you could > > provide the line number where the code is giving a bus error and > > explain why you think the code in question does not handle misaligned > > data (it still seems like it should to me --- but perhaps I must be > > missing something --- I don't have a Solaris box to test on). > > Perhaps, the real problem is elsewhere (such as other places where the > > mistake of forgetting about striding needing to be aligned also before > > pursuing the fast alignment path that you pointed out in another place > > of code). > > > > This was the thinking for why the code (that I think is in question) > > should handle mis-aligned data: > > > > 1) pointers that are not aligned to the correct size need to be copied > > to an aligned memory area before being de-referenced. > > 2) static variables defined in a function will be aligned by the C > > compiler. > > > > So, what the code in refcnt.c does is to copy the value in the NumPy > > data-area (i.e. 
pointed to by it->dataptr) to another memory location > > (the stack variable temp), dereference it and then increment it's > > reference count. > > > > 196: temp = (PyObject **)it->dataptr; > > 197: Py_XINCREF(*temp); > This is exactly an instance that fails. Let's say we have a PyObject at > an aligned location 0x4000 (PyObjects themselves always seem to be > aligned -- I strongly suspect CPython is enforcing that). Then, we can > create a recarray such that some of the PyObject*'s in it are at > unaligned locations. For example, if the dtype is 'O,c', you have a > record stride of 5 which creates unaligned PyObject*'s: > > OOOOcOOOOcOOOOc > 0123456789abcde > ^ ^ > > Now in the code above, let's assume that it->dataptr points to an > unaligned location, 0x8005. Assigning it to temp puts the same > unaligned value in temp, 0x8005. That is: > > &temp == 0x1000 /* The location of temp *is* on the stack and aligned */ > temp == 0x8005 /* But its value as a pointer points to an unaligned > memory location */ > *temp == 0x4000 /* Dereferencing it should get us back to the original > PyObject * pointer, but dereferencing an > unaligned memory location > fails with a bus error on Solaris */ > > So the bus error occurs on line 197. > > Note that something like: > > PyObject* temp; > temp = *(PyObject **)it->dataptr; > > would also fail. > > The solution (this is what works for me, though there may be a better way): > > PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject > **) */ > /* memcpy works byte-by-byte, so can handle an unaligned assignment */ > memcpy(&temp, it->dataptr, sizeof(PyObject *)); > Py_XINCREF(temp); > > I'm proposing adding a macro which on Intel/AMD would be defined as: > > #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) > > and on alignment-required platforms as: > > #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), > sizeof(PyObject *)) > > and it would be used something like: > > COPY_PYOBJECT_PTR(&temp, it->dataptr); > > This looks right to me, but I'll let Travis sign off on it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sun Oct 18 12:06:49 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 18 Oct 2009 12:06:49 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <20091018120927.GA1113@phare.normalesup.org> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux wrote: > On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: >> Hi Ga?l, > >> If you've got a 1D array/vector called "a", I think the normal idiom is > >> np.dot(a,a) > >> For the more general case, I think >> np.tensordot(a, a, axes=something_else) >> should do it, where you should be able to figure out something_else for >> your particular case. > > Ha, yes. Good point about the tensordot trick. > > Thank you > > Ga?l I'm curious about this as I use ss, which is just np.sum(a*a, axis), in statsmodels and didn't much think about it. 
There is import numpy as np from scipy.stats import ss a = np.ones(5000) but timeit ss(a) 10000 loops, best of 3: 21.5 ?s per loop timeit np.add.reduce(a*a) 100000 loops, best of 3: 15 ?s per loop timeit np.dot(a,a) 100000 loops, best of 3: 5.38 ?s per loop Do the number of loops matter in the timings and is dot always faster even without the blas dot? Skipper From gokhansever at gmail.com Sun Oct 18 13:03:14 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sun, 18 Oct 2009 12:03:14 -0500 Subject: [Numpy-discussion] Multiple string formatting while writing an array into a file Message-ID: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> Hello, I have a relatively simple question which I couldn't figure out myself yet. I have an array that I am writing into a file using the following savetxt method. np.savetxt(fid, output_array, fmt='%12.4f', delimiter='') However, I have made some changes on the code and I require to write after 7th element of the array as integer instead of 12.4 formatted float. The change below doesn't help me to solve the problem since I get a "ValueError: setting an array element with a sequence." np.savetxt(fid, (output_array[:7], output_array[7:]), fmt=('%12.4f', '%12d'), delimiter='') What would be the right approach to fix this issue? Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 18 13:37:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 18 Oct 2009 13:37:55 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold wrote: > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux > wrote: >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: >>> Hi Ga?l, >> >>> If you've got a 1D array/vector called "a", I think the normal idiom is >> >>> np.dot(a,a) >> >>> For the more general case, I think >>> np.tensordot(a, a, axes=something_else) >>> should do it, where you should be able to figure out something_else for >>> your particular case. >> >> Ha, yes. Good point about the tensordot trick. >> >> Thank you >> >> Ga?l > > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > in statsmodels and didn't much think about it. > > There is > > import numpy as np > from scipy.stats import ss > > a = np.ones(5000) > > but > > timeit ss(a) > 10000 loops, best of 3: 21.5 ?s per loop > > timeit np.add.reduce(a*a) > 100000 loops, best of 3: 15 ?s per loop > > timeit np.dot(a,a) > 100000 loops, best of 3: 5.38 ?s per loop > > Do the number of loops matter in the timings and is dot always faster > even without the blas dot? David's reply once was that it depends on ATLAS and the version of lapack/blas. I usually switched to using dot for 1d. Using tensordot looks to complicated for me, to figure out the axes when I quickly want a sum of squares. I never tried the timing of tensordot for 2d arrays, especially for axis=0 for a c ordered array. If it's faster, this could be useful to rewrite stats.ss. I don't remember that np.add.reduce is much faster than np.sum. 
This might be the additional call overhead from using another function in between. Josef > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Sun Oct 18 14:16:34 2009 From: sturla at molden.no (Sturla Molden) Date: Sun, 18 Oct 2009 20:16:34 +0200 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> Message-ID: <4ADB5B82.2040105@molden.no> Skipper Seabold skrev: > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > in statsmodels and didn't much think about it. > > Do the number of loops matter in the timings and is dot always faster > even without the blas dot? > The thing is that a*a returns a temporary array with the same shape as a, and then that is passed to np.sum. The BLAS dot product don't need to allocate and deallocate temporary arrays. S.M. From charlesr.harris at gmail.com Sun Oct 18 14:19:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 18 Oct 2009 12:19:32 -0600 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <20091018120927.GA1113@phare.normalesup.org> <1cd32cbb0910181037h444b3491i5b6092e12a75d75d@mail.gmail.com> Message-ID: On Sun, Oct 18, 2009 at 11:37 AM, wrote: > On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold > wrote: > > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux > > wrote: > >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote: > >>> Hi Ga?l, > >> > >>> If you've got a 1D array/vector called "a", I think the normal idiom is > >> > >>> np.dot(a,a) > >> > >>> For the more general case, I think > >>> np.tensordot(a, a, axes=something_else) > >>> should do it, where you should be able to figure out something_else for > >>> your particular case. > >> > >> Ha, yes. Good point about the tensordot trick. > >> > >> Thank you > >> > >> Ga?l > > > > I'm curious about this as I use ss, which is just np.sum(a*a, axis), > > in statsmodels and didn't much think about it. > > > > There is > > > > import numpy as np > > from scipy.stats import ss > > > > a = np.ones(5000) > > > > but > > > > timeit ss(a) > > 10000 loops, best of 3: 21.5 ?s per loop > > > > timeit np.add.reduce(a*a) > > 100000 loops, best of 3: 15 ?s per loop > > > > timeit np.dot(a,a) > > 100000 loops, best of 3: 5.38 ?s per loop > > > > Do the number of loops matter in the timings and is dot always faster > > even without the blas dot? > > David's reply once was that it depends on ATLAS and the version of > lapack/blas. > > I usually switched to using dot for 1d. Using tensordot looks to > complicated for me, to figure out the axes when I quickly want a sum of > squares. > > I never tried the timing of tensordot for 2d arrays, especially for > axis=0 for a > c ordered array. If it's faster, this could be useful to rewrite stats.ss. > > I don't remember that np.add.reduce is much faster than np.sum. 
This might > be > the additional call overhead from using another function in between. > > If you are using numpy from svn, it might be due to te recent optimizations that Luca Citi did for some of the ufuncs. Now we just need a multiply and add function. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffamcgee at gmail.com Sun Oct 18 15:11:49 2009 From: jeffamcgee at gmail.com (Jeffrey McGee) Date: Sun, 18 Oct 2009 14:11:49 -0500 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() Message-ID: Howdy, I'm having trouble getting the kaiser window to work. Anytime I try to call numpy.kaiser(), it throws an exception. Here's the output when I run the example code from http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import kaiser >>> kaiser(12, 14) Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2630, in kaiser return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2507, in i0 y[ind] = _i0_1(x[ind]) TypeError: array cannot be safely cast to required type >>> Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 packages for python and numpy.) Thanks, Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Mon Oct 19 03:10:06 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 19 Oct 2009 09:10:06 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. __array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. 
One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) > > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. I'm all for generic (u)funcs since they might come handy for me since I'm doing lots of operation on arrays of polynomials. I don't quite get the reasoning though. Could you correct me where I get it wrong? * the class Quantity derives from numpy.ndarray * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for q1 = Quantity(1, 'meter') q2 = Quantity(2, 'J') by raising an exception when performing q1+=q2 * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before raising an exception Sebastian > > Darren > > ^ http://www.python.org/dev/peps/pep-3124/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robince at gmail.com Mon Oct 19 07:22:50 2009 From: robince at gmail.com (Robin) Date: Mon, 19 Oct 2009 12:22:50 +0100 Subject: [Numpy-discussion] fortran vs numpy on mac/linux - gcc performance? Message-ID: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> Hi, I have been looking at moving some of my bottleneck functions to fortran with f2py. To get started I tried some simple things, and was surprised they performend so much better than the number builtins - which I assumed would be c and would be quite fast. On my Macbook pro laptop (Intel core 2 duo) I got the following results. Numpy is built with xcode gcc 4.0.1 and gfortran is 4.2.3 - fortran code for shuffle and bincount below: In [1]: x = np.random.random_integers(0,1023,1000000).astype(int) In [2]: import ftest In [3]: timeit np.bincount(x) 100 loops, best of 3: 3.97 ms per loop In [4]: timeit ftest.bincount(x,1024) 1000 loops, best of 3: 1.15 ms per loop In [5]: timeit np.random.shuffle(x) 1 loops, best of 3: 605 ms per loop In [6]: timeit ftest.shuffle(x) 10 loops, best of 3: 139 ms per loop So fortran was about 4 times faster for these loops - similarly faster than cython as well. 
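For anyone who wants to reproduce this, the ftest extension is built with f2py. A rough sketch of the build step is below, assuming numpy.f2py.compile is available and the fortran source (posted in my follow-up) is saved as test.f95; the shell equivalent is roughly "f2py -c -m ftest test.f95":

import numpy.f2py

src = open('test.f95').read()
# source_fn keeps the .f95 extension so the file is compiled as free-form
numpy.f2py.compile(src, modulename='ftest', source_fn='test.f95')

import ftest
print(ftest.bincount.__doc__)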
So I was really happy as these are two of my biggest bottlenecks, but when I moved a linux workstation I got different results. Here with gcc/gfortran 4.3.3 : In [3]: x = np.random.random_integers(0,1023,1000000).astype(int) In [4]: timeit np.bincount(x) 100 loops, best of 3: 8.18 ms per loop In [5]: timeit ftest.bincount(x,1024) 100 loops, best of 3: 8.25 ms per loop In [6]: In [7]: timeit np.random.shuffle(x) 1 loops, best of 3: 379 ms per loop In [8]: timeit ftest.shuffle(x) 10 loops, best of 3: 172 ms per loop So shuffle is a bit faster, but bincount is now the same as fortran. The only thing I can think is that it is due to much better performance of the more recent c compiler. I think this would also explain why f2py extension was performing so much better than cython on the mac. So my question is - is there a way to build numpy with a more recent compiler on leopard? (I guess I could upgrade to snow leopard now) - Could I make the numpy install use gcc-4.2 from xcode or would it break stuff? Could I use gcc 4.3.3 from macports? It would be great to get a 4x speed up on all numpy c loops! (already just these two functions I use a lot would make a big difference). Cheers Robin From robince at gmail.com Mon Oct 19 07:24:18 2009 From: robince at gmail.com (Robin) Date: Mon, 19 Oct 2009 12:24:18 +0100 Subject: [Numpy-discussion] fortran vs numpy on mac/linux - gcc performance? In-Reply-To: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> References: <2d5132a50910190422g74f246c6t3b9d18cc1742a50a@mail.gmail.com> Message-ID: <2d5132a50910190424m9589ba2w700f75acea34bc4a@mail.gmail.com> Forgot to include the fortran code used: jm-g26b101:fortran robince$ cat test.f95 subroutine bincount (x,c,n,m) implicit none integer, intent(in) :: n,m integer, dimension(0:n-1), intent(in) :: x integer, dimension(0:m-1), intent(out) :: c integer :: i c = 0 do i = 0, n-1 c(x(i)) = c(x(i)) + 1 end do end subroutine shuffle (x,s,n) implicit none integer, intent(in) :: n integer, dimension(n), intent(in) :: x integer, dimension(n), intent(out) :: s integer :: i,randpos,temp real :: r ! copy input s = x call init_random_seed() ! knuth shuffle from http://rosettacode.org/wiki/Knuth_shuffle#Fortran do i = n, 2, -1 call random_number(r) randpos = int(r * i) + 1 temp = s(randpos) s(randpos) = s(i) s(i) = temp end do end subroutine init_random_seed() ! init_random_seed from gfortran documentation integer :: i, n, clock integer, dimension(:), allocatable :: seed call random_seed(size = n) allocate(seed(n)) call system_clock(count=clock) seed = clock + 37 * (/ (i - 1, i = 1, n) /) call random_seed(put = seed) deallocate(seed) end subroutine From dsdale24 at gmail.com Mon Oct 19 07:55:36 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Mon, 19 Oct 2009 07:55:36 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter wrote: > On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: >> numpy's functions, especially ufuncs, have had some ability to support >> subclasses through the ndarray.__array_wrap__ method, which provides >> masked arrays or quantities (for example) with an opportunity to set >> the class and metadata of the output array at the end of an operation. 
>> An example is >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'meters') >> numpy.add(q1, q2) # yields Quantity(3, 'meters') >> >> At SciPy2009 we committed a change to the numpy trunk that provides a >> chance to determine the class and some metadata of the output *before* >> the ufunc performs its calculation, but after output array has been >> established (and its data is still uninitialized). Consider: >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'J') >> numpy.add(q1, q2, q1) >> # or equivalently: >> # q1 += q2 >> >> With only __array_wrap__, the attempt to propagate the units happens >> after q1's data was updated in place, too late to raise an error, the >> data is now corrupted. __array_prepare__ solves that problem, an >> exception can be raised in time. >> >> Now I'd like to suggest one more improvement to numpy to make its >> functions more generic. Consider one more example: >> >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'feet') >> numpy.add(q1, q2) >> >> In this case, I'd like an opportunity to operate on the input arrays >> on the way in to the ufunc, to rescale the second input to meters. I >> think it would be a hack to try to stuff this capability into >> __array_prepare__. One form of this particular example is already >> supported in quantities, "q1 + q2", by overriding the __add__ method >> to rescale the second input, but there are ufuncs that do not have an >> associated special method. So I'd like to look into adding another >> check for a special method, perhaps called __input_prepare__. My time >> is really tight for the next month, so I'd rather not start if there >> are strong objections, but otherwise, I'd like to try to try to get it >> in in time for numpy-1.4. (Has a timeline been established?) >> >> I think it will be not too difficult to document this overall scheme: >> >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. The scheme I'm imagining might be sufficient for our >> purposes. > > I'm all for generic (u)funcs since they might come handy for me since > I'm doing lots of operation on arrays of polynomials. > ?I don't quite get the reasoning though. > Could you correct me where I get it wrong? > * the class Quantity derives from numpy.ndarray > * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > by raising an exception when performing q1+=q2 No, Quantity does not override __iadd__ to catch this. Quantity implements __array_prepare__ to perform the dimensional analysis based on the identity of the ufunc and the inputs, and set the class and dimensionality of the output array, or raise an error when dimensional analysis fails. 
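Schematically it looks something like this (a bare-bones sketch with toy unit handling, not the actual quantities code):

import numpy as np

class Quantity(np.ndarray):

    def __new__(cls, data, units=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # keep the units attribute on views and new instances
        self.units = getattr(obj, 'units', '')

    def __array_prepare__(self, out_arr, context=None):
        # called before the ufunc writes any data; context is
        # (ufunc, inputs, output index), so the dimensional analysis can
        # veto the operation before anything gets corrupted
        if context is not None:
            ufunc, inputs, _ = context
            units = set(getattr(x, 'units', '') for x in inputs
                        if isinstance(x, Quantity))
            if ufunc in (np.add, np.subtract) and len(units) > 1:
                raise ValueError("incompatible units: %s" % sorted(units))
        result = out_arr.view(type(self))
        result.units = self.units
        return result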
This approach lets quantities support all ufuncs (in principle), not just built in numerical operations. It should also make it easier to subclass from MaskedArray, so we could have a MaskedQuantity without having to establish yet another suite of ufuncs specific to quantities or masked quantities. > * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before > raising an exception That was solved by the addition of __array_prepare__ to numpy back in August. What I am proposing now is supporting operations on arrays that would be compatible if we had a chance to transform them on the way into the ufunc, like "meter + foot". Darren From josef.pktd at gmail.com Mon Oct 19 10:40:23 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Oct 2009 10:40:23 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? Message-ID: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> I wanted to finally upgrade my numpy, so I can build scipy trunk again, but I get test failures with numpy. And running the tests of the previously compiled version of scipy crashes in signaltools. Is this a problem with my build (the usual official MingW on WindowsXP), or are there still ABI problems in numpy trunk? I did the build twice with (I think) clean directories and get the same result. Thanks, Josef Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.__file__ 'C:\\Josef\\_progs\\Subversion\\numpy-trunk\\dist\\numpy-1.4.0.dev7539.win32\\Pr ograms\\Python25\\Lib\\site-packages\\numpy\\__init__.py' >>> numpy.test() Running unit tests for numpy NumPy version 1.4.0.dev7539 NumPy is installed in C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.de v7539.win32\Programs\Python25\Lib\site-packages\numpy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.11.1 ................................. \u03a3 .[[' abc' ''] ['12345' 'MixedCase'] ['123 345' 'UPPER']] ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................K................................................... ..FF.......................FFFF................................................. ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................................................................C:\J osef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Programs\Pytho n25\Lib\site-packages\numpy\lib\io.py:1332: ConversionWarning: Some errors were detected ! 
Line #2 (got 4 columns instead of 5) Line #12 (got 4 columns instead of 5) Line #22 (got 4 columns instead of 5) Line #32 (got 4 columns instead of 5) Line #42 (got 4 columns instead of 5) warnings.warn(errmsg, ConversionWarning) .C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Programs\ Python25\Lib\site-packages\numpy\lib\io.py:1332: ConversionWarning: Some errors were detected ! Line #2 (got 4 columns instead of 2) Line #12 (got 4 columns instead of 2) Line #22 (got 4 columns instead of 2) Line #32 (got 4 columns instead of 2) Line #42 (got 4 columns instead of 2) warnings.warn(errmsg, ConversionWarning) ................E.E.EE....................K........K............F............... ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ..S............................................................................. ................................................................................ ................................................................................ ................................................................................ ................................................................................ ................................................................................ ............................. ====================================================================== ERROR: Test giving usecols with a comma-separated string ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 747, in test _usecols_as_css names="a, b, c", usecols="a, c") File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1099, in gen fromtxt the docstring of the `genfromtxt` function. AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: Test usecols with named columns ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 773, in test _usecols_with_named_columns usecols=('a', 'c'), **kwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1099, in gen fromtxt the docstring of the `genfromtxt` function. 
AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: Test with missing and filling values ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 861, in test _user_filling_values test = np.genfromtxt(StringIO.StringIO(data), **kwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1127, in gen fromtxt See Also AttributeError: 'tuple' object has no attribute 'index' ====================================================================== ERROR: test_user_missing_values (test_io.TestFromTxt) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_io.py", line 845, in test _user_missing_values **basekwargs) File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1481, in maf romtxt File "\Programs\Python25\Lib\site-packages\numpy\lib\io.py", line 1127, in gen fromtxt See Also AttributeError: 'tuple' object has no attribute 'index' ====================================================================== FAIL: test_umath.test_hypot_special_values(1.#QNAN, 1.#INF) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\core\tests\test_umath.py", line 211, in assert_hypot_isinf assert np.isinf(ncu.hypot(x, y)) AssertionError ====================================================================== FAIL: test_umath.test_hypot_special_values(1.#INF, 1.#QNAN) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\core\tests\test_umath.py", line 211, in assert_hypot_isinf assert np.isinf(ncu.hypot(x, y)) AssertionError ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, 2.3561944901923448) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: 2.3561944901923448 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, -2.3561944901923448) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File 
"C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: -2.3561944901923448 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, 0.78539816339744828) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: 0.78539816339744828 ====================================================================== FAIL: test_umath.test_arctan2_special_values(nan, -0.78539816339744828) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.p y", line 183, in runTest self.test(*self.arg) File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 449, in assert_a lmost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal ACTUAL: nan DESIRED: -0.78539816339744828 ====================================================================== FAIL: test_doctests (test_polynomial.TestDocs) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\lib\tests\test_polynomial.py", line 90, in test_doctests return rundocs() File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Pr ograms\Python25\Lib\site-packages\numpy\testing\utils.py", line 951, in rundocs raise AssertionError("Some doctests failed:\n%s" % "\n".join(msg)) AssertionError: Some doctests failed: ********************************************************************** File "C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.dev7539.win32\Prog rams\Python25\Lib\site-packages\numpy\lib\tests\test_polynomial.py", line 20, in test_polynomial Failed example: print poly1d([100e-90, 1.234567e-9j+3, -1234.999e8]) Expected: 2 1e-88 x + (3 + 1.235e-09j) x - 1.235e+11 Got: 2 1e-088 x + (3 + 1.235e-009j) x - 1.235e+011 ---------------------------------------------------------------------- Ran 2140 tests in 13.281s FAILED (KNOWNFAIL=3, SKIP=1, errors=4, failures=7) >>> import scipy >>> scipy.__file__ 'c:\\josef\\eclipsegworkspace\\scipy-trunk-work\\scipytrunkcopy\\scipy\\__init__ .pyc' >>> scipy.test() Running unit tests for scipy NumPy version 1.4.0.dev7539 NumPy is installed in C:\Josef\_progs\Subversion\numpy-trunk\dist\numpy-1.4.0.de v7539.win32\Programs\Python25\Lib\site-packages\numpy SciPy version 0.8.0.dev SciPy is installed in c:\josef\eclipsegworkspace\scipy-trunk-work\scipytrunkcopy \scipy Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int el)] nose version 0.11.1 E..... ... crash with verbose the last tests are test_rank1 (test_signaltools.TestCorrelateComplexSingle) ... 
ok test_rank3 (test_signaltools.TestCorrelateComplexSingle) ... ok test_rank1 (test_signaltools.TestCorrelateDouble) ... ok test_rank3 (test_signaltools.TestCorrelateDouble) ... ok test_rank1 (test_signaltools.TestCorrelateExtended) ... ok test_rank3 (test_signaltools.TestCorrelateExtended) ... ok test_rank1 (test_signaltools.TestCorrelateObject) ... ok test_rank3 (test_signaltools.TestCorrelateObject) ... ok test_rank1 (test_signaltools.TestCorrelateSingle) ... ok test_rank3 (test_signaltools.TestCorrelateSingle) ... ok test_signaltools.TestDecimate.test_basic ... crash From mdroe at stsci.edu Mon Oct 19 10:55:43 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Mon, 19 Oct 2009 10:55:43 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> Message-ID: <4ADC7DEF.704@stsci.edu> I've filed a bug and attached a patch: http://projects.scipy.org/numpy/ticket/1267 No guarantees that I've found all of the alignment issues. I did a grep for "PyObject **" to find possible locations where PyObject * in arrays were being dereferenced. If I could write a unit test to make it fall over on Solaris, then I fixed it, otherwise I left it alone. For example, there are places where misaligned dereferencing is theoretically possible (OBJECT_dot, OBJECT_compare), but a higher level function already did a "BEHAVED" array cast. In those cases I added a unit test so hopefully we'll be able to catch it in the future if the caller no longer ensures well-behavedness. The unit tests are passing with this patch on Sparc (SunOS 5.8), x86 (RHEL 4) and x86_64 (RHEL 4). Those of you who care about less common architectures may want to try the patch out. Since I don't know the alignment requirements of all of the supported platforms, I erred on the side of caution: only x86 and x86_64 will perform unaligned pointer dereferencing -- Everything else will use the slower-but-sure-to-work memcpy approach. That can easily be changed in npy_cpu.h if necessary. Mike Charles R Harris wrote: > > > On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom > wrote: > > On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > > >> I recently committed a regression test and bugfix for object > pointers in > >> record arrays of unaligned size (meaning where each record is not a > >> multiple of sizeof(PyObject **)). > >> > >> For example: > >> > >> a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > >> a2 = np.zeros((10,), 'S10') > >> # This copying would segfault > >> a1['o'] = a2 > >> > >> http://projects.scipy.org/numpy/ticket/1198 > >> > >> Unfortunately, this unit test has opened up a whole hornet's > nest of > >> alignment issues on Solaris. The various reference counting > functions > >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object > pointers, > >> for instance. Interestingly, there are comments in there saying > >> "handles misaligned data" (eg. line 190), but in fact it > doesn't, and > >> doesn't look to me like it would. But I won't rule out a > mistake in > >> building it on my part. > > > > Thanks for this bug report. It would be very helpful if you > could > > provide the line number where the code is giving a bus error and > > explain why you think the code in question does not handle > misaligned > > data (it still seems like it should to me --- but perhaps I must be > > missing something --- I don't have a Solaris box to test on). 
> > Perhaps, the real problem is elsewhere (such as other places > where the > > mistake of forgetting about striding needing to be aligned also > before > > pursuing the fast alignment path that you pointed out in another > place > > of code). > > > > This was the thinking for why the code (that I think is in question) > > should handle mis-aligned data: > > > > 1) pointers that are not aligned to the correct size need to be > copied > > to an aligned memory area before being de-referenced. > > 2) static variables defined in a function will be aligned by the C > > compiler. > > > > So, what the code in refcnt.c does is to copy the value in the NumPy > > data-area (i.e. pointed to by it->dataptr) to another memory > location > > (the stack variable temp), dereference it and then increment it's > > reference count. > > > > 196: temp = (PyObject **)it->dataptr; > > 197: Py_XINCREF(*temp); > This is exactly an instance that fails. Let's say we have a > PyObject at > an aligned location 0x4000 (PyObjects themselves always seem to be > aligned -- I strongly suspect CPython is enforcing that). Then, > we can > create a recarray such that some of the PyObject*'s in it are at > unaligned locations. For example, if the dtype is 'O,c', you have a > record stride of 5 which creates unaligned PyObject*'s: > > OOOOcOOOOcOOOOc > 0123456789abcde > ^ ^ > > Now in the code above, let's assume that it->dataptr points to an > unaligned location, 0x8005. Assigning it to temp puts the same > unaligned value in temp, 0x8005. That is: > > &temp == 0x1000 /* The location of temp *is* on the stack and > aligned */ > temp == 0x8005 /* But its value as a pointer points to an unaligned > memory location */ > *temp == 0x4000 /* Dereferencing it should get us back to the > original > PyObject * pointer, but dereferencing an > unaligned memory location > fails with a bus error on Solaris */ > > So the bus error occurs on line 197. > > Note that something like: > > PyObject* temp; > temp = *(PyObject **)it->dataptr; > > would also fail. > > The solution (this is what works for me, though there may be a > better way): > > PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject > **) */ > /* memcpy works byte-by-byte, so can handle an unaligned > assignment */ > memcpy(&temp, it->dataptr, sizeof(PyObject *)); > Py_XINCREF(temp); > > I'm proposing adding a macro which on Intel/AMD would be defined as: > > #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src)) > > and on alignment-required platforms as: > > #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), > sizeof(PyObject *)) > > and it would be used something like: > > COPY_PYOBJECT_PTR(&temp, it->dataptr); > > > This looks right to me, but I'll let Travis sign off on it. > > > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From josef.pktd at gmail.com Mon Oct 19 11:26:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Oct 2009 11:26:04 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? 
In-Reply-To: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: <1cd32cbb0910190826s5dd5aa3ftf1f8c5039d145256@mail.gmail.com> On Mon, Oct 19, 2009 at 10:40 AM, wrote: > I wanted to finally upgrade my numpy, so I can build scipy trunk > again, but I get test failures with numpy. And running the tests of > the previously compiled version of scipy crashes in signaltools. > > Is this a problem with my build (the usual official MingW on > WindowsXP), or are there still ABI problems in numpy trunk? > I did the build twice with (I think) clean directories and get the same result. > > Thanks, > > Josef Forgot to mention my previous version of scipy was build against numpy release 1.3.0 I recompiled scipy, and have no problems building and running scipy trunk against numpy trunk. One problem I had, was that during the build of scipy, gcc failed with unknown npymath. I had to copy the file libnpymath.a to my Python libs directory, then the build finished without problems. Josef From pgmdevlist at gmail.com Mon Oct 19 11:43:59 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 19 Oct 2009 11:43:59 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? In-Reply-To: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: On Oct 19, 2009, at 10:40 AM, josef.pktd at gmail.com wrote: > I wanted to finally upgrade my numpy, so I can build scipy trunk > again, but I get test failures with numpy. And running the tests of > the previously compiled version of scipy crashes in signaltools. The ConversionWarnings are expected. I'm probably to be blamed for the AttributeErrors (I'm testing on 2.6 where tuples do have an index Attribute), I gonna check that. From gnurser at googlemail.com Mon Oct 19 12:01:52 2009 From: gnurser at googlemail.com (George Nurser) Date: Mon, 19 Oct 2009 17:01:52 +0100 Subject: [Numpy-discussion] numpy build/installation problems ? In-Reply-To: References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> Message-ID: <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> I had the same 4 errors in genfromtext yesterday when I upgraded numpy r 7539. mac os x python 2.5.2. --George. 2009/10/19 Pierre GM : > > On Oct 19, 2009, at 10:40 AM, josef.pktd at gmail.com wrote: > >> I wanted to finally upgrade my numpy, so I can build scipy trunk >> again, but I get test failures with numpy. And running the tests of >> the previously compiled version of scipy crashes in signaltools. > > The ConversionWarnings are expected. I'm probably to be blamed for the > AttributeErrors (I'm testing on 2.6 where tuples do have an index > Attribute), I gonna check that. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Mon Oct 19 12:26:48 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 19 Oct 2009 12:26:48 -0400 Subject: [Numpy-discussion] numpy build/installation problems ? 
In-Reply-To: <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> References: <1cd32cbb0910190740x1d23ee91s2dc9a13a4f4a54ca@mail.gmail.com> <1d1e6ea70910190901g39fe5fehd30a9f49806a0fd6@mail.gmail.com> Message-ID: <23EF0D4B-433C-408F-B69D-7D8105853B37@gmail.com> On Oct 19, 2009, at 12:01 PM, George Nurser wrote: > I had the same 4 errors in genfromtext yesterday when I upgraded > numpy r 7539. > mac os x python 2.5.2. I'm on it, should be fixed in a few hours. Please, don't hesitate to open a ticket next time (so that I remember to test on 2.5 as well...). Thx From gokhansever at gmail.com Mon Oct 19 12:32:18 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 19 Oct 2009 11:32:18 -0500 Subject: [Numpy-discussion] Multiple string formatting while writing an array into a file In-Reply-To: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> References: <49d6b3500910181003w56dae6a5h76269d110e71f22e@mail.gmail.com> Message-ID: <49d6b3500910190932w4dced851s36313a294dce787b@mail.gmail.com> On Sun, Oct 18, 2009 at 12:03 PM, G?khan Sever wrote: > Hello, > > I have a relatively simple question which I couldn't figure out myself yet. > I have an array that I am writing into a file using the following savetxt > method. > > np.savetxt(fid, output_array, fmt='%12.4f', delimiter='') > > > However, I have made some changes on the code and I require to write after > 7th element of the array as integer instead of 12.4 formatted float. The > change below doesn't help me to solve the problem since I get a "ValueError: > setting an array element with a sequence." > > np.savetxt(fid, (output_array[:7], output_array[7:]), fmt=('%12.4f', > '%12d'), delimiter='') > > What would be the right approach to fix this issue? > > Thanks. > > -- > G?khan > Pre-defining a format like shown below, seemingly help me to fix: I[48]: format="" I[49]: for i in range(len(output_array)): if i<7: format += "%12.4f " else: format += "%12d " np.savetxt(fid, output_array, fmt=format) However couldn't figure out to make it work in-place. From the savetxtdocumentation: *fmt* : str or sequence of strs A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. ?Iteration %d ? %10.5f?, in which case *delimiter* is ignored Any ideas how to make this work via in-place iteration? I could add an example to the function doc once I learn how to do this. Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagmarwi at gmail.com Mon Oct 19 13:00:15 2009 From: dagmarwi at gmail.com (dagmar wismeijer) Date: Mon, 19 Oct 2009 19:00:15 +0200 Subject: [Numpy-discussion] opening pickled numarray data with numpy Message-ID: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> Hi, I've been trying to open (using numpy) old pickled data files that I once created using numarray, but I keep getting the message that there is no module numarray.generic. Is there any way I could open these datafiles without installing numarray again? Thanks in advance, Dagmar -------------- next part -------------- An HTML attachment was scrubbed... URL: From v.for.vandal at gmail.com Mon Oct 19 15:55:22 2009 From: v.for.vandal at gmail.com (Artem Serebriyskiy) Date: Mon, 19 Oct 2009 23:55:22 +0400 Subject: [Numpy-discussion] user defined types Message-ID: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> Hello! Would you please give me some examples of open source projects which use the implementation of user defined types for numpy library? 
(implementation on the C-API level) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 19 16:29:32 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 19 Oct 2009 15:29:32 -0500 Subject: [Numpy-discussion] user defined types In-Reply-To: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> References: <75d4f97a0910191255s1cb55645laadb26ccde850c7a@mail.gmail.com> Message-ID: <3d375d730910191329p32cba260sd512fbcd9ce8a6b@mail.gmail.com> On Mon, Oct 19, 2009 at 14:55, Artem Serebriyskiy wrote: > Hello! Would you please give me some examples of open source projects which > use the implementation of user defined types for numpy library? > (implementation on the C-API level) I'm not sure that anyone currently does. We do have an example in doc/newdtype_example/. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at enthought.com Mon Oct 19 17:55:17 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 19 Oct 2009 16:55:17 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADC7DEF.704@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> Message-ID: <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote: > I've filed a bug and attached a patch: > > http://projects.scipy.org/numpy/ticket/1267 > > No guarantees that I've found all of the alignment issues. I did a > grep > for "PyObject **" to find possible locations where PyObject * in > arrays > were being dereferenced. If I could write a unit test to make it fall > over on Solaris, then I fixed it, otherwise I left it alone. For > example, there are places where misaligned dereferencing is > theoretically possible (OBJECT_dot, OBJECT_compare), but a higher > level > function already did a "BEHAVED" array cast. In those cases I added a > unit test so hopefully we'll be able to catch it in the future if the > caller no longer ensures well-behavedness. This patch looks great technically. Thank you for tracking this down and correcting my error. Right now, though, the patch has too many white-space only changes in it. Could you submit a new patch that removes those changes? Thanks, -Travis -- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com oliphant at enthought.com From charlesr.harris at gmail.com Mon Oct 19 18:28:16 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Oct 2009 16:28:16 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> Message-ID: On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant wrote: > > On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote: > > > I've filed a bug and attached a patch: > > > > http://projects.scipy.org/numpy/ticket/1267 > > > > No guarantees that I've found all of the alignment issues. I did a > > grep > > for "PyObject **" to find possible locations where PyObject * in > > arrays > > were being dereferenced. If I could write a unit test to make it fall > > over on Solaris, then I fixed it, otherwise I left it alone. 
For > > example, there are places where misaligned dereferencing is > > theoretically possible (OBJECT_dot, OBJECT_compare), but a higher > > level > > function already did a "BEHAVED" array cast. In those cases I added a > > unit test so hopefully we'll be able to catch it in the future if the > > caller no longer ensures well-behavedness. > > > This patch looks great technically. Thank you for tracking this down > and correcting my error. > > Right now, though, the patch has too many white-space only changes in > it. Could you submit a new patch that removes those changes? > > The old whitespace is hard tabs and needs to be replaced anyway. The new whitespace doesn't always get the indentation right, however. That file needs a style/whitespace cleanup. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Mon Oct 19 18:29:50 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Mon, 19 Oct 2009 17:29:50 -0500 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Oct 17, 2009, at 7:49 AM, Darren Dale wrote: > numpy's functions, especially ufuncs, have had some ability to support > subclasses through the ndarray.__array_wrap__ method, which provides > masked arrays or quantities (for example) with an opportunity to set > the class and metadata of the output array at the end of an operation. > An example is > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'meters') > numpy.add(q1, q2) # yields Quantity(3, 'meters') > > At SciPy2009 we committed a change to the numpy trunk that provides a > chance to determine the class and some metadata of the output *before* > the ufunc performs its calculation, but after output array has been > established (and its data is still uninitialized). Consider: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'J') > numpy.add(q1, q2, q1) > # or equivalently: > # q1 += q2 > > With only __array_wrap__, the attempt to propagate the units happens > after q1's data was updated in place, too late to raise an error, the > data is now corrupted. __array_prepare__ solves that problem, an > exception can be raised in time. > > Now I'd like to suggest one more improvement to numpy to make its > functions more generic. Consider one more example: > > q1 = Quantity(1, 'meter') > q2 = Quantity(2, 'feet') > numpy.add(q1, q2) > > In this case, I'd like an opportunity to operate on the input arrays > on the way in to the ufunc, to rescale the second input to meters. I > think it would be a hack to try to stuff this capability into > __array_prepare__. One form of this particular example is already > supported in quantities, "q1 + q2", by overriding the __add__ method > to rescale the second input, but there are ufuncs that do not have an > associated special method. So I'd like to look into adding another > check for a special method, perhaps called __input_prepare__. My time > is really tight for the next month, so I'd rather not start if there > are strong objections, but otherwise, I'd like to try to try to get it > in in time for numpy-1.4. (Has a timeline been established?) 
> > I think it will be not too difficult to document this overall scheme: > > When calling numpy functions: > > 1) __input_prepare__ provides an opportunity to operate on the inputs > to yield versions that are compatible with the operation (they should > obviously not be modified in place) > > 2) the output array is established > > 3) __array_prepare__ is used to determine the class of the output > array, as well as any metadata that needs to be established before the > operation proceeds > > 4) the ufunc performs its operations > > 5) __array_wrap__ provides an opportunity to update the output array > based on the results of the computation > > Comments, criticisms? If PEP 3124^ were already a part of the standard > library, that could serve as the basis for generalizing numpy's > functions. But I think the PEP will not be approved in its current > form, and it is unclear when and if the author will revisit the > proposal. The scheme I'm imagining might be sufficient for our > purposes. This seems like it could work. So, basically ufuncs will take any object as input and call it's __input__prepare__ method? This should return a sub-class of an ndarray? -Travis From robert.kern at gmail.com Mon Oct 19 18:36:39 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 19 Oct 2009 17:36:39 -0500 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> Message-ID: <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> On Mon, Oct 19, 2009 at 17:28, Charles R Harris wrote: > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > wrote: >> Right now, though, the patch has too many white-space only changes in >> it. ?Could you submit a new patch that removes those changes? > > The old whitespace is hard tabs and needs to be replaced anyway. The new > whitespace doesn't always get the indentation right, however. That file > needs a style/whitespace cleanup. That's fine, but whitespace cleanup needs to be done in commits that are separate from the functional changes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Mon Oct 19 18:54:16 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Oct 2009 16:54:16 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> Message-ID: On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern wrote: > On Mon, Oct 19, 2009 at 17:28, Charles R Harris > wrote: > > > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > > > wrote: > > >> Right now, though, the patch has too many white-space only changes in > >> it. Could you submit a new patch that removes those changes? > > > > The old whitespace is hard tabs and needs to be replaced anyway. The new > > whitespace doesn't always get the indentation right, however. That file > > needs a style/whitespace cleanup. > > That's fine, but whitespace cleanup needs to be done in commits that > are separate from the functional changes. 
> > I agree, but it can be tricky to preserve hard tabs when your editor uses spaces and has hard tabs set to 8 spaces. That file is on my cleanup list anyway, I'll try to get to it this weekend. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrennie at gmail.com Mon Oct 19 19:51:53 2009 From: jrennie at gmail.com (Jason Rennie) Date: Mon, 19 Oct 2009 19:51:53 -0400 Subject: [Numpy-discussion] opening pickled numarray data with numpy In-Reply-To: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> References: <87701dd70910191000n7110c024i2809bb7f97a498fa@mail.gmail.com> Message-ID: <75c31b2a0910191651q19932d53kef173130a02cf3c1@mail.gmail.com> Try creating an empty module/class with the given name. I.e. create a 'numarray' dir off your PYTHONPATH, create an empty __init__.py file, create a 'generic.py' file in that dir and populate it with whatever class python complains about like so: #!/usr/bin/env python class MissingClass(object): pass Cheers, Jason On Mon, Oct 19, 2009 at 1:00 PM, dagmar wismeijer wrote: > Hi, > > I've been trying to open (using numpy) old pickled data files that I once > created using numarray, but I keep getting the message that there is no > module numarray.generic. > Is there any way I could open these datafiles without installing numarray > again? > > Thanks in advance, > > Dagmar > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Oct 19 23:45:35 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 19 Oct 2009 23:45:35 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: 2009/10/19 Sebastian Walter : > > I'm all for generic (u)funcs since they might come handy for me since > I'm doing lots of operation on arrays of polynomials. Just as a side note, if you don't mind my asking, what sorts of operations do you do on arrays of polynomials? In a thread on scipy-dev we're discussing improving scipy's polynomial support, and we'd be happy to get some more feedback on what they need to be able to do. Thanks! Anne From sebastian.walter at gmail.com Tue Oct 20 03:21:42 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 20 Oct 2009 09:21:42 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald wrote: > 2009/10/19 Sebastian Walter : >> >> I'm all for generic (u)funcs since they might come handy for me since >> I'm doing lots of operation on arrays of polynomials. > > Just as a side note, if you don't mind my asking, what sorts of > operations do you do on arrays of polynomials? In a thread on > scipy-dev we're discussing improving scipy's polynomial support, and > we'd be happy to get some more feedback on what they need to be able > to do. I've been reading (and commenting) that thread ;) I'm doing algorithmic differentiation by computing on truncated Taylor polynomials in the Powerbasis, i.e. 
always truncating all operations at degree D:

z(t) = \sum_{d=0}^{D-1} z_d t^d = x(t) * y(t) = \sum_{d=0}^{D-1} \left( \sum_{k=0}^{d} x_k y_{d-k} \right) t^d + O(t^D)

Using other bases does not make sense in my case since the truncation of all terms of degree higher than t^D has, AFAIK, no good counterpart for bases like Chebyshev. On the other hand, I need to be generic in the coefficients, e.g. z_d from above could be a tensor of any shape, e.g. a matrix. A typical case where I need to perform operations on arrays of polynomials is best explained in a talk I gave earlier this year: http://github.com/b45ch1/pyadolc/raw/master/doc/walter_talk_algorithmic_differentiation_in_python_with_pyadolc_pycppad_algopy.pdf on slides 7 and 8 (the class adouble "is" a Taylor polynomial). > > Thanks! > Anne > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From markus.proeller at ifm.com Tue Oct 20 05:17:47 2009 From: markus.proeller at ifm.com (markus.proeller at ifm.com) Date: Tue, 20 Oct 2009 11:17:47 +0200 Subject: [Numpy-discussion] why does binary_repr don't support arrays Message-ID: Hello, I'm always wondering why binary_repr doesn't allow arrays as input values. I always have to use a workaround like:

import numpy as np

def binary_repr(arr, width=None):
    binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten())
    str_len_max = len(np.binary_repr(arr.max(), width=width))
    str_len_min = len(np.binary_repr(arr.min(), width=width))
    if str_len_max > str_len_min:
        str_len = str_len_max
    else:
        str_len = str_len_min
    binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len))
    return binary_array.reshape(arr.shape)

Is there a reason why arrays are not supported, or is there another function that does support arrays? Thanks, Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Tue Oct 20 05:24:51 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 20 Oct 2009 11:24:51 +0200 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: I'm not very familiar with the underlying C-API of numpy, so this has to be taken with a grain of salt. The reason why I'm curious about the genericity is that it would be awesome to have: 1) ufuncs like sin, cos, exp... to work on arrays of any object (this works already) 2) funcs like dot, eig, etc. to work on arrays of objects (works for dot already, but not for eig) 3) ufuncs and funcs to work on any objects. Examples that would be nice to have working are, among others: * arrays of polynomials, i.e. arrays of objects * polynomials with tensor coefficients, i.e. objects with an underlying array structure. I thought that the most elegant way to implement that would be to have all numpy functions try to call either 1) the class function with the same name as the numpy function, 2) or, if the class function is not implemented, the member function with the same name as the numpy function, 3) if none exists, raise an exception. E.g.
1) if isinstance(x) = Foo then numpy.sin(x) would call Foo.sin(x) if it doesn't know how to handle Foo 2) similarly, for arrays of objects of type Foo: x = np.array([Foo(1), Foo(2)]) Then numpy.sin(x) should try to return npy.array([Foo.sin(xi) for xi in x]) or in case Foo.sin is not implemented as class function, return : np.array([xi.sin() for xi in x]) Therefore, I somehow expected something like that: Quantity would derive from numpy.ndarray. When calling Quantity.__new__(cls) creates the member functions __add__, __imul__, sin, exp, ... where each function has a preprocessing part and a post processing part. After the preprocessing call the original ufuncs on the base class object, e.g. __add__ Sebastian On Mon, Oct 19, 2009 at 1:55 PM, Darren Dale wrote: > On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter > wrote: >> On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale wrote: >>> numpy's functions, especially ufuncs, have had some ability to support >>> subclasses through the ndarray.__array_wrap__ method, which provides >>> masked arrays or quantities (for example) with an opportunity to set >>> the class and metadata of the output array at the end of an operation. >>> An example is >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'meters') >>> numpy.add(q1, q2) # yields Quantity(3, 'meters') >>> >>> At SciPy2009 we committed a change to the numpy trunk that provides a >>> chance to determine the class and some metadata of the output *before* >>> the ufunc performs its calculation, but after output array has been >>> established (and its data is still uninitialized). Consider: >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'J') >>> numpy.add(q1, q2, q1) >>> # or equivalently: >>> # q1 += q2 >>> >>> With only __array_wrap__, the attempt to propagate the units happens >>> after q1's data was updated in place, too late to raise an error, the >>> data is now corrupted. __array_prepare__ solves that problem, an >>> exception can be raised in time. >>> >>> Now I'd like to suggest one more improvement to numpy to make its >>> functions more generic. Consider one more example: >>> >>> q1 = Quantity(1, 'meter') >>> q2 = Quantity(2, 'feet') >>> numpy.add(q1, q2) >>> >>> In this case, I'd like an opportunity to operate on the input arrays >>> on the way in to the ufunc, to rescale the second input to meters. I >>> think it would be a hack to try to stuff this capability into >>> __array_prepare__. One form of this particular example is already >>> supported in quantities, "q1 + q2", by overriding the __add__ method >>> to rescale the second input, but there are ufuncs that do not have an >>> associated special method. So I'd like to look into adding another >>> check for a special method, perhaps called __input_prepare__. My time >>> is really tight for the next month, so I'd rather not start if there >>> are strong objections, but otherwise, I'd like to try to try to get it >>> in in time for numpy-1.4. (Has a timeline been established?) 
>>> >>> I think it will be not too difficult to document this overall scheme: >>> >>> When calling numpy functions: >>> >>> 1) __input_prepare__ provides an opportunity to operate on the inputs >>> to yield versions that are compatible with the operation (they should >>> obviously not be modified in place) >>> >>> 2) the output array is established >>> >>> 3) __array_prepare__ is used to determine the class of the output >>> array, as well as any metadata that needs to be established before the >>> operation proceeds >>> >>> 4) the ufunc performs its operations >>> >>> 5) __array_wrap__ provides an opportunity to update the output array >>> based on the results of the computation >>> >>> Comments, criticisms? If PEP 3124^ were already a part of the standard >>> library, that could serve as the basis for generalizing numpy's >>> functions. But I think the PEP will not be approved in its current >>> form, and it is unclear when and if the author will revisit the >>> proposal. The scheme I'm imagining might be sufficient for our >>> purposes. >> >> I'm all for generic (u)funcs since they might come handy for me since >> I'm doing lots of operation on arrays of polynomials. >> ?I don't quite get the reasoning though. >> Could you correct me where I get it wrong? >> * the class Quantity derives from numpy.ndarray >> * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for >> q1 = Quantity(1, 'meter') >> q2 = Quantity(2, 'J') >> by raising an exception when performing q1+=q2 > > No, Quantity does not override __iadd__ to catch this. Quantity > implements __array_prepare__ to perform the dimensional analysis based > on the identity of the ufunc and the inputs, and set the class and > dimensionality of the output array, or raise an error when dimensional > analysis fails. This approach lets quantities support all ufuncs (in > principle), not just built in numerical operations. It should also > make it easier to subclass from MaskedArray, so we could have a > MaskedQuantity without having to establish yet another suite of ufuncs > specific to quantities or masked quantities. > >> * The problem is that numpy.add(q1,q1,q2) would corrupt q1 before >> raising an exception > > That was solved by the addition of __array_prepare__ to numpy back in > August. What I am proposing now is supporting operations on arrays > that would be compatible if we had a chance to transform them on the > way into the ufunc, like "meter + foot". > > Darren > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From dsdale24 at gmail.com Tue Oct 20 07:46:42 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 20 Oct 2009 07:46:42 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 5:24 AM, Sebastian Walter wrote: > I'm not very familiar with the underlying C-API of numpy, so this has > to be taken with a grain of salt. > > The reason why I'm curious about the genericity is that it would be > awesome to have: > 1) ufuncs like sin, cos, exp... to work on arrays of any object (this > works already) > 2) funcs like dot, eig, etc, to work on arrays of objects( works for > dot already, but not for eig) > 3) ufuncs and funcs to work on any objects I think if you want to work on any object, you need something like the PEP I mentioned earlier. 
What I am proposing is to use the existing mechanism in numpy, check __array_priority__ to determine which input's __input_prepare__ to call. > examples that would be nice to work are among others: > * arrays of polynomials, i.e. arrays of objects > * polynomials with tensor coefficients, object with underlying array structure > > I thought that the most elegant way to implement that would be to have > all numpy functions try ?to call either > 1) ?the class function with the same name as the numpy function > 2) or if the class function is not implemented, the member function > with the same name as the numpy function > 3) if none exists, raise an exception > > E.g. > > 1) > if isinstance(x) = Foo > then numpy.sin(x) > would call Foo.sin(x) if it doesn't know how to handle Foo How does it numpy.sin know if it knows how to handle Foo? numpy.sin will happily process the data of subclasses of ndarray, but if you give it a quantity with units of degrees it is going to return garbage and not care. > 2) > similarly, for arrays of objects of type Foo: > ?x = np.array([Foo(1), Foo(2)]) > > Then numpy.sin(x) > should try to return npy.array([Foo.sin(xi) for xi in x]) > or in case Foo.sin is not implemented as class function, > return : np.array([xi.sin() for xi in x]) I'm not going to comment on this, except to say that it is outside the scope of my proposal. > Therefore, I somehow expected something like that: > Quantity would derive from numpy.ndarray. > When calling ?Quantity.__new__(cls) creates the member functions > __add__, __imul__, sin, exp, ... > where each function has a preprocessing part and a post processing part. > After the preprocessing call the original ufuncs on the base class > object, e.g. __add__ It is more complicated than that. Ufuncs don't call array methods, its the other way around. ndarray.__add__ calls numpy.add. If you have a custom operation to perform on numpy arrays, you write a ufunc, not a subclass. What you are proposing is a very significant change to numpy. Darren From dsdale24 at gmail.com Tue Oct 20 08:04:19 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 20 Oct 2009 08:04:19 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: Hi Travis, On Mon, Oct 19, 2009 at 6:29 PM, Travis Oliphant wrote: > > On Oct 17, 2009, at 7:49 AM, Darren Dale wrote: [...] >> When calling numpy functions: >> >> 1) __input_prepare__ provides an opportunity to operate on the inputs >> to yield versions that are compatible with the operation (they should >> obviously not be modified in place) >> >> 2) the output array is established >> >> 3) __array_prepare__ is used to determine the class of the output >> array, as well as any metadata that needs to be established before the >> operation proceeds >> >> 4) the ufunc performs its operations >> >> 5) __array_wrap__ provides an opportunity to update the output array >> based on the results of the computation >> >> Comments, criticisms? If PEP 3124^ were already a part of the standard >> library, that could serve as the basis for generalizing numpy's >> functions. But I think the PEP will not be approved in its current >> form, and it is unclear when and if the author will revisit the >> proposal. The scheme I'm imagining might be sufficient for our >> purposes. > > This seems like it could work. ? ?So, basically ufuncs will take any > object as input and call it's __input__prepare__ method? ? This should > return a sub-class of an ndarray? 
ufuncs would call __input_prepare__ on the input declaring the highest __array_priority__, just like ufuncs do with __array_wrap__, passing a tuple of inputs and the ufunc itself (provided for context). __input_prepare__ would return a tuple of inputs that the ufunc would use for computation, I'm not sure if these need to be arrays or not, I think I can give a better answer once I start the implementation (next few days I think). Darren From mdroe at stsci.edu Tue Oct 20 09:02:46 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 20 Oct 2009 09:02:46 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> Message-ID: <4ADDB4F6.3020700@stsci.edu> I've resubmitted the patch without whitespace-only changes. For what it's worth, I had followed the directions here: http://projects.scipy.org/numpy/wiki/EmacsSetup which say to perform "untabify" and "whitespace-cleanup". Are those not current? I had added these to my pre-save hooks under my numpy tree. Cheers, Mike Charles R Harris wrote: > > > On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern > wrote: > > On Mon, Oct 19, 2009 at 17:28, Charles R Harris > > wrote: > > > > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant > > > > wrote: > > >> Right now, though, the patch has too many white-space only > changes in > >> it. Could you submit a new patch that removes those changes? > > > > The old whitespace is hard tabs and needs to be replaced anyway. > The new > > whitespace doesn't always get the indentation right, however. > That file > > needs a style/whitespace cleanup. > > That's fine, but whitespace cleanup needs to be done in commits that > are separate from the functional changes. > > > I agree, but it can be tricky to preserve hard tabs when your editor > uses spaces and has hard tabs set to 8 spaces. That file is on my > cleanup list anyway, I'll try to get to it this weekend. > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From peridot.faceted at gmail.com Tue Oct 20 11:04:23 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 11:04:23 -0400 Subject: [Numpy-discussion] Another suggestion for making numpy's functions generic In-Reply-To: References: Message-ID: 2009/10/20 Sebastian Walter : > On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald > wrote: >> 2009/10/19 Sebastian Walter : >>> >>> I'm all for generic (u)funcs since they might come handy for me since >>> I'm doing lots of operation on arrays of polynomials. >> >> Just as a side note, if you don't mind my asking, what sorts of >> operations do you do on arrays of polynomials? In a thread on >> scipy-dev we're discussing improving scipy's polynomial support, and >> we'd be happy to get some more feedback on what they need to be able >> to do. > > I've been reading (and commenting) that thread ;) > ?I'm doing algorithmic differentiation by computing on truncated > Taylor polynomials in the Powerbasis, > ?i.e. 
always truncating all operation at degree D > z(t) = \sum_d=0^{D-1} z_d t^d = ?x(t) * y(t) = \sum_{d=0}^{D-1} > \sum_{k=0}^d x_k * y_{d-k} + O(t^D) > > Using other bases does not make sense in my case since the truncation > of all terms of higher degree than t^D > has afaik no good counterpart for bases like chebycheff. > On the other hand, I need to be generic in the coefficients, e.g. > z_d from above could be a tensor of any shape, ?e.g. ?a matrix. In fact, truncating at degree D for Chebyshev polynomials works exactly the same way as it does for power polynomials, and if what you care about is function approximation, it has much nicer behaviour. But if what you care about is really truncated Taylor polynomials, there's no beating the power basis. I realize that arrays of polynomial objects are handy from a bookkeeping point of view, but how does an array of polynomials, say of shape (N,), differ from a single polynomial with coefficients of shape (N,)? I think we need to provide the latter in our polynomial classes, but as you point out, getting ufunc support for the former is nontrivial. Anne From charlesr.harris at gmail.com Tue Oct 20 12:13:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 20 Oct 2009 10:13:53 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: <4ADDB4F6.3020700@stsci.edu> References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom wrote: > I've resubmitted the patch without whitespace-only changes. > > For what it's worth, I had followed the directions here: > > http://projects.scipy.org/numpy/wiki/EmacsSetup > > which say to perform "untabify" and "whitespace-cleanup". Are those not > current? I had added these to my pre-save hooks under my numpy tree. > > The problem is that hard tabs have crept into the file. The strict approach in this case is to make two patches: the first cleans up the hard tabs, the second fixes the problems. How about I fix up the hard tabs and then you can make another patch? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 20 13:16:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Oct 2009 13:16:18 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <4ADAE897.1070000@bigpond.net.au> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> Message-ID: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: > Hi Ga?l, > > If you've got a 1D array/vector called "a", I think the normal idiom is > > np.dot(a,a) > > For the more general case, I think > np.tensordot(a, a, axes=something_else) > should do it, where you should be able to figure out something_else for > your particular case. Is it really possible to get the same as np.sum(a*a, axis) with tensordot if a.ndim=2 ? Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) Josef > > Gary R. > > Gael Varoquaux wrote: >> On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.pktd at gmail.com wrote: >>>>>> Why aren't you using logaddexp ufunc from numpy? 
>> >>>>> Maybe because it is difficult to find, it doesn't have its own docs entry. >> >> Speaking of which... >> >> I thought that there was a readily-written, optimized function (or ufunc) >> in numpy or scipy that calculated the sum of squares for an array >> (possibly along an axis). However, I cannot find it. >> >> Is there something similar? If not, it is not the end of the world, the >> operation is trivial to write. >> >> Cheers, >> >> Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From peridot.faceted at gmail.com Tue Oct 20 15:09:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 15:09:36 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: 2009/10/20 : > On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >> Hi Ga?l, >> >> If you've got a 1D array/vector called "a", I think the normal idiom is >> >> np.dot(a,a) >> >> For the more general case, I think >> np.tensordot(a, a, axes=something_else) >> should do it, where you should be able to figure out something_else for >> your particular case. > > Is it really possible to get the same as np.sum(a*a, axis) with > tensordot if a.ndim=2 ? > Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) It seems like this would be a good place to apply numpy's higher-dimensional ufuncs: what you want seems to just be the vector inner product, broadcast over all other dimensions. In fact I believe this is implemented in numpy as a demo: numpy.umath_tests.inner1d should do the job. Anne From josef.pktd at gmail.com Tue Oct 20 15:28:45 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Oct 2009 15:28:45 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> On Tue, Oct 20, 2009 at 3:09 PM, Anne Archibald wrote: > 2009/10/20 ?: >> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >>> Hi Ga?l, >>> >>> If you've got a 1D array/vector called "a", I think the normal idiom is >>> >>> np.dot(a,a) >>> >>> For the more general case, I think >>> np.tensordot(a, a, axes=something_else) >>> should do it, where you should be able to figure out something_else for >>> your particular case. >> >> Is it really possible to get the same as np.sum(a*a, axis) ?with >> tensordot ?if a.ndim=2 ? >> Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) > > It seems like this would be a good place to apply numpy's > higher-dimensional ufuncs: what you want seems to just be the vector > inner product, broadcast over all other dimensions. In fact I believe > this is implemented in numpy as a demo: numpy.umath_tests.inner1d > should do the job. 
Thanks, this works well, needs core in name (I might have to learn how to swap or roll axis to use this for more than 2d.) >>> np.core.umath_tests.inner1d(a.T, b.T) array([12, 8, 16]) >>> (a*b).sum(0) array([12, 8, 16]) >>> np.core.umath_tests.inner1d(a.T, b.T) array([12, 8, 16]) >>> (a*a).sum(0) array([126, 166, 214]) >>> np.core.umath_tests.inner1d(a.T, a.T) array([126, 166, 214]) What's the status on these functions? They don't show up in the docs or help, except for a brief mention in the c-api: http://docs.scipy.org/numpy/docs/numpy-docs/reference/c-api.generalized-ufuncs.rst/ Are they for public consumption and should go into the docs? Or do they remain a hidden secret, to force users to read the mailing lists? Josef > > Anne > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Tue Oct 20 16:05:22 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 20 Oct 2009 14:05:22 -0600 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom wrote: > >> I've resubmitted the patch without whitespace-only changes. >> >> For what it's worth, I had followed the directions here: >> >> http://projects.scipy.org/numpy/wiki/EmacsSetup >> >> which say to perform "untabify" and "whitespace-cleanup". Are those not >> current? I had added these to my pre-save hooks under my numpy tree. >> >> > The problem is that hard tabs have crept into the file. The strict approach > in this case is to make two patches: the first cleans up the hard tabs, the > second fixes the problems. > > How about I fix up the hard tabs and then you can make another patch? > > I applied the patch. Can you test it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Tue Oct 20 16:22:02 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 20 Oct 2009 16:22:02 -0400 Subject: [Numpy-discussion] object array alignment issues In-Reply-To: References: <4AD75061.2020908@stsci.edu> <4ADB043F.7060608@stsci.edu> <4ADC7DEF.704@stsci.edu> <1813038E-510C-443D-A97D-3399C0CA1AF8@enthought.com> <3d375d730910191536l67f76ce6r3bbf3d4e3ee7295f@mail.gmail.com> <4ADDB4F6.3020700@stsci.edu> Message-ID: <4ADE1BEA.8060207@stsci.edu> Thanks. It's passing the related unit tests on Sparc SunOS 5, and Linux x86. Cheers, Mike Charles R Harris wrote: > > > On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris > > wrote: > > > > On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom > > wrote: > > I've resubmitted the patch without whitespace-only changes. > > For what it's worth, I had followed the directions here: > > http://projects.scipy.org/numpy/wiki/EmacsSetup > > which say to perform "untabify" and "whitespace-cleanup". Are > those not > current? I had added these to my pre-save hooks under my > numpy tree. > > > The problem is that hard tabs have crept into the file. The strict > approach in this case is to make two patches: the first cleans up > the hard tabs, the second fixes the problems. 
> > How about I fix up the hard tabs and then you can make another patch? > > > I applied the patch. Can you test it? > > Chuck > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From peridot.faceted at gmail.com Tue Oct 20 18:59:42 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 20 Oct 2009 18:59:42 -0400 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> <1cd32cbb0910201228t32d4be5agff1a845474481b68@mail.gmail.com> Message-ID: 2009/10/20 : > On Tue, Oct 20, 2009 at 3:09 PM, Anne Archibald > wrote: >> 2009/10/20 : >>> On Sun, Oct 18, 2009 at 6:06 AM, Gary Ruben wrote: >>>> Hi Ga?l, >>>> >>>> If you've got a 1D array/vector called "a", I think the normal idiom is >>>> >>>> np.dot(a,a) >>>> >>>> For the more general case, I think >>>> np.tensordot(a, a, axes=something_else) >>>> should do it, where you should be able to figure out something_else for >>>> your particular case. >>> >>> Is it really possible to get the same as np.sum(a*a, axis) with >>> tensordot if a.ndim=2 ? >>> Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) >> >> It seems like this would be a good place to apply numpy's >> higher-dimensional ufuncs: what you want seems to just be the vector >> inner product, broadcast over all other dimensions. In fact I believe >> this is implemented in numpy as a demo: numpy.umath_tests.inner1d >> should do the job. > > Thanks, this works well, needs core in name > (I might have to learn how to swap or roll axis to use this for more than 2d.) > >>>> np.core.umath_tests.inner1d(a.T, b.T) > array([12, 8, 16]) >>>> (a*b).sum(0) > array([12, 8, 16]) >>>> np.core.umath_tests.inner1d(a.T, b.T) > array([12, 8, 16]) >>>> (a*a).sum(0) > array([126, 166, 214]) >>>> np.core.umath_tests.inner1d(a.T, a.T) > array([126, 166, 214]) > > > What's the status on these functions? They don't show up in the docs > or help, except for > a brief mention in the c-api: > > http://docs.scipy.org/numpy/docs/numpy-docs/reference/c-api.generalized-ufuncs.rst/ > > Are they for public consumption and should go into the docs? > Or do they remain a hidden secret, to force users to read the mailing lists? I think the long-term goal is to have a completely ufuncized linear algebra library, and I think these functions are just tests of the gufunc features. In principle, at least, it wouldn't actually be too hard to fill out a full linear algebra library, since the per "element" linear algebra operations already exist. Unfortunately the code should exist for many data types, and the code generator scheme currently used to do this for ordinary ufuncs is a barrier to contributions. It might be worth banging out a doubles-only generic ufunc linear algebra library (in addition to numpy.linalg/scipy.linalg), just as a proof of concept. 
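To make that concrete, here is a small sketch of what the gufunc machinery already buys you, using the inner1d demo function shown above. This assumes a current trunk build where numpy.core.umath_tests gets compiled in; inner1d is only a test fixture, so treat it as illustrative rather than supported API:

import numpy as np
from numpy.core.umath_tests import inner1d   # demo gufunc, signature (i),(i)->()

a = np.random.rand(5, 4, 3)
b = np.random.rand(5, 4, 3)

# inner product over the last axis, broadcast over all leading axes,
# i.e. the "sum of squares along an axis" that started this thread
ss = inner1d(a, a)                           # shape (5, 4)
assert np.allclose(ss, (a * a).sum(axis=-1))

# the same broadcasting a full gufunc linalg library would give for free
print inner1d(a, b).shape                    # -> (5, 4)

A doubles-only library of such functions would mostly be a matter of writing the inner loops; the broadcasting and axis handling shown here come from the gufunc core itself.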
Anne From mathieu at mblondel.org Wed Oct 21 01:44:39 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 14:44:39 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy Message-ID: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Hello, About one year ago, a high-level, objected-oriented SIMD API was added to Mono. For example, there is a class Vector4f for vectors of 4 floats and this class implements methods such as basic operators, bitwise operators, comparison operators, min, max, sqrt, shuffle directly using SIMD operations. You can have a look at the following pages for further details: http://tirania.org/blog/archive/2008/Nov-03.html (blog post) http://go-mono.com/docs/index.aspx?tlink=0 at N%3aMono.Simd (API reference) It seems to me that such an API would possibly be a great fit in Numpy too. It would also be possible to add classes that don't directly map to SIMD types. For example, Vector8f can easily be implemented in terms of 2 Vector4f. In addition to vectors, additional API may be added to support operations on matrices of fixed width or height. I search the archives for similar discussions but I only found a discussion about memory-alignment so I hope I am not restarting an existing discussion here. Memory-alignment is an import related issue since non-aligned movs can tank the performance. Any thoughts? I don't know the Numpy code base yet but I'm willing to help if such an effort is started. Thanks, Mathieu From charlesr.harris at gmail.com Wed Oct 21 03:13:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 01:13:11 -0600 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: On Tue, Oct 20, 2009 at 11:44 PM, Mathieu Blondel wrote: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. You can have a look at the following > pages for further details: > > http://tirania.org/blog/archive/2008/Nov-03.html (blog post) > http://go-mono.com/docs/index.aspx?tlink=0 at N%3aMono.Simd (API reference) > > It seems to me that such an API would possibly be a great fit in Numpy > too. It would also be possible to add classes that don't directly map > to SIMD types. For example, Vector8f can easily be implemented in > terms of 2 Vector4f. In addition to vectors, additional API may be > added to support operations on matrices of fixed width or height. > > I search the archives for similar discussions but I only found a > discussion about memory-alignment so I hope I am not restarting an > existing discussion here. Memory-alignment is an import related issue > since non-aligned movs can tank the performance. > > Any thoughts? I don't know the Numpy code base yet but I'm willing to > help if such an effort is started. > > The licenses look all hodge-podge: - The C# compiler is dual-licensed under the MIT/X11 license and the GNU General Public License (*http://www.opensource.org/licenses/gpl-license.html*) (GPL). - The tools are released under the terms of the GNU General Public License (* http://www.opensource.org/licenses/gpl-license.html*) (GPL). 
- The runtime libraries are under the GNU Library GPL 2.0 (*http://www.gnu.org/copyleft/library.html#TOC1*) (LGPL 2.0). - The class libraries are released under the terms of the MIT X11 (*http://www.opensource.org/licenses/mit-license.html*) license. - ASP.NET MVC and ASP.NET AJAX client software are released by Microsoft under the open source Microsoft Permissive License (*http://www.opensource.org/licenses/ms-pl.html*). However, if the good stuff is in the class libraries, that looks OK. But that still leaves it in C#, no? You could have a looksie to see how it would fit into, say, Cython. I don't know where it would go in numpy, maybe some of the vector bits would be suitable for some generalized ufuncs. Apart from that, I believe ATLAS can already make use of SIMD, but I have no idea how far it goes in using the full feature set. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed Oct 21 02:56:52 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 15:56:52 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> Hi Mathieu, Mathieu Blondel wrote: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. You can have a look at the following > pages for further details: > > http://tirania.org/blog/archive/2008/Nov-03.html (blog post) > I am not sure how this could be applied to numpy case ? From what I can understand, this cannot be directly applied to python: the described changes are vm changes, and we cannot do anything at python vm level (I would guess the python vm to be too primitive to implement this kind of things anyway). I don't see how the high level API at the assembly level (Mono.Simd) would work either: the overhead of python and numpy to deal with 4 or 8 items in python would make this API useless from a speed POV. Implementing some numpy internal code in SIMD, and having a 'object oriented' C API for SIMD would indeed be nice - gcc provides SSE intrinsics, as well as visual studio (although the later seems quite buggy if I believe this link: http://www.virtualdub.org/blog/pivot/entry.php?id=162), which would make this in principle relatively easy. This is only my opinion (read other numpy dev may disagree), but I think that the numpy C code should be cleaned up before adding this kind of features: there is still too much coupling between the pure C core and the python machinery. Also, any use of SIMD code should be done at runtime IMHO (so that one binary can be used on multiple architectures), which has some issues on its own from a cross platform POV. David From mathieu at mblondel.org Wed Oct 21 03:29:44 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 16:29:44 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <7e1472660910210029g6488a64eo102c9ccf7945acfc@mail.gmail.com> > The licenses look all hodge-podge: [...] 
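To make the binding idea a bit more concrete, the Python side could look roughly like the sketch below. Everything here is made up for illustration: 'libvec4f' and 'vec4f_add' are hypothetical names for a small C library one would compile separately with SSE (or AltiVec/NEON) intrinsics; only the ctypes glue is shown, not the C file itself.

import numpy as np
from numpy import ctypeslib
import ctypes

# hypothetical shared library built from C code using _mm_add_ps and friends
_lib = ctypeslib.load_library('libvec4f', '.')
_lib.vec4f_add.restype = None
_lib.vec4f_add.argtypes = [
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypeslib.ndpointer(np.float32, flags='C_CONTIGUOUS,ALIGNED'),
    ctypes.c_size_t,
]

def simd_add(a, b, out):
    # the compiled routine does the actual 4-floats-at-a-time work
    _lib.vec4f_add(a, b, out, a.size)
    return out

Whether the Python call overhead swallows the SIMD gain is exactly the open question.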
> However, if the good stuff is in the class libraries, that looks OK. But > that still leaves it in C#, no? I was mentioning Mono just to show that "this has been done" and also their API reference can serve as inspiration to design Numpy's own API. Mathieu From mathieu at mblondel.org Wed Oct 21 03:48:22 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 16:48:22 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Hi David, On Wed, Oct 21, 2009 at 3:56 PM, David Cournapeau wrote: > I am not sure how this could be applied to numpy case ? From what I can > understand, this cannot be directly applied to python: the described > changes are vm changes, and we cannot do anything at python vm level (I > would guess the python vm to be too primitive to implement this kind of > things anyway). Yes in Mono this is realized with Just-In-Time compilation, so at the VM level. The reason I thought of Numpy rather than Cython is that Python's support for vectors/matrices is limited and Numpy has kind of become the standard for that in the Python world. I saw the video of Peter Norvig at the last Scipy conference who was suggesting to merge Numpy into Cython. The SIMD API would be an argument in favor of this too because of the possible interactions between such a SIMD API and an array API. > I don't see how the high level API at the assembly level (Mono.Simd) > would work either: the overhead of python and numpy to deal with 4 or 8 > items in python would make this API useless from a speed POV. My original idea was to write the code in C with Intel/Alvitec/Neon intrinsics and have this code binded to be able to call it from Python. So the SIMD code would be compiled already, ready to be called from Python. Like you said, there's a risk that the overhead of calling Python is bigger than the benefit of using SIMD instructions. If it's worth trying out, an experiment can be made with Vector4f to see if it's even worth continuing with other types. > This is only my opinion (read other numpy dev may disagree), but I think > that the numpy C code should be cleaned up before adding this kind of > features: there is still too much coupling between the pure C core and > the python machinery. Also, any use of SIMD code should be done at > runtime IMHO (so that one binary can be used on multiple architectures), > which has some issues on its own from a cross platform POV. I recently used SIMD instructions for a project and I realized that they cannot be activated in a standard Debian package, because the package has to remain general-purpose. So people who want to benefit the speed up have to compile my project from source... I also see that sometimes packages are available in different flavors (-msse, -msse2...). 
Mathieu From pav+sp at iki.fi Wed Oct 21 04:24:09 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 21 Oct 2009 08:24:09 +0000 (UTC) Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Message-ID: Wed, 21 Oct 2009 16:48:22 +0900, Mathieu Blondel wrote: [clip] > My original idea was to write the code in C with Intel/Alvitec/Neon > intrinsics and have this code binded to be able to call it from Python. > So the SIMD code would be compiled already, ready to be called from > Python. Like you said, there's a risk that the overhead of calling > Python is bigger than the benefit of using SIMD instructions. If it's > worth trying out, an experiment can be made with Vector4f to see if it's > even worth continuing with other types. The overhead is quickly checked for multiplication with numpy arrays of varying size, without SSE: Overhead per iteration (ms): 1.6264549101 Time per array element (ms): 0.000936947636565 Cross-over point: 1735.90801303 #---------------------------------------------- import numpy as np from scipy import optimize import time import matplotlib.pyplot as plt def main(): data = [] for n in np.unique(np.logspace(0, 5, 20).astype(int)): print n m = 100 reps = 5 times = [] for rep in xrange(reps): x = np.zeros((n,), dtype=np.float_) start = time.time() #------------------ for k in xrange(m): x *= 1.1 #------------------ end = time.time() times.append(end - start) t = min(times) data.append((n, t)) data = np.array(data) def model(z): n, t = data.T overhead, per_elem = z return np.log10(t) - np.log10(overhead + per_elem * n) z, ier = optimize.leastsq(model, [1., 1.]) overhead, per_elem = z print "" print "Overhead per iteration (ms):", overhead*1e3 print "Time per array element (ms):", per_elem*1e3 print "Cross-over point: ", overhead/per_elem n = np.logspace(0, 5, 500) plt.loglog(data[:,0], data[:,0]/data[:,1], 'x', label=r'measured') plt.loglog(n, n/(overhead + per_elem*n), 'k-', label=r'fit to $t = a + b n$') plt.xlabel(r'$n$') plt.ylabel(r'ops/second') plt.grid(1) plt.legend() plt.show() if __name__ == "__main__": main() From david at ar.media.kyoto-u.ac.jp Wed Oct 21 04:05:24 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 17:05:24 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> Message-ID: <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> Mathieu Blondel wrote: > I saw the video of Peter Norvig at the last Scipy conference who was > suggesting to merge Numpy into Cython. The SIMD API would be an > argument in favor of this too because of the possible interactions > between such a SIMD API and an array API. > Hm, I don't remember this - I guess I would have to look at the video. Do you know at which point of the presentation he discussed about SIMD ? > My original idea was to write the code in C with Intel/Alvitec/Neon > intrinsics and have this code binded to be able to call it from > Python. So the SIMD code would be compiled already, ready to be called > from Python. 
Like you said, there's a risk that the overhead of > calling Python is bigger than the benefit of using SIMD instructions. > If it's worth trying out, an experiment can be made with Vector4f to > see if it's even worth continuing with other types. > I am quite confident that the overhead will be way too significant for this approach to be useful. If you have two python objects, using + on it will induce at least one function call, and most likely several function calls at the python level. Python function calls are painfully slow (several thousand cycles per call in the most optimistic case). Python overhead is several order of magnitude bigger than what you can earn between SIMD and straightforward C. The only way I can see to make this work is to generate SIMD code from python (which would be a poor man's replacement for a JIT in a way), there was a presentation following this direction at scipy 09 conference. > I recently used SIMD instructions for a project and I realized that > they cannot be activated in a standard Debian package, because the > package has to remain general-purpose. So people who want to benefit > the speed up have to compile my project from source... > Yes - that's unacceptable IMHO. The real solution is to include all the code at build time, detect at *runtime* which ISA is supported, and select the functions accordingly. The problem is that loading shared code at runtime in a cross platform way is complicated - python already does it, but unfortunately does not provide a C API for it AFAIK, so we would have to re-implement it in python. cheers, David From mathieu at mblondel.org Wed Oct 21 04:38:20 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 17:38:20 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> On Wed, Oct 21, 2009 at 5:05 PM, David Cournapeau wrote: > Mathieu Blondel wrote: >> I saw the video of Peter Norvig at the last Scipy conference who was >> suggesting to merge Numpy into Cython. The SIMD API would be an >> argument in favor of this too because of the possible interactions >> between such a SIMD API and an array API. >> > > Hm, I don't remember this - I guess I would have to look at the video. > Do you know at which point of the presentation he discussed about SIMD ? Peter Norvig suggested to merge Numpy into Cython but he didn't mention SIMD as the reason (this one is from me). Sorry if I wasn't clear. IIRC, the reason was to help democratize Numpy and make it easier for users to install it. He went on to say that he talked about it with Guido and apparently the main barrier was the release cycle. Please check the video as I'm telling you that from memory. 
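A crude Python-side sketch of the runtime-detection idea raised above (build every variant, then pick one based on what the CPU reports) is below. It only knows how to read Linux's /proc/cpuinfo and deliberately reports nothing elsewhere, which is exactly the cross-platform gap being discussed; it is an illustration, not a proposed implementation:

#----------------------------------------------
import os

def cpu_flags():
    """Return the CPU feature flags, or an empty set if unknown."""
    if not os.path.exists("/proc/cpuinfo"):
        return set()           # non-Linux: a real check would need CPUID
    flags = set()
    for line in open("/proc/cpuinfo"):
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags

flags = cpu_flags()
# Note: SSE3 is reported as "pni" in /proc/cpuinfo.
for isa in ("sse", "sse2", "pni", "ssse3", "sse4_1"):
    print("%-7s %s" % (isa, "yes" if isa in flags else "no/unknown"))
#----------------------------------------------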
Mathieu From david at ar.media.kyoto-u.ac.jp Wed Oct 21 04:23:58 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 17:23:58 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> Message-ID: <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> Mathieu Blondel wrote: > He went on to say that he talked about > it with Guido and apparently the main barrier was the release cycle. > Please check the video as I'm telling you that from memory. > Ah, I think you are mistaken, then - he referred to merging numpy and scipy into python during his talk, not cython. For the reason you gave, including numpy into python is not really on the radar. It was tried unsuccessfully some time ago, and the PEP buffer (3118 IIRC) is a much more low-level API to share "typed" buffer at the C level. Hopefully, numpy will be built on top of this at some point. Scipy is very unlikely IMHO - I doubt depending on fortran code would be acceptable for python. David From mathieu at mblondel.org Wed Oct 21 05:10:09 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 21 Oct 2009 18:10:09 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4ADEC51E.7080309@ar.media.kyoto-u.ac.jp> Message-ID: <7e1472660910210210l57fe086eqae02c7d9aad056e4@mail.gmail.com> On Wed, Oct 21, 2009 at 5:23 PM, David Cournapeau wrote: > Ah, I think you are mistaken, then - he referred to merging numpy and > scipy into python during his talk, not cython. Oh, I meant to say CPython (the default implementation of Python), not Cython. I didn't realize that they were different projects. So the method dispatch seems to be a great obstacle to an object-oriented SIMD API. That would seem more feasible in C++ with non-virtual methods. Java has final methods, which can be useful information to the JIT. C# seems to have "sealed" methods. Interestingly, the Mono.SIMD API uses static methods, which I guess is to avoid the dispatch problem. But it makes the code look uglier. For example, instead of a + b, you have to do Vector4f.Addition(a, b). Mathieu From faltet at pytables.org Wed Oct 21 05:12:21 2009 From: faltet at pytables.org (Francesc Alted) Date: Wed, 21 Oct 2009 11:12:21 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <200910211112.22037.faltet@pytables.org> A Wednesday 21 October 2009 07:44:39 Mathieu Blondel escrigu?: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. 
For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. [clip] It is important to stress out that all the above operations, except probably sqrt, are all memory-bound operations, and that implementing them for numpy would not represent a significant improvement at all. This is because numpy is a package that works mainly with arrays in an element-wise way, and in this scenario, the time to transmit data to CPU dominates, by and large, over the time to perform operations. Among other places, you can find a detailed explication of this fact in my presentation at latest EuroSciPy: http://www.pytables.org/docs/StarvingCPUs.pdf Cheers, -- Francesc Alted From robince at gmail.com Wed Oct 21 05:46:54 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 10:46:54 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard Message-ID: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> Hi, I was wondering what the recommended way to run numpy/scipy on mac os x 10.6 is. I understood previously it was recommended to use python.org python and keep everything seperate from the system python, which worked well. But now I would like to have a 64 bit python and numpy, and there isn't one available from python.org. Also I think the python.org ones are built against the 10.4 SDK which I understand requires using gcc 4.0 - I was keen to try 4.2 to see if some of the differential performance I've seen between c and fortran goes away. So I guess the choices are system python with a virtualenv or a macports built python (or a hand built python). I'm thinking of using macports at the moment but I'm not sure how to handle preventing macports numpy from installing so I can use svn numpy. I'm not sure how the virtualenv will work with packaged installers - (ie how could I tell the wx installer to install into the virtualenv). I was wondering what others do? Cheers Robin From david at ar.media.kyoto-u.ac.jp Wed Oct 21 05:28:09 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 18:28:09 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> Message-ID: <4ADED429.4030809@ar.media.kyoto-u.ac.jp> Robin wrote: > Hi, > > I was wondering what the recommended way to run numpy/scipy on mac os > x 10.6 is. I understood previously it was recommended to use > python.org python and keep everything seperate from the system python, > which worked well. You can simply use the --user option to the install command: instead of installing in /System, it will install numpy (or any other package) in $HOME/.local, and you don't need to update PYTHONPATH, as python knows about this location. This is a new feature in 2.6, and can be a simple alternative to virtualenv if you don't need the other features (sandboxing, etc...). 
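For reference, a quick way to see where the per-user scheme puts packages and which numpy actually gets imported afterwards (assumes Python 2.6+, where site.USER_BASE and site.USER_SITE exist):

#----------------------------------------------
# After installing with, e.g.:  python setup.py install --user
import site
import numpy

print("user base:       " + site.USER_BASE)   # typically ~/.local
print("user site dir:   " + site.USER_SITE)
print("numpy loaded as: " + numpy.__file__)
print("numpy version:   " + numpy.__version__)
#----------------------------------------------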
I think you need a very recent version of virtualenv on slow leopard, if you decide to go this route, David From robince at gmail.com Wed Oct 21 06:58:30 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 11:58:30 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <4ADED429.4030809@ar.media.kyoto-u.ac.jp> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> Message-ID: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> On Wed, Oct 21, 2009 at 10:28 AM, David Cournapeau wrote: > Robin wrote: >> Hi, >> >> I was wondering what the recommended way to run numpy/scipy on mac os >> x 10.6 is. I understood previously it was recommended to use >> python.org python and keep everything seperate from the system python, >> which worked well. > > You can simply use the --user option to the install command: instead of > installing in /System, it will install numpy (or any other package) in > $HOME/.local, and you don't need to update PYTHONPATH, as python knows > about this location. Thanks - that looks ideal. I take it $HOME/.local is searched first so numpy will be used fromt here in preference to the system numpy. My only worry is with installer packages - I'm thinking mainly of wxpython. Is there a way I can get that package to install in $HOME/.local. (The installer only seems to let you choose a drive). Also - if I build for example vim against the system python, will I be able to see packages in $HOME/.local from the python interpreter inside vim? Cheers Robin From david at ar.media.kyoto-u.ac.jp Wed Oct 21 06:41:18 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 21 Oct 2009 19:41:18 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> Robin wrote: > > Thanks - that looks ideal. I take it $HOME/.local is searched first so > numpy will be used fromt here in preference to the system numpy. > Yes, unless framework-enabled python does something 'fishy' (I think framework vs convention python have different rules w.r.t. sys.path). As always, in doubt, you should check with numpy.__file__ which one is loaded. > My only worry is with installer packages - I'm thinking mainly of > wxpython. Is there a way I can get that package to install in > $HOME/.local. (The installer only seems to let you choose a drive). > is wxpython supported on python 64 bits ? I don't know if you can install a .mpkg in $HOME/.local. It is not supported by python AFAIK, and I think you would have to hack something to make it work. May just be easier to build by yourself. > Also - if I build for example vim against the system python, will I be > able to see packages in $HOME/.local from the python interpreter > inside vim? I don't know about vim-python interaction: doesn't vim uses its own python embedded within vim process ? You would have to check sys.path and similar variables, as well as vim doc. 
David From robince at gmail.com Wed Oct 21 07:24:12 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 12:24:12 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> Message-ID: <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Thanks... On Wed, Oct 21, 2009 at 11:41 AM, David Cournapeau wrote: > Robin wrote: >> >> Thanks - that looks ideal. I take it $HOME/.local is searched first so >> numpy will be used fromt here in preference to the system numpy. >> > > Yes, unless framework-enabled python does something 'fishy' (I think > framework vs convention python have different rules w.r.t. sys.path). As > always, in doubt, you should check with numpy.__file__ which one is loaded. It seems it does... the built in numpy which is in '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', comes before $HOME/.local in sys.path so I think system numpy will always be picked up over my own installed version. robin-mbp:~ robince$ /usr/bin/python2.6 -c "import sys; print sys.path" ['', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python26.zip', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-old', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload', '/Users/robince/.local/lib/python2.6/site-packages', '/Library/Python/2.6/site-packages', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/PyObjC', '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/wx-2.8-mac-unicode'] So I guess virtualenv or macports? Cheers Robin From cournape at gmail.com Wed Oct 21 08:27:46 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Oct 2009 21:27:46 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <200910211112.22037.faltet@pytables.org> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> Message-ID: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> On Wed, Oct 21, 2009 at 6:12 PM, Francesc Alted wrote: > A Wednesday 21 October 2009 07:44:39 Mathieu Blondel escrigu?: >> Hello, >> >> About one year ago, a high-level, objected-oriented SIMD API was added >> to Mono. For example, there is a class Vector4f for vectors of 4 >> floats and this class implements methods such as basic operators, >> bitwise operators, comparison operators, min, max, sqrt, shuffle >> directly using SIMD operations. > [clip] > > It is important to stress out that all the above operations, except probably > sqrt, are all memory-bound operations, and that implementing them for numpy > would not represent a significant improvement at all. 
> This is because numpy is a package that works mainly with arrays in an > element-wise way, and in this scenario, the time to transmit data to CPU > dominates, by and large, over the time to perform operations. Is it general, or just for simple operations in numpy and ufunc ? I remember that for music softwares, SIMD used to matter a lot, even for simple bus mixing (which is basically a ax+by with a, b scalars and x y the input arrays). Do you have any interest in adding SIMD to some core numpy (transcendental functions). If so, I would try to go back to the problem of runtime SSE detection and loading of optimized shared library in a cross-platform way - that's something which should be done at some point in numpy, and people requiring it would be a good incentive. David From matthieu.brucher at gmail.com Wed Oct 21 08:37:02 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Oct 2009 14:37:02 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> Message-ID: > Is it general, or just for simple operations in numpy and ufunc ? I > remember that for music softwares, SIMD used to matter a lot, even for > simple bus mixing (which is basically a ax+by with a, b scalars and x > y the input arrays). Indeed, it shouldn't :| I think the main reason might not be SIMD, but the additional hypothesis you put on the arrays (aliasing). This way, todays compilers may not even need the actual SIMD instructions. I have the same opinion as Francesc, it would only be useful for operations that need more computations that load/store. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From faltet at pytables.org Wed Oct 21 08:47:02 2009 From: faltet at pytables.org (Francesc Alted) Date: Wed, 21 Oct 2009 14:47:02 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> Message-ID: <200910211447.02842.faltet@pytables.org> A Wednesday 21 October 2009 14:27:46 David Cournapeau escrigu?: > > This is because numpy is a package that works mainly with arrays in an > > element-wise way, and in this scenario, the time to transmit data to CPU > > dominates, by and large, over the time to perform operations. > > Is it general, or just for simple operations in numpy and ufunc ? I > remember that for music softwares, SIMD used to matter a lot, even for > simple bus mixing (which is basically a ax+by with a, b scalars and x > y the input arrays). This is general, as long as the dataset has to be brought from memory to CPU, and operations to be done are element-wise and simple (i.e. not transcendental). SIMD does matter in general when the dataset: 1) is already in cache 2) you have to perform costly operations (mainly transcendental) 3) a combination of the above I don't know the case for music software, but if you say that ax+by are accelerated by SIMD, I'd say that case 1) is happening. 
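The memory-bandwidth point is easy to see with a small timing sketch (illustrative only; absolute numbers are machine-dependent): on an array much larger than any cache, a trivial multiply runs at memory speed while a transcendental function is limited by the CPU:

#----------------------------------------------
import time
import numpy as np

x = np.random.rand(10 * 1000 * 1000)        # ~80 MB, far larger than cache

def best_time(func, reps=3):
    times = []
    for rep in range(reps):
        start = time.time()
        func()
        times.append(time.time() - start)
    return min(times)

t_mul = best_time(lambda: x * 1.1)          # cheap op: memory-bound
t_tanh = best_time(lambda: np.tanh(x))      # costly op: CPU-bound

print("x * 1.1 : %.3f s" % t_mul)
print("tanh(x) : %.3f s" % t_tanh)
print("tanh is ~%.0fx slower on the same data" % (t_tanh / t_mul))
#----------------------------------------------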
> Do you have any interest in adding SIMD to some core numpy > (transcendental functions). If so, I would try to go back to the > problem of runtime SSE detection and loading of optimized shared > library in a cross-platform way - that's something which should be > done at some point in numpy, and people requiring it would be a good > incentive. I don't personally have a lot of interest implementing this for numpy. But in case anyone does, I find the next library: http://gruntthepeon.free.fr/ssemath/ very interesting. Perhaps there could be other (free) implementations... -- Francesc Alted From pav+sp at iki.fi Wed Oct 21 09:14:54 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 21 Oct 2009 13:14:54 +0000 (UTC) Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: [clip] >> Do you have any interest in adding SIMD to some core numpy >> (transcendental functions). If so, I would try to go back to the >> problem of runtime SSE detection and loading of optimized shared >> library in a cross-platform way - that's something which should be done >> at some point in numpy, and people requiring it would be a good >> incentive. > > I don't personally have a lot of interest implementing this for numpy. > But in case anyone does, I find the next library: > > http://gruntthepeon.free.fr/ssemath/ > > very interesting. Perhaps there could be other (free) > implementations... Optimized transcendental functions could be interesting. For example for tanh, call overhead is overcome already for ~30-element arrays. Since these are ufuncs, I suppose the SSE implementations could just be put in a separate module, which is always compiled. Before importing the module, we could simply check from Python side that the CPU supports the necessary instructions. If everything is OK, the accelerated implementations would then just replace the Numpy routines. This type of project could probably also be started outside Numpy, and just monkey-patch the Numpy routines on import. -- Pauli Virtanen From rmay31 at gmail.com Wed Oct 21 09:54:04 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 08:54:04 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: > It seems it does... ?the built in numpy which is in > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', > comes before $HOME/.local in sys.path so I think system numpy will > always be picked up over my own installed version. 
> > robin-mbp:~ robince$ /usr/bin/python2.6 -c "import sys; print sys.path" > ['', '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python26.zip', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-old', > '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload', > '/Users/robince/.local/lib/python2.6/site-packages', > '/Library/Python/2.6/site-packages', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/PyObjC', > '/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/wx-2.8-mac-unicode'] > > So I guess virtualenv or macports? Wow. Once again, Apple makes using python unnecessarily difficult. Someone needs a whack with a clue bat. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From robince at gmail.com Wed Oct 21 10:00:12 2009 From: robince at gmail.com (Robin) Date: Wed, 21 Oct 2009 15:00:12 +0100 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: <2d5132a50910210700o7558e061r8e596aeb7beb612@mail.gmail.com> On Wed, Oct 21, 2009 at 2:54 PM, Ryan May wrote: > Wow. ?Once again, Apple makes using python unnecessarily difficult. > Someone needs a whack with a clue bat. Right - I think in the end I decided I will try and use macports python with virtualenv for svn numpy/scipy and leave system python well alone as before. Cheers Robin From zachary.pincus at yale.edu Wed Oct 21 10:09:35 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 21 Oct 2009 10:09:35 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> Message-ID: <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> > Wow. Once again, Apple makes using python unnecessarily difficult. > Someone needs a whack with a clue bat. Well, some tools from the operating system use numpy and other python modules. And upgrading one of these modules might conceivably break that dependency, leading to breakage in the OS. So Apple's design goal is to keep their tools working right, and absent any clear standard for package management in python that allows for side-by-side installation of different versions of the same module, this is probably the best way to go from their perspective. Personally, I just install a hand-built python in /Frameworks. 
This is very easy, and it is also where the python.org python goes, so 3rd- party installers with hard-coded paths (boo) still work. Zach From mdroe at stsci.edu Wed Oct 21 10:29:55 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 21 Oct 2009 10:29:55 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy Message-ID: <4ADF1AE3.9030002@stsci.edu> I'm in the process of converting a project to use Sphinx for documentation, and would like to use the Numpy docstring standard with its sections etc. It appears, however, that the numpydoc sphinxext is not installed but merely sits in doc/sphinxext. I see that scipy uses an SVN external to get at this stuff, but I'd prefer not to do that if possible (my project may not live in SVN forever). I see that numpydoc is separately installable (it has a setup.py), but then I assume users of my project who wish to build the docs must download the numpy source and know to explicitly install the numpydoc package. Are there plans to install the numpydoc extension under the numpy install tree somewhere so that other projects can use it, simply by having numpy installed? This is what we do with the plot_directive in matplotlib. Is there a reason that's not a good idea for numpy? Or is there a way for my project to use it that I'm missing? Cheers, Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From rmay31 at gmail.com Wed Oct 21 11:01:52 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 10:01:52 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On Wed, Oct 21, 2009 at 9:09 AM, Zachary Pincus wrote: >> Wow. ?Once again, Apple makes using python unnecessarily difficult. >> Someone needs a whack with a clue bat. > > Well, some tools from the operating system use numpy and other python > modules. And upgrading one of these modules might conceivably break > that dependency, leading to breakage in the OS. So Apple's design goal > is to keep their tools working right, and absent any clear standard > for package management in python that allows for side-by-side > installation of different versions of the same module, this is > probably the best way to go from their perspective. > > Personally, I just install a hand-built python in /Frameworks. This is > very easy, and it is also where the python.org python goes, so 3rd- > party installers with hard-coded paths (boo) still work. ~/.local was added to *be the standard* for easily installing python packages in your user account. And it works perfectly on the other major OSes, no twiddling of paths anymore. I understand the desire to not conflict with the system's python intall (and, in fact, applaud them for using python). Indeed, on linux, I do end up conflicting between my system numpy and my SVN install in ~/.local, and I don't have a problem with it. This comes with the territory when I start doing power-user/developer tasks. It just to me seems odd that the OS that works so hard to make so many things easier makes this *more* difficult. 
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States From mdroe at stsci.edu Wed Oct 21 11:13:35 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 21 Oct 2009 11:13:35 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy In-Reply-To: <4ADF1AE3.9030002@stsci.edu> References: <4ADF1AE3.9030002@stsci.edu> Message-ID: <4ADF251F.3070505@stsci.edu> Sorry for the noise. Found the instructions in HOWTO_BUILD_DOCS.txt . Mike Michael Droettboom wrote: > I'm in the process of converting a project to use Sphinx for > documentation, and would like to use the Numpy docstring standard with > its sections etc. It appears, however, that the numpydoc sphinxext is > not installed but merely sits in doc/sphinxext. I see that scipy uses > an SVN external to get at this stuff, but I'd prefer not to do that if > possible (my project may not live in SVN forever). I see that numpydoc > is separately installable (it has a setup.py), but then I assume users > of my project who wish to build the docs must download the numpy source > and know to explicitly install the numpydoc package. > > Are there plans to install the numpydoc extension under the numpy > install tree somewhere so that other projects can use it, simply by > having numpy installed? This is what we do with the plot_directive in > matplotlib. Is there a reason that's not a good idea for numpy? Or is > there a way for my project to use it that I'm missing? > > Cheers, > Mike > > -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From josef.pktd at gmail.com Wed Oct 21 11:18:05 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 11:18:05 -0400 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: References: Message-ID: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> On Sun, Oct 18, 2009 at 3:11 PM, Jeffrey McGee wrote: > Howdy, > I'm having trouble getting the kaiser window to work. Anytime I try > to call numpy.kaiser(), it throws an exception. Here's the output when > I run the example code from > http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : > > > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> from numpy import kaiser >>>> kaiser(12, 14) > > Traceback (most recent call last): > File "", line 1, in > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > line 2630, in kaiser > return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > line 2507, in i0 > y[ind] = _i0_1(x[ind]) > TypeError: array cannot be safely cast to required type >>>> > > > Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 > > packages for python and numpy.) > Thanks, > Jeff > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > It works with my numpy 1.3.0, but np.i0 doesn't like integers. Can you try with a float 14. instead of the integer? np.kaiser(12, 14.) You could file a ticket, and we see if the experts consider this a feature or a bug. 
(I don't know anything about kaiser or i0) Josef >>> np.kaiser(12,14) array([ 7.72686684e-06, 3.46009194e-03, 4.65200189e-02, 2.29737120e-01, 5.99885316e-01, 9.45674898e-01, 9.45674898e-01, 5.99885316e-01, 2.29737120e-01, 4.65200189e-02, 3.46009194e-03, 7.72686684e-06]) >>> np.i0(1) Traceback (most recent call last): File "", line 1, in np.i0(1) File "C:\Programs\Python25\Lib\site-packages\numpy\lib\function_base.py", line 2484, in i0 y[ind] = _i0_1(x[ind]) TypeError: array cannot be safely cast to required type >>> np.i0(1.) array(1.2660658777520082) From renesd at gmail.com Wed Oct 21 12:19:02 2009 From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=) Date: Wed, 21 Oct 2009 17:19:02 +0100 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <64ddb72c0910210919m4e449aabkcfa5db1ecee32b18@mail.gmail.com> On Wed, Oct 21, 2009 at 2:14 PM, Pauli Virtanen > wrote: > Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: > [clip] > >> Do you have any interest in adding SIMD to some core numpy > >> (transcendental functions). If so, I would try to go back to the > >> problem of runtime SSE detection and loading of optimized shared > >> library in a cross-platform way - that's something which should be done > >> at some point in numpy, and people requiring it would be a good > >> incentive. > > > > I don't personally have a lot of interest implementing this for numpy. > > But in case anyone does, I find the next library: > > > > http://gruntthepeon.free.fr/ssemath/ > > > > very interesting. Perhaps there could be other (free) > > implementations... > > Optimized transcendental functions could be interesting. For example for > tanh, call overhead is overcome already for ~30-element arrays. > > Since these are ufuncs, I suppose the SSE implementations could just be > put in a separate module, which is always compiled. Before importing the > module, we could simply check from Python side that the CPU supports the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. > > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Anyone seen the corepy numpy gsoc project? http://numcorepy.blogspot.com/ It implements a number of functions with the corepy runtime assembler. The project showed nice simd speedups for numpy. I've been following the liborc project... which is a runtime assembler that uses a generic assembly language and supports many different simd assembly languages (eg SSE, MMX, ARM, Altivec). It's the replacement for the liboil library (used in gstreamer etc). http://code.entropywave.com/projects/orc/ cu! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Oct 21 12:28:02 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 10:28:02 -0600 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> References: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 9:18 AM, wrote: > On Sun, Oct 18, 2009 at 3:11 PM, Jeffrey McGee > wrote: > > Howdy, > > I'm having trouble getting the kaiser window to work. Anytime I try > > to call numpy.kaiser(), it throws an exception. Here's the output when > > I run the example code from > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html : > > > > > > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > > [GCC 4.3.3] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > > >>>> from numpy import kaiser > >>>> kaiser(12, 14) > > > > Traceback (most recent call last): > > File "", line 1, in > > > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > line 2630, in kaiser > > return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta) > > File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", > > > > line 2507, in i0 > > y[ind] = _i0_1(x[ind]) > > TypeError: array cannot be safely cast to required type > >>>> > > > > > > Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.4 > > > > packages for python and numpy.) > > Thanks, > > Jeff > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > It works with my numpy 1.3.0, but np.i0 doesn't like integers. > Can you try with a float 14. instead of the integer? > np.kaiser(12, 14.) > > Hmm, I think np.i0 (the modified Bessel function of order zero), should accept integer inputs, I'm not sure why it doesn't. As an aside, would it be appropriate to have some of the more common Bessel functions as ufuncs? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Wed Oct 21 12:38:10 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Wed, 21 Oct 2009 18:38:10 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Pauli Virtanen schrieb: > Wed, 21 Oct 2009 14:47:02 +0200, Francesc Alted wrote: > [clip] > >>> Do you have any interest in adding SIMD to some core numpy >>> (transcendental functions). If so, I would try to go back to the >>> problem of runtime SSE detection and loading of optimized shared >>> library in a cross-platform way - that's something which should be done >>> at some point in numpy, and people requiring it would be a good >>> incentive. >>> >> I don't personally have a lot of interest implementing this for numpy. >> But in case anyone does, I find the next library: >> >> http://gruntthepeon.free.fr/ssemath/ >> >> very interesting. Perhaps there could be other (free) >> implementations... >> > > Optimized transcendental functions could be interesting. 
For example for > tanh, call overhead is overcome already for ~30-element arrays. > > Since these are ufuncs, I suppose the SSE implementations could just be > put in a separate module, which is always compiled. Before importing the > module, we could simply check from Python side that the CPU supports the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. > I once wrote a module that replaces the built in transcendental functions of numpy by optimized versions from Intels vector math library. If someone is interested, I can publish it. In my experience it was of little use since real world problems are limited by memory bandwidth. Therefore extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved the effort of the first approach since gcc is able to use optimized functions from Intels vector math library or AMD's math core library, see the doc's of -mveclibabi. You just need to recompile numpy with proper compiler arguments. Gregor > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. > > From robert.kern at gmail.com Wed Oct 21 12:39:00 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 21 Oct 2009 11:39:00 -0500 Subject: [Numpy-discussion] TypeError when calling numpy.kaiser() In-Reply-To: References: <1cd32cbb0910210818g2ee16efma39ef4ab5cca32f9@mail.gmail.com> Message-ID: <3d375d730910210939q73174d22m523d0d73b051476c@mail.gmail.com> On Wed, Oct 21, 2009 at 11:28, Charles R Harris wrote: > As an aside, would it be > appropriate to have some of the more common Bessel functions as ufuncs? I'd prefer that we stick to the policy of including special functions that are part of the C99 standard (or another appropriate one) and no more. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmay31 at gmail.com Wed Oct 21 14:23:14 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 13:23:14 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer wrote: > I once wrote a module that replaces the built in transcendental > functions of numpy by optimized versions from Intels vector math > library. If someone is interested, I can publish it. In my experience it > was of little use since real world problems are limited by memory > bandwidth. Therefore extending numexpr with optimized transcendental > functions was the better solution. Afterwards I discovered that I could > have saved the effort of the first approach since gcc is able to use > optimized functions from Intels vector math library or AMD's math core > library, see the doc's of -mveclibabi. You just need to recompile numpy > with proper compiler arguments. Do you have a link to the documentation for -mveclibabi? I can't find this anywhere and I'm *very* interested. 
Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From rmay31 at gmail.com Wed Oct 21 14:31:13 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 21 Oct 2009 13:31:13 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 1:23 PM, Ryan May wrote: > On Wed, Oct 21, 2009 at 11:38 AM, Gregor Thalhammer > wrote: >> I once wrote a module that replaces the built in transcendental >> functions of numpy by optimized versions from Intels vector math >> library. If someone is interested, I can publish it. In my experience it >> was of little use since real world problems are limited by memory >> bandwidth. Therefore extending numexpr with optimized transcendental >> functions was the better solution. Afterwards I discovered that I could >> have saved the effort of the first approach since gcc is able to use >> optimized functions from Intels vector math library or AMD's math core >> library, see the doc's of -mveclibabi. You just need to recompile numpy >> with proper compiler arguments. > > Do you have a link to the documentation for -mveclibabi? ?I can't find > this anywhere and I'm *very* interested. Ah, there it is. Google doesn't come up with much, but the PDF manual does have it: http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc.pdf (It helps when you don't mis-type your search in the PDF). Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States From ndbecker2 at gmail.com Wed Oct 21 14:46:45 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 21 Oct 2009 14:46:45 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: ... > I once wrote a module that replaces the built in transcendental > functions of numpy by optimized versions from Intels vector math > library. If someone is interested, I can publish it. In my experience it > was of little use since real world problems are limited by memory > bandwidth. Therefore extending numexpr with optimized transcendental > functions was the better solution. Afterwards I discovered that I could > have saved the effort of the first approach since gcc is able to use > optimized functions from Intels vector math library or AMD's math core > library, see the doc's of -mveclibabi. You just need to recompile numpy > with proper compiler arguments. > I'm interested. I'd like to try AMD rather than intel, because AMD is easier to obtain. I'm running on intel machine, I hope that doesn't matter too much. What exactly do I need to do? I see that numpy/site.cfg has an MKL section. I'm assuming I should not touch that, but just mess with gcc flags? 
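Independently of the compiler-flag route, the pattern Pauli and Gregor describe earlier in the thread (check the CPU from Python, then swap accelerated implementations in for the numpy ones at import time) could look roughly like the sketch below. `_fastmath` is a purely hypothetical compiled extension used only to illustrate the idea; if it is absent, nothing is replaced:

#----------------------------------------------
import numpy as np

def cpu_supports_sse2():
    # Placeholder check (Linux only); a real version would use CPUID.
    try:
        return "sse2" in open("/proc/cpuinfo").read()
    except IOError:
        return False

def install_fast_ufuncs():
    """Replace selected numpy functions with accelerated ones, if possible."""
    if not cpu_supports_sse2():
        return False
    try:
        import _fastmath            # hypothetical SSE-enabled extension
    except ImportError:
        return False
    for name in ("exp", "log", "tanh"):
        if hasattr(_fastmath, name):
            setattr(np, name, getattr(_fastmath, name))
    return True

print("accelerated ufuncs installed: %s" % install_fast_ufuncs())
#----------------------------------------------

Monkey-patching np.exp and friends like this only affects code that looks the functions up through the numpy namespace, which is the common case; the hard engineering problem, as noted above, is building and shipping the accelerated extension portably.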
From charlesr.harris at gmail.com Wed Oct 21 15:02:59 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 13:02:59 -0600 Subject: [Numpy-discussion] GSOC 2010 Message-ID: Hi All, I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. I think this was mostly due to lack of preparation on our part, we weren't ready when the students started showing up on the lists. So I would like to put together a selection of suitable projects and corresponding mentors that we could put on the wiki somewhere and advertise. Just to start things off, here are two things that come to mind. - Python 3k transition. I think it is time to start looking at this seriously. - Best of breed special functions in cython. These could be part of a separate numpy extras package where code is restricted to C, Cython, and Python. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Oct 21 15:11:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 15:11:55 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris wrote: > Hi All, > > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have.? I > think this was mostly due to lack of preparation on our part, we weren't > ready when the students started showing up on the lists. So I would like to > put together a selection of suitable projects and corresponding mentors that > we could put on the wiki somewhere and advertise. Just to start things off, > here are two things that come to mind. > > Python 3k transition. I think it is time to start looking at this seriously. > Best of breed special functions in cython. These could be part of a separate > numpy extras package where code is restricted to C, Cython, and Python. > > Thoughts? for scipy: more stats, gsoc2009 went very well. Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Wed Oct 21 15:23:07 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Oct 2009 13:23:07 -0600 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 1:11 PM, wrote: > On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris > wrote: > > Hi All, > > > > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. > I > > think this was mostly due to lack of preparation on our part, we weren't > > ready when the students started showing up on the lists. So I would like > to > > put together a selection of suitable projects and corresponding mentors > that > > we could put on the wiki somewhere and advertise. Just to start things > off, > > here are two things that come to mind. > > > > Python 3k transition. I think it is time to start looking at this > seriously. > > Best of breed special functions in cython. These could be part of a > separate > > numpy extras package where code is restricted to C, Cython, and Python. > > > > Thoughts? > > for scipy: more stats, gsoc2009 went very well. > > Yes, it seems so. 
I had the impression that planning for that project was undertaken pretty early on with the involvement of Skipper. What exactly *was* the history of that project and what can we learn from it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Wed Oct 21 16:04:26 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 21 Oct 2009 16:04:26 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: <799092A0-98BD-4491-9F93-01F63397C47A@cs.toronto.edu> On 21-Oct-09, at 3:02 PM, Charles R Harris wrote: > ? Best of breed special functions in cython. These could be part of > a separate numpy extras package where code is restricted to C, > Cython, and Python. I think a lot of SciPy could be usefully brought over to Cython, as well (not all the C code, but some of it). Having Cython do the wrapping should reduce the burden in the eventual Py3k transition. David From dwf at cs.toronto.edu Wed Oct 21 16:12:31 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 21 Oct 2009 16:12:31 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote: > Since these are ufuncs, I suppose the SSE implementations could just > be > put in a separate module, which is always compiled. Before importing > the > module, we could simply check from Python side that the CPU supports > the > necessary instructions. If everything is OK, the accelerated > implementations would then just replace the Numpy routines. Am I mistaken or wasn't that sort of the goal of Andrew Friedley's CorePy work this summer? Looking at his slides again, the speedups are rather impressive. I wonder if these could be usefully integrated into numpy itself? David From jsseabold at gmail.com Wed Oct 21 16:13:14 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 21 Oct 2009 16:13:14 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: On Wed, Oct 21, 2009 at 3:23 PM, Charles R Harris wrote: > > > On Wed, Oct 21, 2009 at 1:11 PM, wrote: >> >> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I don't feel that numpy/scipy did as well in GSOC 2009 as it could >> > have.? I >> > think this was mostly due to lack of preparation on our part, we weren't >> > ready when the students started showing up on the lists. So I would like >> > to >> > put together a selection of suitable projects and corresponding mentors >> > that >> > we could put on the wiki somewhere and advertise. Just to start things >> > off, >> > here are two things that come to mind. >> > >> > Python 3k transition. I think it is time to start looking at this >> > seriously. >> > Best of breed special functions in cython. These could be part of a >> > separate >> > numpy extras package where code is restricted to C, Cython, and Python. >> > >> > Thoughts? >> >> for scipy: more stats, gsoc2009 went very well. >> > > Yes, it seems so. I had the impression that planning for that project was > undertaken pretty early on with the involvement of Skipper. What exactly > *was* the history of that project and what can we learn from it? 
> Short(-ish) version of some general thoughts from my end: GSoC was brought to my attention as a fruitful endeavor (and it definitely was!). There was a list of potential topics posted on SciPy SoC mentoring page, and I just kind of went through all of them to see where the most value-add would be (both ways from me to the SciPy project and from the SciPy project to my studies/work). So that list of topics was the main driving force, and I'm glad we're starting to push for ideas now (I have a few ideas of my own motivated mostly by needs of stats/statistical modeling, but I need some more time to think). However, we obviously should be open to new ideas from students coming to the project. Another thing is the importance of the application process. The thing that pushed me was reading about other successful applicants for SoC in general (there is a lot of really good advice and write-ups out there). It is a very competitive program, so your proposal needs to be very, very well thought out. That includes drafts of proposals with feedback from the community and mentors well before the official application process even starts, so the earlier that's taken care of, the better. Beyond that, students should know what's expected of them coming into the program (what development tools they need to be familiar with, numpy/scipy standards, familiarization with the code base), and what's expected of the end product (high quality code, test driven development, etc.). I also can't stress enough how helpful it was to have Alan and Josef as mentors, as well as the availability to use the MLs for more general questions. Obviously, the level of engagement of the mentor is going to depend on the project and the student, but I for one couldn't have learned as much as I did nor gotten as far as we did without their help. If these comments are seen as helpful, I can try to work on some more detailed ones/links to detailed ones, as I think this would be beneficial to establish as something to look forward to. The availability of this program (Thank you, Google) allows significant strides in development to be made each summer and that should not be overlooked (I don't think it is). Cheers, Skipper From josef.pktd at gmail.com Wed Oct 21 16:20:51 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Oct 2009 16:20:51 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: <1cd32cbb0910211320h6e7c93bcwfa1dbee9f49dab60@mail.gmail.com> On Wed, Oct 21, 2009 at 3:23 PM, Charles R Harris wrote: > > > On Wed, Oct 21, 2009 at 1:11 PM, wrote: >> >> On Wed, Oct 21, 2009 at 3:02 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I don't feel that numpy/scipy did as well in GSOC 2009 as it could >> > have.? I >> > think this was mostly due to lack of preparation on our part, we weren't >> > ready when the students started showing up on the lists. So I would like >> > to >> > put together a selection of suitable projects and corresponding mentors >> > that >> > we could put on the wiki somewhere and advertise. Just to start things >> > off, >> > here are two things that come to mind. >> > >> > Python 3k transition. I think it is time to start looking at this >> > seriously. >> > Best of breed special functions in cython. These could be part of a >> > separate >> > numpy extras package where code is restricted to C, Cython, and Python. >> > >> > Thoughts? >> >> for scipy: more stats, gsoc2009 went very well. 
>> > > Yes, it seems so. I had the impression that planning for that project was > undertaken pretty early on with the involvement of Skipper. What exactly > *was* the history of that project and what can we learn from it? Skipper started early in the preparation, and with the help of Allan and me had a pretty concrete proposal. Because of final exams, the actual work on statsmodels started a bit late. >From my perspective a few issues that helped: Skipper, Alan and I have the same background (in econometrics), so I knew roughly what knowledge I could expect. Skipper was willing and able to work his way through several textbooks for the models that he, and I, didn't know much (or anything) about. "Cleaning up stats.models" was a relatively well defined project, with relatively easy to define goals. I kept reminding him about writing tests, and to verify results with other packages, so that we knew when we had a model "correctly" cleaned up. Skipper spend a lot of time on this. For most parts, I worked on the code in parallel with him, checking on his progress, looking at the problems we had with matching the results of the other statistical packages, finding bugs and writing some draft code. During July, August we had almost daily long email threads. I think, this helped a lot, so that Skipper didn't get stuck or sidetracked, and that I was able to keep up with the changes (and learn some of the statistical background). Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From afriedle at indiana.edu Wed Oct 21 16:36:38 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Wed, 21 Oct 2009 16:36:38 -0400 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <4ADF70D6.1030001@indiana.edu> sigh; yet another email dropped by the list. David Warde-Farley wrote: > On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote: > >> Since these are ufuncs, I suppose the SSE implementations could just >> be >> put in a separate module, which is always compiled. Before importing >> the >> module, we could simply check from Python side that the CPU supports >> the >> necessary instructions. If everything is OK, the accelerated >> implementations would then just replace the Numpy routines. > > Am I mistaken or wasn't that sort of the goal of Andrew Friedley's > CorePy work this summer? > > Looking at his slides again, the speedups are rather impressive. I > wonder if these could be usefully integrated into numpy itself? Yes, my GSoC project is closely related, though I didn't do the CPU detection part, that'd be easy to do. Also I wrote my code specifically for 64-bit x86. I didn't focus so much on the transcendental functions, though they wouldn't be too hard to implement. There's also the possibility to provide implementations with differing tradeoffs between accuracy and performance. I think the blog link got posted already, but here's relevant info: http://numcorepy.blogspot.com http://www.corepy.org/wiki/index.php?title=CoreFunc I talked about this in my SciPy talk and up-coming paper, as well. 
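A minimal pure-Python sketch of the import-time replacement idea being discussed (this is an illustration, not code from the thread: the /proc/cpuinfo check is Linux-only, and fast_add is just a stand-in for a real compiled ufunc):

import platform
import numpy

def cpu_has_sse2():
    # Illustrative, Linux-specific check; a real implementation would
    # query the CPU via a small C extension or similar.
    if platform.system() != 'Linux':
        return False
    try:
        with open('/proc/cpuinfo') as f:
            return ' sse2 ' in f.read().replace('\n', ' ')
    except IOError:
        return False

# Stand-in for an accelerated implementation; a real one would be a
# compiled ufunc with the same signature as numpy.add.
fast_add = numpy.add

if cpu_has_sse2():
    # Route ndarray arithmetic (a + b) through the replacement ufunc.
    numpy.set_numeric_ops(add=fast_add)

numpy.set_numeric_ops is the same hook mentioned elsewhere in this digest for injecting VML-backed ufuncs, so the only platform-specific piece is the capability check.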
Also people have just been talking about x86 in this thread -- other architectures could be supported too; eg PPC/Altivec or even Cell SPU and other accelerators. I actually wrote a quick/dirty implementation of addition and vector normalization ufuncs for Cell SPU recently. Basic result is that overall performance is very roughly comparable to a similar speed x86 chip, but this is a huge win over just running on the extremely slow Cell PPC cores. Andrew From millman at berkeley.edu Wed Oct 21 16:40:52 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 21 Oct 2009 13:40:52 -0700 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: Message-ID: On Wed, Oct 21, 2009 at 12:02 PM, Charles R Harris wrote: > I don't feel that numpy/scipy did as well in GSOC 2009 as it could have. I'd be curious to hear why you felt that numpy/scipy didn't do as well this year. We had more projects than any other year and I think that most of the code ended being used. It could be that the work done wasn't publicized enough or that the most of the contributions end up contributed to related projects like in a scikit or (hopefully soon to be merged work) in cython. At any rate, I'd be curious to hear more about your concerns so that they we don't repeat them next year (assuming the program is run again next year). > I think this was mostly due to lack of preparation on our part, we weren't > ready when the students started showing up on the lists. So I would like to > put together a selection of suitable projects and corresponding mentors that > we could put on the wiki somewhere and advertise. Just to start things off, > here are two things that come to mind. Regardless, better preparation would be a huge help. Having detailed lists of summer projects will be useful even if the SoC program doesn't get approved for next year. > Python 3k transition. I think it is time to start looking at this seriously. > Best of breed special functions in cython. These could be part of a separate > numpy extras package where code is restricted to C, Cython, and Python. Both of these ideas sounds very interesting. Personally, I would like to see ideas like these make there way into fully fleshed out NEPs: http://projects.scipy.org/numpy/browser/trunk/doc/neps -- Jarrod Millman Helen Wills Neuroscience Institute 10 Giannini Hall, UC Berkeley http://cirl.berkeley.edu/ From aisaac at american.edu Wed Oct 21 16:42:25 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 21 Oct 2009 16:42:25 -0400 Subject: [Numpy-discussion] GSOC 2010 In-Reply-To: References: <1cd32cbb0910211211n1ed296ebo908f1a4f7b6fa240@mail.gmail.com> Message-ID: <4ADF7231.8070200@american.edu> On 10/21/2009 3:23 PM, Charles R Harris wrote: > What exactly *was* the history of that project and what can we learn > from it? Imo, what really drove this project forward, is that Skipper was able to interact regularly with someone else who was actively using and developing on the code base (i.e., Josef). While I am confident Skipper would have made a worthwhile contribution without this, I think he would agree that he both learned more and was more productive because he was able to interact with Josef. One other thing that was important was focus: Skipper (and Josef) focused in on making sure an important but doable (summer is very short!) piece of the stats code was refactored, extended, documented, and tested. Alan Isaac PSI do not mean to diminish the importance of the feedback kindly provided by others. 
From cournape at gmail.com Wed Oct 21 21:24:51 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 22 Oct 2009 10:24:51 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> Message-ID: <5b8d13220910211824o3ec041f3h8a7e9f745045567f@mail.gmail.com> On Wed, Oct 21, 2009 at 10:14 PM, Pauli Virtanen wrote: > > This type of project could probably also be started outside Numpy, and > just monkey-patch the Numpy routines on import. I think I would prefer this approach as a first shot. I will look into adding a small C library + wrapper in python to know which SIMD instructions are available to numpy. Then people can reuse this for whatever approach they prefer. David From sturla at molden.no Wed Oct 21 22:31:23 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 04:31:23 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> Message-ID: <4ADFC3FB.5030906@molden.no> Mathieu Blondel skrev: > Hello, > > About one year ago, a high-level, objected-oriented SIMD API was added > to Mono. For example, there is a class Vector4f for vectors of 4 > floats and this class implements methods such as basic operators, > bitwise operators, comparison operators, min, max, sqrt, shuffle > directly using SIMD operations. I think you are confusing SIMD with Intel's MMX/SSE instruction set. SIMD means "single instruction - multiple data". NumPy is interherently an object-oriented SIMD API: array1[:] = array2 + array3 is a SIMD instruction by definition. SIMD instructions in hardware for length-4 vectors are mostly useful for 3D graphics. But they are not used a lot for that purpose, because GPUs are getting common. SSE is mostly for rendering 3D graphics without a GPU. There is nothing that prevents NumPy from having a Vector4f dtype, that internally stores four float32 and is aligned at 16 byte boundaries. But it would not be faster than the current float32 dtype. Do you know why? The reason is that memory access is slow, and computation is fast. Modern CPUs are starved. The speed of NumPy is not limited by not using MMX/SSE whenever possible. It is limited from having to create and delete temporary arrays all the time. You are suggesting to optimize in the wrong place. There is a lot that can be done to speed up computation: There are optimized BLAS libraries like ATLAS and MKL. NumPy uses BLAS for things like matrix multiplication. There are OpenMP for better performance on multicores. There are OpenCL and CUDA for moving computation from CPUs to GPU. But the main boost you get from going from NumPy to hand-written C or Fortran comes from reduced memory use. > existing discussion here. Memory-alignment is an import related issue > since non-aligned movs can tank the performance. 
> >
You can align an ndarray on 16-byte boundary like this:

def aligned_array(N, dtype):
    d = dtype()
    tmp = numpy.zeros(N * d.nbytes + 16, dtype=numpy.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = (16 - address % 16) % 16
    return tmp[offset:offset+N].view(dtype=dtype)

Sturla Molden

From mathieu at mblondel.org Wed Oct 21 23:32:13 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 22 Oct 2009 12:32:13 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4ADFC3FB.5030906@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> Message-ID: <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: > Mathieu Blondel skrev: >> Hello, >> >> About one year ago, a high-level, objected-oriented SIMD API was added >> to Mono. For example, there is a class Vector4f for vectors of 4 >> floats and this class implements methods such as basic operators, >> bitwise operators, comparison operators, min, max, sqrt, shuffle >> directly using SIMD operations. > I think you are confusing SIMD with Intel's MMX/SSE instruction set. OK, I should have said "Object-oriented SIMD API that is implemented using hardware SIMD instructions". And when an ISA doesn't allow to perform a specific operation in only one instruction (say the absolute value of the differences), the operation can be implemented in terms of other instructions. > SIMD instructions in hardware for length-4 vectors are mostly useful for > 3D graphics. But they are not used a lot for that purpose, because GPUs > are getting common. SSE is mostly for rendering 3D graphics without a > GPU. There is nothing that prevents NumPy from having a Vector4f dtype, > that internally stores four float32 and is aligned at 16 byte > boundaries. But it would not be faster than the current float32 dtype. > Do you know why? Yes I know because this has already been explained in this very thread by someone before you! Mathieu

From robert.kern at gmail.com Wed Oct 21 23:46:29 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 21 Oct 2009 22:46:29 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> Message-ID: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: > On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >> Mathieu Blondel skrev: >>> Hello, >>> >>> About one year ago, a high-level, objected-oriented SIMD API was added >>> to Mono. For example, there is a class Vector4f for vectors of 4 >>> floats and this class implements methods such as basic operators, >>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>> directly using SIMD operations. >> I think you are confusing SIMD with Intel's MMX/SSE instruction set. > > OK, I should have said "Object-oriented SIMD API that is implemented > using hardware SIMD instructions". No, I think you're right. Using "SIMD" to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of.
Everyone else uses "SIMD" to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Thu Oct 22 02:47:09 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 22 Oct 2009 02:47:09 -0400 Subject: [Numpy-discussion] Using numpydoc outside of numpy In-Reply-To: <4ADF251F.3070505@stsci.edu> References: <4ADF1AE3.9030002@stsci.edu> <4ADF251F.3070505@stsci.edu> Message-ID: <20091022064709.GA30908@rodimus> On Wed, Oct 21, 2009 at 11:13:35AM -0400, Michael Droettboom wrote: > Sorry for the noise. Found the instructions in HOWTO_BUILD_DOCS.txt . Not sure if this is part of what you discovered, but numpydoc is at the Cheese Shop too: http://pypi.python.org/pypi/numpydoc David From sturla at molden.no Thu Oct 22 03:35:52 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 09:35:52 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE00B58.3090802@molden.no> Robert Kern skrev: > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. > Then you should pick up a book on parallel computing. It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines. A SISD system is the classical von Neuman machine. A MISD system is a pipelined von Neuman machine, for example the x86 processor. A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine. A special class of SIMD machines are the so-called "vector machines", of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines a subtype of MISD systems, orthogonal to piplines, because there are no subordinate ALUs with private memory. MIMD systems multiple independent CPUs. MIMD systems comes in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines. Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to piplined von Neuman machines, SSE cannot be called SIMD. NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library. S.M. 
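A small, self-contained example (not from the original posts) of what "SIMD in the loose, NumPy sense" looks like, alongside the point made earlier in the thread that temporary arrays, rather than missing SSE, are usually what limits this style of code:

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.random.rand(1000000)

# One "vectorized" statement: a single syntactic operation applied to a
# whole array, but it allocates a temporary for a * b and another array
# for the final sum.
out = a * b + c

# The same computation arranged to reuse the output buffer; this kind of
# rewrite usually matters more than CPU-level SSE for code like this.
out2 = a * b      # allocates the result array once
out2 += c         # in-place add, no further temporary
assert np.allclose(out, out2)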
From matthieu.brucher at gmail.com Thu Oct 22 03:41:10 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 22 Oct 2009 09:41:10 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: >> OK, I should have said "Object-oriented SIMD API that is implemented >> using hardware SIMD instructions". > > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. I agree with Sturla, for instance nVidia GPUs do SIMD computations with blocs of 16 values at a time, but the hardware behind can't compute on so much data at a time. It's SIMD from our point of view, just like Numpy does ;) Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From sturla at molden.no Thu Oct 22 03:45:35 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 09:45:35 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE00D9F.1070107@molden.no> Matthieu Brucher skrev: > I agree with Sturla, for instance nVidia GPUs do SIMD computations > with blocs of 16 values at a time, but the hardware behind can't > compute on so much data at a time. It's SIMD from our point of view, > just like Numpy does ;) > > A computer with a CPU and a GPU is a SIMD machine by definition, due to the single CPU and the multiple ALUs in the GPU, which are subordinate to the CPU. But with modern computers, these classifications becomes a bit unclear. S.M. From sturla at molden.no Thu Oct 22 04:05:28 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 10:05:28 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> Message-ID: <4AE01248.6050303@molden.no> Mathieu Blondel skrev: > Peter Norvig suggested to merge Numpy into Cython but he didn't > mention SIMD as the reason (this one is from me). I don't know what Norvig said or meant. However: There is NumPy support in Cython. Cython has a general syntax applicable to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy arrays are converted to Py_buffer structs behind the scenes.) Support for optimized vector expressions might be added later. 
Currently, slicing works as with NumPy in Python, producing slice objects and invoking NumPy's own code, instead of being converted to fast inlined C. The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, replacing the current C source. That might be what Norvig meant if he suggested merging NumPy into Cython. S.M. From mathieu at mblondel.org Thu Oct 22 04:26:08 2009 From: mathieu at mblondel.org (Mathieu Blondel) Date: Thu, 22 Oct 2009 17:26:08 +0900 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE01248.6050303@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4AE01248.6050303@molden.no> Message-ID: <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> On Thu, Oct 22, 2009 at 5:05 PM, Sturla Molden wrote: > Mathieu Blondel skrev: > The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, > replacing the current C source. That might be what Norvig meant if he > suggested merging NumPy into Cython. As I wrote earlier in this thread, I confused Cython and CPython. PN was suggesting to include Numpy in the CPython distribution (not Cython). The reason why was also given earlier. Mathieu From sturla at molden.no Thu Oct 22 04:59:21 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 10:59:21 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADEB0B4.1060107@ar.media.kyoto-u.ac.jp> <7e1472660910210048p3b642a7bs13e2a7767cfbc2d9@mail.gmail.com> <4ADEC0C4.6060308@ar.media.kyoto-u.ac.jp> <7e1472660910210138m568b5323kf1103fd1d23cba8c@mail.gmail.com> <4AE01248.6050303@molden.no> <7e1472660910220126h1153d37u86ef53de57aa55fe@mail.gmail.com> Message-ID: <4AE01EE9.6040409@molden.no> Mathieu Blondel skrev: > As I wrote earlier in this thread, I confused Cython and CPython. PN > was suggesting to include Numpy in the CPython distribution (not > Cython). The reason why was also given earlier. > > First, that would currently not be possible, as NumPy does not support Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which would prevent adoption in the Python standard library. At least they have to change their current policy. Also with NumPy in the standard library, any modification to NumPy would require a PEP. But Python should have a PEP 3118 compliant buffer object in the standard library, which NumPy could subclass. S.M. From nadavh at visionsense.com Thu Oct 22 05:01:52 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 22 Oct 2009 11:01:52 +0200 Subject: [Numpy-discussion] Convolution of a masked array Message-ID: <710F2847B0018641891D9A21602763605AD1D9@ex3.envision.co.il> Is there a way to proper convolve a masked array with a normal (nonmasked) array? My specific problem is a convolution of a 2D masked array with a separable kernel (a convolution with 2 1D array along each axis). Nadav. 
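One common way to attack this (sometimes called normalized convolution) is to zero-fill the masked samples, run the separable convolution on the data and on the validity mask separately, and then divide. A rough sketch, assuming scipy.ndimage is available and the kernel is a non-negative smoothing kernel; the helper name is made up:

import numpy as np
from scipy.ndimage import convolve1d

def masked_sepconv2d(marr, hrow, hcol):
    # Hypothetical helper, not part of numpy or scipy.
    data = np.ma.filled(marr, 0.0)                       # masked points contribute 0
    weights = (~np.ma.getmaskarray(marr)).astype(float)  # 1 where valid, 0 where masked
    num = convolve1d(convolve1d(data, hrow, axis=0), hcol, axis=1)
    den = convolve1d(convolve1d(weights, hrow, axis=0), hcol, axis=1)
    valid = den > 0
    out = np.zeros_like(num)
    out[valid] = num[valid] / den[valid]
    return np.ma.masked_array(out, mask=~valid)

Points where den is zero (no unmasked samples under the kernel support) stay masked in the result; for kernels with negative lobes this renormalization is not meaningful and a different strategy is needed.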
From stefan at sun.ac.za Thu Oct 22 05:29:44 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 22 Oct 2009 11:29:44 +0200 Subject: [Numpy-discussion] ANN: SciPy October Sprint In-Reply-To: <9457e7c80910220228v15638e9awc4b8096b8d7e960e@mail.gmail.com> References: <9457e7c80910070446k2c1ae895u4011d242abc224b@mail.gmail.com> <9457e7c80910220228v15638e9awc4b8096b8d7e960e@mail.gmail.com> Message-ID: <9457e7c80910220229j36774b61x7c259a8679fc3949@mail.gmail.com> Hi all, The weekend is just around the corner, and we're looking forward to the sprint! Here is the detail again:

"""
Our patch queue keeps getting longer and longer, so here is an opportunity to do some spring cleaning (it's spring in South Africa, at least)! Please join us for an October SciPy sprint:

  * Date: 24/25 October 2009 (Sat/Sun)
  * More information: http://projects.scipy.org/scipy/wiki/SciPySprint200910

We are looking for volunteers to write documentation, review code, fix bugs or design marketing material. New contributors are most welcome, and mentoring will be available.
"""

See you there, Regards Stéfan

From gruben at bigpond.net.au Thu Oct 22 05:51:54 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Thu, 22 Oct 2009 20:51:54 +1100 Subject: [Numpy-discussion] Optimized sum of squares In-Reply-To: <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> References: <1cd32cbb0910171054q3eb8c072o8c5fff95f2b74b0@mail.gmail.com> <1cd32cbb0910171627w7b177874r98a43260c407b6e4@mail.gmail.com> <20091018075732.GA31449@phare.normalesup.org> <4ADAE897.1070000@bigpond.net.au> <1cd32cbb0910201016y108e4b27k6c44a9d164d93a48@mail.gmail.com> Message-ID: <4AE02B3A.7040407@bigpond.net.au> josef.pktd at gmail.com wrote: > Is it really possible to get the same as np.sum(a*a, axis) with > tensordot if a.ndim=2 ? > Any way I try the "something_else", I get extra terms as in np.dot(a.T, a) Just to answer this question, np.dot(a,a) is equivalent to np.tensordot(a,a, axis=(0,0)) but the latter is about 10x slower for me. That is, you have to specify the axes for both arrays for tensordot:

In [16]: a=rand(1000)

In [17]: timeit dot(a,a)
100000 loops, best of 3: 3.51 µs per loop

In [18]: timeit tensordot(a,a,(0,0))
10000 loops, best of 3: 37.6 µs per loop

In [19]: tensordot(a,a,(0,0))==dot(a,a)
Out[19]: True

From ralf.gommers at googlemail.com Thu Oct 22 06:36:46 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 22 Oct 2009 12:36:46 +0200 Subject: [Numpy-discussion] why does binary_repr don't support arrays In-Reply-To: References: Message-ID: On Tue, Oct 20, 2009 at 11:17 AM, wrote:
>
> Hello,
>
> I'm always wondering why binary_repr doesn't allow arrays as input values.
> I always have to use a work around like:
>
> import numpy as np
>
> def binary_repr(arr, width=None):
>     binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten())
>     str_len_max = len(np.binary_repr(arr.max(), width=width))
>     str_len_min = len(np.binary_repr(arr.min(), width=width))
>     if str_len_max > str_len_min:
>         str_len = str_len_max
>     else:
>         str_len = str_len_min
>     binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len))
>     return binary_array.reshape(arr.shape)
>
> Is there a reason why arrays are not supported or is there another function
> that does support arrays?
>
Not sure if there was/is a reason, but imho it would be nice to have support for arrays. Also in base_repr. Could you file a ticket in trac?
Cheers, Ralf > > Thanks, > > Markus > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Thu Oct 22 06:48:14 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Thu, 22 Oct 2009 12:48:14 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <200910211112.22037.faltet@pytables.org> <5b8d13220910210527r64b5761amc4d7e1a67b062910@mail.gmail.com> <200910211447.02842.faltet@pytables.org> <42de02940910210938t320cfe8sb00a7f0ea9745731@mail.gmail.com> Message-ID: <42de02940910220348r126ddd50ra8377da8c359ecb9@mail.gmail.com> 2009/10/21 Neal Becker > ... > > I once wrote a module that replaces the built in transcendental > > functions of numpy by optimized versions from Intels vector math > > library. If someone is interested, I can publish it. In my experience it > > was of little use since real world problems are limited by memory > > bandwidth. Therefore extending numexpr with optimized transcendental > > functions was the better solution. Afterwards I discovered that I could > > have saved the effort of the first approach since gcc is able to use > > optimized functions from Intels vector math library or AMD's math core > > library, see the doc's of -mveclibabi. You just need to recompile numpy > > with proper compiler arguments. > > > > I'm interested. I'd like to try AMD rather than intel, because AMD is > easier to obtain. I'm running on intel machine, I hope that doesn't matter > too much. > > What exactly do I need to do? > I once tried to recompile numpy with AMD's AMCL. Unfortunately I lost the settings after an upgrade. What I remember: install AMCL, (and read the docs ;-) ), mess with the compiler args (-mveclibabi and related), link with the AMCL. Then you get faster pow/sin/cos/exp. The transcendental functions of AMCL also work with Intel processors with the same performance. I did not try the Intel SVML, which belongs to the Intel compilers. This is different to the first approach, which is a small wrapper for Intels VML, put into a python module and which can inject it's ufuncs (via numpy.set_numeric_ops) into numpy. If you want I can send the package per private email. > I see that numpy/site.cfg has an MKL section. I'm assuming I should not > touch that, but just mess with gcc flags? > This is for using the lapack provided by Intels MKL. These settings are not related to the above mentioned compiler options. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dagss at student.matnat.uio.no Thu Oct 22 07:20:17 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 22 Oct 2009 13:20:17 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> Message-ID: <4AE03FF1.5000309@student.matnat.uio.no> Robert Kern wrote: > On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: > >> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >> >>> Mathieu Blondel skrev: >>> >>>> Hello, >>>> >>>> About one year ago, a high-level, objected-oriented SIMD API was added >>>> to Mono. For example, there is a class Vector4f for vectors of 4 >>>> floats and this class implements methods such as basic operators, >>>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>>> directly using SIMD operations. >>>> >>> I think you are confusing SIMD with Intel's MMX/SSE instruction set. >>> >> OK, I should have said "Object-oriented SIMD API that is implemented >> using hardware SIMD instructions". >> > > No, I think you're right. Using "SIMD" to refer to numpy-like > operations is an abuse of the term not supported by any outside > community that I am aware of. Everyone else uses "SIMD" to describe > hardware instructions, not the application of a single syntactical > element of a high level language to a non-trivial data structure > containing lots of atomic data elements. > BTW, is there any term for this latter concept that's not SIMD or "vector operation"? It would be good to have a word to distinguish this concept from both CPU instructions and linear algebra. (Personally I think describing NumPy as SIMD and use "SSE/MMX" for CPU instructions makes best sense, but I'm happy to yield to conventions...) Dag Sverre From markus.proeller at ifm.com Thu Oct 22 07:45:46 2009 From: markus.proeller at ifm.com (markus.proeller at ifm.com) Date: Thu, 22 Oct 2009 13:45:46 +0200 Subject: [Numpy-discussion] Antwort: Re: why does binary_repr don't support arrays In-Reply-To: Message-ID: numpy-discussion-bounces at scipy.org schrieb am 22.10.2009 12:36:46: > > > > > > On Tue, Oct 20, 2009 at 11:17 AM, wrote: > > > > Hello, > > > > I'm always wondering why binary_repr doesn't allow arrays as input > > values. I always have to use a work around like: > > > > import numpy as np > > > > def binary_repr(arr, width=None): > > binary_list = map((lambda foo: np.binary_repr(foo, width)), arr.flatten()) > > str_len_max = len(np.binary_repr(arr.max(), width=width)) > > str_len_min = len(np.binary_repr(arr.min(), width=width)) > > if str_len_max > str_len_min: > > str_len = str_len_max > > else: > > str_len = str_len_min > > binary_array = np.fromiter(binary_list, dtype='|S'+str(str_len)) > > return binary_array.reshape(arr.shape) > > > > Is there a reason why arrays are not supported or is there another > > function that does support arrays? > > Not sure if there was/is a reason, but imho it would be nice to have > support for arrays. Also in base_repr. Could you file a ticket in trac? > > Cheers, > Ralf > Okay, I opened a new ticket: http://projects.scipy.org/numpy/ticket/1270 Markus -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ferrell at diablotech.com Thu Oct 22 08:40:33 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Thu, 22 Oct 2009 06:40:33 -0600 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE00B58.3090802@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> Message-ID: <290D922D-EBA1-4334-AF09-63C397F36C0F@diablotech.com> On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote: > Robert Kern skrev: >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > Then you should pick up a book on parallel computing. > > It is common to differentiate between four classes of computers: SISD, > MISD, SIMD, and MIMD machines. > > A SISD system is the classical von Neuman machine. A MISD system is a > pipelined von Neuman machine, for example the x86 processor. > > A SIMD system is one that has one CPU dedicated to control, and a > large > collection of subordinate ALUs for computation. Each ALU has a small > amount of private memory. The IBM Cell processor is the typical SIMD > machine. > > A special class of SIMD machines are the so-called "vector > machines", of > which the most famous is the Cray C90. The MMX and SSE instructions in > Intel Pentium processors are an example of vector instructions. Some > computer scientists regard vector machines a subtype of MISD systems, > orthogonal to piplines, because there are no subordinate ALUs with > private memory. > > MIMD systems multiple independent CPUs. MIMD systems comes in two > categories: shared-memory processors (SMP) and distributed-memory > machines (also called cluster computers). The dual- and quad-core x86 > processors are shared-memory MIMD machines. > > Many people associate the word SIMD with SSE due to Intel marketing. > But > to the extent that vector machines are MISD orthogonal to piplined von > Neuman machines, SSE cannot be called SIMD. > > NumPy is a software simulated vector machine, usually executed on MISD > hardware. To the extent that vector machines (such as SSE and C90) are > SIMD, we must call NumPy an object-oriented SIMD library. This is not the terminology I am familiar with. Calling NumPy an " object-oriented SIMD library" is very confusing for me. I worked in the parallel computer world for a while (back in the dark ages) and this terminology would have been confusing to everyone I dealt with. I've also read many parallel computing books. In my experience SIMD refers to hardware, not software. There is no reason that NumPy can't be written to run great (get good speed-ups) on an 8-core shared memory system. That would be a MIMD system, and there's nothing about it that doesn't fit with the NumPy abstraction. And, although SIMD can be a subset of MIMD, there are things that can be done in NumPy that be parallelized on MIMD machines but not on SIMD machines (e.g. 
the NumPy vector type is flexible enough it can store a list of tasks, and the operations on that vector can be parallelized easily on a shared memory MIMD machine - task parallelism - but not on a SIMD machine). If we say that "NumPy is a software simulated vector machine" or an " object-oriented SIMD library" we are pigeonholing NumPy in a way which is too limiting and isn't accurate. As a user it feels to me that NumPy is built around various algebra abstractions, many of which map well onto vector machine operations. That means that many of the operations are amenable to efficient implementation on SIMD hardware. But, IMO, one of the nice features of NumPy is it is built around high- level operations, and I would hate to see the project go down a path which insists that everything in NumPy be efficient on all SIMD hardware. Of course, I would also love to see implementations which take as much advantage of available HW as possible (e.g. exploit SIMD HW if available). That's my $0.02, worth only a couple cents less than that. -robert From robert.kern at gmail.com Thu Oct 22 11:51:14 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 22 Oct 2009 10:51:14 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE00B58.3090802@molden.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> Message-ID: <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> On Thu, Oct 22, 2009 at 02:35, Sturla Molden wrote: > Robert Kern skrev: >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > Then you should pick up a book on parallel computing. I would be delighted to see a reference to one that refers to a high level language's API as SIMD. Please point one out to me. It's certainly not any of the ones I have available to me. > It is common to differentiate between four classes of computers: SISD, > MISD, SIMD, and MIMD machines. > > A SISD system is the classical von Neuman machine. A MISD system is a > pipelined von Neuman machine, for example the x86 processor. > > A SIMD system is one that has one CPU dedicated to control, and a large > collection of subordinate ALUs for computation. Each ALU has a small > amount of private memory. The IBM Cell processor is the typical SIMD > machine. > > A special class of SIMD machines are the so-called "vector machines", of > which the most famous is the Cray C90. The MMX and SSE instructions in > Intel Pentium processors are an example of vector instructions. Some > computer scientists regard vector machines a subtype of MISD systems, > orthogonal to piplines, because there are no subordinate ALUs with > private memory. > > MIMD systems multiple independent CPUs. MIMD systems comes in two > categories: shared-memory processors (SMP) and distributed-memory > machines (also called cluster computers). The dual- and quad-core x86 > processors are shared-memory MIMD machines. > > Many people associate the word SIMD with SSE due to Intel marketing. 
But > to the extent that vector machines are MISD orthogonal to piplined von > Neuman machines, SSE cannot be called SIMD. That's a fair point, but unrelated to whether or not numpy can be labeled SIMD. These all refer to hardware. > NumPy is a software simulated vector machine, usually executed on MISD > hardware. To the extent that vector machines (such as SSE and C90) are > SIMD, we must call NumPy an object-oriented SIMD library. numpy does not "simulate" anything. It is an object-oriented library. If numpy could be said to "simulate" a vector machine, than just about any object-oriented library that overloads operators could. It creates a false equivalence between numpy and software that actually does simulate hardware. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Thu Oct 22 12:01:20 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 22 Oct 2009 11:01:20 -0500 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <4AE03FF1.5000309@student.matnat.uio.no> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE03FF1.5000309@student.matnat.uio.no> Message-ID: <3d375d730910220901t2476e9b6i59c8c33a8e59686c@mail.gmail.com> On Thu, Oct 22, 2009 at 06:20, Dag Sverre Seljebotn wrote: > Robert Kern wrote: >> On Wed, Oct 21, 2009 at 22:32, Mathieu Blondel wrote: >> >>> On Thu, Oct 22, 2009 at 11:31 AM, Sturla Molden wrote: >>> >>>> Mathieu Blondel skrev: >>>> >>>>> Hello, >>>>> >>>>> About one year ago, a high-level, objected-oriented SIMD API was added >>>>> to Mono. For example, there is a class Vector4f for vectors of 4 >>>>> floats and this class implements methods such as basic operators, >>>>> bitwise operators, comparison operators, min, max, sqrt, shuffle >>>>> directly using SIMD operations. >>>>> >>>> I think you are confusing SIMD with Intel's MMX/SSE instruction set. >>>> >>> OK, I should have said "Object-oriented SIMD API that is implemented >>> using hardware SIMD instructions". >>> >> >> No, I think you're right. Using "SIMD" to refer to numpy-like >> operations is an abuse of the term not supported by any outside >> community that I am aware of. Everyone else uses "SIMD" to describe >> hardware instructions, not the application of a single syntactical >> element of a high level language to a non-trivial data structure >> containing lots of atomic data elements. >> > BTW, is there any term for this latter concept that's not SIMD or > "vector operation"? It would be good to have a word to distinguish this > concept from both CPU instructions and linear algebra. Of course, "vector instruction" and "vectorized operation" sometimes also refer to the CPU instructions. :-) I don't think you will get much better than "vectorized operation", though. While it's ambiguous, it has a long history in the high level language world thanks to Matlab. > (Personally I think describing NumPy as SIMD and use "SSE/MMX" for CPU > instructions makes best sense, but I'm happy to yield to conventions...) Well, "SSE/MMX" is also too limiting. Altivec instructions are also in the same class, and we should be able to use them on PPC platforms. 
Regardless of the origin of the term, "SIMD" is used to refer to all of these instructions in common practice. Sturla may be right in some prescriptive sense, but descriptively, he's quite wrong. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From silva at lma.cnrs-mrs.fr Thu Oct 22 13:28:14 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Thu, 22 Oct 2009 19:28:14 +0200 Subject: [Numpy-discussion] Sphinx/Numpydoc, attributes and property Message-ID: <1256232494.3391.19.camel@PCTerrusse> It seems that either Sphinx or NumpyDoc is having troubles with property attributes. Considering the following piece of code in foo.py

class Profil(object):
    """ Blabla

    Attributes
    ----------
    tfin
    tdeb : float
        Startpoint
    pts : array
        Blabla2.
    """
    def __init__(self):
        """ """
        self.pts = np.array([[0,1]])

    @property
    def tfin(self):
        "The time horizon endpoint."
        return self.pts[0,:].max()

    @property
    def tdeb(self):
        "The time horizon startpoint."
        return self.pts[0,:].min()

and a foo.rst containing

:mod:`foo` -- BlaTitle
=====================================================

.. autoclass:: foo.Profil

produces an attribute-table with only pts but without tfin and tdeb. How can I handle this? -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051)

From sturla at molden.no Thu Oct 22 13:42:42 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 22 Oct 2009 19:42:42 +0200 Subject: [Numpy-discussion] Objected-oriented SIMD API for Numpy In-Reply-To: <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> References: <7e1472660910202244y6c85206am6a81afd154db6c2a@mail.gmail.com> <4ADFC3FB.5030906@molden.no> <7e1472660910212032o587d6207y396c5a4abdea9de3@mail.gmail.com> <3d375d730910212046j15a7da19u211981675b7216a4@mail.gmail.com> <4AE00B58.3090802@molden.no> <3d375d730910220851ib57a10co47ee743f38b1bc0e@mail.gmail.com> Message-ID: <4AE09992.10607@molden.no> Robert Kern skrev: > I would be delighted to see a reference to one that refers to a high > level language's API as SIMD. Please point one out to me. It's > certainly not any of the ones I have available to me. > > Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the syntax of Fortran 90 and 95 as SIMD. Peter Pacheco's book on MPI describes the difference between von Neumann machines and vector machines as analogous to the difference between Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing). He is ambiguous as to whether vector machines really are SIMD, or more related to pipelined von Neumann machines. Grama et al. "Introduction to Parallel Computing" describes SIMD as an "architecture", but it is more or less clear that they mean hardware. They do say the Fortran 90 "where statement" is a primitive used to support selective execution on SIMD processors, as conditional execution (if statements) is detrimental to performance. So here, at least, we have three books claiming that Fortran is a language with special primitives for SIMD processors. > > That's a fair point, but unrelated to whether or not numpy can be > labeled SIMD. These all refer to hardware. > Actually I don't think the distinction is that important as we are talking about Turing machines. Also, a lot of what we call "hardware" is actually implemented as software on the chip: the most extreme example would be Transmeta, which emulated x86 processors entirely in software.
The vague distinction between hardware and software is why we get patents on software in Europe, although pure software patents are prohibited. One can always argue that the program and the computer together constitutes a physical device; and circumventing patents by moving hardware into software should not be allowed. The distinction between hardware and software is not as clear as programmers tend to believe. Another thing is that performance issues for vector machines and "vector languages" (Fortran 90, Matlab, NumPy) are similar. Precisely the same situations that makes NumPy and Matlab code slow are detrimental on SIMD/vector hardware. That would for example be long for loops with conditional if statements. On the other hand, vectorized operations over arrays, possibly using where/find masks, are fast. So although NumPy is not executed on a vector machine like the Cray C90, it certainly behaves like one performance wise. I'd say that a MIMD machine running NumPy is a Turing machine emulating a SIMD/vector machine. And now I am done with this stupid discussion... Sturla Molden From silva at lma.cnrs-mrs.fr Thu Oct 22 18:42:16 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Fri, 23 Oct 2009 00:42:16 +0200 Subject: [Numpy-discussion] Sphinx/Numpydoc, attributes and property In-Reply-To: <1256232494.3391.19.camel@PCTerrusse> References: <1256232494.3391.19.camel@PCTerrusse> Message-ID: <1256251336.3391.27.camel@PCTerrusse> It seems that class Profil(object): def __init__(self): """ """ pass def bla(self): "Blabla." return 0 @property def tdeb(self): "The time horizon startpoint." return self.pts[0,:].min() > and a foo.rst containing :mod:`foo` -- BlaTitle ===================================================== .. autoclass:: foo.Profil :members: bla, tdeb produces a listing untitled "Methods" with methods bla and tdeb. Despite tdeb is defined as a method, the decorator make tdeb be a property which I would treat as an attribute and put it in the attribute list. That is not what is done in sphinx/numpydoc. Who is to "blame" ? Sphinx or NumpyDoc ? -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From dwf at cs.toronto.edu Fri Oct 23 05:02:45 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 23 Oct 2009 05:02:45 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On 21-Oct-09, at 11:01 AM, Ryan May wrote: > ~/.local was added to *be the standard* for easily installing python > packages in your user account. And it works perfectly on the other > major OSes, no twiddling of paths anymore. I've had a lot of headaches with ~/.local on Ubuntu, actually. Apparently Ubuntu has some crazy 'dist-packages' thing going on in parallel to site-packages and /usr and /usr/local and its precedence is unclear. virtualenv also doesn't know jack about it (speaking of which, there's no way to control precedence of ~/.local with virtualenv, so I can't use virtualenv to override ~/.local if I want to treat "~/.local as the new site-packages"). 
Packaging is still more pain than it should be on *any* platform, I think, and I doubt we'll have it all sorted out until somewhere in the mid-to-upper 3.x's. :( David From dwf at cs.toronto.edu Fri Oct 23 05:09:38 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 23 Oct 2009 05:09:38 -0400 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: On 21-Oct-09, at 6:58 AM, Robin wrote: > My only worry is with installer packages - I'm thinking mainly of > wxpython. Is there a way I can get that package to install in > $HOME/.local. (The installer only seems to let you choose a drive). > Also - if I build for example vim against the system python, will I be > able to see packages in $HOME/.local from the python interpreter > inside vim? wxPython is going to be a problem with 64-bit Python. Namely, wxMac is based on Carbon, there is no 64-bit Carbon, and the wxCocoa port is not quite up to snuff. At any rate, any binary installer packages will almost certainly *not* work with the system python, at least if it runs in 64-bit mode by default. The Python.org sources for 2.6.x has a script in the Mac/ subdirectory (I think, or in the build tools) for building a 4-way universal binary (i386, x86_64, ppc and ppc64). You can rather easily build it (just run the script) and it will produce executables of the form python (or python2.6) suffixed with -32 or -64 to run in one mode or the other. So, python-32 (or python2.6-32) will get you 32 bit Python, which will work with wxPython using wxMac, or python-64, which will not (but will do everything in 64-bit mode). I've successfully gotten svn numpy to build 4-way using such a 4-way Python. David From cournape at gmail.com Fri Oct 23 05:26:45 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 23 Oct 2009 18:26:45 +0900 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: <5b8d13220910230226k15b28fc5n21e85e77e9e1a262@mail.gmail.com> On Fri, Oct 23, 2009 at 6:02 PM, David Warde-Farley wrote: > > Packaging is still more pain than it should be on *any* platform, I > think, and I doubt we'll have it all sorted out until somewhere in the > mid-to-upper 3.x's. :( I think numpy and scipy on py3k will happen before that :) David From dsdale24 at gmail.com Fri Oct 23 09:21:17 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 23 Oct 2009 09:21:17 -0400 Subject: [Numpy-discussion] numpy and C99 Message-ID: Can we use features of C99 in numpy? For example, can we use "//" style comments, and C99 for statements "for (int i=0, ...) "? Darren From mdroe at stsci.edu Fri Oct 23 09:25:12 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 09:25:12 -0400 Subject: [Numpy-discussion] numpydoc without autosummary Message-ID: <4AE1AEB8.30709@stsci.edu> Is there a way to use numpydoc without putting an autosummary table at the head of each class? 
I'm using numpydoc primarily for the sectionized docstring support, but the autosummaries are somewhat overkill for my project. Mike From pav+sp at iki.fi Fri Oct 23 09:29:58 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:29:58 +0000 (UTC) Subject: [Numpy-discussion] numpy and C99 References: Message-ID: Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: > Can we use features of C99 in numpy? For example, can we use "//" style > comments, and C99 for statements "for (int i=0, ...) "? It would be much easier if we could, but so far we have strived for C89 compliance. So I guess the answer is "no". -- Pauli Virtanen From pav+sp at iki.fi Fri Oct 23 09:31:54 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:31:54 +0000 (UTC) Subject: [Numpy-discussion] numpydoc without autosummary References: <4AE1AEB8.30709@stsci.edu> Message-ID: Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > Is there a way to use numpydoc without putting an autosummary table at > the head of each class? I'm using numpydoc primarily for the > sectionized docstring support, but the autosummaries are somewhat > overkill for my project. Numpydoc hooks into sphinx.ext.autodoc's docstring mangling. So if you just need to have docstrings formatted, you can use Sphinx's auto*:: directives. -- Pauli Virtanen From pav+sp at iki.fi Fri Oct 23 09:39:29 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 23 Oct 2009 13:39:29 +0000 (UTC) Subject: [Numpy-discussion] numpydoc without autosummary References: <4AE1AEB8.30709@stsci.edu> Message-ID: Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > Is there a way to use numpydoc without putting an autosummary table at > the head of each class? I'm using numpydoc primarily for the > sectionized docstring support, but the autosummaries are somewhat > overkill for my project. Ah, you meant the stuff output by default to class docstrings. Currently, there's no way to turn this off, unfortunately. It seems there should be, though... -- Pauli Virtanen From dsdale24 at gmail.com Fri Oct 23 09:48:01 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 23 Oct 2009 09:48:01 -0400 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen wrote: > Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: >> Can we use features of C99 in numpy? For example, can we use "//" style >> comments, and C99 for statements "for (int i=0, ...) "? > > It would be much easier if we could, but so far we have strived for C89 > compliance. So I guess the answer is "no". Out of curiosity (I am relatively new to C), what is holding numpy back from embracing C99? Why adhere to a 20-year-old standard? Darren From dagss at student.matnat.uio.no Fri Oct 23 10:03:14 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 23 Oct 2009 16:03:14 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: <4AE1B7A2.9070603@student.matnat.uio.no> Darren Dale wrote: > On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen wrote: > >> Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: >> >>> Can we use features of C99 in numpy? For example, can we use "//" style >>> comments, and C99 for statements "for (int i=0, ...) "? >>> >> It would be much easier if we could, but so far we have strived for C89 >> compliance. So I guess the answer is "no". >> > > Out of curiosity (I am relatively new to C), what is holding numpy > back from embracing C99? 
Why adhere to a 20-year-old standard? > Microsoft's compilers don't support C99 (or, at least, versions that still has to be used doesn't). Dag Sverre From cournape at gmail.com Fri Oct 23 10:09:55 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 23 Oct 2009 23:09:55 +0900 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: <5b8d13220910230709s2c7a6712o8dc6914c84de510c@mail.gmail.com> On Fri, Oct 23, 2009 at 10:21 PM, Darren Dale wrote: > Can we use features of C99 in numpy? For example, can we use "//" > style comments, and C99 for statements "for (int i=0, ...) "? No, and most likely never will. Even Visual Studio 2010 does not handle basic C99. David From charlesr.harris at gmail.com Fri Oct 23 10:33:19 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 08:33:19 -0600 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: Message-ID: On Fri, Oct 23, 2009 at 7:48 AM, Darren Dale wrote: > On Fri, Oct 23, 2009 at 9:29 AM, Pauli Virtanen > > wrote: > > Fri, 23 Oct 2009 09:21:17 -0400, Darren Dale wrote: > >> Can we use features of C99 in numpy? For example, can we use "//" style > >> comments, and C99 for statements "for (int i=0, ...) "? > > > > It would be much easier if we could, but so far we have strived for C89 > > compliance. So I guess the answer is "no". > > Out of curiosity (I am relatively new to C), what is holding numpy > back from embracing C99? Why adhere to a 20-year-old standard? > > To clarify: most compilers support the "//" comment style, but some of the older Sun compilers don't. The main problem on using any of the newer stuff is portability. Some of the new stuff, like "//", while handy isn't crucial. What really hurts is not being able to rely on the math library being up to snuff. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Oct 23 10:41:27 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 23 Oct 2009 16:41:27 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1B7A2.9070603@student.matnat.uio.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> Message-ID: <4AE1C097.1080904@molden.no> Dag Sverre Seljebotn skrev: > Microsoft's compilers don't support C99 (or, at least, versions that > still has to be used doesn't). > > Except for automatic arrays, they do support some of the more important parts of C99 as extensions to C89: inline functions restrict qualifier for (int i=0; i<; i++) Personally I think all of NumPy's C base should be moved to Cython. With your excellent syntax for PEP 3118 buffers, I see no reason to keep NumPy in C. This would make porting to Py3k as well as maintainence easier. When Cython can build Sage, it can be used for a smaller project like NumPy as well. The question of using C89, C99 or C++ would be deferred to the Cython compiler. We could use C++ on one platform (MSVC) and C99 on another (GCC). We would also get direct support for C99 _Complex and C++ std::complex<> types. I'd also suggest that ndarray subclasses memoryview in Py3k. S.M. 
From rmay31 at gmail.com Fri Oct 23 10:46:47 2009 From: rmay31 at gmail.com (Ryan May) Date: Fri, 23 Oct 2009 09:46:47 -0500 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: On Fri, Oct 23, 2009 at 4:02 AM, David Warde-Farley wrote: > On 21-Oct-09, at 11:01 AM, Ryan May wrote: > >> ~/.local was added to *be the standard* for easily installing python >> packages in your user account. ?And it works perfectly on the other >> major OSes, no twiddling of paths anymore. > > I've had a lot of headaches with ~/.local on Ubuntu, actually. > Apparently Ubuntu has some crazy 'dist-packages' thing going on in > parallel to site-packages and /usr and /usr/local and its precedence > is unclear. virtualenv also doesn't know jack about it (speaking of > which, there's no way to control precedence of ~/.local with > virtualenv, so I can't use virtualenv to override ~/.local if I want > to treat "~/.local as the new site-packages"). Ok, so *some* linux distros also choose to break stuff. I'm noticing a theme here where OSes that strive for ease end up breaking something basic. I'm not saying they all need to drastically change; they just need to insert their paths *after* ~/.local. (Thankfully, Gentoo doesn't get in my way.) Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From gael.varoquaux at normalesup.org Fri Oct 23 11:04:41 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 23 Oct 2009 17:04:41 +0200 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> <4ADEE54E.1030409@ar.media.kyoto-u.ac.jp> <2d5132a50910210424q407791fdi2f2def2db48ec090@mail.gmail.com> <716E55E6-754B-4796-9CD4-83A8957245AD@yale.edu> Message-ID: <20091023150441.GA16254@phare.normalesup.org> On Fri, Oct 23, 2009 at 09:46:47AM -0500, Ryan May wrote: > On Fri, Oct 23, 2009 at 4:02 AM, David Warde-Farley wrote: > > On 21-Oct-09, at 11:01 AM, Ryan May wrote: > >> ~/.local was added to *be the standard* for easily installing python > >> packages in your user account. ?And it works perfectly on the other > >> major OSes, no twiddling of paths anymore. > > I've had a lot of headaches with ~/.local on Ubuntu, actually. > > Apparently Ubuntu has some crazy 'dist-packages' thing going on in > > parallel to site-packages and /usr and /usr/local and its precedence > > is unclear. virtualenv also doesn't know jack about it (speaking of > > which, there's no way to control precedence of ~/.local with > > virtualenv, so I can't use virtualenv to override ~/.local if I want > > to treat "~/.local as the new site-packages"). > Ok, so *some* linux distros also choose to break stuff. I'm noticing > a theme here where OSes that strive for ease end up breaking something > basic. I'm not saying they all need to drastically change; they just > need to insert their paths *after* ~/.local. (Thankfully, Gentoo > doesn't get in my way.) 
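One way to check, on any given box, where the per-user directory actually sits relative to the system paths is to ask site and sys directly; a minimal sketch, assuming Python 2.6+ where PEP 370 user site-packages exist::

    import site
    import sys

    # Where the per-user site-packages (PEP 370, i.e. ~/.local on Linux
    # and OS X) lives, and whether Python is using it at all.
    print("user site: %s (enabled: %s)" % (site.USER_SITE, site.ENABLE_USER_SITE))

    # Anything listed *before* USER_SITE here will shadow packages
    # installed into ~/.local.
    for i, p in enumerate(sys.path):
        marker = "  <-- user site" if p == site.USER_SITE else ""
        print("%2d  %s%s" % (i, p, marker))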
For instance, last time I looked, fedora had removed numpy.distutils from the numpy package, and packaged it in a different package. Very confusing... Ga?l From charlesr.harris at gmail.com Fri Oct 23 11:47:58 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 09:47:58 -0600 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1C097.1080904@molden.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: On Fri, Oct 23, 2009 at 8:41 AM, Sturla Molden wrote: > Dag Sverre Seljebotn skrev: > > > > > Microsoft's compilers don't support C99 (or, at least, versions that > > still has to be used doesn't). > > > > > Except for automatic arrays, they do support some of the more important > parts of C99 as extensions to C89: > > inline functions > restrict qualifier > for (int i=0; i<; i++) > > > Personally I think all of NumPy's C base should be moved to Cython. With > your excellent syntax for PEP 3118 buffers, I see no reason to keep > NumPy in C. This would make porting to Py3k as well as maintainence > easier. When Cython can build Sage, it can be used for a smaller project > like NumPy as well. > > Sage doesn't have the accumulated layers of crud that numpy has. Yet ;) However, moving parts of the code to cython is certainly one path forward. A good starting point would probably be to separate ufuncs from ndarrays. However, I think some code, say loops.c.src, looks better in C than it would in cython. C is a rather nice language for that sort of thing. OTOH, the ufunc_object.c code might look better in cython. In general, I think a separation between pure C code and python interface code would be the way to go, with the latter written in cython. > The question of using C89, C99 or C++ would be deferred to the Cython > compiler. We could use C++ on one platform (MSVC) and C99 on another > (GCC). We would also get direct support for C99 _Complex and C++ > std::complex<> types. > > How about symbol export control for the modules? I think that is one more tool that would benefit from a portable interface in cython. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Oct 23 11:52:20 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 23 Oct 2009 17:52:20 +0200 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: <20091023155220.GC16254@phare.normalesup.org> On Fri, Oct 23, 2009 at 09:47:58AM -0600, Charles R Harris wrote: > However, I think some code, say loops.c.src, looks better in C than it > would in cython. C is a rather nice language for that sort of thing. OTOH, > the ufunc_object.c code might look better in cython. In general, I think a > separation between pure C code and python interface code would be the way > to go, with the latter written in cython. I have some demand in house to be able to use the C parts of numpy for C. Say for instance you are coding a Python library, with a C-optimized Monte-Carlo sampler. Linking to the C code of randomkit is very useful for this. Right now the only way to do this is to copy the randomkit source and to ship it with your libary, however, hopefully, this will change in the long run. So I guess this is a +1 to keep some core numerical functionality in C, most probably for ABI reasons. 
My 2 cents, Ga?l From mdroe at stsci.edu Fri Oct 23 12:13:08 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 12:13:08 -0400 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: References: <4AE1AEB8.30709@stsci.edu> Message-ID: <4AE1D614.20609@stsci.edu> On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > >> Is there a way to use numpydoc without putting an autosummary table at >> the head of each class? I'm using numpydoc primarily for the >> sectionized docstring support, but the autosummaries are somewhat >> overkill for my project. >> > Ah, you meant the stuff output by default to class docstrings. Currently, > there's no way to turn this off, unfortunately. It seems there should be, > though... > > Exactly. It would be great if there was a conf.py option (or something) to turn this off. Thanks for considering it. Cheers, Mike From d.l.goldsmith at gmail.com Fri Oct 23 14:56:31 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 23 Oct 2009 11:56:31 -0700 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: <4AE1D614.20609@stsci.edu> References: <4AE1AEB8.30709@stsci.edu> <4AE1D614.20609@stsci.edu> Message-ID: <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> The proper thing to do is file an "enhancement" ticket, at: http://code.google.com/p/pydocweb/issues/list Thanks. DG On Fri, Oct 23, 2009 at 9:13 AM, Michael Droettboom wrote: > On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > > > >> Is there a way to use numpydoc without putting an autosummary table at > >> the head of each class? I'm using numpydoc primarily for the > >> sectionized docstring support, but the autosummaries are somewhat > >> overkill for my project. > >> > > Ah, you meant the stuff output by default to class docstrings. Currently, > > there's no way to turn this off, unfortunately. It seems there should be, > > though... > > > > > Exactly. It would be great if there was a conf.py option (or something) > to turn this off. Thanks for considering it. > > Cheers, > Mike > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Fri Oct 23 15:47:56 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 23 Oct 2009 15:47:56 -0400 Subject: [Numpy-discussion] numpydoc without autosummary In-Reply-To: <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> References: <4AE1AEB8.30709@stsci.edu> <4AE1D614.20609@stsci.edu> <45d1ab480910231156n67cef87bpf514a37636579d6c@mail.gmail.com> Message-ID: <4AE2086C.7050208@stsci.edu> Done. Issue #50. Thanks, Mike On 10/23/2009 02:56 PM, David Goldsmith wrote: > The proper thing to do is file an "enhancement" ticket, at: > > http://code.google.com/p/pydocweb/issues/list > > Thanks. > > DG > > On Fri, Oct 23, 2009 at 9:13 AM, Michael Droettboom > wrote: > > On 10/23/2009 09:39 AM, Pauli Virtanen wrote: > > Fri, 23 Oct 2009 09:25:12 -0400, Michael Droettboom wrote: > > > >> Is there a way to use numpydoc without putting an autosummary > table at > >> the head of each class? I'm using numpydoc primarily for the > >> sectionized docstring support, but the autosummaries are somewhat > >> overkill for my project. 
> >> > > Ah, you meant the stuff output by default to class docstrings. > Currently, > > there's no way to turn this off, unfortunately. It seems there > should be, > > though... > > > > > Exactly. It would be great if there was a conf.py option (or > something) > to turn this off. Thanks for considering it. > > Cheers, > Mike > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Oct 23 15:54:20 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 24 Oct 2009 04:54:20 +0900 Subject: [Numpy-discussion] numpy and C99 In-Reply-To: <4AE1C097.1080904@molden.no> References: <4AE1B7A2.9070603@student.matnat.uio.no> <4AE1C097.1080904@molden.no> Message-ID: <5b8d13220910231254q1cd039a3g2f284b5c4bb4d1a@mail.gmail.com> On Fri, Oct 23, 2009 at 11:41 PM, Sturla Molden wrote: > Except for automatic arrays, they do support some of the more important > parts of C99 as extensions to C89: > > inline functions > restrict qualifier > for (int i=0; i<; i++) No, it doesn't. The above only works in C++ mode, not in C mode. Visual Studio supports almost none of the useful C99 (VL array, complex number). The VS team has clearly stated that they don't care about updating C support. David From charlesr.harris at gmail.com Sat Oct 24 00:44:21 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Oct 2009 22:44:21 -0600 Subject: [Numpy-discussion] Carriage returns in files. Message-ID: Hi All, I just fixed scipy ticket #1029, where the Sun Fortran compiler failed because nnls.f contained carriage returns (\r). Out of curiosity I decided to look as the numpy and scipy repositories to see how common \r was, with the results: numpy: 1232 instances scipy: 3315 instances Do we have a policy on this? IIRC, it is something that should be handled by subversion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Sat Oct 24 01:24:23 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 24 Oct 2009 01:24:23 -0400 Subject: [Numpy-discussion] Carriage returns in files. In-Reply-To: References: Message-ID: On 24-Oct-09, at 12:44 AM, Charles R Harris wrote: > Do we have a policy on this? IIRC, it is something that should be > handled by subversion. AFAIK you're right, the exception might be that sometimes, if there are files with mixed newlines, subversion will get confused and leave them there. That should only happen if someone is using a seriously dumb editor. David From martin.teichmann at mbi-berlin.de Sat Oct 24 07:38:29 2009 From: martin.teichmann at mbi-berlin.de (Martin Teichmann) Date: Sat, 24 Oct 2009 13:38:29 +0200 Subject: [Numpy-discussion] User data types Message-ID: <16538b570910240438o32c638bbh2b03dfb21aacf37f@mail.gmail.com> Hello List, I'm working on an extension for pytables, a package to store numpy arrays into hdf5 files. hdf5 supports the additional datatype "reference", which makes sense only in an hdf5 context. In order to be able to use them in pytables, I figured the best idea is to define a user-defined datatype, register it with PyArray_RegisterDataType and insert it into the typeDict. 
This all works fine, except that spurious exceptions are raised after one has called dtype("r") (I use r as the type code). I tracked it down to the function PyArray_DescrConverter, which calls PyArray_DescrFromType (just after the finish: label). This function sets a Python exception with PyErr_SetString and returns NULL. But the calling function ignores this, and returns successfully. The exception then is still dangling, and at a later time will be raised in completely unrelated code. I guess a PyErr_Clear should be added at the beginning of the if-clause to solve the problem. I submitted this bug as Ticket #1255 to the numpy trac system, where you can also find the code that triggered the bug. Greetings Martin Teichmann
From ralf.gommers at googlemail.com Sat Oct 24 11:17:07 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 24 Oct 2009 17:17:07 +0200 Subject: [Numpy-discussion] distutils docstrings; request for review/help Message-ID: This request is mainly addressed to David Cournapeau I guess. I wrote docstrings for pretty much all the distutils items not marked "unimportant" in the doc wiki.
Pretty much all the info I got from reading > the code and comments in it, plus a little bit from reading the > distutils.rst file and the Python distutils docs. This was not the easiest > code to understand, so I would like to ask for a review and some help > filling in the blanks. > > The main items that I could not finish are: > - VariableSet.interpolate (if you could throw in an accurate description of > exactly what it does, I can polish it up) > - CCompiler_compile (could use more details I'm sure) > - UnixCCompiler__compile (same) > - UnixCCompiler_create_static_lib (same) > > Also, I left some comment about either things I was not sure about or > things like unused parameters. You can find them here: > http://docs.scipy.org/numpy/changes/ , at the top (made them today or > yesterday). Could you please have a look at those? > > Finally, there are some items that could be important to document, but are > marked as unimportant (Configuration class and methods, exec_command, ...). > Would you mind looking through those items on > http://docs.scipy.org/numpy/docs/ and change the status of the ones you > think are important to "needs editing"? Then I'll try to finish those too. > > Thanks, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 18:14:17 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 15:14:17 -0700 Subject: [Numpy-discussion] parameter types for documentation Message-ID: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Are the appropriate parameter types for the docstrings, listed somewhere? In particular, in reviewing some docs I see both 'str' and 'string' used. Which one is correct? Chris From ralf.gommers at googlemail.com Sat Oct 24 18:19:04 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 00:19:04 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 12:14 AM, Christopher Burns wrote: > Are the appropriate parameter types for the docstrings, listed > somewhere? In particular, in reviewing some docs I see both 'str' and > 'string' used. Which one is correct? > > Not all of them are listed in one place. For general advice, see the Parameters section of http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines One error on that page is {True, False}, that should be bool. str is correct, string is not. Reason: str is the type. Partial list: str bool list tuple sequence ndarray array_like or if you can be more precise: list of str sequence of ints and for keywords, add ", optional" Cheers, Ralf > Chris > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cburns at berkeley.edu Sat Oct 24 18:42:46 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 15:42:46 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> Message-ID: <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Cool, thanks. Mind if I update the HOWTO_DOCUMENT adding in the partial list below? Chris On Sat, Oct 24, 2009 at 3:19 PM, Ralf Gommers wrote: > Not all of them are listed in one place. For general advice, see the > Parameters section of > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines > One error on that page is {True, False}, that should be bool. > > str is correct, string is not. Reason: str is the type. > > Partial list: > str > bool > list > tuple > sequence > ndarray > array_like > > or if you can be more precise: > list of str > sequence of ints > > and for keywords, add ", optional" > From ralf.gommers at googlemail.com Sat Oct 24 18:52:40 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 00:52:40 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 12:42 AM, Christopher Burns wrote: > Cool, thanks. Mind if I update the HOWTO_DOCUMENT adding in the > partial list below? > > Sure, that would be useful. While you're at it, could you get rid of the {True, False}? Cheers, Ralf > Chris > > On Sat, Oct 24, 2009 at 3:19 PM, Ralf Gommers > wrote: > > Not all of them are listed in one place. For general advice, see the > > Parameters section of > > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines > > One error on that page is {True, False}, that should be bool. > > > > str is correct, string is not. Reason: str is the type. > > > > Partial list: > > str > > bool > > list > > tuple > > sequence > > ndarray > > array_like > > > > or if you can be more precise: > > list of str > > sequence of ints > > > > and for keywords, add ", optional" > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 19:11:55 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 16:11:55 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> Message-ID: <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Done. On Sat, Oct 24, 2009 at 3:52 PM, Ralf Gommers wrote: > Sure, that would be useful. While you're at it, could you get rid of the > {True, False}? 
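For illustration, a docstring written against the conventions quoted above might look like the following; the function and its arguments are made up for the example and are not part of NumPy::

    import numpy as np

    def clip_counts(counts, threshold, weights=None, copy=True):
        """
        Clip a sequence of counts at a threshold.

        Parameters
        ----------
        counts : array_like
            Input counts.
        threshold : int
            Values greater than `threshold` are set to `threshold`.
        weights : sequence of floats, optional
            Per-element weights applied before clipping.
        copy : bool, optional
            If True (default), operate on a copy of `counts`.

        Returns
        -------
        clipped : ndarray
            The weighted, clipped counts.
        """
        arr = np.array(counts, dtype=float, copy=copy)
        if weights is not None:
            arr = arr * np.asarray(weights, dtype=float)
        arr[arr > threshold] = threshold
        return arr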
From ralf.gommers at googlemail.com Sat Oct 24 19:17:22 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 01:17:22 +0200 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Message-ID: On Sun, Oct 25, 2009 at 1:11 AM, Christopher Burns wrote: > Done. > > That section looks much better now. Except for the word "back-tics" :) Thanks, Ralf > On Sat, Oct 24, 2009 at 3:52 PM, Ralf Gommers > wrote: > > Sure, that would be useful. While you're at it, could you get rid of the > > {True, False}? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sat Oct 24 19:30:58 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sat, 24 Oct 2009 16:30:58 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> Message-ID: <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> Just committed a change to 'backticks'. ;) On Sat, Oct 24, 2009 at 4:17 PM, Ralf Gommers wrote: > That section looks much better now. Except for the word "back-tics" :) > > Thanks, > Ralf From d.l.goldsmith at gmail.com Sat Oct 24 22:19:39 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 24 Oct 2009 19:19:39 -0700 Subject: [Numpy-discussion] parameter types for documentation In-Reply-To: <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> References: <764e38540910241514r33dd10fnd8890c6f2cab9a29@mail.gmail.com> <764e38540910241542s270f2a69v357d837349e48d3f@mail.gmail.com> <764e38540910241611m23856959t5d293d8d8b11c826@mail.gmail.com> <764e38540910241630v65309535u1bcb746cc29f37fe@mail.gmail.com> Message-ID: <45d1ab480910241919h3024836cr5f6628c361141f20@mail.gmail.com> One other comment (sorry I'm late chiming in): in general, for something like "sequence of ints," usually what is really intended as viable input is "array-like of int-likes," and indeed, in the process of confirming this for various functions, I have found bugs where what was intended was in fact not supported. So, though it's more work, i.e., will take more time, the ideal scenario, IMO, when you're dealing w/ something like that, is to confirm that the function does indeed presently support the full gamut of viable inputs, note any strange behavior, post to the list if you're uncertain if it's a bug, or just file a bug ticket if you are sure. And in the past, when this has come up, I've been instructed to document the intended behavior, not the present buggy behavior (which just reinforces the need to file a bug report). DG On Sat, Oct 24, 2009 at 4:30 PM, Christopher Burns wrote: > Just committed a change to 'backticks'. > > ;) > > On Sat, Oct 24, 2009 at 4:17 PM, Ralf Gommers > wrote: > > That section looks much better now. 
Except for the word "back-tics" :) > > > > Thanks, > > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Sun Oct 25 04:38:37 2009 From: opossumnano at gmail.com (Tiziano Zito) Date: Sun, 25 Oct 2009 09:38:37 +0100 Subject: [Numpy-discussion] [ANN] Advanced Scientific Programming in Python Winter School in Warsaw, Poland Message-ID: Advanced Scientific Programming in Python a Winter School by the G-Node and University of Warsaw Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques with theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We'll use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. Clean language design and easy extensibility are driving Python to become a standard tool for scientific computing. Some of the most useful open source libraries for scientific computing and visualization will be presented. This winter school is targeted at Post-docs and PhD students from all areas. Substantial proficiency in Python or in another language (e.g. Java, C/C++, MATLAB, Mathematica) is absolutely required. An optional, one-day introduction to Python is offered to participants without prior experience with the language. Date and Location: February 8th ? 12th, 2010. Warsaw, Poland. Preliminary Program: - Day 0 (Mon Feb 8) ? [Optional] Dive into Python - Day 1 (Tue Feb 9) ? Software Carpentry ? Documenting code and using version control ? Test-driven development and unit testing ? Debugging, profiling and benchmarking techniques ? Object-oriented programming, design patterns, and agile programming - Day 2 (Wed Feb 10) ? Scientific Tools for Python ? NumPy, SciPy, Matplotlib ? Data serialization: from pickle to databases ? Programming project in the afternoon - Day 3 (Thu Feb 11) ? The Quest for Speed ? Writing parallel applications in Python ? When parallelization does not help: the starving CPUs problem ? Programming project in the afternoon - Day 4 (Fri Feb 12) ? Practical Software Development ? Software design ? Efficient programming in teams ? Quality Assurance ? Programming project final Applications: Applications should be sent before December 6th, 2009 to: python-winterschool at g-node.org No fee is charged but participants should take care of travel, living, and accommodation expenses. Applications should include full contact information (name, affiliation, email & phone), a *short* CV and a *short* statement addressing the following questions: ? What is your educational background? ? What experience do you have in programming? ? Why do you think ?Advanced Scientific Programming in Python? is an appropriate course for your skill profile? Candidates will be selected on the basis of their profile. 
Places are limited: early application is recommended. Notifications of acceptance will be sent by December 14th, 2009. Faculty - Francesc Alted, author of PyTables, Castelló de la Plana, Spain [Day 3] - Pietro Berkes, Volen Center for Complex Systems, Brandeis University, USA [Day 1] - Zbigniew Jędrzejewski-Szmek, Institute of Experimental Physics, University of Warsaw, Poland [Day 0] - Eilif Muller, Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne, Switzerland [Day 3] - Bartosz Teleńczuk, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany [Day 2] - Niko Wilbert, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany [Day 1] - Tiziano Zito, Bernstein Center for Computational Neuroscience, Berlin, Germany [Day 4] Organized by Piotr Durka, Joanna and Zbigniew Jędrzejewscy-Szmek (Institute of Experimental Physics, University of Warsaw), and Tiziano Zito (German Neuroinformatics Node of the INCF). Website: http://www.g-node.org/python-winterschool Contact: python-winterschool at g-node.org
From ralf.gommers at googlemail.com Sun Oct 25 18:21:00 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 25 Oct 2009 23:21:00 +0100 Subject: [Numpy-discussion] fftpack_lite question Message-ID: Hi all, Can anyone tell me if fftpack_lite is an exact C translation of the fftpack Fortran code? Or at least close enough that the signature, parameter descriptions and algorithm are the same? If so, I can use the fftpack Fortran sources (which have useful comments) to write docs for fftpack_lite funcs (rfft* and cfft*). Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL:
From charlesr.harris at gmail.com Sun Oct 25 18:51:24 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Oct 2009 16:51:24 -0600 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers wrote: > Hi all, > > Can anyone tell me if fftpack_lite is an exact C translation of the fftpack > Fortran code? Or at least close enough that the signature, parameter > descriptions and algorithm are the same? > > If so, I can use the fftpack Fortran sources (which have useful comments) > to write docs for fftpack_lite funcs (rfft* and cfft*). > > fft_pack is an interface to a c translation of fftpack. IIRC, it adds some stuff like zerofill and such so it isn't a 1-1 matchup. I think it is pretty close, though. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ralf.gommers at googlemail.com Sun Oct 25 19:04:29 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 00:04:29 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Sun, Oct 25, 2009 at 11:51 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers > wrote: > >> Hi all, >> >> Can anyone tell me if fftpack_lite is an exact C translation of the >> fftpack Fortran code? Or at least close enough that the signature, parameter >> descriptions and algorithm are the same? >> >> If so, I can use the fftpack Fortran sources (which have useful comments) >> to write docs for fftpack_lite funcs (rfft* and cfft*). >> >> > fft_pack is an interface to a c translation of fftpack. IIRC, it adds some > stuff like zerofill and such so it isn't a 1-1 matchup. I think it is pretty > close, though. > Okay, thanks.
I'll start with the Fortran docs then, and someone familiar with the differences could then easily throw in a few notes on that. Ralf Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cburns at berkeley.edu Sun Oct 25 20:42:04 2009 From: cburns at berkeley.edu (Christopher Burns) Date: Sun, 25 Oct 2009 17:42:04 -0700 Subject: [Numpy-discussion] Procedure for doing documentation reviews Message-ID: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> When a documents status is "Needs Review" and when reviewing it we feel it needs edits, should we add comments regarding the edits, or should we feel free to edit it directly? Chris From d.l.goldsmith at gmail.com Sun Oct 25 21:16:44 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 25 Oct 2009 18:16:44 -0700 Subject: [Numpy-discussion] Procedure for doing documentation reviews In-Reply-To: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> References: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> Message-ID: <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> Technically, after "Needs Review," it's supposed to go through "Needs Work (Reviewed)" The "by the book" way to do it would be to: 0 & 1) Provide comments in the Discussion section and change status to "Needs Work (Reviewed)" (in either order); 2) Edit, if inclined. 3) Answer your own comments as a record of what you've done; 4) At the end of all this, if you feel it's ready for review again, change status to "Needs Review (Revised)." Thanks! DG On Sun, Oct 25, 2009 at 5:42 PM, Christopher Burns wrote: > When a documents status is "Needs Review" and when reviewing it we > feel it needs edits, should we add comments regarding the edits, or > should we feel free to edit it directly? > > Chris > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpederse at gmail.com Sun Oct 25 22:16:06 2009 From: bpederse at gmail.com (Brent Pedersen) Date: Sun, 25 Oct 2009 19:16:06 -0700 Subject: [Numpy-discussion] documenting optional out parameter Message-ID: hi, i've seen this section: http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument should _all_ functions with an optional out parameter have exactly that text? so if i find a docstring with reasonable, but different doc for out, should it be changed to that? and if a docstring of a function with an optional out that needs review does not have the out parameter documented should it be marked as 'Needs Work'? thanks, -brentp From scott.sinclair.za at gmail.com Mon Oct 26 01:51:40 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 26 Oct 2009 07:51:40 +0200 Subject: [Numpy-discussion] documenting optional out parameter In-Reply-To: References: Message-ID: <6a17e9ee0910252251m7e21c6adr9003fa2285e8c94d@mail.gmail.com> > 2009/10/26 Brent Pedersen : > hi, i've seen this section: > http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument > > should _all_ functions with an optional out parameter have exactly that text? > so if i find a docstring with reasonable, but different doc for out, > should it be changed > to that? 
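Independent of how the docstrings end up wording it, what an `out` argument does at the call site is easy to demonstrate; a small sketch, passing the output buffer positionally to a ufunc so it behaves the same on any NumPy version::

    import numpy as np

    a = np.arange(5, dtype=float)
    b = np.ones(5)
    out = np.empty(5)

    # The result is written into the preallocated array and that very
    # same object is returned, so no new array is allocated.
    res = np.add(a, b, out)
    print(res is out)   # True
    print(out)          # [ 1.  2.  3.  4.  5.]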
The Q&A doesn't seem to have reached a firm conclusion, so I'd suggest that any correct and reasonable documentation of the out parameter is fine. > and if a docstring of a function with an optional out that needs > review does not have > the out parameter documented should it be marked as 'Needs Work'? I'd say yes, since the docstring is incomplete in this case. Cheers, Scott From nwagner at iam.uni-stuttgart.de Mon Oct 26 04:04:16 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 26 Oct 2009 09:04:16 +0100 Subject: [Numpy-discussion] Multiplicity of an entry Message-ID: Hi all, how can I obtain the multiplicity of an entry in a list a = ['abc','def','abc','ghij'] The multiplicity of 'abc' is 2. 'def' is 1. 'ghij' is 1. Nils From ralf.gommers at googlemail.com Mon Oct 26 04:55:05 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 09:55:05 +0100 Subject: [Numpy-discussion] Procedure for doing documentation reviews In-Reply-To: <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> References: <764e38540910251742x7481a625m74faa4128dfcbae0@mail.gmail.com> <45d1ab480910251816i12fbe11cxd35a3457187ca330@mail.gmail.com> Message-ID: On Mon, Oct 26, 2009 at 2:16 AM, David Goldsmith wrote: > Technically, after "Needs Review," it's supposed to go through "Needs Work > (Reviewed)" The "by the book" way to do it would be to: > > 0 & 1) Provide comments in the Discussion section and change status to > "Needs Work (Reviewed)" (in either order); > > 2) Edit, if inclined. > > 3) Answer your own comments as a record of what you've done; > > 4) At the end of all this, if you feel it's ready for review again, change > status to "Needs Review (Revised)." > > Thanks! > > DG > > > On Sun, Oct 25, 2009 at 5:42 PM, Christopher Burns wrote: > >> When a documents status is "Needs Review" and when reviewing it we >> feel it needs edits, should we add comments regarding the edits, or >> should we feel free to edit it directly? >> >> If they are largish changes, then what David said. If they are minor changes though, just edit away (I do that all the time). For example, if you see some mistakes in type descriptions, just fix them. If you feel a whole section is unclear and needs a rewrite, follow the review procedure. In between, exercise your good judgment. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Mon Oct 26 08:15:51 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 26 Oct 2009 08:15:51 -0400 Subject: [Numpy-discussion] 500 internal server error from docs.scipy.org Message-ID: This link: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.var.html#scipy.stats.var gives 500 internal server error From aisaac at american.edu Mon Oct 26 08:25:22 2009 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 26 Oct 2009 08:25:22 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: References: Message-ID: <4AE59532.3000708@american.edu> On 10/26/2009 4:04 AM, Nils Wagner wrote: > how can I obtain the multiplicity of an entry in a list > a = ['abc','def','abc','ghij'] That's a Python question, not a NumPy question. So comp.lang.python would be a better forum. But here's a simplest solution:: a = ['abc','def','abc','ghij'] for item in set(a): print item, a.count(item) This is horribly inefficient of course. 
If you have a big list, if would be *much* better to use defaultdict: from collections import defaultdict myct = defaultdict(int) for item in a: myct[item] += 1 print myct.items() fwiw, Alan Isaac From pav+sp at iki.fi Mon Oct 26 08:29:11 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Mon, 26 Oct 2009 12:29:11 +0000 (UTC) Subject: [Numpy-discussion] 500 internal server error from docs.scipy.org References: Message-ID: Mon, 26 Oct 2009 08:15:51 -0400, Neal Becker wrote: > This link: > > http://docs.scipy.org/doc/scipy/reference/generated/ scipy.stats.var.html#scipy.stats.var > > gives 500 internal server error Now that's strange. It's a static page. -- Pauli Virtanen From ralf.gommers at googlemail.com Mon Oct 26 09:33:57 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 14:33:57 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: On Mon, Oct 26, 2009 at 12:04 AM, Ralf Gommers wrote: > > > On Sun, Oct 25, 2009 at 11:51 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Oct 25, 2009 at 4:21 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi all, >>> >>> Can anyone tell me if fftpack_lite is an exact C translation of the >>> fftpack Fortran code? Or at least close enough that the signature, parameter >>> descriptions and algorithm are the same? >>> >>> If so, I can use the fftpack Fortran sources (which have useful comments) >>> to write docs for fftpack_lite funcs (rfft* and cfft*). >>> >>> >> fft_pack is an interface to a c translation of fftpack. IIRC, it adds some >> stuff like zerofill and such so it isn't a1-1 matchup. I think it is pretty >> close, though. >> > > Okay, thanks. I'll start with the Fortran docs then, and someone familiar > with the differences could then easily throw in a few notes on that. > > There are docs now for all six exposed functions (cfft*, rfft*): http://docs.scipy.org/numpy/docs/numpy.fft.fftpack_lite/ If anyone with knowledge of the differences between the C and Fortran versions could add a few notes at the above link, that would be great. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebressert at cfa.harvard.edu Mon Oct 26 10:54:37 2009 From: ebressert at cfa.harvard.edu (Eli Bressert) Date: Mon, 26 Oct 2009 14:54:37 +0000 Subject: [Numpy-discussion] Astype and strings Message-ID: Hi Everyone, Is Numpy supposed to behave this like this when converting an array of numbers to an array of strings with astype? print(arange(20).astype(np.str)) ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1'] When I do the following it works fine, print(arange(20).astype('|S2')) ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12' '13' '14' '15' '16' '17' '18' '19'] I would have thought that astype would be more intelligent with strings rather than just resorting to the first character for each element. Is this a bug or or is it how astype works? Thanks, Eli From sturla at molden.no Mon Oct 26 12:24:56 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 26 Oct 2009 17:24:56 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: References: Message-ID: <4AE5CD58.7080902@molden.no> Ralf Gommers skrev: > > If anyone with knowledge of the differences between the C and Fortran > versions could add a few notes at the above link, that would be great. 
> The most notable difference (from a user perspective) is that the Fortran version has more transforms, such as discrete sine and cosine transforms. It also supports single and double precision. The older Fortran version is used in SciPy. FFTs from FFTW and MKL tend to be faster than FFTPACK, at least on Intel hardware. FFTPACK was originally written for running fast on vector machines like the Cray and NEC. FFTPACK-lite: http://projects.scipy.org/numpy/browser/trunk/scipy/basic/fftpack_lite?rev=1676 Older Fortran version: http://www.netlib.org/fftpack/ Fortran 90 version (no license): http://orion.math.iastate.edu/burkardt/f_src/fftpack/fftpack.html Another C version: http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=fftpack/fft.c S.M. From ralf.gommers at googlemail.com Mon Oct 26 13:43:48 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 26 Oct 2009 18:43:48 +0100 Subject: [Numpy-discussion] fftpack_lite question In-Reply-To: <4AE5CD58.7080902@molden.no> References: <4AE5CD58.7080902@molden.no> Message-ID: Hi Sturla, Thanks for the overview. On Mon, Oct 26, 2009 at 5:24 PM, Sturla Molden wrote: > Ralf Gommers skrev: > > > > If anyone with knowledge of the differences between the C and Fortran > > versions could add a few notes at the above link, that would be great. > > > The most notable difference (from a user perspective) is that the > Fortran version has more transforms, such as discrete sine and cosine > transforms. It also supports single and double precision. The older > Fortran version is used in SciPy. > I added this to the module docstring. The info that would still be useful is how the functions that are exposed in fftpack_lite are subtly different from the older Fortran functions. Charles mentioned zerofill for example. Those funcs are: cfftb cfftf cffti rfftb rfftf rffti Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Oct 26 14:12:49 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 26 Oct 2009 11:12:49 -0700 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE59532.3000708@american.edu> References: <4AE59532.3000708@american.edu> Message-ID: <4AE5E6A1.4030901@noaa.gov> Alan G Isaac wrote: > On 10/26/2009 4:04 AM, Nils Wagner wrote: >> how can I obtain the multiplicity of an entry in a list >> a = ['abc','def','abc','ghij'] > > That's a Python question, not a NumPy question. but we can make it a numpy question! In [15]: a = np.array(['abc','def','abc','ghij']) In [16]: a Out[16]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4') In [17]: for item in set(a): print item, (a == item).sum() abc 2 ghij 1 def 1 I'll leave pro=filing to the OP. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Mon Oct 26 14:26:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 26 Oct 2009 14:26:12 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE5E6A1.4030901@noaa.gov> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> Message-ID: <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> On Mon, Oct 26, 2009 at 2:12 PM, Christopher Barker wrote: > Alan G Isaac wrote: >> On 10/26/2009 4:04 AM, Nils Wagner wrote: >>> how can I obtain the multiplicity of an entry in a list >>> a = ['abc','def','abc','ghij'] >> >> That's a Python question, not a NumPy question. > > but we can make it a numpy question! > > In [15]: a = np.array(['abc','def','abc','ghij']) > > > In [16]: a > Out[16]: > array(['abc', 'def', 'abc', 'ghij'], > ? ? ? dtype='|S4') > > In [17]: for item in set(a): > ? ? print item, (a == item).sum() It's *very* slow, when there are a large number of items. numpy creates the full boolean array for each item. see also http://projects.scipy.org/scipy/ticket/905 Josef > > abc 2 > ghij 1 > def 1 > > I'll leave pro=filing to the OP. > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mdroe at stsci.edu Mon Oct 26 14:26:20 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Mon, 26 Oct 2009 14:26:20 -0400 Subject: [Numpy-discussion] C code coverage tool Message-ID: <4AE5E9CC.3070808@stsci.edu> I know David Cournapeau has done some work on using gcov for coverage with Numpy. Unaware of this, (doh! -- I should have Googled first), I wrote a small C code-coverage tool built on top of valgrind's callgrind tool, so it basically only works on x86/AMD64 unixy platforms, but unlike gcov it doesn't require any recompilation headaches (though compiling unoptimized helps). It's about 200 lines of Python code that parses callgrind's output and generates text and HTML. It has specialized support for the code generation used in Numpy -- each line is marked not only by *if* it ran, but in which version of the function it ran. I've put an example of its output from the numpy unit tests up here (temporary address): http://www.droettboom.com/c_coverage/ A particularly interesting file is this one: http://www.droettboom.com/c_coverage/numpy_core_src_multiarray_arraytypes.c.src.html Is this something we want to add to the SVN tree, maybe under tools? Usage instructions are below. Mike =============== C coverage tool =============== This is a tool to generate C code-coverage reports using valgrind's callgrind tool. 
Prerequisites ------------- * `Valgrind `_ (3.5.0 tested, earlier versions may work) * `Pygments `_ (0.11 or later required) C code-coverage --------------- Generating C code coverage reports requires two steps: * Collecting coverage results (from valgrind) * Generating a report from one or more sets of results For most cases, it is good enough to do:: > c_coverage_collect.sh python -c "import numpy; numpy.test()" > c_coverage_report.py callgrind.out.pid which will run all of the Numpy unit tests, create a directory called `coverage` and write the coverage report there. In a more advanced scenario, you may wish to run individual unit tests (since running under valgrind slows things down) and combine multiple results files together in a single report. Collecting results `````````````````` To collect coverage results, you merely run the python interpreter under valgrind's callgrind tool. The `c_coverage_collect.sh` helper script will pass all of the required arguments to valgrind. For example, in typical usage, you may want to run all of the Numpy unit tests:: > c_coverage_collect.sh python -c "import numpy; numpy.test()" This will output a file ``callgrind.out.pid`` containing the results of the run, where ``pid`` is the process id of the run. Generating a report ``````````````````` To generate a report, you pass the ``callgrind.out.pid`` output file to the `c_coverage_report.py` script:: > c_coverage_report.py callgrind.out.pid To combine multiple results files together, simply list them on the commandline or use wildcards:: > c_coverage_report.py callgrind.out.* Options ''''''' * ``--directory``: Specify a different output directory * ``--pattern``: Specify a regex pattern to match for source files. The default is `numpy`, so it will only include source files whose path contains the string `numpy`. If, for instance, you wanted to include all source files covered (that are available on your system), pass ``--pattern=.``. * ``--format``: Specify the output format(s) to generate. May be either ``text`` or ``html``. If ``--format`` is not provided, both formats will be output. Reading a report ---------------- The C code coverage report is a flat directory of files, containing text and/or html files. The files are named based on their path in the original source tree with slashes converted to underscores. Text reports ```````````` The text reports add a prefix to each line of source code: - '>' indicates the line of code was run - '!' indicates the line of code was not run HTML reports ```````````` The HTML report highlights the code that was run in green. The HTML report has special support for the "generated" functions in Numpy. Each run line of code also contains a number in square brackets indicating the number of different generated functions the line was run in. Hovering the mouse over the line will display a list of the versions of the function in which the line was run. These numbers can be used to see if a particular line was run in all versions of the function. Caveats ------- The coverage results occasionally misses lines that clearly must have been run. This usually can be traced back to the compiler optimizer removing lines because they are tautologically impossible or to combine lines together. Compiling Numpy without optimizations helps, but not completely. Even despite this flaw, this tool is still helpful in identifying large missed blocks or functions. 
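At its core the report generator is a parser of callgrind's output: any source line that shows up with a nonzero cost was executed. A much-simplified sketch of that idea -- it assumes an uncompressed callgrind file with absolute line numbers and, unlike the real tool, ignores name compression, cfl=/cfn= call records and the code-generation bookkeeping::

    from collections import defaultdict

    def executed_lines(callgrind_path, pattern="numpy"):
        """Map each matching source file to the set of source line
        numbers that carry a nonzero cost, i.e. that were executed."""
        covered = defaultdict(set)
        current_file = None
        with open(callgrind_path) as fh:
            for line in fh:
                if line.startswith(("fl=", "fi=", "fe=")):
                    # A new source-file record; cost lines that follow
                    # belong to this file.
                    current_file = line.split("=", 1)[1].strip()
                elif current_file and pattern in current_file:
                    fields = line.split()
                    if (len(fields) >= 2 and fields[0].isdigit()
                            and fields[1].isdigit() and int(fields[1]) > 0):
                        covered[current_file].add(int(fields[0]))
        return covered

    # e.g. lines_hit = executed_lines("callgrind.out.12345")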
-- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From nadavh at visionsense.com Mon Oct 26 22:27:13 2009 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 27 Oct 2009 04:27:13 +0200 Subject: [Numpy-discussion] Multiplicity of an entry References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> Message-ID: <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> In principle you could use: np.equal(a,a).sum(0) but, for unknown reason, np.equal operates only on "normal" arrays. maybe you can transform the array to arrays of numbers, for example by hash. Nadav -----????? ??????----- ???: numpy-discussion-bounces at scipy.org ??? josef.pktd at gmail.com ????: ? 26-???????-09 20:26 ??: Discussion of Numerical Python ????: Re: [Numpy-discussion] Multiplicity of an entry On Mon, Oct 26, 2009 at 2:12 PM, Christopher Barker wrote: > Alan G Isaac wrote: >> On 10/26/2009 4:04 AM, Nils Wagner wrote: >>> how can I obtain the multiplicity of an entry in a list >>> a = ['abc','def','abc','ghij'] >> >> That's a Python question, not a NumPy question. > > but we can make it a numpy question! > > In [15]: a = np.array(['abc','def','abc','ghij']) > > > In [16]: a > Out[16]: > array(['abc', 'def', 'abc', 'ghij'], > ? ? ? dtype='|S4') > > In [17]: for item in set(a): > ? ? print item, (a == item).sum() It's *very* slow, when there are a large number of items. numpy creates the full boolean array for each item. see also http://projects.scipy.org/scipy/ticket/905 Josef > > abc 2 > ghij 1 > def 1 > > I'll leave pro=filing to the OP. > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3776 bytes Desc: not available URL: From pav+sp at iki.fi Tue Oct 27 05:11:29 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Tue, 27 Oct 2009 09:11:29 +0000 (UTC) Subject: [Numpy-discussion] C code coverage tool References: <4AE5E9CC.3070808@stsci.edu> Message-ID: Mon, 26 Oct 2009 14:26:20 -0400, Michael Droettboom wrote: > I know David Cournapeau has done some work on using gcov for coverage > with Numpy. > > Unaware of this, (doh! -- I should have Googled first), I wrote a small > C code-coverage tool built on top of valgrind's callgrind tool, so it > basically only works on x86/AMD64 unixy platforms, but unlike gcov it > doesn't require any recompilation headaches (though compiling > unoptimized helps). [clip] Where's the code? [clip] > Is this something we want to add to the SVN tree, maybe under tools? Yes. Also, maybe you want to send it to the Valgrind guys, too. If they don't yet have a code coverage functionality yet, it could be nice to have. 
Pauli From mdroe at stsci.edu Tue Oct 27 07:33:54 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 07:33:54 -0400 Subject: [Numpy-discussion] C code coverage tool In-Reply-To: References: <4AE5E9CC.3070808@stsci.edu> Message-ID: <4AE6DAA2.3060804@stsci.edu> On 10/27/2009 05:11 AM, Pauli Virtanen wrote: > Mon, 26 Oct 2009 14:26:20 -0400, Michael Droettboom wrote: > >> I know David Cournapeau has done some work on using gcov for coverage >> with Numpy. >> >> Unaware of this, (doh! -- I should have Googled first), I wrote a small >> C code-coverage tool built on top of valgrind's callgrind tool, so it >> basically only works on x86/AMD64 unixy platforms, but unlike gcov it >> doesn't require any recompilation headaches (though compiling >> unoptimized helps). >> > [clip] > > Where's the code? > It's in the Numpy SVN tree now, under tools/c_coverage > [clip] > >> Is this something we want to add to the SVN tree, maybe under tools? >> > Yes. Also, maybe you want to send it to the Valgrind guys, too. If they > don't yet have a code coverage functionality yet, it could be nice to > have. > There has been a coverage-only valgrind tool in the works for almost two years (vcov). That's a lot more work that what I've done here (by reusing callgrind for the purpose), but it will apparently be more performant. Personally, I couldn't get it to compile (I think it's out of sync with the rest of valgrind atm). I didn't want to wait for that, and I don't know enough about valgrind internals to effectively contribute, so I just wrote a callgrind parser (really not that hard). I found another project [1] that takes a similar approach, but it's written in C++ and looked too difficult to adapt to handle Numpy's code generation. [1] http://github.com/icefox/callgrind_tools Mike From gokhansever at gmail.com Tue Oct 27 07:56:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 27 Oct 2009 06:56:33 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays Message-ID: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Hello, Consider this sample two columns of data: 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 999999.9999 1693.9069 999999.9999 1676.1059 999999.9999 1621.5875 651.8040 1542.1373 691.0138 1650.4214 678.5558 1710.7311 621.5777 999999.9999 644.8341 999999.9999 696.2080 999999.9999 Putting into this data into a file say "sample.data" and loading with: a,b = np.loadtxt('sample.data', dtype="float").T I[16]: a O[16]: array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06, 6.51804000e+02, 6.91013800e+02, 6.78555800e+02, 6.21577700e+02, 6.44834100e+02, 6.96208000e+02]) I[17]: b O[17]: array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311, 999999.9999, 999999.9999, 999999.9999]) ### interestingly, the second column is loaded as it is but a values reformed a little. Why this could be happening? Any idea? 
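(My current guess is that the stored values are fine and only the repr of a has switched to scientific notation, because the spread between its smallest entries (around 620) and the 999999.9999 fill values is much wider than in b. Quick checks like

I[18]: a[0] == 999999.9999
I[19]: a[6] == 651.8040

should both come back True if loadtxt kept full double precision.)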
Anyways, back to masked arrays: I[24]: am = ma.masked_values(a, value=999999.9999) I[25]: am O[25]: masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208], mask = [ True True True True True True False False False False False False], fill_value = 999999.9999) I[30]: bm = ma.masked_values(b, value=999999.9999) I[31]: am O[31]: masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208], mask = [ True True True True True True False False False False False False], fill_value = 999999.9999) So far so good. A few basic checks: I[33]: am/bm O[33]: masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712 0.39664667346 -- -- --], mask = [ True True True True True True False False False True True True], fill_value = 999999.9999) I[34]: mean(am/bm) O[34]: 0.41266624676580849 Unfortunately, matplotlib.mlab's prctile cannot handle this division: I[54]: prctile(am/bm, p=[5,25,50,75,95]) O[54]: array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06, 1.00000000e+06, 1.00000000e+06]) This also results with wrong looking box-and-whisker plots. Testing further with scipy.stats functions yields expected correct results: I[55]: stats.scoreatpercentile(am/bm, per=5) O[55]: 0.40877012449846228 I[49]: stats.scoreatpercentile(am/bm, per=25) O[49]: masked_array(data = --, mask = True, fill_value = 1e+20) I[56]: stats.scoreatpercentile(am/bm, per=95) O[56]: masked_array(data = --, mask = True, fill_value = 1e+20) Any confirmation? -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.raspaud at smhi.se Tue Oct 27 08:43:54 2009 From: martin.raspaud at smhi.se (Raspaud Martin) Date: Tue, 27 Oct 2009 13:43:54 +0100 Subject: [Numpy-discussion] C-API: How is data filling done in PyArray_SimpleNewFromData ? Message-ID: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, I?m using numpy v1.2.0, and I have the following codes that provide different results : - --------------------- cal = (PyArrayObject *)PyArray_SimpleNew(2,dims,NPY_FLOAT); for(i=0;i -------------- next part -------------- A non-text attachment was scrubbed... Name: martin_raspaud.vcf Type: text/x-vcard Size: 260 bytes Desc: martin_raspaud.vcf URL: From josef.pktd at gmail.com Tue Oct 27 09:25:21 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 27 Oct 2009 09:25:21 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Message-ID: <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> On Tue, Oct 27, 2009 at 7:56 AM, G?khan Sever wrote: > Hello, > > Consider this sample two columns of data: > > ?999999.9999 999999.9999 > ?999999.9999 999999.9999 > ?999999.9999 999999.9999 > ?999999.9999?? 1693.9069 > ?999999.9999?? 1676.1059 > ?999999.9999?? 1621.5875 > ??? 651.8040?????? 1542.1373 > ??? 691.0138?????? 1650.4214 > ??? 678.5558?????? 1710.7311 > ??? 621.5777??? 999999.9999 > ??? 644.8341??? 999999.9999 > ??? 696.2080??? 999999.9999 > > Putting into this data into a file say "sample.data" and loading with: > > a,b = np.loadtxt('sample.data', dtype="float").T > > I[16]: a > O[16]: > array([? 1.00000000e+06,?? 1.00000000e+06,?? 1.00000000e+06, > ???????? 1.00000000e+06,?? 1.00000000e+06,?? 1.00000000e+06, > ???????? 6.51804000e+02,?? 6.91013800e+02,?? 
6.78555800e+02, > ???????? 6.21577700e+02,?? 6.44834100e+02,?? 6.96208000e+02]) > > I[17]: b > O[17]: > array([ 999999.9999,? 999999.9999,? 999999.9999,??? 1693.9069, > ????????? 1676.1059,??? 1621.5875,??? 1542.1373,??? 1650.4214, > ????????? 1710.7311,? 999999.9999,? 999999.9999,? 999999.9999]) > > ### interestingly, the second column is loaded as it is but a values > reformed a little. Why this could be happening? Any idea? Anyways, back to > masked arrays: > > I[24]: am = ma.masked_values(a, value=999999.9999) > > I[25]: am > O[25]: > masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 > 644.8341 696.208], > ???????????? mask = [ True? True? True? True? True? True False False False > False False False], > ?????? fill_value = 999999.9999) > > > I[30]: bm = ma.masked_values(b, value=999999.9999) > > I[31]: am > O[31]: > masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 > 644.8341 696.208], > ???????????? mask = [ True? True? True? True? True? True False False False > False False False], > ?????? fill_value = 999999.9999) > > > So far so good. A few basic checks: > > I[33]: am/bm > O[33]: > masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712 > 0.39664667346 -- -- --], > ???????????? mask = [ True? True? True? True? True? True False False False > True? True? True], > ?????? fill_value = 999999.9999) > > > I[34]: mean(am/bm) > O[34]: 0.41266624676580849 > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: > > I[54]: prctile(am/bm, p=[5,25,50,75,95]) > O[54]: > array([? 3.96646673e-01,?? 6.21577700e+02,?? 1.00000000e+06, > ???????? 1.00000000e+06,?? 1.00000000e+06]) > > > This also results with wrong looking box-and-whisker plots. > > > Testing further with scipy.stats functions yields expected correct results: This should not be the correct results if you use scipy.stats.scoreatpercentile, it doesn't have correct missing value handling, it treats nans or mask/fill values as regular numbers sorted to the end. stats.mstats.scoreatpercentile is the corresponding function for masked arrays. (BTW I wasn't able to quickly copy and past your example because MaskedArrays don't seem to have a constructive __repr__, i.e. no commas) I don't know anything about the matplotlib story. Josef > > I[55]: stats.scoreatpercentile(am/bm, per=5) > O[55]: 0.40877012449846228 > > I[49]: stats.scoreatpercentile(am/bm, per=25) > O[49]: > masked_array(data = --, > ???????????? mask = True, > ?????? fill_value = 1e+20) > > I[56]: stats.scoreatpercentile(am/bm, per=95) > O[56]: > masked_array(data = --, > ???????????? mask = True, > ?????? fill_value = 1e+20) > > > Any confirmation? > > > > > > > > -- > G?khan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From Chris.Barker at noaa.gov Tue Oct 27 12:09:53 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 27 Oct 2009 09:09:53 -0700 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> Message-ID: <4AE71B51.8040002@noaa.gov> Nadav Horesh wrote: > np.equal(a,a).sum(0) > > but, for unknown reason, np.equal operates only on "normal" arrays. 
true: In [25]: a Out[25]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4') In [27]: np.equal(a,a) Out[27]: NotImplemented however: In [28]: a == a Out[28]: array([ True, True, True, True], dtype=bool) don't they use the same code? or is "==" reverting to plain old generic python sequence comparison, which would partly explain why it is so slow. > maybe you can transform the array to arrays of numbers, for example by hash. or even easier: In [32]: a2 = a.view(dtype=np.int32) In [33]: a2 Out[33]: array([1633837824, 1684366848, 1633837824, 1734895978]) In [34]: np.equal(a2, a2[0]) Out[34]: array([ True, False, True, False], dtype=bool) though that only works if your strings are a handy length like 4 bytes... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Tue Oct 27 13:23:49 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 27 Oct 2009 13:23:49 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> Message-ID: <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: > > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: Actually, the division's OK, it's mlab.prctile which is borked. It uses the length of the input array instead of its count to compute the nb of valid data. The easiest workaround in your case is probably to use: >>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) HIH P. From mdroe at stsci.edu Tue Oct 27 15:31:39 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 15:31:39 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE71B51.8040002@noaa.gov> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> Message-ID: <4AE74A9B.4060603@stsci.edu> Christopher Barker wrote: > Nadav Horesh wrote: > >> np.equal(a,a).sum(0) >> >> but, for unknown reason, np.equal operates only on "normal" arrays. >> > > true: > > In [25]: a > Out[25]: > array(['abc', 'def', 'abc', 'ghij'], > dtype='|S4') > > In [27]: np.equal(a,a) > Out[27]: NotImplemented > > however: > > In [28]: a == a > Out[28]: array([ True, True, True, True], dtype=bool) > > don't they use the same code? or is "==" reverting to plain old generic > python sequence comparison, which would partly explain why it is so slow. > It looks as if "a == a" (that is array_richcompare) is triggering special case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me (I'm dealing with a number of other string-related things, so I might as well take this too). Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From oliphant at enthought.com Tue Oct 27 15:54:53 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 14:54:53 -0500 Subject: [Numpy-discussion] C-API: How is data filling done in PyArray_SimpleNewFromData ? 
In-Reply-To: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> References: <783F32138ED65D4A9CF016980481B6BF01CB5785@CORRE.ad.smhi.se> Message-ID: On Oct 27, 2009, at 7:43 AM, Raspaud Martin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > I?m using numpy v1.2.0, and I have the following codes that provide > different results : > > - --------------------- > cal = (PyArrayObject *)PyArray_SimpleNew(2,dims,NPY_FLOAT); > for(i=0;i for(j=0;j { > *((npy_float *)PyArray_GETPTR2(cal,i,j))=(npy_float)in[i][j]; > } > - --------------------- > and > - --------------------- > cal = (PyArrayObject *)PyArray_SimpleNewFromData(2,dims,NPY_FLOAT,in); > - --------------------- > > As you probably guessed, "in" is a 2D array of floats of dimensions > "dims". > > My questions are thus: > - - Why do the two methods provide different results ? > - - How do I get the second to behave like the first ? > In the second case, "in" should be a pointer to a place in memory with space for dims[0]*dims[1] floats. In particular, it should not be a 2-d array of floats. FromData expects to get a single pointer to float (not a 2D array). I can't think of a way to get the second case to work other than have "in" be a 1-D array. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Oct 27 15:59:04 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 14:59:04 -0500 Subject: [Numpy-discussion] Astype and strings In-Reply-To: References: Message-ID: <8DC4A103-F099-4841-BC1B-BCE4EE2AB1B6@enthought.com> On Oct 26, 2009, at 9:54 AM, Eli Bressert wrote: > Hi Everyone, > > Is Numpy supposed to behave this like this when converting an array of > numbers to an array of strings with astype? In general you have to tell NumPy how big the string should be (i.e. np.str is generic). There are a few places where NumPy will look at the data you have in order to guess a size, but as you've seen astype is not one of those places. I think astype could be fixed (by putting a special-case check in the current code for conversion to an unspecified-length string), but that has not been implemented. Please file a feature enhancement issue on the Trac so we don't lose sight of this. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Tue Oct 27 16:04:08 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 27 Oct 2009 15:04:08 -0500 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: <4AE74A9B.4060603@stsci.edu> References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> <4AE74A9B.4060603@stsci.edu> Message-ID: On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote: > Christopher Barker wrote: >> Nadav Horesh wrote: >> >>> np.equal(a,a).sum(0) >>> >>> but, for unknown reason, np.equal operates only on "normal" arrays. >>> >> >> true: >> >> In [25]: a >> Out[25]: >> array(['abc', 'def', 'abc', 'ghij'], >> dtype='|S4') >> >> In [27]: np.equal(a,a) >> Out[27]: NotImplemented >> >> however: >> >> In [28]: a == a >> Out[28]: array([ True, True, True, True], dtype=bool) >> >> don't they use the same code? or is "==" reverting to plain old >> generic >> python sequence comparison, which would partly explain why it is so >> slow. 
>> > It looks as if "a == a" (that is array_richcompare) is triggering > special case code for strings, so it is fast. However, IMHO np.equal > should be made to work as well. Can you file a bug and assign it to > me > (I'm dealing with a number of other string-related things, so I > might as > well take this too). The array_richcompare special-cased strings not for speed but for actual functionality. Making np.equal work with strings requires changes to the ufunc code itself which was never written to work with "variable-length" data- types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them to make sure all the issues are being understood. Thanks, -Travis -- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com oliphant at enthought.com From mdroe at stsci.edu Tue Oct 27 17:07:33 2009 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 27 Oct 2009 17:07:33 -0400 Subject: [Numpy-discussion] Multiplicity of an entry In-Reply-To: References: <4AE59532.3000708@american.edu> <4AE5E6A1.4030901@noaa.gov> <1cd32cbb0910261126v1d524972ue8856fc143b40ee1@mail.gmail.com> <710F2847B0018641891D9A21602763605AD1E5@ex3.envision.co.il> <4AE71B51.8040002@noaa.gov> <4AE74A9B.4060603@stsci.edu> Message-ID: <4AE76115.7010709@stsci.edu> Travis Oliphant wrote: > On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote: > > >> Christopher Barker wrote: >> >>> Nadav Horesh wrote: >>> >>> >>>> np.equal(a,a).sum(0) >>>> >>>> but, for unknown reason, np.equal operates only on "normal" arrays. >>>> >>>> >>> true: >>> >>> In [25]: a >>> Out[25]: >>> array(['abc', 'def', 'abc', 'ghij'], >>> dtype='|S4') >>> >>> In [27]: np.equal(a,a) >>> Out[27]: NotImplemented >>> >>> however: >>> >>> In [28]: a == a >>> Out[28]: array([ True, True, True, True], dtype=bool) >>> >>> don't they use the same code? or is "==" reverting to plain old >>> generic >>> python sequence comparison, which would partly explain why it is so >>> slow. >>> >>> >> It looks as if "a == a" (that is array_richcompare) is triggering >> special case code for strings, so it is fast. However, IMHO np.equal >> should be made to work as well. Can you file a bug and assign it to >> me >> (I'm dealing with a number of other string-related things, so I >> might as >> well take this too). >> > > The array_richcompare special-cased strings not for speed but for > actual functionality. > > Making np.equal work with strings requires changes to the ufunc code > itself which was never written to work with "variable-length" data- > types (like strings, unicode, and records). There are several > things that would have to be fixed. Some of the changes we made to > allow for date-time data-types also made it possible to support > variable-length strings, but this is non-trivial to implement. It's > certainly possible, but I would want to look at any changes you make > before committing them to make sure all the issues are being understood. > Yeah -- I'm realizing this is a bigger project than I initially suspected. I'll keep you posted if I find the time to do this right. 
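For the original multiplicity question, in the meantime, a workaround that avoids np.equal (and the one-boolean-array-per-item loop) entirely is to sort once and count the runs -- just a sketch:

import numpy as np

a = np.array(['abc', 'def', 'abc', 'ghij'])
sa = np.sort(a)
# True wherever a new value starts in the sorted array:
flag = np.concatenate(([True], sa[1:] != sa[:-1]))
uniq = sa[flag]
counts = np.diff(np.concatenate((np.flatnonzero(flag), [len(sa)])))
print zip(uniq, counts)     # [('abc', 2), ('def', 1), ('ghij', 1)]

That is one O(n log n) sort instead of one full scan of the array per distinct item.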
Mike -- Michael Droettboom Science Software Branch Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA From sturla at molden.no Tue Oct 27 17:46:08 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 27 Oct 2009 22:46:08 +0100 Subject: [Numpy-discussion] Syntax highlighting for Cython and NumPy Message-ID: <4AE76A20.2060405@molden.no> Here is an XML for Cython syntax highlighting in katepart (e.g. KATE and KDevelop). I made this because KATE is my faviourite text edior (feel free to call me a heretic for not using emacs). Unfortunately, the Python highlighting for KDE contains several bugs. And the Pyrex/Cython version that circulates on the web builds on this and introduces a couple more. I have tried to clean it up. Note that this will also highlight numpy.* or np.*, if * is a type or function you get from "cimport numpy" or "import numpy". This works on Windows as well, if you have installed KDE for Windows. Just copy the XML to: ~/.kde/share/apps/katepart/syntax/ C:\kde\share\apps\katepart\syntax (or whereever you have KDE installed) and "Cython with NumPy" shows up under Sources. Anyway, this is the syntax high-lighter I use to write Cython. Feel free to use it as you wish. P.S. I am also cleaning up Python high-lighting for KDE. Not done yet, but I will post a "Python with NumPy" highlighter later on if this is interesting. P.P.S. This also covers Pyrex, but add in some Cython stuff. Sturla Molden -------------- next part -------------- A non-text attachment was scrubbed... Name: cython.xml Type: text/xml Size: 34481 bytes Desc: not available URL: From sturla at molden.no Tue Oct 27 18:31:55 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 27 Oct 2009 23:31:55 +0100 Subject: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy In-Reply-To: <4AE76A20.2060405@molden.no> References: <4AE76A20.2060405@molden.no> Message-ID: <4AE774DB.4020209@molden.no> Sturla Molden skrev: > > and "Cython with NumPy" shows up under Sources. Anyway, this is the > syntax high-lighter I use to write Cython. It seems I posted the wrong file. :-( S.M. -------------- next part -------------- A non-text attachment was scrubbed... Name: cython.xml Type: text/xml Size: 34521 bytes Desc: not available URL: From sturla at molden.no Tue Oct 27 19:25:36 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 28 Oct 2009 00:25:36 +0100 Subject: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy In-Reply-To: References: <4AE76A20.2060405@molden.no> <4AE774DB.4020209@molden.no> Message-ID: <4AE78170.4040601@molden.no> Lisandro Dalcin skrev: > Is there any specific naming convention for these XML files to work > with KATE? Would it be fine to call it 'cython-mode-kate.xml' to push > it to the repo? Will it still work (I mean, with that name) when > placed appropriately in KATE config dirs or whatever? ... Just > concerned that 'cython.xml' is a bit too generic filename... > > You can name it anything you want. The file has an entry like this: Hi, Is there something wrong with scipy.special.hermite? The following code produces glibc errors: ------------8<----------------------- import scipy.special h = [] for i in xrange(15): print i h.append(scipy.special.hermite(i+1)) ------------8<----------------------- results in ... 
12 *** glibc detected *** python: free(): invalid next size (fast): 0x00000000007e2290 *** OS: OpenSUSE 11.1 (x86_64) Python 2.6.0 Scipy: 0.7.0 When using ipython 0.8.4 on the same machine, the error does not occur. What may be the problem here? Regards Ole From gokhansever at gmail.com Wed Oct 28 09:47:08 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 28 Oct 2009 08:47:08 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <1cd32cbb0910270625p72fdd11fj8f18db42aa9bb566@mail.gmail.com> Message-ID: <49d6b3500910280647n5351e248se85b432eb4a660e2@mail.gmail.com> On Tue, Oct 27, 2009 at 8:25 AM, wrote: > This should not be the correct results if you use > scipy.stats.scoreatpercentile, > it doesn't have correct missing value handling, it treats nans or > mask/fill values as regular numbers sorted to the end. > > stats.mstats.scoreatpercentile is the corresponding function for > masked arrays. > > Thanks for the suggestion. I forgot the existence of such module. It yields better results. I[14]: st.mstats.scoreatpercentile(r, per=25) O[14]: masked_array(data = 0.401055201111, mask = False, fill_value = 1e+20) I[17]: st.scoreatpercentile(r, per=25) O[17]: masked_array(data = --, mask = True, fill_value = 1e+20) I usually fall into traps using masked arrays. Hopefully I will figure out these before I make funnier mistakes in my analysis. Besides, it would be nice to have the "per" argument accepts a sequence instead of a one item. Like matplotlib's prctile. Using it as: ...(array, per=[5,25,50,75,95]) in a one call. > (BTW I wasn't able to quickly copy and past your example because > MaskedArrays don't seem to have a constructive __repr__, i.e. > no commas) > > You can copy and paste the sample data from this link. When I copied from a txt file into gmail into somehow distorted the original look of the data. http://code.google.com/p/ccnworks/source/browse/trunk/sample.data > I don't know anything about the matplotlib story. > > Josef > > > > > I[55]: stats.scoreatpercentile(am/bm, per=5) > > O[55]: 0.40877012449846228 > > > > I[49]: stats.scoreatpercentile(am/bm, per=25) > > O[49]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > I[56]: stats.scoreatpercentile(am/bm, per=95) > > O[56]: > > masked_array(data = --, > > mask = True, > > fill_value = 1e+20) > > > > > > Any confirmation? > > > > > > > > > > > > > > > > -- > > G?khan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gokhansever at gmail.com Wed Oct 28 09:52:32 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 28 Oct 2009 08:52:32 -0500 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> Message-ID: <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> On Tue, Oct 27, 2009 at 12:23 PM, Pierre GM wrote: > > On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: > > > > > > Unfortunately, matplotlib.mlab's prctile cannot handle this division: > > Actually, the division's OK, it's mlab.prctile which is borked. It > uses the length of the input array instead of its count to compute the > nb of valid data. The easiest workaround in your case is probably to > use: > >>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) > HIH > P. > Great. Exact solution. I should have asked this last week :) One simple method solves all the riddle. I had manually masked the MVCs using NaN's. My guess is using compressed() masked arrays could be used with any of regularly defined numpy and scipy functions, right? Thanks for the tip. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Oct 28 10:03:23 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 28 Oct 2009 10:03:23 -0400 Subject: [Numpy-discussion] Using matplotlib's prctile on masked arrays In-Reply-To: <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> References: <49d6b3500910270456o278fbdedgea2c8148459802a5@mail.gmail.com> <08B7694F-30B0-4303-83B5-3687BD2A3777@gmail.com> <49d6b3500910280652i701e2988yf4e5b2ab83d5f575@mail.gmail.com> Message-ID: <1cd32cbb0910280703p1805ef1tc28eb4ccb65b4ef1@mail.gmail.com> On Wed, Oct 28, 2009 at 9:52 AM, G?khan Sever wrote: > > > On Tue, Oct 27, 2009 at 12:23 PM, Pierre GM wrote: >> >> On Oct 27, 2009, at 7:56 AM, G?khan Sever wrote: >> > >> > >> > Unfortunately, matplotlib.mlab's prctile cannot handle this division: >> >> Actually, the division's OK, it's mlab.prctile which is borked. It >> uses the length of the input array instead of its count to compute the >> nb of valid data. The easiest workaround in your case is probably to >> use: >> ?>>> prctile((am/bm).compressed(), p=[5,25,50,75,95]) >> HIH >> P. > > Great. Exact solution. I should have asked this last week :) > > One simple method solves all the riddle. I had manually masked the MVCs > using NaN's. > > My guess is using compressed() masked arrays could be used with any of > regularly defined numpy and scipy functions, right? Yes, however it only works for 1d or with ravel(). You cannot compress a 2d array, and preserve a rectangular shape (with unequal numbers of missing numbers.) I some cases removing rows or columns with missing values might be more appropriate, or finding a "neutral" fill value. Josef > > Thanks for the tip. 
> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > G?khan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pschmidtke at mmb.pcb.ub.es Wed Oct 28 15:31:43 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Wed, 28 Oct 2009 20:31:43 +0100 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile Message-ID: Dear Numpy Mailing List Readers, I have a quite simple problem, for what I did not find a solution for now. I have a gzipped file lying around that has some numbers stored in it and I want to read them into a numpy array as fast as possible but only a bunch of data at a time. So I would like to use numpys fromfile funtion. For now I have somehow the following code : f=gzip.open( "myfile.gz", "r" ) xyz=npy.fromfile(f,dtype="float32",count=400) So I would read 400 entries from the file, keep it open, process my data, come back and read the next 400 entries. If I do this, numpy is complaining that the file handle f is not a normal file handle : OError: first argument must be an open file but in fact it is a zlib file handle. But gzip gives access to the normal filehandle through f.fileobj. So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) But there I get just meaningless values (not the actual data) and when I specify the sep=" " argument for npy.fromfile I get just .1 and nothing else. Can you tell me why and how to fix this problem? I know that I could read everything to memory, but these files are rather big, so I simply have to avoid this. Thanks in advance. -- Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From robert.kern at gmail.com Wed Oct 28 15:33:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 28 Oct 2009 14:33:11 -0500 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: References: Message-ID: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke wrote: > Dear Numpy Mailing List Readers, > > I have a quite simple problem, for what I did not find a solution for now. > I have a gzipped file lying around that has some numbers stored in it and I > want to read them into a numpy array as fast as possible but only a bunch > of data at a time. > So I would like to use numpys fromfile funtion. > > For now I have somehow the following code : > > > > ? ? ? ?f=gzip.open( "myfile.gz", "r" ) > xyz=npy.fromfile(f,dtype="float32",count=400) > > > So I would read 400 entries from the file, keep it open, process my data, > come back and read the next 400 entries. If I do this, numpy is complaining > that the file handle f is not a normal file handle : > OError: first argument must be an open file > > but in fact it is a zlib file handle. But gzip gives access to the normal > filehandle through f.fileobj. np.fromfile() requires a true file object, not just a file-like object. np.fromfile() works by grabbing the FILE* pointer underneath and using C system calls to read the data, not by calling the .read() method. 
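The practical workaround is to pull the bytes out yourself through the gzip handle's .read() method and convert them with np.fromstring() -- a rough sketch using the chunk size and dtype from your example:

import gzip
import numpy as np

f = gzip.open("myfile.gz", "rb")
count = 400
while True:
    buf = f.read(count * 4)                      # 4 bytes per float32
    if not buf:
        break
    xyz = np.fromstring(buf, dtype=np.float32)   # last chunk may be shorter
    # ... process xyz ...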
> So I tried ?xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) > > But there I get just meaningless values (not the actual data) and when I > specify the sep=" " argument for npy.fromfile I get just .1 and nothing > else. This is reading the compressed data, not the data that you want. > Can you tell me why and how to fix this problem? I know that I could read > everything to memory, but these files are rather big, so I simply have to > avoid this. Read in reasonably-sized chunks of bytes at a time, and use np.fromstring() to create arrays from them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Wed Oct 28 16:26:41 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 28 Oct 2009 13:26:41 -0700 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> References: <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com> Message-ID: <4AE8A901.3060403@noaa.gov> Robert Kern wrote: >> f=gzip.open( "myfile.gz", "r" ) >> xyz=npy.fromfile(f,dtype="float32",count=400) > Read in reasonably-sized chunks of bytes at a time, and use > np.fromstring() to create arrays from them. Something like: count = 400 xyz = np.fromstring(f.read(count*4), dtype=np.float32) should work (untested...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sccolbert at gmail.com Wed Oct 28 18:05:05 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Wed, 28 Oct 2009 23:05:05 +0100 Subject: [Numpy-discussion] Segfault when using scipy.special.hermite? In-Reply-To: References: Message-ID: <7f014ea60910281505t686f244cx4a452b4978977b9a@mail.gmail.com> that code works fine for me: ubuntu 9.04 x64 python 2.6.2 scipy 0.7.1 numpy 1.3.0 ipython 0.9.1 On Wed, Oct 28, 2009 at 2:21 PM, Ole Streicher wrote: > Hi, > > Is there something wrong with scipy.special.hermite? The following code > produces glibc errors: > > ------------8<----------------------- > import scipy.special > h = [] > for i in xrange(15): > ? ?print i > ? ?h.append(scipy.special.hermite(i+1)) > ------------8<----------------------- > > results in > ... > 12 > *** glibc detected *** python: free(): invalid next size (fast): 0x00000000007e2290 *** > > OS: OpenSUSE 11.1 (x86_64) > Python 2.6.0 > Scipy: 0.7.0 > > When using ipython 0.8.4 on the same machine, the error does not occur. > > What may be the problem here? > > Regards > > Ole > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pav at iki.fi Wed Oct 28 18:31:02 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 29 Oct 2009 00:31:02 +0200 Subject: [Numpy-discussion] Segfault when using scipy.special.hermite? In-Reply-To: References: Message-ID: <1256769062.7650.0.camel@idol> ke, 2009-10-28 kello 14:21 +0100, Ole Streicher kirjoitti: > Is there something wrong with scipy.special.hermite? 
The following > code produces glibc errors: It's probably this issue: http://projects.scipy.org/numpy/ticket/1211 The most likely cause is that the linear algebra libraries (ATLAS/BLAS/LAPACK) shipped with that version of 64-bit Opensuse are somehow broken. At least on Mandriva it turned out that the problem did not appear if ATLAS was not installed, and it also went away with a newer version of LAPACK. (special.hermite is pure-python code. The only part that can cause problems is scipy.linalg.eig or numpy.linalg.eig, and, much less likely, scipy.special.gamma. The former are thin wrappers around LAPACK routines.) -- Pauli Virtanen From dyamins at gmail.com Thu Oct 29 00:29:38 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 29 Oct 2009 00:29:38 -0400 Subject: [Numpy-discussion] Numpy/Scipy for EC2 Message-ID: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Hi all: I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing Numpy/Scipy computations on the Amazon EC2 cloud. I'm writing to ask if anyone has any advice for which (if any) publicly available AMI I should start with. If any one has any specific AMI's that they think are good bases from which to modify -- or really, any other advice about using numpy/scipy on EC2 -- I'd love to know. Beyond that, even if you don't know which AMI to recommend (or even what an AMI is), I still would like advice about which Linux flavor to use. I've had some experience with Mac OSX (and, with David Cornapeau's help over this list, I was able to build 64-bit Scipy with Python 2.6!), but I really know nothing about what the build process is like on Linux (and most likely, unless someone recommends a good AMI with optimized BLAS/LAPACK already built, I'm going to have to built it from scratch). So, should I use Ubuntu or Debian or Fedora or Centos or ...? Thanks! Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From deldotdr at gmail.com Thu Oct 29 02:24:00 2009 From: deldotdr at gmail.com (Dorian Raymer) Date: Wed, 28 Oct 2009 23:24:00 -0700 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Message-ID: Hi Dan, I have recently created an AMI for running python processes. I recommend using the ubuntu server ami's provided by http://alestic.com/. Alestic is a well known provider of public AMI images. I think this is exactly the place you want to start from; anything you need is an apt-get or easy_install away. >From the moment you launch an instance, you are literally minutes away from being able to run a computation with Python. I also recommend the FireFox plugin called ElasticFox for interfacing the AWS api. It is a lot easier than the command line api tools! I left some rough notes on my AMI creation/setup process here: http://wiki.github.com/codenode/codenode/backend-demonstration-ec2-image The notes include the ami-id of my resulting image, which you should be able to launch if you wish. If you are interested, I can dive into more detail on how I set up the os/python environment, etc. The image I created is used as the Codenode live public notebook backend: http://live.codenode.org/ You can create an account, login, start a Notebook, import Numpy and run any code you want right now! 
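If the goal is specifically a numpy/scipy box, the package setup on one of the stock Ubuntu AMIs can be as short as:

sudo apt-get update
sudo apt-get install python-numpy python-scipy python-matplotlib ipython

(Those are the standard Ubuntu package names; for an optimized ATLAS BLAS/LAPACK you would add the atlas packages as well -- the exact names vary between releases, so check "apt-cache search atlas" -- or easy_install newer numpy/scipy on top.)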
Hope this is useful, Dorian I cross-posted this to codenode-devel, sympy, and sage-notebook; I think this topic could be of interest to others on those lists. On Wed, Oct 28, 2009 at 9:29 PM, Dan Yamins wrote: > Hi all: > > I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing > Numpy/Scipy computations on the Amazon EC2 cloud. > > I'm writing to ask if anyone has any advice for which (if any) publicly > available AMI I should start with. > > If any one has any specific AMI's that they think are good bases from which > to modify -- or really, any other advice about using numpy/scipy on EC2 -- > I'd love to know. > > Beyond that, even if you don't know which AMI to recommend (or even what an > AMI is), I still would like advice about which Linux flavor to use. I've > had some experience with Mac OSX (and, with David Cornapeau's help over this > list, I was able to build 64-bit Scipy with Python 2.6!), but I really know > nothing about what the build process is like on Linux (and most likely, > unless someone recommends a good AMI with optimized BLAS/LAPACK already > built, I'm going to have to built it from scratch). So, should I use > Ubuntu or Debian or Fedora or Centos or ...? > > Thanks! > Dan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu Oct 29 03:17:10 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 29 Oct 2009 16:17:10 +0900 Subject: [Numpy-discussion] [RFC] new function for floating point comparison Message-ID: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> Hi, I have added a couple of utilities for floating point comparison, to be used in unit tests mostly, and would like some comments, especially from people knowledgeable about floating point. http://github.com/cournape/numpy/tree/new_ulp_comp The main difference compared to other functions is that they are 'amplitude-independent', and use IEEE-754-specific properties. The tolerance is based on ULP, and two numbers x, y are closed depending on how many numbers are representable between x and y at the given precision. The branch contains the following new functions: * spacing(x): equivalent to the F90 intrinsic. Returns the smallest representable number needed so that spacing(x) + x > x. Spacing(1) is EPS by definition. * assert_array_almost_equal_nulp(x, y, nulp=1): assertion is defined as abs(x - y) <= nulps * spacing(max(abs(x), abs(y))). * assert_array_max_ulp(a, b, maxulp=1, dtype=None): given two numbers a and b, raise an assertion if there are more than maxulp representable numbers between a and b. They only support single and double precision - for complex number, one could arbitrarily define a distance between numbers based on nulps, say max of number of representable number for real and imag parts. Extended precision would be a bit more painful, because of the variety of implementations. I hope that they can give more robust/meaningful comparison for most of our unit tests, cheers, David From nabble2 at lonely-star.org Thu Oct 29 07:19:05 2009 From: nabble2 at lonely-star.org (TheLonelyStar) Date: Thu, 29 Oct 2009 04:19:05 -0700 (PDT) Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. 
Message-ID: <26111151.post@talk.nabble.com> Hi, I am trying to load a tsv file using numpy.loadtxt: data = np.loadtxt('data.txt',delimiter='\t',dtype=np.float) And I get: ----------------- /usr/lib/python2.6/site-packages/numpy/lib/io.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack) 503 X = X.view(dtype) 504 else: --> 505 X = np.array(X, dtype) 506 507 X = np.squeeze(X) ValueError: setting an array element with a sequence. > /usr/lib/python2.6/site-packages/numpy/lib/io.py(505)loadtxt() 504 else: --> 505 X = np.array(X, dtype) 506 ---------------- I am on archlinux using 1.3.0. The file contians integers and floats sperated by tabs. Ideas? Thanks! Nathan -- View this message in context: http://www.nabble.com/numpy-loadtxt---ValueError%3A-setting-an-array-element-with-a-sequence.-tp26111151p26111151.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From pschmidtke at mmb.pcb.ub.es Thu Oct 29 07:38:11 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 12:38:11 +0100 Subject: [Numpy-discussion] reading gzip compressed files using numpy.fromfile In-Reply-To: References: Message-ID: <8efd38d31962398588d5c0e87d46e162@mmb.pcb.ub.es> > Date: Wed, 28 Oct 2009 20:31:43 +0100 > From: Peter Schmidtke > Subject: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: numpy-discussion at scipy.org > Message-ID: > Content-Type: text/plain; charset="UTF-8" > > Dear Numpy Mailing List Readers, > > I have a quite simple problem, for what I did not find a solution for now. > I have a gzipped file lying around that has some numbers stored in it and I > want to read them into a numpy array as fast as possible but only a bunch > of data at a time. > So I would like to use numpys fromfile funtion. > > For now I have somehow the following code : > > > > f=gzip.open( "myfile.gz", "r" ) > xyz=npy.fromfile(f,dtype="float32",count=400) > > > So I would read 400 entries from the file, keep it open, process my data, > come back and read the next 400 entries. If I do this, numpy is complaining > that the file handle f is not a normal file handle : > OError: first argument must be an open file > > but in fact it is a zlib file handle. But gzip gives access to the normal > filehandle through f.fileobj. > > So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) > > But there I get just meaningless values (not the actual data) and when I > specify the sep=" " argument for npy.fromfile I get just .1 and nothing > else. > > Can you tell me why and how to fix this problem? I know that I could read > everything to memory, but these files are rather big, so I simply have to > avoid this. > > Thanks in advance. > > > -- > > Peter Schmidtke > > ---------------------- > PhD Student at the Molecular Modeling and Bioinformatics Group > Dep. Physical Chemistry > Faculty of Pharmacy > University of Barcelona > > > > ------------------------------ > > Message: 2 > Date: Wed, 28 Oct 2009 14:33:11 -0500 > From: Robert Kern > Subject: Re: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: Discussion of Numerical Python > Message-ID: > <3d375d730910281233r5cadd0fcubea14676a3a978f1 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke > wrote: >> Dear Numpy Mailing List Readers, >> >> I have a quite simple problem, for what I did not find a solution for >> now. 
>> I have a gzipped file lying around that has some numbers stored in it and >> I >> want to read them into a numpy array as fast as possible but only a bunch >> of data at a time. >> So I would like to use numpys fromfile funtion. >> >> For now I have somehow the following code : >> >> >> >> ? ? ? ?f=gzip.open( "myfile.gz", "r" ) >> xyz=npy.fromfile(f,dtype="float32",count=400) >> >> >> So I would read 400 entries from the file, keep it open, process my data, >> come back and read the next 400 entries. If I do this, numpy is >> complaining >> that the file handle f is not a normal file handle : >> OError: first argument must be an open file >> >> but in fact it is a zlib file handle. But gzip gives access to the normal >> filehandle through f.fileobj. > > np.fromfile() requires a true file object, not just a file-like > object. np.fromfile() works by grabbing the FILE* pointer underneath > and using C system calls to read the data, not by calling the .read() > method. > >> So I tried ?xyz=npy.fromfile(f.fileobj,dtype="float32",count=400) >> >> But there I get just meaningless values (not the actual data) and when I >> specify the sep=" " argument for npy.fromfile I get just .1 and nothing >> else. > > This is reading the compressed data, not the data that you want. > >> Can you tell me why and how to fix this problem? I know that I could read >> everything to memory, but these files are rather big, so I simply have to >> avoid this. > > Read in reasonably-sized chunks of bytes at a time, and use > np.fromstring() to create arrays from them. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > > > ------------------------------ > > Message: 3 > Date: Wed, 28 Oct 2009 13:26:41 -0700 > From: Christopher Barker > Subject: Re: [Numpy-discussion] reading gzip compressed files using > numpy.fromfile > To: Discussion of Numerical Python > Message-ID: <4AE8A901.3060403 at noaa.gov> > Content-Type: text/plain; charset=UTF-8; format=flowed > > Robert Kern wrote: >>> f=gzip.open( "myfile.gz", "r" ) >>> xyz=npy.fromfile(f,dtype="float32",count=400) > >> Read in reasonably-sized chunks of bytes at a time, and use >> np.fromstring() to create arrays from them. > > Something like: > > count = 400 > xyz = np.fromstring(f.read(count*4), dtype=np.float32) > > should work (untested...) > > -Chris > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > Thanks Robert and Chris...indeed I managed to read it quite fast this way. ++ Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From pschmidtke at mmb.pcb.ub.es Thu Oct 29 07:48:00 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 12:48:00 +0100 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. Message-ID: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> Have you tried the numpy.fromfile function? This usually worked great for my files that had the same format than yours. ++ Peter ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. 
Physical Chemistry Faculty of Pharmacy University of Barcelona From robince at gmail.com Thu Oct 29 07:44:11 2009 From: robince at gmail.com (Robin) Date: Thu, 29 Oct 2009 11:44:11 +0000 Subject: [Numpy-discussion] recommended way to run numpy on snow leopard In-Reply-To: References: <2d5132a50910210246i36866369k433c844eccaead40@mail.gmail.com> <4ADED429.4030809@ar.media.kyoto-u.ac.jp> <2d5132a50910210358i144486aaic3fd5849b7399146@mail.gmail.com> Message-ID: <2d5132a50910290444o3f598652s35601244f4b16ef3@mail.gmail.com> On Fri, Oct 23, 2009 at 9:09 AM, David Warde-Farley wrote: > The Python.org sources for 2.6.x has a script in the Mac/ subdirectory > (I think, or in the build tools) for building a 4-way universal binary > (i386, x86_64, ppc and ppc64). You can rather easily build it (just > run the script) and it will produce executables of the form python (or > python2.6) suffixed with -32 or -64 to run in one mode or the other. > So, python-32 (or python2.6-32) will get you 32 bit Python, which will > work with wxPython using wxMac, or python-64, which will not (but will > do everything in 64-bit mode). I've successfully gotten svn numpy to > build 4-way using such a 4-way Python. > After having some trouble I decided to try this way to build universal 32/64 bit intel framework build and just use that as my main python for my work. (Had some problems with macports and virtualenv, I want to leave the system one alone and theres no 64 bit python.org build). Just in case any one else tries this - there is a problem where it's impossible to select the 32 bit architecture: http://bugs.python.org/issue6834 It might be possible to work around or use the alternative pythonw.c in the ticket - but it won't be fixed in a release until 2.7. Cheers Robin -------------- next part -------------- An HTML attachment was scrubbed... URL: From nabble2 at lonely-star.org Thu Oct 29 08:30:09 2009 From: nabble2 at lonely-star.org (TheLonelyStar) Date: Thu, 29 Oct 2009 05:30:09 -0700 (PDT) Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26111151.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> Message-ID: <26112100.post@talk.nabble.com> Adter trying the same thing in matlab, I realized that my "tsv" file is not matrix-style. But this I mean, not all lines ave the same lenght (not the same number of values). What would be the best way to load this? Regards, Nathan -- View this message in context: http://www.nabble.com/numpy-loadtxt---ValueError%3A-setting-an-array-element-with-a-sequence.-tp26111151p26112100.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From pschmidtke at mmb.pcb.ub.es Thu Oct 29 09:19:50 2009 From: pschmidtke at mmb.pcb.ub.es (Peter Schmidtke) Date: Thu, 29 Oct 2009 14:19:50 +0100 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: On Thu, 29 Oct 2009 05:30:09 -0700 (PDT), TheLonelyStar wrote: > Adter trying the same thing in matlab, I realized that my "tsv" file is not > matrix-style. But this I mean, not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? > > Regards, > Nathan Use the numpy fromfile function : For instance I read the file : 5 8 5 5.5 6.1 3 5.5 2 6.5 with : x=npy.fromfile("test.txt",sep="\t") and it returns an array x : array([ 5. , 8. 
, 5. , 5.5, 6.1, 3. , 5.5, 2. , 6.5]) You can reshape this array to a 3x3 matrix using the reshape function -> x.reshape((3,3)) -- Peter Schmidtke ---------------------- PhD Student at the Molecular Modeling and Bioinformatics Group Dep. Physical Chemistry Faculty of Pharmacy University of Barcelona From bsouthey at gmail.com Thu Oct 29 09:22:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 29 Oct 2009 08:22:34 -0500 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: <4AE9971A.2090809@gmail.com> On 10/29/2009 07:30 AM, TheLonelyStar wrote: > Adter trying the same thing in matlab, I realized that my "tsv" file is not > matrix-style. But this I mean, not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? > > Regards, > Nathan > Hi, Really you have to find the reason why there are extra values in some rows compared to other rows. There have been some recent changes in numpy.genfromtxt that I would strong suggest using. It will indicate any problem rows that you can fix or just ignore. Regards Bruce From dagss at student.matnat.uio.no Thu Oct 29 09:26:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 29 Oct 2009 14:26:44 +0100 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication Message-ID: <4AE99814.7000306@student.matnat.uio.no> I'm getting (to me( very mysterious NaNs when doing matrix multiplication with certain (randomly generated) data: In [52]: a.shape, b.shape, i, j Out[52]: ((22, 1000), (1000, 22), 0, 16) In [53]: np.dot(a, b)[i,j] Out[53]: (31.322778824758661+nan*j) In [54]: np.dot(a[i,:], b[:,j]) Out[54]: (31.322778824758657+6.5017268607881213j) In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) Out[55]: (False, False) In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) Out[63]: 4.0744710639852633 dtype is complex128. Is this a bug? Should I start looking in NumPy, ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* I realize that matmul doesn't have to happen via naive vector dot products, but certainly one shouldn't hit Inf anywhere anyway? Dag Sverre From cournape at gmail.com Thu Oct 29 09:31:26 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 29 Oct 2009 22:31:26 +0900 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication In-Reply-To: <4AE99814.7000306@student.matnat.uio.no> References: <4AE99814.7000306@student.matnat.uio.no> Message-ID: <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> On Thu, Oct 29, 2009 at 10:26 PM, Dag Sverre Seljebotn wrote: > I'm getting (to me( very mysterious NaNs when doing matrix > multiplication with certain (randomly generated) data: > > In [52]: a.shape, b.shape, i, j > Out[52]: ((22, 1000), (1000, 22), 0, 16) > > In [53]: np.dot(a, b)[i,j] > Out[53]: (31.322778824758661+nan*j) > > In [54]: np.dot(a[i,:], b[:,j]) > Out[54]: (31.322778824758657+6.5017268607881213j) > > In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) > Out[55]: (False, False) > > In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) > Out[63]: 4.0744710639852633 > > dtype is complex128. Is this a bug? Should I start looking in NumPy, > ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* Most likely an atlas bug. Which version of atlas are you using, on which cpu ? 
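Given the suspicion that the BLAS is at fault, one quick way to narrow it down is to repeat, for every non-finite entry, the same per-element check Dag did for (0, 16). A rough diagnostic sketch along those lines (the helper name and the use of np.isfinite are illustrative, not from the thread):

import numpy as np

def recheck_nonfinite(a, b):
    # Recompute any non-finite entries of the BLAS result with a plain
    # (slow) inner product, to separate bad input data from a bad dot().
    c = np.dot(a, b)
    bad = zip(*(~np.isfinite(c)).nonzero())
    return [(i, j, c[i, j], (a[i, :] * b[:, j]).sum()) for i, j in bad]

a = np.random.randn(22, 1000) + 1j * np.random.randn(22, 1000)
b = np.random.randn(1000, 22) + 1j * np.random.randn(1000, 22)
suspects = recheck_nonfinite(a, b)   # normally empty; entries here point at the dot() path

If the slow sums come out finite where np.dot() produces nan, the input data are fine and the problem really is in the matrix-multiply path (ATLAS or how it was built).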
David From dagss at student.matnat.uio.no Thu Oct 29 09:39:44 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 29 Oct 2009 14:39:44 +0100 Subject: [Numpy-discussion] Unexplained nans in matrix multiplication In-Reply-To: <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> References: <4AE99814.7000306@student.matnat.uio.no> <5b8d13220910290631o541fa446ja2cf5fcadf7b753b@mail.gmail.com> Message-ID: <4AE99B20.2090401@student.matnat.uio.no> David Cournapeau wrote: > On Thu, Oct 29, 2009 at 10:26 PM, Dag Sverre Seljebotn > wrote: > >> I'm getting (to me( very mysterious NaNs when doing matrix >> multiplication with certain (randomly generated) data: >> >> In [52]: a.shape, b.shape, i, j >> Out[52]: ((22, 1000), (1000, 22), 0, 16) >> >> In [53]: np.dot(a, b)[i,j] >> Out[53]: (31.322778824758661+nan*j) >> >> In [54]: np.dot(a[i,:], b[:,j]) >> Out[54]: (31.322778824758657+6.5017268607881213j) >> >> In [55]: np.any(np.isnan(a)), np.any(np.isnan(b)) >> Out[55]: (False, False) >> >> In [63]: np.max(np.abs(np.vstack((a.real, a.imag, b.real.T, b.imag.T)))) >> Out[63]: 4.0744710639852633 >> >> dtype is complex128. Is this a bug? Should I start looking in NumPy, >> ATLAS (Sage-compiled), the C compiler, the Fortran compiler...*shrug* >> > > Most likely an atlas bug. Which version of atlas are you using, on which cpu ? > Thanks. Sage reports atlas-3.8.3.p7. Intel(R) Xeon(TM) CPU 3.20GHz, 64-bit RedHat Linux. Dag Sverre From robert.kern at gmail.com Thu Oct 29 11:41:23 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 29 Oct 2009 10:41:23 -0500 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> Message-ID: <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> On Wed, Oct 28, 2009 at 23:29, Dan Yamins wrote: > Hi all: > > I'm gearing up to build an Amazon Machine Instance (AMI) for use in doing > Numpy/Scipy computations on the Amazon EC2 cloud. > > I'm writing to ask if anyone has any advice for which (if any) publicly > available AMI I should start with. I haven't used it, but this seems to provide a good environment for your needs. http://web.mit.edu/stardev/cluster/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dyamins at gmail.com Thu Oct 29 11:45:10 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 29 Oct 2009 11:45:10 -0400 Subject: [Numpy-discussion] Numpy/Scipy for EC2 In-Reply-To: <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> References: <15e4667e0910282129t4ad7f5eble1de56c91e6cff25@mail.gmail.com> <3d375d730910290841u78343cd2v4715cd40001bb09a@mail.gmail.com> Message-ID: <15e4667e0910290845te2f9217gf03efa635ff82bd4@mail.gmail.com> I haven't used it, but this seems to provide a good environment for your > needs. > > http://web.mit.edu/stardev/cluster/ > > Robert Kern to the rescue again! StarCluster looks great. .... And thanks Dorian as well, I'm also checking out Alestic. Dan > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Oct 29 11:51:59 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 29 Oct 2009 10:51:59 -0500 Subject: [Numpy-discussion] [RFC] new function for floating point comparison In-Reply-To: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> References: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> On Thu, Oct 29, 2009 at 02:17, David Cournapeau wrote: > Hi, > > ? ?I have added a couple of utilities for floating point comparison, to > be used in unit tests mostly, and would like some comments, especially > from people knowledgeable about floating point. > > http://github.com/cournape/numpy/tree/new_ulp_comp > > The main difference compared to other functions is that they are > 'amplitude-independent', and use IEEE-754-specific properties. The > tolerance is based on ULP, and two numbers x, y are closed depending on > how many numbers are representable between x and y at the given > precision. The branch contains the following new functions: > > ? ?* spacing(x): equivalent to the F90 intrinsic. Returns the smallest > representable number needed so that spacing(x) + x > x. Spacing(1) is > EPS by definition. > ? ?* assert_array_almost_equal_nulp(x, y, nulp=1): assertion is defined > as abs(x - y) <= nulps * spacing(max(abs(x), abs(y))). > ? ?* assert_array_max_ulp(a, b, maxulp=1, dtype=None): given two > numbers a and b, raise an assertion if there are more than maxulp > representable numbers between a and b. That sounds good. Another worthwhile addition would be nextafter(). http://www.opengroup.org/onlinepubs/000095399/functions/nextafter.html With a little bit of care, a nextafter ufunc can be used to generate a dense grid of floating point values around a given center. This can be used to explore the error characteristics of a function at a very fine level of detail that is otherwise unavailable. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Thu Oct 29 12:29:20 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 29 Oct 2009 09:29:20 -0700 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> References: <42b97315c0694c06f08eb9a59479a7bc@mmb.pcb.ub.es> Message-ID: <4AE9C2E0.50402@noaa.gov> Peter Schmidtke wrote: > Have you tried the numpy.fromfile function? good point -- fromfile() can be much faster for the simple cases it can handle. > not all lines ave the same lenght (not the > same number of values). > > What would be the best way to load this? That depends on what the data mean. Is it a 2-d array with missing values? If so, how do you know which are missing? Are there the same number of tabs in each row? If do than loadtxt should be able to handle it. You may be best off looping through the file: for line in file: a = numpy.fromstring(line, sep='\t', dtype=np.float) and do what makes sense with each line. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Thu Oct 29 14:31:24 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 29 Oct 2009 14:31:24 -0400 Subject: [Numpy-discussion] numpy loadtxt - ValueError: setting an array element with a sequence. In-Reply-To: <26112100.post@talk.nabble.com> References: <26111151.post@talk.nabble.com> <26112100.post@talk.nabble.com> Message-ID: On Oct 29, 2009, at 8:30 AM, TheLonelyStar wrote: > > Adter trying the same thing in matlab, I realized that my "tsv" file > is not > matrix-style. But this I mean, not all lines ave the same lenght > (not the > same number of values). > > What would be the best way to load this? The SVN version of np.genfromtxt will let you know where some rows are longer than others. You can decide what to do from then (ignore the corresponding rows or modify your file). The .fromfile approach is a solution if you don't really care about getting a 2D array (or structured 1D array with different fields for ints and floats on a same row), as a previous poster illustrated. From arokem at berkeley.edu Thu Oct 29 15:18:51 2009 From: arokem at berkeley.edu (Ariel Rokem) Date: Thu, 29 Oct 2009 12:18:51 -0700 Subject: [Numpy-discussion] datetime64 Message-ID: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> Hi - I want to start trying out the new dtype for representation of arrays of times, datetime64, which is implemented in the current svn. Is there any documentation anywhere? I know of this proposal: http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst but apparently the current implementation of the dtype didn't follow this proposal - the hypothetical examples in the spec don't work with the implementation. I just want to see a couple of examples on how to initialize arrays of this dtype, and what kinds of operations can be done with them (and with timedelta64). Thanks a lot, Ariel -- Ariel Rokem Helen Wills Neuroscience Institute University of California, Berkeley http://argentum.ucbso.berkeley.edu/ariel From as8ca at mail.astro.virginia.edu Thu Oct 29 16:22:31 2009 From: as8ca at mail.astro.virginia.edu (Alok Singhal) Date: Thu, 29 Oct 2009 16:22:31 -0400 Subject: [Numpy-discussion] datetime64 In-Reply-To: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> References: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> Message-ID: <20091029202231.GA22121@virginia.edu> Hi, On 29/10/09: 12:18, Ariel Rokem wrote: > I want to start trying out the new dtype for representation of arrays > of times, datetime64, which is implemented in the current svn. Is > there any documentation anywhere? I know of this proposal: > > http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst > > but apparently the current implementation of the dtype didn't follow > this proposal - the hypothetical examples in the spec don't work with > the implementation. > I just want to see a couple of examples on how to initialize arrays of > this dtype, and what kinds of operations can be done with them (and > with timedelta64). I think the only thing that works as of now for dates and deltas is using datetime.datetime and datetime.timedelta objects in the initilization of the arrays. See http://projects.scipy.org/numpy/ticket/1225 for some tests. 
Even when you construct the arrays using datetime.datetime objects, things are a bit strange: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '1.4.0.dev7599' In [3]: import datetime In [4]: d = datetime.datetime(2009, 10, 5, 12, 35, 2) In [5]: d1 = datetime.datetime.now() In [6]: np.array([d, d1], 'M') Out[6]: array([2009-10-04 23:27:37.359744, 2009-10-29 00:10:59.677844], dtype=datetime64[ns]) -Alok -- * * Alok Singhal * * * http://www.astro.virginia.edu/~as8ca/ * * From pgmdevlist at gmail.com Thu Oct 29 16:43:11 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 29 Oct 2009 16:43:11 -0400 Subject: [Numpy-discussion] datetime64 In-Reply-To: <20091029202231.GA22121@virginia.edu> References: <43958ee60910291218i322d94b0y2fe6e362b876ef8b@mail.gmail.com> <20091029202231.GA22121@virginia.edu> Message-ID: <41F46D98-37B4-4E88-BBEC-FCE2CE3C2E6A@gmail.com> On Oct 29, 2009, at 4:22 PM, Alok Singhal wrote: > Hi, > > On 29/10/09: 12:18, Ariel Rokem wrote: >> I want to start trying out the new dtype for representation of arrays >> of times, datetime64, which is implemented in the current svn. Is >> there any documentation anywhere? I know of this proposal: >> >> http://numpy.scipy.org/svn/numpy/tags/1.3.0/doc/neps/datetime-proposal3.rst >> >> but apparently the current implementation of the dtype didn't follow >> this proposal - the hypothetical examples in the spec don't work with >> the implementation. >> I just want to see a couple of examples on how to initialize arrays >> of >> this dtype, and what kinds of operations can be done with them (and >> with timedelta64). > > I think the only thing that works as of now for dates and deltas is > using datetime.datetime and datetime.timedelta objects in the > initilization of the arrays. See > http://projects.scipy.org/numpy/ticket/1225 for some tests. Oh yes, I saw that... Marty Fuhry, one of our GSoC students, had written some pretty extensive series of tests to allocate datetime/ strings to elements of a ndarray with datetime64 dtype. He also had written some functions allowing conversion from one frequency to another. Unfortunately, I don't think his work has been incorporated yet. Maybe Jarrod M. and Travis O. will shed some light on that matter. I for one would be quite interested into checking what's happening on that front. In other more personal news: Ariel, I gonna be quite busy for the next couple of weeks, but we should chat off-list about our parallel efforts with time series (I still haven't found the time to delve into nipy, could you point me tothe most relevant part). From pav+sp at iki.fi Fri Oct 30 05:36:06 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 09:36:06 +0000 (UTC) Subject: [Numpy-discussion] [RFC] complex functions in npymath Message-ID: Hi (esp. David), If there are no objections, I'd like to move Numpy's complex-valued C99-like functions to npymath: http://github.com/pv/numpy-work/tree/npymath-complex This'll come useful if we want to start eg. writing Ufuncs in Cython. I'm working around possible compiler-incompatibilities of struct return values by having only pointer versions of the functions in libnpymath.a, and the non-pointer versions as inlined static functions. Also, perhaps we should add a header file npy_math_c99compat.h that would detect if the compiler supports C99, and if not, substitute the C99 functions with our npy_math implementations. This'd be great for scipy.special. 
-- Pauli Virtanen From david at ar.media.kyoto-u.ac.jp Fri Oct 30 05:34:07 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 18:34:07 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: Message-ID: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Hi Pauli, Pauli Virtanen wrote: > Hi (esp. David), > > If there are no objections, I'd like to move Numpy's complex-valued > C99-like functions to npymath: > > http://github.com/pv/numpy-work/tree/npymath-complex > > This'll come useful if we want to start eg. writing Ufuncs in Cython. > Actually, I am in the process of cleaning my numpy branches for review, and intend to push them into svn as fast as possible. Complex is pretty high on the list. The missing piece in complex support in npymath is mostly tests: I have tests for all the special cases (all special cases specified in C99 standard are tested), but no test for the actual 'normal' values. If you (or someone else) could add a couple of tests, that would be great. > I'm working around possible compiler-incompatibilities of struct > return values by having only pointer versions of the functions in > libnpymath.a, and the non-pointer versions as inlined static > functions. > Is this a problem if we guarantee that our complex type is bit compatible with C99 complex (e.g. casting a complex to a double[2] should alway work) ? That's how the complex math is implemented ATM. > Also, perhaps we should add a header file > > npy_math_c99compat.h > > that would detect if the compiler supports C99, and if not, > substitute the C99 functions with our npy_math implementations. > This'd be great for scipy.special. > I am not sure I understand this: currently, if a given complex function is detected on the platform, npy_foo is just an alias to foo, so we use the platform implementation whenever possible. cheers, David From pav+sp at iki.fi Fri Oct 30 06:07:31 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 10:07:31 +0000 (UTC) Subject: [Numpy-discussion] [RFC] complex functions in npymath References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: Fri, 30 Oct 2009 18:34:07 +0900, David Cournapeau wrote: [clip] > Actually, I am in the process of cleaning my numpy branches for review, > and intend to push them into svn as fast as possible. Complex is pretty > high on the list. Great! > The missing piece in complex support in npymath is mostly tests: I have > tests for all the special cases (all special cases specified in C99 > standard are tested), but no test for the actual 'normal' values. If you > (or someone else) could add a couple of tests, that would be great. I can probably take a shot at this. >> I'm working around possible compiler-incompatibilities of struct return >> values by having only pointer versions of the functions in >> libnpymath.a, and the non-pointer versions as inlined static functions. > > Is this a problem if we guarantee that our complex type is bit > compatible with C99 complex (e.g. casting a complex to a double[2] > should alway work) ? > > That's how the complex math is implemented ATM. Correct me if I'm wrong, but I think the problem is that for typedef struct foo foo_t; foo_t bar(); different compilers may put the return value of bar() to a different place (registers vs. memory). If we put those functions in a library, and switch compilers, I think the behavior is undefined as there seems to be no standard. 
I don't think functions in C can return arrays, so double[2] representation probably does not help us here. >> Also, perhaps we should add a header file >> >> npy_math_c99compat.h >> >> that would detect if the compiler supports C99, and if not, substitute >> the C99 functions with our npy_math implementations. This'd be great >> for scipy.special. > > I am not sure I understand this: currently, if a given complex function > is detected on the platform, npy_foo is just an alias to foo, so we use > the platform implementation whenever possible. I'd like to write code like this: coshf(a) + sinhf(b) and not like this: npy_coshf(a) + npy_sinhf(b) This seems easy to achieve with a convenience header that substitutes the C99 functions with npy_ functions when C99 is not available. Pauli From david at ar.media.kyoto-u.ac.jp Fri Oct 30 05:57:12 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 18:57:12 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > > I can probably take a shot at this. > Cool. > Correct me if I'm wrong, but I think the problem is that for > > typedef struct foo foo_t; > foo_t bar(); > You're right, I was thinking about alignment issues myself - that's why I mentioned npy_complex and double[2] being equivalent, as defining complex with a struct does not guarantee this. > different compilers may put the return value of bar() to a different > place (registers vs. memory). If we put those functions in a library, and > switch compilers, I think the behavior is undefined as there seems to be > no standard. > Is this a problem in practice ? If two compilers differ in this, wouldn't they have incompatible ABI ? David From david at ar.media.kyoto-u.ac.jp Fri Oct 30 06:01:34 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 19:01:34 +0900 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEAB97E.4030902@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > I'd like to write code like this: > > coshf(a) + sinhf(b) > > and not like this: > > npy_coshf(a) + npy_sinhf(b) > Using npy_ prefix was a consciously designed feature :) I would prefer avoid doing this, as it may cause trouble: sometimes, even if the foo function is available, we may want to use npy_foo because it is better, faster, more standard compliant. For example, I remember that the few complex functions on Visual Studio are broken, so even though they are detected, I have a MSVC ifdef to use our own in that case. David From david at ar.media.kyoto-u.ac.jp Fri Oct 30 06:04:13 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 30 Oct 2009 19:04:13 +0900 Subject: [Numpy-discussion] [RFC] new function for floating point comparison In-Reply-To: <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> References: <4AE94176.8080804@ar.media.kyoto-u.ac.jp> <3d375d730910290851k12df5db7v26c3c479d8b1efd6@mail.gmail.com> Message-ID: <4AEABA1D.5070802@ar.media.kyoto-u.ac.jp> Robert Kern wrote: > > That sounds good. Another worthwhile addition would be nextafter(). > > http://www.opengroup.org/onlinepubs/000095399/functions/nextafter.html > Ah, I did not know about this one. I have implemented it and committed it. 
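On the Python side, the newly committed ufunc already makes Robert's dense-grid idea easy to try. A small sketch, assuming it is exposed as np.nextafter (the helper name here is illustrative):

import numpy as np

def ulp_grid(center, n=50):
    # Walk n representable doubles to either side of `center` with the
    # nextafter ufunc, giving the densest possible grid around it.
    hi = lo = np.float64(center)
    up, down = [], []
    for _ in range(n):
        hi = np.nextafter(hi, np.inf)
        lo = np.nextafter(lo, -np.inf)
        up.append(hi)
        down.append(lo)
    return np.array(down[::-1] + [np.float64(center)] + up)

xs = ulp_grid(np.pi / 2)      # probe sin() where the result is near 1
errs = np.sin(xs) - 1.0

Evaluating a function over such a grid exposes its error behaviour at the finest level the floating point format allows, which is the kind of thing the new ULP-based assertions are meant to catch.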
One issue is that it will cause failures on platforms without nextafterl and where long double != double, but I don't think we have a lot of those, if any, David From seb.haase at gmail.com Fri Oct 30 07:04:36 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 30 Oct 2009 12:04:36 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable Message-ID: Hi, I get this error: set(chainsA[0,:,0]) TypeError: unhashable type: 'numpy.ndarray' >>> list(chainsA[0,:,0]) [2636, 2590, 2619, 2590] >>> list(chainsA[0,:,0])[0] 2636 >>> type(_) I understand where this error comes from, however what I was trying to do seems to "intuitive" that I would like to ask for suggestions: "What should I do if the "number" 2636 becomes unhashable ?" Thanks, Sebastian Haase From cournape at gmail.com Fri Oct 30 07:21:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 30 Oct 2009 20:21:16 +0900 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: References: Message-ID: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > I understand where this error comes from, however what I was trying to > do seems to "intuitive" that I would like to ask for suggestions: > "What should I do if the "number" 2636 becomes unhashable ?" In your example, that's the array which is unhashable, the numbers itself should be hashable. Arrays are mutable, so I don't think you can easily make them hashable. You could transform everything into tuple of tuple of... if you need to use set, though. David From gael.varoquaux at normalesup.org Fri Oct 30 07:23:52 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 30 Oct 2009 12:23:52 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> Message-ID: <20091030112352.GD16315@phare.normalesup.org> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: > On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > > I understand where this error comes from, however what I was trying to > > do seems to "intuitive" that I would like to ask for suggestions: > > "What should I do if the "number" 2636 becomes unhashable ?" > In your example, that's the array which is unhashable, the numbers > itself should be hashable. Arrays are mutable, so I don't think you > can easily make them hashable. You could transform everything into > tuple of tuple of... if you need to use set, though. Use md5's of their .data attribute. This works quite well (you might want to hash a pickled string of the dtype in addition). 
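A minimal sketch of that md5 approach, with the contiguity caveat James raises just below handled by copying first; the helper name and the inclusion of shape in the digest are additions of mine, not from the thread:

import hashlib
import numpy as np

def array_key(a):
    # Hashable key built from the raw bytes plus dtype and shape.
    # ascontiguousarray() copies non-contiguous views so that equal
    # data always hashes the same way.
    a = np.ascontiguousarray(a)
    m = hashlib.md5()
    m.update(a.tostring())
    m.update(str(a.dtype).encode())
    m.update(str(a.shape).encode())
    return m.hexdigest()

cache = {}
x = np.arange(12).reshape(3, 4)[:, ::2]   # a non-contiguous view
cache[array_key(x)] = 'some cached result'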
Ga?l From bergstrj at iro.umontreal.ca Fri Oct 30 09:11:35 2009 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Fri, 30 Oct 2009 09:11:35 -0400 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <20091030112352.GD16315@phare.normalesup.org> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> Message-ID: <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux wrote: > On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: > >> > I understand where this error comes from, however what I was trying to >> > do seems to "intuitive" that I would like to ask for suggestions: >> > "What should I do if the "number" 2636 becomes unhashable ?" > >> In your example, that's the array which is unhashable, the numbers >> itself should be hashable. Arrays are mutable, so I don't think you >> can easily make them hashable. You could transform everything into >> tuple of tuple of... if you need to use set, though. > > Use md5's of their .data attribute. This works quite well (you might want > to hash a pickled string of the dtype in addition). > > Ga?l Careful... if your data is not contiguous in memory then you could be adding lots of random noise to your hash key by doing this. This could cause equal ndarrays to hash to different values -- not good. Make sure memory is contiguous before hashing the .data. Flatten() does this i think, as does copy(), array(), and many others. James -- http://www-etud.iro.umontreal.ca/~bergstrj From mail at stevesimmons.com Fri Oct 30 09:18:05 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Fri, 30 Oct 2009 14:18:05 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays Message-ID: <4AEAE78D.6030309@stevesimmons.com> Hi, Is anyone working on alternative storage options for numpy arrays, and specifically recarrays? My main application involves processing series of large recarrays (say 1000 recarrays, each with 5M rows having 50 fields). Existing options meet some but not all of my requirements. Requirements -------------- The basic requirements are: Mandatory - fast - suitable for very large arrays (larger than can fit in memory) - compressed (to reduce disk space, read data more quickly) - seekable (can read subset of data without decompressing everything) - can append new data to an existing file - able to extract individual fields from a recarray (for when indexing or processing needs just a few fields) Nice to have - files can be split without decompressing and recompressing (e.g. distribute processing over a grid) - encryption, ideally field-level, with encryption occurring after compression - can store multiple arrays in one physical file (convenience) - portable/stardard/well documented Existing options ----------------- Over the last few years I've tried most of numpy's options for saving arrays to disk, including pickles, .npy, .npz, memmap-ed files and HDF (Pytables). 
None of these is perfect, although Pytables comes close: - .npy - not compressed, need to read whole array into memory - .npz - compressed but ZLIB compression is too slow - memmap - not compressed - Pytables (HDF using chunked storage for recarrays with LZO compression and shuffle filter) - can't extract individual field from a recarray - multiple dependencies (HDF, PyTables+LZO, Pyh5+LZF) - HDF is standard but LZO implementation is specific to Pytables (similarly LZF is specific to Pyh5) Are there any other options? Thoughts about a new format -------------------------------- It seems that numpy could benefit from a new storage format. My first thoughts involve: - Use chunked format - split big arrays into pages of consecutive rows, compressed separately - Get good compression ratios by shuffling data before compressing (byte 1 of all rows, then byte 2 of all rows, ...) - Get efficient access to individual fields in recarrays by compressing each recarray field's data separately (shuffling has nice side-effect of separating recarray fields' data) - Make it fast to compress and decompress by using LZO - Store pages of rows (and compressd field data within a page) using a numpy variation of IFF chunked format (e.g. used by the DjVu scanned document format version 3). For example, FORM chunk for whole file, DTYP chunk for dtype info, DIRM chunk for directory to pages holding rows, NPAG chunk for a page - The IFF structure of named chunk types allows format to be extended (other compressors than LZO, encryption, links to remote data chunks, etc) I'd appreciate any comments or suggestions before I start coding. References ----------- DjVu format - http://djvu.org/resources/ DjVu v3 format - http://djvu.org/docs/DjVu3Spec.djvu Stephen P.S. Maybe this will be too much work, and I'd be better off sticking with Pytables..... From dagss at student.matnat.uio.no Fri Oct 30 09:48:54 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 30 Oct 2009 14:48:54 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <2384af0be6bd9de94f7f140ea5a6aca3.squirrel@webmail.uio.no> Dag Sverre Seljebotn: > Hi, > > Is anyone working on alternative storage options for numpy arrays, and > specifically recarrays? My main application involves processing series > of large recarrays (say 1000 recarrays, each with 5M rows having 50 > fields). Existing options meet some but not all of my requirements. > > Requirements > -------------- > The basic requirements are: > > Mandatory > - fast > - suitable for very large arrays (larger than can fit in memory) > - compressed (to reduce disk space, read data more quickly) > - seekable (can read subset of data without decompressing everything) > - can append new data to an existing file > - able to extract individual fields from a recarray (for when indexing > or processing needs just a few fields) > Nice to have > - files can be split without decompressing and recompressing (e.g. > distribute processing over a grid) > - encryption, ideally field-level, with encryption occurring after > compression > - can store multiple arrays in one physical file (convenience) > - portable/stardard/well documented > > Existing options > ----------------- > Over the last few years I've tried most of numpy's options for saving > arrays to disk, including pickles, .npy, .npz, memmap-ed files and HDF > (Pytables). 
> > None of these is perfect, although Pytables comes close: > - .npy - not compressed, need to read whole array into memory > - .npz - compressed but ZLIB compression is too slow > - memmap - not compressed > - Pytables (HDF using chunked storage for recarrays with LZO > compression and shuffle filter) > - can't extract individual field from a recarray I'm just learning PyTables so I'm curious about this... if I use a normal Table, it will be presented as a NumPy record array when I access it, and I can access individual fields. What are the disadvantages to that? > - multiple dependencies (HDF, PyTables+LZO, Pyh5+LZF) (I think this is a pro, not a con: It means that there's a lot of already bugfixed code being used. Any codebase is only as strong as the number of eyes on it.) Dag Sverre From dagss at student.matnat.uio.no Fri Oct 30 10:08:20 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 30 Oct 2009 15:08:20 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> Stephen Simmons wrote: > P.S. Maybe this will be too much work, and I'd be better off sticking > with Pytables..... I can't judge that, but I want to share some thoughts (rant?): - Are you ready to not only write the code, but maintain it over years to come, and work through nasty bugs, and think things through when people ask for parallellism or obscure filesystem locking functionality or whatnot? - Are you ready to finish even the last, boring "10%". Since there are existing options in the same area you can't expect a growing userbase to help you with the last "10%" (unlike projects in unexplored areas). - When you are done, are you sure that what you finally have will really be leaner and easier to work with than the existing options (like PyTables?). If not, odds are the result will in the end only be used by yourself. Simply writing the prototype is the easy part of the job! Perhaps needless to say, my hunch would be to try to work with PyTables to add what you miss there. There's a harder learning curve than writing something from scratch, but not harder than what others will have with something you write from scratch. The advantage of hdf5 is that there's lot of existing tools for inspecting, processing and sharing the data independent of NumPy (well, up to propriotary compression; but that's hardly worse than the entire format being propriotary). Dag Sverre From zachary.pincus at yale.edu Fri Oct 30 10:26:21 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 30 Oct 2009 10:26:21 -0400 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> References: <4AEAE78D.6030309@stevesimmons.com> <834a4c20c8c7923cc132386169bbdd2a.squirrel@webmail.uio.no> Message-ID: Unless I read your request or the documentation wrong, h5py already supports pulling specific fields out of "compound data types": http://h5py.alfven.org/docs-1.1/guide/hl.html#id3 > For compound data, you can specify multiple field names alongside > the numeric slices: > >>> dset["FieldA"] > >>> dset[0,:,4:5, "FieldA", "FieldB"] > >>> dset[0, ..., "FieldC"] Is this latter style of access what you were asking for? (Or is the problem that it's not fast enough in hdf5, even with the shuffle filter, etc?) 
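For concreteness, a sketch of that field-selection pattern on a compound dataset, assuming h5py with its LZF filter is available (file and field names here are made up):

import numpy as np
import h5py

rows = np.zeros(1000000, dtype=[('id', 'i8'), ('x', 'f4'), ('y', 'f4')])

f = h5py.File('columns.h5', 'w')
dset = f.create_dataset('table', data=rows,
                        chunks=True, compression='lzf', shuffle=True)
x_only = dset['x']     # pull back a single field
f.close()

Note, though, that HDF5 still decompresses whole chunks of complete rows underneath, so this answers the convenience part of the question but not necessarily the I/O-cost part.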
So then the issue is that there's a dependency on hdf5 and h5py? (or if you want to access LZF-compressed files without h5py, a dependency on hdf5 and the C LZF compressor?). This is pretty lightweight, especially if you're proposing writing new code which itself would be a dependency. So your new code couldn't depend on *anything* else if you wanted it to be a fewer-dependencies option than hdf5+h5py, right? Zach From faltet at pytables.org Fri Oct 30 11:17:08 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 30 Oct 2009 16:17:08 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <200910301617.08823.faltet@pytables.org> A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: > - Pytables (HDF using chunked storage for recarrays with LZO > compression and shuffle filter) > - can't extract individual field from a recarray Er... Have you tried the ``cols`` accessor? http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr Cheers, -- Francesc Alted From robert.kern at gmail.com Fri Oct 30 12:09:26 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 11:09:26 -0500 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEAE78D.6030309@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> Message-ID: <3d375d730910300909h4e3beb8v4fe582a2cccd27ff@mail.gmail.com> On Fri, Oct 30, 2009 at 08:18, Stephen Simmons wrote: > Thoughts about a new format > -------------------------------- > It seems that numpy could benefit from a new storage format. While you may indeed need a new format, I'm not sure that numpy does. Lord knows I've gotten enough flak for inventing yet another binary format with .npy. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mail at stevesimmons.com Fri Oct 30 12:19:42 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Fri, 30 Oct 2009 17:19:42 +0100 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <200910301617.08823.faltet@pytables.org> References: <4AEAE78D.6030309@stevesimmons.com> <200910301617.08823.faltet@pytables.org> Message-ID: <4AEB121E.8080607@stevesimmons.com> I should clarify what I meant...... Suppose I have a recarray with 50 fields and want to read just one of those fields. PyTables/HDF will read in the compressed data for chunks of complete rows, decompress the full 50 fields, and then give me back the data for just one field. I'm after a solution where asking for a single field reads in the bytes for just that field from disk and decompresses it. This is similar to the difference between databases storing their data as rows or columns. See for example Mike Stonebraker's C-store column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf). Stephen Francesc Alted wrote: > A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: > >> - Pytables (HDF using chunked storage for recarrays with LZO >> compression and shuffle filter) >> - can't extract individual field from a recarray >> > > Er... Have you tried the ``cols`` accessor? 
> > http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr > > Cheers, > > From peridot.faceted at gmail.com Fri Oct 30 12:35:10 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 30 Oct 2009 12:35:10 -0400 Subject: [Numpy-discussion] Designing a new storage format for numpy recarrays In-Reply-To: <4AEB121E.8080607@stevesimmons.com> References: <4AEAE78D.6030309@stevesimmons.com> <200910301617.08823.faltet@pytables.org> <4AEB121E.8080607@stevesimmons.com> Message-ID: 2009/10/30 Stephen Simmons : > I should clarify what I meant...... > > Suppose I have a recarray with 50 fields and want to read just one of > those fields. PyTables/HDF will read in the compressed data for chunks > of complete rows, decompress the full 50 fields, and then give me back > the data for just one field. > > I'm after a solution where asking for a single field reads in the bytes > for just that field from disk and decompresses it. > > This is similar to the difference between databases storing their data > as rows or columns. See for example Mike Stonebraker's C-store > column-oriented database (http://db.lcs.mit.edu/projects/cstore/vldb.pdf). Is there any reason not to simply store the data as a collection of separate arrays, one per column? It shouldn't be too hard to write a wrapper to give this nicer syntax, while implementing it under the hood with HDF5... Anne > Stephen > > > > Francesc Alted wrote: >> A Friday 30 October 2009 14:18:05 Stephen Simmons escrigu?: >> >>> ?- Pytables (HDF using chunked storage for recarrays with LZO >>> compression and shuffle filter) >>> ? ? - can't extract individual field from a recarray >>> >> >> Er... Have you tried the ``cols`` accessor? >> >> http://www.pytables.org/docs/manual/ch04.html#ColsClassDescr >> >> Cheers, >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Fri Oct 30 12:44:11 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 11:44:11 -0500 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> Message-ID: <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> On Fri, Oct 30, 2009 at 08:11, James Bergstra wrote: > On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux > wrote: >> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >>> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: >> >>> > I understand where this error comes from, however what I was trying to >>> > do seems to "intuitive" that I would like to ask for suggestions: >>> > "What should I do if the "number" 2636 becomes unhashable ?" >> >>> In your example, that's the array which is unhashable, the numbers >>> itself should be hashable. Arrays are mutable, so I don't think you >>> can easily make them hashable. You could transform everything into >>> tuple of tuple of... if you need to use set, though. >> >> Use md5's of their .data attribute. This works quite well (you might want >> to hash a pickled string of the dtype in addition). >> >> Ga?l > > Careful... if your data is not contiguous in memory then you could be > adding lots of random noise to your hash key by doing this. 
?This > could cause equal ndarrays to hash to different values -- not good. > Make sure memory is contiguous before hashing the .data. ?Flatten() > does this i think, as does copy(), array(), and many others. .data doesn't work for non-contiguous arrays anyways. :-) But all of this is irrelevant to the OP. First, I cannot replicate his problem. In [12]: chainsA = np.arange(10, dtype=np.int64) In [13]: set(chainsA) Out[13]: set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Second, he seems to be interested in scalar objects, not arrays. The scalar objects should all be hashable and comparable out-of-box and ready to be used in sets and as dict keys. We will need a complete, self-contained example that demonstrates the problem to get any further with this. Third, even if he wanted to use arrays as set elements, he couldn't because such objects not only need to have __hash__ defined, they also need __eq__ to return a bool. We return boolean arrays that cannot be used as a truth value. Fourth, even if arrays could be compared, you couldn't replace their __hash__ method or tell set to use a different function in place of the __hash__ method. Fifth, even if you could tell set to use a different hash function, you wouldn't use cryptographic hashes. You would just hash(buffer(arr)) for contiguous arrays and hash(arr.tostring()) for the rest. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jorgen.stenarson at bostream.nu Fri Oct 30 13:22:56 2009 From: jorgen.stenarson at bostream.nu (=?ISO-8859-1?Q?J=F6rgen_Stenarson?=) Date: Fri, 30 Oct 2009 18:22:56 +0100 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> Message-ID: <4AEB20F0.8070201@bostream.nu> David Cournapeau skrev: > The missing piece in complex support in npymath is mostly tests: I have > tests for all the special cases (all special cases specified in C99 > standard are tested), but no test for the actual 'normal' values. If you > (or someone else) could add a couple of tests, that would be great. > In ticket #1271 I reported on some edgecase errors for pow with negative exponents which would be great to include in the test suite as well. /J?rgen From reckoner at gmail.com Fri Oct 30 14:13:28 2009 From: reckoner at gmail.com (Reckoner) Date: Fri, 30 Oct 2009 11:13:28 -0700 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines Message-ID: Hi, % python -c 'import numpy.core.multiarray' works just fine, but when I try to load a file that I have transferred from another machine running Windows to one running Linux, I get: % python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' Traceback (most recent call last): File "", line 1, in ImportError: No module named multiarray otherwise, cPickle works normally when transferring files that *do* not contain numpy arrays. I am using version 1.2 on both machines. It's not so easy for me to change versions, by the way, since this is the version that my working group has decided on to standardize on for this effort. Any help appreciated. 
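Before changing numpy versions, it may be worth looking at what the failing pickle actually asks the unpickler to import. A small diagnostic using only the standard library (output will vary with how the file was written):

import pickletools

data = open('matrices.pkl', 'rb').read()
# GLOBAL records are the module/attribute pairs the unpickler imports,
# e.g. something like 'numpy.core.multiarray _reconstruct'.
globals_used = [arg for opcode, arg, pos in pickletools.genops(data)
                if opcode.name == 'GLOBAL']
print(globals_used)

If the list names a module that does not import cleanly on the Linux box, that points at the real culprit rather than at the array data itself.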
From robert.kern at gmail.com Fri Oct 30 15:09:48 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 30 Oct 2009 14:09:48 -0500 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines In-Reply-To: References: Message-ID: <3d375d730910301209y5a59c472v86d62c7fb517d77b@mail.gmail.com> On Fri, Oct 30, 2009 at 13:13, Reckoner wrote: > Hi, > > % python -c 'import numpy.core.multiarray' > > works just fine, but when I try to load a file that I have transferred > from another machine running Windows to one running Linux, I get: > > % ?python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' > > Traceback (most recent call last): > ?File "", line 1, in > ImportError: No module named multiarray > > otherwise, cPickle works normally when transferring files that *do* > not contain numpy arrays. > > I am using version 1.2 on both machines. It's not so easy for me to > change versions, by the way, since this is the version that my working > group has decided on to standardize on for this effort. You can import numpy.core.multiarray on both machines? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pav at iki.fi Fri Oct 30 17:05:16 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 30 Oct 2009 23:05:16 +0200 Subject: [Numpy-discussion] [RFC] complex functions in npymath In-Reply-To: <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> References: <4AEAB30F.8090604@ar.media.kyoto-u.ac.jp> <4AEAB878.4030308@ar.media.kyoto-u.ac.jp> Message-ID: <1256936716.6755.14.camel@idol> pe, 2009-10-30 kello 18:57 +0900, David Cournapeau kirjoitti: [clip: struct return values] > Is this a problem in practice ? If two compilers differ in this, > wouldn't they have incompatible ABI ? Yep, it would be an incompatible ABI. I don't really know how common this in practice -- but there was a comment warning about this in the old ufunc sources, so I wanted to be wary... I don't think there's a significant downside in having thin wrappers around the pointer functions. Googling a bit reveals at least some issues that have cropped up in gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36834 (MSVC vs. gcc) http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9506 (bug on freebsd) I'd imagine the situation vs. compilers is here a bit similar to C++ ABIs and sounds like it's a less tested corner of the calling conventions. No idea whether this matters in practice, but at least the above MSVC vs. gcc issue sounds like it might bite. 
Pauli From seb.haase at gmail.com Fri Oct 30 17:08:38 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 30 Oct 2009 22:08:38 +0100 Subject: [Numpy-discussion] type 'numpy.int64' unhashable In-Reply-To: <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> References: <5b8d13220910300421y7c4b83bfyf507164a26b3db46@mail.gmail.com> <20091030112352.GD16315@phare.normalesup.org> <7f1eaee30910300611y6d6a8b3fg7e266671a93eb49@mail.gmail.com> <3d375d730910300944y3dbe2aefwd14c582f5aa0bbcc@mail.gmail.com> Message-ID: On Fri, Oct 30, 2009 at 5:44 PM, Robert Kern wrote: > On Fri, Oct 30, 2009 at 08:11, James Bergstra wrote: >> On Fri, Oct 30, 2009 at 7:23 AM, Gael Varoquaux >> wrote: >>> On Fri, Oct 30, 2009 at 08:21:16PM +0900, David Cournapeau wrote: >>>> On Fri, Oct 30, 2009 at 8:04 PM, Sebastian Haase wrote: >>> >>>> > I understand where this error comes from, however what I was trying to >>>> > do seems to "intuitive" that I would like to ask for suggestions: >>>> > "What should I do if the "number" 2636 becomes unhashable ?" >>> >>>> In your example, that's the array which is unhashable, the numbers >>>> itself should be hashable. Arrays are mutable, so I don't think you >>>> can easily make them hashable. You could transform everything into >>>> tuple of tuple of... if you need to use set, though. >>> >>> Use md5's of their .data attribute. This works quite well (you might want >>> to hash a pickled string of the dtype in addition). >>> >>> Ga?l >> >> Careful... if your data is not contiguous in memory then you could be >> adding lots of random noise to your hash key by doing this. ?This >> could cause equal ndarrays to hash to different values -- not good. >> Make sure memory is contiguous before hashing the .data. ?Flatten() >> does this i think, as does copy(), array(), and many others. > > .data doesn't work for non-contiguous arrays anyways. :-) > > But all of this is irrelevant to the OP. First, I cannot replicate his problem. > > In [12]: chainsA = np.arange(10, dtype=np.int64) > > In [13]: set(chainsA) > Out[13]: set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > > Second, he seems to be interested in scalar objects, not arrays. The > scalar objects should all be hashable and comparable out-of-box and > ready to be used in sets and as dict keys. We will need a complete, > self-contained example that demonstrates the problem to get any > further with this. > > Third, even if he wanted to use arrays as set elements, he couldn't > because such objects not only need to have __hash__ defined, they also > need __eq__ to return a bool. We return boolean arrays that cannot be > used as a truth value. > > Fourth, even if arrays could be compared, you couldn't replace their > __hash__ method or tell set to use a different function in place of > the __hash__ method. > > Fifth, even if you could tell set to use a different hash function, > you wouldn't use cryptographic hashes. You would just > hash(buffer(arr)) for contiguous arrays and hash(arr.tostring()) for > the rest. > > -- > Robert Kern > Thanks to everyone for replying. Nice detective work, Robert - indeed it seems to work with "real" ndarrays -- I have to do some more homework to get my problem into a shape so that I could demonstrate it in a "small, self contained form". 
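For reference, a small self-contained illustration of the two cases Robert and David describe, with made-up data standing in for chainsA: scalars pulled out of an int64 array are hashable, whereas whole rows need converting to tuples first.

import numpy as np

chainsA = np.random.randint(0, 3000, size=(3, 4, 2)).astype(np.int64)

unique_vals = set(chainsA[0, :, 0])                # int64 scalars: works
unique_rows = set(tuple(r) for r in chainsA[0])    # rows as tuples, per David's suggestion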
Thanks again, Sebastian From reckoner at gmail.com Fri Oct 30 21:48:23 2009 From: reckoner at gmail.com (Reckoner) Date: Fri, 30 Oct 2009 18:48:23 -0700 Subject: [Numpy-discussion] persistent ImportError: No module named multiarray when moving cPickle files between machines In-Reply-To: References: Message-ID: > Robert Kern wrote: > You can import numpy.core.multiarray on both machines? Yes. For each machine separately, you can cPickle files with numpy arrays without problems loading/dumping. The problem comes from transferring the win32 cPickle'd files to Linux 64 bit and then trying to load them. Transferring cPickle'd files that do *not* have numpy arrays work as expected. In other words, cPICKLE'd lists transfer fine back and forth between the two machines. In fact, we currently get around this problem by converting the numpy arrays to lists, transferring them, and then re-numpy-ing them on the respective hosts thanks. On Fri, Oct 30, 2009 at 11:13 AM, Reckoner wrote: > Hi, > > % python -c 'import numpy.core.multiarray' > > works just fine, but when I try to load a file that I have transferred > from another machine running Windows to one running Linux, I get: > > % ?python -c 'import cPickle;a=cPickle.load(open("matrices.pkl"))' > > Traceback (most recent call last): > ?File "", line 1, in > ImportError: No module named multiarray > > otherwise, cPickle works normally when transferring files that *do* > not contain numpy arrays. > > I am using version 1.2 on both machines. It's not so easy for me to > change versions, by the way, since this is the version that my working > group has decided on to standardize on for this effort. > > > Any help appreciated. > From sccolbert at gmail.com Sat Oct 31 08:22:47 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 13:22:47 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? Message-ID: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> For example say we have an original array a=np.random.random((512, 512, 3)) and we take a slice of that array b=a[:100, :100, :] now, b is discontiguous, but all if its memory is owned by a. Will there ever be a situation where a discontiguous array owns its own data? Or more generally, will discontiguous data alway have a contiguous parent? As far as i understand the numpy strided model, that could only be supported if len(strides) = ndim+1, I dont think numpy supports that. Don't get me wrong, I'm not making a feature request, just making sure I fully understand the array model so I can avoid trampling on memory. Cheers! Chris From cournape at gmail.com Sat Oct 31 08:32:17 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 31 Oct 2009 21:32:17 +0900 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> Message-ID: <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> On Sat, Oct 31, 2009 at 9:22 PM, Chris Colbert wrote: > > Will there ever be a situation where a discontiguous array owns its > own data? Or more generally, will discontiguous data alway have a > contiguous parent? Yes to Q1 and No to Q2. Discontiguous arrays are very easy to create: for example, if you say np.empty((10, 50), order="F"), you have a discontiguous array. 
I use this quite often when I need to interoperate with C or Fortran libraries - interoperation with other libraries/formats is another common source of discontiguous arrays, compared to memory views. David From sccolbert at gmail.com Sat Oct 31 08:45:19 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 13:45:19 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> Message-ID: <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> Thanks for the response david. Lemme rephrase the question a little bit. It terms of actually memory space, will a numpy array ever point to a chunk of memory that is not a continually running series of memory addresses and also not a child of a continuous block of addresses. Graphically can this every occur in hardware memory: |--- a portion of array A ---|--- python object foo ---|--- The rest of array A ----| The reason I ask is because I am passing numpy arrays into another library which uses a strided memory model, but not FULLY strided, and I need to figure out what checks I need to put in place to ensure that it doesnt trample on memory. In the best case senario, it would just trample on the parent array, in the worst case senario it would segfault. Cheers, Chris On Sat, Oct 31, 2009 at 1:32 PM, David Cournapeau wrote: > On Sat, Oct 31, 2009 at 9:22 PM, Chris Colbert wrote: > >> >> Will there ever be a situation where a discontiguous array owns its >> own data? Or more generally, will discontiguous data alway have a >> contiguous parent? > > Yes to Q1 and No to Q2. > > Discontiguous arrays are very easy to create: for example, if you say > np.empty((10, 50), order="F"), you have a discontiguous array. I use > this quite often when I need to interoperate with C or Fortran > libraries - interoperation with other libraries/formats is another > common source of discontiguous arrays, compared to memory views. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Sat Oct 31 08:58:53 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 31 Oct 2009 21:58:53 +0900 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> Message-ID: <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> On Sat, Oct 31, 2009 at 9:45 PM, Chris Colbert wrote: > Graphically can this every occur in hardware memory: > > |--- a portion of array A ---|--- python object foo ---|--- The rest > of array A ----| No, this can never happen in the current numpy memory model, the allocated block has to be contiguous, and you can get to any item of the array from the data pointer (address of the first item) by N * item_size. That's a fundamental feature to enable fast access (you only need to jump once). 
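As for the checks before handing memory to a strided library, a sketch of the usual guard, assuming the library expects one C-contiguous block (the helper name is illustrative):

import numpy as np

def as_library_buffer(arr):
    # External code gets exactly one contiguous block: slices,
    # Fortran-order arrays and other discontiguous views are copied,
    # plain C-order arrays pass through untouched.
    if not arr.flags['C_CONTIGUOUS']:
        arr = np.ascontiguousarray(arr)
    return arr

a = np.random.random((512, 512, 3))
b = a[:100, :100, :]             # the discontiguous view from the example
buf = as_library_buffer(b)       # safe to pass on; writes land in the copy

The copy also means the library can no longer trample the parent array a, at the cost that writes do not propagate back unless they are copied in explicitly.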
David From sccolbert at gmail.com Sat Oct 31 09:02:14 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 31 Oct 2009 14:02:14 +0100 Subject: [Numpy-discussion] just how 'discontiguous' can a numpy array become? In-Reply-To: <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> References: <7f014ea60910310522k5bf2fd4dj4a6c6f55ba64c704@mail.gmail.com> <5b8d13220910310532l3819c7b3led51770b15e387a5@mail.gmail.com> <7f014ea60910310545v6b90343ey4f737612fdbf930d@mail.gmail.com> <5b8d13220910310558t47d77f18me17beeeb9a571ba1@mail.gmail.com> Message-ID: <7f014ea60910310602y4b0ea611q43e47be0f7011a40@mail.gmail.com> Great! Thanks for the help David! On Sat, Oct 31, 2009 at 1:58 PM, David Cournapeau wrote: > On Sat, Oct 31, 2009 at 9:45 PM, Chris Colbert wrote: > >> Graphically can this every occur in hardware memory: >> >> |--- a portion of array A ---|--- python object foo ---|--- The rest >> of array A ----| > > No, this can never happen in the current numpy memory model, the > allocated block has to be contiguous, and you can get to any item of > the array from the data pointer (address of the first item) by N * > item_size. That's a fundamental feature to enable fast access (you > only need to jump once). > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Oct 31 12:38:34 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 31 Oct 2009 12:38:34 -0400 Subject: [Numpy-discussion] Recarray comparison and byte order Message-ID: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> Hi, I was surprised by this - is it a bug or a feature or me misunderstanding something? a = np.zeros((1,), dtype=[('f1', 'u2')]) b = a.copy() b == a (array([True], dtype=bool)) # as expected c = a.byteswap().newbyteorder() c == a (False) # to me, unexpected, note bool rather than array Thanks for any clarification... Matthew From geometrian at gmail.com Sat Oct 31 14:46:50 2009 From: geometrian at gmail.com (Ian Mallett) Date: Sat, 31 Oct 2009 11:46:50 -0700 Subject: [Numpy-discussion] Recarray comparison and byte order In-Reply-To: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> References: <1e2af89e0910310938n7aec3ca0w8d5b0f8e6c7b0933@mail.gmail.com> Message-ID: On Sat, Oct 31, 2009 at 9:38 AM, Matthew Brett wrote: > c = a.byteswap().newbyteorder() > c == a > In the last two lines, a variable "c" is assigned to a modified "a". The next line tests (==) to see if "c" is the same as (==) the unmodified "a". It isn't, because "c" is the modified "a". Hence, "False". Do you mean: c = a instead of: c == a ...? HTH, Ian -------------- next part -------------- An HTML attachment was scrubbed... URL:
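For reference on the recarray question: the last two lines in Matthew's snippet are comparisons, not assignments. A self-contained version of the test, with one extra comparison on the extracted field, which should fall back to an ordinary elementwise result:

import numpy as np

a = np.zeros((1,), dtype=[('f1', 'u2')])
b = a.copy()
c = a.byteswap().newbyteorder()    # same logical values, opposite byte order

print(a == b)              # elementwise boolean array, as expected
print(a == c)              # the bare False Matthew reports
print(a['f1'] == c['f1'])  # comparing the plain u2 field instead

Whether the structured comparison should honour the byte order (and return an array rather than a bare bool) is exactly the open question here.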