[Numpy-discussion] NumPy-Discussion Digest, Vol 66, Issue 61

Tue Mar 20 13:49:49 EDT 2012

On Tue, Mar 20, 2012 at 5:13 AM, Matthieu Rigal <rigal at rapideye.net> wrote:

> In fact, I was hoping to have a less memory and more speed solution.

which do often go together, at least for big problems -- pushing
memory around often takes more time than the computation itself.

> In the end, I am more interested in speed.
>
> I tried first a code-sparing version :
> array = numpy.asarray([(aBlueChannel < 1.0),(aNirChannel > aBlueChannel *
> 1.0),(aNirChannel < aBlueChannel * 1.8)]).all()

(by the way -- it is MUCH better if you post example code with actual
data (the example data can be much smaller) -- while you can't test
performance with small data sets, you can test correctness -- the
easier you make it for us to try stuff out, the more we can help you.)

re-formatting so I can read this, I get:

array = numpy.asarray([(aBlueChannel < 1.0),
                       (aNirChannel > aBlueChannel * 1.0),
                       (aNirChannel < aBlueChannel * 1.8)]).all()

a few notes:

asarray will work, but is pointless here -- I'd just use "array" --
asarray() is for when the input may or may not already be an array,
and you want to pass it through without a copy if it is.
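
To make the array/asarray difference concrete, here is a tiny sketch
(the variable names are made up for the example, they are not from
the original code):

```python
import numpy as np

a = np.arange(6, dtype=float)

same = np.asarray(a)   # input is already an ndarray: handed back as-is
copy = np.array(a)     # array() makes a copy by default

print(same is a)                   # True
print(copy is a)                   # False
print(np.shares_memory(copy, a))   # False

# a non-array input (here a list) is converted to a new array either way
from_list = np.asarray([1.0, 2.0, 3.0])
print(type(from_list).__name__)    # ndarray
```

In the posted code the input is a freshly built list of three bool
arrays, so both functions have to build a new array and there is
nothing for asarray() to preserve.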

I'd probably use numpy.vstack (or row_stack, its alias) here, rather
than the array (or asarray) function -- there is a larger parsing
overhead to array()

I suppose this is a special case, but multiplying by 1.0 is kind of
pointless there (or are you doing that to cast to a float array?)

if you are doing an all() -- not much reason to put them all in the
same array first anyway.
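
A minimal sketch of that point, using small made-up arrays in place of
the real channels (blue and nir are stand-in names): reducing each
check separately gives the same answer as stacking first.

```python
import numpy as np

# small made-up stand-ins for aBlueChannel and aNirChannel
blue = np.array([[0.2, 0.4], [0.5, 0.9]])
nir = np.array([[0.3, 0.6], [0.8, 1.0]])

# stacked version: builds a (3, 2, 2) bool array, then reduces it
stacked = np.array([blue < 1.0, nir > blue, nir < blue * 1.8]).all()

# un-stacked version: reduce each check on its own, combine the scalars
separate = (blue < 1.0).all() and (nir > blue).all() and (nir < blue * 1.8).all()

print(stacked, separate)
```

Because "and" short-circuits, the later comparisons are never computed
once an earlier check has failed.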

> But this one is at the end more than 2 times slower than :

array1 = numpy.empty([3,6566,6682], dtype=numpy.bool)
numpy.less(aBlueChannel, 1.0, out=array1[0])
numpy.greater(aNirChannel, (aBlueChannel * 1.0), out=array1[1])
numpy.less(aNirChannel, (aBlueChannel * 1.8), out=array1[2])
array = array1.all()

yup -- creating temporaries can be slow for big data -- there is
sometimes a trade-off between compact code and performance.
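
For a rough sense of scale, here is the arithmetic for the array shape
in the post. The dtype of the channels was not given, so the float32
and float64 figures below are assumptions shown side by side; nothing
is actually allocated.

```python
import numpy as np

shape = (6566, 6682)            # shape taken from the posted code
n = shape[0] * shape[1]         # number of pixels

mask_bytes = n * np.dtype(bool).itemsize       # one bool comparison result
stacked_bytes = 3 * mask_bytes                 # the (3, rows, cols) bool array
f32_tmp = n * np.dtype(np.float32).itemsize    # aBlueChannel * 1.8, if float32
f64_tmp = n * np.dtype(np.float64).itemsize    # aBlueChannel * 1.8, if float64

for name, nbytes in [("one bool mask", mask_bytes),
                     ("stacked bool array", stacked_bytes),
                     ("float32 temporary", f32_tmp),
                     ("float64 temporary", f64_tmp)]:
    print(f"{name}: {nbytes / 1e6:.1f} MB")
```

So each comparison produces a temporary of about 44 MB, and the
multiplication by 1.8 produces a float temporary several times larger.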

I think you can be more memory efficient here, though -- if in the end
all you want is the final "all" check, no need to store all checks for
each channel -- something like:

# allocate a single bool array, reused as scratch space for each check:
array1 = numpy.empty((6566, 6682), dtype=numpy.bool)

result = numpy.less(aBlueChannel, 1.0, out=array1).all()
result &= numpy.greater(aNirChannel, (aBlueChannel * 1.0), out=array1).all()
result &= numpy.less(aNirChannel, (aBlueChannel * 1.8), out=array1).all()

three loops for the all(), but less memory to push around -- may be faster.
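
A small runnable sketch of the single-buffer idea, with made-up random
data standing in for the real channels (blue, nir and scratch are
hypothetical names). Every comparison writes into the same 2-D bool
buffer, and the result is checked against the stacked version.

```python
import numpy as np

rng = np.random.default_rng(0)   # made-up test data, not the real imagery
shape = (4, 5)
blue = rng.uniform(0.1, 0.9, size=shape)
nir = blue * rng.uniform(1.1, 1.7, size=shape)

scratch = np.empty(shape, dtype=bool)   # one reusable bool buffer

result = np.less(blue, 1.0, out=scratch).all()
result &= np.greater(nir, blue, out=scratch).all()
result &= np.less(nir, blue * 1.8, out=scratch).all()

# reference: the stacked version from the original post
reference = np.array([blue < 1.0, nir > blue, nir < blue * 1.8]).all()
print(result, reference)
```

Note that blue * 1.8 still allocates a float temporary; only the bool
results are being reused here.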

I'd also take a look at numexpr for this, it could be very helpful:

http://code.google.com/p/numexpr/

-Chris

> (and this solution is about 30% faster than the original one)
>
> I could find another way which was fine for me too:
> array = (aBlueChannel < 1.0) * (aNirChannel > (aBlueChannel * 1.0)) *
> (aNirChannel < (aBlueChannel * 1.8))
>
> But this one is only 5-10% faster than the original solution, even if probably
> using less memory than the 2 previous ones. (same was possible with operator
> +, but slower than operator *)
>
> Regards,
> Matthieu Rigal
>
>
> On Monday 19 March 2012 18:00:02 numpy-discussion-request at scipy.org wrote:
>> Message: 2
>> Date: Mon, 19 Mar 2012 13:20:23 +0000
>> From: Richard Hattersley <rhattersley at gmail.com>
>> Subject: Re: [Numpy-discussion] Using logical function on more than 2
>>         arrays, availability of a "between" function ?
>> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
>> Message-ID:
>>         <CAP=RS9=UBOc6Kmtmnne7W093t19w=T=oSrXUAW0WF8B49hqcXQ at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> What do you mean by "efficient"? Are you trying to get it to execute
>> faster? Or use less memory? Or have more concise source code?
>>
>> Less memory:
>>  - numpy.vectorize would let you get to the end result without any
>> intermediate arrays but will be slow.
>>  - Using the "out" parameter of numpy.logical_and will let you avoid
>> one of the intermediate arrays.
>>
>> More speed?:
>> Perhaps putting all three boolean temporary results into a single
>> boolean array (using the "out" parameter of numpy.greater, etc) and
>> using numpy.all might benefit from logical short-circuiting.
>>
>> And watch out for divide-by-zero from "aNirChannel/aBlueChannel".
>>
>> Regards,
>> Richard Hattersley
>>
>

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov