Hi,

The following gives the wrong answer:

In [2]: A = array(['a','aa','b'])
In [3]: B = array(['d','e'])
In [4]: A.searchsorted(B)
Out[4]: array([3, 0])

The answer should be [3,3]. I've come across this while trying to come up with an ismember function which works for strings (setmember1d seems to assume numerical arrays).

Thanks,
James
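For reference, Python's standard `bisect` module gives the expected left insertion points for this data. This is a quick check that assumes plain Python strings stand in for the array elements:

```python
import bisect

A = ['a', 'aa', 'b']  # already sorted
B = ['d', 'e']

# bisect_left returns the first index i with key <= A[i], or len(A) if there
# is none, which is what searchsorted's default side='left' should report.
expected = [bisect.bisect_left(A, key) for key in B]
print(expected)  # [3, 3]
```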
works fine for me.

In [33]: A = numpy.array(['a','aa','b'])
In [34]: B = numpy.array(['d','e'])
In [35]: A.searchsorted(B)
Out[35]: array([3, 3])
In [36]: numpy.__version__
Out[36]: '1.0.5.dev4567'

L.

_______________________________________________
Numpydiscussion mailing list
Numpydiscussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpydiscussion

--
Lorenzo Bolla
lbolla@gmail.com
http://lorenzobolla.emurse.com/
It also works fine for me (numpy 1.0.4, Gentoo Linux on amd64).

Nadav
Hmmm. I just downloaded and installed 1.0.4 and I'm still getting this error. Are you guys using the bleeding-edge version or the official 1.0.4 tarball from the webpage?

James
Hi,

Just tried with numpy from svn and still get this problem:

>>> import numpy
>>> numpy.__version__
'1.0.5.dev4763'
>>> A = numpy.array(['a','aa','b'])
>>> B = numpy.array(['d','e'])
>>> A.searchsorted(B)
array([3, 0])
I guess this must be a platform-dependent bug. I'm running Python version:

Python 2.5 (r25:51908, Nov 6 2007, 15:55:44)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2

I'm on an Intel Xeon E5345. Any help would be much appreciated.

Thanks,
James
Problem also with Windows P3 binaries.
fwiw,
Alan Isaac

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.0.4'
>>> A = numpy.array(['a','aa','b'])
>>> B = numpy.array(['d','e'])
>>> A.searchsorted(B)
array([3, 0])
From the docstring in multiarraymodule.c:

/** @brief Use bisection of sorted array to find first entries >= keys.
 *
 * For each key use bisection to find the first index i s.t. key <= arr[i].
 * When there is no such index i, set i = len(arr). Return the results in ret.
 * All arrays are assumed contiguous on entry and both arr and key must be of
 * the same comparable type.
 *
 * @param arr contiguous sorted array to be searched.
 * @param key contiguous array of keys.
 * @param ret contiguous array of intp for returned indices.
 * @return void
 */
static void
local_search_left(PyArrayObject *arr, PyArrayObject *key, PyArrayObject *ret)

In particular:

 * All arrays are assumed contiguous on entry and both arr and key must be of
 * the same comparable type.

A and B are not of the same type ('S2' is not 'S1'). This should be mentioned somewhere more accessible.

L.
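The contract in the `local_search_left` docstring quoted above can be restated as a small pure-Python sketch. It is not the NumPy C code: the three-way `compare` callable stands in for the type-specific comparison, the midpoint is the plain `>> 1` one, and the `side` switch is an illustrative addition showing how a right-sided search differs only in treating equal items as "still too small":

```python
from typing import Callable, List, Sequence


def three_way(a, b) -> int:
    """Return <0, 0 or >0, like a C-style compare function."""
    return (a > b) - (a < b)


def search_sorted(arr: Sequence, keys: Sequence, side: str = "left",
                  compare: Callable = three_way) -> List[int]:
    """Bisection search following the local_search_left docstring.

    side="left":  first i with key <= arr[i]  (advance while compare(...) < 0)
    side="right": first i with key <  arr[i]  (advance while compare(...) <= 0)
    When no such i exists the result is len(arr).
    """
    out = []
    for key in keys:
        imin, imax = 0, len(arr)
        while imin < imax:
            imid = imin + ((imax - imin) >> 1)  # plain midpoint
            c = compare(arr[imid], key)
            if c < 0 or (side == "right" and c == 0):
                imin = imid + 1
            else:
                imax = imid
        out.append(imin)
    return out


print(search_sorted(['a', 'aa', 'b'], ['d', 'e']))       # [3, 3]
print(search_sorted([1, 2, 2, 3], [2]))                  # [1]
print(search_sorted([1, 2, 2, 3], [2], side="right"))    # [3]
```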
Hi,

In particular:

* All arrays are assumed contiguous on entry and both arr and key must be of
* the same comparable type.

In which case, this seems to be an overly strict implementation of searchsorted. Surely all that should be required is that the comparison function can take both types.
James
True. The problem is knowing when that is the case. The subroutine in question is at the bottom of the heap and don't know nothin'. IIRC, it just sits there and does the comparison by calling through a pointer with char* arguments. Chuck
What does the comparison function actually look like for the case of dtype='Sn'? Is there no way of sending the underlying types to the comparison, so it can throw an exception if the two data types are not supported? James
There is an upper level routine that parses the keywords and sets things up. There may even be two upper level routines, but I don't recall. The purpose of the two routines you touched was to pull out a small block of code that could be very simple because of its assumptions. Chuck
I can't fathom where the comparison functions exist in the code. It seems that the comparison signature is of the form (void*, void*, PyArrayObject*), so it doesn't seem possible at the moment to specify a compare function which can reason about the underlying types of the two void*'s. However, I think arrays of strings are a common enough use case that they should work as expected. Would it be possible to extend the comparison type to accept two integers specifying the types of the arguments?

James
The problem is due to the use of an older API in type conversion. I think I can provide a fix in a few minutes. The compare function is type-specific and works for strings, but requires the same length string for each argument. It is defined as part of the PyArray_Descr object (defined in arraytypes.inc.src). It may be possible to not require contiguous arrays, but that is a separate issue.

Travis O.
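Why a fixed-width string compare needs both operands to have the same length can be seen with a toy Python model. The assumption is only that the compare reads a single item size's worth of bytes from each pointer; `fixed_width_compare` is a hypothetical name, not the function in arraytypes.inc.src:

```python
def fixed_width_compare(p1: bytes, p2: bytes, elsize: int) -> int:
    """Toy model of a type-specific string compare.

    Reads exactly `elsize` bytes from each operand, as a C function handed two
    char pointers and a single item size would; returns <0, 0 or >0.
    """
    a, b = p1[:elsize], p2[:elsize]
    return (a > b) - (a < b)


# Both items stored 2 bytes wide (NUL-padded): 'aa' correctly sorts after 'a'.
print(fixed_width_compare(b"aa", b"a\x00", 2))   # 1
# Only one byte width available for both operands: 'aa' and 'a' look equal.
print(fixed_width_compare(b"aa", b"a", 1))       # 0
```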
Try out latest SVN. It should have this problem fixed. Look at the diff to see the fix (note that copies are made if the types are not "exactly" the same, meaning the same length in the case of variable-length types).

Travis O.
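A user-level analogue of "make both sides exactly the same type first", assuming a current NumPy, promotes both arrays to one common dtype with `np.promote_types` and `astype` before searching. `searchsorted_common` is a hypothetical helper name, and this is not the SVN change itself:

```python
import numpy as np


def searchsorted_common(a: np.ndarray, v: np.ndarray, side: str = "left") -> np.ndarray:
    """Cast both arrays to a common dtype before searching.

    For fixed-width byte strings np.promote_types picks the wider width, so no
    key is truncated. astype(copy=False) only copies when the dtype changes.
    """
    common = np.promote_types(a.dtype, v.dtype)
    return a.astype(common, copy=False).searchsorted(v.astype(common, copy=False), side=side)


A = np.array([b'a', b'aa', b'b'])   # dtype S2
B = np.array([b'd', b'e'])          # dtype S1
print(searchsorted_common(A, B))    # [3 3]
```

Promoting to the wider type, rather than casting the keys to the searched array's type, avoids silently cutting longer keys down to size.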
Thanks for this. I've realized that for my case, using object arrays is probably best. I still think that long term it would be good to allow comparison functions to take different types, so that one could compare, say, integer arrays with floating-point arrays without doing an upcast.
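The `ismember` idea from the first post, combined with the object-array route, could look roughly like this. The name comes from James's post, but the body is a hypothetical sketch (sort, `searchsorted`, then check for an exact match), assuming a current NumPy:

```python
import numpy as np


def ismember(a, s):
    """Hypothetical helper: True where an element of a also occurs in s.

    Both inputs become object arrays, so elements are compared as ordinary
    Python strings and fixed 'S'/'U' item widths never come into play.
    """
    a = np.asarray(a, dtype=object)
    s = np.sort(np.asarray(s, dtype=object))
    if s.size == 0:
        return np.zeros(a.shape, dtype=bool)
    idx = s.searchsorted(a)                    # left insertion points
    hit = s[np.minimum(idx, s.size - 1)] == a  # clip so indexing stays in range
    return (idx < s.size) & hit


print(ismember(['a', 'c', 'aa', 'zz'], ['aa', 'b', 'a']).tolist())
# [True, False, True, False]
```

In current NumPy, `np.isin` provides this membership test directly.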
lorenzo bolla wrote:
* All arrays are assumed contiguous on entry and both arr and key must be of
* the same comparable type.
A and B are not of the same type ('S2' is not 'S1'). This should be mentioned somewhere more accessible.
It should also raise an exception: something that *sometimes* works, and sometimes doesn't, is really asking for trouble!

Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
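The "raise instead of sometimes working" suggestion can be sketched as a thin wrapper. `strict_searchsorted` is a hypothetical helper, not a NumPy API, and it is deliberately stricter than current NumPy, which handles mismatched string widths itself:

```python
import numpy as np


def strict_searchsorted(a: np.ndarray, v: np.ndarray, side: str = "left") -> np.ndarray:
    """Hypothetical wrapper: refuse to search unless the dtypes match exactly."""
    if a.dtype != v.dtype:
        raise TypeError(f"searchsorted: dtype mismatch {a.dtype} vs {v.dtype}; "
                        "cast one side explicitly first")
    return a.searchsorted(v, side=side)


A = np.array([b'a', b'aa', b'b'])   # S2
B = np.array([b'd', b'e'])          # S1
try:
    strict_searchsorted(A, B)
except TypeError as exc:
    print(exc)                                      # reports the mismatch
print(strict_searchsorted(A, B.astype(A.dtype)))    # [3 3]
```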
Heh. I knew there was a reason I documented that subroutine; I had forgotten about that. Anyway, I think an exception should be thrown in the higher-level routine that sets up the call when there is a type mismatch. Some typecasting might also be appropriate.

Chuck
Well, I've dug around in the source code and here is a patch which makes it work for the case I wanted:

--- multiarraymodule.c.old 2008-01-31 17:42:32.000000000 +0000
+++ multiarraymodule.c 2008-01-31 17:43:43.000000000 +0000
@@ -2967,7 +2967,10 @@
     char *parr = arr->data;
     char *pkey = key->data;
     intp *pret = (intp *)ret->data;
-    int elsize = arr->descr->elsize;
+
+    int elsize1 = arr->descr->elsize;
+    int elsize2 = key->descr->elsize;
+
     intp i;
 
     for(i = 0; i < nkeys; ++i) {
@@ -2975,14 +2978,14 @@
         intp imax = nelts;
         while (imin < imax) {
             intp imid = imin + ((imax - imin) >> 2);
-            if (compare(parr + elsize*imid, pkey, key) < 0)
+            if (compare(parr + elsize1*imid, pkey, key) < 0)
                 imin = imid + 1;
             else
                 imax = imid;
         }
         *pret = imin;
         pret += 1;
-        pkey += elsize;
+        pkey += elsize2;
     }
 }
@@ -3008,7 +3011,10 @@
     char *parr = arr->data;
     char *pkey = key->data;
     intp *pret = (intp *)ret->data;
-    int elsize = arr->descr->elsize;
+
+    int elsize1 = arr->descr->elsize;
+    int elsize2 = key->descr->elsize;
+
     intp i;
 
     for(i = 0; i < nkeys; ++i) {
@@ -3016,14 +3022,14 @@
         intp imax = nelts;
         while (imin < imax) {
             intp imid = imin + ((imax - imin) >> 2);
-            if (compare(parr + elsize*imid, pkey, key) <= 0)
+            if (compare(parr + elsize1*imid, pkey, key) <= 0)
                 imin = imid + 1;
             else
                 imax = imid;
         }
         *pret = imin;
         pret += 1;
-        pkey += elsize;
+        pkey += elsize2;
     }
 }

James
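A toy Python model of what this patch changes, assuming (from the `pkey += elsize` line it replaces) that the old loop advanced through the keys by the searched array's item size rather than the keys' own. `fields` is a hypothetical helper; the model ignores how many bytes the real compare reads and only tracks where each key is taken from:

```python
import bisect


def fields(buf: bytes, width: int, stride: int, count: int) -> list:
    """Slice `count` fixed-width fields out of buf, advancing `stride` bytes each time.

    Trailing NUL padding is stripped, as for NumPy 'S' items. A slice that runs
    past the end of buf just comes back short or empty here; in C the same read
    would touch whatever memory follows the buffer.
    """
    return [buf[i * stride:i * stride + width].rstrip(b"\x00") for i in range(count)]


# A = ['a', 'aa', 'b'] stored as S2 (2 bytes per item), B = ['d', 'e'] as S1.
a_buf, a_size = b"a\x00aab\x00", 2
b_buf, b_size = b"de", 1
items = fields(a_buf, a_size, a_size, 3)   # [b'a', b'aa', b'b']

# Stepping through the keys with the keys' own item size (what the patch does):
good = [bisect.bisect_left(items, k) for k in fields(b_buf, b_size, b_size, 2)]
# Stepping through the keys with the searched array's item size instead:
bad = [bisect.bisect_left(items, k) for k in fields(b_buf, b_size, a_size, 2)]
print(good, bad)  # [3, 3] [3, 0]
```

In this model the second key becomes an empty string and lands at index 0. In C it would be read from memory past the end of B's two-byte buffer, which would be consistent with the platform-dependent, and even session-dependent, results reported in this thread.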
But is that safe? You have changed the stepping to adjust for the element size, but there is no guarantee that the comparison works. Chuck
I use a dev version (1.0.5.dev4567).

L.
No problem for me (also an svn version):

Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
>>> import numpy
>>> A = numpy.array(['a','aa','b'])
>>> B = numpy.array(['d','e'])
>>> A.searchsorted(B)
array([3, 3])
Matthieu
 French PhD student Website : http://matthieubrucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher
I do get the problem with a recent(ish) svn, on OS X 10.5.1, python 2.5.1 (from python.org):

In [76]: A = array(['a','aa','b'])
In [77]: B = array(['d','e'])
In [78]: A.searchsorted(B)
Out[78]: array([3, 0])
In [79]: numpy.__version__
Out[79]: '1.0.5.dev4722'
Hi,

With my system running x86_64 SUSE 10.0 on an AMD Opteron:

Under Python 2.5.1 (r251:54863) and numpy 1.0.4 (download of the released version) I have the same bug.
Under Python 2.4.1 (May 25 2007, 18:41:31) and numpy 1.0.3 I have no problem.

Perhaps a 32/64-bit problem?

Bruce
Oops, it also fails on an SGI Altix with SUSE Linux on it:

Linux pico 2.6.16.27-0.9-default #1 SMP Tue Feb 13 09:35:18 UTC 2007 ia64 ia64 ia64 GNU/Linux

In [33]: A = numpy.array(['a','aa','b'])
In [34]: B = numpy.array(['d','e'])
In [35]: A.searchsorted(B)
Out[35]: array([3, 0])
In [36]: numpy.__version__
Out[36]: '1.0.5.dev4567'

The problem seems to be in the different dtypes of A and B. Using the same dtype 'S2', it works fine:

In [38]: A = numpy.array(['a','aa','b'])
In [39]: A.dtype
Out[39]: dtype('S2')
In [40]: B = numpy.array(['d','e'])
In [41]: A.searchsorted(B)
Out[41]: array([3, 0])
In [42]: B = numpy.array(['d','e'], dtype='S2')
In [43]: A.searchsorted(B)
Out[43]: array([3, 3])

L.
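The workaround above can be reproduced with a current NumPy. Bytes literals are used here because in Python 3 `numpy.array(['a','aa','b'])` produces a Unicode `<U2` array rather than `S2`; the point about item widths is the same. The last lines show a caveat: forcing keys to a narrower width silently truncates them:

```python
import numpy as np

A = np.array([b'a', b'aa', b'b'])
B = np.array([b'd', b'e'])
print(A.dtype.itemsize, B.dtype.itemsize)   # 2 1

# The workaround: build the keys with the same fixed width as A.
B2 = np.array([b'd', b'e'], dtype=A.dtype)
print(A.searchsorted(B2))                   # [3 3]

# Caveat: casting to a narrower string width drops the extra characters.
C = np.array([b'bb']).astype('S1')
print(C.tolist())                           # [b'b']
```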
Hi,

OK, I'm using:

In [6]: numpy.__version__
Out[6]: '1.0.3'

Should I try the development version? Which version of numpy would people generally recommend?

James
On Thursday, 31 January 2008 15:35:25, James Philbin wrote:
The following gives the wrong answer:
In [2]: A = array(['a','aa','b'])
In [3]: B = array(['d','e'])
In [4]: A.searchsorted(B) Out[4]: array([3, 0])
The answer should be [3,3].

Heh, I got both answers in the same session (not 100% reproducible, but several times already)!
In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop.
:>>> import numpy
:>>> A = numpy.array(['a','aa','b'])
:>>> B = numpy.array(['d','e'])
:>>> print A.searchsorted(B)
:>>> print numpy.__version__
:--
[3 3]
1.0.5.dev4420

In [2]:
In [3]: print A.searchsorted(B)
[3 3]
In [4]: print A.searchsorted(B)
[3 3]
In [5]: print A.searchsorted(B)
[3 3]
In [6]: %cpaste
Pasting code; enter '--' alone on the line to stop.
:>>> import numpy
:>>> A = numpy.array(['a','aa','b'])
:>>> B = numpy.array(['d','e'])
:>>> print A.searchsorted(B)
:>>> print numpy.__version__
:--
[3 0]
1.0.5.dev4420

--
Ciao,
Hans
participants (11)

Alan G Isaac
Bruce Southey
Charles R Harris
Christopher Barker
Hans Meine
James Philbin
lorenzo bolla
Matthieu Brucher
Nadav Horesh
Robin
Travis E. Oliphant