I have a set of limits, e.g. array([0, 35, 45, 55, 75]), and I want to use these to "classify" a set of numbers (another one-dimensional array). The "class" is the number of the first limit that is lower than or equal to the number I want to classify. E.g. I'd classify 17 as 0 and 42 as 1.

My current approach is:

    sum(nums[:,NewAxis] >= lims, dim=-1)

It seems a bit unnecessary to compare each number with all the limits when O(log(n)) would suffice (the limits are ordered); or even with O(n) running time, a smarter implementation could get an average of n/2 comparisons...

Suggestions?

-- 
Magnus Lie Hetland
http://hetland.org
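As a point of reference, here is a plain-Python sketch (no numarray needed) of what the broadcast expression computes: for each number, the count of limits it is greater than or equal to, at a cost of one comparison per limit. With these limits the count comes out one higher than the class numbers in the examples, so the final `- 1` step and the helper name `count_limits_below` are illustrative assumptions, not part of the original post.

```python
def count_limits_below(lims, nums):
    """Brute force: for each x, count the limits <= x (one comparison per limit).

    Mirrors what sum(nums[:, NewAxis] >= lims, dim=-1) computes, element by element.
    """
    return [sum(1 for lim in lims if x >= lim) for x in nums]


lims = [0, 35, 45, 55, 75]
counts = count_limits_below(lims, [17, 42])
print(counts)                   # [1, 2]
print([c - 1 for c in counts])  # [0, 1], the class numbers given in the text
```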
Magnus Lie Hetland wrote:
I have a set of limits, e.g. array([0, 35, 45, 55, 75]) and I want to use these to "classify" a set of numbers (another one-dimensional array). The "class" is the number of the first limit that is lower than or equal to the number I want to classify. E.g. I'd classify 17 as 0 and 42 as 1.
My current approach is:
sum(nums[:,NewAxis] >= lims, dim=-1)
It seems a bit unnecessary to compare each number with all the limits when O(log(n)) would suffice (the limits are ordered); or even with O(n) running time, a smarter implementation could get an average of n/2 comparisons...
Suggestions?
Try searchsorted(). Searchsorted returns the index of the first bin >= the number being classified and has O(log(n)) running time.
    >>> a = numarray.array([0, 35, 45, 55, 75])
    >>> numarray.searchsorted(a, [1, 42, 35])
    array([1, 2, 1])
Todd
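The same O(log n) lookup is also available in Python's standard `bisect` module. A minimal sketch, assuming `lims` is sorted ascending and every number is at least `lims[0]`: `bisect_left` returns the leftmost insertion point, which reproduces the `searchsorted` session above (35 maps to 1 because it ties with a limit), while `bisect_right` counts the limits that are <= x, so subtracting one gives the class numbers from the question even when a number equals a limit. `classify` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right


def classify(lims, nums):
    """Index of the last limit <= x for each x, via binary search.

    Assumes lims is sorted ascending and every x >= lims[0];
    each lookup costs O(log len(lims)) comparisons.
    """
    return [bisect_right(lims, x) - 1 for x in nums]


lims = [0, 35, 45, 55, 75]
print([bisect_left(lims, x) for x in [1, 42, 35]])  # [1, 2, 1]
print(classify(lims, [17, 42, 35, 80]))             # [0, 1, 1, 4]
```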
Todd Miller <jmiller@stsci.edu>: [snip]
Try searchsorted().
Ah! <slaps forehead> Thanks! That's exactly what I was looking for; I think I've seen it before, but somehow I missed it when looking for it...

Thanks again,
Magnus
Hi folks,

I use Numeric and wxPython together a lot (of course I do, I use Numeric for everything!).

Unfortunately, since wxPython is not Numeric-aware, you lose some real potential performance advantages. For example, I'm now working on expanding the extensions to graphics device contexts (DCs) so that you can draw a whole bunch of objects with a single Python call. The idea is that the looping can be done in C++ rather than Python, saving a lot of the overhead of the loop itself, as well as the Python-to-wxWindows translation step.

For drawing thousands of points, the speed-up is substantial. It's less substantial for more complex objects (rectangles give a factor-of-two improvement for ~1000 objects), because the time it takes to draw the object itself outweighs the time it takes to make the call.

Anyway, at the moment, Robin Dunn has the wrappers set up so that you can pass in a NumPy array (or, indeed, any sequence) rather than a list or tuple of coordinates, but it is faster to use a list than a NumPy array, because for arrays the wrappers use the generic PySequence_GetItem call. If we used the NumPy API directly, it should be faster than using a list, not slower! This is how a representative section of the code looks now:

    bool isFastSeq = PyList_Check(pyPoints) || PyTuple_Check(pyPoints);
    ...
    // Get the point coordinates
    if (isFastSeq) {
        obj = PySequence_Fast_GET_ITEM(pyPoints, i);
    }
    else {
        obj = PySequence_GetItem(pyPoints, i);
    }
    ...

So you can see that if a NumPy array is passed in, PySequence_GetItem will be used. What I would like to do is have an isNumPyArray check, and then access the NumPy array directly in that case.

The tricky part is that Robin does not want to have wxPython require Numeric. (Oh how I dream of the day that NumArray becomes part of the standard library!) How can I check if an object is a NumPy array (and then use it as such), without including Numeric during compilation?

I know one option is to have conditional compilation, with a NumPy and a non-NumPy version, but Robin is managing a whole lot of different versions as it is, and I don't think he wants to deal with twice as many!

Anyone have any ideas?

By the way, you can substitute NumArray for NumPy in this, as it is the wave of the future, particularly if that would be easier.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
If you could do

    try:
        import Numeric
        haveNumeric = 1
    except ImportError:
        haveNumeric = 0

in some initialization routine, then you could use this flag. Alternately, you could test on the fly (sys.modules is a dict keyed by module name):

    'Numeric' in sys.modules
-----Original Message-----
From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net] On Behalf Of Chris Barker
Sent: Wednesday, January 15, 2003 9:22 AM
Cc: Numpy-discussion
Subject: [Numpy-discussion] Optionally using Numeric in another compiled extension package.
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Paul F Dubois wrote:
If you could do

    try:
        import Numeric
        haveNumeric = 1
    except ImportError:
        haveNumeric = 0

in some initialization routine, then you could use this flag. Alternately, you could test on the fly (sys.modules is a dict keyed by module name):

    'Numeric' in sys.modules
Thanks, but I'm talking about doing this at the C++ level in an extension package, not at the Python level. This kind of thing is so much easier in Python, of course!

-Chris
On Wednesday, Jan 15, 2003, at 19:01 Europe/Amsterdam, Chris Barker wrote:
Thanks, but I'm talking about doing this at the C++ level in an extension package, not at the Python level. This kind of thing is Soo much easier in Python, of course!
This can be done, but it is difficult, and you need the cooperation of both parties (Numeric and wxPython, in this case). The problem is that you need a way to pass C pointers from one extension module to the other. One of the pointers you want to pass is the PyTypeObject, so you can check that an object passed in from Python is of the correct type. Another is the address of some C routine that will get you a C pointer to the data. The first one may be visible from Python (so you can get at it through normal means), but the second one won't be.

The dirty way to do this (and you should probably avoid it) is to put these pointers into Python integers in the supplying module, and put them in the module namespace with a funny name (__ConvertToCPointerAddress). In wxPython you import Numeric, and if that succeeds you look up the funny name, convert the Python integer to a C pointer, cross your fingers, and call the address.

A cleaner way to do this is with cobject objects. These are in the core, in Objects/cobject.c. Numeric exports a cobject (again named __ConvertToCPointerAddress) with the address of the routine as the value. But, and this is the nice bit, cobjects can be passed along by Python code but can't be fiddled with. And cobject.c even provides a C function, PyCObject_Import(char *modulename, char *attributename), which directly returns the pointer you're looking for by importing the module, looking up the name, checking that it's a cobject, and extracting the value.

And it even has support for "protocols": cobjects have an extra field called the description, again only settable and readable from C. Modules that don't know about each other's existence could still agree on a common description that signifies that the pointer in the cobject has a specific meaning.
We could decide here that a description equal to the C string "this pointer is a function that you pass one Python object and that returns the data just as Numeric would store it" fits that bill, and then anyone in the world writing an extension module could follow the protocol.

-- 
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
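In current CPython the cobject type no longer exists: it was superseded by PyCapsule (Python 2.7 and 3.1) and removed in 3.2, and a capsule's name plays the role of the description string. The sketch below shows the protocol idea using the capsule C-API, driven from Python through `ctypes.pythonapi` so it runs without a compiler; a real extension would call `PyCapsule_New` and `PyCapsule_Import` from C instead. The protocol name `example.array_data_pointer` and the helpers `export_pointer`/`import_pointer` are hypothetical.

```python
import ctypes

# Hypothetical protocol name agreed on by supplier and consumer. A capsule keeps
# a pointer to this string rather than a copy, so it must outlive the capsule;
# a module-level bytes constant does.
PROTOCOL = b"example.array_data_pointer"

_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_GetPointer.restype = ctypes.c_void_p
_api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def export_pointer(address):
    """Supplier side: wrap a raw C address in an opaque capsule (no destructor)."""
    return _api.PyCapsule_New(address, PROTOCOL, None)


def import_pointer(capsule, name=PROTOCOL):
    """Consumer side: recover the address; raises ValueError if the name differs."""
    return _api.PyCapsule_GetPointer(capsule, name)


# Usage: export the address of a small C array of doubles and read it back.
data = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
capsule = export_pointer(ctypes.addressof(data))
view = (ctypes.c_double * 3).from_address(import_pointer(capsule))
print(list(view))  # [1.0, 2.0, 3.0]
```

Python code can pass `capsule` around but cannot read or change the pointer, and a consumer that asks for a different name gets an error instead of a wrong pointer.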
Chris Barker wrote:
Anyone have any ideas?
Use the Python C-API and string literals as the basis for the interface. I think the steps are something like this:

1. Import "Numeric". (PyImport_ImportModule)
2. Get the module dictionary. (PyModule_GetDict)
3. Get "array" out of the dictionary. (PyDict_GetItemString)
4. Call "isinstance" on Numeric.array and the object. (PyObject_IsInstance)

Similarly:

1. Import "numarray".
2. Get the module dictionary.
3. Get "NumArray" out of the dictionary.
4. Call the C-API equivalent of "isinstance" on numarray.NumArray and the object.

The first 3 steps of both cases can be initialized once, I think, and stored in C static variables to avoid repeated fetches. If any of the first 3 steps fails, consider that case failed and return False. If it's not a Numeric array, check to see if it's a numarray.
By the way, you can substitute NumArray for NumPy in this, as it is the wave of the future, and particularly if it would be easier.
-Chris
Todd
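Todd's steps can be mirrored at the Python level to make the control flow concrete. This is a sketch, not C code: `importlib.import_module` stands in for `PyImport_ImportModule`, `vars(module).get(...)` for `PyModule_GetDict` plus `PyDict_GetItemString`, `isinstance` for `PyObject_IsInstance`, and `functools.lru_cache` plays the role of the C static variables. The helper names are hypothetical. One assumption worth flagging: `isinstance` needs a type, so the guard returns False if the looked-up attribute is something else (for example a factory function rather than the array type itself); which attribute actually names Numeric's array type should be checked against its documentation.

```python
import functools
import importlib
from fractions import Fraction


@functools.lru_cache(maxsize=None)
def _lookup(module_name, attr_name):
    """Steps 1-3, done once per (module, attribute) pair and then cached.

    Returns None if the module can't be imported or lacks the attribute.
    """
    try:
        module = importlib.import_module(module_name)  # ~ PyImport_ImportModule
    except ImportError:
        return None
    return vars(module).get(attr_name)  # ~ PyModule_GetDict + PyDict_GetItemString


def is_instance_of(module_name, attr_name, obj):
    """Step 4: isinstance(obj, module.attr), or False if any earlier step failed."""
    cls = _lookup(module_name, attr_name)
    if not isinstance(cls, type):  # None, or an attribute that isn't a type
        return False
    return isinstance(obj, cls)  # ~ PyObject_IsInstance


# Two separate recognizers; the module/attribute names are taken from the post.
def is_numeric_array(obj):
    return is_instance_of("Numeric", "array", obj)


def is_numarray(obj):
    return is_instance_of("numarray", "NumArray", obj)


# Usage, with a stdlib class standing in for an array type:
print(is_instance_of("fractions", "Fraction", Fraction(1, 3)))  # True
print(is_numarray([1, 2, 3]))  # False
```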
Todd Miller wrote:
Chris Barker wrote:
How can I check if an Object is a NumPy array (and then use it as such), without including Numeric during compilation?
I know one option is to have condition compilation, with a NumPy and non-Numpy version, but Robin is managing a whole lot of different version as it is, and I don't think he wants to deal with twice as many!
Anyone have any ideas?
Use the Python C-API and string literals as the basis for the interface. I think the steps are something like this:
1. Import "Numeric". (PyImport_ImportModule)
2. Get the module dictionary. (PyModule_GetDict)
3. Get "array" out of the dictionary. (PyDict_GetItemString)
4. Call "isinstance" on Numeric.array and the object. (PyObject_IsInstance)
Similarly:
1. Import "numarray".
2. Get the module dictionary.
3. Get "NumArray" out of the dictionary.
4. Call the C-API equivalent of "isinstance" on numarray.NumArray and the object.
The first 3 steps of both cases can be initialized once, I think, and stored in C static variables to avoid repeated fetches.
On second thought, just do two functions, one for Numeric, one for numarray. If any of the first 3 steps fail, return False. Otherwise, return the result of the isinstance call.
If it's not a Numeric array, check to see if it's a numarray.
My idea to couple these was "not good". They're not compatible at that level anyway.

Since numarray and Numeric are only source-level compatible, C code can be compiled to work with one or the other, but not both at the same time. It probably makes more sense to just implement for Numeric. If you do want to implement for both, treat them as separate cases with separate recognizer functions and element access code.

But... it's not clear to me that knowing an object is an array will help, since getting data elements still has to be done fast, and that seems hard to do without knowing the arrayobject struct. Keep in mind that Numeric and numarray arrays are strided and possibly discontiguous, so there's more to data access than owning a base pointer, as would be the case for a plain C array.

Todd
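To illustrate why a base pointer alone is not enough: in a strided array the byte offset of element (i0, i1, ...) is offset + i0*stride0 + i1*stride1 + ..., and views such as a transpose or a column slice share the same buffer with different shapes and strides. This is a generic sketch using only `struct`, with hypothetical helper names; it does not use the Numeric or numarray structs.

```python
import struct
from itertools import product


def element_offset(index, strides, offset=0):
    """Byte offset of element `index`: offset + sum(i_k * stride_k)."""
    return offset + sum(i * s for i, s in zip(index, strides))


def read_strided(buf, shape, strides, fmt="d", offset=0):
    """Yield elements in C (row-major) index order from a strided buffer."""
    for index in product(*(range(n) for n in shape)):
        yield struct.unpack_from(fmt, buf, element_offset(index, strides, offset))[0]


# A 2x3 C-contiguous block of doubles (8 bytes each): rows [0, 1, 2] and [3, 4, 5].
buf = struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
print(list(read_strided(buf, (2, 3), (24, 8))))   # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# The transpose shares the buffer; only shape and strides change.
print(list(read_strided(buf, (3, 2), (8, 24))))   # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
# Every other column, a discontiguous view (like a[:, ::2]).
print(list(read_strided(buf, (2, 2), (24, 16))))  # [0.0, 2.0, 3.0, 5.0]
```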
On Wednesday 15 January 2003 21:16, Todd Miller wrote:
My idea to couple these was "not good". They're not compatible at that level anyway.
Since numarray and Numeric are only source-level compatible, C-code can be compiled to work with one or the other, but not both at the same time. It probably makes more sense to just implement for Numeric. If you do want to implement for both, treat them as separate cases with separate recognizer functions and element access code.
But... It's not clear to me that knowing an object is an array will help since getting data elements still has to be done fast, and that seems hard to do without knowing the arrayobject struct. Keep in mind that Numeric and numarray arrays are strided and possibly discontiguous, so there's more to data access than owning a base pointer, as would be the case in C.
I think you can use the numarray High-Level C API to overcome these difficulties. For example, by using the calls:

    PyArrayObject* NA_InputArray(PyObject *numarray, NumarrayType t, int requires)
    PyArrayObject* NA_OutputArray(PyObject *numarray, NumarrayType t, int requires)
    PyArrayObject* NA_IoArray(PyObject *numarray, NumarrayType t, int requires)

as documented in the User's Guide, you can get well-behaved (i.e. contiguous and well-aligned) C arrays (copying them, if needed) from both numarray and Numeric arrays if you pass C_ARRAY as the value of the requires parameter. In fact, I'm using NA_InputArray in PyTables to manage both numarray and Numeric arrays with good results.

-- 
Francesc Alted
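Python's own buffer protocol has a similar "copy only if needed" idea, which may help picture what asking for a well-behaved C array buys you. This sketch uses `memoryview` from the standard library, not the numarray API, and `ensure_c_contiguous` is a hypothetical helper: it returns a C-contiguous view untouched and otherwise gathers the elements, in logical order, into a fresh contiguous buffer. Alignment, the other half of "well-behaved", is not checked here.

```python
from array import array


def ensure_c_contiguous(view):
    """Return `view` unchanged if it is C-contiguous, otherwise a contiguous copy.

    tobytes() gathers the elements in logical order, so the copy has unit
    strides; cast() restores the original element format and shape.
    """
    if view.c_contiguous:
        return view
    return memoryview(view.tobytes()).cast(view.format, list(view.shape))


full = memoryview(array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]))
every_other = full[::2]                               # strided, discontiguous view
print(every_other.c_contiguous)                       # False
print(ensure_c_contiguous(every_other).tolist())      # [0.0, 2.0, 4.0]
print(ensure_c_contiguous(full) is full)              # True: no copy needed
```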
Francesc Alted wrote:
On Wednesday 15 January 2003 21:16, Todd Miller wrote:
But... It's not clear to me that knowing an object is an array will help since getting data elements still has to be done fast, and that seems hard to do without knowing the arrayobject struct. Keep in mind that Numeric and numarray arrays are strided and possibly discontiguous, so there's more to data access than owning a base pointer, as would be the case in C.
I think you can use the numarray High-Level C API to overcome these difficulties.
<snip> But doesn't using the numarray C-API require a level of coupling (direct knowledge of numarray during compilation) that Chris is trying to avoid?
Todd
On Wednesday 15 January 2003 21:54, Todd Miller wrote:
I think you can use the numarray High-Level C API to overcome these difficulties.
But doesn't using the numarray C-API require a level of coupling (direct knowledge of numarray during compilation) that Chris is trying to avoid?
Ooops! You are right. Perhaps this kind of scenario (accessing Numeric and numarray arrays from C) will become more and more common as people become more aware of numarray's capabilities and want to integrate it into their extensions. That reinforces my belief that having a small core with the "glue" functionality between numarray objects and 3rd-party extensions in C (or SWIG, Pyrex or whatever) could be a good thing (until numarray is in the Standard Library). That way, people interested in supporting numarray objects in their extensions would only have to install this small core (or even include it as part of the extension).

Well, speaking as a non-interested and impartial person ;-)

-- 
Francesc Alted
Francesc Alted wrote:
that having a small core with the "glue" functionality between numarray objects and 3rd party extensions in C (or SWIG, Pyrex or whatever) can be a good thing (until numarray is in the Standard Library).
That way, people interested in supporting numarray objects in their extensions would only have to install this small core (or even include it as part of the extension).
I think that's a fabulous idea, but I have no idea how hard it would be. There would still be the problem of keeping versions in sync. If I distributed my package with the glue code, it would only work on installations using the same version of Numeric (or NumArray, I suppose).

Thanks to all who have commented on my post. These are some ideas I now have based on your comments:
Use the Python C-API and string literals as the basis for the interface. I think the steps are something like this:
1. Import "Numeric". (PyImport_ImportModule)
2. Get the module dictionary. (PyModule_GetDict)
3. Get "array" out of the dictionary. (PyDict_GetItemString)
4. Call "isinstance" on Numeric.array and the object. (PyObject_IsInstance)
OK, so now I can know, at runtime, whether Numeric has been imported.
But... It's not clear to me that knowing an object is an array will help since getting data elements still has to be done fast, and that seems hard to do without knowing the arrayobject struct.
Exactly. That's my whole problem. However, I have an idea about this. If I do the above test, I can put all the Numeric-specific code into a conditional, so it would only get called if Numeric were imported. My idea is that I could make sure Numeric was around at compile time, so I could use all the Numeric API to access the array data, but it wouldn't have to be installed at runtime, as none of the Numeric calls would be executed if Numeric hadn't been imported. Would this work, or would the system try to load the .dll or .so or whatever even if the calls weren't executed?

All that being said, Tim Hochberg has mentioned that when he first made wxPython DCs work with Numeric arrays (sorry I didn't give him credit before, I had forgotten who did that; thanks, Tim), he did some timing and discovered that the overhead of the drawing calls was substantially larger than the overhead of the indexing anyway, so speeding up that process couldn't make much difference.

My timing indicated something different, but I'm using Linux/wxGTK/X11, and I think the drawing calls return after the message has been sent to X, but X may not have completed the actual drawing yet. This means that I'm not timing the whole process, and if I did, I might not see such a difference. I did some tests with 100,000 points and found that I could see the difference between a list and an array: the list was about twice as fast. With rectangles, however, I can't see the difference.

So I think I'll probably shelve this for the moment and concentrate on getting all the drawing shapes supported by DrawXXXList methods.

Thanks for all your input.

-Chris
On Wed, 15 Jan 2003, Chris Barker wrote: [...]
My idea is that I could make sure Numeric was around at compile time, so I could use all the Numeric API to access the array data, but it wouldn't have to be installed at runtime, as none of the Numeric calls would be executed if Numeric hadn't been imported. Would this work, or would the system try to load the .dll or .so or whatever even if the calls weren't executed?
One way is to import a dynamic library, explicitly, which has glue code to handle the array objects when you need them. [...]
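At the Python level, Peter's suggestion resembles loading an optional glue module only when it is first needed and falling back to a generic path when it is absent. The analogy to the C question is loose: this sketch shows the control flow (try the glue module once, remember the outcome, fall back), not how the platform's dynamic loader resolves symbols. The module name `_hypothetical_array_glue` and its `point_pairs` function are hypothetical.

```python
import functools
import importlib


@functools.lru_cache(maxsize=None)
def load_optional(name):
    """Import `name` the first time it is requested; return None if it isn't installed.

    The outcome is cached, so the import is attempted at most once per name.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def point_pairs(points, glue_name="_hypothetical_array_glue"):
    """Use the glue module's fast path when available, else a generic loop."""
    glue = load_optional(glue_name)
    if glue is not None:
        return glue.point_pairs(points)  # hypothetical fast-path function
    return [(float(x), float(y)) for x, y in points]  # generic per-item path


print(point_pairs([(1, 2), [3, 4]]))
```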
My timing indicated something different, but I'm using Linux/wxGTK/X11, and I think the drawing calls return after the message has been sent to X, but X may not have completed the actual drawing yet.
That's right. X's communication model between client and server is asynchronous.
This means that I'm not timing the whole process, and if I did, I might not see such a difference.
You can synchronise the output buffer using XSync(3) and then do the timing. Peter
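The timing point generalizes: with an asynchronous display server, the clock should stop only after the output has been flushed and processed. A minimal sketch of such a harness; `draw` and `sync` are caller-supplied callables (hypothetical here), and `sync` would be whatever forces the server round trip in your toolkit, for example something that ends up calling XSync.

```python
import time


def time_with_sync(draw, sync, repeat=5):
    """Best wall-clock time, in seconds, for draw() followed by sync().

    draw() may return before an asynchronous server has finished the work,
    so sync() is kept inside the timed region.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        draw()
        sync()
        best = min(best, time.perf_counter() - start)
    return best


# Usage with stand-in callables that just record the call order:
calls = []
elapsed = time_with_sync(lambda: calls.append("draw"), lambda: calls.append("sync"), repeat=2)
print(calls)  # ['draw', 'sync', 'draw', 'sync']
```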
peter.chang@nottingham.ac.uk wrote:
You can synchronise the output buffer using XSync(3) and then do the timing.
I'd love to try this, but I confess I have no idea how! I'm working with the *.i files that tell SWIG what to add when creating wrappers around wxWindows for Python. wxWindows is using wxGTK, which is using GTK, which is using Xlib (I think), so I'm pretty far away from X, and I barely know enough C/C++ to attempt this. I suppose I could try including Xlib, then calling XSync, but I need to pass a reference to a display. I have no idea how to get that.

Any hints?

-Chris
On Thu, 16 Jan 2003, Chris Barker wrote:
peter.chang@nottingham.ac.uk wrote:
You can synchronise the output buffer using XSync(3) and then do the timing.
Oops, that should be XSynchronize(3). [...]
I suppose I could try including Xlib, then calling XSync, but I need to pass a reference to a display. I have no idea how to get that.
Any hints?
wxGetDisplayName() gives the Display name but not a pointer to the display structure. So this is not much help. In gtk+, any program can be called with --sync to aid debugging. I'd guess wxWindows may allow you to do the same. Peter
Participants (7):

- Chris Barker
- Francesc Alted
- Jack Jansen
- Magnus Lie Hetland
- Paul F Dubois
- peter.chang@nottingham.ac.uk
- Todd Miller