Slow performance in array protocol with string arrays

Hi,

Perhaps this is not very important because it only has an effect at high dimensionality, but I think it is worth sending here for the record. It seems that the numarray implementation of the array protocol for string arrays is very slow for dimensionality > 10:

In [258]: a = scicore.reshape(scicore.array((1,)), (1,)*15)

In [259]: a
Out[259]: array([[[[[[[[[[[[[[[1]]]]]]]]]]]]]]])

In [260]: t1 = time(); c = numarray.array(a); print time()-t1
0.000355958938599    # numerical conversion is pretty fast: 0.3 ms

In [261]: b = scicore.array(a, dtype="S1")

In [262]: b
Out[262]: array([[[[[[[[[[[[[[[1]]]]]]]]]]]]]]], dtype=(string,1))

In [263]: t1 = time(); c = numarray.strings.array(b); print time()-t1
0.61981511116    # string conversion is more than 1000x slower

In [264]: t1 = time(); d = scicore.array(c); print time()-t1
0.000162839889526    # scipy_core speed seems normal

In [266]: t1 = time(); d = numarray.strings.array(c); print time()-t1
1.38820910454    # converting numarray strings into themselves is the slowest!

Using numarray 1.5.0 and scipy_core 0.9.2.1763.

Cheers,
--
Francesc Altet | Cárabos Coop. V. | http://www.carabos.com/ | "Enjoy Data"

Francesc Altet wrote:
Hi,
Perhaps this is not very important because it only has an effect at high dimensionality, but I think it is worth sending here for the record.
It seems that the numarray implementation of the array protocol for string arrays is very slow for dimensionality > 10:

In [258]: a = scicore.reshape(scicore.array((1,)), (1,)*15)

In [259]: a
Out[259]: array([[[[[[[[[[[[[[[1]]]]]]]]]]]]]]])

In [260]: t1 = time(); c = numarray.array(a); print time()-t1
0.000355958938599    # numerical conversion is pretty fast: 0.3 ms

In [261]: b = scicore.array(a, dtype="S1")

In [262]: b
Out[262]: array([[[[[[[[[[[[[[[1]]]]]]]]]]]]]]], dtype=(string,1))

In [263]: t1 = time(); c = numarray.strings.array(b); print time()-t1
0.61981511116    # string conversion is more than 1000x slower

In [264]: t1 = time(); d = scicore.array(c); print time()-t1
0.000162839889526    # scipy_core speed seems normal

In [266]: t1 = time(); d = numarray.strings.array(c); print time()-t1
1.38820910454    # converting numarray strings into themselves is the slowest!
Using numarray 1.5.0 and scipy_core 0.9.2.1763.
Cheers
I logged this on SourceForge with the growing collection of numarray.strings issues. For now, strings.array() isn't taking advantage of the new array protocol and is implemented largely in Python.

Todd
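For context, a minimal sketch of what the array protocol exposes to an importer, using today's NumPy as a stand-in for scicore (numarray itself no longer runs on modern systems, so the consumer behavior described in the comments illustrates the idea rather than a real numarray code path):

import numpy as np

# The exporter publishes flat metadata once, whatever the rank:
a = np.zeros((1,) * 15, dtype="S1")    # the rank-15 string array from the report
iface = a.__array_interface__
print(iface["shape"])      # (1, 1, ..., 1) -- fifteen axes, but just a tuple
print(iface["typestr"])    # '|S1'
ptr, readonly = iface["data"]          # address of one flat buffer

# A consumer that honors the protocol (as np.asarray does) copies that
# flat buffer once, so the cost is independent of rank. A consumer that
# instead recurses through nested sequences in Python pays a Python-level
# call per axis -- the behavior Francesc measured in numarray.strings.array().
b = np.asarray(a)
assert b.shape == (1,) * 15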

Francesc Altet wrote:
It seems that the numarray implementation of the array protocol for string arrays is very slow for dimensionality > 10:
OK, I'll bite -- what in the world do you need an array of strings with dimensionality > 10 for?

Which brings up another curiosity: I'm all in favor of not having arbitrary limits on anything, but I'm curious what the largest-rank NumPy array anyone has ever had a real use for is. I don't think I've ever used rank > 3, or maybe 4. Anyone have a use case for a very large rank array?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer, NOAA/OR&R/HAZMAT
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice | (206) 526-6329 fax | (206) 526-6317 main reception
Chris.Barker@noaa.gov

Which brings up another curiosity: I'm all in favor of not having arbitrary limits on anything, but I'm curious what the largest-rank NumPy array anyone has ever had a real use for is. I don't think I've ever used rank > 3, or maybe 4.
Anyone have a use case for a very large rank array?
Depends on your definition of "very". In neuroimaging at least, rank 4 is a standard dataset "unit" (3D+time). If you then include subjects, replications (same day), and sessions (i.e., testing on different days), that's rank=7. Can't say as I've ever reached 10 though. ;-)

-best
Gary

--
Gary Strangman, PhD | Director, Neural Systems Group
Massachusetts General Hospital
149 13th Street, Ste 10018, Charlestown, MA 02129
Office: 617-724-0662 | Fax: 617-726-4078
strang@nmr.mgh.harvard.edu | http://www.nmr.mgh.harvard.edu/NSG/
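For concreteness, that layout might look like the following (a sketch with toy sizes and modern NumPy as a stand-in; the axis order is just one possible convention):

import numpy as np

# Hypothetical study layout: x, y, z, time, subject, replication, session
scans = np.zeros((8, 8, 8, 10, 12, 2, 3), dtype=np.float32)
print(scans.ndim)      # 7

# The rank-4 "unit" (3D + time) for subject 0, replication 1, session 2:
unit = scans[..., 0, 1, 2]
print(unit.shape)      # (8, 8, 8, 10)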

I fixed a performance bug in numarray.strings so the lion's share of this problem is now gone in CVS:

scipy -> numarray Int32     0.000634908676147
scipy -> numarray S1        0.000502109527588
numarray -> scipy S1        0.000125885009766
numarray -> numarray S1     0.00110602378845

Things could be further improved by adding "import" support for the newcore array protocol to numarray.strings.

Todd

Gary Strangman wrote:
Which brings up another curiosity: I'm all in favor of not having arbitrary limits on anything, but I'm curious what the largest rank NumPy array anyone has ever had a real use for is? I don't think I've ever used rank > 3, or maybe 4.
Anyone have a use case for a very large rank array?
Depends on your definition of "very". In neuroimaging at least, rank 4 is a standard dataset "unit" (3D+time). If you then include subjects, replications (same day), and sessions (i.e., testing on different days), that's rank=7. Can't say as I've ever reached 10 though. ;-)
-best Gary
--
Gary Strangman, PhD | Director, Neural Systems Group
Massachusetts General Hospital
149 13th Street, Ste 10018, Charlestown, MA 02129
Office: 617-724-0662 | Fax: 617-726-4078
strang@nmr.mgh.harvard.edu | http://www.nmr.mgh.harvard.edu/NSG/

Good! Thanks, Todd.

On Wednesday 04 January 2006 19:38, Todd Miller wrote:
I fixed a performance bug in numarray.strings so the lion's share of this problem is now gone in CVS:
scipy -> numarray Int32     0.000634908676147
scipy -> numarray S1        0.000502109527588
numarray -> scipy S1        0.000125885009766
numarray -> numarray S1     0.00110602378845
Things could be further improved by adding "import" support for the newcore array protocol to numarray.strings.
Todd
--
Francesc Altet | Cárabos Coop. V. | http://www.carabos.com/ | "Enjoy Data"
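For the record, a minimal sketch of the kind of harness behind timings like those quoted above, using time.time() as in the original posts. numarray no longer installs on modern systems, so today's NumPy stands in for both packages here, and bench() is a helper invented for this sketch:

import time
import numpy as np

def bench(label, convert, src, repeats=10):
    """Print the average wall-clock time of one conversion."""
    t0 = time.time()
    for _ in range(repeats):
        convert(src)
    print("%-26s %g s" % (label, (time.time() - t0) / repeats))

a = np.zeros((1,) * 15, dtype=np.int32)   # the rank-15 arrays from the report
b = np.zeros((1,) * 15, dtype="S1")
bench("numeric conversion", np.asarray, a)
bench("string conversion", np.asarray, b)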

On Wednesday 04 January 2006 18:27, Christopher Barker wrote:
Francesc Altet wrote:
It seems that the numarray implementation of the array protocol for string arrays is very slow for dimensionality > 10:
OK, I'll bite -- what in the world do you need an array of strings with dimensionality > 10 for?
Good question. The fact is that Numeric, numarray, and scipy_core have all been designed to support objects of up to 32 (and perhaps 40, in some cases) dimensions. Why? I must confess that I don't exactly know, but when your mission is to check every bit of the implementation and push your package to its limits, you may encounter very funny things that will probably never be a problem for real users.

Somehow, these kinds of issues with high dimensionalities are sometimes useful for isolating other potential problems. So, yeah, one can say that they are usually a waste of time, but sometimes they can save your life (well, kind of ;-).

--
Francesc Altet | Cárabos Coop. V. | http://www.carabos.com/ | "Enjoy Data"
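As an aside, this kind of limit-pushing test is easy to sketch. A hypothetical probe, using modern NumPy (where the ceiling is 32 on 1.x releases and version-dependent elsewhere; numarray and scipy_core of this era had their own limits of 32 or 40):

import numpy as np

# Keep prepending length-1 axes until the constructor refuses;
# the last rank that succeeds is the library's ceiling.
nd = 1
while True:
    try:
        np.zeros((1,) * (nd + 1))
    except ValueError:    # e.g. "maximum supported dimension ..."
        break
    nd += 1
print("maximum supported rank:", nd)    # 32 on NumPy 1.x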

I ran up against the dimension limit in Numeric back when it was 10. Or 20; it was actually defined inconsistently in different places, and you could crash the interpreter by creating an array with >10 dimensions in a part of the code that would let you do that and feeding it to a part that wouldn't. I didn't complain about the limit because the code I was working on was a toy^H^H^Hlearning exercise, and I didn't complain about the inconsistency because I suck. I was trying to make an FFT routine that would reshape the input sequence to have one dimension per prime factor of its length, and then manipulate that (a sketch of that reshape follows the quoted message below).

Warren Focke

On Wed, 4 Jan 2006, Francesc Altet wrote:
On Wednesday 04 January 2006 18:27, Christopher Barker wrote:
Francesc Altet wrote:
It seems that the numarray implementation of the array protocol for string arrays is very slow for dimensionality > 10:
OK, I'll bite -- what in the world do you need an array of strings with dimensionality > 10 for?
Good question. The fact is that Numeric, numarray, and scipy_core have all been designed to support objects of up to 32 (and perhaps 40, in some cases) dimensions. Why? I must confess that I don't exactly know, but when your mission is to check every bit of the implementation and push your package to its limits, you may encounter very funny things that will probably never be a problem for real users.
Somehow, these kinds of issues with high dimensionalities are sometimes useful for isolating other potential problems. So, yeah, one can say that they are usually a waste of time, but sometimes they can save your life (well, kind of ;-).
--
Francesc Altet | Cárabos Coop. V. | http://www.carabos.com/ | "Enjoy Data"
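A sketch of the reshape step described above (the FFT bookkeeping itself is omitted, and prime_factors() is a helper invented here):

import numpy as np

def prime_factors(n):
    """Trial division -- fine for the modest lengths a toy FFT sees."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

# One axis per prime factor of the signal length. A length-1024 signal
# (2**10) becomes rank 10 -- exactly where a hard-coded limit of 10
# dimensions starts to hurt, and length 2048 would exceed it outright.
x = np.arange(1024, dtype=complex)
y = x.reshape(tuple(prime_factors(len(x))))
print(y.ndim)    # 10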
participants (5)
- Christopher Barker
- Francesc Altet
- Gary Strangman
- Todd Miller
- Warren Focke