I am trying to use NumPy to generate some matrix inputs to Maxima for symbolic analysis. I am using a fair number of matrix.astype('S%d'%maxlen) statements. This seems to work very well. It also doesn't seem to pad the elements in any way if maxlen is bigger than I need, which is great.
This may seem like a dumb computer science question, but what is the memory/performance cost of making maxlen much bigger than I need, so that the elements are sure not to get truncated? If my biggest matrices will be 13x13, how long can the strings be before I consume more than a few megs (or a few dozen megs) of memory?
Thanks,
Ryan
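For the memory question, here is a rough sketch. A fixed-width 'S' array reserves maxlen bytes for every element no matter how short the stored string is, so the data buffer takes rows * cols * maxlen bytes. The helper names below are illustrative only and are not part of NumPy.

```python
import numpy as np

def string_matrix_nbytes(rows, cols, maxlen):
    """Bytes used by the data buffer of a rows x cols array of dtype 'S<maxlen>'."""
    return rows * cols * maxlen

def max_len_for_budget(rows, cols, budget_bytes):
    """Largest maxlen whose data buffer still fits in budget_bytes."""
    return budget_bytes // (rows * cols)

maxlen = 64
m = np.zeros((13, 13), dtype='S%d' % maxlen)

# itemsize is the per-element width; nbytes is the whole data buffer.
print(m.itemsize, m.nbytes, string_matrix_nbytes(13, 13, maxlen))

# How wide can each element be before a 13x13 matrix reaches 1 MiB?
print(max_len_for_budget(13, 13, 1024 * 1024))
```

By this estimate a 13x13 matrix needs several thousand characters per element before it reaches one megabyte, so a generous maxlen is cheap at this size.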
I actually have a problem with the elements of a string matrix from
astype('S#'). The shorter elements in my matrix have a bunch of terms
like '1.0', because the matrix they started from was a float. I need
to keep the float type, but want to get rid of the '.0' when I
convert the string output to LaTeX. I was going to check if
element[-2:]=='.0' but ran into this problem:
In [15]: temp[-2:]
Out[15]: '\x00\x00'
In [16]: temp.strip()
Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I think I can get rid of the \x00's by calling str(element), but is
this a feature or a bug? It would be slightly cleaner for me if the
string matrix elements didn't have the trailing null characters (or
whatever those are), but this may not be possible given the underlying
representation.
Thanks,
Ryan
Ryan Krauss wrote:
I think I can get rid of the \x00's by calling str(element), but is this a feature or a bug?
Probably both. :-) On the one hand, you want to be able to get a useful string out of the array; the nulls are just padding, and the string that you put in was '1.0'. On the other hand, suppose that the string you put in was '1.\x00'. Then you would get the "wrong" string out. The only real alternative is to also store an integer containing the length of the string with each element, and that probably interferes with some of the uses of string arrays.
It would be slightly cleaner for me if the string matrix elements didn't have the trailing null characters (or whatever those are), but this may not be possible given the underlying representation.
You can also use temp.strip('\x00') which is a bit more explicit.
-- Robert Kern
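The '1.\x00' case Robert describes can be sketched with plain Python bytes. This is a minimal illustration of a fixed-width field with no stored length, not NumPy's actual implementation; store_fixed_width and load_fixed_width are hypothetical helpers.

```python
def store_fixed_width(value, width):
    """Pad value with null bytes to exactly width bytes, as a fixed-width field would."""
    if len(value) > width:
        raise ValueError("value does not fit in the field")
    return value.ljust(width, b'\x00')

def load_fixed_width(field):
    """Recover the value by dropping trailing null bytes (no length is stored)."""
    return field.rstrip(b'\x00')

plain = store_fixed_width(b'1.0', 8)
tricky = store_fixed_width(b'1.\x00', 8)

print(load_fixed_width(plain))   # the original value comes back
print(load_fixed_width(tricky))  # the trailing null that was data is lost with the padding
```

Without a per-element length, a null that belongs to the data cannot be told apart from a null that is padding.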
Robert Kern wrote:
You can also use temp.strip('\x00') which is a bit more explicit.
Or even temp.rstrip('\x00') which works for all those times you pad the front of your string with '\x00' ;)
-tim
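Putting the rstrip suggestion together with Ryan's original goal, a minimal sketch might look like this. It assumes each element arrives as a Python string with trailing null padding, as in the session above; clean_element is a hypothetical helper name.

```python
def clean_element(element):
    """Drop trailing null padding, then a trailing '.0', from one string element."""
    text = element.rstrip('\x00')
    if text.endswith('.0'):
        text = text[:-2]
    return text

# An element as it appeared in the IPython session: '1.0' plus 16 nulls.
temp = '1.0' + '\x00' * 16
print(repr(clean_element(temp)))
print(repr(clean_element('2.5\x00')))
```

Stripping the padding first matters: the element[-2:]=='.0' check only works once the nulls are gone.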
Ryan Krauss wrote:
I think I can get rid of the \x00's by calling str(element), but is this a feature or a bug?
Of course the elements are padded with '\x00' so that they are all the same length, but we have been trying to make it so that it doesn't matter. Equality testing is one area where it still does. We are using the underlying string equality testing (and it doesn't strip the '\x00'). So, I guess it's a missing feature at this point.
-Travis
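The equality issue Travis mentions can be worked around by stripping the padding before comparing. The sketch below uses plain Python strings to stand in for raw padded elements; it does not reproduce NumPy's own comparison code, and padded_equal is a hypothetical helper.

```python
def padded_equal(a, b):
    """Compare two strings while ignoring trailing null padding on either side."""
    return a.rstrip('\x00') == b.rstrip('\x00')

padded = '1.0' + '\x00' * 16

print(padded == '1.0')              # raw comparison sees the padding
print(padded_equal(padded, '1.0'))  # comparison after stripping the padding
```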
participants (4)
- Robert Kern
- Ryan Krauss
- Tim Hochberg
- Travis Oliphant