How to create a boolean subarray from a larger string array?
Hello all,

My challenge is this: I'm working on an application that parses numerical data from a text report using regular expressions, and then places the results in Numpy matrices for processing. The data contains integers, floats, and boolean values. The boolean values are represented in the text file by either an empty string '', or by a star '*'. The regex parser creates a sequence of nested lists that is readily converted to an MxN string-type matrix. Then, the necessary rows of that matrix are sliced to create the necessary new submatrices.

Here is a simplified sample of my solution so far:

    import numpy as _N
    data = [['1', '5.30', '', '3.44', '*'],
            ['2', '4.12', '*', '1.24', ''],
            ['3', '0.45', '', '3.22', '*']]
    mdat = _N.mat(data).T    # mdat.shape is now (5,3)
    ids = mdat[0,].astype(_N.int)    #this works for str>int
    noms = mdat[(1,3),].astype(_N.float64)    #same idea also works for str>float64
    ## The following technique would be nice, but
    ## it causes a ValueError: invalid literal for int() with base 10: ''
    outs = mdat[(2,4),].astype(_N.bool)
    ## Instead, I have to convert the strings to '0' or '1'
    ## explicitly, then cast them to a bool matrix:
    for i, b in enumerate(mdat[(2,4),].T):
        mdat[2, i] = 1 if mdat[2, i] else 0
        mdat[4, i] = 1 if mdat[4, i] else 0
    outs = mdat[(2,4),].astype(_N.bool)

I was expecting the above to behave similarly to the Python bool() function on strings:

    >>> bool(''), bool('*')
    (False, True)

but it doesn't work that way.

Can anyone enlighten me as to why slices of my string matrix cannot be cast to boolean matrices? I'd rather not have to resort to the 'for' loop if there is a smarter way to do this. If an intermediate numpy.array is required instead of numpy.matrix as I have shown here, it's acceptable. I am using the matrix class in this case because the application thrives on it.

I'm using Python 2.5 and NumPy 1.0.1 on WinXP.

Any help and useful comments will be appreciated,
Basilisk96

Andriy Basilisk wrote:
Can anyone enlighten me as to why slices of my string matrix cannot be cast to boolean matrices?

It's kind of a tossup as to what's needed in general. I suspect that for the majority of cases, one deals with strings of '0' and '1' instead of empty strings and nonempty strings. You can always use something like

    mdat[[2,4]] == '*'

to get the boolean array you want. This scheme can work with any string representation of True and False.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
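
For readers trying the comparison approach from the reply, here is a minimal sketch of how it might look end to end. It makes some assumptions that are not in the thread: it uses a plain `numpy.ndarray` rather than `numpy.matrix` (the original post says an intermediate array is acceptable), it assumes a reasonably recent NumPy and Python 3, and the names `sdat` and `outs_nonempty` are made up for illustration.

```python
import numpy as np

data = [['1', '5.30', '', '3.44', '*'],
        ['2', '4.12', '*', '1.24', ''],
        ['3', '0.45', '', '3.22', '*']]

# Plain string ndarray (assumption: used here instead of numpy.matrix),
# transposed so that each row holds one field; shape is (5, 3).
sdat = np.array(data).T

ids = sdat[0].astype(int)                # str -> int
noms = sdat[[1, 3]].astype(np.float64)   # str -> float64

# Elementwise comparison against the "true" marker gives a boolean
# array directly, so no loop and no astype() cast is needed.
outs = sdat[[2, 4]] == '*'

# Variant that mimics bool(str): any non-empty string counts as True.
outs_nonempty = sdat[[2, 4]] != ''

print(outs)
```

The same pattern should carry over to other encodings of the flags by comparing against whichever string stands for True, for example `== '1'` when the report uses '0' and '1'.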