[Numpy-discussion] How to create a boolean sub-array from a larger string array?
Robert Kern
robert.kern at gmail.com
Sat Jun 23 02:14:23 EDT 2007
Andriy Basilisk wrote:
> Hello all,
>
> My challenge is this:
> I'm working on an application that parses numerical data from a text
> report using regular expressions, and then places the results in Numpy
> matrices for processing. The data contains integers, floats, and
> boolean values. The boolean values are represented in the text file
> by either an empty string '', or by a star '*'. The regex parser
> creates a sequence of nested lists that is readily converted to an MxN
> string-type matrix. Then, the relevant rows of that matrix are
> sliced to create the required sub-matrices.
>
> Here is a simplified sample of my solution so far:
>
> import numpy as _N
> data = [['1', '5.30', '', '3.44', '*'],
>         ['2', '-4.12', '*', '-1.24', ''],
>         ['3', '0.45', '', '3.22', '*']]
> mdat = _N.mat(data).T # mdat.shape is now (5,3)
> ids = mdat[0,].astype(_N.int) #this works for str->int
> noms = mdat[(1,3),].astype(_N.float64) # same idea also works for str->float64
> ## The following technique would be nice, but
> ## it causes a ValueError: invalid literal for int() with base 10: ''
> outs = mdat[(2,4),].astype(_N.bool)
> ## Instead, I have to convert the strings to '0' or '1'
> ## explicitly, then cast them to a bool matrix:
> for i, b in enumerate(mdat[(2,4),].T):
>     mdat[2, i] = 1 if mdat[2, i] else 0
>     mdat[4, i] = 1 if mdat[4, i] else 0
> outs = mdat[(2,4),].astype(_N.bool)
>
> I was expecting the above to behave similarly to the Python bool()
> function on strings:
> >>> bool(''), bool('*')
> (False, True)
> but it doesn't work that way.
>
> Can anyone enlighten me as to why slices of my string matrix cannot be
> cast to boolean matrices?

It's a toss-up as to which behavior is needed in general. I suspect that in
the majority of cases, one deals with the strings '0' and '1' rather than
empty and non-empty strings.

You can always use something like

  mdat[[2,4]] == '*'

to get the boolean array you want. This scheme can work with any string
representation of True and False.
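
For reference, here is a minimal sketch of the comparison approach applied to
the sample data from the question. It assumes a plain ndarray built with
np.array rather than the np.mat matrix used in the original post, and the
names TRUE_TOKENS and flexible are made up for illustration. The np.isin
variant is just one way to accept several spellings of True; it is not
something the original code used.

```python
import numpy as np

# Sample data from the question: each row is (id, value, flag, value, flag).
data = [['1', '5.30', '', '3.44', '*'],
        ['2', '-4.12', '*', '-1.24', ''],
        ['3', '0.45', '', '3.22', '*']]

# Assumption: a plain string ndarray instead of the np.mat matrix in the post.
mdat = np.array(data).T              # shape (5, 3)

ids = mdat[0].astype(int)            # str -> int
noms = mdat[[1, 3]].astype(float)    # str -> float

# Element-wise comparison against the "true" marker yields a bool array.
outs = mdat[[2, 4]] == '*'

# Hypothetical generalization: treat any of several tokens as True.
TRUE_TOKENS = ['*', 'Y', 'yes']
flexible = np.isin(mdat[[2, 4]], TRUE_TOKENS)

print(outs)
```

Testing against '' instead (mdat[[2, 4]] != '') would mirror Python's bool()
on strings, if any non-empty string should count as True.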
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco