[Numpy-discussion] problem converting to matrix from Unicode input string

Basilisk96 basilisk96 at gmail.com
Fri Oct 5 00:13:34 EDT 2007


Hello all,

I have the following function, with print statements inserted for
debugging:

import numpy
def file2mat(inFile, sep=None, T=True):
    try:
        input = inFile.readlines()
        print "input=%s" % input
    except:
        raise
    finally:
        inFile.close()
    data = [line.split(sep) for line in input]
    print "data=%s" % data
    if T==True:
        return numpy.mat(data).astype(numpy.float64).T
    else:
        return numpy.mat(data).astype(numpy.float64)

which is then tested as follows:

>>> s = "-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"
>>> u = unicode(s)
>>> from cStringIO import StringIO
>>> file2mat(StringIO(s))
input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'], ['0.500', '0.500']]
matrix([[-0.5,  0.5, -0.5,  0.5],
        [-0.5, -0.5,  0.5,  0.5]])

This is the expected result matrix... But now:

>>> file2mat(StringIO(u))
input=['-\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n', '\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n', '\x00-\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00\n', '\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00']
data=[['-\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'], ['\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'], ['\x00-\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00'], ['\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00']]
Traceback (most recent call last):
    ...
ValueError: invalid literal for float(): -
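
In case it helps with the diagnosis: the garbage seems to come from cStringIO rather than from numpy. As far as I can tell, wrapping a Unicode object in cStringIO.StringIO hands back the object's raw internal buffer, so on my narrow (UCS-2, little-endian) build every character is followed by a NUL byte. Continuing the session above:

>>> StringIO(u).getvalue()[:8]
'-\x000\x00.\x005\x00'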


When I explicitly convert the input to a plain string with str(), I get
the expected result:

>>> file2mat(StringIO(str(u)))
input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'], ['0.500', '0.500']]
matrix([[-0.5,  0.5, -0.5,  0.5],
        [-0.5, -0.5,  0.5,  0.5]])

Any suggestions on how to improve my code?
Is this a Unicode issue, a numpy issue, or both?
The input string can come either from an ASCII file or from a GUI text
control. In the GUI case, the control returns a Unicode string, so for
now I am converting it with str(), but that feels like a hack...
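
One alternative to str() that I have been considering is encoding the
text explicitly before wrapping it, so the intent is at least visible;
ASCII is enough for purely numeric input like the above (the helper name
is just for illustration):

from cStringIO import StringIO

def text2file(text):
    # encode Unicode explicitly so cStringIO sees plain bytes;
    # ASCII suffices for numeric data like the matrix above
    if isinstance(text, unicode):
        text = text.encode('ascii')
    return StringIO(text)

# usage: file2mat(text2file(u))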

BTW, the reason that I am using the astype() method of numpy.matrix is
that I get a "ValueError: setting an array element with a sequence"
when trying to use
    return numpy.mat(data, numpy.float64)
in the above function.
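
I suppose another way around that error would be to convert the tokens
to float while splitting, so that numpy.mat() receives a nested list of
numbers and neither astype() nor the dtype argument is needed. Something
like this (untested beyond the example above, and it still assumes the
lines are plain byte strings):

import numpy

def file2mat(inFile, sep=None, T=True):
    try:
        lines = inFile.readlines()
    finally:
        inFile.close()
    # build a nested list of floats instead of strings
    data = [[float(x) for x in line.split(sep)] for line in lines]
    m = numpy.mat(data)
    if T:
        return m.T
    return m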

Thank you,
-Basilisk96



