problem converting to matrix from Unicode input string
Hello all,

I have the following function, with print statements inserted for debugging:

    import numpy

    def file2mat(inFile, sep=None, T=True):
        try:
            input = inFile.readlines()
            print "input=%s" % input
        except:
            raise
        finally:
            inFile.close()
        data = [line.split(sep) for line in input]
        print "data=%s" % data
        if T==True:
            return numpy.mat(data).astype(numpy.float64).T
        else:
            return numpy.mat(data).astype(numpy.float64)

which is then tested as follows:

    >>> s = "-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"
    >>> u = unicode(s)
    >>> from cStringIO import StringIO
    >>> file2mat(StringIO(s))
    input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
    data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'], ['0.500', '0.500']]
    matrix([[-0.5,  0.5, -0.5,  0.5],
            [-0.5, -0.5,  0.5,  0.5]])

This is the expected result matrix... But now:

    >>> file2mat(StringIO(u))
    input=['-\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n', '\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n', '\x00-\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00\n', '\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00']
    data=[['-\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'], ['\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'], ['\x00-\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00'], ['\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00']]
    Traceback (most recent call last):
    ...
    ValueError: invalid literal for float(): -

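The interleaved `\x00` bytes above look like the text encoded with two bytes per character, low byte first (UTF-16-LE / UCS-2). That suggests cStringIO read the unicode object's raw internal buffer rather than its characters; this is an assumption about the mechanism, not something the output proves. The Python 3 sketch below does not use cStringIO at all: it simply encodes the same sample as UTF-16-LE and reads the bytes back line by line, which produces the same pattern, including the stray `\x00` at the start of every line after the first (it is the high byte of the preceding `\n`).

```python
import io

s = "-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"

# Assumption: this matches the unicode object's internal buffer on a
# narrow (2-bytes-per-character) Python 2 build.
raw = s.encode("utf-16-le")

# Split the raw bytes at b"\n"; the b"\x00" that follows each newline
# lands at the start of the next line, as in the debug output above.
lines = io.BytesIO(raw).readlines()
for line in lines:
    print(line)

# The bytes are intact, just in the wrong encoding: decoding them as
# UTF-16-LE gives back the original text.
assert raw.decode("utf-16-le") == s
```

If this reading is right, the problem is in how the string reaches the parser, and numpy only fails because it is handed tokens such as `'-\x000\x00...'`.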
When I explicitly convert the input to a plain string with str(), I get the expected result:

    >>> file2mat(StringIO(str(u)))
    input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
    data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'], ['0.500', '0.500']]
    matrix([[-0.5,  0.5, -0.5,  0.5],
            [-0.5, -0.5,  0.5,  0.5]])

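In Python 2, str(u) encodes with the default ASCII codec, so it works for this sample but would raise UnicodeEncodeError if the GUI text ever contained a non-ASCII character. One way to avoid both the StringIO wrapper and the str() conversion is to split a string directly and call readlines() only on real file objects. The sketch below is written for Python 3, and `read_lines` is a hypothetical helper name. On Python 2 the type check would be `isinstance(source, basestring)`; the pure-Python `StringIO.StringIO` class is also documented to accept unicode, unlike `cStringIO`.

```python
import io


def read_lines(source):
    """Return text lines from either a string or a file-like object.

    Hypothetical helper: strings (e.g. text from a GUI control) are split
    directly, so no StringIO wrapper or str() conversion is needed.
    """
    if isinstance(source, str):
        # splitlines() drops the trailing "\n" of each line.
        return source.splitlines()
    try:
        # readlines() keeps the "\n"; split() without sep ignores it anyway.
        return source.readlines()
    finally:
        source.close()


u = "-0.500 -0.500\n0.500 -0.500"
print(read_lines(u))
print(read_lines(io.StringIO(u)))
```

Either way the caller gets a list of text lines, so the parsing code no longer needs to care where the text came from.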
Any suggestions on how to improve my code? Is this a Unicode issue, a numpy issue, or both?

The input string can come from an ASCII file or from a GUI text control. The GUI control returns a Unicode string, so for now I am converting it with str(), but that seems like a hack.

By the way, the reason I am using the astype() method of numpy.matrix is that I get "ValueError: setting an array element with a sequence" when I try to use

    return numpy.mat(data, numpy.float64)

in the above function.

Thank you,
-Basilisk96
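A way to sidestep the astype() step is to convert each token with the built-in float(), which accepts str and unicode alike, so numpy only ever receives numbers and the dtype can be passed directly. A minimal sketch, assuming the input is already a list of text lines; `lines2mat` is a hypothetical name. It returns a plain 2-D ndarray rather than a matrix, since `numpy.mat` was removed in NumPy 2.0; wrap the result in `numpy.asmatrix(...)` if matrix semantics are needed.

```python
import numpy


def lines2mat(lines, sep=None, T=True):
    # Convert every token in Python first; blank lines (e.g. from a
    # trailing newline) are skipped so all rows have the same length.
    rows = [[float(tok) for tok in line.split(sep)]
            for line in lines if line.strip()]
    arr = numpy.array(rows, dtype=numpy.float64)
    # Optionally transpose, mirroring the T flag of the original file2mat.
    return arr.T if T else arr


s = "-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"
print(lines2mat(s.splitlines()))
```

With the sample data this gives the same values as the expected matrix in the first session, as a 2 x 4 array.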