[Numpy-discussion] String manipulation
Christopher Barker
Chris.Barker at noaa.gov
Mon Jul 20 15:44:23 EDT 2009
Nils Wagner wrote:
> How can I split the second line in such a way that I get
>
> ['-1.000000E+00', '-1.000000E+00', '-1.000000E+00',
> '-1.000000E+00', '1.250000E+00', '1.250000E+00']
>
> instead of
>
> ['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00',
> '1.250000E+00', '1.250000E+00']
It looks like you have fixed-length fields. The naive way to do this is
simple string slicing:
def line2array1(line, field_len=10):
    nums = []
    i = 0
    while i < len(line):
        nums.append(float(line[i:i+field_len]))
        i += field_len
    return np.array(nums)
Then I saw the nifty list comprehension posted by Alan(?), which led me
to the one (long) liner:
def line2array2(line, field_len=10):
    return np.array(map(float, [line[i*field_len:(i+1)*field_len] for i
                    in range(len(line)/field_len)]))
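
For reference, here is a minimal sketch of the same slicing idea written so that it also runs under Python 3, where `/` no longer gives an integer and `map` returns an iterator. The function name `line2array_slices` and the default width of 13 (the width of the fields in the sample line) are assumptions made for illustration, and the line is assumed to have had its trailing newline stripped already.

```python
import numpy as np

def line2array_slices(line, field_len=13):
    """Split a fixed-width line into floats by plain string slicing."""
    # Step through the line one field at a time. float() ignores the
    # leading blank that pads the positive numbers.
    fields = [line[i:i + field_len] for i in range(0, len(line), field_len)]
    return np.array([float(f) for f in fields])

sample = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
          ' 1.250000E+00 1.250000E+00')
print(line2array_slices(sample))
```

Stepping `range` by the field width avoids the division entirely, so the same code behaves identically on either Python version.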
But it seems I should be able to do this using numpy arrays manipulating
the data as characters. However, I had a little trouble getting a string
into a numpy array as characters. This didn't work:
In [55]: s
Out[55]: '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'
In [57]: np.array(s, 'S13')
Out[57]:
array('-1.000000E+00',
dtype='|S13')
so I tried single characters:
In [56]: np.array(s, 'S1')
Out[56]:
array('-',
dtype='|S1')
I still only got the first one.
This is closer, but not quite:
In [61]: np.array(tuple(s), 'S13')
Out[61]:
array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
'-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
'-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
'-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0',
' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'],
dtype='|S13')
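
The difference between these attempts can be checked directly. The sketch below uses a shortened three-field sample string (an assumption, to keep the output small): a Python string passed to `np.array` is treated as a single scalar and truncated to the requested width, while `tuple(s)` gives one element per character, each stored in a 13-byte slot.

```python
import numpy as np

# Shortened sample: three 13-character fields (39 characters in total).
s = '-1.000000E+00-1.000000E+00 1.250000E+00'

whole = np.array(s, 'S13')         # the string is one scalar, cut to 13 bytes
chars = np.array(tuple(s), 'S13')  # one element per character

print(whole.shape)     # zero-dimensional: a single item
print(whole.item())    # only the first field survives
print(chars.shape)     # one entry per character
print(chars.itemsize)  # each entry still occupies 13 bytes
```

So neither form lays the characters out contiguously with one byte each, which is what regrouping them into 13-byte fields requires.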
So I ended up with this:
s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len)
which seems uglier than it should be, but did lead to this one-liner:
np.array(tuple(line),dtype='S1').view(dtype='S%i'%field_len).astype(np.float)
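
One possibly tidier variant of the same idea, offered as a sketch rather than a definitive answer: `np.frombuffer` can reinterpret the raw bytes of the line as fixed-width string fields directly, which skips the `tuple(...)` and `.view(...)` steps. The function name `line2array_buffer` is hypothetical, the input is assumed to be pure ASCII, and its length is assumed to be an exact multiple of the field width (so any trailing newline must be stripped first).

```python
import numpy as np

def line2array_buffer(line, field_len=13):
    """Parse a fixed-width line by viewing its bytes as fixed-size fields."""
    raw = line.encode('ascii')                            # one byte per character
    fields = np.frombuffer(raw, dtype='S%d' % field_len)  # regroup into fields
    return fields.astype(float)                           # parse each field

sample = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
          ' 1.250000E+00 1.250000E+00')
print(line2array_buffer(sample))
```

The builtin `float` is used as the target type here because `np.float` was only an alias for it and has been removed from recent NumPy releases.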
Is there a cleaner way to do this?
(test code attached)
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: application/x-python
Size: 879 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090720/6cb67dbd/attachment.bin>