[Numpy-discussion] String manipulation

Christopher Barker Chris.Barker at noaa.gov
Mon Jul 20 15:44:23 EDT 2009


Nils Wagner wrote:
> How can I split the second line in such a way that I get
> 
> ['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', 
> '-1.000000E+00', '1.250000E+00', '1.250000E+00']
> 
> instead of
> 
> ['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', 
> '1.250000E+00', '1.250000E+00']

It looks like you have fixed-length fields. The naive way to do this is 
simple string slicing:

def line2array1(line, field_len=10):
     nums = []
     i = 0
     while i < len(line):
         nums.append(float(line[i:i+field_len]))
         i += field_len
     return np.array(nums)

Then I saw the nifty list comprehension posted by Alan(?), which led me 
to the one (long) liner:

def line2array2(line, field_len=10):
     return np.array(list(map(float, [line[i*field_len:(i+1)*field_len]
                                      for i in range(len(line)//field_len)])))

But it seems I should be able to do this using numpy arrays manipulating 
the data as characters. However, I had a little trouble getting a string 
into a numpy array as characters. This didn't work:

In [55]: s
Out[55]: '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'

In [57]: np.array(s, 'S13')
Out[57]:
array('-1.000000E+00',
       dtype='|S13')

so I tried single characters:

In [56]: np.array(s, 'S1')
Out[56]:
array('-',
       dtype='|S1')

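I still only got the first one, since numpy treats a plain string as a 
single scalar rather than as a sequence of characters.

Converting to a tuple first gets closer, but not quite: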

In [61]: np.array(tuple(s), 'S13')
Out[61]:
array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
        '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0',
        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0',
        ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'],
       dtype='|S13')

So I ended up with this:
s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i'%field_len)

which seems uglier than it should be, but did lead to this one-liner:

np.array(tuple(line),dtype='S1').view(dtype='S%i'%field_len).astype(float)
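
To make the steps in that one-liner explicit, here is a minimal 
self-contained sketch. It assumes 13-character fields (the width of the 
values in the sample line above) and a pure-ASCII line whose length is an 
exact multiple of the field width. The name line2array3 is only 
illustrative.

```python
import numpy as np

def line2array3(line, field_len=13):
    # One byte per character: shape (len(line),), itemsize 1.
    chars = np.array(tuple(line), dtype='S1')
    # Reinterpret the same buffer as fixed-width byte strings.
    # This only works if len(line) is a multiple of field_len.
    fields = chars.view('S%i' % field_len)
    # Parse each fixed-width field as a float.
    return fields.astype(float)

s = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
     ' 1.250000E+00 1.250000E+00')
print(line2array3(s))
```

The view step is what groups the characters: the 'S1' array stores one 
byte per element, so viewing it as 'S13' turns every 13 consecutive bytes 
into one element.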


Is there a cleaner way to do this?
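
One candidate, offered as a sketch rather than a definitive answer: 
np.frombuffer can reinterpret the raw bytes of the line directly as 
fixed-width fields, which skips the tuple-of-characters step. It assumes 
an ASCII line whose length is an exact multiple of the field width. The 
name line2array_buf is made up for illustration.

```python
import numpy as np

def line2array_buf(line, field_len=13):
    # Encode to bytes so the buffer holds one byte per character.
    raw = line.encode('ascii')
    # View the buffer as consecutive field_len-byte strings (no copy),
    # then parse each field as a float (astype makes a new array).
    return np.frombuffer(raw, dtype='S%i' % field_len).astype(float)

s = ('-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00'
     ' 1.250000E+00 1.250000E+00')
print(line2array_buf(s))
```

If the line length is not a multiple of field_len, np.frombuffer raises 
a ValueError, so a trailing newline would have to be stripped first.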

(test code attached)

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: application/x-python
Size: 879 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090720/6cb67dbd/attachment.bin>

