Hi all,
Please consider two strings
>>> line_a
'12345678abcdefgh12345678'
>>> line_b
'12345678 abcdefgh 12345678'
>>> line_b.split()
['12345678', 'abcdefgh', '12345678']
Is it possible to split line_a such that the output is
['12345678', 'abcdefgh', '12345678']
Nils
On Monday 11 May 2009, Nils Wagner wrote:
Hi all,
Please consider two strings
>>> line_a
'12345678abcdefgh12345678'
>>> line_b
'12345678 abcdefgh 12345678'
>>> line_b.split()
['12345678', 'abcdefgh', '12345678']
Is it possible to split line_a such that the output is
['12345678', 'abcdefgh', '12345678']
Mmh, your question is a bit too generic. If what you want is to separate the strings made of digits from the ones made of letters, it is worth using regular expressions:
In [22]: re.split("(\d*)", line_a)[1:-1]
Out[22]: ['12345678', 'abcdefgh', '12345678']
Although regular expressions seem a bit tough to learn, they will pay off your effort on many occasions.
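As a side note (this alternative is not in the original reply), re.findall can produce the same split without relying on how re.split treats empty matches:

import re

line_a = '12345678abcdefgh12345678'
# grab maximal runs of digits or of lowercase letters, in the order they appear
re.findall(r'\d+|[a-z]+', line_a)
# -> ['12345678', 'abcdefgh', '12345678']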
Cheers,
On Mon, 11 May 2009 14:25:46 +0200 Francesc Alted faltet@pytables.org wrote:
On Monday 11 May 2009, Nils Wagner wrote:
Hi all,
Please consider two strings
>>> line_a
'12345678abcdefgh12345678'
>>> line_b
'12345678 abcdefgh 12345678'
>>> line_b.split()
['12345678', 'abcdefgh', '12345678']
Is it possible to split line_a such that the output is
['12345678', 'abcdefgh', '12345678']
Mmh, your question is a bit too generic.
Indeed. I would like to split strings like the following into fields of eight characters each:

>>> line_a
'111111.1222222.2333333.3'
>>> line_b
'111111.1 222222.2 333333.3'
>>> line_b.split()
['111111.1', '222222.2', '333333.3']

How can I accomplish that?
Nils
On Mon, 11 May 2009 10:48:14 -0400 Alan G Isaac aisaac@american.edu wrote:
On 5/11/2009 8:36 AM Nils Wagner apparently wrote:
I would like to split strings made of digits after eight characters each.
[l[i*8:(i+1)*8] for i in range(len(l)/8)]
Alan Isaac
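For illustration (this application of the one-liner is not in the original message), with the eight-character fields from Nils's example:

l = '111111.1222222.2333333.3'
# Python 2 integer division; use len(l) // 8 on Python 3
[l[i*8:(i+1)*8] for i in range(len(l)/8)]
# -> ['111111.1', '222222.2', '333333.3']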
Consider a file 'mydata' containing the following two lines:

 1.000000E+00 0.000000E+00 1.000000E+00 1.000000E+00 1.000000E+00 1.000000E+00
-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00

ifile = open('mydata', 'r')
lines = ifile.readlines()
for line in lines:
    print line.split()
How can I split the second line in such a way that I get
['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '1.250000E+00', '1.250000E+00']
instead of
['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', '1.250000E+00', '1.250000E+00']
Thanks in advance
Nils
Nils Wagner wrote:
How can I split the second line in such a way that I get
['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '1.250000E+00', '1.250000E+00']
instead of
['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', '1.250000E+00', '1.250000E+00']
It looks like you have fixed-length fields. The naive way to do this is simple string slicing:
def line2array1(line, field_len=10):
    nums = []
    i = 0
    while i < len(line):
        nums.append(float(line[i:i+field_len]))
        i += field_len
    return np.array(nums)
Then I saw the nifty list comprehension posted by Alan(?), which led me to the one (long) liner:
def line2array2(line, field_len=10):
    return np.array(map(float,
                        [line[i*field_len:(i+1)*field_len] for i in range(len(line)/field_len)]))
But it seems I should be able to do this using numpy arrays, manipulating the data as characters. However, I had a little trouble getting a string into a numpy array as characters. This didn't work:
In [55]: s
Out[55]: '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'

In [57]: np.array(s, 'S13')
Out[57]: array('-1.000000E+00', dtype='|S13')
so I tried single characters:
In [56]: np.array(s, 'S1')
Out[56]: array('-', dtype='|S1')
I still only got the first one.
closer, but not quite:
In [61]: np.array(tuple(s), 'S13')
Out[61]: array(['-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0', '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0', '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0', '-', '1', '.', '0', '0', '0', '0', '0', '0', 'E', '+', '0', '0', ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0', ' ', '1', '.', '2', '5', '0', '0', '0', '0', 'E', '+', '0', '0'], dtype='|S13')
So I ended up with this:

s_array = np.array(tuple(line), dtype='S1').view(dtype='S%i' % field_len)
which seems uglier than it should be, but did lead to this one-liner:
np.array(tuple(line),dtype='S1').view(dtype='S%i'%field_len).astype(np.float)
Is there a cleaner way to do this?
(test code attached)
-Chris
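For reference (added here for illustration; not in the original message), the one-liner applied to the sample line above:

import numpy as np

s = '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'
field_len = 13
# split the string into single characters, view them as 13-character fields, convert to float
np.array(tuple(s), dtype='S1').view(dtype='S%i' % field_len).astype(float)
# -> array([-1.  , -1.  , -1.  , -1.  ,  1.25,  1.25])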
On Mon, 20 Jul 2009 12:44:23 -0700 Christopher Barker Chris.Barker@noaa.gov wrote:
Nils Wagner wrote:
How can I split the second line in such a way that I get
['-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '-1.000000E+00', '1.250000E+00', '1.250000E+00']
instead of
['-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00', '1.250000E+00', '1.250000E+00']
It looks like you have fixed-length fields.
Yes. See http://www.sdrl.uc.edu/universal-file-formats-for-modal-analysis-testing-1/f...
Fixed-length fields are quite common, e.g. in the area of Finite Element pre/postprocessing. It would therefore be nice to have a function like line2array in numpy. Comments?
Nils
On Jul 21, 2009, at 2:42 AM, Nils Wagner wrote:
Fixed-length fields are quite common, e.g. in the area of Finite Element pre/postprocessing. It would therefore be nice to have a function like line2array in numpy. Comments?
Er, there's already something like that: np.lib._iotools.LineSplitter
Initialize it with either a character or an integer as delimiter, and call your instance with a string as input. When you use an integer as the delimiter, it corresponds to the length of your field, e.g.:
s = '-1.000000E+00-1.000000E+00 1.000000E+00-1.000000E+00'
conv = np.lib._iotools.LineSplitter(13)
np.array(conv(s))
array(['-1.000000E+00', '-1.000000E+00', '1.000000E+00', '-1.000000E+00'], dtype='|S13')
np.array([float(_) for _ in conv(s)])
array([-1., -1., 1., -1.])
Note that LineSplitter is already used in np.genfromtxt:
import StringIO
np.genfromtxt(StringIO.StringIO(s), delimiter=13)
array([-1., -1., 1., -1.])
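The same mechanism extends to whole files; a small sketch (the data here is made up for illustration and is not from the original message; it assumes the Python 2-era StringIO used above):

import StringIO
import numpy as np

data = ('-1.000000E+00-1.000000E+00 1.000000E+00-1.000000E+00\n'
        ' 1.250000E+00 1.250000E+00-1.000000E+00 1.000000E+00')
# delimiter=13 tells genfromtxt that every field is 13 characters wide
np.genfromtxt(StringIO.StringIO(data), delimiter=13)
# -> a 2x4 array of floats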
On Tue, 21 Jul 2009 02:56:28 -0400 Pierre GM pgmdevlist@gmail.com wrote:
Er, there's already something like that: np.lib._iotools.LineSplitter
Great. I didn't know about that.
Your examples are very useful.
IMHO the examples should be added to
http://www.scipy.org/Cookbook/InputOutput
to attract interest.
Nils
On Jul 21, 2009, at 3:16 AM, Nils Wagner wrote:
Er, there's already something like that: np.lib._iotools.LineSplitter
Great. I didn't know about that.
Your examples are very useful.
IMHO the examples should be added to
http://www.scipy.org/Cookbook/InputOutput
to attract interest.
Nils, feel free to edit the corresponding entry of the Cookbook.
Note that I wouldn't be surprised if a few bugs lurked in the corners, especially when dealing with a sequence of different field-lengths. Just drop me a line if you run into a problem.
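To make that concrete (this sketch is not from the thread; it relies on genfromtxt's documented support for a sequence of field widths, and the record itself is hypothetical):

import StringIO
import numpy as np

# hypothetical fixed-width record: an 8-character label followed by two 13-character floats
rec = 'ELEM0001-1.000000E+00 1.250000E+00'
np.genfromtxt(StringIO.StringIO(rec), delimiter=(8, 13, 13), dtype=None)
# -> a structured record with one string field and two float fields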
Pierre GM wrote:
On Jul 20, 2009, at 3:44 PM, Christopher Barker wrote:
... Is there a cleaner way to do this?
Yes. np.lib._iotools.LineSplitter and/or np.genfromtxt
Great, thanks -- though the underscore in _iotools implies that these aren't supposed to be general-purpose tools.
Also, aside from the problem at hand, what I was getting at was whether there is a cleaner way to go from a string to a numpy array of characters or strings.
I see that LineSplitter() uses a list comprehension to do the slicing, and I was looking for a way (perhaps needlessly) to use numpy instead, which requires an efficient way to get a numpy array from a string.
I don't see why:
np.array('a string', dtype='S1')
results in a length (1,) array, for instance.
Actually, I think I do -- numpy is treating the string as a single scalar, rather than a sequence of characters, and doing its best to convert that scalar to a length-one string. However, I don't know if there is a compelling reason why it should do that -- in other contexts, Python generally treats strings as sequences of characters.
-Chris
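One way to get from the string to fixed-width fields without going through a tuple of characters (added for illustration, not from the thread; a minimal sketch assuming the line length is an exact multiple of the field width):

import numpy as np

line = '-1.000000E+00-1.000000E+00-1.000000E+00-1.000000E+00 1.250000E+00 1.250000E+00'
field_len = 13
# reinterpret the raw bytes of the line as fixed-width strings, then convert to float
np.frombuffer(line.encode(), dtype='S%d' % field_len).astype(float)
# -> array([-1.  , -1.  , -1.  , -1.  ,  1.25,  1.25])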
On 5/11/2009 8:03 AM Nils Wagner apparently wrote:
>>> line_a
'12345678abcdefgh12345678'

Is it possible to split line_a such that the output is
['12345678', 'abcdefgh', '12345678']
More of a comp.lang.python question, I think:
from itertools import groupby

out = list()
for k, g in groupby('123abc456', lambda x: x.isalpha()):
    out.append(''.join(g))
# out is now ['123', 'abc', '456']
fwiw, Alan Isaac
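Applied to the original line_a from the top of the thread (example added for illustration):

from itertools import groupby

line_a = '12345678abcdefgh12345678'
# group consecutive characters by whether they are alphabetic
[''.join(g) for k, g in groupby(line_a, str.isalpha)]
# -> ['12345678', 'abcdefgh', '12345678']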