Pierre GM wrote:
[Some background: we're talking about numpy.lib.recfunctions, a set of functions to manipulate structured arrays]
Ryan, If the two files have the same structure, you can use that fact and specify the dtype of the output directly with the dtype parameter of mafromtxt. That way, you're sure that the two arrays will have the same dtype. If you don't know the structure beforehand, you could try to load one array and use its dtype as input of mafromtxt to load the second one.
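A minimal sketch of the "load one array and reuse its dtype" idea. It assumes np.genfromtxt with usemask=True as a stand-in for mafromtxt; the in-memory file contents and the field names are made up for illustration.

```python
import io

import numpy as np
from numpy.lib.recfunctions import stack_arrays

# Hypothetical file contents; real data would come from files on disk.
first_file = io.StringIO("1,2.5\n3,4.5\n")
second_file = io.StringIO("5,6.5\n7,8.5\n")

# Let the loader infer the field types of the first file.
first = np.genfromtxt(first_file, delimiter=",", dtype=None,
                      names=("a", "b"), usemask=True)

# Reuse the inferred dtype so the second array is loaded with the same one.
second = np.genfromtxt(second_file, delimiter=",", dtype=first.dtype,
                       usemask=True)

# With identical dtypes, stacking needs no type conversion.
stacked = stack_arrays((first, second))
```

Nothing is hard coded here: if the file format changes, the dtype inferred from the first file changes with it.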
I could force the dtype. However, since the flexibility is there in mafromtxt, I'd like to avoid hard coding the dtype, so I don't have to worry about updating the code if the file format ever changes (this parses live data).
Now, we could also try to modify stack_arrays so that it would take the largest dtype when several fields have the same name. I'm not completely satisfied by this approach, as it makes dtype conversions under the hood. Maybe we could provide the functionality as an option (w/ a forced_conversion boolean input parameter) ?
I definitely wouldn't advocate magic by default, but I think it would be nice to be able to get the functionality if one wanted to. There is one problem I noticed, however. I found common_type and lib.mintypecode, but both raise errors when trying to find a dtype to match both bool and float. I don't know if there's another function somewhere that would work for what I want.
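For finding a dtype that fits both bool and float, more recent NumPy releases provide np.promote_types and np.result_type. The sketch below assumes one of those releases is installed; neither function is mentioned in this thread.

```python
import numpy as np

# Smallest dtype to which both bool and float64 can be safely cast.
common = np.promote_types(np.bool_, np.float64)

# result_type applies the same promotion rules to any number of dtypes.
also_common = np.result_type(np.bool_, np.int64, np.float64)

print(common, also_common)
```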
I'm a bit surprised by the error message you get. If I try:
    a = ma.array([(1,2,3)], mask=[(0,1,0)],
                 dtype=[('a',int), ('b',bool), ('c',float)])
    b = ma.array([(4, 5, 6)], dtype=[('a', int), ('b', float), ('c', float)])
    test = np.stack_arrays((a, b))
I get a TypeError instead (the field 'b' doesn't have the same type in a and b). Now, I do get the 'two fields w/ the same name' error when I use np.merge_arrays (with the flatten option). Could you send a small example?
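For reference, a sketch of the same case against a later NumPy release, where stack_arrays in numpy.lib.recfunctions accepts an opt-in autoconvert flag. The flag is not the forced_conversion name floated in this thread, and the sketch assumes a release that has it.

```python
import numpy as np
import numpy.ma as ma
from numpy.lib.recfunctions import stack_arrays

a = ma.array([(1, 2, 3)], mask=[(0, 1, 0)],
             dtype=[('a', int), ('b', bool), ('c', float)])
b = ma.array([(4, 5, 6)],
             dtype=[('a', int), ('b', float), ('c', float)])

# Default behaviour: mismatched field types are rejected.
try:
    stack_arrays((a, b))
    raised = False
except TypeError:
    raised = True

# Opt-in conversion: field 'b' takes the larger of the two dtypes.
converted = stack_arrays((a, b), autoconvert=True)
```

Keeping the conversion behind a flag means no dtype changes happen under the hood unless the caller asks for them.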
Apparently, I get my error as a result of my use of titles in the dtype to store an alternate name for the field. (If you're not familiar with titles, they're nice because you can get fields by either name, so for the following example, a['a'] and a['A'] both return array([1]).) The following version of your case gives me the ValueError:
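A small sketch of how titles behave, using a plain ndarray rather than a masked array to keep it short. Each field is given as ((title, name), type), so here 'a' is the title and 'A' is the name.

```python
import numpy as np

# Each field spec is ((title, name), type).
x = np.array([(1, 2.0)], dtype=[(('a', 'A'), int), (('b', 'B'), float)])

# The same field can be reached through its title or its name.
by_title = x['a']
by_name = x['A']

# dtype.names lists only the names; dtype.fields is keyed by names and titles.
names = x.dtype.names
field_keys = sorted(x.dtype.fields)
```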
    from numpy.lib.recfunctions import stack_arrays
    a = ma.array([(1,2,3)], mask=[(0,1,0)],
                 dtype=[(('a','A'),int), (('b','B'),bool), (('c','C'),float)])
    b = ma.array([(4,5,6)],
                 dtype=[(('a','A'),int), (('b','B'),float), (('c','C'),float)])
    stack_arrays((a,b))

    ValueError: two fields with the same name
As a side question, do you have some local mods to your numpy SVN so that some of the functions in recfunctions are available in numpy's top level? On mine, I can't get to them except by importing them from numpy.lib.recfunctions. I don't see any mention of recfunctions in lib/__init__.py.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma