On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
Manuel Metz wrote:
Ryan May wrote:
3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a converter. The problem with this is that you have to pass in a converter for *every column* that will contain missing values. If you have a text file with 50 columns, writing this dictionary of converters seems like ugly and needless boilerplate. I'm unsure of how best to pass in both what values indicate missing values and what values to fill in their place. I'd love suggestions.
Hi Ryan, this would be a great feature to have !!!
About missing values:

* I don't think missing values should be supported in np.loadtxt. That should go into a specific np.ma.io.loadtxt function, a preview of which I posted earlier. I'll modify it to take into account Ryan's new function and Christopher's suggestion (defining a dictionary {column name : missing values}).
* StringConverter already defines some default filling values for each dtype. In np.ma.io.loadtxt, these values can be overwritten. Note that you should also be able to define a filling value by specifying a converter (think float(x or 0) for example).
* Missing values on space-separated fields are very tricky to handle: take a line like "a,,,d". With a comma as separator, it's clear that the 2nd and 3rd fields are missing. Now, imagine that the commas are actually spaces ("a   d"): 'd' is now seen as the 2nd field of a 2-field record, not as the 4th field of a 4-field record with 2 missing values. I thought about it, and kicked it into touch.
* That said, there should be a way to deal with fixed-length fields, probably by taking consecutive slices of the initial string. That way, we should be able to keep track of missing data...
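
To make the points about converters, delimiters, and fixed-length fields concrete, here is a minimal plain-Python sketch. It does not use np.loadtxt or the proposed np.ma.io.loadtxt; the helper names `to_float` and `split_fixed_width`, the column widths, and the sample lines are all assumptions made up for illustration. Note that `to_float` also strips padding, which the bare float(x or 0) idiom does not need when fields come from a comma split.

```python
def to_float(field, fill=0.0):
    """Convert a text field to float, using `fill` when the field is blank.

    Hypothetical helper in the spirit of `float(x or 0)`.
    """
    field = field.strip()
    return float(field) if field else fill


def split_fixed_width(line, widths):
    """Cut `line` into consecutive slices whose lengths are given by `widths`.

    Hypothetical helper: the widths must be known in advance.
    """
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


# Comma-delimited: empty strings mark the two missing fields.
comma_fields = "a,,,d".split(",")   # ['a', '', '', 'd']

# Whitespace-delimited: the run of spaces collapses, so the gaps vanish.
space_fields = "a   d".split()      # ['a', 'd']

# Fixed-width (assumed: three 4-character columns, the middle one blank).
line = "1.5 " + "    " + "2.0 "
raw_fields = split_fixed_width(line, [4, 4, 4])

# Blank slices are detected and replaced by the fill value.
values = [to_float(f) for f in raw_fields]   # [1.5, 0.0, 2.0]
```

Slicing by position keeps a blank column in its place, so the missing entry can be filled (or masked) rather than silently shifting the later fields to the left.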