Re: [Numpy-discussion] More loadtxt() changes

26 Nov 2008

On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
...
Manuel Metz wrote:
...
Ryan May wrote:
...
3) Better support for missing values.  The docstring mentions a way of
handling missing values by passing in a converter.  The problem with this is
that you have to pass in a converter for *every column* that will contain
missing values.  If you have a text file with 50 columns, writing this
dictionary of converters seems like ugly and needless boilerplate.  I'm
unsure of how best to pass in both what values indicate missing values and
what values to fill in their place.  I'd love suggestions.
Hi Ryan,
  this would be a great feature to have !!!
About missing values:

* I don't think missing values should be supported in np.loadtxt. That
should go into a specific np.ma.io.loadtxt function, a preview of
which I posted earlier. I'll modify it taking Ryan's new function into
account, and Christopher's suggestion (defining a dictionary {column
name : missing values}).

* StringConverter already defines some default filling values for each
dtype. In np.ma.io.loadtxt, these values can be overridden. Note
that you should also be able to define a filling value by specifying a
converter (think float(x or 0) for example).

* Missing values in space-separated fields are very tricky to handle:
take a line like "a,,,d". With a comma as separator, it's clear that
the 2nd and 3rd fields are missing.
Now, imagine that the commas are actually spaces ("a     d"): 'd' is now
seen as the 2nd field of a 2-field record, not as the 4th field of a
4-field record with 2 missing values. I thought about it, and kicked it
into touch.

* That said, there should be a way to deal with fixed-length fields,  
probably by taking consecutive slices of the initial string. That way,  
we should be able to keep track of missing data...
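
To make the points above concrete, here is a rough sketch in plain Python.
It is not the np.ma.io.loadtxt or StringConverter code; the helper names
fill_zero and split_fixed_width are made up for illustration, and the field
widths are an assumption about the file layout. A function like fill_zero is
the kind of thing one could pass through the converters argument of
np.loadtxt.

```python
def fill_zero(field):
    """Converter in the spirit of float(x or 0): an empty field becomes 0.0."""
    # Hypothetical helper, not part of NumPy.
    return float(field.strip() or 0)


def split_fixed_width(line, widths):
    """Cut `line` into consecutive slices of the given (assumed) widths."""
    # Hypothetical helper, not part of NumPy.
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


comma_line = "a,,,d"
space_line = comma_line.replace(",", " ")  # same record, spaces as separator

# With an explicit comma separator the empty fields survive the split.
comma_fields = comma_line.split(",")

# Splitting on runs of whitespace collapses them: the missing fields vanish.
space_fields = space_line.split()

# With fixed-length fields, slicing keeps the empty positions.
fixed_line = "a     d"  # assumed layout: three 2-character fields, then one 1-character field
fixed_fields = split_fixed_width(fixed_line, [2, 2, 2, 1])

print(comma_fields)
print(space_fields)
print(fixed_fields)
print([fill_zero(x) for x in ["1.5", "", "3"]])
```

The comma split and the fixed-width slices both give four fields with two
empty ones, while the whitespace split only ever sees two fields, which is
the ambiguity described above.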
...

Pierre GM