[Numpy-discussion] `missing` argument in genfromtxt only a string?
Pierre GM
pgmdevlist at gmail.com
Mon Sep 14 21:59:51 EDT 2009
On Sep 13, 2009, at 3:51 PM, Skipper Seabold wrote:
> On Sun, Sep 13, 2009 at 1:29 PM, Skipper Seabold
> <jsseabold at gmail.com> wrote:
>> Is there a reason that the missing argument in genfromtxt only
>> takes a string?
Because we check strings. Note that you can specify several values
at once, provided they're separated by commas, like missing="0,nan,n/a".
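For readers on a later NumPy release, here is a minimal sketch of filling blanks with 0. It assumes the newer `missing_values` and `filling_values` keywords rather than the `missing` argument discussed in this thread, and the sample data is made up for illustration.

```python
import io

import numpy as np

# Made-up sample: blank fields and "n/a" both stand for missing observations.
text = io.StringIO("1,,3\n4,n/a,\n,8,9\n")

data = np.genfromtxt(
    text,
    delimiter=",",
    dtype=int,
    missing_values="n/a",  # comma-separated list of strings treated as missing
    filling_values=0,      # value substituted for every missing entry
)
print(data)
```

A single scalar for `filling_values` applies to every column, so no per-column converter is needed.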
>> For instance, I have a dataset that in most columns has a zero for
>> some observations but in others it was just left blank, which is the
>> equivalent of zero. I would like to set all of the missing to 0 (it
>> defaults to -1 now) when loading in the data. I suppose I could do
>> this with a converter, but I have too many columns for this.
OK, I see. Gonna try to find some fix.
> All of the missing values in the second observation are now -1. Also,
> I'm having trouble defining a converter for my dates.
>
> I have the function
>
> from datetime import datetime
>
> def str2date(date):
>     day,month,year = date.strip().split('/')
>     return datetime(*map(int, [year, month, day]))
>
> conv = {1 : lambda s: str2date(s)}
> s.seek(0)
> data = np.genfromtxt(s, dtype=None, delimiter=",", names=None,
> converters=conv)
OK, I see the problem...
When no dtype is defined, we try to guess what a converter should
return by testing its inputs. At first we check whether the input is a
boolean, then whether it's an integer, then a float, and so on. When
you explicitly define a converter, there's no need for all those
checks, so we lock the converter to a particular state, which sets the
conversion function and the value to return in case of missing data.
Except that I messed it up and it fails in that case (the conversion
function is set properly, but the dtype of the output is still
undefined). That's a bug; I'll try to fix it once I've tamed my snow
kitten.
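The guessing order described above can be sketched roughly as follows. This is an illustrative stand-in, not genfromtxt's actual implementation: `guess_type` and `to_bool` are hypothetical names, and only the bool, int, float sequence with a string fallback is modeled.

```python
def to_bool(text):
    """Parse 'true'/'false' (case-insensitive); reject anything else."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(text)


def guess_type(values):
    """Return the name of the first candidate type that parses every value."""
    candidates = [("bool", to_bool), ("int", int), ("float", float)]
    for name, func in candidates:
        try:
            for value in values:
                func(value)
        except ValueError:
            continue  # this candidate failed, try the next, more general one
        return name
    return "str"  # nothing else fits, so keep the raw strings


print(guess_type(["1", "2"]))       # all values parse as int
print(guess_type(["1", "2.5"]))     # int fails on "2.5", float succeeds
print(guess_type(["13/09/2009"]))   # no numeric type fits
```

A user-supplied converter skips this escalation entirely, which is why the output type has to be fixed some other way when the converter is locked.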
Meanwhile, you can use tsfromtxt (in scikits.timeseries), or even
simpler, define a dtype for the output (you know that your first
column is a str, your second an object, and the others ints or floats...).
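A minimal sketch of the explicit-dtype workaround, assuming a recent NumPy. The field names, the sample rows, and the `encoding` argument (a later addition that makes converters receive str) are illustrative assumptions, not part of the original report.

```python
import io
from datetime import datetime

import numpy as np


def str2date(text):
    """Convert 'dd/mm/yyyy' into a datetime object."""
    day, month, year = text.strip().split("/")
    return datetime(int(year), int(month), int(day))


# Made-up sample: a string, a date, an int and a float per row.
raw = io.StringIO("a,13/09/2009,1,2.5\nb,14/09/2009,2,3.5\n")

# Hypothetical field names; the date column is stored as a Python object.
dtype = [("name", "U8"), ("date", object), ("count", int), ("value", float)]

data = np.genfromtxt(
    raw,
    dtype=dtype,
    delimiter=",",
    converters={1: str2date},  # column 1 holds the dates
    encoding="utf-8",          # pass str, not bytes, to the converter
)
print(data["date"])
```

With the dtype spelled out, nothing has to be guessed from the converter's output.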