[Numpy-discussion] `missing` argument in genfromtxt only a string?

Skipper Seabold jsseabold at gmail.com
Mon Sep 14 22:55:23 EDT 2009


On Mon, Sep 14, 2009 at 10:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>
> On Sep 14, 2009, at 10:31 PM, Skipper Seabold wrote:
>>
>> I actually figured out a workaround with converters, since my missing
>> values are " ", "  ", "   ", i.e., an irregular number of spaces, and the
>> values aren't stripped of whitespace.  I just define {# : lambda s:
>> float(s.strip() or 0)}, and I have a loop build all of the converters.
>> But then I have to go through and drop the ones that are supposed to
>> be strings or dates, which is still pretty tedious: I have a number of
>> datasets like this, but they all contain different data in different
>> orders, and there's no (computer) logical order to it that I've
>> discovered yet.
>
> I understand your frustration... We could think about some kind of
> global default for the missing values...

I'm not too frustrated, I'd just like to do this as few times as
humanly (or machine-ly, rather) possible in the future...

The main thing I'd like right now, I think, is for whitespace to be
stripped, but maybe there is a good reason it isn't.  I didn't realize
this was the source of my confusion at first.  Also, just being able to
define missing as a number would be nice.  I started a patch for this,
but I reverted it when I realized I could make the converters as I did.
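
A minimal sketch of the converter workaround described above, assuming a
NumPy recent enough that genfromtxt accepts an `encoding` argument (so
converters receive `str`).  The sample data, the column indices and the
name `blank_to_zero` are made up for illustration.

```python
import io
import numpy as np

def blank_to_zero(field):
    # Strip the field; an all-whitespace or empty field becomes 0.0.
    return float(field.strip() or 0)

# Hypothetical layout: column 0 is a string, columns 1 and 2 are floats
# that may be blank, column 3 is a plain integer.
numeric_cols = [1, 2]

# Build converters only for the numeric columns, so string (or date)
# columns never need to be dropped from the dict afterwards.
converters = {col: blank_to_zero for col in numeric_cols}

sample = io.StringIO(
    "abc,1.5, ,7\n"
    "xyz,  ,2.0,8\n"
)

table = np.genfromtxt(
    sample,
    delimiter=",",
    dtype=None,          # let genfromtxt guess the remaining column types
    converters=converters,
    encoding="utf-8",
)
print(table["f1"], table["f2"])
```

Listing the numeric columns explicitly still has to be done per dataset,
which is the tedious part; the loop only removes the need to prune
afterwards.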

While we're on the subject, the other thing on my wishlist (unless I
just don't know how to do this) is being able to define a "column map"
for datasets that have no delimiters.  At first each observation of my
data was just one long string with no gaps or regular breaks, but I
knew which columns had what.  E.g., the first variable was (not
zero-indexed) columns 1-6, the second columns 11-15, the third column
16, etc.  So I would just say delimiter = [1:6,11:15,16,...].
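
One way to approximate such a column map, assuming a NumPy whose
genfromtxt accepts a sequence of integer field widths as `delimiter`:
describe every stretch of the line as a field, gaps included, and keep
only the wanted fields with `usecols`.  The sample records and the name
`field_widths` are made up for illustration.

```python
import io
import numpy as np

# Hypothetical fixed-width records: columns 1-6, 11-15 and 16 hold data
# (1-based, as in the message); columns 7-10 are filler to be ignored.
sample = io.StringIO(
    "000123....456781\n"
    "000456....123452\n"
)

# Widths cover the whole line, gap included: 6 + 4 + 5 + 1 = 16 characters.
field_widths = (6, 4, 5, 1)

table = np.genfromtxt(
    sample,
    delimiter=field_widths,  # a sequence of ints is read as field widths
    usecols=(0, 2, 3),       # skip field 1, which is the filler
    dtype=float,
)
print(table)
```

This is widths rather than start:stop ranges, so the ranges have to be
converted by hand (or by a small helper) before the call.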

>> I tried another workaround for the dates with my converters defined
>> as conv
>>
>> conv.update({date : lambda s : datetime(*map(int,
>> s.strip().split('/')[-1:]+s.strip().split('/')[:2]))})
>>
>> Where `date` is the column that contains a date.  The problem was that
>> my dates are "mm/dd/yyyy" and datetime needs (yyyy, mm, dd).  It worked
>> for a test case where my dates were "dd/mm/yyyy" and I just used
>> reversed, but it gave an error about not finding the day in the third
>> position, even though that lambda function worked for a test case
>> outside of genfromtxt.
>
> Check the archives of the mailing list, there's an example using
> dateutil.parser that may be just what you need.
>

Ah ok.  I looked for a bit, but I was sure I missed something.  Thanks.
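
For reference, a sketch of a date converter for "mm/dd/yyyy" fields.
This is not the dateutil.parser example from the archives; it uses the
standard library's datetime.strptime instead, and assumes a NumPy recent
enough that genfromtxt accepts `encoding`.  The sample data, the field
names and `parse_mdy` are made up for illustration.

```python
import io
from datetime import datetime
import numpy as np

def parse_mdy(field):
    # Strip surrounding whitespace, then parse month/day/four-digit year.
    return datetime.strptime(field.strip(), "%m/%d/%Y")

sample = io.StringIO(
    "1, 09/14/2009\n"
    "2, 12/31/2008\n"
)

table = np.genfromtxt(
    sample,
    delimiter=",",
    # The date column is stored as Python objects (datetime instances).
    dtype=[("idx", int), ("when", object)],
    converters={1: parse_mdy},
    encoding="utf-8",
)
print(table["when"])
```

Declaring the date column as `object` keeps the datetime instances as
they are instead of asking genfromtxt to guess a type for them.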

Skipper


