[Numpy-discussion] `missing` argument in genfromtxt only a string?

Skipper Seabold jsseabold at gmail.com
Mon Sep 14 22:56:56 EDT 2009


On Mon, Sep 14, 2009 at 10:55 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Mon, Sep 14, 2009 at 10:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>
>> On Sep 14, 2009, at 10:31 PM, Skipper Seabold wrote:
>>>
>>> I actually figured out a workaround with converters, since my missing
>>> values are " ", "  ", "   ", i.e., an irregular number of spaces, and the
>>> values aren't stripped of whitespace.  I just define {# : lambda s:
>>> float(s.strip() or 0)}, and I have a loop build all of the converters,
>>> but then I have to go through and drop the ones that are supposed to
>>> be strings or dates, which is still pretty tedious, since I have a
>>> number of datasets that are like this, but they all contain different
>>> data in different orders and there's no (computer) logical order to it
>>> that I've discovered yet.
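
For reference, the converter workaround described above could be sketched
roughly as follows.  This is only an illustration: the helper names
(float_or_zero, build_converters) and the five-column layout are made up,
not taken from the actual datasets.

```python
def float_or_zero(field):
    """Convert a raw text field to float, treating blank fields as 0.0."""
    # strip() reduces " ", "  ", "   " to "", which is falsy, so `or` yields 0.
    return float(field.strip() or 0)


def build_converters(n_columns, skip=()):
    """Map each column index to float_or_zero, except the indices in `skip`."""
    skip = set(skip)
    return {col: float_or_zero for col in range(n_columns) if col not in skip}


# Hypothetical layout: 5 columns, where column 1 holds strings and column 3 dates.
converters = build_converters(5, skip=(1, 3))
```

The resulting dict is meant to be passed as converters=converters to
genfromtxt.  Listing the string and date columns in `skip` up front avoids
building converters for them and dropping them afterwards.
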
>>
>> I understand your frustration... We could think about some kind of
>> global default for the missing values...
>
> I'm not too frustrated, I'd just like to do this as few times as
> humanly (or machine-ly, rather) possible in the future...
>
> The main thing I'd like right now I think is for whitespace to be
> stripped, but maybe there is a good reason why it isn't.  I didn't realize
> this was the source of my confusion at first.  Also just being able to
> define missing as a number would be nice.  I started a patch for this,
> but I reverted when I realized I could make the converters as I did.
>
> While we're on the subject, the other thing on my wishlist (unless I
> just don't know how to do this) is being able to define a "column map"
> for datasets that have no delimiters.  At first each observation of my
> data was just one long string with no gaps or regular breaks but I
> knew which columns had what.  E.g., the first variable was (not
> zero-indexed) columns 1-6, the second columns 11-15, the third column
> 16, etc., so I would just say delimiter = [1:6,11:15,16,...].
>

Err, 1-6, 7-10, 11-15, 16...  I need some sleep.
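
For what it's worth, genfromtxt's `delimiter` argument is documented as
also accepting an integer or a sequence of integers giving field widths,
which may cover part of this.  Below is a minimal sketch of turning the
1-based column ranges into such widths.  The helper names and the sample
record are invented for illustration, and the sketch assumes the ranges
are contiguous (no skipped columns).

```python
def ranges_to_widths(ranges):
    """Turn 1-based inclusive (first, last) column ranges into field widths."""
    widths = []
    expected_start = 1
    for first, last in ranges:
        # Widths alone cannot express skipped columns, so refuse gaps/overlaps.
        if first != expected_start:
            raise ValueError(f"gap or overlap before column {first}")
        widths.append(last - first + 1)
        expected_start = last + 1
    return widths


def split_fixed(line, widths):
    """Cut one record into consecutive fields of the given widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


widths = ranges_to_widths([(1, 6), (7, 10), (11, 15), (16, 16)])
record = "ABCDEF1234 12.5X"  # made-up 16-character observation
fields = split_fixed(record, widths)
```

Here `widths` is the kind of list that could be handed to genfromtxt as
delimiter=widths; split_fixed only shows how a single record would be cut.
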

>>> I tried another workaround for the dates with my converters defined
>>> as conv
>>>
>>> conv.update({date : lambda s : datetime(*map(int,
>>> s.strip().split('/')[-1:]+s.strip().split('/')[:2]))})
>>>
>>> Where `date` is the column that contains a date.  The problem was that
>>> my dates are "mm/dd/yyyy" and datetime needs (yyyy, mm, dd).  It worked
>>> for a test case if my dates were "dd/mm/yyyy" and I just used reversed,
>>> but inside genfromtxt the lambda above gave an error about not finding
>>> the day in the third position, though it worked for a test case
>>> outside of genfromtxt.
>>
>> Check the archives of the mailing list, there's an example using
>> dateutil.parser that may be just what you need.
>>
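
As an alternative to reordering the split pieces by hand, a date converter
for "mm/dd/yyyy" fields could be sketched with the standard library's
datetime.strptime.  Note this is a stand-in, not the dateutil.parser example
from the archives, and parse_us_date is a made-up name.

```python
from datetime import datetime


def parse_us_date(field):
    """Parse an "mm/dd/yyyy" field, ignoring surrounding whitespace."""
    # The format string states the field order explicitly, so nothing
    # has to be rearranged before calling datetime.
    return datetime.strptime(field.strip(), "%m/%d/%Y")


parsed = parse_us_date(" 09/14/2009 ")
```

Such a function could then take the place of the lambda in
conv.update({date : parse_us_date}).
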
>
> Ah ok.  I looked for a bit, but I was sure I missed something.  Thanks.
>
> Skipper
>


