[Numpy-discussion] genfromtxt - the return

Christopher Barker Chris.Barker at noaa.gov
Wed Oct 7 15:14:58 EDT 2009


Pierre GM wrote:
> On Oct 6, 2009, at 10:08 PM, Bruce Southey wrote:
>> option to merge delimiters - actually in SAS it is default

Wow! That sure strikes me as a bad choice.

> Ahah! I get it. Well, I remember that we discussed something like that a  
> few months ago when I started working on np.genfromtxt, and the  
> default of *not* merging whitespaces was requested. I gonna check  
> whether we can't put this option somewhere now...

I'd think you might want to have two options: either "whitespace", which 
would be any type or amount of whitespace, or a specific delimiter: say 
"\t" or " " or "  " (two spaces), etc. In the latter case, it would mean 
"one and only one of these".
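
A minimal sketch of those two modes in plain Python, just to make the 
distinction concrete. split_fields is a hypothetical helper for 
illustration, not anything in np.genfromtxt:

```python
def split_fields(line, delimiter=None):
    """Split one line of text into fields.

    delimiter=None  -> "whitespace" mode: any run of spaces/tabs is one separator.
    delimiter="..." -> exact mode: each occurrence separates exactly one field,
                       so two delimiters in a row mean an empty field.
    """
    return line.split(delimiter)


# Whitespace mode: the amount and kind of whitespace does not matter.
whitespace_fields = split_fields("1 \t  2   3")

# Exact mode with a tab: two tabs in a row enclose an empty field.
tab_fields = split_fields("1\t\t3", "\t")

# Exact mode with a two-space delimiter: four spaces are two delimiters.
two_space_fields = split_fields("1    3", "  ")

print(whitespace_fields, tab_fields, two_space_fields)
```

The exact mode keeps empty fields; the whitespace mode cannot represent 
them at all.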

Of course, this would fail in Bruce's example:

 >>>> A B C D
 >>>> 1 2 3 4
 >>>> 1     4 5

as there is a space for the delimiter, and one for the data! This looks 
like fixed-format to me. If it were single-space delimited, it would 
look more like:

A B C D E
1 2 3 4 5
1   4 5

which is the same as:

A, B, C, D, E
1, 2, 3, 4, 5
1,  ,  , 4, 5
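
To check that equivalence, here is a rough sketch in plain Python (the 
two helper names are made up for this example). It splits the last row 
of each version and compares the resulting fields:

```python
def fields_single_space(line):
    # Every single space is one delimiter, so consecutive spaces
    # produce empty fields between them.
    return line.split(" ")


def fields_comma(line):
    # Split on commas, then strip the padding spaces around each field.
    return [field.strip() for field in line.split(",")]


space_row = fields_single_space("1   4 5")
comma_row = fields_comma("1,  ,  , 4, 5")

# Both rows have five fields, with the second and third empty.
print(space_row)
print(comma_row)
```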


If something like SAS actually does merge delimiters, which I interpret 
to mean that if there are a few empty fields and you call for 
tab-delimited, you only get one tab, then information has simply been 
lost -- there is no way to recover it!
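
A small sketch of that loss, assuming "merging" means collapsing each 
run of delimiters into one. This is my reading of the option, not a 
description of what SAS actually does, and merge_delimiters is a 
hypothetical helper:

```python
import re


def merge_delimiters(line, delimiter="\t"):
    # Collapse every run of consecutive delimiters into a single one.
    return re.sub("(?:%s)+" % re.escape(delimiter), delimiter, line)


# Two different four-column rows: the empty field is in a different place.
row_a = "1\t\t3\t4"   # second field empty
row_b = "1\t3\t\t4"   # third field empty

merged_a = merge_delimiters(row_a)
merged_b = merge_delimiters(row_b)

# After merging, both rows are the same three-field line, so nothing
# downstream can tell which column was empty.
print(merged_a == merged_b)
```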

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov