[Numpy-discussion] vacuum expansion of strings in record array

Mon Jun 6 04:13:08 EDT 2005

On Sun, 2005-06-05 at 11:09 -0400, Les Schaffer wrote:
> Todd Miller wrote:
> 
> >This is a subtle quirk that it would be nice to be without but not an
> >accident or bug.  It's an intentional feature since it conforms to the
> >FITS file format which motivated the development of records.py to begin
> >with.  I think there probably should be a less eclectic subclass of
> >RawCharArray,  but don't have the time to write it myself.
> >  
> >
> i take it this is what bit us:
> 
>     "When an element of a CharArray is fetched trailing whitespace is
>     stripped off. The sole exception to this rule is that a single
>     whitespace is never stripped down to the empty string."

Yep, that's it.
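
To make the rule concrete, here is a minimal pure-Python sketch of the
behavior quoted above. It is only an emulation and does not use numarray
itself: store() and fetch() are hypothetical helper names, and the slot
width of 8 is an arbitrary assumption.

```python
def store(value, width):
    """Pad (or truncate) value to a fixed-width slot, filling with spaces."""
    return value[:width].ljust(width)


def fetch(slot):
    """Strip trailing whitespace, but never all the way down to ''."""
    stripped = slot.rstrip()
    if stripped == "" and slot != "":
        # An all-whitespace slot comes back as one whitespace character.
        return slot[:1]
    return stripped


slot = store("", 8)      # the empty string is stored as eight spaces
value = fetch(slot)      # and comes back as a single space, not ''
```

So a stored '' and a stored ' ' cannot be told apart once they are
fetched, which is the surprise Les ran into.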

> 
> so the strings are all the same length in storage, and the empty string 
> '' expands to this fixed size WITH SPACES and then contracts down to a 
> single space when retrieved.  true inflation, something from the vacuum.
> 
> i am not familiar with the FITS file format to know why you would want 
> such creation-prone behavior: why not just fill the slots with \0's  
> (empty string '' ---> n*\0)? in any case,  i will take a look at what's 
> involved in basing a subclassed RecordArray on a variant of the raw char 
> array. is this stuff in C or in Python? 

Both.  What you're talking about should be doable in pure Python for
starters.
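
As a rough illustration of the NUL-padding idea Les suggests, the
following pure-Python sketch shows how such a variant might behave. It is
not the numarray implementation: store_nul() and fetch_nul() are
hypothetical names, and a real subclass would need to hook into the
array's own element get/set machinery.

```python
def store_nul(value, width):
    """Pad (or truncate) value to a fixed-width slot, filling with NULs."""
    return value[:width].ljust(width, "\0")


def fetch_nul(slot):
    """Strip only the trailing NUL padding; spaces are left alone."""
    return slot.rstrip("\0")


empty = fetch_nul(store_nul("", 8))         # '' survives the round trip
padded = fetch_nul(store_nul("ab  ", 8))    # trailing spaces survive too
```

The trade-off is that a value which itself ends in NUL characters would
lose them on retrieval, much as trailing spaces are lost under the
FITS-style rule.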

> a quick hint where to look would 
> help.

Lib/strings.py, Lib/records.py, Src/_chararraymodule.c

> 
> Record Arrays are nice for holding stuff from Excel tables where columns 
> are of similar type, with a column name up top. However, to grab Excel 
> data w/ Python requires COM which delivers everything as Unicode strings 
> which need to be encode()d before RecordArray accepts them. is there a 
> plan to include Unicode eventually?

There is no plan for adding Unicode that I'm aware of, but Perry might have
more to say about that.  Unicode has come up before and would be a nice
addition.
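
In the meantime, the encode-before-storing workaround Les describes might
look roughly like the sketch below. It is written for a modern Python in
which str is the Unicode type; encode_row() is a hypothetical helper, and
the choice of ASCII with "replace" error handling is an assumption, not
something the record array requires.

```python
def encode_row(row, encoding="ascii", errors="replace"):
    """Encode every str cell in a row to bytes; leave other cells untouched."""
    return [cell.encode(encoding, errors) if isinstance(cell, str) else cell
            for cell in row]


row = ["Name", "caf\u00e9", 3.5]   # e.g. one row of cells read from a sheet
encoded = encode_row(row)          # text cells become byte strings
```

Characters that the target encoding cannot represent are replaced with
'?' here, so pick an encoding that covers the data if that loss matters.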

Regards,
Todd