vacuum expansion of strings in record array
is this a feature or a bug? see snippet below, i would like the empty string to stay empty, not grow to an empty space from nothing.

thnx

les schaffer

code:

    import numarray.records as rec

    names = ['col1', 'col2']

    d1 = [ ['', 'hello'], ['', 'world']]
    d2 = [ ['hh', 'hello'], ['', 'world']]
    print d1, d2

    recarray1 = rec.array(d1, names=names)  # aligned=0,1 made no diff
    recarray2 = rec.array(d2, names=names)  # aligned=0,1 made no diff
    print recarray1
    print recarray2

output:

    [['', 'hello'], ['', 'world']] [['hh', 'hello'], ['', 'world']]
    RecArray[
    ('', 'hello'),
    ('', 'world')
    ]
    RecArray[
    ('hh', 'hello'),
    (' ', 'world')
      ^
      |____ vacuum expansion
    ]
On Sat, 2005-06-04 at 14:56 -0400, Les Schaffer wrote:
is this a feature or a bug? see snippet below, i would like the empty string to stay empty, not grow to an empty space from nothing.
This is a subtle quirk that it would be nice to be without, but it is not an accident or a bug. It's an intentional feature, since it conforms to the FITS file format, which motivated the development of records.py to begin with. I think there probably should be a less eclectic subclass of RawCharArray, but I don't have the time to write it myself.

Regards,
Todd
Todd Miller wrote:
This is a subtle quirk that it would be nice to be without, but it is not an accident or a bug. It's an intentional feature, since it conforms to the FITS file format, which motivated the development of records.py to begin with. I think there probably should be a less eclectic subclass of RawCharArray, but I don't have the time to write it myself.
i take it this is what bit us:

"When an element of a CharArray is fetched trailing whitespace is stripped off. The sole exception to this rule is that a single whitespace is never stripped down to the empty string."

so the strings are all the same length in storage, and the empty string '' expands to this fixed size WITH SPACES and then contracts down to a single space when retrieved. true inflation, something from the vacuum.

i am not familiar enough with the FITS file format to know why you would want such creation-prone behavior: why not just fill the slots with \0's (empty string '' ---> n*\0)? in any case, i will take a look at what's involved in basing a subclassed RecordArray on a variant of the raw char array. is this stuff in C or in Python? a quick hint where to look would help.

Record Arrays are nice for holding stuff from Excel tables where columns are of similar type, with a column name up top. However, grabbing Excel data w/ Python requires COM, which delivers everything as Unicode strings that need to be encode()d before RecordArray accepts them. is there a plan to include Unicode eventually?

Les Schaffer
On Sun, 2005-06-05 at 11:09 -0400, Les Schaffer wrote:
Todd Miller wrote:
This is a subtle quirk that it would be nice to be without, but it is not an accident or a bug. It's an intentional feature, since it conforms to the FITS file format, which motivated the development of records.py to begin with. I think there probably should be a less eclectic subclass of RawCharArray, but I don't have the time to write it myself.
i take it this is what bit us:
"When an element of a CharArray is fetched trailing whitespace is stripped off. The sole exception to this rule is that a single whitespace is never stripped down to the empty string."
Yep, that's it.
so the strings are all the same length in storage, and the empty string '' expands to this fixed size WITH SPACES and then contracts down to a single space when retrieved. true inflation, something from the vacuum.
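The store-then-fetch behavior described above can be imitated in a few lines of plain Python. This is only a rough simulation of the quoted rule, not numarray's code: `store` and `fetch` are hypothetical helpers, and taking the column width from the longest entry is an assumption made for illustration.

```python
# Illustrative simulation only. store() and fetch() are hypothetical
# helpers and are not part of numarray.

def store(value, width):
    """Pad a string with trailing spaces to the fixed item width."""
    if len(value) > width:
        raise ValueError("value longer than item width")
    return value.ljust(width)

def fetch(raw):
    """Strip trailing whitespace, but never reduce a whitespace-only
    item to the empty string; one space is left behind."""
    stripped = raw.rstrip()
    if stripped == "" and raw != "":
        return " "
    return stripped

column = ["hh", ""]
width = max(len(s) for s in column)         # assumed: width of longest entry
stored = [store(s, width) for s in column]  # every item now has length 2
fetched = [fetch(raw) for raw in stored]
print(fetched)                              # ['hh', ' ']
```

Under the same assumption, a column holding only empty strings has width 0, so nothing is padded and '' comes back as '', which would be consistent with the first RecArray in the output.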
i am not familiar enough with the FITS file format to know why you would want such creation-prone behavior: why not just fill the slots with \0's (empty string '' ---> n*\0)? in any case, i will take a look at what's involved in basing a subclassed RecordArray on a variant of the raw char array. is this stuff in C or in Python?
Both. What you're talking about should be doable in pure Python for starters.
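For comparison, the \0-fill idea can be sketched in pure Python as well. Again this is a hypothetical illustration, not numarray code and not a RawCharArray subclass: `store_nul` and `fetch_nul` are made-up names, and the fixed width of 2 is chosen only for the example.

```python
# Hypothetical sketch of NUL-padded fixed-width storage; not numarray code.

def store_nul(value, width):
    """Pad a string with NUL characters to the fixed item width."""
    if len(value) > width:
        raise ValueError("value longer than item width")
    return value + "\0" * (width - len(value))

def fetch_nul(raw):
    """Strip only the NUL padding, so spaces in the data survive."""
    return raw.rstrip("\0")

column = ["hh", "", "a "]
stored = [store_nul(s, 2) for s in column]
fetched = [fetch_nul(raw) for raw in stored]
print(fetched)  # ['hh', '', 'a ']
```

With this padding the empty string round-trips as '' and trailing spaces are kept. The trade-off is that a value that itself ends in \0 would lose those characters on fetch.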
a quick hint where to look would help.
Lib/strings.py, Lib/records.py, Src/_chararraymodule.c
Record Arrays are nice for holding stuff from Excel tables where columns are of similar type, with a column name up top. However, grabbing Excel data w/ Python requires COM, which delivers everything as Unicode strings that need to be encode()d before RecordArray accepts them. is there a plan to include Unicode eventually?
There is no plan for adding Unicode that I'm aware of, but Perry might have more to say about that. Unicode has come up before and would be a nice addition.

Regards,
Todd
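Until something like that exists, the encode step mentioned above can be done once over the whole table before it is handed to a byte-string container. The sketch below is an assumption-laden illustration: `encode_rows` is a hypothetical helper, it is written for Python 3 (where `str` is the Unicode type, unlike the Python 2 of this thread), and the ASCII encoding with `'replace'` error handling is just one possible choice.

```python
# Hypothetical helper; the encoding and error policy are assumptions.

def encode_rows(rows, encoding="ascii", errors="replace"):
    """Encode every Unicode cell in a table of rows to bytes,
    leaving cells of any other type untouched."""
    return [
        [cell.encode(encoding, errors) if isinstance(cell, str) else cell
         for cell in row]
        for row in rows
    ]

rows = [["caf\u00e9", "hello"], ["", "world"]]
encoded = encode_rows(rows)
print(encoded)  # [[b'caf?', b'hello'], [b'', b'world']]
```

Characters outside the target encoding are replaced with '?' here; a lossless encoding such as UTF-8 could be used instead if the consumer can cope with multi-byte characters in fixed-width fields.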
participants (2): Les Schaffer, Todd Miller