[Numpy-discussion] some question on new dtype

Travis Oliphant oliphant.travis at ieee.org
Tue Jan 24 23:34:02 EST 2006


N. Volbers wrote:

> Hello everyone on the list!
>
> I have been playing around with the latest and greatest numpy 0.94 and 
> its dtype mechanism. I am especially interested in using the 
> record-array-like facilities, e.g. the following which is based on an 
> example from a mail of Travis to this list:
>
> <--
> import numpy
>
> # define array with three "columns".
> dtype = numpy.dtype({'names': ['name', 'age', 'weight'],
>                      'formats': ['U30', 'i2', numpy.float32]})
> a = numpy.array([(u'Bill', 31, 260), (u'Fred', 15, 135)], dtype=dtype)
>
> # specify column by key
> print(a['name'])
> print(a['age'])
> print(a['weight'])
> #print(a['object'])
>
> # specify row by number
> print(a[0])
> print(a[1])
>
> # 'age' field of row 1 (Fred's age)
> print(a[1]['age'])
>
> # first item of name column (name 'Bill')
> print(a['name'][0])
> -->
>
> I now have a few questions, maybe someone can help me with them:
>
> 1) When reading the sample chapter from Travis' documentation, I 
> noticed that there is also a type 'object' with the character 'O'. So 
> I kind of hoped that it would be possible to have arbitrary python 
> objects in an array. However, when I add a fourth "column" of type 
> 'O', then numpy will mem-fault. Is this not allowed or is this some 
> implementation bug?

It's a bug if it seg-faults; that should be allowed.  Please post your 
code so we can track it down.
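For reference, a structured dtype with an object field is accepted in current NumPy releases. A minimal sketch assuming a modern NumPy, extending the example above with a hypothetical fourth field `extra` of type `'O'`; each element of that field holds a reference to an arbitrary Python object:

```python
import numpy as np

# The three columns from the example above plus a hypothetical object ('O') column.
dt = np.dtype({'names': ['name', 'age', 'weight', 'extra'],
               'formats': ['U30', 'i2', np.float32, 'O']})
a = np.array([('Bill', 31, 260, {'id': 1}),
              ('Fred', 15, 135, {'id': 2})], dtype=dt)

print(dt.hasobject)     # True: the dtype contains an object field
print(a['extra'][1])    # {'id': 2}
```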

>
> 2) Is it possible to rename the type descriptors? For my application, 
> I need to treat these names as keys of dataset columns, so it should 
> be possible to rename these.  More generally speaking: Is it possible 
> to alter parts of the dtype after instantiation? Of course it should 
> be possible to copy the dtype, modify it accordingly and create a new 
> array. However, maybe there is a suggested way of doing this?

Right now, you would have to construct a new data-type with the new 
field names.  There may be some ways we could make this easier, though.  
The big thing is that you can't change data-types in-place because they 
may be shared by other array objects.

Perhaps the easiest way to do this is to get the arrdescr attribute 
of the dtype (it's a list of tuples), construct a new list of tuples 
from it (replacing the tuples you want to rename), and then pass that 
to the dtype constructor:

Something like this:

descriptor = a.dtype.arrdescr   # this could probably be renamed to descr
                                # now that we're not using that word;
                                # a.__array_descr__ retrieves the same thing.

# field_num is the index of the field to rename, newname its new name.
field = descriptor[field_num]
descriptor[field_num] = (newname,) + field[1:]
newdtype = numpy.dtype(descriptor)

Then you can do:

a.dtype = newdtype

and have a new field name.  
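For readers on a current NumPy release: the same list of `(name, format)` tuples is available as `dtype.descr`. A minimal sketch of the recipe above under that assumption; `field_num` and the new name `'label'` are illustrative, and `a.view(newdtype)` is used in place of assigning to `a.dtype` so the original array keeps its field names:

```python
import numpy as np

# Same record layout as the example at the top of the thread.
dt = np.dtype({'names': ['name', 'age', 'weight'],
               'formats': ['U30', 'i2', np.float32]})
a = np.array([('Bill', 31, 260), ('Fred', 15, 135)], dtype=dt)

# dtype.descr returns a fresh list of (name, format) tuples, e.g. ('name', '<U30').
descriptor = a.dtype.descr
field_num = 0        # illustrative: index of the field to rename
newname = 'label'    # illustrative: the replacement name

field = descriptor[field_num]
descriptor[field_num] = (newname,) + field[1:]
newdtype = np.dtype(descriptor)

# Reinterpret the same memory with the renamed dtype; `a` itself is unchanged.
b = a.view(newdtype)
print(b.dtype.names)   # ('label', 'age', 'weight')
```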


Actually, it would probably be faster to do:

new = dict(a.dtype.fields)   # get a writeable dictionary.
new['<newname>'] = new['<oldname>']
del new['<oldname>']
del new[-1]                  # get rid of the special ordering entry
a.dtype = numpy.dtype(new)
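In current NumPy releases, `dtype.fields` is a read-only mapping from each field name to a `(dtype, offset)` tuple and carries no special `-1` ordering entry; field order comes from `dtype.names` instead. A minimal sketch of the same fields-based idea under those assumptions, rebuilding the dtype through the names/formats/offsets dictionary form so the byte layout is preserved; the helper `renamed()` is hypothetical:

```python
import numpy as np

dt = np.dtype({'names': ['name', 'age', 'weight'],
               'formats': ['U30', 'i2', np.float32]})
a = np.array([('Bill', 31, 260), ('Fred', 15, 135)], dtype=dt)

def renamed(dtype, old, new):
    """Hypothetical helper: return a copy of `dtype` with field `old` called `new`."""
    if old not in dtype.names or new in dtype.names:
        raise ValueError(f"cannot rename {old!r} to {new!r}")
    # dtype.fields[name] is (field dtype, byte offset); reuse both so the
    # renamed dtype has exactly the same memory layout, in dtype.names order.
    names = [new if n == old else n for n in dtype.names]
    formats = [dtype.fields[n][0] for n in dtype.names]
    offsets = [dtype.fields[n][1] for n in dtype.names]
    return np.dtype({'names': names, 'formats': formats,
                     'offsets': offsets, 'itemsize': dtype.itemsize})

b = a.view(renamed(a.dtype, 'age', 'years'))
print(b['years'])   # [31 15]
```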

>
> 3) When I use two identical entries in the names part of the dtype, I 
> get the message 'TypeError: expected a readable buffer object'. It 
> makes sense that it is not allowed to have two identical names, but I 
> think the error message should be worded more descriptively.

Yeah, we ought to check for this.  Could you post the code that raises 
this error?
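Until such a check exists, the duplicate-name case can be caught before the dtype constructor sees it. A minimal sketch; `make_record_dtype` is a hypothetical wrapper, not a NumPy function, and newer NumPy releases may already reject duplicate names themselves with their own message:

```python
from collections import Counter

import numpy as np

def make_record_dtype(names, formats):
    """Hypothetical wrapper: build a record dtype, rejecting duplicate names clearly."""
    dupes = [n for n, count in Counter(names).items() if count > 1]
    if dupes:
        raise ValueError(f"duplicate field name(s): {', '.join(dupes)}")
    return np.dtype({'names': list(names), 'formats': list(formats)})

dt = make_record_dtype(['name', 'age'], ['U30', 'i2'])
```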

>
> 4) In the example above, printing any of the strings via 'print' will 
> yield the characters followed by \x00 padding up to the string size,
> e.g.
>
>  u'Bill\x00\x00\x00\x00\x00\x00\x00.... (30 characters total)'
>
> Why doesn't 'print' terminate the output when the first \x00 is reached?


Because I don't know the Unicode equivalent to PyString_FromString 
(which is what I'm using to automatically truncate the fixed-field 
string before printing it).  UNICODE_getitem in arraytypes.inc.src is 
the relevant function; see STRING_getitem for comparison.

I think it would be better to truncate it, if anybody has a good 
suggestion for doing so...
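The truncation itself only needs the logical length of the fixed-width field. A minimal Python model of that step, assuming the padding consists of trailing NUL characters: scan backwards past the padding and slice. The helper names are hypothetical; on the C side the computed length could be passed to a length-taking constructor such as PyUnicode_FromUnicode instead of relying on a terminating NUL.

```python
def fixed_field_length(buf: str) -> int:
    """Hypothetical helper: length of `buf` with trailing NUL padding removed."""
    n = len(buf)
    # Walk backwards over the padding; stop at the last non-NUL character.
    while n > 0 and buf[n - 1] == '\x00':
        n -= 1
    return n

def unicode_getitem(buf: str) -> str:
    """Hypothetical model of the truncation step: slice to the logical length."""
    return buf[:fixed_field_length(buf)]

raw = 'Bill' + '\x00' * 26          # a 30-character fixed-width field
print(repr(unicode_getitem(raw)))   # 'Bill'
```

Scanning from the end strips only the padding, so a NUL embedded inside the text survives; PyString_FromString, by contrast, stops at the first NUL. Current NumPy releases return `'Bill'` for such a `'U30'` item.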

-Travis