[Numpy-discussion] Review of issue 825

Wed Jun 25 17:09:41 EDT 2008

On Wed, Jun 25, 2008 at 10:49 AM, Neil Muller
<drnlmuller+scipy at gmail.com> wrote:

> On Wed, Jun 25, 2008 at 5:14 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> > OK, the problem in the UNICODE_{get,set}item routines is converting between
> > ucs4 and the encoding python is using, which may be ucs2.  But there is
> > something strange if sparc is using ucs4 (Py_UNICODE_WIDE) and the pointer
> > ip is aligned on two bytes instead of 4, that would seem to indicate a
> > problem further up the call chain. Could you check that that is actually
> > happening, i.e., ip is not 4 byte aligned and Py_UNICODE_WIDE is defined?
>
> You need to keep the test case in the 1st comment of the issue in mind
> here - the problem is extracting the unicode string for a dtype
> specified as (unsigned char, unicode string). This is allocated as 5
> bytes, and the string is not correctly aligned within these 5 bytes
> for access via a long pointer, as is needed for the current check in
> UNICODE_getitem to work.
>

UNICODE_getitem can be called from several places. Here it looks like print
is the caller. Can you check whether you can extract the string with an
explicit call? Something like

In [1]: desc = [ ('x', 'u1'), ('s', 'U2'), ]

In [2]: buffer = [ (5, 'cc'), (6, 'dd') ]

In [3]: ta = array(buffer, dtype(desc))

In [4]: ta[0]['s']
Out[4]: u'cc'
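
For anyone who wants to run this outside an interactive session, here is a
self-contained sketch of the same check. It assumes a current NumPy imported
as np (the session above relies on array and dtype already being in the
namespace), and the offset and itemsize printout is an addition for
illustration, not part of the original test case. With the packed layout
NumPy uses by default, the 's' field of this dtype starts at byte offset 1,
so it is not 4-byte aligned.

```python
import numpy as np

# Same record layout as in the session: one unsigned byte, then a
# two-character unicode string.
desc = [('x', 'u1'), ('s', 'U2')]
buffer = [(5, 'cc'), (6, 'dd')]
ta = np.array(buffer, dtype=np.dtype(desc))

dt = ta.dtype
# dt.fields maps a field name to a tuple whose first two entries are
# (field dtype, byte offset).
s_offset = dt.fields['s'][1]

print("itemsize:", dt.itemsize)
print("offset of 's':", s_offset)
print("'s' 4-byte aligned:", s_offset % 4 == 0)

# The explicit extraction, without going through print of the whole array.
value = ta[0]['s']
print(repr(value))
```

If the explicit extraction works on the affected machine while printing the
whole array fails, that would point at the caller rather than at the
conversion itself.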

Chuck