<br><br><div class="gmail_quote">On Fri, Dec 2, 2011 at 8:23 AM, Thouis (Ray) Jones <span dir="ltr"><<a href="mailto:thouis@gmail.com">thouis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">On Thu, Dec 1, 2011 at 17:39, Charles R Harris<br>

<<a href="mailto:charlesr.harris@gmail.com">charlesr.harris@gmail.com</a>> wrote:<br>

> Given that strings should be the result, this looks like a bug. It's a bit<br>

> of a corner case that probably slipped through during the recent work on<br>

> casting. There needs to be tests for these sorts of things, so if you find<br>

> more oddities post them so we can add them.<br>

<br>

</div>I'm happy to add a patch and tests, but could use some guidance...<br>

<br>

It looks like discover_itemsize() in core/src/multiarray/ctors.c<br>

should compute the length of the string or unicode representation of<br>

the object based on the eventual type, but looking at<br>

UNICODE_setitem() and STRING_setitem() in<br>

core/src/multiarray/arraytypes.c.src, this is not trivial.<br>

<br>

Perhaps the object-to-unicode/string parts of<br>

UNICODE_setitem/STRING_setitem can be extracted into separate<br>

functions that can be called from *_setitem as well as<br>

discover_itemsize.   discover_itemsize would also need to know the<br>

type it's discovering for (string or unicode or user-defined).<br>

<br></blockquote><div><br>After sleeping on this, I think an object array would be the better choice in this situation, since it wouldn't lose information. This might change the behavior of some functions, though, so it would need testing.<br>

<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Not sure what to do to handle user-defined types (error?).<br>

<br>

If that is too complicated, maybe discover_itemsize should return -1<br>

(or warn, but given the danger of truncation, that seems a bit weak)<br>

if asked to discover from data that doesn't have a length.  This would<br>

result in dtype=object when np.array is handed a mixed int/string<br>

list.<br>

<br>

I wonder, also, if STRING_setitem and UNICODE_setitem shouldn't emit a<br>

warning if asked to truncate data?<br>

</blockquote><div><br>I think a warning would be useful, but I don't use strings much, so input from a user might carry more weight.<br><br>Chuck <br></div><br></div>
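<div>For anyone following along, the two ideas discussed above (return -1 from discover_itemsize when an item has no length, so the result falls back to dtype=object, and warn when a string is truncated on assignment) can be sketched in pure Python. This is only an illustration of the proposed behavior, not NumPy's C code: the Python-level discover_itemsize, choose_kind and store_string below are hypothetical helpers written for this example.<br></div>

```python
import warnings


def discover_itemsize(obj):
    """Hypothetical Python analogue of the proposal (not NumPy's C function).

    Returns the longest str/bytes length found in a nested list/tuple,
    or -1 as soon as a leaf without a natural length (e.g. an int) is seen.
    """
    if isinstance(obj, (str, bytes)):
        return len(obj)
    if isinstance(obj, (list, tuple)):
        longest = 0
        for item in obj:
            size = discover_itemsize(item)
            if size < 0:
                return -1  # propagate "no length" upwards
            longest = max(longest, size)
        return longest
    return -1  # ints, floats, arbitrary objects: no length to discover


def choose_kind(obj):
    """Fall back to an object array when no item size can be discovered."""
    return "object" if discover_itemsize(obj) < 0 else "string"


def store_string(value, itemsize):
    """Sketch of a *_setitem that warns instead of truncating silently."""
    text = str(value)
    if len(text) > itemsize:
        warnings.warn(
            f"truncating {text!r} to {itemsize} characters",
            RuntimeWarning,
            stacklevel=2,
        )
    return text[:itemsize]
```

<div>With these helpers, choose_kind([1, "abc"]) picks "object" for a mixed int/string list, while a list containing only strings still gets a string type sized to its longest element. Whether the warning should be a RuntimeWarning or something else is an open choice here.<br></div>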