<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 22, 2015 at 3:40 PM, Aldcroft, Thomas <span dir="ltr"><<a href="mailto:aldcroft@head.cfa.harvard.edu" target="_blank">aldcroft@head.cfa.harvard.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Sun, Feb 22, 2015 at 2:52 PM, Nathaniel Smith <span dir="ltr"><<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Sun, Feb 22, 2015 at 10:21 AM, Aldcroft, Thomas<br>

<<a href="mailto:aldcroft@head.cfa.harvard.edu" target="_blank">aldcroft@head.cfa.harvard.edu</a>> wrote:<br>

> The idea of a one-byte string dtype has been extensively discussed twice<br>

> before, with a lot of good input and ideas, but no action [1, 2].<br>

><br>

> tl;dr: Perfect is the enemy of good.  Can numpy just add a one-byte string<br>

> dtype named 's' that uses latin-1 encoding as a bridge to enable Python 3<br>

> usage in the near term?<br>

<br>

</span>I think this is a good idea. I think overall it would be good for<br>

numpy to switch to using variable-length strings in most cases (cf.<br>

pandas), which is a different kind of change, but fixed-length 8-bit<br>

encoded text is obviously a common on-disk format in scientific<br>

applications, so numpy will still need some way to deal with it<br>

conveniently. In the long run we'd like to have more flexibility (e.g.<br>

allowing choice of character encoding), but since this proposal is a<br>

subset of that functionality, it won't interfere with later<br>

improvements. I can see an argument for utf8 over latin1, but it<br>

really doesn't matter that much so whatever, blue and purple bikesheds<br>

are both fine.<br>

<br>

The tricky bit here is "just" :-). Do you want to implement this? Do<br>

you know someone who does? It's possible but will be somewhat<br>

annoying, since to do it directly without refactoring how dtypes work<br>

first, you'll have to add lots of copy-paste code to all the<br>

different ufuncs.<br></blockquote><div><br></div></span><div>I would be happy to have a go at this, with the caveat that someone who understands numpy would need to get me started with a minimal prototype.  From there I can do the "annoying" copy-paste for ufuncs etc, writing tests and docs.  I'm assuming that, with a prototype, the rest can be done without any deep understanding of numpy internals (which I do not have).</div><div><br></div><div>- Tom</div><span class=""><div> </div></span></div></div></div></blockquote><div><br></div><div>The last two new types added to numpy were float16 and datetime64. It might be worth looking at the steps needed to implement those. There was also a user type, `rational`, that got added and could also provide a template. Maybe we need to have a way to add 'numpy certified' user data types. It might also be possible to reuse the `c` data type, currently implemented as `S1` IIRC, but that could cause some problems. <br><br></div><div>Chuck <br></div><br></div></div></div>
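<div>For readers following the latin-1 versus utf8 point in the quoted message, here is a minimal sketch in plain Python (no numpy) of what a one-byte-per-character, fixed-width text field implies. The helper names `decode_fixed_width` and `encode_fixed_width` are hypothetical and are not part of numpy or of the proposal; NUL padding is assumed because that is how numpy's existing `S` dtype pads short values.</div>

```python
# Illustrative sketch only: plain Python bytes/str, not a numpy implementation.
# Both helper names are hypothetical.


def decode_fixed_width(field: bytes, encoding: str = "latin-1") -> str:
    """Decode one fixed-width 8-bit text field, dropping trailing NUL padding."""
    return field.rstrip(b"\x00").decode(encoding)


def encode_fixed_width(text: str, width: int, encoding: str = "latin-1") -> bytes:
    """Encode text into exactly `width` bytes, padding with NUL bytes."""
    raw = text.encode(encoding)
    if len(raw) > width:
        raise ValueError(f"{text!r} needs {len(raw)} bytes, field holds {width}")
    return raw.ljust(width, b"\x00")


# latin-1 maps every byte value 0..255 to exactly one character, so any
# 8-bit field decodes without error and round-trips byte for byte.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# One character is always one byte in latin-1, but not in utf-8.
word = "caf\xe9"
latin1_len = len(word.encode("latin-1"))  # 4
utf8_len = len(word.encode("utf-8"))  # 5
```

<div>The trade-off this is meant to show: with latin-1 a field of N bytes always holds N characters and never fails to decode, while utf-8 can represent any character at the cost of a variable number of bytes per character inside a fixed-width field.</div>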