[Numpy-discussion] One-byte string dtype: third time's the charm?

Aldcroft, Thomas aldcroft at head.cfa.harvard.edu
Sun Feb 22 17:40:08 EST 2015


On Sun, Feb 22, 2015 at 2:52 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Sun, Feb 22, 2015 at 10:21 AM, Aldcroft, Thomas
> <aldcroft at head.cfa.harvard.edu> wrote:
> > The idea of a one-byte string dtype has been extensively discussed twice
> > before, with a lot of good input and ideas, but no action [1, 2].
> >
> > tl;dr: Perfect is the enemy of good.  Can numpy just add a one-byte
> > string dtype named 's' that uses latin-1 encoding as a bridge to enable
> > Python 3 usage in the near term?
>
> I think this is a good idea. I think overall it would be good for
> numpy to switch to using variable-length strings in most cases (cf.
> pandas), which is a different kind of change, but fixed-length 8-bit
> encoded text is obviously a common on-disk format in scientific
> applications, so numpy will still need some way to deal with it
> conveniently. In the long run we'd like to have more flexibility (e.g.
> allowing choice of character encoding), but since this proposal is a
> subset of that functionality, then it won't interfere with later
> improvements. I can see an argument for utf8 over latin1, but it
> really doesn't matter that much so whatever, blue and purple bikesheds
> are both fine.
>
> The tricky bit here is "just" :-). Do you want to implement this? Do
> you know someone who does? It's possible but will be somewhat
> annoying, since to do it directly without refactoring how dtypes work
> first then you'll have to add lots of copy-paste code to all the
> different ufuncs.
>

I would be happy to have a go at this, with the caveat that someone who
understands numpy would need to get me started with a minimal prototype.
From there I can do the "annoying" copy-paste for ufuncs etc., writing tests
and docs.  I'm assuming that once a prototype exists, the rest can be done
without any deep understanding of numpy internals (which I do not have).
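
For concreteness, here is a rough sketch of the workaround available today,
which is roughly what a one-byte 's' dtype would make unnecessary.  It uses
only the existing np.char.decode / np.char.encode functions; the sample
values are made up for illustration, and nothing here is the proposed dtype
itself (which does not exist).

```python
import numpy as np

# Fixed-width 8-bit text as it typically arrives from a binary file:
# dtype 'S5' stores one byte per character (illustrative values).
raw = np.array([b"alpha", b"caf\xe9"], dtype="S5")

# Under Python 3 the elements are bytes, not str.  Decoding as latin-1
# gives a 'U' array, which stores 4 bytes per character.
text = np.char.decode(raw, "latin-1")

# Encoding back with latin-1 restores the original bytes, because latin-1
# maps each of the 256 byte values to exactly one code point.
roundtrip = np.char.encode(text, "latin-1")

print(raw.dtype, raw.dtype.itemsize)
print(text.dtype, text.dtype.itemsize)
print((roundtrip == raw).all())
```

The decode step is where the fourfold memory cost comes from, and the
lossless round trip is the reason latin-1 works as a bridge encoding.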

- Tom


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>