[Numpy-discussion] One-byte string dtype: third time's the charm?

Robert Kern robert.kern at gmail.com
Sun Feb 22 15:04:17 EST 2015


On Sun, Feb 22, 2015 at 7:29 PM, Sturla Molden <sturla.molden at gmail.com>
wrote:
>
> On 22/02/15 19:21, Aldcroft, Thomas wrote:
>
> > Problems like this are now showing up in the wild [3].  Workarounds are
> > also showing up, like a way to easily convert from 'S' to 'U' within
> > astropy Tables [4], but this is really not a desirable way to go.
> > Gigabyte-sized string data arrays are not uncommon, so converting to
> > UCS-4 is a real memory and performance hit.
>
> Why UCS-4? Python's internal "flexible string representation" will
> use ASCII for ASCII text.

numpy's 'U' dtype is UCS-4, and that is what Thomas is referring to, not
Python's string type. The dtype cannot have a flexible representation
because it *is* the representation. Python 3's `str` type is opaque, so it
can freely choose how to represent the data in memory. numpy dtypes, by
contrast, transparently describe how the data is represented in memory.
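
To make the difference concrete, here is a small sketch (the sample words
and variable names are made up for illustration). It compares the fixed
per-element size of the 'S' and 'U' dtypes, then checks how a Python `str`
grows per ASCII character. The `sys.getsizeof` part relies on a CPython
implementation detail (the compact string layout in 3.3 and later), so
treat it as an observation rather than a guarantee.

```python
import sys

import numpy as np

words = ["spam", "eggs", "ham"]  # illustrative ASCII-only data

# 'S4': four bytes per element, one byte per character.
as_bytes = np.array(words, dtype="S4")
# 'U4': four UCS-4 code units per element, four bytes each.
as_unicode = np.array(words, dtype="U4")

print(as_bytes.dtype.itemsize, as_bytes.nbytes)
print(as_unicode.dtype.itemsize, as_unicode.nbytes)

# Python's str is opaque: CPython may store ASCII text one byte per
# character, so 1000 extra ASCII characters cost about 1000 extra bytes.
ascii_growth = sys.getsizeof("a" * 1001) - sys.getsizeof("a")
print(ascii_growth)
```

The 'U' array takes four times the memory of the 'S' array for the same
ASCII content, no matter which characters are stored, because the dtype
fixes the layout up front.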

--
Robert Kern