[Numpy-discussion] GSOC 2013

Tue Mar 5 13:52:56 EST 2013

On 4 Mar 2013 23:21, "Jaime Fernández del Río" <jaime.frio at gmail.com> wrote:
>
> On Mon, Mar 4, 2013 at 2:29 PM, Todd <toddrjen at gmail.com> wrote:
>>
>>
>> 5. Currently dtypes are limited to a set of fixed types, or combinations
>> of these types.  You can't have, say, a 48 bit float or a 1-bit bool.  This
>> project would be to allow users to create entirely new, non-standard dtypes
>> based on simple rules, such as specifying the length of the sign, length of
>> the exponent, and length of the mantissa for a custom floating-point
>> number.  Hopefully this would mostly be used for reading in non-standard
>> data and not used that often, but for some situations it could be useful
>> for storing data too (such as large amounts of boolean data, or genetic
>> code which can be stored in 2 bits and is often very large).
>
>
> I second this general idea. Simply having a pair of packbits/unpackbits
> functions that could work with 2 and 4 bit uints would make my life easier.
> If it were possible to have an array of dtype 'uint4' that used half the
> space of a 'uint8', but could have ufuncs and the like run on it, it would
> be pure bliss. Not that I'm complaining, but a man can dream...

This would be quite difficult, since it would require reworking the guts of
the ndarray data structure to store strides and buffer offsets in bits
rather than bytes, and probably with endianness handling too. Indexing is
all done at the ndarray buffer-of-bytes layer, without any involvement of
the dtype.

Consider:

a = np.zeros(10, dtype=uint4)  # uint4 is hypothetical; NumPy has no such dtype
b = a[1::3]

Now b is a view onto a discontiguous set of half-bytes within a...
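
To make the problem concrete, here is a rough sketch of the same slicing with
uint8, which does exist. It shows that the view is described purely by a byte
offset and a byte stride. The uint4 arithmetic at the end is hypothetical (it
is plain Python, not a NumPy feature) and only illustrates why bytes would no
longer be enough.

```python
import numpy as np

# Same slicing as above, but with a dtype NumPy actually has.
a = np.zeros(10, dtype=np.uint8)
b = a[1::3]  # elements 1, 4, 7 of a

# Strides are expressed in bytes, and the view starts at a byte offset.
byte_offset = (b.__array_interface__["data"][0]
               - a.__array_interface__["data"][0])
print(a.strides, b.strides, byte_offset)

# Hypothetical 4-bit items: the same view would need a start offset and a
# stride that are not whole numbers of bytes.
uint4_bits = 4
offset_bits = 1 * uint4_bits   # start at element 1
stride_bits = 3 * uint4_bits   # step of 3 elements
print(offset_bits / 8, stride_bits / 8)  # fractions of a byte
```

With uint8 the offset is 1 byte and the stride is 3 bytes. With a 4-bit item
the same view would need an offset of half a byte and a stride of one and a
half bytes, which a byte-based offset and stride cannot express.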

You could have a dtype that represented several uint4s that together added
up to an integral number of bytes, sort of like a structured dtype. Or
packbits()/unpackbits(), like you say.
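
A minimal sketch of the packing approach, assuming two 4-bit values per uint8
with the first value in the high nibble. The helpers pack_uint4 and
unpack_uint4 are hypothetical names made up for illustration, not NumPy
functions; np.packbits and np.unpackbits are the real 1-bit equivalents.

```python
import numpy as np

def pack_uint4(values):
    """Pack values in 0..15 into uint8, two per byte (hypothetical helper)."""
    v = np.asarray(values)
    if ((v < 0) | (v > 15)).any():
        raise ValueError("values must fit in 4 bits")
    v = v.astype(np.uint8)
    if v.size % 2:
        v = np.append(v, np.uint8(0))  # pad to a whole number of bytes
    return (v[0::2] << 4) | v[1::2]    # first value goes in the high nibble

def unpack_uint4(packed, count):
    """Inverse of pack_uint4; count drops any padding nibble."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out[:count]

values = [1, 15, 0, 7, 9]
packed = pack_uint4(values)            # 3 bytes instead of 5
restored = unpack_uint4(packed, len(values))

# The existing 1-bit case: 8 flags stored in a single byte.
flags = np.array([1, 0, 1, 1, 0, 0, 0, 1], dtype=np.uint8)
flag_byte = np.packbits(flags)
flags_back = np.unpackbits(flag_byte)
```

The packed array is an ordinary uint8 array, so slicing and ufuncs work on
whole bytes only; anything per-element needs an unpack step first.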

-n