[Numpy-discussion] Enum type

Nathaniel Smith njs at pobox.com
Tue Jan 3 15:02:24 EST 2012


On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski <ognen at enthought.com> wrote:
> Hello,
>
> I am playing with adding an enum dtype to numpy (to get my feet wet in
> numpy really). I have looked at the
> https://github.com/martinling/numpy_quaternion and I feel comfortable
> with my understanding of adding a simple type to numpy in technical
> terms.

Hi Ognen,

I'm in the middle of an intercontinental move, so I can't help much,
but I'd also love to see a proper enum/categorical type in numpy, so
here are a few notes:

- I wrote a simple cython implementation of this last year, which
might be useful -- code attached.

- The barrier I ran into, which you'll surely run into as well, is a
flaw in the ufunc API in numpy. Currently, ufunc inner loops do not
have any way to access the dtype of the array they are being called
on. For most dtypes, this isn't an issue -- the inner loop for adding
together int32's knows that it is being called on an array of int32's,
it doesn't need to see the dtype to figure that out. But with enums,
each array has a different set of possible categories, and these will
be attached to the dtype object somehow. So if you want to do, say,
equality comparison between an enum-array and a string-array:
  np.enumarray(["a", "b", "c"]) == ["a", "c", "b"]
  -> np.array([True, False, False])
...you can't actually make this work in current numpy. The solution is
that the ufunc API needs to be changed to make dtypes somehow
available to inner loops. (Probably by passing a pointer to the array
object, like all the PyArray_ArrFuncs do.)

See this thread:
http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html
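
To make the problem concrete, here is a minimal sketch in plain Python
(not the attached cython code, and not a real numpy API). It assumes an
enum array is stored as integer codes plus a table of categories; the
names `codes`, `categories_a`, `categories_b` and `enum_equal_strings`
are made up for illustration. The point is that the comparison cannot
be computed from the codes alone, which is roughly the position a ufunc
inner loop is in when it cannot see the dtype.

```python
import numpy as np

# Hypothetical storage for an enum array: small integer codes, plus a
# category table that would hang off the dtype object in a real design.
codes = np.array([0, 1, 2], dtype=np.int8)

# Two different enum "dtypes" that share the same underlying codes.
categories_a = np.array(["a", "b", "c"])
categories_b = np.array(["a", "c", "b"])


def enum_equal_strings(codes, categories, strings):
    """Elementwise equality between an enum array and a list of strings.

    The category table is a required argument: without it the codes
    cannot be decoded, so the comparison has no defined answer.
    """
    decoded = categories[codes]  # look each code up in the table
    return decoded == np.asarray(strings)


# Same codes, same right-hand side, different category tables.
result_a = enum_equal_strings(codes, categories_a, ["a", "c", "b"])
result_b = enum_equal_strings(codes, categories_b, ["a", "c", "b"])

print(result_a)  # [ True False False]
print(result_b)  # [ True  True  True]
```

The two calls see identical codes and differ only in the category
table, so an inner loop that is handed just the raw data has no way to
tell them apart.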

- Both the statistical folk (pandas, statsmodels) and the hdf5 folk
(pytables, h5py) have reasons to want better enum support. (Maybe
there are other use cases too -- anyone I'm forgetting?) You should
make sure to talk to both groups to make sure what you come up with
will work for them.

Cheers,
-- Nathaniel

> I am mostly a C programmer and have programmed in Python but not at
> the level where my code would be considered "pretty" or maybe even
> "pythonic". I know enums from C and have browsed around a few python
> enum implementations online. Most of them use hash tables or lists to
> associate names to numbers - these approaches just feel "heavy" to me.
>
> What would be a proper "numpy approach" to this? I am looking mostly
> for direction and advice as I would like to do the work myself :-)
>
> Any input appreciated :-)
> Ognen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: npenum.pyx
Type: application/octet-stream
Size: 12481 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120103/ae2fa198/attachment.obj>

