[Numpy-discussion] Enum/Factor NEP (now with code)

Sun Jun 17 17:19:59 EDT 2012

On Sun, Jun 17, 2012 at 9:04 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Sun, Jun 17, 2012 at 6:10 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Wed, Jun 13, 2012 at 7:54 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>> It looks like the levels can only be strings. This is too limited for
>>> my needs. Why not support all possible NumPy dtypes? In pandas world,
>>> the levels can be any unique Index object
>>
>> It seems like there are three obvious options, from most to least general:
>>
>> 1) Allow levels to be an arbitrary collection of hashable Python objects
>> 2) Allow levels to be a homogeneous collection of objects of an
>> arbitrary numpy dtype
>> 3) Allow levels to be chosen from a few fixed types (strings and ints, I guess)
>>
>> I agree that (3) is a bit limiting. (1) is probably easier to
>> implement than (2). (2) is the most general, since of course
>> "arbitrary Python object" is a dtype. Is it useful to be able to
>> restrict levels to be of homogeneous type? The main difference between
>> dtypes and python types is that (most) dtype scalars can be unboxed --
>> is that substantively useful for levels?
[...]
> I'm in favor of option #2, a lite version of what I'm doing
> currently: I handle a few dtypes (PyObject, int64, datetime64,
> float64). You'd have to go the code-generation route for all
> the dtypes to keep yourself sane if you do that, though.

Why would you do code generation? dtypes already expose a generic API
for doing boxing/unboxing/etc. Are you thinking this would just be too
slow or...?
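
To make the question concrete, here is a minimal sketch (Python, not taken
from the NEP or from pandas) of a factorizer that relies only on the
dtype-generic machinery NumPy already provides. The helper name `factorize`
is hypothetical, and the sketch uses sort-based `np.unique` rather than a
hash table, so it says nothing about whether this route would be fast enough:

```python
import numpy as np


def factorize(values):
    """Encode `values` as integer codes into a sorted array of unique levels.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values)
    # np.unique dispatches on the array's dtype at runtime, so one code
    # path serves int64, float64, datetime64 and object arrays alike.
    levels, codes = np.unique(values, return_inverse=True)
    return levels, codes


for raw in (
    np.array([3, 1, 3, 2], dtype=np.int64),
    np.array([0.5, 0.25, 0.5], dtype=np.float64),
    np.array(["2012-06-17", "2012-06-13", "2012-06-17"], dtype="datetime64[D]"),
    np.array(["b", "a", "b"], dtype=object),
):
    levels, codes = factorize(raw)
    # Levels keep the input dtype; indexing them with the codes round-trips.
    assert levels.dtype == raw.dtype
    assert (levels[codes] == raw).all()
    # ndarray.item() boxes one level into a plain Python object via the dtype.
    print(raw.dtype, codes, type(levels.item(0)).__name__)
```

The same function body handles all four dtypes with no per-dtype
specialization; the open question is only what that generality costs.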

-N