Bytes vs. Unicode in Python3
Hi,

The Python 3 port needs some decisions on what is Bytes and what is Unicode.

I'm currently taking the following approach. Comments?

*** dtype field names

    Either Bytes or Unicode.
    But 'a' and b'a' are *different* fields.

    The issue is that:

        Python 2: {'a': 2}[u'a'] == 2, {u'a': 2}['a'] == 2
        Python 3: {'a': 2}[b'a'], {b'a': 2}['a'] both raise KeyError

    so the current assumption in the C code that u'a' == b'a' ceases to hold.

*** dtype titles

    If Bytes or Unicode, they work the same way as field names.

*** dtype format strings, datetime tuple, and any other "protocol" strings

    Bytes. The user can pass in Unicode, but it is converted using the UTF-8 codec.

    This will likely change the repr() of various objects. Acceptable?

--
Pauli Virtanen
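
For reference, the two behaviours described in the message can be sketched in plain Python 3. This is only an illustration, not the NumPy C code: `lookup` and `normalize_protocol_string` are hypothetical helper names, and the format string '<i4' is just an example value.

```python
# Python 3 only: str and bytes never compare equal, so 'a' and b'a'
# are two different dictionary keys.
fields = {'a': 2}


def lookup(mapping, key):
    """Return mapping[key], or None if the key is not present."""
    try:
        return mapping[key]
    except KeyError:
        return None


def normalize_protocol_string(value, encoding='utf-8'):
    """Hypothetical helper: accept str or bytes, always return bytes.

    Mirrors the proposal that "protocol" strings are Bytes and that
    Unicode input is converted with the UTF-8 codec.
    """
    if isinstance(value, str):
        return value.encode(encoding)
    return bytes(value)


str_hit = lookup(fields, 'a')      # the str key is found
bytes_hit = lookup(fields, b'a')   # the bytes key is a different key
fmt = normalize_protocol_string('<i4')
```

Under these assumptions, a str format string and its bytes equivalent normalize to the same bytes object, while field names stay distinct unless the caller converts them explicitly.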