[Python-Dev] str() vs. unicode()

Guido van Rossum guido@python.org
Tue, 25 Sep 2001 15:57:09 -0400


> > Adding a slot is a bit painful now that there are so many
> > new slots already (adding it to the end means you have to add tons of
> > zeros, adding it to the middle means I have to edit every file).
> 
> 
> Hmm, what about a type object initialisation function that takes
> "named arguments" via varargs:
>     PyType_Initialize(&PyUnicode_Type,
>        TYPE_TYPE, &PyType_Type,
>        TYPE_NAME, "unicode",
>        SLOT_DESTRUCTOR, _PyUnicode_Free,
>        SLOT_CMP, unicode_compare,
>        SLOT_REPR, unicode_repr,
>        SLOT_SEQ, unicode_as_sequence,
>        SLOT_HASH, unicode_hash,
>        DONE
>     )
> 
> The SLOT_xxx arguments would be #defines like this:
> #define DONE 0
> #define TYPE_TYPE 1
> #define TYPE_NAME 2
> #define SLOT_DESTRUCTOR 3
> #define SLOT_CMP 4
> 
> Adding a new slot would require much less work: define a new slot
> *somewhere* in the struct, define a new SLOT_xxx and add
>     SLOT_xxx, foo_xxx
> to the call to the initializer for every type that implements this
> struct. Performance shouldn't be a problem, because this function
> would only be called once for every type. And we could get rid of
> the problem with static initialization of ob_type with some
> compilers.

Cool idea.  It would definitely be worth pursuing this when starting
from scratch.  Right now, it would only slow us down to convert all
the existing statically initialized types to use this mechanism.
Also, for some of the built-in types we'd have to decide at which
point in the initialization sequence to initialize them.
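
A minimal sketch of the varargs initializer proposed above, assuming a toy three-slot type struct. ToyType, toy_type_init and the destructor_fn/hash_fn typedefs are hypothetical names made up for illustration, and the slot ids are an enum whose values are our own choice; none of this is CPython API. Slots the caller does not name stay NULL, which is what would make adding a new slot cheap.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Slot identifiers modelled on the proposal's SLOT_xxx #defines (values are ours). */
enum {
    DONE = 0,
    TYPE_NAME,
    SLOT_DESTRUCTOR,
    SLOT_HASH
};

typedef void (*destructor_fn)(void *);
typedef long (*hash_fn)(void *);

/* Toy stand-in for a type object; any slot not passed in stays NULL. */
typedef struct {
    const char *tp_name;
    destructor_fn tp_dealloc;
    hash_fn tp_hash;
} ToyType;

/* Fill *t from (slot id, value) pairs terminated by DONE.
   Returns 0 on success, -1 on an unknown slot id. */
static int toy_type_init(ToyType *t, ...)
{
    va_list ap;
    int slot;

    assert(t != NULL);
    t->tp_name = NULL;
    t->tp_dealloc = NULL;
    t->tp_hash = NULL;

    va_start(ap, t);
    while ((slot = va_arg(ap, int)) != DONE) {
        switch (slot) {
        case TYPE_NAME:
            t->tp_name = va_arg(ap, char *);   /* pass a char *, e.g. a literal */
            break;
        case SLOT_DESTRUCTOR:
            t->tp_dealloc = va_arg(ap, destructor_fn);
            break;
        case SLOT_HASH:
            t->tp_hash = va_arg(ap, hash_fn);
            break;
        default:
            va_end(ap);
            return -1;                         /* unknown slot id */
        }
    }
    va_end(ap);
    return 0;
}
```

A call such as toy_type_init(&t, TYPE_NAME, "toy", SLOT_HASH, toy_hash, DONE), with toy_hash some long toy_hash(void *), fills two slots and leaves tp_dealloc NULL. One cost of this design compared with a static initializer: varargs carry no type information, so passing a value of the wrong type after a slot id is undefined behavior the compiler cannot catch.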

> > [...]
> > 
> > I would add __unicode__ support without tp_unicode right away.
> 
> I like this idea. There is no need to piggyback unicode
> representation of objects onto tp_str/__str__. Both PyObject_Str
> and PyObject_Unicode will get much simpler.
> 
> But we will need int.__unicode__, float.__unicode__ etc.
> (or fall back to __str__)

We should fall back to tp_str -- for most of these types there's never
a need to generate non-ASCII characters, so using the ASCII
representation and converting that to Unicode would work just fine.
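
A minimal sketch of that fallback, assuming a toy object layout. ToyObject, toy_object_unicode and int_str are hypothetical names, and this is not how CPython's PyObject_Unicode is actually written. If the optional unicode slot is NULL, the 8-bit output of the str slot is widened character by character; refusing bytes above 0x7F is an added assumption, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical object: a mandatory 8-bit str slot and an optional unicode slot. */
typedef struct ToyObject ToyObject;
struct ToyObject {
    /* Writes a NUL-terminated ASCII string into buf; returns its length. */
    size_t (*tp_str)(const ToyObject *self, char *buf, size_t size);
    /* Optional (may be NULL): writes a NUL-terminated wide string. */
    size_t (*tp_unicode)(const ToyObject *self, wchar_t *buf, size_t size);
    long value;
};

/* Store the unicode form of obj in out; return its length or (size_t)-1. */
static size_t toy_object_unicode(const ToyObject *obj, wchar_t *out, size_t size)
{
    char tmp[64];
    size_t n, i;

    if (obj->tp_unicode != NULL)
        return obj->tp_unicode(obj, out, size);

    /* Fallback: format with tp_str, then widen byte by byte. */
    n = obj->tp_str(obj, tmp, sizeof tmp);
    if (n + 1 > size)
        return (size_t)-1;                 /* out is too small */
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)tmp[i];
        if (c > 0x7F)
            return (size_t)-1;             /* not ASCII: don't guess an encoding */
        out[i] = (wchar_t)c;
    }
    out[n] = L'\0';
    return n;
}

/* Example str slot for an int-like object: ASCII digits only. */
static size_t int_str(const ToyObject *self, char *buf, size_t size)
{
    int n;

    assert(size > 0);
    n = snprintf(buf, size, "%ld", self->value);
    if (n < 0) {
        buf[0] = '\0';
        return 0;
    }
    /* snprintf reports the untruncated length; clamp to what was stored. */
    return (size_t)n < size ? (size_t)n : size - 1;
}
```

For an int-like object with value -42 and no unicode slot, toy_object_unicode goes through int_str and produces the wide string L"-42". A type that really can produce non-ASCII text would fill in tp_unicode instead of relying on the fallback.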

> BTW, what about __repr__? Should this be allowed to return unicode 
> objects? (currently it is, and uses PyUnicode_AsUnicodeEscapeString)

But this is rarely what the caller expects, and it violates the
guideline that repr() should return something that can be fed back to
the parser.  I'd rather change the rules to require that __repr__ and
tp_repr return an 8-bit string at all times.
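
A minimal sketch of the kind of escaping that turns a Unicode repr into an 8-bit string, in the spirit of the unicode-escape encoding mentioned above. It assumes the text arrives as an array of code points (uint32_t); toy_unicode_escape is a hypothetical name, and its output is not guaranteed to match PyUnicode_AsUnicodeEscapeString byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Escape code points s[0..n) into pure ASCII in out.
   Returns the length written (excluding NUL), or -1 if out is too small. */
static int toy_unicode_escape(const uint32_t *s, size_t n, char *out, size_t size)
{
    size_t pos = 0, i;

    assert(out != NULL);
    if (size == 0)
        return -1;
    for (i = 0; i < n; i++) {
        uint32_t c = s[i];
        char tmp[11];   /* longest escape: "\UXXXXXXXX" plus NUL */
        int len;

        if (c == '\\')
            len = snprintf(tmp, sizeof tmp, "\\\\");
        else if (c == '\n')
            len = snprintf(tmp, sizeof tmp, "\\n");
        else if (c == '\t')
            len = snprintf(tmp, sizeof tmp, "\\t");
        else if (c == '\r')
            len = snprintf(tmp, sizeof tmp, "\\r");
        else if (c >= 0x20 && c < 0x7F)
            len = snprintf(tmp, sizeof tmp, "%c", (int)c);   /* printable ASCII */
        else if (c < 0x100)
            len = snprintf(tmp, sizeof tmp, "\\x%02x", (unsigned)c);
        else if (c < 0x10000)
            len = snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)c);
        else
            len = snprintf(tmp, sizeof tmp, "\\U%08lx", (unsigned long)c);

        if (len < 0 || pos + (size_t)len + 1 > size)
            return -1;
        memcpy(out + pos, tmp, (size_t)len);
        pos += (size_t)len;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

The result contains only ASCII bytes, so it is always a valid 8-bit string; the catch is that a caller printing it sees escapes such as \u20ac rather than the characters themselves, which is rarely what they expect.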

--Guido van Rossum (home page: http://www.python.org/~guido/)