[Python-Dev] Unicode proposal: %-formatting ?

Tim Peters tim_one@email.msn.com
Wed, 17 Nov 1999 02:33:06 -0500


[MAL]
> ...
> This means a new PyUnicode_Format() implementation mapping
> Unicode format objects to Unicode objects.

It's a bitch, isn't it <0.5 wink>?  I hope they're paying you a lot for
this!

> ... hmm, there is a problem there: how should the PyUnicode_Format()
> API deal with '%s' when it sees a Unicode object as argument ?

Anything other than taking the Unicode characters as-is would be
incomprehensible.  I mean, it's a Unicode format string sucking up Unicode
strings -- what else could possibly make *sense*?

> E.g. what would you get in these cases:
>
> u = u"%s %s" % (u"abc", "abc")

That u"abc" gets substituted as-is seems screamingly necessary to me.
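
A minimal sketch of the as-is case, written in present-day Python 3 syntax
as an assumption (there every str is already Unicode, so it shows only the
uncontroversial half of MAL's example and is not the proposed
PyUnicode_Format() implementation):

```python
# Assumption: Python 3.3+, where the u"" prefix is accepted and str is Unicode.
fmt = u"%s %s"
result = fmt % (u"abc", u"d\u00e9f")

# Unicode arguments are copied into the result character for character,
# with no encoding or decoding step in between.
assert result == u"abc d\u00e9f"
print(result)
```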

I'm more baffled about what "abc" should do.  I didn't understand the t#/s#
etc arguments, and how those do or don't relate to what str() does.  On the
face of it, the idea that a gazillion and one distinct encodings all get
lumped into "a string object" without remembering their nature makes about
as much sense as if Python were to treat all instances of all user-defined
classes as being of a single InstanceType type <wink> -- except in the
latter case you at least get a __class__ attribute to find your way home
again.

As an ignorant user, I would hope that

    u"%s" % string

had enough sense to know what string's encoding is all on its own, and
promote it correctly to Unicode by magic.
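
The catch with doing this "by magic" is that a byte string carries no record
of its encoding, so the promotion step has to be told one or has to guess.
A rough sketch in Python 3 terms, where promote() and its encoding parameter
are hypothetical names made up for illustration and not part of any
proposed API:

```python
def promote(value, encoding="ascii"):
    """Hypothetical helper: turn a %s argument into Unicode text.

    Text passes through unchanged. Bytes must be decoded, and the caller
    has to supply the encoding because the bytes do not record it.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


raw = b"caf\xe9"  # the word "cafe" with an acute accent, encoded as Latin-1

# Decoding with the encoding the bytes were written in works.
as_latin1 = u"%s" % promote(raw, "latin-1")

# The very same bytes are not valid UTF-8, so a wrong guess fails.
try:
    as_utf8 = u"%s" % promote(raw, "utf-8")
except UnicodeDecodeError:
    as_utf8 = None
```

The same byte sequence is fine under one encoding and an error under
another, which is why the format operator cannot promote it without outside
help.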

> Perhaps we need a new marker for "insert Unicode object here".

%s means string, and at this level a Unicode object *is* "a string".  If
this isn't obvious, it's likely because we're too clever about what
non-Unicode string objects do in this context.