[Python-Dev] Unicode as argument for 8-bit format strings
M.-A. Lemburg
mal@lemburg.com
Fri, 07 Apr 2000 16:48:31 +0200
Guido van Rossum wrote:
>
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
>
> Makes sense. But note that it's going to be difficult to catch all
> cases: you could have
>
> '...%d...%s...%s...' % (3, "abc", u"abc")
>
> and
>
> '...%(foo)s...' % {'foo': u'abc'}
>
> and even
>
> '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'}
>
> (the latter should *not* convert to Unicode).
No problem... :-) It's a simple fix: once %s in an 8-bit string
sees a Unicode object, it will stop processing the string and
restart using the Unicode formatting algorithm.
This will cost performance, of course. Optimization is easy though:
add a small "u" in front of the string ;-)
A sample session:
>>> '...%(foo)s...' % {'foo':u"abc"}
u'...abc...'
>>> '...%(foo)s...' % {'foo':"abc"}
'...abc...'
>>> '...%(foo)s...' % {u'foo':"abc"}
'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc"}
u'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc",'def':123}
u'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc",u'def':123}
u'...abc...'
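
For illustration only, here is a rough sketch of the restart idea in
present-day Python. It is not the actual patch. It assumes that bytes
stands in for the 8-bit string type and str for Unicode, that mapping
keys are always str, and that only %s, %d and %% need handling. The
names format_8bit, _NeedUnicode and _to_text are hypothetical.

```python
import re

# Toy model (assumption): bytes plays the 8-bit string, str plays Unicode.
# Only %s, %d, %(name)s, %(name)d and %% are recognised.
_SPEC = re.compile(rb"%(?:\((?P<key>[^)]*)\))?(?P<conv>[sd%])")


class _NeedUnicode(Exception):
    """Raised when %s consumes a Unicode (str) argument."""


def _to_text(value):
    # Decode 8-bit values so the Unicode pass does not render b'...'.
    return value.decode("ascii") if isinstance(value, bytes) else value


def format_8bit(fmt, args):
    is_mapping = isinstance(args, dict)
    if not is_mapping and not isinstance(args, tuple):
        args = (args,)
    positional = iter(() if is_mapping else args)

    def substitute(match):
        conv = match.group("conv")
        if conv == b"%":
            return b"%"
        key = match.group("key")
        if key is not None:
            value = args[key.decode("ascii")]
        else:
            value = next(positional)
        if conv == b"d":
            return b"%d" % value
        if isinstance(value, str):
            # A Unicode object was actually consumed by %s:
            # abandon the 8-bit pass.
            raise _NeedUnicode
        if isinstance(value, bytes):
            return value
        return str(value).encode("ascii")

    try:
        return _SPEC.sub(substitute, fmt)
    except _NeedUnicode:
        # Restart from the beginning with the Unicode algorithm.
        if is_mapping:
            text_args = {k: _to_text(v) for k, v in args.items()}
        else:
            text_args = tuple(_to_text(v) for v in args)
        return fmt.decode("ascii") % text_args
```

Because the check happens only when %s actually consumes a value, an
unused Unicode entry in the mapping (Guido's 'bar': u'def' case) leaves
the result as an 8-bit string. The cost of the restart is that everything
formatted before the Unicode argument gets formatted a second time.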
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/