[Python-Dev] Unicode as argument for 8-bit format strings

M.-A. Lemburg mal@lemburg.com
Fri, 07 Apr 2000 16:48:31 +0200


Guido van Rossum wrote:
> 
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
> 
> Makes sense.  But note that it's going to be difficult to catch all
> cases: you could have
> 
> '...%d...%s...%s...' % (3, "abc", u"abc")
> 
> and
> 
> '...%(foo)s...' % {'foo': u'abc'}
> 
> and even
> 
> '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'}
> 
> (the latter should *not* convert to Unicode).

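No problem... :-) It's a simple fix: once a %s conversion in an 8-bit
format string encounters a Unicode object, formatting stops and
restarts from the beginning using the Unicode formatting algorithm.
Since only values actually consumed by %s trigger the switch, unused
dictionary entries (like 'bar' in your last example) leave the result
an 8-bit string.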
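A minimal sketch of the restart approach in modern Python, for illustration only. It assumes `bytes` stands in for 8-bit strings and `str` for Unicode objects. `format_8bit`, `_run` and `_RestartAsUnicode` are hypothetical names, not the actual patch. Only `%s`, `%d`, `%%` and `%(name)...` are handled, and 8-bit data is assumed to decode as ASCII when it is coerced to Unicode.

```python
import re

# Hypothetical model: bytes = 8-bit string, str = Unicode object.
_SPEC_8BIT = re.compile(rb"%(?:\((\w+)\))?([sd%])")
_SPEC_UNICODE = re.compile(r"%(?:\((\w+)\))?([sd%])")


class _RestartAsUnicode(Exception):
    """Signals that %s in an 8-bit format met a Unicode value."""


def _run(fmt, args, unicode_mode):
    pattern = _SPEC_UNICODE if unicode_mode else _SPEC_8BIT
    mapping = args if isinstance(args, dict) else None
    positional = iter(args if isinstance(args, tuple) else (args,))

    def convert(match):
        name, code = match.group(1), match.group(2)
        if not unicode_mode:
            # Groups are bytes in 8-bit mode; normalise them for lookups.
            name = None if name is None else name.decode("ascii")
            code = code.decode("ascii")
        if code == "%":
            return match.group(0)[1:]  # literal '%', same type as fmt
        value = mapping[name] if name is not None else next(positional)
        if code == "d":
            piece = str(int(value))
        elif isinstance(value, bytes) and not unicode_mode:
            return value  # 8-bit value into 8-bit result: copy as-is
        elif isinstance(value, bytes):
            piece = value.decode("ascii")  # assumed default encoding
        elif isinstance(value, str) and not unicode_mode:
            # A Unicode object reached %s: abandon this 8-bit pass.
            raise _RestartAsUnicode
        else:
            piece = str(value)
        return piece if unicode_mode else piece.encode("ascii")

    return pattern.sub(convert, fmt)


def format_8bit(fmt, args):
    """Hypothetical stand-in for ``fmt % args`` with an 8-bit fmt."""
    try:
        return _run(fmt, args, unicode_mode=False)
    except _RestartAsUnicode:
        # Start over with the Unicode algorithm; the work done so far
        # is discarded, which is where the performance cost comes from.
        return _run(fmt.decode("ascii"), args, unicode_mode=True)


print(repr(format_8bit(b"...%(foo)s...", {"foo": "abc"})))  # '...abc...'
print(repr(format_8bit(b"...%(foo)s...", {"foo": b"abc", "bar": "def"})))  # b'...abc...'
```

Signalling the switch with an exception keeps the 8-bit pass free of any Unicode handling: the common all-8-bit case runs unchanged, and only a Unicode value reaching `%s` pays for a second pass.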

This will cost some performance, of course, since the partial 8-bit
pass is thrown away. Optimization is easy, though: add a small "u" in
front of the string ;-)

A sample session:
>>> '...%(foo)s...' % {'foo':u"abc"}
u'...abc...'
>>> '...%(foo)s...' % {'foo':"abc"}
'...abc...'
>>> '...%(foo)s...' % {u'foo':"abc"}
'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc"}
u'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc",'def':123}
u'...abc...'
>>> '...%(foo)s...' % {u'foo':u"abc",u'def':123}
u'...abc...'

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/