[Python-Dev] PEP 414 - Unicode Literals for Python 3

Vinay Sajip vinay_sajip at yahoo.co.uk
Tue Feb 28 07:56:31 CET 2012


R. David Murray <rdmurray <at> bitdance.com> writes:

> The rationale claims there's no way to spell "native string" if you use
> unicode_literals, which is not true.
> 
> It would be different from u('') in that I would expect that there are
> far fewer instances where 'native string' is required than there are
> places where unicode strings work (and should therefore be preferred).

A couple of people have said that 'native string' is spelt 'str', but I'm not
sure that's the right answer. For example, 2.x's cStringIO.StringIO expects native
strings, not Unicode:

>>> from cStringIO import StringIO
>>> s = StringIO(u'\xe9')
>>> s
<cStringIO.StringI object at 0x232de40>
>>> s.getvalue()
'\xe9\x00\x00\x00'

Of course, you can't call str() on the Unicode literal to get a native string either, because 2.x encodes it with the default ASCII codec:

>>> str(u'\xe9')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

So I think using str will not give the desired effect in some situations. In
Django, I used a function that resolves differently depending on the Python
version, something like

def native(literal): return literal

on Python 3, and

def native(literal): return literal.encode('utf-8')

on Python 2.
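For illustration, here is a minimal sketch of how such a helper could be defined once in a single module for both versions. It assumes dispatch on sys.version_info and UTF-8 as the Python 2 encoding; only the name native and the two function bodies come from the description above, and the rest is a hypothetical arrangement rather than Django's actual code.

```python
import sys

if sys.version_info[0] >= 3:
    def native(literal):
        # Python 3: str is already Unicode, so the literal is the native string.
        return literal
else:
    def native(literal):
        # Python 2: the native str type is bytes, so encode the Unicode literal.
        # UTF-8 is an assumption; the target API may expect another encoding.
        return literal.encode('utf-8')

# Under unicode_literals (2.x) or plain 3.x source, wrap literals that must be native:
value = native(u'\xe9')
```

Unlike str(), this never goes through the implicit ASCII codec on 2.x, so non-ASCII literals do not raise UnicodeEncodeError.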

I'm not saying this is the right thing to do for all cases - just that str() may
not be, either. This should be elaborated in the PEP.

Regards,

Vinay Sajip



