[Python-Dev] Divorcing str and unicode (no more implicit conversions).
Antoine Pitrou
solipsis at pitrou.net
Mon Oct 3 14:32:48 CEST 2005
On Monday 3 October 2005 at 02:09 -0400, Martin Blais wrote:
>
> What if we could completely disable the implicit conversions between
> unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
context, build, or platform.
A good rule of thumb is to convert to unicode everything that is
semantically textual, and to only use str for what is to be semantically
treated as a string of bytes (network packets, identifiers...). This is
also, AFAIU, the semantic model which is favoured for a hypothetical
future version of Python.
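
As an illustration of this rule of thumb, here is a minimal sketch of "decode on the way in, encode on the way out". It is written against Python 3 naming, which is an assumption on top of the original message: there, `bytes` plays the role of the old `str` and `str` plays the role of the old `unicode`. The helper names (`read_name`, `greet`, `write_reply`) and the sample packet are hypothetical.

```python
# Sketch only (Python 3 naming): bytes ~ old str, str ~ old unicode.
# All function names here are hypothetical, chosen for illustration.

def read_name(raw_packet, charset="utf-8"):
    """Decode incoming bytes once, at the boundary of the program."""
    return raw_packet.decode(charset)

def greet(name):
    """Pure text processing: no charset concerns inside this function."""
    return "Bonjour, " + name.title() + " !"

def write_reply(text, charset="utf-8"):
    """Encode once, just before the data leaves the program."""
    return text.encode(charset)

# Bytes as they might arrive from a socket or a file.
packet = "antoine café".encode("utf-8")

reply = write_reply(greet(read_name(packet)))
```

Everything between `read_name` and `write_reply` only ever sees text, so no function in the middle has to guess which type it was handed.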
This is what I'm using to do safe conversion to a given type without
worrying about the type of the argument:
# Python 2: str is a byte string, unicode is text.
DEFAULT_CHARSET = 'utf-8'

def safe_unicode(s, charset=None):
    """
    Forced conversion of a string to unicode; does nothing
    if the argument is already a unicode object.
    This function is useful because the .decode method
    on a unicode object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, str):
        return s.decode(charset or DEFAULT_CHARSET)
    else:
        return s

def safe_str(s, charset=None):
    """
    Forced conversion of a unicode to string; does nothing
    if the argument is already a plain str object.
    This function is useful because the .encode method
    on a str object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, unicode):
        return s.encode(charset or DEFAULT_CHARSET)
    else:
        return s
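
For comparison, the same pair of helpers can be sketched in Python 3 terms. This is an adaptation and not part of the original message: the names `safe_text` and `safe_bytes` are hypothetical, `bytes` stands in for the old `str`, and `str` stands in for the old `unicode`. In Python 3 the implicit round trip described in the docstrings no longer exists (`str` has no `.decode` and `bytes` has no `.encode`), but a function that accepts either type can still be convenient at API boundaries.

```python
# Hypothetical Python 3 analogue of the helpers above (sketch only).
DEFAULT_CHARSET = "utf-8"

def safe_text(s, charset=None):
    """Return s as text; decode only if it is a bytes object."""
    if isinstance(s, bytes):
        return s.decode(charset or DEFAULT_CHARSET)
    return s

def safe_bytes(s, charset=None):
    """Return s as bytes; encode only if it is a text (str) object."""
    if isinstance(s, str):
        return s.encode(charset or DEFAULT_CHARSET)
    return s

text = "déjà vu"
raw = text.encode("utf-8")

# Each helper converts the "other" type and passes its own type through.
decoded = safe_text(raw)
untouched_text = safe_text(text)
encoded = safe_bytes(text)
untouched_raw = safe_bytes(raw)
```

Calling either helper twice in a row is harmless, which is the property the original functions were written to provide.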
Good luck
Antoine.