[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Mon Oct 3 14:32:48 CEST 2005

On Monday 3 October 2005 at 02:09 -0400, Martin Blais wrote:
> 
> What if we could completely disable the implicit conversions between
> unicode and str?

This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
context, build, or platform.

A good rule of thumb is to convert to unicode everything that is
semantically textual, and to use str only for what should be treated
semantically as a string of bytes (network packets, identifiers...).
This is also, as far as I understand, the semantic model favoured for a
hypothetical future version of Python.

This is what I use to convert safely to a given type without
worrying about the type of the argument:

DEFAULT_CHARSET = 'utf-8'

def safe_unicode(s, charset=None):
    """
    Forced conversion of a string to unicode; does nothing
    if the argument is already a unicode object.
    This function is useful because the .decode method
    on a unicode object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, str):
        return s.decode(charset or DEFAULT_CHARSET)
    else:
        return s

def safe_str(s, charset=None):
    """
    Forced conversion of a unicode object to a string; does
    nothing if the argument is already a plain str object.
    This function is useful because the .encode method
    on a str object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, unicode):
        return s.encode(charset or DEFAULT_CHARSET)
    else:
        return s
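
For illustration, here is a rough usage sketch of the two helpers. It
restates them compactly so that it runs on its own, and it is written to
work on both old and newer interpreters: the names text_type and
bytes_type are aliases I am assuming for this sketch only (they map to
unicode/str on Python 2), not part of the code above.

```python
DEFAULT_CHARSET = 'utf-8'

# Hypothetical aliases for this sketch: unicode/str on Python 2,
# str/bytes on Python 3.
text_type = type(u'')
bytes_type = type(b'')

def safe_unicode(s, charset=None):
    # Decode byte strings; return anything else unchanged.
    if isinstance(s, bytes_type):
        return s.decode(charset or DEFAULT_CHARSET)
    return s

def safe_str(s, charset=None):
    # Encode text; return anything else unchanged.
    if isinstance(s, text_type):
        return s.encode(charset or DEFAULT_CHARSET)
    return s

raw = b'caf\xc3\xa9'               # UTF-8 bytes, e.g. read from a socket
text = safe_unicode(raw)           # decoded once, at the boundary
assert text == u'caf\xe9'
assert safe_unicode(text) is text  # already text: returned as-is
assert safe_str(text) == raw       # encoded back for output
assert safe_str(raw) is raw        # already bytes: returned as-is
```

The point is that each helper is idempotent, so it can be called at
every module boundary without checking first what the caller passed in.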

Good luck

Antoine.