Yet another unicode WTF
Paul Boddie
paul at boddie.org.uk
Fri Jun 5 07:06:50 EDT 2009
On 5 Jun, 11:51, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
>
> Actually strings in Python 2.4 or later have the ‘encode’ method, with
> no need for importing extra modules:
>
> =====
> $ python -c 'import sys; sys.stdout.write(u"\u03bb\n".encode("utf-8"))'
> λ
>
> $ python -c 'import sys; sys.stdout.write(u"\u03bb\n".encode("utf-8"))' > foo ; cat foo
> λ
> =====
Those are Unicode objects, not traditional Python strings. Although
strings do have decode and encode methods, even in Python 2.3, the
former is shorthand for constructing a Unicode object using the
stated encoding, whereas the latter seems to rely on an error-prone
implicit decoding step (using the default encoding, normally ASCII)
to create a Unicode object and then encode the result: in effect,
recoding the string.
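
To make the distinction concrete, here is a minimal sketch that spells
out the two steps explicitly. The helper name recode is hypothetical,
and the "ascii" default is my assumption about what the implicit step
uses on a stock installation; it is not a quote of the interpreter's
internals.

```python
raw = b"\xce\xbb"  # the UTF-8 bytes for U+03BB (Greek small letter lambda)

# unicode on Python 2, str on Python 3
text_type = type(u"")

# decode is shorthand for constructing a Unicode object with a stated encoding
assert raw.decode("utf-8") == text_type(raw, "utf-8") == u"\u03bb"


def recode(data, target, assumed="ascii"):
    """Hypothetical helper: decode with an assumed encoding, then encode.

    This is roughly what calling encode on a byte string amounts to,
    with the hidden decoding step written out.
    """
    return data.decode(assumed).encode(target)


# Pure ASCII data survives the round trip unchanged
ascii_result = recode(b"abc", "utf-8")

# Non-ASCII bytes fail in the decoding step, before any encoding happens
try:
    recode(raw, "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```

The failure in the last step is a decoding error even though the caller
only asked for an encode, which is what makes the implicit version so
confusing.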
As I noted, if one wants to remain sane and not think about encoding
everything everywhere, creating a stream using a codecs module
function or class (codecs.open or codecs.getwriter, for example) will
give you something which accepts Unicode objects directly and takes
care of the encoding itself.
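
A minimal sketch of that approach, using codecs.getwriter around an
in-memory byte stream. The io.BytesIO buffer is only a stand-in I have
chosen so the example is self-contained; in the quoted one-liner the
wrapped stream would be sys.stdout instead.

```python
import codecs
import io

# Stand-in for a real byte stream (a file opened in binary mode, a pipe, etc.)
buffer = io.BytesIO()

# Wrap the byte stream so that it accepts Unicode objects and encodes them
writer = codecs.getwriter("utf-8")(buffer)

# No explicit .encode() call at the point of writing
writer.write(u"\u03bb\n")

# The underlying stream has received the UTF-8 encoded bytes
encoded = buffer.getvalue()
```

The encoding is chosen once, when the stream is created, rather than at
every write.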
Paul