[XML-SIG] [OT] working unicode strings

M.-A. Lemburg mal@lemburg.com
Thu, 28 Jun 2001 18:47:06 +0200


Alexandre Fayolle wrote:
> 
> Ahem, sorry for posting this here, since it is slightly off-topic, but
> this list is a place where I know I'll find people who have played with
> unicode strings in Python.
> 
> So, I have a UTF-8 encoded unicode string that I got by reading a text
> node in a DOM. I want to write it somewhere on the disk.
> 
> s= u'été'
> f=open('/tmp/foo','w')
> f.write(s)
> 
> This gets me
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> By experimenting a bit, it appears that I can call f.write(s) when f is a
> StringIO.StringIO, but not when it is a cStringIO.StringIO, a file, or a
> standard stream (sys.stdout). Needless to say, I'm quite disappointed.

Python's I/O will try to write your Unicode string as an 8-bit string,
and to do so it first converts it using the default encoding, which is
ASCII.

f.write(s.encode('latin-1'))

should get you the results you are probably looking for.
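
As a rough sketch of the explicit approach (the scratch path and the
variable names are only illustrative, and the choice between Latin-1
and UTF-8 depends on what the reader of the file expects), the same
string encodes to different bytes under each encoding:

```python
import os
import tempfile

# u'été', spelled with escapes so the source file's own encoding does not matter
s = u'\xe9t\xe9'

latin1_bytes = s.encode('latin-1')  # one byte per character for this string
utf8_bytes = s.encode('utf-8')      # two bytes for each accented character

# Hypothetical scratch file; the original post used /tmp/foo
path = os.path.join(tempfile.mkdtemp(), 'foo')

# Binary mode, because we are writing bytes that are already encoded
f = open(path, 'wb')
try:
    f.write(utf8_bytes)
finally:
    f.close()
```

Whichever encoding you pick, whoever reads the file back has to decode
it with the same one.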

Alternatively, open the file using codecs.open() -- it lets
you specify the encoding of the file and does the converting
for you.
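
A minimal sketch of the codecs.open() route, assuming UTF-8 is the
encoding you want on disk (the scratch path and the name round_trip
are made up for the example):

```python
import codecs
import os
import tempfile

s = u'\xe9t\xe9'  # u'été'

# Hypothetical scratch file; the original post used /tmp/foo
path = os.path.join(tempfile.mkdtemp(), 'foo')

# The wrapper encodes Unicode strings to UTF-8 as they are written
f = codecs.open(path, 'w', encoding='utf-8')
try:
    f.write(s)
finally:
    f.close()

# Reading through the same wrapper decodes the bytes back to Unicode
f = codecs.open(path, 'r', encoding='utf-8')
try:
    round_trip = f.read()
finally:
    f.close()
```

The calling code only ever handles Unicode strings here; the encoding
is named once, where the file is opened.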
 
> So the first question is, how do you manage to work with unicode strings
> in PyXML/4Suite (a pointer to the right source file is a valuable answer).

I'll punt on this one...
 
> The second question is, where is the right place to discuss these issues ?

Sure.
 
> The third question is, does anyone know if this state of things is likely
> to change in python 2.2 ?

Things always change from one Python release to the next ;-)
Seriously, there are no plans for changing this behaviour;
otherwise Python would have to guess your encoding in some
way (one way is by looking at your locale settings;
see site.py for details).
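
If you want to see what Python would have to go on, the following
sketch just inspects the two settings involved. The variable names
are illustrative, and locale.getpreferredencoding() is only available
in Python releases newer than the ones discussed in this thread:

```python
import locale
import sys

# Encoding used for implicit Unicode <-> 8-bit string conversions
# ('ascii' on Python 2.x, 'utf-8' on Python 3)
default_encoding = sys.getdefaultencoding()

# Encoding suggested by the user's locale settings; it varies by system
locale_encoding = locale.getpreferredencoding()

print(default_encoding)
print(locale_encoding)
```

The two values need not agree, which is why guessing is fragile.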

IMO, explicit is better than implicit. So the codecs.open()
approach should be preferred.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/