Strange problems with encoding

Michael Hudson mwh at python.net
Thu Nov 6 08:56:43 EST 2003


Rudy Schockaert <rudy.schockaert at pandoraSTOPSPAM.be> writes:

> Sebastian Meyer wrote:
> 
> > Hi newsgroup,
> > I am trying to replace German special characters in strings like
> >     str = re.sub('ö', 'oe', str)
> > When I work with this, I always get the message
> > UniCode Error: ASCII decoding error : ordinal not in range(128)
> > Yes, I have googled, I searched the FAQ, the manual and the Python
> > library, and searched all known sources of information. I played
> > with the Python builtin function encode to enforce the right
> > encoding, but the error stays the same. I've read a lot about
> > Unicode and the internal conversion of strings done by Python, but
> > somehow I've missed the clue.
> > Nope, Python says Huuups... ordinal not in range(128), ;-(
> > Anyone of you having any idea?? Seems like I am too stupid to read
> > documentation carefully, perhaps I misunderstand something...
> > Thanks for your help in advance
> > Sebastian
> 
> I'm experiencing something similar at the moment. I try to
> base64-encode Unicode strings and I get the exact same error message.

"base64-encoding Unicode strings" is not a particularly well defined
operation.  "base64-encoding" is a way of turning *binary data* into a
particularly "safe" sequence of ASCII characters.

Unicode (in some sense) is a family of ways of representing strings of
characters as binary data.

So to base64-encode a Unicode string, you need to choose *which*
member of this family you're going to use, which is to say the
encoding.  UTF-8 would seem a good bet.

But...

>  >>> s = u'ö'
>  >>> s
> u'\xf6'
>  >>> s.encode('base64')
> Traceback (most recent call last):
>    File "<interactive input>", line 1, in ?
>    File "C:\Python23\lib\encodings\base64_codec.py", line 24, in
>    base64_encode
>      output = base64.encodestring(input)
>    File "C:\Python23\lib\base64.py", line 39, in encodestring
>      pieces.append(binascii.b2a_base64(chunk))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
> position 0: ordinal not in range(128)

>>> u'ö'.encode('utf-8').encode('base64')
'w7Y=\n'
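
For anyone reading this on a current Python 3, where the 'base64' string
codec used above is no longer available, the same two-step idea (pick an
encoding first, then base64 the resulting bytes) might be sketched as
follows.  The helper names b64_encode_text and b64_decode_text are made
up for illustration, and UTF-8 as the default is just the "good bet"
suggested above, not a requirement:

```python
import base64


def b64_encode_text(text, encoding="utf-8"):
    """Turn text into bytes with the chosen encoding, then base64 the bytes."""
    raw = text.encode(encoding)                    # characters -> binary data
    return base64.b64encode(raw).decode("ascii")   # binary data -> ASCII text


def b64_decode_text(b64_text, encoding="utf-8"):
    """Reverse of b64_encode_text; must be given the same encoding."""
    raw = base64.b64decode(b64_text)               # ASCII text -> binary data
    return raw.decode(encoding)                    # binary data -> characters


o_umlaut = "\u00f6"  # the character 'ö'
print(b64_encode_text(o_umlaut))              # w7Y=
print(b64_encode_text(o_umlaut, "latin-1"))   # 9g==
print(b64_decode_text("w7Y=") == o_umlaut)    # True
```

Note that base64.b64encode does not append the trailing newline that the
old codec produced, and that whoever decodes has to know which encoding
was chosen.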

> When I don't specify it's unicode it works:
>  >>> s = 'ö'
>  >>> s
> '\xf6'
>  >>> s.encode('base64')
> '9g==\n'

Well, this works because your terminal seems to be latin-1:

>>> u'ö'.encode('latin-1').encode('base64')
'9g==\n'

What would you like to do with a character that isn't in latin-1?
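
To make the question concrete, here is a small sketch (Python 3 syntax;
the helper try_encode is hypothetical) of what happens to a character
that latin-1 has no code for, using the euro sign as the example:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding can't represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


o_umlaut = "\u00f6"  # 'ö', present in latin-1
euro = "\u20ac"      # euro sign, not present in latin-1

print(try_encode(o_umlaut, "latin-1"))  # b'\xf6'
print(try_encode(euro, "latin-1"))      # None
print(try_encode(euro, "utf-8"))        # b'\xe2\x82\xac'
```

UTF-8 can represent every Unicode character, which is why it never runs
into this particular failure.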

> The reason I want to base64-encode these unicode strings is because I
> get those as input and want to store them in a MySQL database using
> SQLObject.

! Why can't you just encode them as utf-8 strings?  (Or, thinking
about it, why doesn't SQLObject support unicode?)

Cheers,
mwh

-- 
  I think if we have the choice, I'd rather we didn't explicitly put
  flaws in the reST syntax for the sole purpose of not insulting the
  almighty.                                    -- /will on the doc-sig



