[Spambayes] Re: Alpha 2 Release?

Michael Hudson mwh at python.net
Thu Jan 30 11:15:00 EST 2003


Richie Hindle <richie at entrian.com> writes:

> [François]
>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
>> position 86: ordinal not in range(128)
>
> This is bizarre.  This is expat complaining that you can't have high-bit
> characters in ASCII XML, which is quite right, but I replace all those
> characters with charrefs on the way in:
>
>>>> def replaceHighCharacters(match):
> ...     return "&#%d;" % ord(match.group(1))
> ...
>>>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d")
> u'a b &#233; c d'
>
> So what's going on...?

Umm, that regexp only covers \x80-\xff, so it isn't going to match,
e.g., u"\N{EURO SIGN}":

>>> ord(u"\N{EURO SIGN}")
8364

Could that be what's happening?
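
If so, one possible fix is to widen the character class so it catches
everything above the ASCII range rather than just \x80-\xff.  A minimal
sketch, written in Python 3 syntax for convenience; the names NON_ASCII
and replace_non_ascii are made up for illustration and are not from the
Spambayes source:

```python
import re

# Hypothetical helper (not from the Spambayes code): match any character
# outside the 7-bit ASCII range, not only \x80-\xff.
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def replace_non_ascii(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return NON_ASCII.sub(lambda m: "&#%d;" % ord(m.group(0)), text)

print(replace_non_ascii("a b \xe9 c d"))    # a b &#233; c d
print(replace_non_ascii("\N{EURO SIGN}"))   # &#8364;
```

The 'xmlcharrefreplace' error handler for encode() (for example
text.encode('ascii', 'xmlcharrefreplace')) should give the same charrefs
without a regexp, assuming a Python new enough to have it.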

Cheers,
M.

-- 
  > Or can I sweep that can of worms under the rug?
  Please shove them under the garage.
   -- Greg Ward and Guido van Rossum mix their metaphors on python-dev



