XML and UnicodeError
dev at null.oo
Tue Oct 5 17:55:08 CEST 2004
> Are you perhaps using string literals containing non-ascii chars,
> yet don't use the 'u' prefix? u"\xff" as opposed to "\xff".
E.g. I convert umlauts to HTML entities or change symbols to ASCII
strings for file names. Instead of using the \x notation, I typed the
characters themselves. In my script no character is above chr(255).
An example:
import re

def foo(name):
    name = re.sub(r'®', '_registered_', name)
    # ... and many more substitutions
    return name
I think instead of r'' I should use u''?
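One way to sidestep the question of literal characters versus the \x notation is to spell every special character as a Unicode escape, so the source stays ASCII-only and its file encoding no longer matters. The sketch below assumes Python 3 (where str is Unicode and the u'' prefix is accepted but optional) and uses the standard html.entities.codepoint2name table for the umlaut-to-entity step. Only the '®' -> '_registered_' rule comes from the post; the SPECIAL table and to_entity helper are hypothetical.

```python
import re
from html.entities import codepoint2name  # Python 3 standard library

# Hypothetical rule table; only '®' -> '_registered_' is from the post.
# \u escapes keep this source file ASCII-only.
SPECIAL = {u'\u00ae': u'_registered_'}  # U+00AE REGISTERED SIGN

def to_entity(ch):
    """Return '&name;' if ch has a named HTML entity, else ch unchanged."""
    entity = codepoint2name.get(ord(ch))
    return u'&%s;' % entity if entity else ch

def foo(name):
    # Apply the explicit rules first ...
    for char, repl in SPECIAL.items():
        name = name.replace(char, repl)
    # ... then turn any remaining non-ASCII character into a named entity.
    return re.sub(r'[^\x00-\x7f]', lambda m: to_entity(m.group(0)), name)

print(foo(u'Gr\u00fc\u00dfe\u00ae'))  # Gr&uuml;&szlig;e_registered_
```

Characters without a named entity are left untouched, so the caller can decide separately what to do with them.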
It is also possible to compile a regular expression object with the re.U flag:
matchreg = re.compile(u'®', re.U)
name = matchreg.sub('_registered_',name)
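A minimal Python 3 sketch of the same compiled-pattern approach, with the character written as an escape. It assumes Python 3, where a str pattern already gets Unicode matching, so re.U is accepted but redundant. That is consistent with the observation that the flag made no visible difference. Compiling once and reusing matchreg avoids re-parsing the pattern for every name.

```python
import re

# Compile once and reuse for every file name.
# In Python 3, str patterns use Unicode matching by default; re.U is redundant.
matchreg = re.compile(u'\u00ae', re.U)  # U+00AE is '®'

def foo(name):
    return matchreg.sub(u'_registered_', name)

print(foo(u'Brand\u00ae'))  # Brand_registered_
```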
But maybe that is not necessary. In my tests, using the u'' prefix and the
re.U flag made no difference. The only things that mattered were:
1. using unicode().
2. using a coding flag as described in 
3. saving the Python script as UTF-8
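Why these steps matter: in a UTF-8 file the single character '®' (U+00AE) is stored as two bytes, so a plain byte-string literal and a Unicode string holding "the same" text differ. In Python 2, mixing the two triggers an implicit ASCII decode, which is where the UnicodeDecodeError comes from; Python 3 raises TypeError instead. The sketch below assumes Python 3, and the normalize helper is hypothetical: it decodes raw bytes first (the analogue of Python 2's unicode(raw, 'utf-8')) and only then substitutes.

```python
registered = u'\u00ae'                 # one code point: '®'
utf8_bytes = registered.encode('utf-8')
print(len(registered), len(utf8_bytes))  # 1 2
print(utf8_bytes)                        # b'\xc2\xae'

def normalize(raw):
    # Hypothetical helper: decode bytes (e.g. read from a file) to text first,
    # then do the substitutions on the Unicode string.
    text = raw.decode('utf-8')
    return text.replace(u'\u00ae', u'_registered_')

print(normalize(b'Brand\xc2\xae'))  # Brand_registered_
```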
For me, using unicode() is OK.