XML and UnicodeError
just at xs4all.nl
Tue Oct 5 12:52:43 CEST 2004
In article <Xns957977C1D83A7devnulloo at 220.127.116.11>,
Pinke Panke <dev at null.oo> wrote:
> > 2. Convert any other text to Unicode as soon as possible.
> Ok, i.e.
> headline = structure # is unicode
> pagetext = structure # is unicode
> fill = "bar".encode('utf-8') # lets make it unicode
That's not making it unicode; you mean
fill = unicode("bar", "utf-8")
(Or "bar".decode("utf-8"), which does the same; I prefer using the
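To make the direction of the two calls concrete, here is a small sketch. It assumes an interpreter that accepts both the b"" and u"" prefixes (Python 2.6+ or 3.3+); the name raw is only for illustration. decode() goes from bytes to unicode, and encode() goes the other way.

```python
# The b"" and u"" prefixes make the two types explicit on either Python line.
raw = b"bar"                    # a byte string (plain str in Python 2)

fill = raw.decode("utf-8")      # bytes -> unicode
back = fill.encode("utf-8")     # unicode -> bytes again

is_unicode = isinstance(fill, type(u""))
is_bytes = isinstance(back, bytes)
```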
> foo = headline + fill + pagetext # foo is unicode, too
> > 3. Manipulate only Unicode values - don't mix them up with
> > plain strings.
> It makes sense, but I need some string concatenations. E.g. I set
> default values in the python script and try to concatenate them with
> XML values.
> But now, I would think the safest way is to transfer all plain strings
> in the python script into a second XML file and use them, because
> after reading in they would be in Unicode. Right?
Yes, but there's no need to. Are you perhaps using string literals
containing non-ASCII chars without the 'u' prefix? Such literals should
be written u"\xff" as opposed to "\xff".
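A rough illustration of why the prefix matters, assuming the problem is a byte string with non-ASCII content being mixed with unicode. Python 2 decodes the byte string as ASCII behind your back in that case; the sketch does the same decode explicitly so it behaves identically on Python 2.6+ and 3.3+. The variable names are made up for the example.

```python
text = u"\xff"    # one unicode character, U+00FF
data = b"\xff"    # one raw byte with no encoding attached

# Decoding the byte as ASCII fails, which is what the implicit
# conversion in Python 2 runs into when the two types are mixed.
try:
    data.decode("ascii")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The unicode character, by contrast, encodes cleanly to UTF-8.
utf8 = text.encode("utf-8")
```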
> Or saving the python script in utf-8 would make the difference?
> > 4. Serialise to your chosen encoding only when preparing
> > output.
> Every string concatenation in my script is preparing output.
Do _all_ manipulations using unicode, and convert to utf-8 as late as
possible, i.e. when you're passing the result to code that expects
non-unicode data. That's basically what he was saying.
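Put together, the whole pattern might look something like the sketch below. The function build_page, the sample values, and the BytesIO object standing in for a file or socket are all hypothetical; only the decode-early, encode-late shape is the point.

```python
import io

def build_page(headline, fill, pagetext):
    # Hypothetical helper: every argument is expected to be unicode already.
    return headline + fill + pagetext

# Values as they might come out of the XML parser (already unicode).
headline = u"Gr\xfc\xdfe"
pagetext = u"from the XML file"

# A default that arrives as UTF-8 bytes: decode it right away.
default_bytes = b" caf\xc3\xa9 "
fill = default_bytes.decode("utf-8")

# All concatenation happens on unicode values.
foo = build_page(headline, fill, pagetext)

# Encode only at the point where bytes are required.
out = io.BytesIO()   # stands in for a file or socket that expects bytes
out.write(foo.encode("utf-8"))
```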