xsl and unicode surrogate characters
Diez B. Roggisch
deets at nospam.web.de
Thu Jan 5 04:42:15 EST 2006
> thanks very much for the info, it really helped
> we are using the text from the file to display on a webpage and we have a
> method for converting the parsed data to utf-8 and then displaying it, all
> the data looks fine after parsing except the
> surrogate pair,
> since i can not guess what it was supposed to be, is it ok to strip it
> using regex re.compile(' [\xed|\xa0] ')?
As Martin said: that alters the meaning of the bytes. Whether that should
bother you is yours to decide. If, for example, you stripped all vowels
from a text, it might still be comprehensible to most people, so if vowels
bother you for whatever reason, remove them.
Bt myb y bttr try nd fx th prblm n th frst plc.
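
To make both points concrete, here is a minimal sketch. It assumes Python 3
and assumes the surrogates arrive as two separately UTF-8-encoded halves
(CESU-8 style), which may or may not match the actual file. The name
repair_surrogate_pairs and the sample data are made up for illustration.

```python
import re


def repair_surrogate_pairs(raw: bytes) -> str:
    """Decode UTF-8 bytes that contain separately encoded surrogate halves."""
    # 'surrogatepass' lets the three-byte surrogate encodings through
    # as lone surrogate code points instead of raising an error.
    text = raw.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low pair into the
    # single character it stands for; an unpaired half raises an error.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# U+1D11E (musical G clef) written as the surrogates D834 DD1E,
# each encoded as a three-byte UTF-8 sequence.
raw = b"clef: \xed\xa0\xb4\xed\xb4\x9e"
repaired = repair_surrogate_pairs(raw)

# Blunt byte stripping, by contrast, also hits unrelated characters:
# Hangul U+D55C starts with \xed and U+00E0 ends with \xa0 in UTF-8.
innocent = "\ud55c \u00e0".encode("utf-8")      # b'\xed\x95\x9c \xc3\xa0'
stripped = re.sub(rb"[\xed\xa0]", b"", innocent)  # no longer valid UTF-8
```

Here repaired holds one real character in place of the pair, while stripped
is left with broken byte sequences that will not decode as UTF-8 any more.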