[XML-SIG] Copyright character chokes parser

uche.ogbuji@fourthought.com uche.ogbuji@fourthought.com
Fri, 15 Dec 2000 09:49:40 -0700


> I'm fiddling with XBEL using PyXML 0.6.2.
>
> I have a bookmark entry as follows:
>
>     <bookmark href="http://www.optioninsight.com/" added="946429657" visited="946444587" modified="946429652" >
>       <title>Option Insight© - Home of the Greatest Option Program. Ever.</title>
>     </bookmark>

I just went through encoding hell of a more involved sort, so I might as
well chip in here.

Add

<?xml version='1.0' encoding='ISO-8859-1'?>

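as the first thing in your XML file (that is, even before any white space)
and you should be fine.  If you don't specify an encoding, the parser
assumes UTF-8 (unless the file starts with a byte-order mark, in which case
it can detect UTF-16).  The copyright char in your file appears to be the
single byte 0xA9, and that is not legal UTF-8: in UTF-8 a byte value above
127 is only valid as part of a multi-byte sequence.  ISO-8859-1 (also known
as Latin-1) maps every byte value above 127 directly to a character, so
with the declaration the parser reads that byte as the copyright sign.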
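
Here is a minimal sketch of the behaviour described above.  It assumes
Python's standard-library xml.etree.ElementTree (an expat-based parser)
rather than PyXML 0.6.2, which may report the error differently.  The
shortened title text and the names BODY, DECL and parses are illustrative
only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the title element with the copyright sign stored
# as the single ISO-8859-1 byte 0xA9.
BODY = b"<title>Option Insight\xa9 - Home</title>"
DECL = b"<?xml version='1.0' encoding='ISO-8859-1'?>"


def parses(data):
    """Return True if the parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No declaration: the parser assumes UTF-8 and rejects the lone 0xA9 byte.
no_decl_ok = parses(BODY)

# Declaration as the very first bytes: the byte is read as U+00A9.
title = ET.fromstring(DECL + BODY)

# Declaration preceded by white space: not accepted as an XML declaration.
leading_space_ok = parses(b" " + DECL + BODY)

print(no_decl_ok, repr(title.text), leading_space_ok)
```

The other way out would be to re-save the file as real UTF-8, in which
case the copyright sign becomes the two bytes 0xC2 0xA9 and no encoding
declaration is needed.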


--
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python