[XML-SIG] Character entities (XHTML)

Thomas B. Passin tpassin@comcast.net
Wed, 08 May 2002 23:11:50 -0400


I've replicated Andrew's results using Python 2.1.3 and PyXML 0.7.  The two
XHTML entities are getting silently lost: they do not appear in the output,
either as characters or as character references.  It's not just that the
original entity references are missing; the characters that they represent
are missing too.  The DTD is getting processed - when I removed the DTD, the
program failed with an error complaining about unknown entities, as it
should have.
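
For anyone who wants to see the "unknown entities" failure without a
PyXML install, here is a minimal sketch using the standard library's
xml.dom.minidom (expat) as a stand-in.  It is not the PyXML Sax2 reader
from this thread, and parses_cleanly is just a helper name I made up.  It
only illustrates that, with no DTD at all, an undeclared entity such as
&oacute; is rejected while the predefined &lt; is accepted.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_cleanly(xml_text):
    """Return True if expat accepts the text, False if it rejects it."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# No DOCTYPE here, so &oacute; is undeclared and the parse fails.
no_dtd = "<a href='init'>initialisaci&oacute;n</a>"
# &lt; is predefined by XML itself and needs no DTD.
predefined_only = "<h1>&lt;Hola!</h1>"

undeclared_rejected = not parses_cleanly(no_dtd)
predefined_accepted = parses_cleanly(predefined_only)
print(undeclared_rejected, predefined_accepted)
```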

I noticed one other thing.  The output DOCTYPE declaration is missing the
SYSTEM and PUBLIC identifiers that were present in the source.
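
A rough way to check what a DOM keeps from the DOCTYPE is to look at the
publicId and systemId attributes of the document type node.  The sketch
below uses the standard library's xml.dom.minidom rather than PyXML, so it
says nothing about what xml.dom.ext does; it only shows the kind of check I
mean.  The sample document is a cut-down version of Andrew's input with the
entity references left out.

```python
from xml.dom import minidom

XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN"
XHTML_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

# Cut-down input: same DOCTYPE as in the thread, no entity references.
source = (
    '<!DOCTYPE html PUBLIC "%s"\n    "%s">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Index</title></head>'
    '<body><h1>Hola!</h1></body></html>'
) % (XHTML_PUBLIC_ID, XHTML_SYSTEM_ID)

doc = minidom.parseString(source)
doctype = doc.doctype

# The identifiers from the source should survive on the doctype node...
print(doctype.publicId)
print(doctype.systemId)

# ...and should reappear when the document is written back out.
serialized = doc.toxml()
print(XHTML_PUBLIC_ID in serialized)
```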

Cheers,

Tom P

[Andrew Cooke]

Thanks for the reply.  Two points in response:

1 - I screwed up with the input file, it was incorrect.  My apologies.  I
have included a revised demonstration of the problem below (the input file
has been through Tidy, but the entities still exist at that point).

2 - As far as I can tell, the problem isn't that the entities are being
replaced by the appropriate character, but that they are being silently
dropped.  The &lt; entity passes through correctly, but the &oacute; entity
is not replaced by an ó (nor is the &iexcl; replaced by a ¡).
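
To spell out which characters should have come through, here is a small
sketch using the entity table in Python's standard html.entities module.
That module postdates this thread and is not part of PyXML; I am assuming
its HTML 4 table agrees with the XHTML 1.0 DTD for these three names.

```python
from html import unescape
from html.entities import name2codepoint

# Characters that the three entity references in the input stand for.
expected = {name: chr(name2codepoint[name])
            for name in ("iexcl", "lt", "oacute")}
for name, char in sorted(expected.items()):
    print("&%s; -> U+%04X" % (name, ord(char)))

# Text content a reader would expect for the two affected elements.
heading = unescape("&iexcl;&lt;Hola!")
link_text = unescape("initialisaci&oacute;n")
```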

Thanks,
Andrew

PS I am using PyXML installed from PyXML-0.7.win32-py2.2.exe

F:\home\Andrew\multi\src\xhtml>python
Python 2.2.1 (#34, Apr  9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from xml.dom.ext.reader.Sax2 import FromXmlFile
>>> from xml.dom.ext import PrettyPrint
>>> sys.stdout.writelines(open("index.xhtml").readlines())
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org" />
<link type="text/css" rel="stylesheet" href="basic.css" />
<title>Index</title>
</head>
<body>
<h1>&iexcl;&lt;Hola!</h1>

<a href="init">initialisaci&oacute;n</a>
</body>
</html>

>>> PrettyPrint(FromXmlFile("index.xhtml"))
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
  <head>
    <meta content='HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org' name='generator'/>
    <link href='basic.css' rel='stylesheet' type='text/css'/>
    <title>Index</title>
  </head>
  <body>
    <h1>&lt;Hola!</h1>
    <a href='init'>initialisacin</a>
  </body>
</html>
>>>
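
One possible stopgap, untested against PyXML itself, is to rewrite the
named entity references as numeric character references before the text
reaches the parser, since numeric references do not depend on the DTD.
The sketch below assumes a modern Python with html.entities, and checks
the result with the standard library's xml.dom.minidom rather than the
Sax2 reader used in the session above.  named_to_numeric and text_of are
hypothetical helper names.  Note that the regular expression is naive and
would also rewrite matches inside comments or CDATA sections.

```python
import re
from html.entities import name2codepoint
from xml.dom import minidom

# The five entities XML predefines; these must be left alone.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities (e.g. &oacute;) with &#NNN; forms."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names as-is
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_RE.sub(repl, text)


def text_of(element):
    """Concatenate the direct text children of a DOM element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


# Body fragment taken from the input file shown in the session.
source = ('<body><h1>&iexcl;&lt;Hola!</h1>'
          '<a href="init">initialisaci&oacute;n</a></body>')
converted = named_to_numeric(source)

doc = minidom.parseString(converted)
h1_text = text_of(doc.getElementsByTagName("h1")[0])
a_text = text_of(doc.getElementsByTagName("a")[0])
```

With the references rewritten this way, both characters are present in
the parsed text, and no DTD is needed to resolve them.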