[XML-SIG] Character entities (XHTML)
Thomas B. Passin
tpassin@comcast.net
Wed, 08 May 2002 23:11:50 -0400
I've replicated Andrew's results using Py2.1.3 and Pyxml 0.7. The two xhtml
entities are getting silently lost, in that they do not appear in the
output, neither as characters nor as character references. It's not just
that the original entities are missing, the characters that they represent
are missing. The DTD is getting processed - when I removed the DTD the
program failed with an error, complaining about unknown entities, as it
should have.
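For comparison, here is a minimal sketch of the behaviour one would expect from a parser that expands the entities. It is not PyXML: it assumes a current Python 3 and uses the standard library's xml.etree.ElementTree, and it declares the two entities in an internal DTD subset (an assumption made so the example is self-contained) instead of loading the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# The two entities are declared inline here (an assumption for this sketch);
# in the real document they come from the XHTML 1.0 Transitional DTD.
SOURCE = (
    '<!DOCTYPE html [\n'
    '  <!ENTITY iexcl  "&#161;">\n'
    '  <!ENTITY oacute "&#243;">\n'
    ']>\n'
    '<html><body>'
    '<h1>&iexcl;&lt;Hola!</h1>'
    '<a href="init">initialisaci&oacute;n</a>'
    '</body></html>'
)

root = ET.fromstring(SOURCE)

# After parsing, the entity references should have become real characters.
h1_text = root.find("body/h1").text
a_text = root.find("body/a").text

# Serializing to a str keeps the characters and re-escapes only the "<".
serialized = ET.tostring(root, encoding="unicode")

print(repr(h1_text))
print(repr(a_text))
print(serialized)
```

If either character were missing from h1_text, a_text or serialized, that would be the same symptom as described above.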
I noticed one other thing. The output DOCTYPE declaration is missing the
SYSTEM and PUBLIC identifiers that were present in the source.
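A quick way to see whether a DOM keeps those identifiers is to inspect the document type node after parsing. The sketch below assumes a current Python 3 and the standard library's xml.dom.minidom, not the PyXML Sax2 reader used in this thread, so it only illustrates what the check looks like.

```python
from xml.dom import minidom

# Same DOCTYPE as the source file; the body is trimmed for brevity.
SOURCE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Index</title></head><body/></html>'
)

doc = minidom.parseString(SOURCE)
doctype = doc.doctype

# A DOM that preserves the declaration exposes both identifiers here.
print(doctype.name)
print(doctype.publicId)
print(doctype.systemId)
```

If publicId and systemId come back empty on a given reader, the serializer has nothing to write out, which would be consistent with the bare DOCTYPE in the output.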
Cheers,
Tom P
[Andrew Cooke]
Thanks for the reply. Two points in response:
1 - I screwed up with the input file, it was incorrect. My apologies. I
have included a revised demonstration of the problem below (the input file
has been through Tidy, but the entities still exist at that point).
2 - As far as I can tell, the problem isn't that the entities are being
replaced by the appropriate character, but that they are being silently
dropped. The &lt; entity passes through correctly, but the &oacute; entity
is not replaced by an ó (nor is the &iexcl; replaced by a ¡).
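To make the expected result concrete, the sketch below looks up the three entity names and expands the two strings from the input file. It assumes a current Python 3 and uses the standard library's html.entities table and html.unescape, which are unrelated to PyXML; the variable names are only for illustration.

```python
from html import unescape
from html.entities import name2codepoint

# Code points the entity names stand for (0xA1, 0xF3 and 0x3C).
for name in ("iexcl", "oacute", "lt"):
    codepoint = name2codepoint[name]
    print(name, hex(codepoint), repr(chr(codepoint)))

# Text content a parser should hand back for the two elements in the input.
expected_h1 = unescape("&iexcl;&lt;Hola!")
expected_a = unescape("initialisaci&oacute;n")

print(repr(expected_h1))
print(repr(expected_a))
```

Comparing these strings with what the DOM actually contains shows which characters were dropped.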
Thanks,
Andrew
PS I am using PyXml installed from PxML-0.7.win32-py2.2.exe
F:\home\Andrew\multi\src\xhtml>python
Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from xml.dom.ext.reader.Sax2 import FromXmlFile
>>> from xml.dom.ext import PrettyPrint
>>> sys.stdout.writelines(open("index.xhtml").readlines())
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org" />
<link type="text/css" rel="stylesheet" href="basic.css" />
<title>Index</title>
</head>
<body>
<h1>&iexcl;&lt;Hola!</h1>
<a href="init">initialisaci&oacute;n</a>
</body>
</html>
>>> PrettyPrint(FromXmlFile("index.xhtml"))
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<meta content='HTML Tidy for Cygwin (vers 1st April 2002), see www.w3.org' name='generator'/>
<link href='basic.css' rel='stylesheet' type='text/css'/>
<title>Index</title>
</head>
<body>
<h1>&lt;Hola!</h1>
<a href='init'>initialisacin</a>
</body>
</html>
>>>