[Patches] sgmllib patch: handle specials

Cees de Groot cg@cdegroot.com
Sun, 21 May 2000 11:26:09 +0200


--==_Exmh_358495615P
Content-Type: text/plain; charset=us-ascii

For SGMLtools, I need to access the DTD of an SGML document. sgmllib doesn't
handle this out of the box, but it does know about the "<!...>" construct:
there is a pattern for it, although it seems to be used only to make sure
that literals don't end too early.

The patch adds an overridable method handle_special(self, data), which
receives the contents of "<!...>" constructs (the text between "<!" and ">");
this allows subclasses to look for document type declarations and other
interesting tidbits.
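
To illustrate how a subclass might use the hook, here is a minimal sketch. It
does not import sgmllib; SpecialScanner and its simplified feed() loop are
hypothetical stand-ins, and only the "special" pattern and the
handle_special(self, data) signature are taken from the patch.

```python
import re

# Same pattern as in the patched sgmllib: group 1 captures the text
# between "<!" and ">".
special = re.compile('<!([^<>]*)>')


class SpecialScanner:
    """Hypothetical stand-in for a parser subclass; not part of sgmllib."""

    def __init__(self):
        self.doctype = None

    def handle_special(self, data):
        # data is the declaration body without the "<!" and ">" delimiters.
        words = data.split()
        if words and words[0].upper() == 'DOCTYPE':
            self.doctype = words[1:]

    def feed(self, rawdata):
        # Simplified loop (an assumption for this sketch): it only looks for
        # "<!...>" constructs and ignores tags, comments and literal mode.
        for match in special.finditer(rawdata):
            self.handle_special(match.group(1))


scanner = SpecialScanner()
scanner.feed('<!DOCTYPE linuxdoc SYSTEM>\n<article>text</article>')
print(scanner.doctype)
```

Because the pattern excludes "<" and ">" inside the construct, a declaration
with an internal subset containing nested "<!...>" declarations would not be
matched as a single unit by this sketch.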

Regards,

Cees
----
I confirm that, to the best of my knowledge and belief, this contribution
is free of any claims of third parties under copyright, patent or
other rights or interests ("claims").  To the extent that I have
any such claims, I hereby grant to CNRI a nonexclusive, irrevocable,
royalty-free, worldwide license to reproduce, distribute, perform and/or
display publicly, prepare derivative versions, and otherwise use this
contribution as part of the Python software and its related documentation,
or any derivative versions thereof, at no cost to CNRI or its licensed
users, and to authorize others to do so.

I acknowledge that CNRI may, at its sole discretion, decide whether
or not to incorporate this contribution in the Python software and its
related documentation.  I further grant CNRI permission to use my name
and other identifying information provided to CNRI by me for use in
connection with the Python software and its related documentation.
----
*** /usr/lib/python/sgmllib.py	Tue Nov  9 13:22:27 1999
--- sgmllib.py	Sun May 21 11:10:21 2000
***************
*** 30,36 ****
  piclose = re.compile('>')
  endtagopen = re.compile('</[<>a-zA-Z]')
  endbracket = re.compile('[<>]')
! special = re.compile('<![^<>]*>')
  commentopen = re.compile('<!--')
  commentclose = re.compile('--[%s]*>' % string.whitespace)
  tagfind = re.compile('[a-zA-Z][-.a-zA-Z0-9]*')
--- 30,36 ----
  piclose = re.compile('>')
  endtagopen = re.compile('</[<>a-zA-Z]')
  endbracket = re.compile('[<>]')
! special = re.compile('<!([^<>]*)>')
  commentopen = re.compile('<!--')
  commentclose = re.compile('--[%s]*>' % string.whitespace)
  tagfind = re.compile('[a-zA-Z][-.a-zA-Z0-9]*')
***************
*** 143,150 ****
                      if self.literal:
                          self.handle_data(rawdata[i])
                          i = i+1
!                         continue
!                     i = match.end(0)
                      continue
              elif rawdata[i] == '&':
                  match = charref.match(rawdata, i)
--- 143,151 ----
                      if self.literal:
                          self.handle_data(rawdata[i])
                          i = i+1
!                     else:
!                         self.handle_special(match.group(1))
!                         i = match.end(0)
                      continue
              elif rawdata[i] == '&':
                  match = charref.match(rawdata, i)
***************
*** 376,381 ****
--- 377,386 ----
  
      # Example -- handle processing instruction, could be overridden
      def handle_pi(self, data):
+         pass
+ 
+     # Example -- handle special, could be overridden
+     def handle_special(self, data):
          pass
  
      # To be overridden -- handlers for unknown objects

-- 
Cees de Groot               http://www.cdegroot.com     <cg@cdegroot.com>
                            http://sgmltools-lite.sourceforge.net/
GnuPG 1024D/E0989E8B 0016 F679 F38D 5946 4ECD  1986 F303 937F E098 9E8B
Forge your CipherSaber and list it: http://www.xs4all.nl/~cg/ciphersaber/




--==_Exmh_358495615P--