[XML-SIG] PyExpat update

Paul Prescod paul@prescod.net
Mon, 07 Feb 2000 14:50:11 -0600


I did some work on pyexpat over the weekend. Modulo bugs I have
introduced, I think that my changes so far have all been backwards
compatible. I list my new features at the bottom of this message.

Before I release, I want some xml-sig opinions on things I would like to
change that are NOT backwards compatible.

1. Attributes would be returned as a mapping {key:value, key:value} and
not a list [key,value,key,value]. Obviously this will break code that
expected the latter.
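
A minimal sketch of how existing code could bridge the two shapes. The helper name attrs_to_mapping and the sample attribute values are made up for illustration; they are not part of pyexpat.

```python
def attrs_to_mapping(flat):
    """Turn [key, value, key, value, ...] into {key: value, ...}."""
    if len(flat) % 2:
        raise ValueError("expected an even number of items")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(flat[0::2], flat[1::2]))

# Sample data in the old list shape (values invented for the example).
flat_attrs = ["id", "42", "lang", "en"]
attrs = attrs_to_mapping(flat_attrs)
```

Code that indexed the list by position would need to switch to lookups by attribute name.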

2. Errors will be returned as strings, not integers. You can check for
string equality using "==". The intention is not that you would hard-code
strings into your code, but rather that you would use pre-defined string
constants:

foo = parser.Parse( data )
if foo == pyexpat.unclosed_token:
        print "Oops:"+pyexpat.unclosed_token

(IIRC, Python is smart about checking for pointer equality before string
equality, right?)
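
A short sketch of why comparing by value is the safe choice here. UNCLOSED_TOKEN and its text are hypothetical stand-ins for a constant such as pyexpat.unclosed_token, whose actual value is not given in this message.

```python
# Hypothetical stand-in for a module-level error constant.
UNCLOSED_TOKEN = "unclosed token"

def is_unclosed_token(result):
    # "==" compares the characters, so it gives the right answer even
    # when the two strings are distinct objects in memory.
    return result == UNCLOSED_TOKEN

# An equal string assembled at run time; it need not be the same object
# as the constant, which is why an identity test would be fragile.
built = " ".join(["unclosed", "token"])
```

As far as I know, comparing a string with itself is still cheap, because identical objects can be recognized before any characters are examined.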

3. There will be no list of exceptions in the module's interface. Here's
what it looks like now:

>>> import pyexpat
>>> for name in dir( pyexpat ):
...     if name[0:3]=="XML":
...         print name, getattr( pyexpat, name )
...
XML_ERROR_ASYNC_ENTITY 13
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF 16
XML_ERROR_BAD_CHAR_REF 14
XML_ERROR_BINARY_ENTITY_REF 15
XML_ERROR_DUPLICATE_ATTRIBUTE 8
XML_ERROR_INCORRECT_ENCODING 19
XML_ERROR_INVALID_TOKEN 4
XML_ERROR_JUNK_AFTER_DOC_ELEMENT 9
XML_ERROR_MISPLACED_XML_PI 17
XML_ERROR_NONE 0
XML_ERROR_NO_ELEMENTS 3
XML_ERROR_NO_MEMORY 1
XML_ERROR_PARAM_ENTITY_REF 10
XML_ERROR_PARTIAL_CHAR 6
XML_ERROR_RECURSIVE_ENTITY_REF 12
XML_ERROR_SYNTAX 2
XML_ERROR_TAG_MISMATCH 7
XML_ERROR_UNCLOSED_TOKEN 5
XML_ERROR_UNDEFINED_ENTITY 11
XML_ERROR_UNKNOWN_ENCODING 18

I would rather move all of these to an "errors" dictionary so they don't
clutter up the main module namespace (after converting them to strings
instead of integers).
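
A rough sketch of what such a dictionary could look like, using three of the codes listed above. The key naming and the message strings are assumptions made for illustration, not a settled design.

```python
# Three of the integer codes from the listing above.
XML_ERROR_CODES = {
    "XML_ERROR_SYNTAX": 2,
    "XML_ERROR_NO_ELEMENTS": 3,
    "XML_ERROR_UNCLOSED_TOKEN": 5,
}

def build_errors(codes):
    """Map short lowercase names to string constants (hypothetical naming)."""
    errors = {}
    prefix = "XML_ERROR_"
    for name in codes:
        short = name[len(prefix):].lower()
        # The string value is just the name with spaces; a real module
        # would presumably use expat's own message text instead.
        errors[short] = short.replace("_", " ")
    return errors

errors = build_errors(XML_ERROR_CODES)
```

With this layout the module namespace holds a single name, errors, and callers write errors["unclosed_token"] rather than a top-level constant.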

-----------------

Here are the new features I have already added.

 * more handlers:

StartElement,
EndElement,
ProcessingInstruction,
CharacterData,
UnparsedEntityDecl,
NotationDecl,
StartNamespaceDecl,
EndNamespaceDecl,
Comment,
StartCdataSection,
EndCdataSection,
Default,
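
A minimal sketch of wiring up a few of these handlers. It uses xml.parsers.expat from today's Python standard library, where each handler is an attribute named <Name>Handler; whether the version described in this message uses exactly the same spelling is an assumption. The events list and callback names are invented for the example.

```python
import xml.parsers.expat

events = []

def start_element(name, attrs):
    # attrs arrives as a mapping of attribute names to values.
    events.append(("start", name, attrs))

def end_element(name):
    events.append(("end", name))

def char_data(data):
    # Text may be delivered in more than one call.
    events.append(("text", data))

def comment(data):
    events.append(("comment", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data
parser.CommentHandler = comment

# The second argument marks this as the final chunk of input.
parser.Parse('<doc id="1">hi<!-- note --></doc>', True)
```

Handlers that are never assigned are simply not called, so a program only pays for the events it cares about.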

 * new error handling:

setjmp/longjmp is gone
exceptions are propagated properly even on Windows
I believe the new code is thread-safe.

 * ParseFile:

now possible to parse an open file or file-like object.
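
A small sketch of parsing from a file-like object. It uses ParseFile from xml.parsers.expat in today's standard library, which may differ in detail from the version described here; io.BytesIO stands in for an open file, and the document content is invented.

```python
import io
import xml.parsers.expat

names = []

parser = xml.parsers.expat.ParserCreate()
# Record each element name as its start tag is seen.
parser.StartElementHandler = lambda name, attrs: names.append(name)

# Any object with a read(nbytes) method should do; BytesIO stands in
# for a file opened in binary mode.
source = io.BytesIO(b"<root><child/><child/></root>")
parser.ParseFile(source)
```

The parser pulls data from the object itself, so the caller does not have to read the whole document into a string first.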

 * bug fixes:

setattr throws a proper exception when you do a bad assignment
setjmp/longjmp works on Windows

 * new bugs:

???

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"If I say something, yet it does not fill you with the immediate
burning desire to voluntarily show it to everyone you know, well then,
it's probably not all that important."
    - http://www.bespoke.org/viridian/