[XML-SIG] Changes in pyexpat.c

uche.ogbuji@fourthought.com uche.ogbuji@fourthought.com
Wed, 27 Sep 2000 13:56:27 -0600


> > We do use it in 4DOM (output encodings), but we can just mandate
> > PyXML 0.5.5.1 until the Py 2.0 situation is sorted out.
> 
> Looking at the 4DOM copy that is currently in the PyXML CVS, I can't
> find the place where that is used. Can you give me a specific pointer?

Probably because the checked-in 4DOM is out of date.  We've hesitated to check
in the 4Suite 0.9.x version because of all the flux: we didn't want to
contribute to the confusion, and we weren't sure we had the bandwidth to help
sort out any confusion that resulted.

However, it's time to do the right thing, so...

Do we check in the latest 4DOM and back-port the output encoding stuff to PyXML
(it's all in ext/Printer.py)?  I haven't had a chance to play with Python 2.0,
so I'm not sure how hard the port would be.  Here is the representative
snippet from ext/Printer.py:

import re, string
from xml.unicode.iso8859 import wstring
wstring.install_alias('ISO-8859-1', 'ISO_8859-1:1987')

#[snip]

#Note: UCS-2 only for now
def TranslateCdata(characters, encoding='UTF-8', prev_chars='', markupSafe=0):
    if not characters:
        return ''
    if not markupSafe:
        new_string, num_subst = re.subn(
            g_cdataCharPattern,
            lambda m, d=g_charToEntity: d[m.group()],
            characters
            )
        if prev_chars[-2:] == ']]' and characters[0] == '>':
            new_string = '&gt;' + new_string[1:]
    else:
        new_string = characters
    #Note: use decimal char entity rep because some browsers are broken
    #FIXME: This will bomb for high characters.  Should, for instance, detect
    #The UTF-8 for 0xFFFE and put out 
    new_string, num_subst = re.subn(
        XML_ILLEGAL_CHAR_PATTERN,
        lambda m: '&#%i;' % ord(m.group()),
        new_string
        )
    encoding = string.upper(encoding)
    if encoding == 'UTF-8':
        pass
    else:
        #Note: Pass through to wstrop.  This means we don't play nice and
        #Escape characters that are not in the target encoding.
        ws = wstring.from_utf8(new_string)
        new_string = ws.encode(encoding)
        #This version would skip all untranslatable chars: see wstrop.c
        #new_string = ws.encode(encoding, 1)
    return new_string
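
For comparison only, here is a rough sketch of how the encoding step might
look once Unicode strings and codecs are built in to the language.  None of
this is in Printer.py.  The name encode_cdata and the two stand-in tables are
hypothetical (the real g_cdataCharPattern and g_charToEntity are snipped
above), the XML_ILLEGAL_CHAR_PATTERN pass is left out, and the sketch assumes
a much later Python (3.x) where the 'xmlcharrefreplace' error handler exists.
Unlike the wstring pass-through, that handler escapes characters the target
encoding cannot represent as decimal character references.

```python
import re

# Hypothetical stand-ins for the snipped g_cdataCharPattern / g_charToEntity.
CDATA_CHAR_PATTERN = re.compile('[&<]')
CHAR_TO_ENTITY = {'&': '&amp;', '<': '&lt;'}


def encode_cdata(characters, encoding='UTF-8', prev_chars='', markup_safe=False):
    """Escape markup characters, then encode to bytes in the target encoding."""
    if not characters:
        return b''
    if not markup_safe:
        text = CDATA_CHAR_PATTERN.sub(
            lambda m: CHAR_TO_ENTITY[m.group()], characters)
        # Avoid emitting a literal ']]>' across two successive calls.
        if prev_chars[-2:] == ']]' and characters[0] == '>':
            text = '&gt;' + text[1:]
    else:
        text = characters
    # Characters missing from the target encoding become decimal character
    # references instead of raising an error or being dropped.
    return text.encode(encoding, 'xmlcharrefreplace')
```

With this sketch, a character such as U+00E9 passes through as a single byte
for ISO-8859-1 but becomes a character reference when the target is ASCII.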




-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python