[Tutor] Re Problems with creating XML-documents

Lie Ryan lie.1296 at gmail.com
Thu Apr 15 09:57:03 CEST 2010

On 04/15/10 16:03, Karjer Jdfjdf wrote:
> When I try to parse the outputfile it creates different errors such as:
>    * ExpatError: not well-formed (invalid token):

That error message is telling you that you're not parsing an XML file,
merely an XML-like file.
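
To see where the file stops being XML, a small check along these lines can help. It is only a sketch: the function name find_xml_error and the two sample strings are made up for illustration, while minidom.parseString and the lineno/offset attributes of ExpatError come from the standard library.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def find_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        # lineno is 1-based; offset is the column within that line.
        return (err.lineno, err.offset, str(err))
    return None


# Hypothetical samples: the second one contains a bare '&'.
good = '<record id="1"><order>A &amp; B</order></record>'
bad = '<record id="1"><order>A & B</order></record>'

print(find_xml_error(good))
print(find_xml_error(bad))
```

Running this over the Java program's output should at least tell you which line to look at before you go digging in the generator.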

> Basically it usually has something to do with not-well-formed XML.
> Unfortunately the Java-program also alters the content on essential
> points such as inserting spaces in tags (e.g. id="value" to id = " value " ),
> which makes it even harder. The Java is really a b&%$#!, but I have
> no alternatives because it is custom-made (but very poorly imho).

The bug is in the program generating the XML; it is much easier to fix
it on the Java side than to try to parse broken XML.

> Sorry, I've not been clear, but it's very difficult and frustrating for
> me to troubleshoot this properly because the Java-program is quite huge and
> takes a long time to load before doing its actions and when running
> also requires a lot of time. The minimum is about 10 minutes per run.

10 minutes per run is instant compared to writing a parser for invalid
XML. And you can cut those 10 minutes short by using a smaller database
for testing purposes.

>>> text = str('<record id="' + str(instance.id)+ '">\n' + \
>>>     ' <date>' + str(instance.datetime) + ' </date>\n' + \
>>>     ' <order>' + instance.order + ' </order>\n' + \
>>>     '</record>\n')
>> You can simplify this quite a lot. You almost certainly don't need
>> the outer str() and you probably don't need the \ characters either.
> I use a very simplified text-variable here. In reality I also include
> other fields which contain numeric values as well. I use the \ to
> keep each XML-tag on a separate line to keep the overview.

He means you can simplify it by using string interpolation:

text = '''
<record id="%s">
 <date>%s </date>
 <order>%s </order>
</record>
''' % (instance.id, instance.datetime, instance.order)
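
One caveat with plain interpolation: if a value contains &, < or >, the result is no longer well-formed. A minimal sketch of one way around that, using escape and quoteattr from xml.sax.saxutils in the standard library (the function name record_to_xml and the sample values are invented for illustration):

```python
from xml.sax.saxutils import escape, quoteattr


def record_to_xml(rec_id, date, order):
    """Build one <record> element as text, escaping the inserted values."""
    return (
        '<record id=%s>\n'
        ' <date>%s</date>\n'
        ' <order>%s</order>\n'
        '</record>\n'
    ) % (
        quoteattr(str(rec_id)),  # adds the surrounding quotes itself
        escape(str(date)),       # replaces &, < and > with entities
        escape(str(order)),
    )


# Hypothetical values; the order text would break unescaped XML.
print(record_to_xml(7, '2010-04-15 09:57:03', 'nuts & bolts <large>'))
```

Note that quoteattr supplies the quotes around the attribute value, so the template must not add its own.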

>> Also it might be easier to use a triple quoted string and format
>> characters to insert the data values.
>>> When I try to parse it, it keeps giving errors.
>> Why do you need to parse it if you are creating it?
>> Or is this after you read it back later? I don't understand the
>> sequence of processing here.
>>> So I tried to use an external library jaxml,
>> Did you try to use the standard library tools that come with Python,
>> like ElementTree or even sax?
> I've been trying to do this with minidom, but I'm not sure if this
> is the right solution because I'm pretty unaware of XML-writing/parsing.
> At the moment I'm tempted to do a line-by-line parse and trigger on
> an identifier-string that identifies the end and start of a record.
> But that way I'll never learn XML.

Why bother writing an XML-like parser when you can fix the generating
program? And if you want to learn XML, my recommendation is to learn to
write XHTML Strict first (with a plain text editor! not some
RealityWeaver or BackPage); that is IMHO probably the easiest route
(especially if you already know some HTML).
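
For the part of the XML you generate yourself, the standard library can also do the writing, so the escaping and the closing tags are handled for you. A rough sketch with ElementTree, assuming Python 3 (encoding='unicode' is not available in Python 2); the build_record helper, the <records> wrapper element and the sample values are made up for illustration:

```python
import xml.etree.ElementTree as ET


def build_record(rec_id, date, order):
    """Return a <record> Element; ElementTree escapes text and attributes."""
    record = ET.Element('record', {'id': str(rec_id)})
    ET.SubElement(record, 'date').text = str(date)
    ET.SubElement(record, 'order').text = str(order)
    return record


# Hypothetical wrapper element holding all records.
root = ET.Element('records')
root.append(build_record(7, '2010-04-15 09:57:03', 'nuts & bolts'))

# Serialize to a str; the '&' in the order text comes out as '&amp;'.
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)

# Round trip: the generated text parses back without an error.
parsed = ET.fromstring(xml_text)
print(parsed.find('record').get('id'))
```

This does nothing about what the Java program does to the file afterwards, but it rules out your own writer as a source of the errors.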
