xml processing : too slow...
Alex Martelli
aleax at aleax.it
Thu Jul 25 12:37:54 EDT 2002
Shagshag13 wrote:
>> p.Parse('<fict>%s</fict>' % line, 1)
>>
>> should be satisfactory for checking this kind of "sort of
>> well-formedness", unless there are yet more specs as yet
>> unexpressed.
>
> that's why i had done :
> >>> anotherline = '<root>' + line + '</root>'
> >>> p.Parse(anotherline, 1)
> Traceback (most recent call last):
>   File "<pyshell#14>", line 1, in ?
>     p.Parse(anotherline, 1)
> ExpatError: junk after document element: line 1, column 0
>
> but it still doesn't work, and neither does this:
But ARE you making a new parser object p for each line you
have to parse? I don't see the expat.ParserCreate call here.
I've already indicated a few posts ago that you need that.
> >>> p.Parse('<fict>%s</fict>' % line, 1)
> Traceback (most recent call last):
>   File "<pyshell#185>", line 1, in ?
>     p.Parse('<fict>%s</fict>' % line, 1)
> ExpatError: junk after document element: line 1, column 0
Try with a newly created parser each and every time, as I said.
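Something along these lines is what I mean: a minimal sketch, not
tested against your data. The function name is_well_formed_fragment
and the sample lines are made up for illustration; the '<fict>'
wrapper is the same trick as above. The point is that
expat.ParserCreate() is called once per line, because a parser that
has been given its final chunk cannot be reused.

```python
from xml.parsers import expat

def is_well_formed_fragment(line):
    """Return True if `line` parses once wrapped in a dummy root element."""
    # A fresh parser for every line: once Parse() has been called with
    # the "final" flag set, that parser object is finished.
    parser = expat.ParserCreate()
    try:
        parser.Parse('<fict>%s</fict>' % line, True)
    except expat.ExpatError:
        return False
    return True

if __name__ == '__main__':
    # Hypothetical sample input, one fragment per line.
    samples = [
        '<a>some text</a><b/>',
        '<a>unclosed',
    ]
    for sample in samples:
        print(repr(sample), is_well_formed_fragment(sample))
```

Whether making a parser per line is fast enough for your volume of
data is something you would have to measure.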
>> How would that help you diagnose e.g.
>> <bah thisis=notvalid>of course not</bah>
>> as not being well formed? This is not well formed because
>> it lacks quotes around an attribute's value. Or:
>> <bah thisis="notvalid">&either</bah>
>> now THIS is not well formed because reference '&either'
>> is not terminated with a semicolon. Etc, etc.
>
> that's right i didn't address this kind of thing... :(
If you need to, then expat is most likely your best bet
(rxp might be another, but I don't know enough about it
to suggest it). If you don't care either way, expat is
probably best anyway. If you HAVE to accept so-called
"XML" that in fact has these or other kinds of
non-well-formedness, it's obviously a different issue.
Alex