Which one is the best XML-parser?
Marko Rauhamaa
marko at pacujo.net
Fri Jun 24 09:16:27 EDT 2016
Random832 <random832 at fastmail.com>:
> You know what would be really nice? A "semi-incremental" parser that
> can e.g. yield (whether through an event or through the iterator
> protocol) a fully formed element (preferably one that can be queried
> with xpath) at a time for each record of a document representing a
> list of objects. Does anything like that exist?

You can construct that from a SAX parser, but it's less convenient than
it could be. Python's json module has no such incremental mode either,
so I've had to build a clumsy one myself:

    import json

    def decode_json_object_array(self):
        # A very clumsy implementation of an incremental JSON decoder.
        # self.get_text() is expected to return an iterator of text
        # chunks that together form one JSON array of objects.
        it = self.get_text()
        inbuf = ""
        while True:
            try:
                inbuf += next(it)
            except StopIteration:
                # a premature end; trigger a decode error
                json.loads("[" + inbuf)
                raise ValueError("no JSON array found")
            try:
                head, tail = inbuf.split("[", 1)
            except ValueError:
                continue
            break
        # trigger a decode error if head contains junk
        json.loads(head + "[]")
        inbuf = ""
        chunk = tail
        while True:
            bracket_maybe = ""
            for big in chunk.split("]"):
                # put back the "]" that split() removed between pieces
                inbuf += bracket_maybe
                bracket_maybe = "]"
                comma_maybe = ""
                for small in big.split(","):
                    inbuf += comma_maybe + small
                    comma_maybe = ","
                    try:
                        yield json.loads(inbuf)
                    except ValueError:  # json.JSONDecodeError is a subclass
                        if inbuf.strip():
                            continue    # element still incomplete
                    # element done or nothing pending: the next comma is
                    # a separator (stray commas are tolerated)
                    inbuf = comma_maybe = ""
            try:
                chunk = next(it)    # process only the new text
            except StopIteration:
                break
        # trigger a decode error if junk or an incomplete element is left
        json.loads("[" + inbuf)

It could easily be converted to an analogous XML parser.
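
For XML, the SAX route mentioned above might look roughly like the
following. This is only a sketch under stated assumptions: the records
are taken to be the direct children of the document root, and the names
RecordHandler and iter_records are made up for the illustration. It
uses the standard library's xml.sax incremental feed() interface and
builds each record with xml.etree.ElementTree.TreeBuilder.

```python
import xml.sax
import xml.etree.ElementTree as ET


class RecordHandler(xml.sax.ContentHandler):
    """Builds one ElementTree element per direct child of the root."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.builder = None     # TreeBuilder for the record in progress
        self.done = []          # completed records not yet handed out

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:     # a new record starts (assumed layout)
            self.builder = ET.TreeBuilder()
        if self.builder is not None:
            self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        if self.builder is not None:
            self.builder.data(content)

    def endElement(self, name):
        if self.builder is not None:
            self.builder.end(name)
            if self.depth == 2:  # the record's end tag has been seen
                self.done.append(self.builder.close())
                self.builder = None
        self.depth -= 1


def iter_records(chunks):
    """Yield each record element once its end tag has been fed."""
    handler = RecordHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        while handler.done:
            yield handler.done.pop(0)
    parser.close()              # raises if the document is incomplete
    yield from handler.done
```

Each yielded element can be queried with ElementTree's limited XPath
support, e.g. record.findall("./name"). The standard library's
xml.etree.ElementTree.iterparse and XMLPullParser cover similar ground
without a hand-written handler.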
Marko