Size limit of text nodes?
Hello,

I just ran into the following error:

    big-file.xml:3291: parser error : xmlSAX2Characters: huge text node
    279ebcd8791394504dc9d4823772baa4bcc942a0871755e1ac3562f0369c69e1e2472dc202cb784a
    ^
    big-file.xml:3291: parser error : Extra content at the end of the document
    279ebcd8791394504dc9d4823772baa4bcc942a0871755e1ac3562f0369c69e1e2472dc202cb784a
    ^

The offending node is one of several like this:

    <image id="image-8" class="image">425a68393141592…29c28481da477d780</image>

where the content of the node (i.e. the node.text property) is about 13MB of text :-)

Is this a limitation of lxml or of the underlying XML library?

Thanks!
Jens

--
Jens Tröger
http://savage.light-speed.de/
Jens Tröger wrote on 13.10.2017 at 23:39:
It's a default security restriction in libxml2. Disable it at your own risk.

http://lxml.de/parsing.html#parser-options

See, for example:

https://pypi.python.org/pypi/defusedxml/#attack-vectors

Stefan
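
For readers who land on this thread, here is a minimal sketch of what disabling the restriction can look like, assuming the relevant setting on the linked parser-options page is the `huge_tree` keyword of `etree.XMLParser`. The helper name `parse_with_huge_text` and the synthetic 11 MiB payload are made up for illustration and are not taken from the original file. Whether the default parser rejects the document may depend on the libxml2 version, so the sketch records that outcome instead of assuming it.

```python
from lxml import etree


def parse_with_huge_text(xml_bytes):
    """Parse XML with libxml2's size restrictions lifted (hypothetical helper).

    huge_tree=True switches off the security limits, so this should only
    be used on input from a trusted source.
    """
    parser = etree.XMLParser(huge_tree=True)
    return etree.fromstring(xml_bytes, parser)


# Synthetic stand-in for the real document: one text node of 11 MiB.
payload = b"a" * (11 * 1024 * 1024)
doc = b'<root><image id="image-8" class="image">' + payload + b"</image></root>"

# Try the default parser first and record whether it accepts the document.
try:
    etree.fromstring(doc)
    default_parser_ok = True
except etree.XMLSyntaxError:
    default_parser_ok = False

# With huge_tree=True the large text node is read in full.
root = parse_with_huge_text(doc)
text_len = len(root.find("image").text)
print("default parser accepted the document:", default_parser_ok)
print("characters read with huge_tree=True:", text_len)
```

For a file on disk, the same parser object can be passed as the second argument to `etree.parse()`. Since the limit exists to guard against the attacks described on the defusedxml page, it seems safer to lift it only for the specific documents that need it.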
participants (2)
- Jens Tröger
- Stefan Behnel