What are the options in lxml to prevent the parser from processing DTDs, i.e. to reject any XML that contains a DTD (for security reasons)?
Best regards,
Rainer
Rainer Hoerbe wrote on 07.06.2016 at 18:54:
What are the options in lxml to prevent the parser from processing DTDs, i.e. to reject any XML that contains a DTD (for security reasons)?
See https://pypi.python.org/pypi/defusedxml/
Stefan
Thanks. From reading the documentation, it is not clear to me to what extent defusedxml can be applied to existing source code as a 1:1 replacement. Does your answer imply that lxml has no equivalent of the Xerces SecurityManager?
- Rainer
On 08.06.2016 at 07:36, Stefan Behnel <stefan_ml@behnel.de> wrote:
See https://pypi.python.org/pypi/defusedxml/
Stefan
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
Rainer Hoerbe wrote on 08.06.2016 at 07:52:
Thanks. From reading the documentation it is not clear to me to what extent defusedxml can be applied to existing source code as a 1:1 replacement. Does your answer imply that there is no equivalent function in lxml to xerces SecurityManager?
You don't have to use defusedxml; I posted the link because it has all the details in it.
lxml doesn't access any network resources by default, including DTDs. For internal subsets, libxml2 applies reasonable bounds on the content that a DTD is allowed to generate, which counters most attacks.
I'm not aware of a way to disable DTD processing completely. But you can disable entity resolution, use incremental parsing, and then check for the existence of a DTD right after the start event of the root element. That's not entirely the same as not allowing any DTD processing at all, but it's just as good when it comes to content generation. For details, see the link above.
Stefan
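For readers who want to try the approach described in this reply (disable entity resolution, parse incrementally, check for a DTD at the root's start event), here is a rough sketch. It is only one possible reading of that suggestion, not code from the thread. The names parse_without_dtd, DTDForbidden and chunk_size are made up for illustration. The sketch assumes lxml 3.3 or later, where etree.XMLPullParser is available, and it assumes that docinfo.doctype is already populated when the root's start event is delivered.

```python
from lxml import etree


class DTDForbidden(ValueError):
    """Hypothetical exception for this sketch; not part of lxml."""


def parse_without_dtd(data, chunk_size=4096):
    """Parse XML bytes incrementally; reject documents that declare a DTD.

    Hypothetical helper, not an lxml API.
    """
    parser = etree.XMLPullParser(
        events=("start",),
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # lxml's default, spelled out for clarity
        load_dtd=False,          # lxml's default: do not load external DTDs
    )
    root_seen = False
    for offset in range(0, len(data), chunk_size):
        parser.feed(data[offset:offset + chunk_size])
        for _event, element in parser.read_events():
            if root_seen:
                continue  # only the first start event (the root) is checked
            root_seen = True
            # Assumption: the DOCTYPE, if any, has been parsed by the time
            # the root element starts, so docinfo reflects it here.
            docinfo = element.getroottree().docinfo
            if docinfo.doctype:
                raise DTDForbidden(
                    "document declares a DTD: %s" % docinfo.doctype
                )
    # close() finishes parsing and returns the root element.
    return parser.close()
```

Called as parse_without_dtd(b"<root/>"), this should return the root element, while a document starting with a DOCTYPE declaration should raise DTDForbidden before its content is processed any further. As noted in the reply, this is not the same as disabling DTD processing altogether: the parser still reads the internal subset before the check can run.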
Thank you.
On 11.06.2016 at 07:44, Stefan Behnel <stefan_ml@behnel.de> wrote:
You don't have to use defusedxml, I posted the link because it has all the details in it.
lxml doesn't access any network resources by default, including DTDs. For internal subsets, libxml2 applies reasonable bounds on the content that a DTD is allowed to generate, which counters most attacks.
I'm not aware of a way to disable DTD processing completely. But you can disable entity resolution, use incremental parsing, and then check for the existence of a DTD right after the start event of the root element. That's not entirely the same as not allowing any DTD processing at all, but it's just as good when it comes to content generation. For details, see the link above.
Stefan
Participants (2): Rainer Hoerbe, Stefan Behnel