[lxml-dev] What are the equivalents of nowrite, nomkdir options
Hello all, Firstly, thanks for such a great library. For a Python module, I am worryingly excited reading through the documentation. I have had a play around with etree.* and have replaced almost 50% of the code in my project so far to use it - but one concern remains: I cannot figure out from either the docs, snippets of source files I have read or Google how I am meant to prevent XSLT from writing to the file system. Using xsltproc from the command line I can use the "--nowrite" and "--nomkdir" switches and would ideally like to be able to replicate this functionality. Thanks so much, Noah -- "Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman
Hi Noah, Noah Slater wrote:
I cannot figure out from either the docs, snippets of source files I have read or google how I am meant to prevent XSLT from writing to the file system.
Using xsltproc from the command line I can use the "--nowrite" and "--nomkdir" switches and would ideally like to be able to replicate this functionality.
I guess you're referring to the security framework in libxslt: http://xmlsoft.org/XSLT/html/libxslt-security.html This is not currently wrapped in lxml. I do not know what exactly it is meant to do, though. How can you create files in the current XSLT implementation? If you can find a case where having this left out represents a security risk, we may consider wrapping that part of the API to close it. Stefan
Hi Stefan, Thanks for the reply!
I guess you're referring to the security framework in libxslt:
Looks like that could be the thing, though I wouldn't know for sure as I am not familiar with the underlying libxslt API.
I do not know what exactly it is meant to do, though. How can you create files in the current XSLT implementation?
I think it may be an extension to XSLT that libxslt implements. I use this when I am chunking my DocBook documents. See: http://www.sagehill.net/docbookxsl/Chunking.html The DocBook stylesheets generate multiple files and will create them (and the dirs) if necessary.
If you can find a case where having this left out represents a security risk, we may consider wrapping that part of the API to close it.
Yes, I have. My application accepts arbitrary XSLT files from users to transform content. While my application does not chunk output in the manner described above, I have tested it with a stylesheet that chunks output, and the lxml bindings do in fact create the files, as I would expect with the libxslt bindings. They do not, however, create directories, which is inconsistent with the standard API. Either way, I would like to be able to disable this, as it opens up the possibility for users to write arbitrary files to the file system. I hope this better explains things. Thanks, Noah -- "Creativity can be a social contribution, but only in so far as society is free to use the results." - R. Stallman
Hi Noah, Noah Slater wrote:
I guess you're referring to the security framework in libxslt: http://xmlsoft.org/XSLT/html/libxslt-security.html
Looks like that could be the thing, though I wouldn't know for sure as I am not familiar with the underlying libxslt API.
I do not know what exactly it is meant to do, though. How can you create files in the current XSLT implementation?
I think it may be an extension to XSLT that libxslt implements. I use this when I am chunking my DocBook documents. See:
http://www.sagehill.net/docbookxsl/Chunking.html
The DocBook stylesheets generate multiple files and will create them (and the dirs) if necessary.
I looked through that a bit. It seems to use EXSLT:document() and these things, but I wonder why that works in 0.9.2 (which I assume you tested it with?). Anyway, this is pretty much untested functionality and not currently expected to work in any sensible way.
My application accepts arbitrary XSLT files from users to transform content.
We already had a discussion recently about this-not-being-a-good-idea, as you cannot easily prevent the stylesheet from eating up your CPU cycles. XSLT is Turing-complete, so you can use it to find prime factors, search for ET (no pun intended), etc., even if you manage to keep it from filling up your hard disk or reading your password files.
While my application does not chunk output in the manner described above, I have tested it with a stylesheet that chunks output, and the lxml bindings do in fact create the files, as I would expect with the libxslt bindings. They do not, however, create directories, which is inconsistent with the standard API.
Not really inconsistent, as this is an API of xsltproc, not libxslt. It rather should not do that at all...
Either way I would like to be able to disable this as it opens up the possibility for users to write arbitrary files to the file system.
I gave it a try and implemented a new API for that. Look at the bottom of http://codespeak.net/svn/lxml/branch/xslt-access-control/doc/resolvers.txt to see how to use it. Note that the second part (everything below "BROKEN FROM HERE") does not currently work, likely due to problems with libxslt. If I can't get that working, it will be removed. Stefan
Hi again, Stefan Behnel wrote:
I guess you're referring to the security framework in libxslt: http://xmlsoft.org/XSLT/html/libxslt-security.html
I gave it a try and implemented a new API for that. Look at the bottom of http://codespeak.net/svn/lxml/branch/xslt-access-control/doc/resolvers.txt to see how to use it. Note that the second part (everything below "BROKEN FROM HERE") does not currently work, likely due to problems with libxslt. If I can't get that working, it will be removed.
I removed the more fine-grained control mechanisms as they are redundant with (and less general than) the custom resolver support. The branch is merged into the trunk now. See the bottom of http://codespeak.net/svn/lxml/trunk/doc/resolvers.txt for an explanation. There is also a doctest on this. Noah, I'd be glad if you could test it and report back if it works as expected. I do not know if the directory creation business works now (i.e. if directories /are/ created). Martijn, this should also finally answer your question about XSLT security. Regards, Stefan
Hi Martijn, Martijn Faassen wrote:
Stefan Behnel wrote: [snip]
Martijn, this should also finally answer your question about XSLT security.
Cool! I was following this thread; seems there was something in my wondering about XSLT security after all.
True. Now that we have XSLTAccessControl, I will also enable the remaining libxslt extra features. The trunk does not currently enable the output elements "output", "write" and "document", nor the debug element. If the access control works as expected, enabling them should not do any harm, as long as the user takes care to disable file access if necessary. Regards, Stefan
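As a minimal sketch of how the access control discussed above might be used to approximate xsltproc's "--nowrite" and "--nomkdir" switches: the example below assumes an lxml release that exposes etree.XSLTAccessControl with write_file and create_dir keyword arguments and an access_control argument on etree.XSLT. The target path and the stylesheet are made up for illustration, and the exact error raised on a denied write may differ between versions, so the sketch only catches the general etree.XSLTError.

```python
import os
import tempfile

from lxml import etree

# Hypothetical output location, used only for this demonstration.
target = os.path.join(tempfile.mkdtemp(), "chunk.xml")

# A stylesheet that tries to write a second file via exsl:document,
# similar in spirit to what the DocBook chunking stylesheets do.
stylesheet = etree.XML(f"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
  <xsl:template match="/">
    <exsl:document href="{target}" method="xml">
      <chunk/>
    </exsl:document>
    <done/>
  </xsl:template>
</xsl:stylesheet>
""")

# Forbid file writes and directory creation for this transform.
no_write = etree.XSLTAccessControl(write_file=False, create_dir=False)
transform = etree.XSLT(stylesheet, access_control=no_write)

try:
    transform(etree.XML("<doc/>"))
except etree.XSLTError as exc:
    # The transform may be aborted when the write is denied.
    print("transform refused:", exc)

file_written = os.path.exists(target)
print("file written:", file_written)
```

Note that this only restricts file and directory access; it does nothing about the CPU-time concern raised earlier in the thread.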
Hello Stefan, Wednesday, May 17, 2006, 6:10:11 AM, you wrote:
Definitely. Actually, that's already fixed on the trunk due to a different change a while ago. I didn't even know this bug existed, otherwise I would have applied it to the 0.9 branch also...
Oh, I was not using trunk, thanks.
Another thing related to this:
In most of its API functions, ElementTree raises an AssertionError on None, while lxml raises a TypeError. I'll change a couple of other places to make it consistent. That breaks ElementTree compatibility a bit more, but I think no one should rely on code raising an AssertionError when wrong argument types are passed...
Yes, this is something unpythonic, too - Python raises TypeError just as you implemented:
>>> float(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: float() argument must be a string or a number
Something that could be done to keep compatibility with both models is using a derived exception such as (I know the name is terrible):

    class LXMLInvalidArgument(TypeError, AssertionError):
        pass

Or we could ask Fredrik if he intends to change it in ElementTree...
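To illustrate why the derived exception would keep both kinds of callers working, here is a small sketch. The class name is the one proposed above; the require_not_none helper is hypothetical and stands in for any API function that rejects None.

```python
class LXMLInvalidArgument(TypeError, AssertionError):
    """Invalid argument error that is catchable as either parent type."""


def require_not_none(value):
    # Hypothetical stand-in for an API function that rejects None.
    if value is None:
        raise LXMLInvalidArgument("argument must not be None")
    return value


caught_by = []
for expected in (TypeError, AssertionError):
    try:
        require_not_none(None)
    except expected:
        # Both ElementTree-style and lxml-style handlers match.
        caught_by.append(expected.__name__)

print(caught_by)  # ['TypeError', 'AssertionError']
```

Code written against ElementTree (catching AssertionError) and code written against lxml (catching TypeError) would both continue to handle the error.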
Sure, good idea. Actually, lxml 1.0 will have even more. You can ask it for the versions of libxml2 and libxslt that it was compiled with and that it runs with. All versions are represented as int tuples so that you don't have to parse a string to find out if you're running version X.X or later.
Nice: this is actually what the pybsddb interface does, too.
But I copied the lxml version string also to __version__, I think that's a sufficiently common place to look for it.
Sure, that is a good idea.
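A short sketch of the version tuples described above, assuming the attribute names found in released lxml versions (LXML_VERSION, LIBXML_VERSION, LIBXML_COMPILED_VERSION, LIBXSLT_VERSION, LIBXSLT_COMPILED_VERSION). The minimum version in the comparison is an arbitrary value picked for illustration.

```python
from lxml import etree

# Each value is a tuple of ints, so no string parsing is needed.
print("lxml:                ", etree.LXML_VERSION)
print("libxml2 (running):   ", etree.LIBXML_VERSION)
print("libxml2 (compiled):  ", etree.LIBXML_COMPILED_VERSION)
print("libxslt (running):   ", etree.LIBXSLT_VERSION)
print("libxslt (compiled):  ", etree.LIBXSLT_COMPILED_VERSION)

# Hypothetical minimum, chosen only to show the tuple comparison.
REQUIRED_LIBXSLT = (1, 1, 0)
has_required_libxslt = etree.LIBXSLT_VERSION >= REQUIRED_LIBXSLT
print("libxslt new enough:  ", has_required_libxslt)

# The lxml version is also available as a plain string.
print("etree.__version__:   ", etree.__version__)
```

Tuple comparison is element-wise, which is what makes an "X.X or later" check a one-liner.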
-- Best regards, Steve mailto:howe@carcass.dhs.org
participants (4)
- Martijn Faassen
- Noah Slater
- Stefan Behnel
- Steve Howe