_XSLTResultTree.write_output is documented to write to file but it writes to URI

Hello, the documentation of _XSLTResultTree.write_output states write_output(self, file, *, compression=0) Serialise the XSLT output to a file or file-like object. As opposed to the generic .write() method, .write_output() serialises the result as defined by the <xsl:output> tag. The problem is that it's actually not file which is being written here, even if plain string is passed. The code in write_output is if _isString(file): file_path = _encodeFilename(file) c_filename = _cstr(file_path) with nogil: r = xslt.xsltSaveResultToFilename( c_filename, doc._c_doc, self._xslt._c_style, c_compression) and xsltSaveResultToFilename is documented at http://xmlsoft.org/XSLT/html/libxslt-xsltutils.html#xsltSaveResultToFilename as int xsltSaveResultToFilename(const char * URL, xmlDocPtr result, xsltStylesheetPtr style, int compression) Save the result @result obtained by applying the @style stylesheet to a file or @URL URL: a filename or URL The net effect is that the filename passed to _XSLTResultTree.write_output is URI-unescaped, potentiall resulting in different file being written. Consider the following small change to the test suite: diff --git a/src/lxml/tests/test_xslt.py b/src/lxml/tests/test_xslt.py index 96eb83ee..3e65bde5 100644 --- a/src/lxml/tests/test_xslt.py +++ b/src/lxml/tests/test_xslt.py @@ -170,12 +170,12 @@ class ETreeXSLTTestCase(HelperTestCase): else: self.assertTrue(False, "exception not raised") - def test_xslt_write_output_file(self): + def test_xslt_write_output_file_escape(self): with self._xslt_setup() as res: - f = NamedTemporaryFile(delete=False) + f = NamedTemporaryFile(prefix="tmp%2ejezek", delete=False) try: try: - res[0].write_output(f) + res[0].write_output(f.name) finally: f.close() with io.open(f.name, encoding='UTF-16') as f: Instead of passing in the file-like object, we pass in the file name, which would be something like /tmp/tmp%2fjezek8chac7f7. However, the tests then start to fail with Doctest: xpathxslt.txt ====================================================================== FAIL: test_xslt_write_output_file_escape (lxml.tests.test_xslt.ETreeXSLTTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor yield File "/usr/lib64/python3.7/unittest/case.py", line 632, in run testMethod() File "/home/adelton/project/lxml/src/lxml/tests/test_xslt.py", line 184, in test_xslt_write_output_file_escape os.unlink(f.name) File "/usr/lib64/python3.7/contextlib.py", line 119, in __exit__ next(self.gen) File "/home/adelton/project/lxml/src/lxml/tests/test_xslt.py", line 132, in _xslt_setup self.assertEqual(expected, data[0].replace('\n', '')) File "/usr/lib64/python3.7/unittest/case.py", line 856, in assertEqual assertion_func(first, second, msg=msg) File "/usr/lib64/python3.7/unittest/case.py", line 1237, in assertMultiLineEqual self.fail(self._formatMessage(msg, standardMsg)) File "/usr/lib64/python3.7/unittest/case.py", line 697, in fail raise self.failureException(msg) AssertionError: '<?xml version="1.0" encoding="UTF-16"?><foo>\uf8d2</foo>' != '' - <?xml version="1.0" encoding="UTF-16"?><foo></foo> + ---------------------------------------------------------------------- Ran 1854 tests in 13.498s FAILED (failures=1) make: *** [Makefile:74: test_inplace3] Error 1 because file /tmp/tmp.jezek8chac7f7 was created instead. The string is treated as URI-encoded, gets decoded by libxslt, and different filename than the one that was passed to write_output is used. This can also cause the file to be written to different directory, if filename /the/path/%2e/file is used -- the file will end up written as /the/path/file instead of being in the /the/path/%2e/ subdirectory. With diff --git a/src/lxml/tests/test_xslt.py b/src/lxml/tests/test_xslt.py index 96eb83ee..bb6762f8 100644 --- a/src/lxml/tests/test_xslt.py +++ b/src/lxml/tests/test_xslt.py @@ -170,12 +170,12 @@ class ETreeXSLTTestCase(HelperTestCase): else: self.assertTrue(False, "exception not raised") - def test_xslt_write_output_file(self): + def test_xslt_write_output_file_escape(self): with self._xslt_setup() as res: - f = NamedTemporaryFile(delete=False) + f = NamedTemporaryFile(prefix="krtek/%2e/jezek", delete=False) try: try: - res[0].write_output(f) + res[0].write_output(f.name) finally: f.close() with io.open(f.name, encoding='UTF-16') as f: and precreating the /tmp/krtek/%2e/ directory before running make test3, the test fails as well, and /tmp/krtek/jezekute41cmk is left behind because the os.unlink(f.name) at the end of the test removed /tmp/krtek/%2e/jezekute41cmk but the actual output file was different. One way to address the issue would be to properly escape the filename before passing it to xsltSaveResultToFilename. Another possibility would be to use xsltSaveResultToFile but that would mean that compression would need to be done by lxml. If any of those don't seem viable, at least the documentation of write_output should state that the serialization happens to "URI or file-like object", not to "file or file-like object", with some discussion about the escaping behaviour. -- Jan Pazdziora Senior Principal Software Engineer, Security Engineering, Red Hat

Hi! Jan Pazdziora schrieb am 26.03.19 um 11:18:
Interesting. :) Thanks for reporting this, and for the tests. Looking through that code, I noticed a couple of things: 1) The write_output() method was leaking the output buffer when passing a file(-like) object instead of a filename. Just fixed in lxml 4.3.3. 2) The URL unescaping does not happen in xsltSaveResultToFilename() but in xmlOutputBufferCreateFilename(). And only for strings that look like file names, not for URLs. And since someone else apparently had the impression that this might not seem like an entirely smart idea, there's a comment "try to limit the damages of the URI unescaping code" in libxml2, right before calling the unescaping function. 8o| What a lovely reminder that "programming with libxml2 is like the thrilling embrace of an exotic stranger." lxml uses xmlOutputBufferCreateFilename() in all places where user code wants to write to a file path, so this does not only affect XSLT but lxml's serialisation API in general. That's fairly annoying. Since this allows overwriting files in unexpected directories, this is close to security relevant – although any code that writes to arbitrary user provided file locations probably either already takes a high risk, or actually writes to the place that the user wanted. So … I guess the average computer these days has a couple of security holes that are a lot easier to exploit than this. Anyway, here's a fix: https://github.com/lxml/lxml/commit/0245aba002f069a0b157282707bdf77418d1b5be There are currently test failures on Windows (what a surprise!). I'll look into them soonishly. I'm currently planning to release it as part of lxml 4.4 rather than a bug-fix release, because it is a behavioural change with a (slight) potential of breaking code unnoticed. I'll reconsider this if others convince me that the security aspect is more relevant than I see now. Stefan

On Tue, Mar 26, 2019 at 06:37:29PM +0100, Stefan Behnel wrote:
Anyway, here's a fix:
https://github.com/lxml/lxml/commit/0245aba002f069a0b157282707bdf77418d1b5be
Great, thanks for looking into this and for the superfast response. -- Jan Pazdziora Senior Principal Software Engineer, Security Engineering, Red Hat

Hi! Jan Pazdziora schrieb am 26.03.19 um 11:18:
Interesting. :) Thanks for reporting this, and for the tests. Looking through that code, I noticed a couple of things: 1) The write_output() method was leaking the output buffer when passing a file(-like) object instead of a filename. Just fixed in lxml 4.3.3. 2) The URL unescaping does not happen in xsltSaveResultToFilename() but in xmlOutputBufferCreateFilename(). And only for strings that look like file names, not for URLs. And since someone else apparently had the impression that this might not seem like an entirely smart idea, there's a comment "try to limit the damages of the URI unescaping code" in libxml2, right before calling the unescaping function. 8o| What a lovely reminder that "programming with libxml2 is like the thrilling embrace of an exotic stranger." lxml uses xmlOutputBufferCreateFilename() in all places where user code wants to write to a file path, so this does not only affect XSLT but lxml's serialisation API in general. That's fairly annoying. Since this allows overwriting files in unexpected directories, this is close to security relevant – although any code that writes to arbitrary user provided file locations probably either already takes a high risk, or actually writes to the place that the user wanted. So … I guess the average computer these days has a couple of security holes that are a lot easier to exploit than this. Anyway, here's a fix: https://github.com/lxml/lxml/commit/0245aba002f069a0b157282707bdf77418d1b5be There are currently test failures on Windows (what a surprise!). I'll look into them soonishly. I'm currently planning to release it as part of lxml 4.4 rather than a bug-fix release, because it is a behavioural change with a (slight) potential of breaking code unnoticed. I'll reconsider this if others convince me that the security aspect is more relevant than I see now. Stefan

On Tue, Mar 26, 2019 at 06:37:29PM +0100, Stefan Behnel wrote:
Anyway, here's a fix:
https://github.com/lxml/lxml/commit/0245aba002f069a0b157282707bdf77418d1b5be
Great, thanks for looking into this and for the superfast response. -- Jan Pazdziora Senior Principal Software Engineer, Security Engineering, Red Hat
participants (2)
-
Jan Pazdziora
-
Stefan Behnel