tests failing due to encoding errors on non-utf-8 system

Hi,

I just encountered failing lxml tests on a system with non-UTF-8 locale:
(don't get irritated by the overall test case count, I've disabled some tests)

    ...
    1626/1626 (100.0%): txt (xpathxslt) Doctest: xpathxslt.txt
    ======================================================================
    ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/apps/prod/releases/2.0/lib/python2.7/unittest/case.py", line 331, in run
        testMethod()
      File "/var/tmp/hjoukl/BUILD/NEW/64bit/gcc/2014-QX/lxml-3.4.1/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
        dn = tempfile.mkdtemp(prefix=dirnameRU)
      File "/apps/prod/releases/2.0/lib/python2.7/tempfile.py", line 329, in mkdtemp
        _os.mkdir(file, 0700)
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0161' in position 6: ordinal not in range(256)

    ======================================================================
    ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/apps/prod/releases/2.0/lib/python2.7/unittest/case.py", line 331, in run
        testMethod()
      File "/var/tmp/hjoukl/BUILD/NEW/64bit/gcc/2014-QX/lxml-3.4.1/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
        dn = tempfile.mkdtemp(prefix=dirnameRU)
      File "/apps/prod/releases/2.0/lib/python2.7/tempfile.py", line 329, in mkdtemp
        _os.mkdir(file, 0700)
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0161' in position 6: ordinal not in range(256)

    ----------------------------------------------------------------------
    Ran 1626 tests in 56.208s

    FAILED (errors=2)
    Skipping tests in lxml.cssselect - external cssselect package is not installed
    Comparing with ElementTree 1.3.0

    TESTED VERSION: 3.4.1
        Python:           sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
        lxml.etree:       (3, 4, 1, 0)
        libxml used:      (2, 9, 1)
        libxml compiled:  (2, 9, 1)
        libxslt used:     (1, 1, 28)
        libxslt compiled: (1, 1, 28)

    make: *** [test_inplace] Error 1

This is caused by the directory path characters of a temp directory that are not representable in the system's encoding:

    def test_etree_parse_io_error(self):
        # this is a directory name that contains characters beyond latin-1
        dirnameEN = _str('Directory')
        dirnameRU = _str('Каталог')
        filename = _str('nosuchfile.xml')
        dn = tempfile.mkdtemp(prefix=dirnameEN)
        try:
            self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
        finally:
            os.rmdir(dn)
        dn = tempfile.mkdtemp(prefix=dirnameRU)
        try:
            self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
        finally:
            os.rmdir(dn)

I'm unsure of what a proper fix for this should look like. Refactor to 2 separate tests and disable the non-ascii dir path test based on sys.getfilesystemencoding()? Or rather do

    dn = tempfile.mkdtemp(prefix=dirnameRU.encode('utf-8'))

?

Holger

Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704 Amtsgericht Stuttgart

Holger Joukl wrote on 03.12.2014 at 11:54:
I just encountered failing lxml tests on a system with non-UTF-8 locale:
(don't get irritated by the overall test case count, I've disabled some tests)
[...]
I'm unsure of what a proper fix for this should look like. Refactor to 2 separate tests and disable the non-ascii dir path test based on sys.getfilesystemencoding()?

If the characters cannot be represented in the filesystem encoding, then the test is invalid, so disabling it under these conditions sounds like the right thing to do.

Stefan
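
For illustration, here is a minimal sketch of the "split into two tests and skip" approach discussed in this thread. It is not the change that went into lxml. The helper `fs_can_encode`, the class name `ParseIOErrorTest` and the use of `open` as a stand-in for `etree.parse` are assumptions made so the example runs without lxml installed, and `unittest.skipUnless` requires Python 2.7 or later.

```python
import os
import sys
import tempfile
import unittest


def fs_can_encode(name, encoding=None):
    """Return True if `name` can be encoded in the filesystem encoding.

    `fs_can_encode` is a hypothetical helper, not part of lxml or the
    standard library.
    """
    encoding = encoding or sys.getfilesystemencoding() or 'ascii'
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Russian directory name beyond latin-1, spelled with escapes so that the
# source file itself stays pure ASCII.
DIRNAME_RU = u'\u041a\u0430\u0442\u0430\u043b\u043e\u0433'


class ParseIOErrorTest(unittest.TestCase):
    def _check_missing_file(self, prefix):
        dn = tempfile.mkdtemp(prefix=prefix)
        try:
            path = os.path.join(dn, u'nosuchfile.xml')
            # Stand-in for self.etree.parse: opening a missing file
            # raises IOError (an alias of OSError on Python 3).
            self.assertRaises(IOError, open, path)
        finally:
            os.rmdir(dn)

    def test_parse_io_error_ascii_dir(self):
        self._check_missing_file(u'Directory')

    @unittest.skipUnless(
        fs_can_encode(DIRNAME_RU),
        "directory name is not representable in the filesystem encoding")
    def test_parse_io_error_non_ascii_dir(self):
        self._check_missing_file(DIRNAME_RU)
```

With this split, a latin-1 system still runs the ASCII case and reports the non-ASCII case as skipped rather than as an error.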
participants (2)
- Holger Joukl
- Stefan Behnel