[Python-Dev] tarfile and unicode filenames in windows

Facundo Batista facundobatista at gmail.com
Thu Jun 8 21:11:06 CEST 2006


I'm working on Windows 2K SP4. I have a directory containing files with
non-ASCII names (e.g. "camión.txt").

I'm trying to pack it into a bzip2-compressed tar archive:

    import sys
    import tarfile

    nomdir = sys.argv[1]  # directory to archive, given on the command line
    tar = tarfile.open("prueba.tar.bz2", "w:bz2")
    tar.add(nomdir)
    tar.close()

This works fine, even though the "ó" in the filename is not 7-bit
ASCII.

But then I put a file with a stranger name in that directory (one
containing an "o" with a macron above it): Myō-ō.txt

Here, tarfile can't find the file. This is the same limitation as with
listdir(), where I have to pass the directory name as a unicode object
for the system to be able to find it. So:

    nomdir = unicode(sys.argv[1])
    tar = tarfile.open("prueba.tar.bz2", "w:bz2")
    tar.add(nomdir)
    tar.close()

The problem is that when tarfile finds that name, it crashes:

Traceback (most recent call last):
  File "comprim.py", line 8, in ?
    tar.add(nomdir)
  File "C:\python24\lib\tarfile.py", line 1239, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f))
  File "C:\python24\lib\tarfile.py", line 1232, in add
    self.addfile(tarinfo, f)
  File "C:\python24\lib\tarfile.py", line 1297, in addfile
    self.fileobj.write(tarinfo.tobuf())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in
position 8: ordinal not in range(128)

This is because tarinfo.tobuf() returns a unicode object (because the
unicode filename is part of it), and file.write() requires a standard
byte string, so the buffer gets encoded implicitly as ASCII.
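
To make that failure mode concrete, here is a minimal sketch of the
conversion involved. It does not touch tarfile at all; it only shows that
encoding a name containing u'\xf3' as ASCII fails, while an explicit
encode with a wider codec succeeds. The helper try_encode and the choice
of UTF-8 are assumptions made for illustration, not something tarfile
does.

```python
import io

# The "ó" from "camión.txt"; u'\xf3' is the character named in the traceback.
name = u"cami\xf3n.txt"


def try_encode(text, encoding):
    """Hypothetical helper: return text as bytes, or None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# An implicit conversion to a byte string amounts to an ASCII encode,
# which cannot represent u'\xf3'.
ascii_bytes = try_encode(name, "ascii")

# Encoding explicitly with a codec that covers the character succeeds.
# UTF-8 is an arbitrary choice for this sketch.
utf8_bytes = try_encode(name, "utf-8")

# A binary file object accepts the explicitly encoded bytes.
out = io.BytesIO()
out.write(utf8_bytes)
```

In other words, the write only goes through if the name has already been
turned into bytes with some explicit encoding before it reaches the file
object.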

Is this a known problem? Should I file a bug? I couldn't find one
about this, and Google didn't help here.

Thank you very much!

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

