utf8 and ftplib

Richard Lewis richardlewis at fastmail.co.uk
Thu Jun 16 17:31:44 CEST 2005

Hi there,

I'm having a problem with Unicode files and ftplib (using Python 2.3.5).

I've got this code:

xml_source = codecs.open("foo.xml", 'w+b', "utf8")
#xml_source = file("foo.xml", 'w+b')

ftp.retrbinary("RETR foo.xml", xml_source.write)
#ftp.retrlines("RETR foo.xml", xml_source.write)

It opens a new local file using utf8 encoding and then reads a file from
an FTP server (also utf8 encoded) into that local file. However, it
fails with an error, apparently when the xml_source.write callback is
called:

"File "myscript.py", line 75, in get_content
  ftp.retrbinary("RETR foo.xml", xml_source.write)
File "/usr/lib/python2.3/ftplib.py", line 384, in retrbinary
File "/usr/lib/python2.3/codecs.py", line 400, in write
  return self.writer.write(data)
File "/usr/lib/python2.3/codecs.py", line 178, in write
  data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 76:
ordinal not in range(128)"

I've tried both of the commented-out lines in the example above (i.e.
using file() instead of codecs.open(), and retrlines() instead of
retrbinary()). retrlines() makes no difference. If I use file() instead
of codecs.open(), I can open the resulting file, but the extended
characters from the source file (e.g. foreign characters, the copyright
symbol, etc.) all appear with an extra character in front of them
(because such characters are two bytes wide in utf8?).
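To illustrate the "extra character" symptom, here is a minimal sketch. It is written for a current Python 3 interpreter rather than the 2.3.5 mentioned above, and it assumes (this is only a guess) that whatever is displaying the file interprets the bytes as Latin-1. The copyright sign U+00A9 is encoded in UTF-8 as the two bytes 0xc2 0xa9, which matches the 0xc2 byte named in the traceback.

```python
# One character, U+00A9 (the copyright sign).
copyright_sign = "\u00a9"

# In UTF-8 it occupies two bytes: 0xc2 followed by 0xa9.
utf8_bytes = copyright_sign.encode("utf-8")
print(list(utf8_bytes))  # [194, 169]

# Assumption: the viewer reads the bytes as Latin-1, one character per byte.
# The 0xc2 byte then shows up as an extra character (U+00C2) in front.
misread = utf8_bytes.decode("latin-1")
print(len(misread))  # 2

# Decoding the same bytes as UTF-8 gives the single original character back,
# so the bytes on disk are intact and only the interpretation differs.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == copyright_sign)  # True
```

If this is what is happening, the file written through file() would already be correct UTF-8, and the stray characters would come from the program used to view it.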

Is the xml_source.write callback causing the problem here? Or is it
something else? Is there any way that I can correctly retrieve a utf8
encoded file from an FTP server?
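One possible approach, sketched here under assumptions and not tested against a real server: treat the transfer as raw bytes, store them unchanged, and decode only once the whole file has arrived. The sketch targets a current Python 3 interpreter, not 2.3.5. The helper name fetch_utf8 and its parameters are hypothetical; the only ftplib call used is retrbinary(cmd, callback). Decoding after the transfer avoids trouble when a two-byte character happens to be split across two blocks.

```python
import io


def fetch_utf8(ftp, remote_name, local_path):
    """Download remote_name as raw bytes, save them unchanged to local_path,
    and return the content decoded as UTF-8.

    ftp is assumed to be a connected, logged-in ftplib.FTP instance
    (hypothetical helper, not part of ftplib itself).
    """
    buffer = io.BytesIO()

    # retrbinary hands each received block to the callback as bytes;
    # no encoding or decoding happens at this stage.
    ftp.retrbinary("RETR " + remote_name, buffer.write)
    raw = buffer.getvalue()

    # Write the bytes exactly as received, in binary mode.
    with open(local_path, "wb") as local_file:
        local_file.write(raw)

    # Decode only after all blocks are joined, so a multi-byte character
    # that was split between two blocks is still decoded correctly.
    return raw.decode("utf-8")
```

With a connected ftplib.FTP object named ftp, a call such as fetch_utf8(ftp, "foo.xml", "foo.xml") would leave a byte-for-byte copy on disk and return the text for further processing.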


