[Python-Dev] Should ftplib use UTF-8 instead of latin-1 encoding?

Roumen Petrov bugtrack at roumenpetrov.info
Fri Jan 23 23:21:59 CET 2009


rdmurray at bitdance.com wrote:
> On Fri, 23 Jan 2009 at 21:23, "Martin v. Löwis" wrote:
>>> Given that a Unix OS can't know what encoding a filename is in (*),
>>> I can't see that one could practically implement a Unix FTP server
>>> in any other way.
>>
>> However, an ftp server is different. It might start up with an empty
>> folder, and receive *all* of its files through upload. Then it can
>> certainly know what encoding the file names have on disk. It *could*
>> also support operation on pre-existing files, e.g. by providing a
>> configuration directive telling the encoding of the file names, or
>> by ignoring all file names that are not encoded in UTF-8.
> 
> I don't see how starting with an empty directory helps.  The filename
> comes from the client, and the FTP server can't know what the actual
> encoding of that filename is.

Exactly, only the client can do filename conversion. Maybe ftplib could be
extended to know the encoding of filenames on the local and remote systems,
based on some user settings. Maybe ftplib could use UTF-8 or UCS-2/4 to
store filenames internally, but direct conversion may be faster. In the
latter case the filename is just a byte sequence.
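
To illustrate the direct conversion idea, here is a minimal sketch. It
assumes the user has told us both encodings; the helper name
transcode_filename and its parameters are hypothetical and not part of
ftplib.

```python
def transcode_filename(raw: bytes, local_encoding: str, remote_encoding: str) -> bytes:
    """Re-encode a filename, given as bytes, from one encoding to another.

    Hypothetical helper, not an ftplib API. Both encodings are assumed to
    come from user settings, since neither side can detect them reliably.
    """
    # Decoding raises UnicodeDecodeError if the bytes are not valid in
    # local_encoding; encoding raises UnicodeEncodeError if a character
    # cannot be represented in remote_encoding.
    return raw.decode(local_encoding).encode(remote_encoding)


# The filename as stored on a latin-1 local system.
local_name = "L\u00f6wis.txt".encode("latin-1")

# The same filename as bytes for a server that expects UTF-8.
remote_name = transcode_filename(local_name, "latin-1", "utf-8")
```

The filename stays a byte sequence on both ends; the Unicode string exists
only for the duration of the conversion.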

Roumen

