Changing filenames from Greeklish => Greek (subprocess complain)
MRAB
python at mrabarnett.plus.com
Fri Jun 7 10:29:25 EDT 2013
On 07/06/2013 12:53, Νικόλαος Κούρας wrote:
[snip]
>
> #========================================================
> # Collect filenames of the path dir as bytes
> greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )
>
> for filename in greek_filenames:
> # Compute 'path/to/filename' in bytes
> greek_path = b'/home/nikos/public_html/data/apps/' + filename
> try:
Trying ISO-8859-7 first is a worse way of doing it. The ISO-8859-7
encoding has 1 byte per codepoint, meaning that it's more 'tolerant' (if
that's the word) of errors: a sequence of bytes that is actually UTF-8
can be decoded as ISO-8859-7 without raising an exception, giving
gibberish.
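
To make the point concrete, here is a small sketch (Python 3). The
filename is made up for illustration and is not taken from the directory
in question. It shows that, for this name, the wrong decode goes through
silently in one direction but is caught in the other.

```python
# Hypothetical filename, chosen only for illustration.
name = "αρχείο.txt"

utf8_bytes = name.encode("utf-8")
greek_bytes = name.encode("iso-8859-7")

# UTF-8 bytes decoded as ISO-8859-7: no exception for this name,
# but the result is not the original text (one character per byte).
mojibake = utf8_bytes.decode("iso-8859-7")
print(mojibake == name)

# ISO-8859-7 bytes decoded as UTF-8: for this name the bytes are not
# valid UTF-8, so the mistake is detected.
try:
    greek_bytes.decode("utf-8")
    detected = False
except UnicodeDecodeError:
    detected = True
print(detected)
```

So a failed UTF-8 decode is a useful signal, whereas a successful
ISO-8859-7 decode tells you very little.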
UTF-8 is less tolerant, and it's the encoding that ideally you should
be using everywhere, so it's better to assume UTF-8 and, if it fails,
try ISO-8859-7 and then rename so that any names that were ISO-8859-7
will be converted to UTF-8.
That's the reason I did it that way in the code I posted, but, yet
again, you've changed it without understanding why!
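
For reference, a minimal sketch of that order of attempts (UTF-8 first,
ISO-8859-7 only as a fallback). This is not the code from the earlier
post; the helper names to_utf8_name and rename_to_utf8 are invented here,
and it assumes every name in the directory is in one of those two
encodings.

```python
import os


def to_utf8_name(raw: bytes) -> bytes:
    """Return a filename as UTF-8 bytes (hypothetical helper).

    Assume UTF-8 first; fall back to ISO-8859-7 only if that fails.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to convert
    except UnicodeDecodeError:
        # Not UTF-8, so assume ISO-8859-7 and re-encode as UTF-8.
        # If the name is in neither encoding, this decode may raise too.
        return raw.decode("iso-8859-7").encode("utf-8")


def rename_to_utf8(directory: bytes) -> None:
    """Rename every entry in directory whose name is not yet UTF-8."""
    # Passing bytes to os.listdir makes it return the names as bytes.
    for filename in os.listdir(directory):
        new_name = to_utf8_name(filename)
        if new_name != filename:
            os.rename(os.path.join(directory, filename),
                      os.path.join(directory, new_name))
```

It would be called as rename_to_utf8(b'/home/nikos/public_html/data/apps/').
Names that are already UTF-8 (including plain ASCII ones) are left alone,
so running it a second time should change nothing.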
> filepath = greek_path.decode('iso-8859-7')
>
> # Rename current filename from greek bytes --> utf-8 bytes
> os.rename( greek_path, filepath.encode('utf-8') )
> except UnicodeDecodeError:
> # Since it's not a Greek bytestring, it's a proper UTF-8 bytestring
> filepath = greek_path.decode('utf-8')
>
[snip]