Changing filenames from Greeklish => Greek (subprocess complain)

Sun Jun 9 04:55:43 EDT 2013

Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:

> On Sat, 08 Jun 2013 22:09:57 -0700, nagia.retsina wrote:
>
>> chr('A') would give me the mapping of this char, the number 65 while
>> ord(65) would output the char 'A' likewise.
>
> Correct. Python uses Unicode, where code-point 65 ("ordinal value 65") 
> means letter "A".

Actually, that's the other way around:

    >>> chr(65)
    'A'
    >>> ord('A')
    65
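
The two functions are inverses for every code point, not only ASCII ones. Here is a quick check on the Greek word from the question. The word is spelled with escapes so the exact code points are unambiguous, assuming the precomposed "ί" (U+03AF) was meant:

```python
# chr(): integer code point -> one-character str
# ord(): one-character str -> integer code point
assert chr(65) == "A"
assert ord("A") == 65

word = "\u03bd\u03af\u03ba\u03bf\u03c2"  # "νίκος"
codes = [ord(c) for c in word]
print(codes)  # [957, 943, 954, 959, 962]

# Mapping the code points back with chr() rebuilds the original string.
assert "".join(chr(n) for n in codes) == word
```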

>> What would happen if we try to re-encode bytes on the disk? Like
>> trying:
>> 
>> s = "νίκος"
>> utf8_bytes = s.encode('utf-8')
>> greek_bytes = utf8_bytes.encode('iso-8859-7')
>> 
>> Can we re-encode twice, or as many times as we want, and then decode
>> back respectively, like?
>
> Of course. Bytes have no memory of where they came from, or what they are 
> used for. All you are doing is flipping bits on a memory chip, or on a 
> hard drive. So long as *you* remember which encoding is the right one, 
> there is no problem. If you forget, and start using the wrong one, you 
> will get garbage characters, mojibake, or errors.

Uhm, no: "encode" transforms a Unicode string into an array of bytes,
"decode" does the opposite transformation. You cannot do the former on
an "arbitrary" array of bytes:

    >>> s = "νίκος"
    >>> utf8_bytes = s.encode('utf-8')
    >>> greek_bytes = utf8_bytes.encode('iso-8859-7')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'bytes' object has no attribute 'encode'
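
To move bytes from one encoding to another, first decode them back to a str using the encoding they were written in, then encode that str with the new one. The sketch below reuses the variable names from the example above and the ISO-8859-7 (Greek) codec. The final line illustrates the mojibake that results from decoding with the wrong codec (Latin-1 is chosen here only as an example):

```python
s = "\u03bd\u03af\u03ba\u03bf\u03c2"  # "νίκος"

# str -> bytes: encode
utf8_bytes = s.encode("utf-8")
print(len(utf8_bytes))  # 10: each of these Greek letters takes 2 bytes in UTF-8

# bytes have no .encode(); to change the byte encoding, decode back to str
# with the encoding the bytes were written in, then encode with the new one.
greek_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-7")
print(greek_bytes)  # b'\xed\xdf\xea\xef\xf2': one byte per letter

# Decoding with the matching codec restores the original text...
assert greek_bytes.decode("iso-8859-7") == s

# ...while the wrong codec silently produces mojibake.
print(greek_bytes.decode("latin-1"))  # íßêïò
```

The decode step is what "remembering which encoding is the right one" amounts to in code. The bytes themselves carry no record of it.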

ciao, lele.
-- 
nickname: Lele Gaifax | When I live off what I thought yesterday
real: Emanuele Gaifas | I will begin to fear those who copy me.
lele at metapensiero.it  |                 -- Fortunato Depero, 1929.