Changing filenames from Greeklish => Greek (subprocess complain)
Νικόλαος Κούρας
nikos.gr33k at gmail.com
Sun Jun 9 05:08:48 EDT 2013
On Sunday, 9 June 2013 11:55:43 a.m. UTC+3, user Lele Gaifax wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
>
> > On Sat, 08 Jun 2013 22:09:57 -0700, nagia.retsina wrote:
> >
> >> chr('A') would give me the mapping of this char, the number 65 while
> >> ord(65) would output the char 'A' likewise.
> >
> > Correct. Python uses Unicode, where code-point 65 ("ordinal value 65")
> > means letter "A".
>
> Actually, that's the other way around:
>
> >>> chr(65)
> 'A'
> >>> ord('A')
> 65
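The same inverse relationship holds for the Greek text in this thread: `ord()` turns a one-character string into its Unicode code point, and `chr()` turns the code point back into the character. A minimal sketch, using the sample word from the thread and assuming it is written with the standard monotonic Greek letters:

```python
word = "νίκος"

# ord(): one-character str -> integer code point.
code_points = [ord(ch) for ch in word]
print(code_points)

# chr(): integer code point -> one-character str, so chr() undoes ord().
rebuilt = "".join(chr(cp) for cp in code_points)
print(rebuilt == word)            # True

# Greek letters sit in the U+0370..U+03FF block; small nu is U+03BD.
print(hex(ord("ν")), chr(0x3BD))  # 0x3bd ν
```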
>
> >> What would happen if we try to re-encode bytes on the disk? Like
> >> trying:
> >>
> >> s = "νίκος"
> >> utf8_bytes = s.encode('utf-8')
> >> greek_bytes = utf8_bytes.encode('iso-8859-7')
> >>
> >> Can we re-encode twice, or as many times as we want, and then decode
> >> back respectively?
> >
> > Of course. Bytes have no memory of where they came from, or what they are
> > used for. All you are doing is flipping bits on a memory chip, or on a
> > hard drive. So long as *you* remember which encoding is the right one,
> > there is no problem. If you forget, and start using the wrong one, you
> > will get garbage characters, mojibake, or errors.
>
> Uhm, no: "encode" transforms a Unicode string into an array of bytes,
> "decode" does the opposite transformation. You cannot do the former on
> an "arbitrary" array of bytes:
>
> >>> s = "νίκος"
> >>> utf8_bytes = s.encode('utf-8')
> >>> greek_bytes = utf8_bytes.encode('iso-8859-7')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'bytes' object has no attribute 'encode'
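Put as code, `encode()` goes from `str` to `bytes` and `decode()` goes back, which is why a `bytes` object offers only `decode()`. The remark about mojibake still applies on the decoding side: a wrong codec does not necessarily raise. A minimal sketch; latin-1 is used as the "wrong" codec only because it accepts every possible byte, so the mistake goes unnoticed:

```python
s = "νίκος"

# encode: str -> bytes; decode: bytes -> str.
utf8_bytes = s.encode("utf-8")
print(type(utf8_bytes).__name__)        # bytes
print(hasattr(utf8_bytes, "encode"))    # False: bytes only offer .decode()

# Decoding with the codec the bytes were written in restores the text.
print(utf8_bytes.decode("utf-8") == s)  # True

# Decoding with the wrong codec need not raise: latin-1 maps every byte
# to some character, so this "works" but yields mojibake, not Greek.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake == s)                    # False
print(len(mojibake), len(s))            # 10 5
```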
So something encoded into bytes cannot be re-encoded into some other bytes.
What about a string, I wonder?

s = "νίκος"
what_are_these_bytes = s.encode('iso-8859-7').encode('utf-8')
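Chaining on the string does not get around it: the first `.encode()` already returns `bytes`, so the second call raises the same `AttributeError`. To move from one byte encoding to another, the bytes have to be decoded back to `str` first. A minimal sketch; `transcode` is a hypothetical helper name used only for illustration, not part of Python:

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode bytes from codec `src` to codec `dst`.

    bytes have no .encode(), so the route is bytes -> str (decode with
    the codec the data was written in) -> bytes (encode with the target).
    """
    return data.decode(src).encode(dst)


s = "νίκος"
greek_bytes = s.encode("iso-8859-7")       # str -> bytes, 1 byte per letter

# The chained form s.encode(...).encode(...) fails on the second call,
# because the first call already returned bytes.
try:
    greek_bytes.encode("utf-8")
except AttributeError as exc:
    print("AttributeError:", exc)

utf8_bytes = transcode(greek_bytes, "iso-8859-7", "utf-8")
print(utf8_bytes == s.encode("utf-8"))     # True
print(len(greek_bytes), len(utf8_bytes))   # 5 10
```

The round trip also works in the other direction, as long as every character exists in the target codec; ISO-8859-7 covers monotonic Greek but not, for example, polytonic accents.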