Changing filenames from Greeklish => Greek (subprocess complain)

Cameron Simpson cs at zip.com.au
Sat Jun 8 18:32:58 EDT 2013


On 08Jun2013 14:14, Νίκος Γκρ33κ <nikos.gr33k at gmail.com> wrote:
| On Saturday, 8 June 2013 10:01:57 PM UTC+3, Steven D'Aprano wrote:
| > ASCII actually needs 7 bits to store a character. Since computers are  
| > optimized to work with bytes, not bits, normally ASCII characters are
| > stored in a single byte, with one bit wasted.
| 
| So ASCII and Unicode are 2 Encoding Systems currently in use.
| How should i imagine them, visualize them?
| Like tables 'A' = 65, 'B' = 66 and so on?

Yes, that works.
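
If it helps to see the table, here is a small Python 3 sketch (the
variable name is just mine, for illustration). ord() gives the number
for a character and chr() goes the other way:

```python
# Build a tiny "table" of character -> ordinal, as in 'A' = 65, 'B' = 66.
table = {ch: ord(ch) for ch in "AB"}
print(table)

# chr() is the reverse lookup: ordinal -> character.
print(chr(65), chr(66))

# Greek small alpha also has an ordinal, just a larger one (U+03B1).
alpha_ordinal = ord("\u03b1")
print(alpha_ordinal)
```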

| But if I do, then that would be the visualization of a 'charset', not of an encoding system.
| What is the difference between an encoding system and a charset?

An encoding system is the method of transcribing these values to bytes and back again.

| ebcdic - ascii - unicode = all of them are encoding systems
| greek-iso - latin-iso - utf8 - utf16 = all of them are charsets.

No.

EBCDIC and ASCII and Unicode and Greek-ISO (iso-8859-7) are all character sets.
(1:1 mappings of characters to numbers/ordinals).

An encoding is a way of writing these values to bytes.
Decoding reads bytes and emits character values.
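
As a rough illustration in Python 3 (UTF-8 is just my choice of
encoding here, and the variable names are made up), str.encode() and
bytes.decode() are the two directions:

```python
text = "αβγ"                   # three characters (ordinals 945, 946, 947)
data = text.encode("utf-8")    # encoding: character values -> bytes
back = data.decode("utf-8")    # decoding: bytes -> character values

# The character count and the byte count need not be the same.
print(len(text), len(data))
print(back == text)
```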

Because all of EBCDIC, ASCII and the iso-8859-x character sets fit in the range 0-255,
they are usually transcribed (encoded) directly, one byte per ordinal.
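
A quick sketch of that one-byte-per-ordinal behaviour in Python 3. I am
assuming the stdlib codec names "iso-8859-7" and "cp500" here (cp500 is
one of the EBCDIC variants Python ships):

```python
greek = "αβγ"
data = greek.encode("iso-8859-7")

# Three characters become exactly three bytes, one byte per ordinal.
print(len(greek), len(data))
print(list(data))

# EBCDIC is also one byte per character, just with different numbers.
ebcdic_a = "A".encode("cp500")
ascii_a = "A".encode("ascii")
print(hex(ebcdic_a[0]), hex(ascii_a[0]))
```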

Unicode is much larger. It cannot be transcribed (encoded) as one byte per value.
There are several ways of transcribing Unicode. UTF-8 is a popular and usually compact form,
using one byte for values below 128 and multiple bytes for higher values.
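
To make the variable length concrete, a small Python 3 sketch (the
sample characters are my own picks: an ASCII letter, a Greek letter,
the euro sign and an emoji):

```python
# Code points chosen from different ranges of Unicode.
samples = ["A", "\u03b1", "\u20ac", "\U0001F600"]

# Number of bytes UTF-8 needs for each one.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print("U+%04X needs %d byte(s)" % (ord(ch), n))
```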

| Why does Python interpret all given strings as Unicode by default and
| not ASCII? Because the former supports many positions while ASCII has
| only 127 positions, hence can interpret only 127 different characters?

Yes, though strictly ASCII has 128 positions (0 to 127).

[...]
| > Latin-1 is similar, except there are 256 positions. Greek ISO-8859-7 is 
| > also similar, also 256 positions, but the characters are different. And 
| > so on, with dozens of charsets. 
| 
| Latin has to display English chars (capital, small) + numbers + symbols. That would be 127, why 256?

ASCII runs up to 127. Essentially English, numerals, control codes and various symbols.

The iso-8859-x sets run to 255, and the upper 128 values map to
characters popular in various regions.
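
You can see the shared lower half and the differing upper half by
decoding the same two bytes with two of these charsets. A minimal
Python 3 sketch (byte values picked by me for illustration):

```python
raw = bytes([0x41, 0xE1])   # one byte below 128, one above

as_latin1 = raw.decode("latin-1")      # iso-8859-1
as_greek = raw.decode("iso-8859-7")    # Greek ISO

# ascii() shows the code points without needing a Greek-capable terminal.
print(ascii(as_latin1), ascii(as_greek))

# Plain ASCII has no meaning at all for the byte above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

The 0x41 comes out as 'A' both times; only the 0xE1 changes meaning.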

| greek = all of the above plus greek chars, no?

Yes. So iso-8859-7 includes the Greek characters in its upper 128 values.

| > And then there is Unicode, which includes *every* character in all of
| > those dozens of charsets. It has 1114111 positions (most are currently  
| > unfilled).
| 
| Shouldn't the positions that Unicode has to use equal the sum
| of all available characters of all the languages of the world plus
| numbers and special chars? Why 1.000.000+, why the need for so many
| positions? Narrow Unicode format (2 bytes) can cover all of the
| world's symbols.

2 bytes is not enough: that gives only 65536 values. Chinese alone has more glyphs than that.
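
A small Python 3 sketch of the arithmetic. The sample character
U+20000 is my own pick: it is a CJK ideograph that lives above the
2-byte range:

```python
import sys

two_byte_values = 2 ** 16
print(two_byte_values)        # how many values fit in 2 bytes
print(sys.maxunicode)         # highest code point (Python 3.3 and later)

ch = "\U00020000"             # a CJK ideograph beyond the first 65536
beyond = ord(ch) > 0xFFFF
print(beyond)

# UTF-16 has to spend two 2-byte units (a surrogate pair) on it.
utf16_len = len(ch.encode("utf-16-be"))
print(utf16_len)
```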

| > An encoding is simply a program that takes a character and returns a 
| > byte, or vice versa. For instance, the ASCII encoding will take character
| > 'A'. That is found at position 65, which is 0x41 in hexadecimal, so the
| > ASCII encoding turns character 'A' into byte 0x41, and vice versa.
| 
| Why do you say ASCII turns a character into HEX format and not into binary format?

Steven didn't say that. He said "position 65". People often write
bytes in hex (eg 0x41) because a byte always fits in exactly 2 hex
digits (16 x 16 = 256 values) and because these values often have
binary-based subranges, which hex makes more obvious.

For example, 'A' is 0x41. 'a' is 0x61. So you can look at the hex
code and almost visually know if you're dealing with upper or lower
case, etc.
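
A tiny Python 3 sketch of that (the variable names are mine). In ASCII
the upper and lower case letters differ by a single bit, 0x20, which
is easy to spot in hex:

```python
rows = []
for upper, lower in zip("AZ", "az"):
    # XOR shows which bits differ between the two cases.
    diff = ord(upper) ^ ord(lower)
    rows.append((hex(ord(upper)), hex(ord(lower)), hex(diff)))

for row in rows:
    print(row)
```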

| Isn't the latter the way bytes are stored on the hdd, like 010101111010101 etc?
| Are they stored as hex instead, or did you just say so to avoid printing 0s and 1s?

They're stored as bits at the gate level. But writing hex codes
_in_ _text_ is more compact, and more readable for humans.
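
For instance, in Python 3 the same value can be written either way;
only the notation differs (a small sketch using 'A' again):

```python
value = ord("A")

as_bits = format(value, "08b")   # eight binary digits
as_hex = format(value, "02x")    # two hex digits
print(as_bits, as_hex)

# Both notations name the same number.
same = int(as_bits, 2) == int(as_hex, 16) == 65
print(same)
```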

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

A lot of people don't know the difference between a violin and a viola, so
I'll tell you.  A viola burns longer.   - Victor Borge


