hex dump w/ or w/out utf-8 chars

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jul 9 09:00:02 CEST 2013


On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:

> Not using python 3, for me (a programmer which was present at the
> beginning of computer science, badly interacting with many languages
> from assembler to Fortran and from c to Pascal and so on) it was an hard
> job to arrange the abrupt transition from characters only equal to bytes

Characters have *never* been equal to bytes. Not even Perl treats the 
character 'A' as equal to the byte 0x0A:

if (0x0A eq 'A') {print "Equal\n";}
else {print "Unequal\n";}

will print Unequal, even if you replace "eq" with "==" (in numeric 
context the string 'A' becomes 0, not 10). Nor does Perl consider the 
character 'A' equal to 65, its ASCII code.

If you have learned to think of characters being equal to bytes, you have 
learned wrong.
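
The same point can be sketched in Python 3. This is only an illustration: 
the helper name encoded_lengths and the sample characters are my own 
choice, not anything from the original question. A character is an 
abstract code point, and how many bytes it occupies depends entirely on 
which encoding you pick.

```python
# Hypothetical helper: byte length of one character under several encodings.
def encoded_lengths(ch, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    return {enc: len(ch.encode(enc)) for enc in encodings}

# Even plain 'A' is only "one byte" in some encodings.
for ch in ("A", "ß", "€"):
    print(repr(ch), hex(ord(ch)), encoded_lengths(ch))

# A one-character str never compares equal to a bytes object.
print("A" == b"A")  # False
```

Under UTF-16 or UTF-32 even 'A' takes more than one byte, so "one 
character, one byte" was only ever a property of particular encodings.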


> to some special characters defined with 2, 3 bytes and even more. I
> should have preferred another solution... but i'm not Guido....!

What's a special character?

To an Italian, the characters J, K, W, X and Y are "special characters" 
which do not exist in the ordinary alphabet. To a German, they are not 
special, but S is special because SS is sometimes written as ß, but only 
in lowercase.

To a mathematician, σ is just as ordinary as it would be to a Greek; but 
the mathematician probably won't recognise ς, the word-final form of σ, 
unless she actually is Greek, even though they are the same letter.
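
The German and Greek quirks can be checked from Python 3 with nothing 
more than the built-in str methods. This is a small sketch of my own, 
not part of the original question:

```python
# Uppercasing ß yields two characters, so the round trip is lossy.
print("ß".upper())           # SS
print("ß".upper().lower())   # ss

# Both lowercase sigmas share a single uppercase form.
print("σ".upper() == "ς".upper())  # True
```

Neither result fits a model in which every character has exactly one 
counterpart of the other case.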

To an American electrician, Ω is an ordinary character, but ω isn't.

To anyone working with angles, or temperatures, the degree symbol ° is an 
ordinary character, but the radian symbol is not. (I can't even find it.)

The English have forgotten that W used to be a ligature for VV, and 
consider it a single ordinary character. But the ligature Æ is considered 
an old-fashioned way of writing AE.

But to Danes and Norwegians, Æ is an ordinary letter, as distinct from AE 
as TH is from Þ. (Which English used to have.) And so on... 

I don't know what a special character is, unless it is the ASCII NUL 
character, since that terminates C strings.
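
Even that is special only to C. As a rough sketch (the sample string is 
made up, and ctypes is used here purely to imitate how C reads a char 
pointer), Python treats NUL as just another character:

```python
import ctypes

s = "ab\0cd"
print(len(s))             # 5
print(s.encode("ascii"))  # b'ab\x00cd'

# Read back as a C string, the same bytes stop at the first NUL.
print(ctypes.c_char_p(s.encode("ascii")).value)  # b'ab'
```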



-- 
Steven


