[Tutor] name shortening in a csv module output

Alan Gauld alan.gauld at btinternet.com
Fri Apr 24 10:54:06 CEST 2015


On 24/04/15 03:46, Steven D'Aprano wrote:

>> Early text encodings all worked in a single byte
>> which is limited to 256 patterns.
>
> Oh it's much more complicated than that!

Note I said *in* a single byte, i.e. they were all 8 bits or less.

> *seven bits*, not even a full byte. It was seven bits so that there was
> an extra bit available for error correction when sending over telex or
> some such thing.
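The "extra bit" in the quote above is usually described as a parity bit. A single parity bit can only detect a one-bit error; it cannot correct it. Here is a minimal Python sketch, assuming even parity stored in the high bit of each byte. Real links used even, odd, or no parity, and the helper names `add_even_parity` and `parity_ok` are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Pack a 7-bit ASCII code into 8 bits, using the top bit as even parity."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Set bit 7 only when needed to make the total number of 1 bits even.
    return code | ((ones % 2) << 7)


def parity_ok(byte: int) -> bool:
    """An even-parity byte must contain an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


byte = add_even_parity(ord("C"))    # 'C' is 0x43: three 1 bits, so bit 7 is set
print(hex(byte))                    # 0xc3
print(parity_ok(byte))              # True
print(parity_ok(byte ^ 0b0000001))  # False: a single flipped bit is detected
```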

Telex actually uses (it's still in active use) its own alphabet
(ITA-2), which was a 5-bit pattern based on the old Murray code.
This meant there were 32 patterns, not enough for letters and
numbers or other symbols so there were two sets of meanings to
each pattern and a shift pattern to switch between them (which is
why we have SHIFT keys on modern keyboards). Even so, lower-case
letters were considered a luxury and didn't exist!
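To make the "two sets of meanings per pattern" idea concrete, here is a minimal Python sketch of a shift-state decoder. The code values and the tiny LETTERS/FIGURES tables are illustrative placeholders, not the real ITA-2 assignments; the point is only that the same 5-bit pattern decodes differently depending on the last shift code received.

```python
# Illustrative placeholder codes, NOT the real ITA-2 table.
LTRS = 0b11111  # "shift to letters"
FIGS = 0b11011  # "shift to figures"

LETTERS = {0b00001: "E", 0b00010: "T", 0b00011: "A"}
FIGURES = {0b00001: "3", 0b00010: "5", 0b00011: "-"}


def decode(codes):
    """Decode a sequence of 5-bit codes, tracking the current shift state."""
    table = LETTERS  # assume the receiver starts in letters mode
    out = []
    for code in codes:
        if not 0 <= code < 2 ** 5:  # only 32 possible patterns
            raise ValueError(f"{code} is not a 5-bit code")
        if code == LTRS:
            table = LETTERS
        elif code == FIGS:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)


print(decode([0b00011, 0b00010, FIGS, 0b00001, 0b00010, LTRS, 0b00001]))  # AT35E
```

Pattern `0b00001` comes out as "E" in letters mode and "3" after a FIGS code, which is exactly why the shift codes were needed.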

It also had separate characters for line-feed and carriage-return.
The keys for these were arranged vertically above each other
about where the Enter key on the number keypad is today. The
operator did a newline by stroking down over both keys sequentially.
That's why the Enter key is long and thin today (although I
don't know why the plus key is that shape!). I learned to type
on a Telex machine! :-)

> (There were other encodings older than ASCII, like EBCDIC, which was
> used on IBM mainframes and is either 7 or 8 bits.)

EBCDIC and ASCII both debuted in 1963. EBCDIC is 8-bit but
has terrible sorting semantics. (I briefly programmed on
OS/390 for the millennium bug panic :-( )
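One plausible illustration of the sorting problem: in EBCDIC, lowercase letters sort before uppercase, digits sort after letters, and the alphabet is not contiguous. This small Python sketch assumes code page 037 (one common EBCDIC variant, available as Python's `cp037` codec) and compares byte-wise sort order under ASCII and EBCDIC.

```python
words = ["apple", "Apple", "1apple"]

# Sort by the raw encoded bytes, as a byte-oriented system would.
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)   # ['1apple', 'Apple', 'apple']
print(ebcdic_order)  # ['apple', 'Apple', '1apple']

# The EBCDIC alphabet has gaps: I and J are not adjacent code points.
gap_ascii = ord("J") - ord("I")
gap_ebcdic = "J".encode("cp037")[0] - "I".encode("cp037")[0]
print(gap_ascii, gap_ebcdic)  # 1 8
```

A gap like that also breaks naive range checks such as `"A" <= c <= "Z"`, which silently admit non-letter codes on EBCDIC systems.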

>> Now the simple thing to do would be just have one enormous character
>> set that covers everything. That's Unicode 32 bit encoding.
>
> No it isn't :-)

OK, I tried to simplify and confused the coding with the character set.
You are of course correct! :-)
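The distinction between the character set and its encodings is easy to see in Python: each character has exactly one code point, but how many bytes it occupies depends on the encoding chosen. A minimal sketch; the function name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character takes in three Unicode encodings."""
    # The "-le" variants are used so Python does not prepend a byte-order mark.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

"A" takes 1, 2 and 4 bytes; the euro sign takes 3, 2 and 4; U+1F600 takes 4 bytes in all three, because UTF-16 needs a surrogate pair for it. UTF-32 is simply the fixed-width encoding of the same character set.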

> Historically, the UCS started at 16 bits. It may have reserved the full
> 32 bits, but it is now guaranteed to use no more than 21 bits: U+10FFFF
> will be the highest code point forever.

Ah, this is new to me. I assumed they could use the full 32 bits.
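The 21-bit figure follows from that ceiling: U+10FFFF needs 21 bits, and the code space is 17 planes of 65,536 code points each. Python's own limits agree, as this small sketch shows (`sys.maxunicode` is 0x10FFFF from Python 3.3 onward).

```python
import sys

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point

print(MAX_CODE_POINT.bit_length())         # 21
print(17 * 0x10000 == MAX_CODE_POINT + 1)  # True: 17 planes of 65,536 code points
print(sys.maxunicode == MAX_CODE_POINT)    # True on Python 3.3 and later

try:
    chr(MAX_CODE_POINT + 1)  # one past the ceiling
except ValueError as exc:
    print("rejected:", exc)
```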

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



