[Tutor] slashes in paths

Steven D'Aprano steve at pearwood.info
Tue Jul 23 09:42:19 CEST 2013


On 23/07/13 04:33, Marc Tompkins wrote:
> On Mon, Jul 22, 2013 at 11:30 AM, Jim Mooney <cybervigilante at gmail.com> wrote:
>
>> On 22 July 2013 11:26, Marc Tompkins <marc.tompkins at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> If you haven't already read it, may I suggest Joel's intro to Unicode?
>>> http://www.joelonsoftware.com/articles/Unicode.html
>>>
>>
>> I had a bad feeling I'd end up learning Unicode ;')
>>
>
> It's not as painful as you might think!  Try it - you'll like it!
> Actually, once you start getting used to working in Unicode by default,
> having to deal with programs that are non-Unicode-aware feels extremely
> irritating.


What he said!

Unicode brings order out of chaos. The old code page technology is horrible and needs to die. It was barely acceptable back in the ancient days when files were hardly ever transferred from machine to machine, and when they were, it was mostly between machines using the same language. Even so, it didn't work very well -- ask the Russians, who had three mutually incompatible code pages.
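To make the Russian example concrete, here is a small Python 3 sketch. It assumes the three code pages meant are KOI8-R, Windows-1251 and DOS code page 866 (the standard-library codec names are koi8_r, cp1251 and cp866). The same word produces three different byte sequences, and reading one code page's bytes as another gives garbage without any error.

```python
# Illustration only: one Russian word in three Cyrillic code pages.
# Assumption: the three code pages are KOI8-R, Windows-1251 and CP866.
word = "Привет"  # "Hello"

encoded = {enc: word.encode(enc) for enc in ("koi8_r", "cp1251", "cp866")}
for enc, data in encoded.items():
    print(enc, data)  # same characters, different bytes each time

# Bytes written as KOI8-R but read back as Windows-1251 decode without
# raising an error, yet the result is not the original word.
garbled = encoded["koi8_r"].decode("cp1251")
print(garbled)

# Only the matching code page recovers the text.
assert encoded["koi8_r"].decode("koi8_r") == word
```

The silent garbling is the nasty part: nothing tells you the text is wrong.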


The basics of Unicode are very simple:

- text strings contain characters;

- what is written to disk contains bytes;

- you need to convert characters to and from bytes, regardless of whether you are using ASCII or Unicode or something else;

- the conversion uses a mapping of character to byte(s), and vice versa, called an encoding;

- ASCII is an encoding too, e.g. byte 80 <=> "P";

- use the encode method to go from text to bytes, and decode to go the other way;

- if you don't know what encoding is used, you cannot tell what the bytes actually mean;

- although sometimes you can guess, with a variable level of success.
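Here is a minimal Python 3 sketch of those rules in action. The example string and the `guess_decode` helper are my own hypothetical illustration, not a standard function; the candidate list it tries is an arbitrary assumption, and a decode that succeeds only means the bytes were *valid* in that encoding, not that the guess was right.

```python
# Text strings contain characters; encode() turns them into bytes.
text = "Père Noël"
data_utf8 = text.encode("utf-8")
data_latin1 = text.encode("latin-1")

# Same characters, different bytes, depending on the encoding.
print(data_utf8)    # b'P\xc3\xa8re No\xc3\xabl'
print(data_latin1)  # b'P\xe8re No\xebl'

# ASCII is an encoding too: byte 80 <=> "P".
assert "P".encode("ascii") == bytes([80])
assert bytes([80]).decode("ascii") == "P"

# decode() with the right encoding gets the original text back.
assert data_utf8.decode("utf-8") == text

# With the wrong encoding you get mojibake...
print(data_utf8.decode("latin-1"))  # PÃ¨re NoÃ«l

# ...or an outright error.
try:
    data_latin1.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)

def guess_decode(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: try each candidate encoding in turn and
    return (encoding, text) for the first one that decodes cleanly."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")

print(guess_decode(data_utf8))    # ('utf-8', 'Père Noël')
print(guess_decode(data_latin1))  # ('cp1252', 'Père Noël'), a lucky guess
```

Note that latin-1 maps every possible byte to some character, so it never fails to decode; that is exactly why "it decoded without an error" proves so little.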



Remember those rules, and you are three quarters of the way to being an expert.



-- 
Steven

