[Tutor] Standardizing on Unicode and utf8

Thorsten Kampe thorsten at thorstenkampe.de
Wed Feb 25 18:44:22 CET 2009


* Dinesh B Vadhia (Fri, 20 Feb 2009 02:52:27 -0800)
> We want to standardize on unicode and utf8

Very good idea.

> and would like to clarify and verify their use to minimize
> encode()/decode()'ing:
> 
> 1.  Python source files 
> Use the header: # -*- coding: utf8 -*-

Good idea (although it only applies to comments and "inline" strings,
i.e. the literals in the source file itself).
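
To illustrate what the header does and does not do, here is a minimal
sketch. It compiles a tiny module from raw bytes, so the interpreter has
to rely on the coding line to decode the literal. The Latin-1 encoding
and the names src_latin1 and namespace are assumptions made up for the
demonstration; they are not from the original question.

```python
# Source code as raw bytes, saved in Latin-1 rather than UTF-8.
# The coding line is what tells the interpreter how to decode it.
src_latin1 = u'# -*- coding: latin-1 -*-\ns = u"\u00fc"\n'.encode("latin-1")

namespace = {}
exec(compile(src_latin1, "<demo>", "exec"), namespace)

# The literal was decoded according to the declared source encoding.
decoded_literal = namespace["s"]
```

The header only governs how the source file itself is decoded. Data read
from other files at runtime is not affected by it.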


> 2.  Reading files
> In most cases, we don't know the source encoding of the files being
> read. Do we have to decode('utf8') after reading from file?

No. If you don't know the encoding of a file, you can't decode it. You
can still read() it as raw bytes, but you can't process it as text.
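
A minimal sketch of that distinction, assuming io.open (available since
Python 2.6 and also in Python 3). The helper read_file and the temporary
demo file are hypothetical names, used here only for illustration.

```python
import io
import os
import tempfile

def read_file(path, encoding=None):
    # Hypothetical helper: always read bytes, decode only if the
    # encoding is actually known.
    with io.open(path, "rb") as f:
        raw = f.read()
    return raw if encoding is None else raw.decode(encoding)

# Create a small UTF-8 encoded demo file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with io.open(demo_path, "wb") as f:
    f.write(u"Gr\u00fc\u00dfe".encode("utf-8"))

as_bytes = read_file(demo_path)           # encoding unknown: bytes only
as_text = read_file(demo_path, "utf-8")   # encoding known: unicode text
os.remove(demo_path)
```

Without the encoding you get seven bytes back. With it you get a
five-character unicode string.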
 
> 3. Writing files
> We will always write to files in utf8. Do we have to encode('utf8')
> before writing to file?

Yes, sure.
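
A minimal sketch of two ways to do that, again assuming io.open. The
sample text and the temporary file are made up for the demonstration.

```python
import io
import os
import tempfile

text = u"na\u00efve caf\u00e9"

fd, out_path = tempfile.mkstemp()
os.close(fd)

# Variant 1: encode explicitly and write the bytes in binary mode.
with io.open(out_path, "wb") as f:
    f.write(text.encode("utf-8"))
with io.open(out_path, "rb") as f:
    explicit_bytes = f.read()

# Variant 2: open the file with an encoding and let it encode on write.
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(text)
with io.open(out_path, "rb") as f:
    implicit_bytes = f.read()

os.remove(out_path)
```

Both variants should put the same bytes on disk. The second keeps the
encode() calls out of the rest of the code, which is what the question
is after.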

> Is there anything else that we have to consider?

Hm, in general nothing I'm aware of.

Thorsten


