[docs] clarity in unicode howto

Shane Cousins shane.cousins at gmail.com
Wed May 22 06:11:18 CEST 2013


I'm coming from Python 2 and just peeking into the Python 3 docs.
I thought I had a pretty good grasp of how text encoding is handled, but in
reading
http://docs.python.org/3.3/howto/unicode.html#reading-and-writing-unicode-data
I was a little confused.

The howto gives this example:

Reading Unicode from a file is therefore simple:

with open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))


My understanding here is that "unicode.rst" is a UTF-8 encoded file and that
open() decodes its bytes from UTF-8 as the file is read, so that line here
is a *unicode* string (str in Python 3).
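
A minimal sketch of that reading, assuming a throwaway temporary file with made-up sample text (neither is from the howto). It reads the same bytes once in text mode and once in binary mode to show where the decoding happens:

```python
import os
import tempfile

# Hypothetical sample data: one line of text, encoded to UTF-8 bytes by hand.
raw = "caf\u00e9\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

# Text mode with an explicit encoding: the line is decoded to str.
with open(path, encoding="utf-8") as f:
    text_line = f.readline()

# Binary mode: the same line comes back as undecoded bytes.
with open(path, mode="rb") as f:
    byte_line = f.readline()

os.remove(path)

print(type(text_line).__name__, repr(text_line))
print(type(byte_line).__name__, repr(byte_line))
```

Decoding the bytes by hand with byte_line.decode("utf-8") gives the same str that text mode returned.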

If this is the case, there are two minor improvements that would help
dispel confusion for me in this example:

1. Change the filename to something like "utf8.txt".
--> Yes, ".rst" is understood by most Python programmers to be text, but for
new users not yet accustomed to Python, ".txt" is clearer.
2. Add the mode, "r", to open().

So the example would then be:

with open('utf8.txt', mode='r', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
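
For what it's worth, "r" is already the default mode of open(), so spelling it out should only change readability, not behaviour. A small sketch to check that, assuming a temporary file with made-up contents:

```python
import os
import tempfile

# Hypothetical sample file, created only for this comparison.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False,
                                 suffix=".txt") as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

# Mode omitted, as in the howto's example.
with open(path, encoding="utf-8") as f:
    default_mode = f.mode
    default_lines = list(f)

# Mode given explicitly, as in the suggested example.
with open(path, mode="r", encoding="utf-8") as f:
    explicit_mode = f.mode
    explicit_lines = list(f)

os.remove(path)

print(default_mode, explicit_mode)
print(default_lines == explicit_lines)
```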

If I'm mistaken, please ignore this mail.

Best Regards,

Shane


Also, while I understand that "for line in f" will iterate over the file
object returning lines, it may be clearer to new Python programmers if "for
line in f.readlines()" is used.
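
A rough comparison of the two loop forms, again assuming a temporary file with made-up contents. Both produce the same lines. The difference is that readlines() builds the whole list in memory before the loop starts, while iterating over the file object yields lines one at a time:

```python
import os
import tempfile

# Hypothetical sample file, created only for this comparison.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False,
                                 suffix=".txt") as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

# Iterating over the file object yields one decoded line at a time.
with open(path, encoding="utf-8") as f:
    iterated = [line for line in f]

# readlines() reads every line into a list first; the loop then walks that list.
with open(path, encoding="utf-8") as f:
    via_readlines = [line for line in f.readlines()]

os.remove(path)

print(iterated == via_readlines)
```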