[Tutor] close, but no cigar

Mon Jul 22 22:45:26 CEST 2013

On Mon, Jul 22, 2013 at 11:27 AM, Jim Mooney <cybervigilante at gmail.com> wrote:

> Okay, I'm getting there, but this should be translating A umlaut to an old
> DOS box character, according to my ASCII table, but instead it's printing a
> small 'u':
>
> def main():
>     zark = ''
>     for x in "ÀÄÄÄ":
>         print(unichr(ord(u'x')-3), end=' ')
>
> result: u u u u
>
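
Before getting to encodings, note that the quoted loop never uses its loop variable. u'x' is a string literal containing the letter x, so ord(u'x') is 120 on every pass, and 120 - 3 = 117 is 'u'. That alone explains "u u u u". A minimal Python 3 sketch contrasting the two (chr is Python 3's spelling of unichr; the variable names are mine, for illustration). Even with the loop variable, subtracting 3 only lands on other Latin-1 characters, not on box-drawing ones:

```python
import unicodedata

word = "\u00c0\u00c4\u00c4\u00c4"  # the "ÀÄÄÄ" from the quoted loop

# As quoted: u'x' is the literal letter x, so every element is chr(120 - 3).
buggy = [chr(ord(u'x') - 3) for ch in word]
print(buggy)  # ['u', 'u', 'u', 'u']

# With the loop variable, each code point really is shifted down by 3 ...
fixed = [chr(ord(ch) - 3) for ch in word]
# ... but U+00C0 - 3 and U+00C4 - 3 are still Latin-1 characters, not box drawings.
for c in fixed:
    print("U+%04X %s" % (ord(c), unicodedata.name(c)))
```

The second loop reports VULGAR FRACTION ONE HALF (U+00BD) once and LATIN CAPITAL LETTER A WITH ACUTE (U+00C1) three times.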

When you type "Ä" in a Python string (without specifying which encoding
you're trying to represent), it doesn't necessarily have the same ordinal
value as the line-drawing character that gets mistakenly displayed as "Ä"
in your text editor.  Depending on which Python version you happen to be
using at the moment (and therefore depending on the default encoding), "Ä"
might be a Unicode Latin Capital Letter A With Diaeresis (U+00C4), or it
might be character code 0x8E, or it might be 0xC4...
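To see the mismatch concretely, here is a small Python 3 sketch that decodes one and the same byte, 0xC4, with two codecs and prints the resulting code point and Unicode name. The choice of cp1252 (a typical Windows editor encoding) and cp437 (the DOS code page) is an assumption for illustration:

```python
import unicodedata

raw = b"\xc4"  # a single byte, e.g. from TREE output saved to a file

for codec in ("cp1252", "cp437"):
    ch = raw.decode(codec)
    # Show the resulting code point and its official Unicode name.
    print("%-7s U+%04X %s" % (codec, ord(ch), unicodedata.name(ch)))
```

The cp1252 line names LATIN CAPITAL LETTER A WITH DIAERESIS (U+00C4), while the cp437 line names BOX DRAWINGS LIGHT HORIZONTAL (U+2500): same byte, two different characters with different ordinals.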

For a quick visualization of what I'm talking about, just fire up the
Windows Character Map program and find "Ä" in the following fonts: Arial,
Terminal, and Roman.  Hover your mouse cursor over it each time to see the
character code associated with it.
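Python 3 can run a similar comparison from the other direction: encode one character with several codecs and compare the bytes. The three codecs are picked for illustration and don't map one-to-one onto Character Map's fonts:

```python
ch = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS, i.e. "Ä"

for codec in ("latin-1", "cp437", "utf-8"):
    # The same character is stored as different bytes in each encoding.
    print("%-8s %r" % (codec, ch.encode(codec)))
```

This gives b'\xc4' for latin-1, b'\x8e' for cp437, and the two-byte b'\xc3\x84' for utf-8.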

If you insist on parsing the output of TREE (instead of letting Python do
things in a modern, Unicode-aware way), here's how I would do it:

    import io

    inFileName = "/Users/Marc/Desktop/rsp/tree.txt"
    # Decode as cp437 (the DOS code page assumed here for TREE's output)
    with io.open(inFileName, 'r', encoding='cp437') as inFile:
        inString = inFile.read()
    print(inString)

This printed the line-drawing characters just fine.  My test Cyrillic
filename, however, remained a string of question marks: TREE itself had
already trashed that filename, so there was nothing left for the cp437
decoding to recover.
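For contrast, a minimal Python 3 sketch of the Unicode-aware route: build the tree in Python instead of parsing TREE's output. Python 3 handles file names as str, so a Cyrillic name comes through intact. The helper tree_lines and its constants are hypothetical names of my own. It lists only directories (as TREE does without /F), and its prefixes imitate TREE's layout rather than reproduce it exactly:

```python
import os

# Box-drawing pieces for TREE-style output (U+251C, U+2514, U+2502, U+2500).
TEE = "\u251c\u2500\u2500\u2500"    # "├───"
ELBOW = "\u2514\u2500\u2500\u2500"  # "└───"
PIPE = "\u2502   "                  # "│   "
BLANK = "    "


def tree_lines(root, prefix=""):
    """Yield one TREE-style line per subdirectory of root, depth first."""
    try:
        with os.scandir(root) as it:
            names = sorted(
                (e.name for e in it if e.is_dir(follow_symlinks=False)),
                key=str.lower,
            )
    except OSError:
        return  # unreadable directory: skip it rather than crash
    for i, name in enumerate(names):
        last = i == len(names) - 1
        yield prefix + (ELBOW if last else TEE) + name
        # Below the last entry there is no vertical line left to continue.
        child_prefix = prefix + (BLANK if last else PIPE)
        yield from tree_lines(os.path.join(root, name), child_prefix)
```

To use it, print each line yielded by tree_lines for the directory of interest. The console must be able to display box-drawing characters; otherwise print() may raise UnicodeEncodeError.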