[Tkinter-discuss] OT: Unicode

Bob van der Poel bob at mellowood.ca
Sat Mar 22 17:40:31 CET 2014


You might also have luck with encoding='cp1252'. This is the
"standard" Windows character set for Western European text.

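For what it's worth, here is a minimal sketch of how cp1252 and latin-1 differ on the same bytes. The sample bytes and the read_text helper are made up for illustration; io.open is used because it decodes on read in both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def read_text(path, encoding='cp1252'):
    """Return the file's contents decoded with the given encoding."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample: 0xEA is e-circumflex in both encodings, while
# 0x93 and 0x94 are curly quotes in cp1252 only.
sample_bytes = b'\x93f\xeate\x94'

fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(sample_bytes)

as_cp1252 = read_text(sample_path, 'cp1252')
as_latin1 = read_text(sample_path, 'latin-1')
os.remove(sample_path)

print(repr(as_cp1252))
print(repr(as_latin1))
```

Both encodings agree on the accented letter. They differ only for bytes 0x80 to 0x9F, which cp1252 maps to printable characters and latin-1 maps to control characters.
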
On Sat, Mar 22, 2014 at 12:52 AM, Michael O'Donnell
<michael.odonnell at uam.es> wrote:
> Dear Cam,
>
>    Python 3 is so much better at dealing with unicode than
> Python 2.
>
> But, that said, your file is in an encoding
> that is not latin-1 (which is basically an anglo
> encoding, no good if your text has inflections/accents).
>
> Solution:
>
> 1. Open your text file in a browser
>
> 2. If the file displays ok in the browser,
> see what encoding the browser used
> to decode the file: there is usually an "Encoding"
> option in the menu somewhere, e.g. in Chrome,
> under the View menu.
>
> Assume for this example that it is iso-8859-1
>
> 3. Change your file opening to:
>
>     F = codecs.open('temp.txt', encoding='iso-8859-1')
>
> That should fix it. You can read from the file
> directly as a unicode string.
>
> Mick
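
The browser check can also be roughly approximated in Python by trying a few candidate encodings on the raw bytes. This is only a sketch: guess_encoding and the candidate list are my own invention, and a decode that succeeds is not proof that the guess is right. In particular latin-1 accepts every possible byte, so it has to go last.

```python
def guess_encoding(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (encoding, text) for the first candidate that decodes raw strictly."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings fit')


# The same made-up word saved two different ways.
utf8_bytes = u'f\xeate'.encode('utf-8')
latin1_bytes = u'f\xeate'.encode('latin-1')

print(guess_encoding(utf8_bytes))
print(guess_encoding(latin1_bytes))
```

Trying utf-8 first is deliberate: text with accents in a single-byte encoding is very unlikely to also be valid UTF-8, so a successful utf-8 decode is a fairly strong hint.
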
>
> On 22 March 2014 03:26, Cam Farnell <msa01 at bitflipper.ca> wrote:
>> Technically this is a Python question, not a Tkinter question, but it's in
>> the context of a Tkinter application so I don't feel *too* guilty about
>> posting it here.
>>
>> OK. I've got a Tkinter application (running with Python 2.7.2 on Ubuntu
>> 12.04.4 LTS) that needs to handle French accented characters. And it does
>> handle accented characters just fine. I can type an accented character into
>> an Entry and it shows up correctly. I can display it on a Text. I can
>> cPickle it to disk and read it back. For example, if I enter e-circumflex
>> (in a Tkinter Entry) and then print it using repr I get:
>>
>>     u'\xea'
>>
>> If I look in the cPickled file there are 0xEA's where the e-circumflex
>> characters are. So far so good.
>>
>> The problem comes when I need to read into my Tkinter application a file
>> which has accented characters and which was prepared using a text editor
>> like, for example, gedit. The file to be read also has 0xEA's to represent
>> e-circumflex. However, when I read such a file the resulting string then
>> contains u'\cd\xaa' where the e-circumflexes belong. I don't know who is
>> doing the unwanted conversion or how to make it go away. I've tried reading
>> in binary mode, I've tried opening the file using:
>>
>>     F = codecs.open('temp.txt', encoding='latin-1')
>>
>> I've tried putting:
>>
>>     # -*- coding: latin-1 -*-
>>
>> as the second line of my program. I've tried reading Python/unicode
>> documentation till my eyes went blurry. All to no avail.
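
Getting two characters where one e-circumflex belongs would be consistent with a file that is really UTF-8 being decoded as latin-1. That is an assumption about the file, not something the thread confirms, but gedit normally saves as UTF-8. A small sketch of the effect and of a round-trip repair:

```python
# Assumption: the editor saved e-circumflex as UTF-8, i.e. the bytes C3 AA.
raw = u'\xea'.encode('utf-8')

wrong = raw.decode('latin-1')  # latin-1 yields one character per byte
right = raw.decode('utf-8')    # UTF-8 yields one character for the pair

print(repr(wrong))
print(repr(right))

# Text already decoded the wrong way can be repaired by reversing the
# bad decode and then decoding the recovered bytes as UTF-8.
repaired = wrong.encode('latin-1').decode('utf-8')
print(repaired == right)
```

If that is what is happening, opening the file with encoding='utf-8' avoids the problem at the source. The coding line at the top of a program only affects how Python reads the source file itself, not the data files the program opens.
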
>>
>> There is probably some really simple solution to this, but so far I've
>> failed to find it.
>>
>> Thus, if anyone out there in Tkinter land knows the simple solution or could
>> point me to a good source of information I would greatly appreciate it.
>>
>> Thanks
>>
>> Cam Farnell
>>
>> _______________________________________________
>> Tkinter-discuss mailing list
>> Tkinter-discuss at python.org
>> https://mail.python.org/mailman/listinfo/tkinter-discuss



-- 
**** Listen to my FREE CD at http://www.mellowood.ca/music/cedars ****
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: bob at mellowood.ca
WWW:   http://www.mellowood.ca

