[Tkinter-discuss] OT: Unicode

Michael O'Donnell michael.odonnell at uam.es
Sat Mar 22 08:52:49 CET 2014


Dear Cam,

   Python 3 is so much better at dealing with unicode than
Python 2.

But that said, your file appears to be in an encoding
other than latin-1, so decoding it as latin-1 gives you
the wrong characters (latin-1 only covers Western European text).
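
To illustrate what such a mismatch looks like, here is a small sketch.
It assumes, purely for the example, that the file is really UTF-8;
that is a guess on my part, not something I know about your file.
The same two bytes decode to one character or two depending on the codec:

```python
# Sketch only: the UTF-8 assumption is a guess, not a fact about your file.

raw = b'\xc3\xaa'  # e-circumflex as UTF-8 stores it (two bytes)

as_utf8 = raw.decode('utf-8')      # one character, U+00EA
as_latin1 = raw.decode('latin-1')  # two characters, U+00C3 and U+00AA

# Compare the two results; the lengths differ (1 versus 2).
print(repr(as_utf8))
print(repr(as_latin1))
print(len(as_utf8), len(as_latin1))
```

If your strings come out with two odd characters where one accented
letter belongs, this kind of mismatch is the likely cause.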

Solution:

1. Open your text file in a browser

2. If the file displays ok in the browser,
see what encoding the browser used
to decode the file: there is usually an "Encoding"
option in the menu somewhere, e.g. in Chrome,
under the View menu.

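Assume for this example that it is iso-8859-1 (note that this is
the same encoding as latin-1, so if the browser reports something
else, use that name instead).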

3. Change your file opening to:

    F = codecs.open('temp.txt', encoding='iso-8859-1')

That should fix it. You can then read from the file
directly as a unicode string.
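
Putting step 3 together, a minimal sketch might look like the following.
The helper name read_text and the sample file it writes are made up
for illustration, and iso-8859-1 is only the encoding assumed in the
example above; substitute whatever your browser reported.

```python
import codecs
import os
import tempfile


def read_text(path, encoding):
    """Return the whole file as a unicode string, decoded with `encoding`.

    read_text is a hypothetical helper, not part of any library.
    """
    with codecs.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Write a small sample file so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'temp.txt')
with open(path, 'wb') as out:
    out.write(b't\xeate\n')  # "tete" with e-circumflex, as iso-8859-1 bytes

# Assumed encoding; replace it with the one your file really uses.
text = read_text(path, 'iso-8859-1')
print(repr(text))
```

The point of passing the encoding explicitly is that the decoding
happens once, at the file boundary, and everything after that is
a unicode string.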

Mick

On 22 March 2014 03:26, Cam Farnell <msa01 at bitflipper.ca> wrote:
> Technically this is a Python question, not a Tkinter question, but it's in
> the context of a Tkinter application so I don't feel *too* guilty about
> posting it here.
>
> OK. I've got a Tkinter application (running with Python 2.7.2 on Ubuntu
> 12.04.4 LTS) that needs to handle French accented characters. And it does
> handle accented characters just fine. I can type an accented character into
> an Entry and it shows up correctly. I can display it on a Text. I can
> cPickle it to disk and read it back. For example, if I enter e-circumflex
> (in a Tkinter Entry) and then print it using repr I get:
>
>     u'\xea'
>
> If I look in the cPickled file there are 0xEA's where the e-circumflex
> characters are. So far so good.
>
> The problem comes when I need to read into my Tkinter application a file
> which has accented characters and which was prepared using a text editor
> like, for example, gedit. The file to be read also has 0xEA's to represent
> e-circumflex. However, when I read such a file the resulting string then
> contains u'\cd\xaa' where the e-circumflexes belong. I don't know who is
> doing the unwanted conversion or how to make it go away. I've tried reading
> in binary mode, I've tried opening the file using:
>
>     F = codecs.open('temp.txt', encoding='latin-1')
>
> I've tried putting:
>
>     # -*- coding: latin-1 -*-
>
> as the second line of my program. I've tried reading Python/unicode
> documentation till my eyes went blurry. All to no avail.
>
> There is probably some really simple solution to this, but so far I've
> failed to find it.
>
> Thus, if anyone out there in Tkinter land knows the simple solution or could
> point me to a good source of information I would greatly appreciate it.
>
> Thanks
>
> Cam Farnell

