Question about encoding, I need a clue ...

Chris Rebert clp2 at rebertia.com
Fri Aug 5 17:12:40 EDT 2011


On Fri, Aug 5, 2011 at 11:07 AM, Geoff Wright <geoffwright240 at gmail.com> wrote:
> Hi,
>
> I use Mac OSX for development but deploy on a Linux server.  (Platform details provided below).
>
> When the locale is set to FR_CA, I am not able to display a u circumflex consistently across the two machines even though the default encoding is set to "ascii" on both machines.

ASCII can't represent a u circumflex anyway, and I believe the "default
encoding" (what sys.getdefaultencoding() reports) is distinct from the
encoding implied by the locale, so I don't think the default encoding
matters here.
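
To see that the two settings really are separate, something like the
following sketch can be run on each machine. It is written for Python 3
(the sessions quoted below are Python 2), and the helper name
encoding_report is just a hypothetical name for illustration; the two
library calls it wraps are standard.

```python
import locale
import sys

def encoding_report():
    """Collect the interpreter default encoding and the locale's preferred encoding."""
    return {
        # Used for implicit str/bytes conversions inside the interpreter.
        "default": sys.getdefaultencoding(),
        # Derived from the environment's locale settings (LANG, LC_ALL, ...).
        # Passing False avoids changing the process locale as a side effect.
        "locale": locale.getpreferredencoding(False),
    }

report = encoding_report()
print(report)
```

If the two values differ between the Mac and the Linux box, that would
point at the locale rather than the default encoding.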

> Specifically, calendar.month_name[8] returns a ? (question mark) on the Linux server whereas it displays properly on the Mac OSX system.  However, if I take the result from calendar.month_name[8] and run it through the following function .... unicode(calendar.month_name[8],"latin1") ... then the u circumflex displays correctly on the Linux server but does not display correctly on my Mac.
>
> Of course, I could work around this problem with a relatively simple if statement but these issues are going to show up all over my application so even a simple if statement will start to get cumbersome.
>
> I guess what it boils down to is that I would like to get a better handle on what is going on so that I will know how best to work through future encoding issues.  Thanks in advance for any advice.
>
> Here are the specifics of my problem.
>
> On my Mac:
>
> Python 2.6.7 (r267:88850, Jul 30 2011, 23:46:53)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
<snip>
>>>> calendar.month_name[8]
> 'ao\xc3\xbbt'
>>>> print calendar.month_name[8]
> août
>>>> print unicode(calendar.month_name[8],"latin1")
> aoÃ»t
>
> On the linux server:
>
> uname -a
> Linux alhena 2.6.32.8-grsec-2.1.14-modsign-xeon-64 #2 SMP Sat Mar 13 00:42:43 PST 2010 x86_64 GNU/Linux
>
> Python 2.5.2 (r252:60911, Jan 24 2010, 17:44:40)
> [GCC 4.3.2] on linux2
<snip>
>>>> calendar.month_name[8]
> 'ao\xfbt'
>>>> print calendar.month_name[8]
> ao?t
>>>> print unicode(calendar.month_name[8],"latin1")
> août

Some quick experimentation suggests that your month names are
Latin-1-encoded on Linux and UTF-8-encoded on the Mac: the Linux repr
shows the single byte \xfb, while the Mac repr shows the two-byte
sequence \xc3\xbb.
Perhaps try a locale that names its encoding explicitly, e.g. fr_CA.UTF-8?
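
For what it's worth, here is a minimal sketch of that experiment, using
the two byte strings from your sessions. It is written for Python 3, so
the byte strings are bytes literals, and decode_month is a hypothetical
helper name, not something from the calendar module.

```python
# Byte strings copied from the two interactive sessions quoted above.
mac_bytes = b"ao\xc3\xbbt"    # what the Mac returned
linux_bytes = b"ao\xfbt"      # what the Linux server returned

def decode_month(raw, encoding):
    """Decode a month-name byte string using an explicitly named encoding."""
    return raw.decode(encoding)

# Each byte string decodes to the same text under its own encoding.
same_text = decode_month(mac_bytes, "utf-8") == decode_month(linux_bytes, "latin-1")

# Decoding the UTF-8 bytes as Latin-1 "succeeds" but yields two wrong
# characters in place of the u circumflex.
mojibake = decode_month(mac_bytes, "latin-1")

# Decoding the Latin-1 bytes as UTF-8 fails outright, because 0xfb is
# not a valid UTF-8 lead byte.
try:
    decode_month(linux_bytes, "utf-8")
    linux_is_valid_utf8 = True
except UnicodeDecodeError:
    linux_is_valid_utf8 = False

print(same_text, ascii(mojibake), linux_is_valid_utf8)
```

That would explain why unicode(..., "latin1") helps on the server and
hurts on the Mac: it is only the right decoding for the Latin-1 bytes.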

Cheers,
Chris
--
http://rebertia.com


