Python and unicode

Goran Novosel goran.novosel at
Sun Sep 19 21:43:28 CEST 2010

Hi everybody.

I've played with encoding in Python for a few hours, but it's still somewhat
confusing to me. So I've written a test file (encoded as UTF-8) and put
everything I think is true in a comment at the beginning of the script.
Could you check whether it's correct? (On a side note, the script does what I
intended it to do.)

One more thing: is there some mechanism to avoid writing
'something'.decode('utf-8') all the time? Some sort of function call that tells
the interpreter I'd like all string constants in the script to be implicitly
decoded with a specified encoding?
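For the question above, one mechanism worth checking is `from __future__ import unicode_literals` (available since Python 2.6). It makes every unprefixed string literal in that module a unicode object, so the `.decode('utf-8')` calls become unnecessary; the coding declaration is still needed so the interpreter knows how to read the source bytes. A minimal sketch, assuming Python 2.6 or later (the same file also runs under Python 3, where plain literals are already unicode and the import has no effect):

```python
# -*- coding: utf-8 -*-
# The __future__ import must come before any other statement in the module.
from __future__ import print_function, unicode_literals

s = 'abcdef ČčĆćĐđŠšŽž'   # already unicode: no .decode('utf-8') needed
raw = b'abcdef'           # a b'' prefix keeps a literal as a byte string

text_type = type('')      # unicode on Python 2 (with the import), str on Python 3
print(isinstance(s, text_type), len(s))
```

Without the import, the u'' prefix (for example u'ČčĆć') achieves the same thing one literal at a time. With it, any literal that really must stay a byte string needs an explicit b'' prefix.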

Here's my script:
# vim: set encoding=utf-8 :

#  ----- encoding and py -----
#
#  - the 1st (or 2nd) line tells the interpreter the encoding of the file
#    - if this line is missing, the interpreter assumes 'ascii'
#    - several variations of the declaration line are accepted
#      - the first or second line must match the regular expression
#        "coding[:=]\s*([-\w.]+)" (PEP 263)
#    - some variations:
#
#    # coding=<encoding name>
#
#    # -*- coding: <encoding name> -*-
#
#    # vim: set fileencoding=<encoding name> :
#
#    - this version works for my vim:
#    # vim: set encoding=utf-8 :
#
#  - unicode constants can be created with the str.decode() method or with
#    the unicode() constructor
#
#  - if locale is used, it shouldn't be set via LC_ALL, as that also
#    changes the encoding


import datetime, locale

#locale.setlocale(locale.LC_ALL,'croatian')  # changes encoding
locale.setlocale(locale.LC_TIME,'croatian')  # sets correct date format, but encoding is left alone

print 'default locale:', locale.getdefaultlocale()

s='abcdef ČčĆćĐđŠšŽž'.decode('utf-8')
ss=unicode('ab ČćŠđŽ','utf-8')

# date part of the string is decoded as cp1250, because that is the default locale encoding
,1,6).strftime("'%d.%m.%Y.', %x, %A, %B, ").decode('cp1250')+'%s, %s' % (s, ss)

print all
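The claim in the notes that all of the declaration variants work, including the vim line using `encoding=` rather than `fileencoding=`, can be checked against the PEP 263 pattern quoted there. The vim line matches because the pattern is searched for anywhere in the line and `encoding=` ends in `coding=`. A small sketch (the helper name `declared_encoding` is hypothetical, introduced only for this check):

```python
import re

# The pattern quoted in the notes; search() looks for it anywhere in the line.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_encoding(line):
    """Return the encoding name declared on `line`, or None if there is none."""
    m = CODING_RE.search(line)
    return m.group(1) if m else None

variants = [
    "# coding=utf-8",
    "# -*- coding: utf-8 -*-",
    "# vim: set fileencoding=utf-8 :",
    "# vim: set encoding=utf-8 :",  # matches: 'encoding=' ends in 'coding='
]

for line in variants:
    print('%-35s -> %s' % (line, declared_encoding(line)))
```

All four lines report utf-8, which agrees with the observation that the `encoding=` form works.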
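The note that unicode constants can come from either str.decode() or unicode() can be verified directly: both decode the same UTF-8 bytes into the same unicode object. A sketch written to run under Python 2.6+ and Python 3, where `type(u'')` stands in for `unicode` so that the constructor form works on both:

```python
# -*- coding: utf-8 -*-
# The declaration above is needed on Python 2 because this comment contains
# non-ASCII text: 'ab ČćŠđŽ' (the same constant as ss in the script).

text_type = type(u'')  # unicode on Python 2, str on Python 3

raw = b'ab \xc4\x8c\xc4\x87\xc5\xa0\xc4\x91\xc5\xbd'  # UTF-8 bytes of 'ab ČćŠđŽ'

via_decode = raw.decode('utf-8')            # the 'something'.decode('utf-8') form
via_constructor = text_type(raw, 'utf-8')   # the unicode('something', 'utf-8') form

print(via_decode == via_constructor, len(via_decode))
```

Both forms yield the same 8-character unicode object, so the choice between them is purely a matter of style.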
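The LC_TIME versus LC_ALL observation can be checked in isolation. LC_ALL also sets LC_CTYPE, the category that governs character encoding, whereas LC_TIME affects only date and time formatting. A minimal sketch that uses the portable 'C' locale in place of the Windows-specific 'croatian' name from the script (an assumption made so the check runs on any platform):

```python
import locale

def current(category):
    # With no locale argument, setlocale() queries the setting instead of changing it.
    return locale.setlocale(category)

ctype_before = current(locale.LC_CTYPE)
locale.setlocale(locale.LC_TIME, 'C')   # changes only the date/time category
ctype_after = current(locale.LC_CTYPE)

print('LC_CTYPE before: %s, after: %s' % (ctype_before, ctype_after))
print('LC_TIME is now: %s' % current(locale.LC_TIME))
```

Swapping `locale.LC_TIME` for `locale.LC_ALL` in the middle call would change LC_CTYPE as well, which is the encoding change the script's commented-out line warns about.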
