create a string of variable length

Benjamin Kaplan benjamin.kaplan at
Mon Feb 1 01:54:17 CET 2010

On Sun, Jan 31, 2010 at 5:12 PM, Tracubik <affdfsdfdsfsd at> wrote:
> On Sun, 31 Jan 2010 13:46:16 +0100, Günther Dietrich wrote:
>> Maybe you might solve this if you decode your string to unicode.
>> Example:
>> |>>> euro = "€"
>> |>>> len(euro)
>> |3
>> |>>> u_euro = euro.decode('utf_8')
>> |>>> len(u_euro)
>> |1
>> Adapt the encoding ('utf_8' in my example) to whatever you use.
>> Or create the unicode string directly:
>> |>>> u_euro = u'€'
>> |>>> len(u_euro)
>> |1
>> Best regards,
>> Günther
> Thank you, your two solutions are really interesting.
> Is it possible to set Unicode encoding by default for my Python
> scripts?
> I've tried inserting
> # -*- coding: utf-8 -*-
> at the beginning of my script, but that doesn't solve the problem.

First of all, if you haven't read this before, please do. It will make
this much clearer.

To reiterate: UTF-8 IS NOT UNICODE!!!!

In Python 2, '*' signifies a byte string. It is read as a sequence of
bytes and interpreted as a sequence of bytes. When Python encounters
the sequence 0x27 0xe2 0x82 0xac 0x27 in the code (the UTF-8 bytes for
'€'), it interprets it as 3 bytes between the two quotes. It doesn't
care about characters or anything like that. u'*' signifies a Unicode
string. Python will attempt to convert the sequence of bytes into a
sequence of characters. It can use any encoding for that: cp1252,
utf-8, MacRoman, ISO-8859-15. UTF-8 isn't special; it's just one of
the few encodings capable of storing all of the possible Unicode
characters.
What the line at the top says is that the file should be read using
UTF-8. Byte strings are still just sequences of bytes; this doesn't
affect them. But any Unicode string will be decoded using UTF-8. If
Python looks at the above sequence of bytes as a Unicode string, it
views the 3 bytes as a single character. When you ask for its length,
it returns the number of characters.
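
One way to see what the coding line does is to feed Python the same
source bytes twice, changing only that line. This is a rough sketch,
not something from your script: the two tiny "scripts" and the run()
helper are made up for the example, and it relies on compile()
honouring the coding line when given raw bytes.

```python
# Two hypothetical two-line scripts, held as raw bytes. They differ only
# in the coding line; the u'...' literal holds the same three bytes
# (0xe2 0x82 0xac) in both.
src_utf8 = b"# -*- coding: utf-8 -*-\ns = u'\xe2\x82\xac'\n"
src_cp1252 = b"# -*- coding: cp1252 -*-\ns = u'\xe2\x82\xac'\n"


def run(source_bytes):
    """Compile and execute source bytes; return the value bound to s."""
    namespace = {}
    exec(compile(source_bytes, '<example>', 'exec'), namespace)
    return namespace['s']


# Read as UTF-8, the three bytes become a single character.
as_utf8 = run(src_utf8)

# Read as cp1252, each byte becomes its own character.
as_cp1252 = run(src_cp1252)

utf8_len = len(as_utf8)       # 1
cp1252_len = len(as_cp1252)   # 3
```

So the coding line only tells Python how to turn the bytes inside a
Unicode literal into characters. It does not turn byte strings into
Unicode strings.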

Solution to your problem: in addition to keeping the #-*- coding ...
line, go with Günther's advice and use Unicode strings.
