create a string of variable length
benjamin.kaplan at case.edu
Mon Feb 1 01:54:17 CET 2010
On Sun, Jan 31, 2010 at 5:12 PM, Tracubik <affdfsdfdsfsd at b.com> wrote:
> On Sun, 31 Jan 2010 13:46:16 +0100, Günther Dietrich wrote:
>> Maybe you might solve this if you decode your string to unicode.
>> |>>> euro = "€"
>> |>>> len(euro)
>> |>>> u_euro = euro.decode('utf_8')
>> |>>> len(u_euro)
>> Adapt the encoding ('utf_8' in my example) to whatever you use.
>> Or create the unicode string directly:
>> |>>> u_euro = u'€'
>> |>>> len(u_euro)
>> Best regards,
> thank you, your two solutions are really interesting.
> is there a way to set unicode encoding by default for my python
> i've tried inserting
> # -*- coding: utf-8 -*-
> at the beginning of my script but doesn't solve the problem
First of all, if you haven't read this before, please do. It will make
this much clearer.
To reiterate: UTF-8 IS NOT UNICODE!!!!
In Python 2, '*' signifies a byte string. It is read as a sequence of
bytes and interpreted as a sequence of bytes. When Python encounters
the sequence 0x27 0xe2 0x82 0xac 0x27 in the code (the UTF-8 bytes for
'€') it interprets it as 3 bytes between the two quotes. It doesn't
care about characters or anything like that. u'*' signifies a Unicode
string. Python will attempt to convert the sequence of bytes into a
sequence of characters. It can use any encoding for that: cp1252,
utf-8, MacRoman, ISO-8859-15. UTF-8 isn't special, it's just one of
the few encodings capable of storing all of the possible Unicode
characters.
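The difference between counting bytes and counting characters can be
sketched as follows. This is only an illustration: the variable names are
made up, and the bytes are written as explicit escapes so the snippet does
not depend on how the file itself is saved.

```python
# -*- coding: utf-8 -*-
# The three UTF-8 bytes for the euro sign, spelled out as escapes.
euro_bytes = b'\xe2\x82\xac'
byte_count = len(euro_bytes)              # counts bytes: 3

# Decoding with the encoding the bytes were written in gives one character.
euro_text = euro_bytes.decode('utf-8')
char_count = len(euro_text)               # counts characters: 1

# The same bytes decoded with a different encoding give different text.
misread = euro_bytes.decode('cp1252')
misread_count = len(misread)              # 3 characters, none of them the euro sign

# The same character encoded in other encodings gives different bytes.
euro_cp1252 = euro_text.encode('cp1252')        # b'\x80'
euro_latin9 = euro_text.encode('iso-8859-15')   # b'\xa4'
```

The bytes only become the euro sign once you say which encoding they are
in; UTF-8 is one answer among several.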
What the line at the top says is that the file should be read using
UTF-8. Byte strings are still just sequences of bytes; this doesn't
affect them. But any Unicode string literal will be decoded using UTF-8. If
Python looks at the above sequence of bytes as a Unicode string, it
views the 3 bytes as a single character. When you ask for its length,
it returns the number of characters.
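One way to see the effect of the coding line is to compile the same bytes
under two different declarations. This is a rough sketch, not something
from the thread: literal_length is a hypothetical helper, and it builds a
tiny source file in memory whose only statement assigns a u'' literal
containing the bytes 0xe2 0x82 0xac.

```python
def literal_length(encoding_name):
    """Compile a two-line source with the given coding line and return
    the length of the Unicode literal it defines."""
    header = ("# -*- coding: %s -*-\n" % encoding_name).encode('ascii')
    # The literal's content is the raw bytes 0xe2 0x82 0xac.
    body = b"text = u'\xe2\x82\xac'\n"
    namespace = {}
    exec(compile(header + body, '<demo>', 'exec'), namespace)
    return len(namespace['text'])

utf8_length = literal_length('utf-8')      # bytes read as one character
cp1252_length = literal_length('cp1252')   # same bytes read as three characters
```

The bytes in the file are identical in both cases; only the declared
encoding changes how the Unicode literal is decoded.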
Solution to your problem: in addition to keeping the #-*- coding ...
line, go with Günther's advice and use Unicode strings.
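Put together, that might look like the sketch below. The padding step is
just an assumed example of where the length matters, and the names are
illustrative.

```python
# -*- coding: utf-8 -*-
label = u'€'                      # Unicode literal, decoded via the coding line
label_length = len(label)         # 1 character

# Padding to a fixed width now counts characters, not bytes.
padded = label.ljust(5, u'.')
padded_length = len(padded)       # 5 characters

# Encode only when bytes are needed, e.g. for writing to a file or terminal.
encoded = padded.encode('utf-8')
encoded_length = len(encoded)     # 7 bytes: 3 for the euro sign plus 4 dots
```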