[ python-Bugs-1224047 ] Len too large with national characters
SourceForge.net
noreply at sourceforge.net
Mon Jun 20 15:12:01 CEST 2005
Bugs item #1224047, was opened at 2005-06-20 11:52
Message generated for change (Comment added) made by mwh
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1224047&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Henrik Winther Jensen (henrikwj)
>Assigned to: Michael Hudson (mwh)
Summary: Len too large with national characters
Initial Comment:
It looks as if len returns the length of a UTF-8 string even if the
string only contains ASCII characters and the default encoding is ASCII.
This means that if you insert, for example, one Danish ø in a string,
len will return a value of 2, i.e.
a='ø'
print len(a)
gives:
2
----------------------------------------------------------------------
>Comment By: Michael Hudson (mwh)
Date: 2005-06-20 14:12
Message:
Logged In: YES
user_id=6656
Well, what encoding is the file in?
I suspect that it's in UTF-8, so when you open the file and call read()
you get UTF-8 data, and thus your Danish character is represented as
two bytes.
You might want to do
import codecs
fileobj = codecs.open('filename.txt', encoding='utf-8')
and then fileobj.read() will return a unicode string of the
length you're expecting.
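A minimal sketch of the difference, written for Python 3 rather than the Python 2.4 in this report (in Python 3 a plain str is already Unicode and bytes are a separate type). The temporary directory and the file name are illustrative assumptions. Reading the raw bytes gives a length of 2; reading through codecs.open with encoding='utf-8' gives a length of 1.

```python
import codecs
import os
import tempfile

DANISH_O = "\u00f8"  # "ø", LATIN SMALL LETTER O WITH STROKE

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name; stands in for the reporter's UTF-8 text file.
    path = os.path.join(tmpdir, "filename.txt")
    with open(path, "wb") as f:
        f.write(DANISH_O.encode("utf-8"))

    # Plain binary read: UTF-8 stores this one character as two bytes.
    with open(path, "rb") as f:
        raw = f.read()

    # codecs.open with an explicit encoding decodes the bytes into text.
    with codecs.open(path, encoding="utf-8") as f:
        text = f.read()

print(len(raw))   # 2
print(len(text))  # 1
```

In Python 3 the built-in open(path, encoding="utf-8") decodes the same way, and recent Python versions document codecs.open as superseded by it.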
At any rate, I see no evidence of a Python bug here, so closing.
----------------------------------------------------------------------
Comment By: Henrik Winther Jensen (henrikwj)
Date: 2005-06-20 14:06
Message:
Logged In: YES
user_id=1299770
Actually, the problem persists whether I am reading from a file or
typing at the keyboard. I am using Python from the command line in a
Linux shell. I don't know what console that is, but it is able to show
the Danish characters on the screen as well as read them from the
keyboard.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2005-06-20 13:12
Message:
Logged In: YES
user_id=6656
How are you getting your Danish character into the string? If it's by
typing it into a console, is your console in UTF-8 mode?
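To illustrate why the console's encoding matters, the sketch below (Python 3, an illustration rather than anything taken from the thread) encodes the same character the way a UTF-8 console and a Latin-1 console would each send it. The choice of Latin-1 as the non-UTF-8 case is an assumption for the example. In Python 3, sys.stdin.encoding reports the encoding Python assumes for console input.

```python
DANISH_O = "\u00f8"  # "ø"

# Bytes a terminal would send for the same keystroke, depending on its encoding.
utf8_bytes = DANISH_O.encode("utf-8")      # UTF-8 console: b"\xc3\xb8"
latin1_bytes = DANISH_O.encode("latin-1")  # Latin-1 (ISO-8859-1) console: b"\xf8"

print(len(utf8_bytes), len(latin1_bytes))  # 2 1

# Decoding with the console's actual encoding yields one character either way.
print(len(utf8_bytes.decode("utf-8")), len(latin1_bytes.decode("latin-1")))  # 1 1

# Decoding UTF-8 bytes as Latin-1 yields two wrong characters instead of one.
print(len(utf8_bytes.decode("latin-1")))  # 2
```

A byte string typed in a UTF-8 console therefore has length 2 for a single ø, which matches the behaviour in the original report.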
----------------------------------------------------------------------
More information about the Python-bugs-list mailing list