read(1) returns string of length 2

wolfgang haefelinger wh2005 at web.de
Wed Nov 24 09:20:26 EST 2004


Hey Skip,

>> your basic unit of operation will be a Unicode character

That's exactly the point. What I'm expecting f.read(1) to return
is a unicode string of length 1, i.e. something I'm calling a
unicode character.

Note that I do not count the number of bytes at all.

Btw, you can see that the first unicode string returned
by f.read(1) is

  0x5408  (21512)

The length of this unicode string is 1, i.e. we got one char (even
though we need 2 bytes to represent it).
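
To make the character-versus-byte distinction concrete, here is a minimal
sketch for a current Python 3 interpreter. The two encodings used below
(UTF-16-LE and Shift_JIS) are assumptions chosen only for illustration;
the actual encoding of the file is not stated in this thread.

```python
# U+5408 is the code point quoted above: 0x5408 (21512).
ch = "\u5408"

# One character, regardless of how it is stored on disk.
n_chars = len(ch)

# Assumed encodings, for illustration only: both need 2 bytes for this char.
utf16_bytes = ch.encode("utf-16-le")
sjis_bytes = ch.encode("shift_jis")

print(hex(ord(ch)), ord(ch))   # code point in hex and decimal
print(n_chars, len(utf16_bytes), len(sjis_bytes))
```

The length of the string counts characters; the length of the encoded
bytes depends entirely on the codec.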

Actually, everything is fine until the codecs reader is about
to read '3b'. Instead of delivering this as the next unicode char,
I'm getting '3b' and '0d' together as a string of length 2.

Anyway, my question can also be written like this:

  import codecs

  f = codecs.open(...)
  c = f.read(1)
  if c:
      assert len(c) == 1

I was thinking that this assertion should hold in
general.
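
A self-contained way to check that expectation is sketched below. It is
not the original test case: the encoding (UTF-16-LE), the in-memory
BytesIO stream standing in for the file, and the helper name read_chars
are all assumptions made for illustration. The sample data is the
character 0x5408 followed by 0x3b and 0x0d 0x0a, mirroring the values
mentioned above. On a current Python 3 interpreter, a codecs stream
reader treats the single argument of read() as a character count, so the
assertion should hold there.

```python
import codecs
import io

ENCODING = "utf-16-le"  # assumption: the real file's encoding is not given

# Stand-in for the file: U+5408, then ';' (0x3b), CR (0x0d) and LF (0x0a).
raw = "\u5408;\r\n".encode(ENCODING)


def read_chars(stream):
    """Read one character at a time, checking each result has length 1."""
    chars = []
    while True:
        c = stream.read(1)
        if not c:          # empty string signals end of stream
            break
        assert len(c) == 1, repr(c)
        chars.append(c)
    return chars


# codecs.getreader() wraps a byte stream in the codec's StreamReader.
reader = codecs.getreader(ENCODING)(io.BytesIO(raw))
chars = read_chars(reader)
print([hex(ord(c)) for c in chars])
```

If the assertion fails on some interpreter or codec, the repr in the
assertion message shows which characters were delivered together.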

Cheers,
Wolfgang.



"Skip Montanaro" <skip at pobox.com> wrote in message 
news:mailman.6749.1101298838.5135.python-list at python.org...
>
>    wolfgang> I'm trying to read (japanese) chars from a file. While doing
>    wolfgang> so I encounter that a char with length 2 is returned. Is this
>    wolfgang> to be expected or is there something wrong?
>
> I believe it's to be expected.  You opened the file with codecs.open(), so
> your basic unit of operation will be a Unicode character, not a byte.
>
>    wolfgang> My naive assumption was that f.read(1) returns always a char
>    wolfgang> of length 1 (or zero).
>
> If you simply used the builtin open() to open the file that would be true.
>
> Skip
> 




