Character encodings and codecs

vincent wehren v.wehren at home.nl
Sat Feb 1 10:17:09 EST 2003


"Grumfish" <nobody at nowhere.com> wrote in message
news:mmdn3vg4csgpkt8vtka2jhnit5kf6d3d72 at 4ax.com...
> On Sat, 1 Feb 2003 08:43:17 +0100, "vincent wehren" <v.wehren at home.nl>
> wrote:
>
> >    Do you mean: reading chunks without accidentally breaking up
> >    characters?
>
> Yes. How can I do this?

Well, that depends on the original encoding, doesn't it? If it is, let's
say, a DBCS (double-byte character set), you could check whether the last
byte of the chunk you read falls within the lead-byte range of the input
character set. If the last byte is a lead byte, you know you need to read at
least one more byte to get the entire DBCS character. What encodings do you
want to process?

Regards
Vincent Wehren


More information about the Python-list mailing list