[Python-Dev] beta1 coming real soon

Walter Dörwald walter at livinglogic.de
Tue Jun 13 20:03:26 CEST 2006


Martin v. Löwis wrote:

> Walter Dörwald wrote:
>>>> The best way to thoroughly test the patch is of course to check it in. ;)
>>> Is it too risky? ;)
>> At least I'd like to get a second review of the patch.
> 
> I've reviewed it, and am likely to check it in.

Great!

> I notice that the
> patch still has problems. In particular, it is limited to "DBCS"
> (and SBCS) character sets in the strict sense; general "MBCS"
> character sets are not supported. There are a few of these, most
> notably the ISO-2022 ones, UTF-8, and GB18030 (can't be bothered
> to look up the code page numbers for them right now).

True, but there's no IsMBCSLeadByte().

And passing MB_ERR_INVALID_CHARS in a call to MultiByteToWideChar()
doesn't help either, because AFAICT there's no information about the
error location. What could work is calling MultiByteToWideChar()
with various string lengths to determine whether the error is due
to an incomplete byte sequence or to invalid data. But that sounds ugly and
slow to me.
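
As a rough illustration of the trial-length idea, here is a sketch in
Python. It uses Python's own strict codecs as a stand-in for
MultiByteToWideChar() with MB_ERR_INVALID_CHARS and deliberately looks
only at whether each attempt succeeds, never at the error position. The
function name and the max_trail parameter are made up for this example.

```python
from typing import Optional


def trailing_bytes_to_hold_back(data: bytes, encoding: str,
                                max_trail: int = 3) -> Optional[int]:
    """Return how many trailing bytes must be dropped for data to decode.

    0 means data decodes as is. None means that dropping up to max_trail
    bytes does not help, so the error is not confined to the end.
    Only success or failure of each attempt is used, mimicking an API
    that reports an error without saying where it occurred.
    """
    def decodes(chunk: bytes) -> bool:
        try:
            chunk.decode(encoding, "strict")
        except UnicodeDecodeError:
            return False
        return True

    if decodes(data):
        return 0
    # Retry with 1..max_trail bytes chopped off the end.
    for cut in range(1, min(max_trail, len(data)) + 1):
        if decodes(data[:-cut]):
            return cut
    return None
```

Note the limitation: an invalid byte at the very end looks exactly like a
truncated sequence in this scheme, and each call may decode the buffer
several times, which is the "ugly and slow" part.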

> What I don't know is whether any Windows locale uses a "true"
> MBCS character set as its "ANSI" code page.
> 
> The approach taken in the patch could be extended to GB18030 and
> UTF-8 in principle,

Would that mean that we'd have to determine the active code page and
implement the incomplete byte sequence detection ourselves?
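
For UTF-8, hand-written incomplete-sequence detection could look roughly
like the following Python sketch. It covers only the "is the tail
incomplete?" half of the question, not determining the active code page,
and it does not validate the data (overlong forms, surrogates and so on
are left to the real decoder). The function name is hypothetical.

```python
def incomplete_utf8_tail(data: bytes) -> int:
    """Return how many bytes at the end of data begin a UTF-8 sequence
    that is not complete yet (0 if the data ends on a boundary)."""
    # The longest sequence is 4 bytes, so an incomplete one has at most
    # 3 bytes present: look back no further than that.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte, keep looking for the lead byte
        if 0xF0 <= byte <= 0xF4:
            needed = 4
        elif 0xE0 <= byte <= 0xEF:
            needed = 3
        elif 0xC2 <= byte <= 0xDF:
            needed = 2
        else:
            return 0  # ASCII or an invalid lead byte: nothing to hold back
        # Incomplete only if fewer bytes are present than the lead byte asks for.
        return back if back < needed else 0
    return 0
```

A stateful decoder would pass data[:len(data) - n] to the conversion
function and keep the last n bytes for the next call, where n is the
value returned above.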

> but can't possibly work for ISO-2022.

So does that mean that IsDBCSLeadByte() returns garbage in this case?

Servus,
   Walter


